
Torturing Databases for Fun and Profit

† The Ohio State University ‡HP Labs

Mai Zheng†, Joseph Tucek‡, Dachuan Huang†, Feng Qin†, Mark Lillibridge‡, Elizabeth S Yang‡, Bill W Zhao‡, Shashank Singh†


• ACID: atomicity, consistency, isolation, and durability - even under failures

List of databases that survived

                           Workload
Database      File System  W-1   W-2   W-3   W-4.1  W-4.2  W-4.3
TokyoCabinet  ext3         D     D     D     ACD    ACD    ACD
              XFS          --    D     D     ACD    D      ACD
MariaDB       ext3         D     D     D     D      D      D
              XFS          D     D     D     D      D      D
LightningDB   ext3         --    --    --    --     --     D
              XFS          --    --    --    --     --     --
SQLite        ext3         D     D     --    D      D      D
              XFS          --    --    D     D      D      D
KVS-A         ext3         --    --    Hang  --     --     --
              XFS          --    --    --    --     --     --
SQL-A         ext3         D     D     D     D      D      D
              XFS          D     D     D     D      D      D
SQL-B         ext3         D     D     CD    CD     CD     CD
              XFS          CD    D     CD    CD     CD     CD
SQL-C         NTFS         D     D     D     D      D      D

Everything is broken under simulated power faults


Power faults cannot happen nowadays, right?


2012:“POWER OUTAGE Hits London Data Center ...”

2012:“... HUMAN ERROR was responsible for a data center POWER OUTAGE ...”

2012:“Amazon Data Center LOSES POWER During STORM …”

2011:“Colocation provider Colo4 experienced a POWER OUTAGE …”

2010:“CAR CRASH Triggers Amazon POWER OUTAGE …”

2010:“About 3,000 servers at Montreal web host iWeb experienced an OUTAGE …”

2013:“... POWER OUTAGE during Super Bowl ... because a RELAY DEVICE MALFUNCTIONED.”

2013:“POWER OUTAGE knocks DreamHost customers offline ...”

2013:“A data center POWER OUTAGE is being blamed for ... Visa downtime ...”

2013:“A POWER OUTAGE at a key New Jersey data center ...”

2014:“Internap Data Center OUTAGE Takes Down Livestream, StackExchange”

2014:“Data Center FIRE Leads to OUTAGE ...”

2014:“... ELECTRICAL FIRE took down ... primary data center... ALL POWER was OFF ...”


Database Torture 101

Fault Model: Clean termination of I/O stream

• The database issues on-the-fly I/O blocks; a block is the minimum atomic transfer unit (e.g., 512B/4KB)

• Blocks are transferred to durable media until a fault happens

• Blocks after the fault have no effect

• Blocks before the fault are NOT corrupted/dropped/reordered
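The clean-termination fault model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the disk image is modeled as a dictionary from LBA to block content, and the names `BlockWrite` and `image_after_fault` are hypothetical, not part of the framework described in the talk.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one atomic block write = (LBA, block content).
BlockWrite = Tuple[int, bytes]


def image_after_fault(clean_image: Dict[int, bytes],
                      trace: List[BlockWrite],
                      fault_point: int) -> Dict[int, bytes]:
    """Return the image left behind when the I/O stream stops cleanly
    after the first `fault_point` block writes.

    Writes before the fault are applied intact and in order;
    writes after the fault have no effect.
    """
    image = dict(clean_image)  # never modify the clean image itself
    for lba, content in trace[:fault_point]:
        image[lba] = content
    return image


clean = {1012: b"old-db", 6541: b"old-log"}
trace = [(1012, b"new-db"), (6541, b"new-log"), (1012, b"newer-db")]

# One candidate image per fault point, from "nothing landed" to "everything landed".
images = [image_after_fault(clean, trace, k) for k in range(len(trace) + 1)]
```

Because no corruption, dropping, or reordering is modeled, a trace of n block writes yields exactly n + 1 possible post-fault images.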

Why not introduce corruption/dropping/reordering?

• Unreasonable to require databases to handle arbitrary bad behavior introduced in the lower layers

• Simulated bad behavior w/o verification by real failures may be unrealistic

- I/O path in kernel & device puts constraints on failure states

How do we test DBs on multiple OSes w/ high fidelity?

• No disturbance on thread scheduling

• No disturbance on interactions among DB, memory manager, FS, volume manager, I/O scheduler, …

• Decouple via iSCSI: the database sits on the iSCSI initiator side, the torturing framework on the iSCSI target side, with SCSI commands sent over network

Framework Overview

[Diagram: database, Worker & Checker, SCSI cmds, Record & Replayer]

Workload Example

DB table (key, value) in two parts, meta rows and work rows; 2 threads, 2 transactions per thread. Has known initial state:

key           value
THR-1-TXN-1   v-init-THR-1-TXN-1    (meta rows)
THR-1-TXN-2   v-init-THR-1-TXN-2
THR-2-TXN-1   v-init-THR-2-TXN-1
THR-2-TXN-2   v-init-THR-2-TXN-2
k-1           v-init-1              (work rows)
k-2           v-init-2
k-3           v-init-3
k-4           v-init-4
k-5           v-init-5
k-6           v-init-6
k-7           v-init-7
k-8           v-init-8

Each transaction updates N random work rows + 1 meta row. For THR-1-TXN-1:

• save transaction ID in the work rows: k-2 and k-5 become v-THR-1-TXN-1

• save work-row keys & timestamp right before commit: meta row THR-1-TXN-1 becomes k-2-k-5-TS-00:01

Fully exercise concurrency control:

• THR-2-TXN-1: k-6 and k-7 become v-THR-2-TXN-1; meta row becomes k-7-k-6-TS-00:03

• THR-1-TXN-2: k-6 and k-8 become v-THR-1-TXN-2; meta row becomes k-6-k-8-TS-00:13

• THR-2-TXN-2: k-3 and k-7 become v-THR-2-TXN-2; meta row becomes k-3-k-7-TS-00:14

DB table after all four transactions:

key           value
THR-1-TXN-1   k-2-k-5-TS-00:01      (meta rows)
THR-1-TXN-2   k-6-k-8-TS-00:13
THR-2-TXN-1   k-7-k-6-TS-00:03
THR-2-TXN-2   k-3-k-7-TS-00:14
k-1           v-init-1              (work rows)
k-2           v-THR-1-TXN-1
k-3           v-THR-2-TXN-2
k-4           v-init-4
k-5           v-THR-1-TXN-1
k-6           v-THR-1-TXN-2
k-7           v-THR-2-TXN-2
k-8           v-THR-1-TXN-2
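The workload above can be sketched as follows. This is a minimal illustration, not the authors' Worker: a plain Python dictionary stands in for the DB table, a single `dict.update` stands in for the commit, and the helper names (`initial_table`, `pick_work_keys`, `run_transaction`) are hypothetical.

```python
import random
from typing import Dict, List


def initial_table(threads: int, txns_per_thread: int, work_rows: int) -> Dict[str, str]:
    """Build the known initial state: meta rows first, then work rows."""
    table: Dict[str, str] = {}
    for t in range(1, threads + 1):
        for x in range(1, txns_per_thread + 1):
            txn_id = f"THR-{t}-TXN-{x}"
            table[txn_id] = f"v-init-{txn_id}"
    for i in range(1, work_rows + 1):
        table[f"k-{i}"] = f"v-init-{i}"
    return table


def pick_work_keys(rng: random.Random, table: Dict[str, str], n: int) -> List[str]:
    """Choose N random work rows for one transaction."""
    work_keys = [key for key in table if key.startswith("k-")]
    return rng.sample(work_keys, n)


def run_transaction(table: Dict[str, str], txn_id: str,
                    work_keys: List[str], timestamp: str) -> None:
    """Update N work rows + 1 meta row as one unit."""
    staged = {key: f"v-{txn_id}" for key in work_keys}          # save transaction ID
    staged[txn_id] = "-".join(work_keys) + f"-TS-{timestamp}"   # keys & timestamp
    table.update(staged)  # stand-in for the database commit


table = initial_table(threads=2, txns_per_thread=2, work_rows=8)
run_transaction(table, "THR-1-TXN-1", ["k-2", "k-5"], "00:01")
random_keys = pick_work_keys(random.Random(0), table, 2)
```

With the keys fixed to k-2 and k-5, the table ends up in the state shown for THR-1-TXN-1; a real worker would draw the keys with something like `pick_work_keys` and issue the updates through the database's transaction API from several threads.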


A power fault just happened during our workload ...

Is there any ACID violation after recovery?

recovered DB table

key           value
THR-1-TXN-1   k-2-k-5-TS-00:01      (meta rows)
THR-1-TXN-2   k-6-k-8-TS-00:13
THR-2-TXN-1   k-7-k-6-TS-00:03
THR-2-TXN-2   v-init-THR-2-TXN-2
k-1           v-init-1              (work rows)
k-2           v-THR-1-TXN-1
k-3           v-THR-2-TXN-2
k-4           v-init-4
k-5           v-THR-1-TXN-1
k-6           v-THR-1-TXN-2
k-7           v-THR-2-TXN-2
k-8           v-THR-1-TXN-2

• Atomicity violation! Work rows k-3 and k-7 hold v-THR-2-TXN-2, so meta row THR-2-TXN-2 should have been updated w/ work rows, yet it still holds its initial value

• The keys & timestamps saved in the meta rows allow checking time & order related properties

• More workloads & ACID checking in the paper
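One half of the atomicity check on the recovered table can be sketched as below. It is an assumption-laden illustration, not the Checker from the talk: it only looks for work rows that carry a transaction ID whose meta row is still in its initial state, and the function name `find_uncommitted_updates` is hypothetical.

```python
from typing import Dict, List, Tuple


def find_uncommitted_updates(table: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (work-row key, transaction ID) pairs where a work row was
    updated by a transaction whose meta row was never updated."""
    violations = []
    for key, value in table.items():
        # Skip meta rows and work rows that were never touched.
        if not key.startswith("k-") or value.startswith("v-init-"):
            continue
        txn_id = value[len("v-"):]  # "v-THR-2-TXN-2" -> "THR-2-TXN-2"
        if table.get(txn_id) == f"v-init-{txn_id}":
            violations.append((key, txn_id))
    return sorted(violations)


recovered = {
    "THR-1-TXN-1": "k-2-k-5-TS-00:01",
    "THR-1-TXN-2": "k-6-k-8-TS-00:13",
    "THR-2-TXN-1": "k-7-k-6-TS-00:03",
    "THR-2-TXN-2": "v-init-THR-2-TXN-2",
    "k-1": "v-init-1",
    "k-2": "v-THR-1-TXN-1",
    "k-3": "v-THR-2-TXN-2",
    "k-4": "v-init-4",
    "k-5": "v-THR-1-TXN-1",
    "k-6": "v-THR-1-TXN-2",
    "k-7": "v-THR-2-TXN-2",
    "k-8": "v-THR-1-TXN-2",
}

violations = find_uncommitted_updates(recovered)
```

On the recovered table from the slide, this flags k-3 and k-7 against THR-2-TXN-2. The opposite direction (a committed meta row whose listed work rows are missing) would additionally need the saved timestamps to account for later overwrites, and is left out of this sketch.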


Capturing I/O trace without kernel modification

• The Worker runs on top of the file system; its SCSI cmds reach the target daemon and the backing store

• A SCSI Tracer at the target daemon records the Worker’s block trace as minimum atomic block-transfer operations (mini ops): 1, 2, 3, 4, ...

Constructing a post-fault disk image

• The Replayer starts from a clean image of the backing store

• It replays the Worker’s block trace up to the fault point (e.g., only op 1), which yields a failure image

Checking the post-fault DB

• Recovery runs on the failure image: fsck for the file system, auto-recovery for the database

• The Checker then examines the recovered DB and writes a check log

Testing different fault points easily

• Start again from the clean image and move the fault point: replay ops 1 2, then ops 1 2 3, and so on, checking each failure image

The framework is not good enough

• Sometimes need several days - too many mini operations, too many potential fault points

• We tried sampling - but only a few fault points trigger ACID violations

• Don’t know why

Enhanced Design

Framework Overview

[Diagram: database, Worker & Checker, SCSI cmds, Record & Replayer, plus a new Multi-layer Tracer]

Original trace provides little semantics

Worker’s block trace (from the SCSI Tracer):

op#  content       LBA
1    0a080101 ...  1012
2    0a080001 ...  6541
3    98393bc0 ...  9598
4    00000100 ...  9602

Enhancing w/ more context

Worker’s multi-layer trace (from the multi-layer tracer):

op#  content       LBA   timestamp     SCSI cmd#  file   syscall
1    0a080101 ...  1012  139...013065  1          x.db   msync(x.db)
2    0a080001 ...  6541  139...210438  2          x.log  fsync(x.log)
3    98393bc0 ...  9598  139...355253  3          fs-j   fsync(x.log)
4    00000100 ...  9602  139...506097  3          fs-j   fsync(x.log)

What makes some fault points special?

[Diagram: the checking result of each fault point shown next to the Worker’s multi-layer trace (op#, content, LBA, ts, cmd#, file, syscall): anything special?]


5 patterns found from 2 databases

• MMAPp: unintended update to mmap’ed blocks

op#  LBA   file  syscall
...  ...   ...   ...
463  1012  x.db  fsync(x.log)   unintended: implicit flush of dirty blocks by kernel or FS under heavy transactions
...  ...   ...   ...
564  1012  x.db  msync(x.db)    intended
...  ...   ...   ...

An unintended update ahead of the intended one: special!

• Four more patterns: REPp, JUMPp, HEADp, TRANp
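As a rough illustration of how the extra trace context makes such a pattern searchable, the sketch below flags writes to an mmap’ed file that reach the disk under a syscall other than `msync` on that file. This is only one plausible reading of the slide: the rule itself, the `TraceOp` type, the function name, and the assumption that `x.db` is the mmap’ed file are all hypothetical, and the definition used in the paper may differ.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class TraceOp:
    """One row of a multi-layer trace (hypothetical, reduced field set)."""
    op: int
    lba: int
    file: str
    syscall: str


def unintended_mmap_updates(trace: List[TraceOp], mmapped_files: Set[str]) -> List[int]:
    """Return op numbers that write an mmap'ed file outside msync() on it."""
    flagged = []
    for entry in trace:
        intended_syscall = f"msync({entry.file})"
        if entry.file in mmapped_files and entry.syscall != intended_syscall:
            flagged.append(entry.op)
    return flagged


# The two rows visible on the slide; x.db is assumed to be mmap'ed.
trace = [
    TraceOp(op=463, lba=1012, file="x.db", syscall="fsync(x.log)"),
    TraceOp(op=564, lba=1012, file="x.db", syscall="msync(x.db)"),
]
flagged = unintended_mmap_updates(trace, {"x.db"})
```

For the two rows shown, only op 463 is flagged, which matches the "unintended" label on the slide.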

Add fault injection policy to determine where to inject faults

[Diagram: file system, target daemon with SCSI Tracer, Replayer, Fault Injection Policy, Worker’s blk trace, backing store, Checker, check log]

Page 67: Torturing Databases for Fun and Profit...Torturing Databases for Fun and Profit † The Ohio State University ‡ HP Labs Mai Zheng †, Joseph Tucek ‡, Dachuan Huang †, Feng Qin

67

5

4

3

2

6

1

op# LBA cmd# file syscall

1 348 1 x.db msyc(x.db)

2 352 2 x.log fsync(x.log)

3 356 2 x.log fsync(x.log)

4 360 2 x.log fsync(x.log)

5 364 2 x.log fsync(x.log)

6 370 3 x.log fsync(x.log)

7 348 4 x.db fsync(x.log) 8 906 5 fs-j fsync(x.log)

MMAPp REPp JUMPp HEADp TRANp total score

Alternative to Exhaustive: Pattern-based Ranking

7

8

Worker’s multi-layer trace Scoreboard

Page 68: Torturing Databases for Fun and Profit...Torturing Databases for Fun and Profit † The Ohio State University ‡ HP Labs Mai Zheng †, Joseph Tucek ‡, Dachuan Huang †, Feng Qin

68

5

4

3

2

6

1

op# LBA cmd# file syscall

1 348 1 x.db msyc(x.db)

2 352 2 x.log fsync(x.log)

3 356 2 x.log fsync(x.log)

4 360 2 x.log fsync(x.log)

5 364 2 x.log fsync(x.log)

6 370 3 x.log fsync(x.log)

7 348 4 x.db fsync(x.log) 8 906 5 fs-j fsync(x.log)

MMAPp REPp JUMPp HEADp TRANp total score

0

0

0

0

0

0

1

1

Alternative to Exhaustive: Pattern-based Ranking

7

8

Worker’s multi-layer trace Scoreboard

Page 69: Torturing Databases for Fun and Profit...Torturing Databases for Fun and Profit † The Ohio State University ‡ HP Labs Mai Zheng †, Joseph Tucek ‡, Dachuan Huang †, Feng Qin

69

5

4

3

2

6

1

op# LBA cmd# file syscall

1 348 1 x.db msyc(x.db)

2 352 2 x.log fsync(x.log)

3 356 2 x.log fsync(x.log)

4 360 2 x.log fsync(x.log)

5 364 2 x.log fsync(x.log)

6 370 3 x.log fsync(x.log)

7 348 4 x.db fsync(x.log) 8 906 5 fs-j fsync(x.log)

MMAPp REPp JUMPp HEADp TRANp total score

0 1

0 0

0 0

0 0

0 0

0 0 1 1 1 0

Alternative to Exhaustive: Pattern-based Ranking

7

8

Worker’s multi-layer trace Scoreboard

Page 70: Torturing Databases for Fun and Profit...Torturing Databases for Fun and Profit † The Ohio State University ‡ HP Labs Mai Zheng †, Joseph Tucek ‡, Dachuan Huang †, Feng Qin

70

5

4

3

2

6

1

op# LBA cmd# file syscall

1 348 1 x.db msyc(x.db)

2 352 2 x.log fsync(x.log)

3 356 2 x.log fsync(x.log)

4 360 2 x.log fsync(x.log)

5 364 2 x.log fsync(x.log)

6 370 3 x.log fsync(x.log)

7 348 4 x.db fsync(x.log) 8 906 5 fs-j fsync(x.log)

Alternative to Exhaustive: Pattern-based Ranking

7

8

MMAPp REPp JUMPp HEADp TRANp total score

0 1 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 1 1 1 1 1 0 1

Worker’s multi-layer trace Scoreboard

Page 71: Torturing Databases for Fun and Profit...Torturing Databases for Fun and Profit † The Ohio State University ‡ HP Labs Mai Zheng †, Joseph Tucek ‡, Dachuan Huang †, Feng Qin

71

5

4

3

2

6

1

op# LBA cmd# file syscall

1 348 1 x.db msyc(x.db)

2 352 2 x.log fsync(x.log)

3 356 2 x.log fsync(x.log)

4 360 2 x.log fsync(x.log)

5 364 2 x.log fsync(x.log)

6 370 3 x.log fsync(x.log)

7 348 4 x.db fsync(x.log) 8 906 5 fs-j fsync(x.log)

Alternative to Exhaustive: Pattern-based Ranking

7

8

MMAPp REPp JUMPp HEADp TRANp total score

0 1 0 0

0 0 0 1

0 0 0 0

0 0 0 0

0 0 0 0 0 0 1 0 1 1 1 0 1 0 1 0

Worker’s multi-layer trace Scoreboard

Alternative to Exhaustive: Pattern-based Ranking

Worker’s multi-layer trace

op#  LBA  cmd#  file   syscall
1    348  1     x.db   msync(x.db)
2    352  2     x.log  fsync(x.log)
3    356  2     x.log  fsync(x.log)
4    360  2     x.log  fsync(x.log)
5    364  2     x.log  fsync(x.log)
6    370  3     x.log  fsync(x.log)
7    348  4     x.db   fsync(x.log)
8    906  5     fs-j   fsync(x.log)

Scoreboard

op#  MMAPp  REPp  JUMPp  HEADp  TRANp  total score
1    0      1     0      0      1      2
2    0      0     0      1      1      2
3    0      0     0      0      0      0
4    0      0     0      0      0      0
5    0      0     0      0      0      0
6    0      0     1      0      1      2
7    1      1     1      0      1      4
8    1      0     1      0      1      3

1st-rank: 7 (predicted most error-prone)
2nd-rank: 8
3rd-rank: 1, 2, 6
4th-rank: 3, 4, 5
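The ranking step can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it takes the per-pattern flags from the scoreboard above as given (it does not detect the patterns itself), sums them per op#, and groups ops with equal totals into one rank. The names `SCOREBOARD`, `rank_fault_points`, and `RANKS` are hypothetical.

```python
from itertools import groupby

# Column order of the scoreboard on the slide.
PATTERNS = ("MMAPp", "REPp", "JUMPp", "HEADp", "TRANp")

# Pattern flags per op#, copied from the scoreboard (not computed here).
SCOREBOARD = {
    1: (0, 1, 0, 0, 1),
    2: (0, 0, 0, 1, 1),
    3: (0, 0, 0, 0, 0),
    4: (0, 0, 0, 0, 0),
    5: (0, 0, 0, 0, 0),
    6: (0, 0, 1, 0, 1),
    7: (1, 1, 1, 0, 1),
    8: (1, 0, 1, 0, 1),
}


def rank_fault_points(scoreboard):
    """Group op numbers into ranks, highest total score first."""
    totals = {op: sum(flags) for op, flags in scoreboard.items()}
    # Sort by descending total, then by op# so ties are listed in trace order.
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [
        [op for op, _ in group]
        for _, group in groupby(ordered, key=lambda item: item[1])
    ]


RANKS = rank_fault_points(SCOREBOARD)
for position, ops in enumerate(RANKS, start=1):
    print(f"rank {position}: ops {ops}")
```

Fault points would then be tested rank by rank, starting with the ops predicted to be most error-prone.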

Diagnosis Support

Worker’s multi-layer trace helps understand what happened at fault time

[Diagram: Worker, file system, SCSI, target daemon, SCSI Tracer, backing store, multi-layer tracer]

Worker’s multi-layer trace: op#, content, LBA, timestamp, SCSI cmd#, file, syscall

Add function-call tracing to disclose more semantics for diagnosis

[Diagram: Worker, file system, SCSI, target daemon, SCSI Tracer, backing store, multi-layer tracer]

Worker’s multi-layer trace: op#, content, LBA, timestamp, SCSI cmd#, file, syscall, function call
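As a rough illustration of what one entry of such a multi-layer trace could look like in code, the sketch below models a record with the fields listed on the slide and groups written LBAs by the syscall that caused them. The class `TraceRecord`, the helper `blocks_by_syscall`, and the sample timestamps and contents are hypothetical; the slides do not describe the tracer's actual data format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceRecord:
    """One block-level write annotated with information from upper layers."""
    op: int                               # op#: sequence number of the block write
    lba: int                              # logical block address written
    content: bytes                        # block payload (placeholder values below)
    timestamp: float                      # hypothetical time units
    scsi_cmd: int                         # SCSI cmd# that carried the write
    file: str                             # file the block belongs to
    syscall: str                          # syscall that caused the write
    function_call: Optional[str] = None   # optional function-call context


def blocks_by_syscall(records):
    """Map each syscall to the LBAs it caused to be written, in trace order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.syscall].append(record.lba)
    return dict(grouped)


# LBA, cmd#, file, and syscall values follow the example trace in the ranking
# slides; timestamps and contents are made up for illustration.
SAMPLE = [
    TraceRecord(1, 348, b"", 0.0, 1, "x.db", "msync(x.db)"),
    TraceRecord(2, 352, b"", 1.0, 2, "x.log", "fsync(x.log)"),
    TraceRecord(7, 348, b"", 2.0, 4, "x.db", "fsync(x.log)"),
]
BY_SYSCALL = blocks_by_syscall(SAMPLE)
print(BY_SYSCALL)
```

Grouping by syscall is one simple way to spot a block of `x.db` being written under `fsync(x.log)`, which is the kind of cross-layer detail a single-layer trace would hide.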

Enable same tracing during checking to see why recovery didn’t work

[Diagram: Replayer, Worker’s block trace, a fault point triggering ACID violation, Checker, check log, file system, SCSI, target daemon, SCSI Tracer, multi-layer tracer]

Checker’s multi-layer trace: op#, content, LBA, timestamp, SCSI cmd#, file, syscall, function call

Results

Experimental Environment

• 8 databases
  - Open-source: TokyoCabinet, MariaDB, LightningDB, SQLite
  - Commercial: KVS-A, SQL-A, SQL-B, SQL-C
• 4 workloads
• 3 file systems
  - ext3, XFS, NTFS
• Several operating systems
  - Linux: RHEL 6, Debian 6, Ubuntu 12 LTS
  - Windows 7 Enterprise

Not a single DB can survive all tests

DB            FS    W-1  W-2  W-3   W-4.1  W-4.2  W-4.3  A       C      I  D
TokyoCabinet  ext3  D    D    D     ACD    ACD    ACD    0.15%   0.14%  0  16.05%
TokyoCabinet  XFS   --   D    D     ACD    D      ACD    <0.01%  0.01%  0  4.38%
MariaDB       ext3  D    D    D     D      D      D      0       0      0  1.36%
MariaDB       XFS   D    D    D     D      D      D      0       0      0  0.49%
LightningDB   ext3  --   --   --    --     --     D      0       0      0  0.05%
LightningDB   XFS   --   --   --    --     --     --     0       0      0  0
SQLite        ext3  D    D    --    D      D      D      0       0      0  19.15%
SQLite        XFS   --   --   D     D      D      D      0       0      0  10.60%
KVS-A         ext3  --   --   Hang  --     --     --     0       0      0  0
KVS-A         XFS   --   --   --    --     --     --     0       0      0  0
SQL-A         ext3  D    D    D     D      D      D      0       0      0  3.31%
SQL-A         XFS   D    D    D     D      D      D      0       0      0  0.92%
SQL-B         ext3  D    D    CD    CD     CD     CD     0       8.96%  0  3.24%
SQL-B         XFS   CD   D    CD    CD     CD     CD     0       7.77%  0  3.90%
SQL-C         NTFS  D    D    D     D      D      D      0       0      0  8.08%

Durability violation is most common

Some violations are difficult to trigger
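The first two takeaways can be checked directly against the results table. The sketch below is a hypothetical re-encoding of the per-workload cells (one list per database and file system, in the order W-1 to W-4.3); it treats a `Hang` as a failed test and counts, for each of A, C, I, D, the number of cells in which that violation type was observed. The names `RESULTS`, `SURVIVORS`, and `COUNTS` are introduced here only for illustration.

```python
# Per-workload outcomes from the results table, W-1 .. W-4.3.
RESULTS = {
    ("TokyoCabinet", "ext3"): ["D", "D", "D", "ACD", "ACD", "ACD"],
    ("TokyoCabinet", "XFS"): ["--", "D", "D", "ACD", "D", "ACD"],
    ("MariaDB", "ext3"): ["D"] * 6,
    ("MariaDB", "XFS"): ["D"] * 6,
    ("LightningDB", "ext3"): ["--"] * 5 + ["D"],
    ("LightningDB", "XFS"): ["--"] * 6,
    ("SQLite", "ext3"): ["D", "D", "--", "D", "D", "D"],
    ("SQLite", "XFS"): ["--", "--", "D", "D", "D", "D"],
    ("KVS-A", "ext3"): ["--", "--", "Hang", "--", "--", "--"],
    ("KVS-A", "XFS"): ["--"] * 6,
    ("SQL-A", "ext3"): ["D"] * 6,
    ("SQL-A", "XFS"): ["D"] * 6,
    ("SQL-B", "ext3"): ["D", "D", "CD", "CD", "CD", "CD"],
    ("SQL-B", "XFS"): ["CD", "D", "CD", "CD", "CD", "CD"],
    ("SQL-C", "NTFS"): ["D"] * 6,
}


def survives_everything(database):
    """True only if every cell for this database, on every file system, is clean."""
    return all(
        cell == "--"
        for (name, _fs), cells in RESULTS.items()
        if name == database
        for cell in cells
    )


DATABASES = sorted({name for name, _fs in RESULTS})
SURVIVORS = [db for db in DATABASES if survives_everything(db)]

# Number of table cells in which each violation type shows up ("Hang" skipped).
COUNTS = {
    letter: sum(
        letter in cell
        for cells in RESULTS.values()
        for cell in cells
        if cell != "Hang"
    )
    for letter in "ACID"
}

print("databases surviving all tests:", SURVIVORS)
print("cells per violation type:", COUNTS)
```

With this encoding no database survives every test, and D is by far the most frequent violation type, consistent with the slide captions.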

Case Study: A TokyoCabinet Bug

• Failure symptoms: faults injected in a region of operations cause:
  - A violation: a transaction is partially committed
  - D violation: some rows are irretrievable
  - C violation: the rows retrievable by range query and by point queries are different

Case Study: A TokyoCabinet Bug

Why didn’t recovery work?

Checker’s trace when no violation was found:

  ...
  tchdbopenimpl(x.tcb)
  ...
  open(x.tcb) = 3
  read(x.tcb) = 256
  //op#, LBA, content, file
  op#1, 480, ...101..., x.tcb
  tchdbwalrestore()
  tcbdbget()
  ...

Checker’s trace when ACID violations were found:

  ...
  tchdbopenimpl(x.tcb)
  ...
  open(x.tcb) = 3
  read(x.tcb) = 256
  //op#, LBA, content, file
  op#1, 480, ...100..., x.tcb
  //no tchdbwalrestore()
  tcbdbget()
  ...

Delta Debugging [Zeller, SIGSOFT’02/FSE-10]

Case Study: A TokyoCabinet Bug

Why didn’t recovery work? What happened at fault time?

Worker’s trace around the bug-triggering fault points op#30–#90:

  ...
  mmap(8192, x.tcb, 0)
  ...
  fsync(x.tcb.wal)
  //op#, LBA, content, file, syscall
  op#26, 630, ............, x.tcb.wal, fsync(x.tcb.wal)
  op#27, 960, ............, fs-j, fsync(x.tcb.wal)
  op#28, 964, ............, fs-j, fsync(x.tcb.wal)
  op#29, 480, ...100..., x.tcb, fsync(x.tcb.wal)    <- Unintended update!
  ...
  msync(x.tcb)
  //op#, LBA, content, file, syscall
  op#91, 480, ...101..., x.tcb, msync(x.tcb)        <- Intended update
  ...
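The distinction between the intended and the unintended update can be expressed as a simple filter over the worker's trace. The sketch below is an assumption about how such a check might look, not the tool's actual logic: it flags any block write to an mmap'ed file that was issued under a syscall other than `msync()` on that same file. The names `TraceRow`, `MMAPPED_FILES`, and `unintended_updates` are hypothetical; the trace rows are copied from the slide.

```python
from typing import NamedTuple


class TraceRow(NamedTuple):
    op: int
    lba: int
    content: str
    file: str
    syscall: str


# Rows copied from the worker's trace on the slide.
WORKER_TRACE = [
    TraceRow(26, 630, "............", "x.tcb.wal", "fsync(x.tcb.wal)"),
    TraceRow(27, 960, "............", "fs-j", "fsync(x.tcb.wal)"),
    TraceRow(28, 964, "............", "fs-j", "fsync(x.tcb.wal)"),
    TraceRow(29, 480, "...100...", "x.tcb", "fsync(x.tcb.wal)"),
    TraceRow(91, 480, "...101...", "x.tcb", "msync(x.tcb)"),
]

# Files the worker mapped with mmap(); x.tcb per the mmap() call in the trace.
MMAPPED_FILES = {"x.tcb"}


def unintended_updates(trace, mmapped_files):
    """Return writes to mmap'ed files that were not caused by msync() on that file."""
    return [
        row
        for row in trace
        if row.file in mmapped_files and row.syscall != f"msync({row.file})"
    ]


UNINTENDED_OPS = [row.op for row in unintended_updates(WORKER_TRACE, MMAPPED_FILES)]
print("unintended updates at op#:", UNINTENDED_OPS)
```

On this trace the filter reports op#29: the block of `x.tcb` at LBA 480 reaches the disk during `fsync(x.tcb.wal)`, well before the `msync(x.tcb)` at op#91.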

One solution: Failure-atomic msync() [Park et al., EuroSys’13]

Patterns reduce required test points greatly while achieving similar coverage

[Bar chart: reduction factor for each database (TokyoCabinet, SQLite, SQL-A, SQL-B, LightningDB, KVS-A, MariaDB, SQL-C); bar values shown: 6, 19, 20, 22, 33, 61, 72, 157]

the 2 databases from which patterns are extracted

reduce testing time from > 2 months to < 3 days!

Conclusions & Future Work

• A wake-up call
  - traditional testing methodology may not be enough for today’s complex storage systems
  - thorough testing requires purpose-built workloads and intelligent fault injection techniques
• Different layers in OS can help in different ways
  - iSCSI: fault injection w/ high portability & high fidelity
  - LBA & syscall: generic behavior patterns
  - combined multi-layer info: clear whole picture of complicated scenarios

• We should bridge the gaps of understanding/assumptions!
  - between User & DB
  - between DB & OS

Pop Quiz: true or false?
  - “mmap'ed files are not updated until msync()”: false
  - “file-length updates are persistent after fdatasync()”: depends!
  - “durability is provided by the default configuration”: depends!
  - “transactions are durable after COMMIT”: depends!
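Given how many of the quiz answers are "depends!", one conservative habit on the DB & OS side is to avoid relying on the weaker guarantees at all. The sketch below is a minimal illustration of that idea, assuming a POSIX-like system: it uses `os.fsync()` instead of `fdatasync()` so that metadata such as the file length is flushed as well, and it also syncs the parent directory. The function name `durable_write` and the demo file name are hypothetical, and the demo only shows that the calls succeed; it cannot prove durability across a real power fault, and a device that ignores flush requests can still lose data.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then ask the OS to persist the file and its directory entry."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # fsync rather than fdatasync: flushes file metadata along with the data.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Sync the directory so the new entry itself is persisted (POSIX-like systems).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "x.db")
    durable_write(demo_path, b"committed row")
    with open(demo_path, "rb") as handle:
        READ_BACK = handle.read()

print(READ_BACK)
```

Whether a database does something equivalent under its default configuration is exactly the kind of assumption the quiz asks users to verify rather than take for granted.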

Thank you!
