41
Eidetic Systems David Devecsery, Michael Chow, Xianzheng Dou, Jason Flinn, Peter Chen University of Michigan

Eidetic Systems

Embed Size (px)

DESCRIPTION

Eidetic Systems. David Devecsery , Michael Chow, Xianzheng Dou, Jason Flinn, Peter Chen University of Michigan. What is an Eidetic System?. Eidetic – Having “Perfect memory” or “Total Recall” - PowerPoint PPT Presentation

Citation preview

Eidetic SystemsDavid Devecsery, Michael Chow, Xianzheng Dou,

Jason Flinn, Peter ChenUniversity of Michigan

What is an Eidetic System?

Eidetic – Having “Perfect memory” or “Total Recall”

Eidetic System – A system which can recall and trace through the lineage of any past computation

2David Devecsery

Motivation - Heartbleed

3

• Was Heartbleed exploited?• What data was leaked?

David Devecsery

Motivation - Heartbleed

4

HeartbleedMessage

Leaked Data

• Was Heartbleed exploited? - Yes• What data was leaked?

David Devecsery

Motivation - Heartbleed

5

• Was Heartbleed exploited? - Yes• What data was leaked?

Leaked Database Rows

David Devecsery

HeartbleedMessage

Leaked Data

Motivation – Wrong Reference

6

Bad Citation

• How did I get the wrong citation?

David Devecsery

Motivation – Wrong Reference

7

• How did I get the wrong citation?

David Devecsery

Motivation – Wrong Reference

8

• How did I get the wrong citation?

David Devecsery

Motivation – Wrong Reference

9

• How did I get the wrong citation?• What else did this affect?

David Devecsery

Motivation

10

• How did I get the wrong citation?• What else did this affect?

David Devecsery

Arnold

•First practical eidetic computer system• Efficiently records & recalls all user-space computation• Process register/memory state• Inter-process communication

• Handles lineage queries• What data was affected?• What states and outputs were affected?

• Targeted towards desktop/workstation use•Reasonable overheads• Record 4 years of data on $150 commodity HD• Under 8% performance overhead on most benchmarks

11David Devecsery

Overview

• Introduction•Motivation•How Arnold remembers all state•How Arnold supports lineage queries•Conclusion

12David Devecsery

Remembering State

•Requirements:• Store years of state on a single disk• Memory/register space within a process• Inter process communication• File state

• Recall any state in reasonable time•Solution:• Deterministic record & replay• “Process group” based replay• “Process graph” to track inter-process lineage

• Log compression

13David Devecsery

Recording Granularity

•What granularity is best to record our system?

14

Pipe

1

2Read 1

Pipe

1

2Read 1 Pipe

1

2Read 1

Pipe

1

2Read 1

Pipe

1

2Read 1

ExternalInputs

David Devecsery

Recording Granularity

• Whole system recordingLow space overhead× Costly to replay

15

Pipe

1

2Read 1

Pipe

1

2Read 1 Pipe

1

2Read 1

Pipe

1

2Read 1

Pipe

1

2Read 1

ExternalInputs

David Devecsery

Recording Granularity

•Process level recordingEfficient to replay×Uses extra disk space×No Inter-process tracking

16

Pipe

1

2Read 1

Pipe

1

2Read 1 Pipe

1

2Read 1

Pipe

1

2Read 1

Pipe

1

2Read 1

ExternalInputs

David Devecsery

Recording Granularity

•Process group recordingEfficient to replayReasonable disk space×No Inter-process tracking

17

Pipe

1

2Read 1

Pipe

1

2Read 1 Pipe

1

2Read 1

Pipe

1

2Read 1

Pipe

1

2Read 1

ExternalInputs

David Devecsery

Implementation – Process Graph

18

Record Log

Pipe

1

2Read 1

1

Pipe

1

2Read 1Pipe

1

2Read 1

2

IPC Read

David Devecsery

Implementation – Process Graph

19

Record Log

Pipe

1

2Read 1

1

Pipe

1

2Read 1Pipe

1

2Read 1

2

IPC ReadPipe

1

2Read 1Pipe

1

2Read 1

David Devecsery

Recording

•Process group recording + process graphEfficient to replayReasonable disk spaceInter-process tracking

20

Pipe

1

2Read 1

Pipe

1

2Read 1 Pipe

1

2Read 1

Pipe

1

2Read 1

Pipe

1

2Read 1

ExternalInputs

David Devecsery

Space Optimizations

21

Baselin

e

+Model-

Based Compres

sion

+Ded

uplicate

d File

Cache

+X Se

rver C

ompressio

n

+Sem

i-Dete

rministi

c Tim

e+G

zip

0

0.2

0.4

0.6

0.8

1

1.2

Log

Com

pres

sion

vs B

asel

ine

David Devecsery

Space Optimizations

22

Baselin

e

+Model-

Based Compres

sion

+Ded

uplicate

d File

Cache

+X Se

rver C

ompressio

n

+Sem

i-Dete

rministi

c Tim

e+G

zip0

0.2

0.4

0.6

0.8

1

1.2

Log

Com

pres

sion

vs B

asel

ine

411:1 Ratio

David Devecsery

Space Optimizations

23

Baselin

e

+Model-

Based Compres

sion

+Ded

uplicate

d File

Cache

+X Se

rver C

ompressio

n

+Sem

i-Dete

rministi

c Tim

e+G

zip

Only Gzip

0

0.2

0.4

0.6

0.8

1

1.2

Log

Com

pres

sion

vs B

asel

ine

411:1 Ratio

6:1 Ratio

David Devecsery

Space Optimizations

24

Baselin

e

+Model-

Based Compres

sion

+Ded

uplicate

d File

Cache

+X Se

rver C

ompressio

n

+Sem

i-Dete

rministi

c Tim

e+G

zip

Only Gzip

0

0.2

0.4

0.6

0.8

1

1.2

Log

Com

pres

sion

vs B

asel

ine

411:1 Ratio

6:1 Ratio

David Devecsery

4 years of data on a $150 4TB commodity HD

Model-Based Compression

• Formulate a model of a typical execution • Only record deviations from that model

ret_val = sys_read (fd, buffer, count);

• Idea: Partial determinism• Encourage the program to conform to the model

25

usually equal

David Devecsery

Semi-Deterministic Time

• Frequent time queries are non-deterministic• Use partially deterministic clock• Real time clock & deterministic clock• Bound deviation

26

if (deterministic_clock – real_time_clock < threshold) {adjust deterministic_clockrecord deviation

}return deterministic_clock

David Devecsery

Performance Evaluation

27

kern

el copy

cvs c

heckout

make

latex

apac

hege

dit

facebook

spread

sheet

0.9

0.95

1

1.05

1.1

1.15

1.2 Baseline Arnold

Nor

mal

ized

Runti

me

David Devecsery

Overview

• Introduction•Motivation•How Arnold remembers all state•How Arnold supports lineage queries•Conclusion

28David Devecsery

Querying Lineage

•Two types of queries:•Reverse: Where did this data come from?•Forward: What did this data affect?

•How does Arnold support these queries?•User specifies initial state•Trace the lineage of the computation• Intra-process tracking• Inter-process tracking

29David Devecsery

Intra-Process Lineage

• Use taint tracking for intra-process causality• Run retroactively, on recorded execution• Parallelizable

• Arnold supports several notions of causality:

30

Copy Only Data Flow Data+Index Flow

May miss relations Misses few relationsRecall

Strong input/output relation

Weak input/output Relation

Precision

Control Flow

David Devecsery

Intra-Process Lineage

31

Which linkage tool should Arnold use?

Inputs

Program

David Devecsery

Intra-Process Lineage

32

Data Flow Data+IndexFlow

May miss relations Misses few relationsRecall

Strong input/output relation

Weak input/output Relation

Precision

Copy

David Devecsery

Intra-Process Lineage

33

Data Flow Data+IndexFlow

May miss relations Misses few relationsRecall

Strong input/output relation

Weak input/output Relation

Precision

Copy

David Devecsery

Intra-Process Lineage

34

Data Flow Data+IndexFlow

May miss relations Misses few relationsRecall

Strong input/output relation

Weak input/output Relation

Precision

Copy

David Devecsery

Intra-Process Lineage

35

Data Flow Data+IndexFlow

May miss relations Misses few relationsRecall

Strong input/output relation

Weak input/output Relation

Precision

Copy

David Devecsery

Intra-Process Lineage

36

Data Flow Data+IndexFlow

May miss relations Misses few relationsRecall

Strong input/output relation

Weak input/output Relation

Precision

Copy

David Devecsery

Intra-Process Lineage

37

Data Flow

May miss relations Misses few relationsRecall

Strong input/output relation

Weak input/output Relation

Precision

Arnold selects themost precise tool withat least one result

David Devecsery

Inter-Process Lineage

• Two notions of inter-process linkage• Process graph• Tracks lineage through inter-process communication• Precise • Captures group to group communication

• Human linkage• Handles relations between user inputs and outputs• Infers linkages based on data content and time• Imprecise – may have false negatives and false positives• Can capture linkages the process graph can miss

38David Devecsery

Evaluation – Wrong Reference

39

Data + IndexDataCopyCopyData

• Few false positives (font files, latex sty files, libc.so, libXt.so)• No false negatives

Record Time Replay Time Replay + Pin Time

Query Time

96.1s 2.2s 70.0s 209.5s

HumanLinkage

David Devecsery

Evaluation – Heartbleed

40

• No false positives or negatives

Data + IndexData + Index Data + Index

Record Time Replay Time Replay + Pin Time

Query Time

230.3s 0.4s 139.5s 235.1s

David Devecsery

Conclusion

•Eidetic Systems are powerful tools• Complete vision into past computation• Answer powerful queries about state’s lineage

•Arnold – First practical Eidetic System• Low runtime overhead• 4 years of computation on a commodity HD• Supports powerful lineage queries

•Code is releasedhttps://github.com/endplay/omniplay

41David Devecsery