Upload
sandra-townsend
View
30
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Eidetic Systems. David Devecsery , Michael Chow, Xianzheng Dou, Jason Flinn, Peter Chen University of Michigan. What is an Eidetic System?. Eidetic – Having “Perfect memory” or “Total Recall” - PowerPoint PPT Presentation
Citation preview
Eidetic SystemsDavid Devecsery, Michael Chow, Xianzheng Dou,
Jason Flinn, Peter ChenUniversity of Michigan
What is an Eidetic System?
Eidetic – Having “Perfect memory” or “Total Recall”
Eidetic System – A system which can recall and trace through the lineage of any past computation
2David Devecsery
Motivation - Heartbleed
4
HeartbleedMessage
Leaked Data
• Was Heartbleed exploited? - Yes• What data was leaked?
David Devecsery
Motivation - Heartbleed
5
• Was Heartbleed exploited? - Yes• What data was leaked?
Leaked Database Rows
David Devecsery
HeartbleedMessage
Leaked Data
Motivation – Wrong Reference
9
• How did I get the wrong citation?• What else did this affect?
David Devecsery
Arnold
•First practical eidetic computer system• Efficiently records & recalls all user-space computation• Process register/memory state• Inter-process communication
• Handles lineage queries• What data was affected?• What states and outputs were affected?
• Targeted towards desktop/workstation use•Reasonable overheads• Record 4 years of data on $150 commodity HD• Under 8% performance overhead on most benchmarks
11David Devecsery
Overview
• Introduction•Motivation•How Arnold remembers all state•How Arnold supports lineage queries•Conclusion
12David Devecsery
Remembering State
•Requirements:• Store years of state on a single disk• Memory/register space within a process• Inter process communication• File state
• Recall any state in reasonable time•Solution:• Deterministic record & replay• “Process group” based replay• “Process graph” to track inter-process lineage
• Log compression
13David Devecsery
Recording Granularity
•What granularity is best to record our system?
14
Pipe
1
2Read 1
Pipe
1
2Read 1 Pipe
1
2Read 1
Pipe
1
2Read 1
Pipe
1
2Read 1
ExternalInputs
David Devecsery
Recording Granularity
• Whole system recordingLow space overhead× Costly to replay
15
Pipe
1
2Read 1
Pipe
1
2Read 1 Pipe
1
2Read 1
Pipe
1
2Read 1
Pipe
1
2Read 1
ExternalInputs
David Devecsery
Recording Granularity
•Process level recordingEfficient to replay×Uses extra disk space×No Inter-process tracking
16
Pipe
1
2Read 1
Pipe
1
2Read 1 Pipe
1
2Read 1
Pipe
1
2Read 1
Pipe
1
2Read 1
ExternalInputs
David Devecsery
Recording Granularity
•Process group recordingEfficient to replayReasonable disk space×No Inter-process tracking
17
Pipe
1
2Read 1
Pipe
1
2Read 1 Pipe
1
2Read 1
Pipe
1
2Read 1
Pipe
1
2Read 1
ExternalInputs
David Devecsery
Implementation – Process Graph
18
Record Log
Pipe
1
2Read 1
1
Pipe
1
2Read 1Pipe
1
2Read 1
2
IPC Read
David Devecsery
Implementation – Process Graph
19
Record Log
Pipe
1
2Read 1
1
Pipe
1
2Read 1Pipe
1
2Read 1
2
IPC ReadPipe
1
2Read 1Pipe
1
2Read 1
David Devecsery
Recording
•Process group recording + process graphEfficient to replayReasonable disk spaceInter-process tracking
20
Pipe
1
2Read 1
Pipe
1
2Read 1 Pipe
1
2Read 1
Pipe
1
2Read 1
Pipe
1
2Read 1
ExternalInputs
David Devecsery
Space Optimizations
21
Baselin
e
+Model-
Based Compres
sion
+Ded
uplicate
d File
Cache
+X Se
rver C
ompressio
n
+Sem
i-Dete
rministi
c Tim
e+G
zip
0
0.2
0.4
0.6
0.8
1
1.2
Log
Com
pres
sion
vs B
asel
ine
David Devecsery
Space Optimizations
22
Baselin
e
+Model-
Based Compres
sion
+Ded
uplicate
d File
Cache
+X Se
rver C
ompressio
n
+Sem
i-Dete
rministi
c Tim
e+G
zip0
0.2
0.4
0.6
0.8
1
1.2
Log
Com
pres
sion
vs B
asel
ine
411:1 Ratio
David Devecsery
Space Optimizations
23
Baselin
e
+Model-
Based Compres
sion
+Ded
uplicate
d File
Cache
+X Se
rver C
ompressio
n
+Sem
i-Dete
rministi
c Tim
e+G
zip
Only Gzip
0
0.2
0.4
0.6
0.8
1
1.2
Log
Com
pres
sion
vs B
asel
ine
411:1 Ratio
6:1 Ratio
David Devecsery
Space Optimizations
24
Baselin
e
+Model-
Based Compres
sion
+Ded
uplicate
d File
Cache
+X Se
rver C
ompressio
n
+Sem
i-Dete
rministi
c Tim
e+G
zip
Only Gzip
0
0.2
0.4
0.6
0.8
1
1.2
Log
Com
pres
sion
vs B
asel
ine
411:1 Ratio
6:1 Ratio
David Devecsery
4 years of data on a $150 4TB commodity HD
Model-Based Compression
• Formulate a model of a typical execution • Only record deviations from that model
ret_val = sys_read (fd, buffer, count);
• Idea: Partial determinism• Encourage the program to conform to the model
25
usually equal
David Devecsery
Semi-Deterministic Time
• Frequent time queries are non-deterministic• Use partially deterministic clock• Real time clock & deterministic clock• Bound deviation
26
if (deterministic_clock – real_time_clock < threshold) {adjust deterministic_clockrecord deviation
}return deterministic_clock
David Devecsery
Performance Evaluation
27
kern
el copy
cvs c
heckout
make
latex
apac
hege
dit
spread
sheet
0.9
0.95
1
1.05
1.1
1.15
1.2 Baseline Arnold
Nor
mal
ized
Runti
me
David Devecsery
Overview
• Introduction•Motivation•How Arnold remembers all state•How Arnold supports lineage queries•Conclusion
28David Devecsery
Querying Lineage
•Two types of queries:•Reverse: Where did this data come from?•Forward: What did this data affect?
•How does Arnold support these queries?•User specifies initial state•Trace the lineage of the computation• Intra-process tracking• Inter-process tracking
29David Devecsery
Intra-Process Lineage
• Use taint tracking for intra-process causality• Run retroactively, on recorded execution• Parallelizable
• Arnold supports several notions of causality:
30
Copy Only Data Flow Data+Index Flow
May miss relations Misses few relationsRecall
Strong input/output relation
Weak input/output Relation
Precision
Control Flow
David Devecsery
Intra-Process Lineage
32
Data Flow Data+IndexFlow
May miss relations Misses few relationsRecall
Strong input/output relation
Weak input/output Relation
Precision
Copy
David Devecsery
Intra-Process Lineage
33
Data Flow Data+IndexFlow
May miss relations Misses few relationsRecall
Strong input/output relation
Weak input/output Relation
Precision
Copy
David Devecsery
Intra-Process Lineage
34
Data Flow Data+IndexFlow
May miss relations Misses few relationsRecall
Strong input/output relation
Weak input/output Relation
Precision
Copy
David Devecsery
Intra-Process Lineage
35
Data Flow Data+IndexFlow
May miss relations Misses few relationsRecall
Strong input/output relation
Weak input/output Relation
Precision
Copy
David Devecsery
Intra-Process Lineage
36
Data Flow Data+IndexFlow
May miss relations Misses few relationsRecall
Strong input/output relation
Weak input/output Relation
Precision
Copy
David Devecsery
Intra-Process Lineage
37
Data Flow
May miss relations Misses few relationsRecall
Strong input/output relation
Weak input/output Relation
Precision
Arnold selects themost precise tool withat least one result
David Devecsery
Inter-Process Lineage
• Two notions of inter-process linkage• Process graph• Tracks lineage through inter-process communication• Precise • Captures group to group communication
• Human linkage• Handles relations between user inputs and outputs• Infers linkages based on data content and time• Imprecise – may have false negatives and false positives• Can capture linkages the process graph can miss
38David Devecsery
Evaluation – Wrong Reference
39
Data + IndexDataCopyCopyData
• Few false positives (font files, latex sty files, libc.so, libXt.so)• No false negatives
Record Time Replay Time Replay + Pin Time
Query Time
96.1s 2.2s 70.0s 209.5s
HumanLinkage
David Devecsery
Evaluation – Heartbleed
40
• No false positives or negatives
Data + IndexData + Index Data + Index
Record Time Replay Time Replay + Pin Time
Query Time
230.3s 0.4s 139.5s 235.1s
David Devecsery
Conclusion
•Eidetic Systems are powerful tools• Complete vision into past computation• Answer powerful queries about state’s lineage
•Arnold – First practical Eidetic System• Low runtime overhead• 4 years of computation on a commodity HD• Supports powerful lineage queries
•Code is releasedhttps://github.com/endplay/omniplay
41David Devecsery