Tongping Liu, Charlie Curtsinger, Emery Berger
DTHREADS: Efficient Deterministic Multithreading
Insanity: Doing the same thing over and over again and expecting different results.
2
In the Beginning…
3
There was the Core.
4
And it was Good.
5
It gave us our Daily Speed.
6
Until the Apocalypse.
7
And the Speed was no Moore.
8
And then came a False Prophet…
9
10
Want speed?
11
I BRING YOU THE GIFT OF PARALLELISM!
12
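JUST USE THREADS…
int color = COLOR_A, row = 0;  // globals; COLOR_A/COLOR_B stand in for the slide's two stripe colors
void *nextStripe(void *arg) {
  for (int c = 0; c < Width; c++)
    drawBox(c, row, color);    // unsynchronized reads of the shared row and color
  color = (color == COLOR_A) ? COLOR_B : COLOR_A;
  row++;                       // racy update: threads can repeat or skip rows
  return NULL;
}
pthread_t t[9];
for (int n = 0; n < 9; n++) pthread_create(&t[n], NULL, nextStripe, NULL);
for (int n = 0; n < 9; n++) pthread_join(t[n], NULL);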
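The slide's program races: nine threads read and update the unsynchronized globals `color` and `row`, so two threads can draw the same row, skip a row, or use the wrong color. A minimal sketch of the conventional pthreads fix, assuming the colors are plain integers (`COLOR_A`/`COLOR_B` are hypothetical placeholders) and replacing `drawBox` with a hypothetical `drawn[]` array that records each row's color:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the slide's two stripe colors and nine stripes. */
enum { COLOR_A = 1, COLOR_B = 2, NUM_STRIPES = 9 };

static int color = COLOR_A, row = 0;   /* shared globals, as on the slide */
static int drawn[NUM_STRIPES];         /* records the color drawn in each row */
static pthread_mutex_t stripe_lock = PTHREAD_MUTEX_INITIALIZER;

/* Claims the next (row, color) pair atomically, then "draws" that stripe. */
static void *next_stripe(void *arg) {
  (void)arg;
  pthread_mutex_lock(&stripe_lock);
  int my_row = row, my_color = color;  /* read both while holding the lock */
  color = (color == COLOR_A) ? COLOR_B : COLOR_A;
  row++;
  pthread_mutex_unlock(&stripe_lock);
  assert(my_row < NUM_STRIPES);
  drawn[my_row] = my_color;            /* stands in for the drawBox loop */
  return NULL;
}

/* Spawns one thread per stripe and waits for all of them. */
void draw_flag(void) {
  pthread_t t[NUM_STRIPES];
  for (int n = 0; n < NUM_STRIPES; n++)
    pthread_create(&t[n], NULL, next_stripe, NULL);
  for (int n = 0; n < NUM_STRIPES; n++)
    pthread_join(t[n], NULL);
}
```

The lock restores correct, alternating stripes, but not determinism: which thread draws which row still varies from run to run.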
13
14
15
16
17
18
pthreads
race conditions
atomicity violations
deadlock
order violations
19
Salvation?
20
21
pthreads
race conditions
atomicity violations
deadlock
order violations
DTHREADS
deterministic
race conditions
atomicity violations
deadlock
order violations
22
DTHREADS Enables…
Race-free Executions
Replay Debugging w/o Logging
Replicated State Machines
23
DTHREADS: Efficient Determinism
Usually faster than the state of the art
[Figure: runtime relative to pthreads (axis 0 to 6) of CoreDet, DTHREADS, and pthreads on the PHOENIX benchmarks (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count) and the PARSEC benchmarks (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), plus hmean; two CoreDet bars run off the scale, labeled "Overhead with CoreDet": 8.4 and 7.8]
24
DTHREADS: Efficient Determinism
Generally as fast as or faster than pthreads
[Figure: the same runtime comparison of CoreDet, DTHREADS, and pthreads on the PHOENIX and PARSEC benchmarks]
25
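DTHREADS: Easy to Use
% g++ myprog.cpp -lpthread
(DTHREADS is a drop-in replacement for the pthreads library: switching requires relinking, not source changes.)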
26
Isolation
Threads: shared address space. Processes: disjoint address spaces.
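Isolation replaces the threads' shared address space with disjoint address spaces. A minimal sketch (not DTHREADS code) contrasting a write made by a pthread with the same write made by a `fork`ed child process; the function names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 0;   /* one global, written by a thread or by a child process */

static void *thread_writer(void *arg) {
  (void)arg;
  shared_value = 42;           /* threads share one address space */
  return NULL;
}

/* Returns what the parent sees after a thread writes 42. */
int value_after_thread_write(void) {
  shared_value = 0;
  pthread_t t;
  pthread_create(&t, NULL, thread_writer, NULL);
  pthread_join(t, NULL);
  return shared_value;
}

/* Returns what the parent sees after a forked child writes 42. */
int value_after_process_write(void) {
  shared_value = 0;
  pid_t pid = fork();
  assert(pid >= 0);
  if (pid == 0) {              /* child: writes only its own copy-on-write copy */
    shared_value = 42;
    _exit(0);
  }
  waitpid(pid, NULL, 0);
  return shared_value;
}
```

The thread's write is visible to the parent (42); the child's write is not (0), because the child modified a private copy of the page.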
27
Performance: Processes vs. Threads
[Figure: normalized execution time (axis 0.0 to 1.4) of threads and processes as thread execution time grows from 1 ms to 1024 ms]
30
“Shared Memory”
31
Snapshot pages before modifications
“Shared Memory”
32
Write back diffs
“Shared Memory”
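The snapshot-and-diff steps can be sketched in a few lines. This is a simplified illustration, assuming a single 4096-byte page held in ordinary arrays (the real system works on memory-mapped pages); `snapshot_page` and `write_back_diff` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096   /* assumed page size for this sketch */

/* Before the first write: keep a pristine "twin" copy of the page. */
void snapshot_page(unsigned char twin[PAGE_SIZE], const unsigned char local[PAGE_SIZE]) {
  memcpy(twin, local, PAGE_SIZE);
}

/* At commit: copy only the bytes that changed since the snapshot into
   shared memory. Returns the number of bytes written back. */
size_t write_back_diff(unsigned char global[PAGE_SIZE],
                       const unsigned char local[PAGE_SIZE],
                       const unsigned char twin[PAGE_SIZE]) {
  size_t changed = 0;
  for (size_t i = 0; i < PAGE_SIZE; i++) {
    if (local[i] != twin[i]) {
      global[i] = local[i];
      changed++;
    }
  }
  return changed;
}
```

Writing back byte-level diffs rather than whole pages means two "threads" that modified different bytes of the same page both keep their updates.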
33
“Thread” 1
“Thread” 2
“Thread” 3
Parallel | Serial | Parallel
Update in Deterministic Time & Order
Synchronization points: mutex_lock, cond_wait, pthread_create
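One way to picture "update in deterministic time and order": threads run in parallel, then at a synchronization point each waits for a token and commits only while holding it, so commits happen in the same order on every run. The sketch below is a simplification under stated assumptions (a single round, a fixed round-robin order by thread id, and a hypothetical `commit_log` standing in for writing back diffs); `token`, `worker`, and `run_round` are also hypothetical, and this is not the DTHREADS implementation:

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 3

static pthread_mutex_t token_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t token_turn = PTHREAD_COND_INITIALIZER;
static int token = 0;               /* id of the thread allowed to commit next */
static int commit_log[NTHREADS];    /* order in which threads committed */
static int commits = 0;

static void *worker(void *arg) {
  int id = *(int *)arg;
  /* parallel phase: work on private state (omitted) */
  pthread_mutex_lock(&token_lock);
  while (token != id)               /* serial phase: wait for this thread's turn */
    pthread_cond_wait(&token_turn, &token_lock);
  commit_log[commits++] = id;       /* stands in for writing back diffs */
  token = (token + 1) % NTHREADS;   /* pass the token in fixed id order */
  pthread_cond_broadcast(&token_turn);
  pthread_mutex_unlock(&token_lock);
  return NULL;
}

/* Launches the threads in reverse order; commits still happen in id order. */
void run_round(void) {
  pthread_t t[NTHREADS];
  int ids[NTHREADS];
  for (int i = NTHREADS - 1; i >= 0; i--) {
    ids[i] = i;
    pthread_create(&t[i], NULL, worker, &ids[i]);
  }
  for (int i = 0; i < NTHREADS; i++)
    pthread_join(t[i], NULL);
}
```

However the OS schedules the threads, `commit_log` ends up as 0, 1, 2.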
34
DTHREADS performance analysis
[Figure: runtime relative to pthreads (axis 0 to 4) of DTHREADS and pthreads on the PHOENIX and PARSEC benchmarks, plus hmean]
35
The Culprit: False Sharing
[Figure: Thread 1 on Core 1 and Thread 2 on Core 2 share data through main memory; a write by one core invalidates the other core's cached copy]
36
The Culprit: False Sharing
[Figure: the same diagram, annotated "20x"]
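Conventional pthreads code usually avoids false sharing by padding per-thread data onto separate cache lines. A minimal C11 sketch, assuming 64-byte cache lines (common on x86 but not universal); the struct and function names are hypothetical. DTHREADS sidesteps the problem differently, since each "thread" writes to its own private copy of memory:

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64            /* assumed cache-line size */
#define ITERS 1000000

/* Adjacent counters: likely on one cache line, so writes invalidate each other. */
struct shared_counters { long a; long b; };

/* Padded counters: each one starts on its own cache line. */
struct padded_counters {
  alignas(CACHE_LINE) long a;
  alignas(CACHE_LINE) long b;
};

static struct padded_counters counters;

static void *bump(void *arg) {
  long *c = arg;
  for (long i = 0; i < ITERS; i++)
    (*c)++;                      /* each thread writes only its own counter */
  return NULL;
}

/* Two threads increment separate, padded counters. */
void run_padded(void) {
  pthread_t t1, t2;
  pthread_create(&t1, NULL, bump, &counters.a);
  pthread_create(&t2, NULL, bump, &counters.b);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
}
```

The unpadded `shared_counters` layout is correct too, but its two fields sit a few bytes apart, so two cores writing them would keep invalidating the same line.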
37
DTHREADS: Eliminates False Sharing!
[Figure: Process 1 on Core 1 and Process 2 on Core 2, each with its own memory, combined into a global state]
38
DTHREADS: Detailed Analysis
[Figure: runtime relative to pthreads (axis 0 to 6) on the PHOENIX and PARSEC benchmarks, plus hmean, for three configurations: ordering only, isolation only, and full DTHREADS]
41
DTHREADS: Scalable Determinism
[Figure (Scalability): speedup of 8 cores over 2 cores (axis 0 to 4) for CoreDet, DTHREADS, and pthreads on the PHOENIX benchmarks and the PARSEC benchmarks blackscholes, dedup, ferret, streamcluster, and swaptions, plus hmean]
44
DTHREADS
% g++ myprog.cpp -lpthread
45
End
46
47
DTHREADS: Without Outliers
Just 5% slower than pthreads
[Figure (Excluding Outliers): runtime relative to pthreads (axis 0 to 6) of DTHREADS and pthreads on the PHOENIX and PARSEC benchmarks, plus hmean w/o outliers]
48
Commit Protocol
[Figure: over time, a thread's local state, its twin page, the diff between them, and the global state]
49
DTHREADS Example Execution
Global state: a = 0, b = 0
Thread 1, on its own copy (a = 0, b = 0): if(a == 0) b = 1;
Thread 2, on its own copy (a = 0, b = 0): if(b == 0) a = 1;
Committed state: a = 1, b = 1
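The example can be replayed sequentially to show why both writes land. A minimal simulation, assuming memory is just the two variables and following the snapshot-and-diff idea from the earlier slides; `struct state`, `commit`, and `example_execution` are hypothetical names. Each "thread" reads its own private copy, so neither sees the other's write until both commit:

```c
#include <assert.h>

/* Memory in this sketch is just two variables. */
struct state { int a, b; };

/* Commits the fields a private copy changed relative to its snapshot. */
static void commit(struct state *global, const struct state *local, const struct state *twin) {
  if (local->a != twin->a) global->a = local->a;
  if (local->b != twin->b) global->b = local->b;
}

/* Runs the slide's two statements in isolation, then commits both. */
struct state example_execution(void) {
  struct state global = {0, 0};
  struct state twin = global;             /* snapshot at the start of the phase */
  struct state t1 = global, t2 = global;  /* each "thread" gets a private copy */

  if (t1.a == 0) t1.b = 1;                /* thread 1 sees a == 0 in its copy */
  if (t2.b == 0) t2.a = 1;                /* thread 2 sees b == 0 in its copy */

  commit(&global, &t1, &twin);            /* commit in a fixed order */
  commit(&global, &t2, &twin);
  return global;
}
```

Under pthreads this racy pair can end with only a set, only b set, or both, depending on timing; with isolation and a fixed commit order it ends with a = 1, b = 1 on every run.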
50
No Problem
Initially: a = 0, b = 0
if(a == 0) b = 1;
if(b == 0) a = 1;
Result: a = 1, b = 1
51
That’s Better.
Initially: a = 0, b = 0
lock(); if(a == 0) b = 1; unlock();
lock(); if(b == 0) a = 1; unlock();
Result: a = 0, b = 1
52
Or is it?
Initially: a = 0, b = 0
lock(); if(a == 0) b = 1; unlock();
lock(); if(b == 0) a = 1; unlock();
Result: a = 1
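With the lock, the two critical sections are serialized and the first commits before the second reads, so the outcome depends only on which thread takes the lock first. A minimal sequential simulation of both possible orders; `thread1`, `thread2`, `locked_execution`, and the `first_is_thread1` parameter are hypothetical names. DTHREADS fixes that order for a given program and input, but a different order yields a different, equally repeatable result:

```c
#include <assert.h>

struct state { int a, b; };

/* Each critical section reads the latest committed state and commits at
   unlock, so the second thread sees the first thread's write. */
static void thread1(struct state *s) { if (s->a == 0) s->b = 1; }
static void thread2(struct state *s) { if (s->b == 0) s->a = 1; }

/* first_is_thread1 chooses which thread acquires the lock first. */
struct state locked_execution(int first_is_thread1) {
  struct state committed = {0, 0};
  if (first_is_thread1) {
    thread1(&committed);
    thread2(&committed);
  } else {
    thread2(&committed);
    thread1(&committed);
  }
  return committed;
}
```

Thread 1 first gives a = 0, b = 1; thread 2 first gives a = 1, b = 0.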
53
Determinism
Is this enough?
54
Robust Determinism
55
External Nondeterminism
?
socket = open_socket(80);
listen(socket);
56
http://www.gnu.org/s/pth/
57
Overhead
[Figure: runtime relative to pthreads (axis 0 to 4) of DTHREADS and pthreads on the PHOENIX and PARSEC benchmarks, plus hmean]
58
Wrap-Up
Determinism
Robust Determinism
Internal Determinism?
59
Wrap-Up
Threads to Processes
Commit Before Synch.
Commit In Token Order
60
Performance: DTHREADS & CoreDet vs. pthreads
[Figure: runtime relative to pthreads (axis 0 to 6) of CoreDet [ASPLOS 10], DTHREADS, and pthreads on the PHOENIX and PARSEC benchmarks, plus hmean; two CoreDet bars run off the scale, labeled "Overhead with CoreDet": 8.4 and 7.8]
61
How DTHREADS Provides Determinism
Isolation
Deterministic Time
Deterministic Order
62
Evaluation
Phoenix: http://mapreduce.stanford.edu
PARSEC: http://parsec.cs.princeton.edu