Computing with Unreliable Resources: Design, Analysis and Algorithms

Da Wang, Ph.D. Candidate

Thesis Committee: Gregory W. Wornell (Advisor), Yury Polyanskiy, Devavrat Shah

Signals, Information and Algorithms Laboratory

Thesis Defense

May 8, 2014
Computing with unreliable resources: an emerging paradigm

Large amounts of data require large amounts of computing resources:
- VLSI circuits
- cloud computing / data centers
- ...

Why are resources unreliable? Technological constraints and cost constraints.
A study via concrete applications

1. Circuit design with unreliable components
2. Scheduling parallel tasks with variable response times
3. Crowd-based ranking via noisy comparisons
Reliable circuit design with unreliable components
In the news—April 2013

AMD claims the 20nm transition signals the end of Moore's law.

Chief product architect at AMD:

"If you print too few transistors your chip will cost too much per transistor ...

... and if you put too many it will cost too much per transistor."
New challenge: fabrication flaws

Fabrication flaws
1. process variations
2. fabrication defects

These get worse as we approach physical limits!

Flash ADC design with imprecise comparators: captures process variations (this talk ✓)

Digital circuit design with faulty components: captures fabrication defects (in thesis)
Reliable Analog-to-Digital Converter Design with Imprecise Comparators

A theoretical framework:
- What are the fundamental performance limits?
- Is the optimal design for the precise-comparator case still optimal?
- Should we use imprecise comparators?

Joint work with: Yury Polyanskiy, Gregory Wornell
Acknowledgment: Frank Yaul, Anantha Chandrakasan
Analog-to-Digital Converter (ADC)

An ADC maps an analog input x to a digital code and a reconstruction value x̂.

[Figure: a 3-bit ADC partitions the input range [v_lo = v_0, v_8 = v_hi] with reference voltages v_1, ..., v_7; each interval maps to an ADC code 000–111 with reconstruction value c_1, ..., c_8.]

2^b reconstruction values; n = 2^b − 1 reference voltages.
ADC and its key building block: comparator

[Figure: reference voltages v_1, v_2, ..., v_n divide the input range into cells with reconstruction values c_1, c_2, ..., c_n, c_{n+1}.]

Comparator: given input v_in and reference v_ref,

    y_out = 1 if v_in > v_ref,  0 if v_in ≤ v_ref.

The Flash ADC architecture

[Figure: v_in is fed to n = 2^b − 1 comparators with references v_1, ..., v_n; the outcomes y_1, ..., y_n feed an encoder that produces the b-bit output.]
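As a concrete illustration of the comparator bank plus encoder, here is a minimal Python sketch of an ideal Flash ADC; the function name and the 3-bit parameters are mine, for illustration only.

```python
import numpy as np

def flash_adc(v_in, refs, codes):
    """Minimal ideal Flash ADC: a comparator bank followed by an encoder."""
    y = (v_in > refs).astype(int)   # comparator bank: thermometer code
    return codes[y.sum()]           # encoder: the count of ones picks the cell

# 3-bit example: n = 2**3 - 1 = 7 references on [-1, 1].
edges = np.linspace(-1, 1, 9)
refs = edges[1:-1]                      # v_1, ..., v_7
codes = (edges[:-1] + edges[1:]) / 2    # midpoint reconstructions c_1, ..., c_8
print(flash_adc(0.3, refs, codes))      # -> 0.375
```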
The imprecise comparator due to process variation

[Figure: the comparator actually sees V_in = v_in + Z_in and V_ref = v_ref + Z_ref, with Z_in ~ N(0, σ_1²) and Z_ref ~ N(0, σ_2²), and outputs Y_out.]

Z_in and Z_ref: offsets due to process variation
- variation grows as comparator size shrinks
- independent, zero-mean, Gaussian distributed [Kinget 2005, Nuzzo 2008]

Note:
- fixed after fabrication
- randomness: over a collection of comparators
- aggregate variation: Z = Z_ref − Z_in ~ N(0, σ²)
Reference voltages impact ADC performance

[Figure: a 2-bit ADC with reference voltages v_1, v_2, v_3 maps input x to reconstruction x̂.]

Error function: e(x) = x̂ − x
A call for a mathematical framework

Existing theoretical error analysis (e.g., [Lundin 2005])
- assumes small process variation
- does not attempt to change the design

ADC design with imprecise comparators
- Practice: ADC with redundancy [Flynn et al., 2003]; ADC with redundancy, calibration and reconfiguration [Daly et al., 2008]
- Theory: little prior work. Related: scalar quantizer with random thresholds for uniform input [Goyal 2011]
System model: ADC with redundancy and calibration

[Figure: the input x enters a comparator bank whose actual references V^n are the fabricated versions of the designed references v^n; the comparison outcomes Y^n, together with calibrated estimates of V^n, feed an encoder that outputs x̂ ∈ {c_1, ..., c_{2^b}}.]

Fabrication: V_i = v_i + Z_i, with Z_i i.i.d. ~ N(0, σ²), i = 1, 2, ..., n.

n = r · (2^b − 1) comparators, where r is the redundancy factor.
Performance measures of ADC

Error function: e(x) = x̂ − x

Mean-square error: MSE = E_X[e(X)²]

Maximum quantization error: e_max = max_x |e(x)|

Given designed references v^n and fabricated references V^n, analyze the performance (MSE, e_max).

- Is a uniform v_1, v_2, ..., v_n still optimal for uniform input?
- Is scaling down the size of comparators actually beneficial?
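To make these two measures concrete, here is a small Monte Carlo sketch that evaluates MSE and e_max for designed vs. fabricated references, assuming a uniform input on [−1, 1] and midpoint reconstruction; helper names and parameters are illustrative, not from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, refs, lo=-1.0, hi=1.0):
    """Map x to the midpoint of its cell; cells are delimited by sorted refs."""
    edges = np.concatenate(([lo], np.sort(refs), [hi]))
    mids = (edges[:-1] + edges[1:]) / 2
    idx = np.searchsorted(edges, x, side="right").clip(1, len(mids)) - 1
    return mids[idx]

def mse_and_emax(refs, n_mc=200_000):
    x = rng.uniform(-1, 1, n_mc)
    e = quantize(x, refs) - x          # e(x) = xhat - x
    return np.mean(e**2), np.max(np.abs(e))

b, sigma = 4, 0.05
v = np.linspace(-1, 1, 2**b + 1)[1:-1]                  # designed references v^n
V = np.clip(v + rng.normal(0, sigma, v.size), -1, 1)    # fabricated references V^n
print("designed:   MSE=%.2e  emax=%.3f" % mse_and_emax(v))
print("fabricated: MSE=%.2e  emax=%.3f" % mse_and_emax(V))
```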
Challenge: randomness in reference voltages

What we design: a grid v_0, v_1, v_2, v_3, v_4.

After fabrication: the references land at random locations, and v_1, ..., v_3 may even change order.

Observations
- Ordering may change → order statistics
- Random interval sizes → ?

Interval sizes → density of references → high-resolution approximation.
High-resolution approximation

Assume n → ∞. Represent v^n by a point density function τ(x):

    τ(x) dx ≈ (number of v^n in [x, x + dx]) / n

Similarly, represent V^n by a point density function λ(x):

    λ(x) dx ≈ E[number of V^n in [x, x + dx]] / n

Point density functions simplify analysis!
Point density function guides reference design

Reference voltages v^n ↔ point density function τ(·):
- high-resolution approximation maps v^n to τ(·)
- given n, "sample" τ(·) to obtain v^n (see the sketch below)

Examples
- τ ~ Unif([−1, 1]): v^n is an n-point uniform grid on [−1, 1]
- τ(x) = 0.5 · δ(x − a) + 0.5 · δ(x + a): n/2 reference voltages at +a and n/2 at −a
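One standard way to realize the "sample" step is to place the n references at the quantiles of τ; here is a sketch under that assumption (quantile placement is a common high-resolution construction, not necessarily the exact procedure in the thesis).

```python
import numpy as np

def refs_from_density(tau_pdf, n, grid):
    """Place n references at the (i - 0.5)/n quantiles of the density tau."""
    pdf = tau_pdf(grid)
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]                              # normalize to a CDF
    q = (np.arange(1, n + 1) - 0.5) / n         # target quantile levels
    return np.interp(q, cdf, grid)              # invert the CDF on the grid

grid = np.linspace(-1, 1, 10_001)
uniform_tau = lambda x: 0.5 * np.ones_like(x)   # tau ~ Unif([-1, 1])
print(refs_from_density(uniform_tau, 7, grid))  # ~ 7-point uniform grid
```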
With process variation, the fabricated references matter

- τ(·) ↔ v^n: what we can design ("sample" τ to get v^n)
- λ(·) ↔ V^n: what determines performance

We characterize performance in terms of λ(·), and want to find the optimal τ(·).

Fabrication: V_i = v_i + Z_i, with Z_i i.i.d. ~ φ(·) = N(0, σ²).

[W., Polyanskiy & Wornell, ISIT'14]:

    λ(x) = (τ ∗ φ)(x)    (∗: convolution)

[Figure: τ(x) and the smoothed λ(x) on [−1, 1].]
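The convolution identity is easy to check numerically: smooth τ with the Gaussian kernel on a grid, and compare against a histogram of fabricated references. A sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, dx = 0.2, 0.001
x = np.arange(-3, 3, dx)
tau = np.where(np.abs(x) <= 1, 0.5, 0.0)                  # tau ~ Unif([-1, 1])
phi = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
lam = np.convolve(tau, phi, mode="same") * dx             # lambda = tau * phi

# Empirical check: draw reference locations from tau, add N(0, sigma^2) offsets.
V = rng.uniform(-1, 1, 500_000) + rng.normal(0, sigma, 500_000)
hist, edges = np.histogram(V, bins=120, range=(-1.5, 1.5), density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - np.interp(centers, x, lam))))  # small -> they agree
```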
Process variation increases MSE 6-fold

Input X ~ f_X(·); MSE = E_X[e(X)²].

Classical case [Bennett 1948, Panter & Dite 1951]:

    MSE ≈ (1 / (12 n²)) ∫ f_X(x) / λ²(x) dx,    λ = τ

With process variation [W., Polyanskiy & Wornell, ISIT'14]:

    MSE ≈ (1 / (2 n²)) ∫ f_X(x) / λ²(x) dx,    λ = τ ∗ φ

Why 6 times? The constant grows from 1/12 to 1/2 because a uniform grid becomes a random division of an interval (a topic in order statistics).

Optimal τ: a necessary and sufficient condition [W., Polyanskiy & Wornell, ISIT'14].
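The 6× factor can be observed directly in simulation: for a uniform input with midpoint reconstruction, a cell of length L contributes L³/12 to the MSE, so we can compare a uniform grid of thresholds against i.i.d. uniformly placed thresholds. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 100, 2_000

def mse_given_thresholds(t):
    """Exact MSE for Unif[0,1] input, midpoint reconstruction: sum of L^3 / 12."""
    L = np.diff(np.concatenate(([0.0], np.sort(t), [1.0])))
    return np.sum(L**3) / 12

grid = np.arange(1, n + 1) / (n + 1)                # uniform thresholds
mse_grid = mse_given_thresholds(grid)
mse_rand = np.mean([mse_given_thresholds(rng.uniform(0, 1, n))
                    for _ in range(trials)])        # random division
print(mse_rand / mse_grid)                          # ~ 6 for large n
```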
MSE-optimal designs can be quite different

Uniform input distribution [W., Polyanskiy & Wornell, ISIT'14]: f_X ~ Unif([−1, 1]), σ_0 ≈ 0.7228.

- σ < σ_0: locally optimal iterative optimization ⇒ τ*(x)
- σ ≥ σ_0: the necessary and sufficient condition ⇒ τ*(x) = δ(x)

[Figure: τ*(x) and λ*(x) on [−1, 1] for σ = 0.1, 0.2, ..., 0.8; for σ ≥ σ_0 the optimal τ* collapses to δ(x).]
Analyzing maximum quantization error

e_max = max_x |e(x)|

High-resolution approximations for:
- the c.d.f. P[e_max ≤ x]
- a range [a, b] with P[e_max ∈ [a, b]] ≈ 1

Accurate for finite n (e.g., when n ≥ 100). e_max is often measured in LSB ≜ (input range) / 2^b.
MSE-optimal designs improve yield

Yield ≜ P[e_max ≤ Δ_max], e.g., Δ_max = 1 LSB.

MSE-optimal designs improve yield by 5%–10% for a 6-bit Flash ADC.

[Figure: yield vs. σ for redundancy factors r = 6 and r = 8, MSE-optimal vs. uniform designs.]
Scaling down the size of comparators is beneficial

For circuit fabrication [Kinget 2005, Nuzzo 2008]:

    process variation σ² ∝ 1 / (component area)

Given a fixed silicon area:

    # components n ∝ 1 / (component area)

For a uniform input distribution, when σ ≥ σ_0:

    MSE ≈ 2πσ² / n²,    e_max ≐ Θ(√(π/2) · σ b / n)

Combining these with σ² ∝ n (fixed total area):

    MSE = Θ(1/n),    e_max = Θ(b / √n)

Building an ADC with more, smaller, but less precise comparators improves accuracy!
Computing with unreliable resources: circuit design

mathematical abstraction → performance trade-offs → design guidance

- trade-off: redundancy vs. yield
- abstraction: distribution of fabrication variation
- tools: asymptotic analysis; quantization theory, order statistics
Scheduling parallel tasks with variable response times
Executing parallel tasks: simple yet important

Executing parallel tasks (task 1, ..., task n) arises in:
- the "Map" stage of MapReduce
- distributed parallel algorithms (ADMM, MCMC, ...)
- time series analysis (yearly data → weekly data)
- crowdsourcing
- ...

In common
- Given: a large collection of tasks that can be run in parallel
- Want: results for all tasks

But this could be slow!
Issue: variability in data centers

Observations
- The response time of a computer in a data center varies (machines are shared [Dean 2012])
- Latency is determined by the slowest machine, and gets worse as we use more machines

Execution times in Google's data centers [Dean 2013], for a large collection of tasks:
- median completion time for one task: 1 ms
- median completion time for all tasks: 40 ms

How to reduce latency?
Backup tasks in Google MapReduce [Dean et al., 2008]

The backup task option
1. Run n tasks on n machines in parallel
2. Replicate: when 10% of the tasks are left
3. Take the earliest results

Effectiveness
- Reduces latency significantly (e.g., by 1/3 for distributed sort)
- Handles the issue of "stragglers"
A call for theoretical analysis

Considerable follow-up work in systems:
1. [Zaharia et al., USENIX OSDI 2008]: adopted by Facebook
2. [Ananthanarayanan et al., USENIX OSDI 2010]: adopted by Microsoft Bing
3. [Ananthanarayanan et al., USENIX NSDI 2013]

Existing theoretical work is inadequate:
- stochastic scheduling does not consider task replication
- we need to understand latency reduction vs. additional resource usage, and when and how replication could be beneficial
Scheduling problem formulation

Problem: execute a collection of n parallel tasks, where n is in the hundreds or thousands [Reiss et al., 2012].

System model
- Execution time of each task is i.i.d. ~ F_X
- Scheduling actions: send a task to a new machine to run; terminate all machines running a certain task
- Feedback: instantaneous feedback upon completion
Latency

The j-th copy of task i is launched at time t_{i,j} and has execution time X_{i,j} i.i.d. ~ F_X.

Completion time for task i:

    T_i ≜ min_j (t_{i,j} + X_{i,j})

Latency:

    T ≜ max_i T_i

[Figure: machines N_1–N_4 run copies of tasks 1 and 2, launched at t_{1,2} = 2 and t_{2,2} = 5; T_1 = 8, T_2 = 10, so T = max{T_1, T_2} = 10.]
Total machine time as cost measure

In data centers, the total machine time is

    C ≜ Σ_{i=1}^{n} Σ_{j=1}^{r_i + 1} (T_i − t_{i,j})⁺

[Figure: same example as above; each copy contributes the time from its launch until its task completes.]

Note: we focus on the expected values E[T] and E[C].
Replication helps!

X = 2 w.p. 0.9, 7 w.p. 0.1

No replication:
- E[T] = 2.5
- E[C] = E[T] = 2.5

Replicate the task at t_{1,2} = 2:
- E[T] = 2.23
- E[C] = 2.46
- (completion time becomes 2 w.p. 0.9, 4 w.p. 0.1 × 0.9, and 7 w.p. 0.1 × 0.1)
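The numbers above can be reproduced with a short Monte Carlo; here is a sketch of this two-point example (function names are mine):

```python
import numpy as np

rng = np.random.default_rng(3)

def one_run(replicate_at=None):
    x1 = rng.choice([2, 7], p=[0.9, 0.1])       # original copy
    if replicate_at is None or x1 <= replicate_at:
        return x1, x1                           # (latency, machine time)
    x2 = rng.choice([2, 7], p=[0.9, 0.1])       # replica launched at t = replicate_at
    T = min(x1, replicate_at + x2)              # both copies stop at completion
    C = T + (T - replicate_at)                  # original time + replica time
    return T, C

runs = np.array([one_run(replicate_at=2) for _ in range(500_000)])
print("E[T] ~ %.2f, E[C] ~ %.2f" % tuple(runs.mean(axis=0)))   # ~ 2.23, 2.46
```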
Execution time modeling

Discrete random variables [W., Joshi & Wornell, SIGMETRICS'14] (in thesis)
- arise directly from estimation (quantiles)
- offer more flexible modeling

Continuous random variables (this talk ✓)
- Pareto, Exponential
- analysis of an important class of policies
Single-fork policy

- Run n tasks in parallel initially.
- When a fraction p of the tasks is left, replicate each unfinished task r times ("fork"), either
  - without relaunching: let all the unfinished tasks keep running, or
  - with relaunching: relaunch all the unfinished tasks alongside the r replicas.

Note:
- after replication, r + 1 copies of each unfinished task run in total
- when r = 0, only the relaunching case is interesting
Latency analysis

Order statistics: given random variables X_1, X_2, ..., X_n,

    X_min = X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n} = X_max

Latency
- Replicated tasks have a new execution time distribution F_Y, where F_Y = g(F_X, r, relaunch or not)
- Replicating for the last p fraction of the tasks gives

    T = X_{(1−p)n:n} + Y_{pn:pn}

  (time before forking + time after forking; see the simulation sketch below)
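A direct Monte Carlo sketch of the single-fork policy without relaunching, assuming Pareto execution times as on the later slides; the function and parameter names are mine, and this is a straight simulation rather than the order-statistics formula.

```python
import numpy as np

rng = np.random.default_rng(4)

def single_fork_no_relaunch(n, p, r, draw):
    """Latency of one run: fork when a fraction p of the tasks remains."""
    X = np.sort(draw(n))                       # original completion times
    t_fork = X[int((1 - p) * n) - 1]           # fork when (1-p)n tasks are done
    stragglers = X[int((1 - p) * n):]          # the last p*n tasks
    # Each straggler races its original copy against r fresh replicas.
    repl = t_fork + np.min(draw((stragglers.size, r)), axis=1)
    return max(t_fork, np.max(np.minimum(stragglers, repl)))

pareto = lambda size: 2 * (rng.pareto(2, size=size) + 1)   # Pareto(alpha=2, xm=2)
T = [single_fork_no_relaunch(400, 0.2, 1, pareto) for _ in range(200)]
print("E[T] ~ %.2f" % np.mean(T))
```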
Central value theorem

p-quantile of a distribution F_X: x_p ≜ F_X^{−1}(p).

As n → ∞,

    X_{pn:n} → N( x_p, (1/n) · p(1 − p) / f_X²(x_p) )

Time before forking:

    E[X_{(1−p)n:n}] = x_{1−p} ≜ F_X^{−1}(1 − p)
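This normal approximation for a central order statistic is easy to verify; here is a sketch for the 0.8-quantile of Pareto(2, 2) execution times (sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, trials = 400, 0.8, 5_000

samples = np.sort(2 * (rng.pareto(2, size=(trials, n)) + 1), axis=1)
x_pn = samples[:, int(p * n) - 1]                  # X_{pn:n} across trials

xp = 2 / np.sqrt(1 - p)                            # F^{-1}(p) for Pareto(2, 2)
fxp = 2 * 2**2 / xp**3                             # density f_X(x_p)
print(x_pn.mean(), xp)                             # means agree
print(x_pn.var(), p * (1 - p) / (n * fxp**2))      # variances agree
```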
Extreme value theorem

Theorem (Fisher-Tippett-Gnedenko). Given X_1, X_2, ..., X_n i.i.d. ~ F_X, if there exist sequences of constants a_n > 0 and b_n ∈ ℝ such that

    P[ (X_{n:n} − b_n) / a_n ≤ z ] → G(z)

as n → ∞ and G is a non-degenerate distribution, then G belongs to one of the following families (α > 0):
- Gumbel law: G(z) = exp{−exp(−z)} for z ∈ ℝ
- Fréchet law: G(z) = 0 for z ≤ 0, exp{−z^{−α}} for z > 0
- Weibull law: G(z) = exp{−(−z)^α} for z < 0, 1 for z ≥ 0

Given F_X, the distribution of X_{n:n} can be characterized as n → ∞.
- Key: the tail behavior of 1 − F_X
- Also applicable to X_{1:n}
Single-fork analysis

- Execution time before forking: F_X; time before forking X_{(1−p)n:n} → F_X^{−1}(1 − p) (central value theorem)
- Execution time after forking: F_Y = g(F_X, r, relaunch or not); time after forking Y_{pn:pn} → G_Y (extreme value theorem)

    T = X_{(1−p)n:n} + Y_{pn:pn}

Cost can be analyzed similarly.
Execution time: Pareto distribution

Pareto(α, x_m):

    F_X(x; α, x_m) ≜ 1 − (x_m / x)^α for x ≥ x_m,  0 for x < x_m

A heavy-tailed distribution, observed in data centers [Reiss et al., 2012].

[Figure: density of Pareto(2, 2).]

Extremes
- X_{1:n} ~ Pareto(nα, x_m)
- X_{n:n} / (x_m n^{1/α}) ~ Fréchet
- E[X_{n:n}] = x_m n^{1/α} Γ(1 − 1/α) ∝ n^{1/α} · E[X]
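A quick simulation check of these extreme-value facts for Pareto(2, 2); sample sizes are illustrative.

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(6)
alpha, xm, n, trials = 2.0, 2.0, 1000, 20_000

X = xm * (rng.pareto(alpha, size=(trials, n)) + 1)       # Pareto(alpha, xm)
# Minimum: distributed as Pareto(n*alpha, xm), with mean n*alpha*xm/(n*alpha - 1).
print(X.min(axis=1).mean(), n * alpha * xm / (n * alpha - 1))
# Maximum: mean ~ xm * n^(1/alpha) * Gamma(1 - 1/alpha).
print(X.max(axis=1).mean(), xm * n**(1 / alpha) * gamma(1 - 1 / alpha))
```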
Recall: single-fork policy

Parameters
- number of tasks: n
- fraction to replicate: p
- additional replicas: r

Fork options: without relaunching (the original copy keeps running) or with relaunching (a new copy starts alongside the r replicas).
Asymptotic characterization is accurate in the finite regime

X ~ Pareto(2, 2), replication fraction p = 0.2.

[Figure: latency vs. n (200 to 1000) for r = 1 and r = 2, with and without relaunching.]
Latency & cost vs. replication fraction p

X ~ Pareto(2, 2) with n = 400.

[Figure: E[T] and E[C_cloud] vs. p ∈ [0, 0.6] for r = 0 with relaunching, and r = 1, 2 with and without relaunching.]
Latency-cost trade-off

X ~ Pareto(2, 2) with n = 400.

[Figure: latency vs. cost for r = 0 with relaunching, and r = 1, 2 with and without relaunching.]
Recap

- A framework for analyzing single-fork policies; can be extended to multi-fork
- Applicable to any distribution that can be analyzed via EVT: Pareto, Exponential, Erlang, etc.

Application scenario: log files → (density estimation) → execution time F_X → (our framework) → latency-cost trade-off → replication strategy
Computing with unreliable resources: scheduling

mathematical abstraction → performance trade-offs → design guidance

- trade-off: latency vs. cost
- abstraction: tail behavior of the execution time distribution
- tools: asymptotic analysis; order statistics, extreme value theory
Crowd-based ranking via noisy comparisons
Crowd-based ranking

Rank by score assignment: e.g., IMDb.com, Yelp.com.

Rank by pairwise comparison: e.g., admission/recruiting, the knockout stage of tournaments.

Challenges:
- human comparisons are noisy: each answer is flipped w.p. ε
- human comparisons are expensive (economic cost, time, ...)

What is the fundamental trade-off between ranking accuracy and the number of comparisons?
→ approximate sorting via (noisy) comparisons
Information-theoretic lower bounds on #comparisons

Distortion measure: ℓ1 distance of permutations,

    d_{ℓ1}(π_1, π_2) ≜ Σ_{i=1}^{n} |π_1(i) − π_2(i)|

To achieve distortion D ≤ Θ(n^{1+δ}):

Approximate sorting with noiseless comparisons
- [W., Mazumdar & Wornell, ISIT'14]: at least (1 − δ) n log n comparisons
- Tight: the multiple selection algorithm in [Kaligosi 2005] achieves this bound

Approximate sorting with noisy comparisons
- [W., Mazumdar & Wornell, ISIT'14]: at least ((1 − δ) / (1 − H_b(ε))) n log n comparisons
- Existing algorithms are only known to be O(n log n)
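A sketch of the distortion measure and the two lower bounds; taking log base 2 (comparisons measured in bits) is an assumption on my part.

```python
import numpy as np

def d_ell1(pi1, pi2):
    """l1 distance between two permutations of {0, ..., n-1}."""
    return int(np.abs(np.asarray(pi1) - np.asarray(pi2)).sum())

def binary_entropy(eps):
    return -eps * np.log2(eps) - (1 - eps) * np.log2(1 - eps)

def lower_bound(n, delta, eps=0.0):
    """Minimum #comparisons to reach distortion D <= Theta(n^(1+delta))."""
    noiseless = (1 - delta) * n * np.log2(n)
    return noiseless if eps == 0 else noiseless / (1 - binary_entropy(eps))

print(d_ell1([0, 1, 2, 3], [1, 0, 2, 3]))     # adjacent swap: distortion 2
print(lower_bound(1000, 0.5))                 # noiseless comparisons
print(lower_bound(1000, 0.5, eps=0.1))        # noisy: ~1.9x more comparisons
```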
More results

More distortion measures [W., Mazumdar & Wornell, ISIT'14]
- Kendall tau distance, Chebyshev distance, ...
- relationships among the distortion measures

Other distributional models
- the Mallows distributional model
Computing with unreliable resources: ranking

mathematical abstraction → performance trade-offs → design guidance

- trade-off: accuracy vs. #comparisons
- abstraction: error probability of noisy comparisons
- tools: asymptotic analysis; rate-distortion theory, combinatorics
Computing with unreliable resources

mathematical abstraction → performance trade-offs → design guidance

- trade-off: reliability vs. resource usage
- abstraction: statistical properties of unreliability
- tools: asymptotic analysis, suitable for large-scale systems!

A unified "coding theory" for these computing problems?

A journey of a thousand miles begins with a single step. — Lao Tzu
This thesis would be impossible without ...

Thesis committee: Greg, Yury and Devavrat

Collaborators: Gauri Joshi, Arya Mazumdar

The community: SiA, RLE, LIDS, CSAIL, MTL, EECS, GSA, GSC, MIT, ...
- friends
- colleagues
- constructive critics
- poker opponents
- ...

My parents

Hengni

Thank you!