1
Per-User Checkpointing for Mobility Database Failure Restoration
A paper by Yi-Bing Lin
IEEE Transactions on Mobile Computing, Vol. 4, No. 2, March/April 2005
Presented by Derek Pennington
2
In GPRS & UMTS networks, the Home Location Register (HLR) maintains the central database of user information. For a given user, an HLR record might contain information such as…
• Mobile Station (MS) Information
– telephone number
– International Mobile Subscriber Identity
• Service Information
– subscription info
– service restrictions
– supplementary services
• Location Information
– address of Serving GPRS Support Node (SGSN)
3
But what happens in the event of an HLR failure?
• Luckily, we periodically back up all of this user data (each backup is called a “checkpoint”).
• However, the paper’s author argues that the established backup practices have room for improvement.
4
Various approaches to checkpointing:
• All-record checkpoint
– back up all users at once (e.g., at midnight)
– costly (bottleneck effect)
• Per-user checkpoint
– each user has its own timing mechanism for backups
• The paper’s author discusses the existing per-user checkpointing algorithm (henceforth referred to as “Algorithm 1”), and then proposes a new, improved one (“Algorithm 2”)
5
Order of Presentation Coverage
1) Introduction
2) Algorithm 1 vs. Algorithm 2
3) Modeling the Algorithms (the math part)
4) Performance Evaluation of Algorithms 1 & 2
5) Conclusions / Comments
6
Algorithm 1
• checkpoints happen at random intervals (tc)
• a checkpoint may occur whether or not any user registrations have taken place
• In the event of an HLR failure, if the user updated the HLR database but that update didn’t get backed up, the record becomes obsolete
• When the user’s record is obsolete, the user will lose calls until he performs a registration with the HLR.
7
Algorithm 2
• checkpoint timers are scheduled for random intervals (tp)
• However, a checkpoint will only take place when BOTH of the following are true:
– the tp timer has expired
– a registration has taken place
• As in Algorithm 1, the user will lose calls if his record is obsolete
[Figure: state machine for Algorithm 2; a checkpoint occurs whenever we return to State 0]
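The two policies can be sketched as a small event simulation. This is a minimal sketch, not the paper's implementation: the function names, the `timer_rate` parameter, and the assumption that a checkpoint was just taken at time 0 are all illustrative choices. Algorithm 1 checkpoints on every timer expiry; Algorithm 2 checkpoints only once the timer has expired and a registration has occurred since the last checkpoint.

```python
import bisect
import random


def checkpoint_times(algorithm, registrations, timer_rate, horizon, rng):
    """Return one user's checkpoint times in (0, horizon].

    algorithm     -- 1 (checkpoint on every timer expiry) or 2 (checkpoint only
                     once the timer has expired AND a registration has occurred)
    registrations -- sorted registration times for this user
    timer_rate    -- rate of the exponential checkpoint timer tp (hypothetical name)
    """
    checkpoints = []
    now = 0.0  # assumption: a checkpoint was just taken at time 0
    while True:
        expiry = now + rng.expovariate(timer_rate)
        if algorithm == 1:
            next_cp = expiry
        else:
            # First registration strictly after the previous checkpoint.
            i = bisect.bisect_right(registrations, now)
            if i == len(registrations):
                break  # nothing new to save, so no further checkpoint
            next_cp = max(expiry, registrations[i])
        if next_cp > horizon:
            break
        checkpoints.append(next_cp)
        now = next_cp
    return checkpoints


def backup_is_obsolete(checkpoints, registrations, failure_time):
    """True if a registration happened after the last checkpoint before the failure."""
    last_cp = max((t for t in checkpoints if t <= failure_time), default=0.0)
    last_reg = max((t for t in registrations if t <= failure_time), default=0.0)
    return last_reg > last_cp


regs = [1.0, 5.0]
cps = checkpoint_times(2, regs, timer_rate=1.0, horizon=100.0, rng=random.Random(1))
print(cps)
```

Under Algorithm 2 every checkpoint interval contains at least one registration, so no checkpoint is wasted on an unchanged record.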
8
RECAP
For each scenario, will the user’s record(s) be valid after the HLR recovers from its failure?

Scenario | Algorithm 1 | Algorithm 2
1        | YES         | YES
2        | YES         | YES
3        | NO          | YES
4        | NO          | NO

[Figure: timeline for each scenario; the legend symbols mark the CP timer firing, a registration, and the failure]
9
Order of Presentation Coverage
1) Introduction
2) Algorithm 1 vs. Algorithm 2
3) Modeling the Algorithms (the math part)
4) Performance Evaluation of Algorithms 1 & 2
5) Conclusions / Comments
10
Two metrics are used to measure checkpoint algorithm performance:
• E[tc]: the expected checkpoint interval
– the larger the interval, the less frequently checkpoints will occur
– essentially, checkpoint cost is proportional to checkpoint frequency
• α: the probability that the user’s HLR record is obsolete after an HLR failure/recovery
– the smaller α is, the better the checkpoint algorithm’s performance
11
Setting the checkpoint timer tp:
• typical approaches have a fixed tp
• However, this can lead to congestion with large numbers of users
• Thus, in Algorithms 1 & 2, tp is a random variable with exponential distribution
• Density function:
$$f_p(t_p) = \mu e^{-\mu t_p}$$
• …and, in Algorithm 1, since tc = tp from checkpoint to checkpoint, the expected checkpoint interval is:
$$E_I[t_c] = E[t_p] = \frac{1}{\mu}$$
(μ: checkpoints per unit time; 1/μ: time between checkpoints)
12
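Finding α for Algorithm 1:
[Figure: timeline around the moment of HLR failure, with intervals labeled as follows]
• $t_m - \tau_m$ = “residual time” of tm
• $\tau_m$ = “reverse residual time” of tm
• $t_p - \tau_p$ = residual time of tp
• $\tau_p$ = reverse residual time of tp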
13
Finding α for Algorithm 1 (cont’d):
Consider random variable t:
• probability density function: f(t)
• probability distribution function: $F(t) = \int_{y=0}^{t} f(y)\,dy$
• expected value: E[t]
• Laplace transform: $f^*(s) = \int_{t=0}^{\infty} f(t)\,e^{-st}\,dt$
Let τ be the residual time of t:
• probability density function: $r(\tau) = \dfrac{1 - F(\tau)}{E[t]}$
• probability distribution function: $R(\tau) = \int_{y=0}^{\tau} r(y)\,dy$
• Laplace transform: $r^*(s) = \dfrac{1 - f^*(s)}{E[t]\,s}$
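The residual-time Laplace transform identity, r*(s) = (1 − f*(s)) / (E[t] s), can be checked numerically. The sketch below assumes a two-stage Erlang density as the example distribution and uses a plain trapezoid rule; the rate `lam`, the evaluation point `s`, and the `laplace` helper are illustrative choices, not values from the paper.

```python
import math


def laplace(density, s, upper=60.0, steps=60000):
    """Trapezoid approximation of the integral of density(t) * exp(-s t) on [0, upper]."""
    h = upper / steps
    total = 0.5 * (density(0.0) + density(upper) * math.exp(-s * upper))
    for k in range(1, steps):
        t = k * h
        total += density(t) * math.exp(-s * t)
    return total * h


lam = 1.5          # assumed rate of the example distribution
mean = 2 / lam     # mean of a two-stage Erlang


def f(t):
    """Two-stage Erlang density."""
    return lam * lam * t * math.exp(-lam * t)


def F(t):
    """Two-stage Erlang distribution function."""
    return 1 - math.exp(-lam * t) * (1 + lam * t)


def r(tau):
    """Residual-time density r(tau) = (1 - F(tau)) / E[t]."""
    return (1 - F(tau)) / mean


s = 0.7
numeric = laplace(r, s)                       # r*(s) by direct integration
closed = (1 - laplace(f, s)) / (mean * s)     # (1 - f*(s)) / (E[t] s)
print(numeric, closed)
```

The two printed values should agree to within the integration error.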
14
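Finding α for Algorithm 1 (cont’d):
• Also, the density function is the same: $r_p(t) = f_p(t) = \mu e^{-\mu t}$
• In Algorithm 1, we say the backup record is obsolete if, at the moment of HLR failure, the time since the last checkpoint is greater than the time since the last registration
• In other words:
$$\alpha_I = \Pr[\tau_c > \tau_m] = \int_{\tau_m=0}^{\infty} r_m(\tau_m) \int_{\tau_c=\tau_m}^{\infty} \mu e^{-\mu \tau_c}\,d\tau_c\,d\tau_m$$
(integrals of the two density functions)
$$\alpha_I = r_m^*(\mu) = \frac{1 - f_m^*(\mu)}{\mu\,E[t_m]}$$
(from $r^*(s) = \dfrac{1 - f^*(s)}{E[t]\,s}$ defined earlier)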
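As a worked instance of the Algorithm 1 result, the sketch below evaluates (1 − f*(μ)) / (μ E[tm]) under the assumption that registration intervals are Erlang with n stages and rate λ, whose Laplace transform is (λ/(s+λ))^n. The names `mu` (timer rate) and `lam` (registration rate), and the numeric values, are illustrative assumptions.

```python
def erlang_laplace(n, lam, s):
    """Laplace transform of an n-stage Erlang density with rate lam."""
    return (lam / (s + lam)) ** n


def alpha_I(n, lam, mu):
    """Obsolete-record probability for Algorithm 1: (1 - f*(mu)) / (mu * E[tm])."""
    mean_tm = n / lam
    return (1 - erlang_laplace(n, lam, mu)) / (mu * mean_tm)


# Exponential registrations (n = 1) with equal rates.
print(alpha_I(1, 1.0, 1.0))
# More regular registrations with the same mean interval (n = 4, lam = 4).
print(alpha_I(4, 4.0, 1.0))
```

For n = 1 the expression reduces to λ/(λ+μ), so faster registration relative to the timer means a higher chance that the backup is stale.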
15
Finding E[tc] and α for Algorithm 2:
• One difference between Algorithm 2 and Algorithm 1 is that the checkpoint timer will be reset based on how the previous checkpoint took place
– If the previous checkpoint happened due to a timeout event, then the next checkpoint interval is: $t_c = \max(\tau_m^*, t_p)$
– If the previous checkpoint happened due to a registration event, then the next checkpoint interval is: $t_c = \max(t_m, t_p)$
– Thus, in our state machine example, we actually have two “State 0”s…
16
[Figure: state machine with two “State 0”s, one for a checkpoint occurring due to a timeout event and one for a checkpoint occurring due to a registration. The transitions are labeled with:]
• Probability that a timeout will occur after a timeout-caused checkpoint
• Probability that a timeout will occur after a registration-caused checkpoint
• Probability that a registration will occur after a timeout-caused checkpoint
• Probability that a registration will occur after a registration-caused checkpoint
17
• The random variable tc is now essentially a combination of the probability that the last checkpoint happened due to a timeout and the probability that the last checkpoint happened due to a registration:
$$t_c = p_1 \max(\tau_m^*, t_p) + p_2 \max(t_m, t_p)$$
• …where:
$$p_1 = \frac{\pi_{01}}{\pi_{01} + \pi_{02}} \quad\text{and}\quad p_2 = \frac{\pi_{02}}{\pi_{01} + \pi_{02}} = 1 - p_1$$
($\pi_x$ is the probability of being in State “x”)
18
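• Therefore, we can say:
After a registration-caused checkpoint:
$$p_a = f_m^*(\mu), \qquad p_b = 1 - f_m^*(\mu)$$
After a timeout-caused checkpoint:
$$p_c = r_m^*(\mu), \qquad p_d = 1 - r_m^*(\mu)$$
Here $p_a$ and $p_c$ are the probabilities that the next registration arrives before the timer expires, and $p_b$ and $p_d$ the probabilities that the timer expires first.
(remember that $r_m^*(\mu)$ comes from $\Pr[\tau_c > \tau_m]$)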
19
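• Based on the figure, we can deduce some limiting probabilities:
$$\pi_{01} + \pi_{02} + \pi_1 + \pi_2 = 1$$
$$\pi_{01} = \pi_1, \qquad \pi_{02} = \pi_2$$
$$\pi_1 = p_c\,\pi_{01} + p_a\,\pi_{02}$$
$$\pi_2 = p_d\,\pi_{01} + p_b\,\pi_{02}$$
• …which means we know more about p1 and p2:
$$p_1 = \frac{f_m^*(\mu)}{1 + f_m^*(\mu) - r_m^*(\mu)} \quad\text{and}\quad p_2 = \frac{1 - r_m^*(\mu)}{1 + f_m^*(\mu) - r_m^*(\mu)}$$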
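The closed forms for p1 and p2 can be cross-checked by iterating the two-state chain that tracks only the cause of each checkpoint (timeout or registration). This sketch assumes Erlang registration intervals and assumes the transition reading used here: after a timeout-caused checkpoint the next one is timeout-caused with probability r*(μ), and after a registration-caused checkpoint with probability f*(μ). All names and parameter values are illustrative.

```python
def erlang_lt(n, lam, s):
    """Laplace transform f*(s) of an n-stage Erlang density with rate lam."""
    return (lam / (s + lam)) ** n


def residual_lt(n, lam, s):
    """Laplace transform r*(s) = (1 - f*(s)) / (E[t] s) of the residual time."""
    return (1 - erlang_lt(n, lam, s)) / ((n / lam) * s)


def checkpoint_cause_probs(f_star, r_star, iterations=500):
    """Iterate the checkpoint-cause chain to its stationary distribution (p1, p2)."""
    p1, p2 = 0.5, 0.5
    for _ in range(iterations):
        p1, p2 = (p1 * r_star + p2 * f_star,
                  p1 * (1 - r_star) + p2 * (1 - f_star))
    return p1, p2


n, lam, mu = 2, 1.0, 0.5   # assumed example parameters
f_star = erlang_lt(n, lam, mu)
r_star = residual_lt(n, lam, mu)
p1, p2 = checkpoint_cause_probs(f_star, r_star)
p1_closed = f_star / (1 + f_star - r_star)
print(p1, p1_closed)
```

The iterated p1 should match f*(μ) / (1 + f*(μ) − r*(μ)), and p2 should equal 1 − p1.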
20
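• From…
$$t_c = p_1 \max(\tau_m^*, t_p) + p_2 \max(t_m, t_p)$$
• …the density function for tc is:
$$f_c(t_c) = p_1(\text{TERM1} + \text{TERM2}) + p_2(\text{TERM3} + \text{TERM4})$$
• …where:
$$\text{TERM1} = \mu e^{-\mu t_c} R_m(t_c)$$
$$\text{TERM2} = r_m(t_c) - r_m(t_c)\,e^{-\mu t_c}$$
$$\text{TERM3} = \mu e^{-\mu t_c} F_m(t_c)$$
$$\text{TERM4} = f_m(t_c) - f_m(t_c)\,e^{-\mu t_c}$$
• …thus:
$$f_c(t_c) = p_1\left[\mu e^{-\mu t_c} R_m(t_c) + r_m(t_c) - r_m(t_c)\,e^{-\mu t_c}\right] + p_2\left[\mu e^{-\mu t_c} F_m(t_c) + f_m(t_c) - f_m(t_c)\,e^{-\mu t_c}\right]$$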
21
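• The relationships between tp, tm, and $\tau_m^*$ allow us to reinterpret fc(tc) into two different pieces:
– fc1(tc): the situation where tp > tm
– fc2(tc): the situation where tp < tm
• We can reexpress fc(tc) as:
$$f_c(t_c) = f_{c1}(t_c) + f_{c2}(t_c)$$
• …where:
$$f_{c1}(t_c) = (p_1 \cdot \text{TERM1}) + (p_2 \cdot \text{TERM3}) = \frac{f_m^*(\mu)\,\mu e^{-\mu t_c} R_m(t_c)}{1 + f_m^*(\mu) - r_m^*(\mu)} + \frac{\left[1 - r_m^*(\mu)\right]\mu e^{-\mu t_c} F_m(t_c)}{1 + f_m^*(\mu) - r_m^*(\mu)}$$
$$f_{c2}(t_c) = (p_1 \cdot \text{TERM2}) + (p_2 \cdot \text{TERM4}) = \frac{f_m^*(\mu)\left[r_m(t_c) - r_m(t_c)\,e^{-\mu t_c}\right]}{1 + f_m^*(\mu) - r_m^*(\mu)} + \frac{\left[1 - r_m^*(\mu)\right]\left[f_m(t_c) - f_m(t_c)\,e^{-\mu t_c}\right]}{1 + f_m^*(\mu) - r_m^*(\mu)}$$
with, as before:
$$\text{TERM1} = \mu e^{-\mu t_c} R_m(t_c), \qquad \text{TERM2} = r_m(t_c) - r_m(t_c)\,e^{-\mu t_c}$$
$$\text{TERM3} = \mu e^{-\mu t_c} F_m(t_c), \qquad \text{TERM4} = f_m(t_c) - f_m(t_c)\,e^{-\mu t_c}$$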
22
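Expected checkpoint interval for Algorithm 2:
$$E_{II}[t_c] = \int_{t_c=0}^{\infty} t_c\,f_c(t_c)\,dt_c$$
(integral of the density function)
$$E_{II}[t_c] = p_1 A_1 + p_2 A_2 = \left[\frac{f_m^*(\mu)}{1 + f_m^*(\mu) - r_m^*(\mu)}\right] A_1 + \left[\frac{1 - r_m^*(\mu)}{1 + f_m^*(\mu) - r_m^*(\mu)}\right] A_2$$
(plug in p1 and p2)
What are A1 and A2?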
23
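$$A_1 = \int_{0}^{\infty} t_c\,\mu e^{-\mu t_c} R_m(t_c)\,dt_c + \int_{0}^{\infty} t_c\,r_m(t_c)\,dt_c - \int_{0}^{\infty} t_c\,r_m(t_c)\,e^{-\mu t_c}\,dt_c$$
$$A_1 = -\mu \left.\frac{d}{ds}\left[\frac{r_m^*(s)}{s}\right]\right|_{s=\mu} + E[\tau_m^*] + \left.\frac{d\,r_m^*(s)}{ds}\right|_{s=\mu} = E[\tau_m^*] + \frac{r_m^*(\mu)}{\mu}$$
$$A_2 = \int_{0}^{\infty} t_c\,\mu e^{-\mu t_c} F_m(t_c)\,dt_c + \int_{0}^{\infty} t_c\,f_m(t_c)\,dt_c - \int_{0}^{\infty} t_c\,f_m(t_c)\,e^{-\mu t_c}\,dt_c = E[t_m] + \frac{f_m^*(\mu)}{\mu}$$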
24
So we can also express the expected checkpoint interval for Algorithm 2 as:
$$E_{II}[t_c] = p_1\left[E[\tau_m^*] + \frac{r_m^*(\mu)}{\mu}\right] + p_2\left[E[t_m] + \frac{f_m^*(\mu)}{\mu}\right]$$
(plug in A1 and A2)
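To see the closed form in action, the sketch below evaluates the Algorithm 2 expected checkpoint interval under the assumption of Erlang registration intervals (n stages, rate `lam`) and an exponential timer with rate `mu`. The mean residual time of an Erlang interval, (n+1)/(2λ), is a standard renewal-theory fact rather than something stated on the slides, and the parameter values are illustrative.

```python
def erlang_lt(n, lam, s):
    """Laplace transform f*(s) of an n-stage Erlang density with rate lam."""
    return (lam / (s + lam)) ** n


def expected_interval_II(n, lam, mu):
    """p1 * (E[residual] + r*(mu)/mu) + p2 * (E[tm] + f*(mu)/mu)."""
    mean_tm = n / lam
    mean_residual = (n + 1) / (2 * lam)   # E[t^2] / (2 E[t]) for an Erlang interval
    f_star = erlang_lt(n, lam, mu)
    r_star = (1 - f_star) / (mean_tm * mu)
    p1 = f_star / (1 + f_star - r_star)
    p2 = 1 - p1
    return p1 * (mean_residual + r_star / mu) + p2 * (mean_tm + f_star / mu)


mu = 1.0
interval_I = 1 / mu                       # Algorithm 1: E[tc] = E[tp] = 1/mu
interval_II = expected_interval_II(1, 1.0, mu)
print(interval_I, interval_II)
```

With exponential registrations and equal rates, the Algorithm 2 interval is the mean of the maximum of two unit-rate exponentials, which is longer than the Algorithm 1 interval, so checkpoints are less frequent.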
25
Now we need to find $\alpha_{II}$…
• To find the probability of getting an obsolete record, there is no closed-form expression when an arbitrary fm(tm) is used
• The paper uses a mix-Erlang density function
– proven as a good approximation to other functions as well as measured data
$$f_m(t_m) = \sum_{i=1}^{j} q_i \left[\frac{(\lambda_i t_m)^{n_i - 1}}{(n_i - 1)!}\right] \lambda_i e^{-\lambda_i t_m}$$
– …and, as a comparison, the regular Erlang density function:
$$f(n, \lambda, t_m) = \left[\frac{(\lambda t_m)^{n-1}}{(n-1)!}\right] \lambda e^{-\lambda t_m}$$
26
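Now we need to find $\alpha_{II}$ (cont’d)
• Continuing, we have the Erlang distribution function:
$$F(n, \lambda, t_m) = 1 - \sum_{j=0}^{n-1} \frac{(\lambda t_m)^j}{j!}\,e^{-\lambda t_m} = 1 - \frac{1}{\lambda}\sum_{j=1}^{n} f(j, \lambda, t_m)$$
• …and the Laplace transform expressed as:
$$f^*(n, \lambda, s) = \left(\frac{\lambda}{s + \lambda}\right)^{n}$$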
27
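Now we need to find $\alpha_{II}$ (cont’d)
• The reverse residual time $\tau_m$ of tm has:
– density function: $r(n, \lambda, \tau_m)$
– distribution function: $R(n, \lambda, \tau_m)$
– Laplace transform: $r^*(n, \lambda, s)$
• And, since E[tm] = n/λ, we can say the following:
$$r(n, \lambda, \tau_m) = \frac{1}{n}\sum_{j=1}^{n} f(j, \lambda, \tau_m)$$
$$R(n, \lambda, \tau_m) = \frac{1}{n}\sum_{j=1}^{n} F(j, \lambda, \tau_m)$$
$$r^*(n, \lambda, s) = \frac{\lambda}{n s}\left[1 - \left(\frac{\lambda}{s + \lambda}\right)^{n}\right]$$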
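The Erlang building blocks used in this part of the derivation are easy to write down and check against each other. The sketch below is illustrative only (function names and test values are assumptions): it confirms numerically that averaging the Erlang densities of order 1 through n gives the same residual-time density as (1 − F) / E[tm] with E[tm] = n/λ.

```python
import math


def erlang_pdf(n, lam, t):
    """Erlang density with n stages and rate lam."""
    return (lam * t) ** (n - 1) / math.factorial(n - 1) * lam * math.exp(-lam * t)


def erlang_cdf(n, lam, t):
    """Erlang distribution function: 1 - sum_{j<n} (lam t)^j / j! * exp(-lam t)."""
    tail = sum((lam * t) ** j / math.factorial(j) for j in range(n))
    return 1 - tail * math.exp(-lam * t)


def residual_pdf(n, lam, tau):
    """Residual-time density as the average of Erlang densities of order 1..n."""
    return sum(erlang_pdf(j, lam, tau) for j in range(1, n + 1)) / n


n, lam, tau = 3, 2.0, 0.7   # assumed example values
by_sum = residual_pdf(n, lam, tau)
by_definition = (1 - erlang_cdf(n, lam, tau)) / (n / lam)
print(by_sum, by_definition)
```

Both expressions should print the same value up to floating-point rounding.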
28
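Now we need to find $\alpha_{II}$ (cont’d)
• For Algorithm 2, consider the two scenarios:
– A registration happens before the timeout
  • Checkpoint happens at the time of the timeout
– A registration does NOT happen before the timeout
  • In this case, we wait until the next registration to checkpoint
• To derive $\alpha_{II}$, we only need to consider the first case…
$$f_{c1}(t_c) = p_1\,\mu e^{-\mu t_c}\,\frac{1}{n}\sum_{m=1}^{n} F(m, \lambda, t_c) + p_2\,\mu e^{-\mu t_c} F(n, \lambda, t_c) = \frac{p_1}{n}\sum_{m=1}^{n} g(m, \lambda, t_c) + p_2\,g(n, \lambda, t_c)$$
• …where:
$$g(i, \lambda, t) = \mu e^{-\mu t} F(i, \lambda, t) = \mu e^{-\mu t} - \sum_{k=1}^{i} \frac{\mu \lambda^{k-1}}{(\lambda + \mu)^{k}}\,f(k, \lambda + \mu, t)$$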
29
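Now we need to find $\alpha_{II}$ (cont’d)
• Then, the density function for the reverse residual time corresponding to g(i, λ, t) is:
[Equation: h(i, λ, τ), an exponential term minus a double sum over k = 1..i and j = 1..k of Erlang densities f(j, ·, τ); the coefficients are not recoverable from the extracted text]
• If we say that $\tau_c$ is the reverse residual time of tc, then the density function for $\tau_c$ is:
$$r_{c1}(\tau_c) = \frac{p_1}{n}\sum_{m=1}^{n} h(m, \lambda, \tau_c) + p_2\,h(n, \lambda, \tau_c)$$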
30
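Now we need to find $\alpha_{II}$ (cont’d)
• Finally, we can derive $\alpha_{II}$:
$$\alpha_{II} = \Pr[\tau_c > \tau_m] = \int_{\tau_m=0}^{\infty} r(n, \lambda, \tau_m) \int_{\tau_c=\tau_m}^{\infty} r_{c1}(\tau_c)\,d\tau_c\,d\tau_m$$
$$= \frac{p_1}{n}\sum_{m=1}^{n} \int_{\tau_m=0}^{\infty}\int_{\tau_c=\tau_m}^{\infty} r(n, \lambda, \tau_m)\,h(m, \lambda, \tau_c)\,d\tau_c\,d\tau_m + p_2 \int_{\tau_m=0}^{\infty}\int_{\tau_c=\tau_m}^{\infty} r(n, \lambda, \tau_m)\,h(n, \lambda, \tau_c)\,d\tau_c\,d\tau_m$$
$$= \frac{p_1}{n}\sum_{m=1}^{n}\left[A_3(m) - A_4(m)\right] + p_2\left[A_3(n) - A_4(n)\right]$$
• …where……… (next slide)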
31
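Now we need to find $\alpha_{II}$ (cont’d)
[Equations: $A_3(m)$ and $A_4(m)$, double integrals over $\tau_m$ from 0 to ∞ and $\tau_c$ from $\tau_m$ to ∞ of $r(m, \lambda, \tau_m)$ against the exponential term and the Erlang-sum term of h, respectively; $A_4(m)$ reduces to a double sum over k and j of B(m, j). The coefficients are not recoverable from the extracted text.]
• …where:
[Equation: B(m, j), the same double integral of $r(m, \lambda, \tau_m)$ against the Erlang density $f(j, \cdot, \tau_c)$, expanded into a double sum over i and l of single integrals of products of Erlang densities.]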
32
Order of Presentation Coverage
1) Introduction
2) Algorithm 1 vs. Algorithm 2
3) Modeling the Algorithms (the math part)
4) Performance Evaluation of Algorithms 1 & 2
5) Conclusions / Comments
33
Algorithm 2: Checkpoint Freq. vs. Registration Freq.
• as registration frequency increases, so does checkpoint frequency
• this is what we’d expect
Algorithm 2’s Checkpoint Cost Improvement over Algorithm 1
• According to the graph, as registrations increase, Algorithm 2 further improves over Algorithm 1.
• ??? This is not what I would expect
– p. 189: “If registration activities are very frequent, then Alg. 2 behaves exactly the same as Alg. 1”
34
Algorithm 2: Probability of Obsolete Records After HLR Failure
• X axis is λ1/λ2, where λ1 represents intervals with few registrations and λ2 represents intervals with many registrations
• Thus, as we move right, registrations become less frequent
• Fewer registrations means less chance of obsolete records; thus, this cost decreases as we move right
Algorithm 2’s Obsolete Record Cost Improvement over Algorithm 1
• Shows that Alg. 2 has a 20-55% improvement over Alg. 1
• This makes sense, because Alg. 2 will checkpoint when a registration occurs if the checkpoint timer has expired… whereas Alg. 1 will have obsolete records in those situations.
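The direction of this improvement can be illustrated with a small Monte Carlo sketch. It is not the paper's experiment: it assumes exponentially distributed registration intervals (the paper uses mix-Erlang), so the numbers will differ from the plotted results. The function name, rates, and cycle count are illustrative. Each loop iteration is one checkpoint-to-checkpoint cycle, and the estimate is the long-run fraction of time during which the backup is stale.

```python
import random


def obsolete_fraction(algorithm, reg_rate, timer_rate, cycles, seed=0):
    """Estimate the fraction of time the backup record is obsolete.

    Assumes exponential registration intervals, so the time from a checkpoint
    to the next registration is exponential regardless of the checkpoint cause.
    """
    rng = random.Random(seed)
    total_time = 0.0
    obsolete_time = 0.0
    for _ in range(cycles):
        timer = rng.expovariate(timer_rate)      # checkpoint timer tp
        first_reg = rng.expovariate(reg_rate)    # time to the next registration
        # Algorithm 1 checkpoints at timer expiry; Algorithm 2 also waits
        # for a registration if none has happened yet.
        cycle = timer if algorithm == 1 else max(timer, first_reg)
        total_time += cycle
        if first_reg < timer:
            # Stale from the first registration until the timer-driven checkpoint.
            obsolete_time += timer - first_reg
    return obsolete_time / total_time


alpha_1 = obsolete_fraction(1, reg_rate=1.0, timer_rate=1.0, cycles=200_000)
alpha_2 = obsolete_fraction(2, reg_rate=1.0, timer_rate=1.0, cycles=200_000)
print(alpha_1, alpha_2)
```

Under these assumptions and equal rates, the estimates should land near 1/2 for Algorithm 1 and 1/3 for Algorithm 2.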
35
Order of Presentation Coverage
1) Introduction
2) Algorithm 1 vs. Algorithm 2
3) Modeling the Algorithms (the math part)
4) Performance Evaluation of Algorithms 1 & 2
5) Conclusions / Comments
36
Conclusion
• Per the analytical results, Algorithm 2 improves upon Algorithm 1 in the following ways:
– 50+% savings in checkpoint cost (E[tc])
– 20-55% improvement in terms of reducing occurrences of obsolete records (α)
• Note that this paper does NOT discuss SGSN / VLR failure and/or recovery
– all SGSN-based mobile user records are temporary and not backed up
– other papers discuss SGSN failure restoration (see paper’s references)
37
Comments
• This paper was heavy on the math, light on the explanations from step to step
– Granted, maybe IEEE gave the author a requirement to fit within 6 pages of their magazine? 2x or 3x as long would make it much easier to follow.
• Derek’s recommended prerequisites:
– know the difference between probability density functions and probability distribution functions
– know what a Laplace transform is
– refresh your memory on integrals and derivatives
• If, in fact, simulations were performed, include the details! He apparently omitted them on purpose. Maybe they’re included in his dissertation, thesis, etc…?
38
Thanks!
Any questions?