1 A paper by Yi-Bing Lin IEEE Transactions on Mobile Computing Vol. 4, No. 2, March/April ’05 Presented by Derek Pennington Per-User Checkpointing For

1

A paper by Yi-Bing Lin

IEEE Transactions on Mobile Computing

Vol. 4, No. 2, March/April ’05

Presented by Derek Pennington

Per-User Checkpointing For Mobility Database Failure Restoration

2

In GPRS & UMTS networks, the Home Location Register (HLR) maintains the central database of user information. For a given user, an HLR record might contain information such as…

• Mobile Station (MS) Information
  – telephone number
  – International Mobile Subscriber Identity

• Service Information
  – subscription info
  – service restrictions
  – supplementary services

• Location Information
  – address of Serving GPRS Support Node (SGSN)

3

But what happens in the event of an HLR failure?

• Luckily, we periodically back up all of this user data (each backup is called a “checkpoint”).

• However, the paper’s author argues that the established backup practices have room for improvement.

4

Various approaches to checkpointing:

• All-record checkpoint
  – back up all users at once (e.g., at midnight)
  – costly (bottleneck effect)

• Per-user checkpoint
  – each user has its own timing mechanism for backups

• The paper’s author discusses the existing per-user checkpointing algorithm (henceforth referred to as “Algorithm 1”), and then proposes a new, improved one (“Algorithm 2”)

5

Order of Presentation Coverage

1) Introduction

2) Algorithm 1 vs. Algorithm 2

3) Modeling the Algorithms (the math part)

4) Performance Evaluation of Algorithms 1 & 2

5) Conclusions / Comments

6

Algorithm 1

• checkpoints happen at random intervals (tc)

• a checkpoint may occur whether or not any user registrations have taken place

• In the event of an HLR failure, if the user updated the HLR database but that update was not backed up, the backup record is obsolete

• While the user’s record is obsolete, the user will lose calls until he performs a registration with the HLR.

7

Algorithm 2

• checkpoint timers are scheduled for random intervals (tp)

• However, checkpoints will only take place when BOTH of the following are true:
  – the tp timer has expired
  – a registration has taken place

• As in Algorithm 1, the user will lose calls if his record is obsolete

• In the state machine, a checkpoint occurs whenever we return to State 0
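The Algorithm 2 rule (checkpoint only once both the timer has expired and a registration has been seen) can be sketched as a small per-user state machine. This is a minimal illustration, not the paper's code: the class and method names are hypothetical, and the numbering of States 1 and 2 is an assumption, since the slide only names State 0.

```python
from enum import Enum


class State(Enum):
    IDLE = 0        # State 0: checkpoint just taken, tp timer running
    REGISTERED = 1  # assumed numbering: registration seen, timer still running
    TIMED_OUT = 2   # assumed numbering: timer expired, no registration yet


class UserCheckpointer:
    """Per-user bookkeeping for Algorithm 2 (illustrative sketch only)."""

    def __init__(self):
        self.state = State.IDLE
        self.checkpoints = 0

    def _checkpoint(self):
        # Back up the record and return to State 0; the caller restarts the timer.
        self.checkpoints += 1
        self.state = State.IDLE

    def on_registration(self):
        """Handle a registration; return True if it triggered a checkpoint."""
        if self.state is State.TIMED_OUT:
            self._checkpoint()
            return True
        self.state = State.REGISTERED
        return False

    def on_timeout(self):
        """Handle expiry of the tp timer; return True if it triggered a checkpoint."""
        if self.state is State.REGISTERED:
            self._checkpoint()
            return True
        if self.state is State.IDLE:
            self.state = State.TIMED_OUT
        return False
```

In this sketch a timeout with no prior registration only moves the user to the waiting state; the backup is deferred to the next registration, which is what saves checkpoints relative to Algorithm 1.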

8

RECAP

For each scenario, will the user’s record(s) be valid after the HLR recovers from its failure?

Scenario #   Algorithm 1   Algorithm 2
1            YES           YES
2            YES           YES
3            NO            YES
4            NO            NO

LEGEND (timeline symbols): CP timer fires, registration, failure

9

Introduction

Algorithm 1 vs. Algorithm 2

3) Modeling the Algorithms (the math part)




10

Two metrics are used to measure checkpoint algorithm performance:

• E[tc]: the expected checkpoint interval
  – the larger the interval, the less frequently checkpoints occur
  – essentially, checkpoint cost is proportional to checkpoint frequency

• $\alpha$: the probability that the user’s HLR record is obsolete after an HLR failure/recovery
  – the smaller $\alpha$ is, the better the checkpoint algorithm’s performance

11

Setting the checkpoint timer tp:

• typical approaches have a fixed tp

• However, this can lead to congestion with large numbers of users

• Thus, in Algorithms 1 & 2, tp is a random variable with exponential distribution

• Density function: $f_p(t_p) = \lambda e^{-\lambda t_p}$, where $\lambda$ is the number of checkpoints per unit time

• …and, in Algorithm 1, since tc = tp from checkpoint to checkpoint, the expected checkpoint interval is: $E_I[t_c] = E[t_p] = 1/\lambda$, the time between checkpoints

12

Finding $\alpha$ for Algorithm 1:

• $t_m - \tau_m$ = “residual time” of tm
• $\tau_m$ = “reverse residual time” of tm
• $t_p - \tau_p$ = residual time of tp
• $\tau_p$ = reverse residual time of tp

13

Finding $\alpha$ for Algorithm 1 (cont’d):

Consider random variable t:

• probability density function: f(t)

• probability distribution function: $F(t) = \int_{0}^{t} f(y)\,dy$

• expected value: E[t]

• Laplace transform: $f^*(s) = \int_{0}^{\infty} f(t)\,e^{-st}\,dt$

Let $\tau$ be the residual time of t:

• probability density function: $r(\tau) = \dfrac{1 - F(\tau)}{E[t]}$

• probability distribution function: $R(\tau) = \int_{0}^{\tau} r(y)\,dy$

• Laplace transform: $r^*(s) = \dfrac{1 - f^*(s)}{s\,E[t]}$
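As a quick sanity check of the residual-time formulas (a worked example that is not on the slides), assume t is exponential with rate $\lambda$; the symbol $\lambda$ is notation chosen here. The residual time then has the same density and the same Laplace transform as t itself, which is the memoryless property:

```latex
\begin{align*}
f(t) &= \lambda e^{-\lambda t}, \qquad F(t) = 1 - e^{-\lambda t}, \qquad E[t] = \frac{1}{\lambda}, \qquad f^*(s) = \frac{\lambda}{s + \lambda} \\
r(\tau) &= \frac{1 - F(\tau)}{E[t]} = \lambda e^{-\lambda \tau} = f(\tau) \\
r^*(s) &= \frac{1 - f^*(s)}{s\,E[t]} = \frac{\lambda}{s}\cdot\frac{s}{s + \lambda} = \frac{\lambda}{s + \lambda} = f^*(s)
\end{align*}
```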

14

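Finding $\alpha$ for Algorithm 1 (cont’d):

• Also, because tp is exponentially distributed, the density function of its reverse residual time is the same: $r_p(\tau_p) = f_p(\tau_p) = \lambda e^{-\lambda \tau_p}$

• In Algorithm 1, we say the backup record is obsolete if, at the moment of HLR failure, the time since the last checkpoint is greater than the time since the last registration

• In other words:

$\alpha_I = \Pr[\tau_c > \tau_m] = \int_{\tau_m=0}^{\infty} r_m(\tau_m) \int_{\tau_c=\tau_m}^{\infty} \lambda e^{-\lambda \tau_c}\,d\tau_c\,d\tau_m$ (integrals of the two density functions)

$\alpha_I = r_m^*(\lambda) = \dfrac{1 - f_m^*(\lambda)}{\lambda\,E[t_m]}$ (from $r^*(s) = \dfrac{1 - f^*(s)}{s\,E[t]}$ defined earlier)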
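The step from the double integral to the Laplace transform is short but easy to miss. A sketch of it, writing $\alpha_I$ for the Algorithm 1 obsolete-record probability and $\lambda$ for the checkpoint rate (both symbols are notation assumed here): the inner integral is the probability that an exponential timer outlasts $\tau_m$, and what remains is exactly the definition of a Laplace transform evaluated at $s = \lambda$.

```latex
\begin{align*}
\int_{\tau_c=\tau_m}^{\infty} \lambda e^{-\lambda \tau_c}\,d\tau_c &= e^{-\lambda \tau_m} \\
\alpha_I &= \int_{\tau_m=0}^{\infty} r_m(\tau_m)\,e^{-\lambda \tau_m}\,d\tau_m = r_m^*(\lambda)
\end{align*}
```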

15

Finding E[tc] and $\alpha$ for Algorithm 2:

• One difference between Algorithm 2 and Algorithm 1 is that the next checkpoint interval depends on how the previous checkpoint took place

  – If the previous checkpoint happened due to a timeout event, then the next checkpoint interval is: $t_c = \max(\tau_m^*, t_p)$, where $\tau_m^*$ is the time from that checkpoint to the next registration (a residual time of tm)

  – If the previous checkpoint happened due to a registration event, then the next checkpoint interval is: $t_c = \max(t_m, t_p)$

  – Thus, in our state machine example, we actually have two “State 0”s…

16

Checkpoint occurring due to timeout event

Checkpoint occurring due to registration

Probability that a timeout will occur after a timeout-caused checkpoint

Probability that a timeout will occur after a registration-caused checkpoint

Probability that a registration will occur after a timeout-caused checkpoint

Probability that a registration will occur after a registration-caused checkpoint

17

• The random variable tc is now a mixture, weighted by the probability that the last checkpoint happened due to a timeout and the probability that it happened due to a registration:

$t_c = p_1 \max(\tau_m^*, t_p) + p_2 \max(t_m, t_p)$

• …where:

$p_1 = \dfrac{\pi_{01}}{\pi_{01} + \pi_{02}}$ and $p_2 = \dfrac{\pi_{02}}{\pi_{01} + \pi_{02}} = 1 - p_1$

($\pi_x$ is the probability of being in State “x”)

18

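• Remember that $r_m^*(\lambda)$ comes from $\Pr[\tau_c > \tau_m]$

• Therefore, we can say:

After a registration-caused checkpoint:
$p_a = f_m^*(\lambda)$
$p_b = 1 - f_m^*(\lambda)$

After a timeout-caused checkpoint:
$p_c = r_m^*(\lambda)$
$p_d = 1 - r_m^*(\lambda)$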

19

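• Based on the figure, we can deduce some limiting probabilities:

$\pi_1 = p_c\,\pi_{01} + p_a\,\pi_{02}$
$\pi_2 = p_d\,\pi_{01} + p_b\,\pi_{02}$
$\pi_{01} = \pi_1$
$\pi_{02} = \pi_2$
$\pi_{01} + \pi_{02} + \pi_1 + \pi_2 = 1$

• …which means we know more about p1 and p2:

$p_1 = \dfrac{f_m^*(\lambda)}{1 + f_m^*(\lambda) - r_m^*(\lambda)}$ and $p_2 = \dfrac{1 - r_m^*(\lambda)}{1 + f_m^*(\lambda) - r_m^*(\lambda)}$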
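The closed forms for p1 and p2 can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes the cause of each checkpoint (timeout or registration) forms a two-state chain in which the next checkpoint is timeout-caused with probability r*_m(λ) after a timeout-caused checkpoint and f*_m(λ) after a registration-caused one.

```python
def cause_probabilities(f_star, r_star):
    """Return (p1, p2) from the closed forms, given f_m*(lambda) and r_m*(lambda)."""
    denom = 1.0 + f_star - r_star
    return f_star / denom, (1.0 - r_star) / denom


def cause_probabilities_by_iteration(f_star, r_star, steps=200):
    """Approximate (p1, p2) by iterating the assumed two-state chain."""
    p1, p2 = 0.5, 0.5
    for _ in range(steps):
        # The next checkpoint is timeout-caused when a registration beats the timer.
        p1, p2 = (p1 * r_star + p2 * f_star,
                  p1 * (1.0 - r_star) + p2 * (1.0 - f_star))
    return p1, p2
```

Both routes should agree for any transform values strictly between 0 and 1, which gives a cheap check on the algebra before the values are plugged into later formulas.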

20

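• From…

$t_c = p_1 \max(\tau_m^*, t_p) + p_2 \max(t_m, t_p)$

• …the density function for tc is:

$f_c(t_c) = p_1\,(\text{TERM1} + \text{TERM2}) + p_2\,(\text{TERM3} + \text{TERM4})$

• …where:

$\text{TERM1} = \lambda e^{-\lambda t_c} R_m(t_c)$
$\text{TERM2} = r_m(t_c) - r_m(t_c)\,e^{-\lambda t_c}$
$\text{TERM3} = \lambda e^{-\lambda t_c} F_m(t_c)$
$\text{TERM4} = f_m(t_c) - f_m(t_c)\,e^{-\lambda t_c}$

• …thus:

$f_c(t_c) = p_1\left[\lambda e^{-\lambda t_c} R_m(t_c) + r_m(t_c) - r_m(t_c)\,e^{-\lambda t_c}\right] + p_2\left[\lambda e^{-\lambda t_c} F_m(t_c) + f_m(t_c) - f_m(t_c)\,e^{-\lambda t_c}\right]$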

21

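• The relationships between tp, tm, and $\tau_m^*$ allow us to reinterpret fc(tc) into two different pieces:
  – fc1(tc): the situation where tp > tm
  – fc2(tc): the situation where tp < tm

• We can reexpress fc(tc) as: $f_c(t_c) = f_{c1}(t_c) + f_{c2}(t_c)$

• …where:

$f_{c1}(t_c) = p_1\,\text{TERM1} + p_2\,\text{TERM3} = \dfrac{f_m^*(\lambda)\,\lambda e^{-\lambda t_c} R_m(t_c)}{1 + f_m^*(\lambda) - r_m^*(\lambda)} + \dfrac{[1 - r_m^*(\lambda)]\,\lambda e^{-\lambda t_c} F_m(t_c)}{1 + f_m^*(\lambda) - r_m^*(\lambda)}$

$f_{c2}(t_c) = p_1\,\text{TERM2} + p_2\,\text{TERM4} = \dfrac{f_m^*(\lambda)\,[r_m(t_c) - r_m(t_c)\,e^{-\lambda t_c}]}{1 + f_m^*(\lambda) - r_m^*(\lambda)} + \dfrac{[1 - r_m^*(\lambda)]\,[f_m(t_c) - f_m(t_c)\,e^{-\lambda t_c}]}{1 + f_m^*(\lambda) - r_m^*(\lambda)}$

• …with the same four terms as before:

$\text{TERM1} = \lambda e^{-\lambda t_c} R_m(t_c)$
$\text{TERM2} = r_m(t_c) - r_m(t_c)\,e^{-\lambda t_c}$
$\text{TERM3} = \lambda e^{-\lambda t_c} F_m(t_c)$
$\text{TERM4} = f_m(t_c) - f_m(t_c)\,e^{-\lambda t_c}$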

22

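Expected checkpoint interval for Algorithm 2:

$E_{II}[t_c] = \int_{t_c=0}^{\infty} t_c\,f_c(t_c)\,dt_c$ (integral of the density function)

$E_{II}[t_c] = p_1 A_1 + p_2 A_2$

$E_{II}[t_c] = \dfrac{f_m^*(\lambda)\,A_1}{1 + f_m^*(\lambda) - r_m^*(\lambda)} + \dfrac{[1 - r_m^*(\lambda)]\,A_2}{1 + f_m^*(\lambda) - r_m^*(\lambda)}$ (plug in p1 and p2)

What are A1 and A2?......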

23

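$A_1 = \int_{0}^{\infty} t_c\,\lambda e^{-\lambda t_c} R_m(t_c)\,dt_c + \int_{0}^{\infty} t_c\,r_m(t_c)\,dt_c - \int_{0}^{\infty} t_c\,r_m(t_c)\,e^{-\lambda t_c}\,dt_c$

$A_1 = -\lambda \left.\dfrac{d}{ds}\left[\dfrac{r_m^*(s)}{s}\right]\right|_{s=\lambda} + E[\tau_m] + \left.\dfrac{d\,r_m^*(s)}{ds}\right|_{s=\lambda} = \dfrac{r_m^*(\lambda)}{\lambda} + E[\tau_m]$

$A_2 = \int_{0}^{\infty} t_c\,\lambda e^{-\lambda t_c} F_m(t_c)\,dt_c + \int_{0}^{\infty} t_c\,f_m(t_c)\,dt_c - \int_{0}^{\infty} t_c\,f_m(t_c)\,e^{-\lambda t_c}\,dt_c = \dfrac{f_m^*(\lambda)}{\lambda} + E[t_m]$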

24

So we can also express the expected checkpoint interval for Algorithm 2 as:

$E_{II}[t_c] = p_1\left(\dfrac{r_m^*(\lambda)}{\lambda} + E[\tau_m]\right) + p_2\left(\dfrac{f_m^*(\lambda)}{\lambda} + E[t_m]\right)$ (plug in A1 and A2)
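To see the expected-interval formula at work, the sketch below evaluates it for the simplest case and compares it against a direct simulation of Algorithm 2. Everything here is an illustrative assumption rather than the paper's evaluation: registration intervals are taken to be exponential with rate `mu` (so the residual time has the same distribution), the timer is exponential with rate `lam`, and the function names are hypothetical.

```python
import random


def expected_interval_alg2(lam, mu):
    """Closed-form E_II[t_c], assuming t_m is exponential with rate mu."""
    f_star = mu / (lam + mu)      # f_m*(lam) for an exponential t_m
    r_star = f_star               # memoryless: residual time has the same transform
    denom = 1.0 + f_star - r_star
    p1 = f_star / denom
    p2 = (1.0 - r_star) / denom
    a1 = r_star / lam + 1.0 / mu  # E[tau_m] = 1/mu for an exponential t_m
    a2 = f_star / lam + 1.0 / mu  # E[t_m] = 1/mu
    return p1 * a1 + p2 * a2


def simulate_alg2(lam, mu, cycles, seed=0):
    """Estimate the mean checkpoint interval of Algorithm 2 by simulation."""
    rng = random.Random(seed)
    now = 0.0
    next_reg = rng.expovariate(mu)
    total = 0.0
    for _ in range(cycles):
        timeout = now + rng.expovariate(lam)
        if next_reg < timeout:
            # Registration(s) come first: checkpoint when the timer fires.
            while next_reg < timeout:
                next_reg += rng.expovariate(mu)
            checkpoint = timeout
        else:
            # Timer fires first: checkpoint at the next registration.
            checkpoint = next_reg
            next_reg += rng.expovariate(mu)
        total += checkpoint - now
        now = checkpoint
    return total / cycles
```

With `lam = 1.0` and `mu = 2.0` the closed form evaluates to 7/6, compared with 1/λ = 1 for Algorithm 1 under the same timer, so Algorithm 2 checkpoints less often in this example.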

25

Now we need to find $\alpha_{II}$…

• To find the probability of getting an obsolete record, there is no closed-form expression when an arbitrary fm(tm) is used

• The paper uses a mix-Erlang density function: $f_m(t_m) = \sum_{i=1}^{j} q_i\,\dfrac{(\mu_i t_m)^{n_i - 1}}{(n_i - 1)!}\,\mu_i e^{-\mu_i t_m}$

  – proven to be a good approximation to other functions as well as measured data

  – …and, as a comparison, the regular Erlang density function: $f(n, \mu, t_m) = \dfrac{(\mu t_m)^{n-1}}{(n-1)!}\,\mu e^{-\mu t_m}$

26

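Now we need to find $\alpha_{II}$ (cont’d)

• Continuing, we have the Erlang distribution function: $F(n, \mu, t_m) = 1 - \sum_{j=0}^{n-1} \dfrac{(\mu t_m)^j}{j!}\,e^{-\mu t_m} = 1 - \dfrac{1}{\mu} \sum_{j=1}^{n} f(j, \mu, t_m)$

• …and the Laplace transform expressed as: $f^*(n, \mu, s) = \left(\dfrac{\mu}{s + \mu}\right)^{n}$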
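The Erlang building blocks are easy to get wrong by one index, so a small numerical check helps. This is a sketch with hypothetical function names, writing `mu` for the Erlang rate and `n` for the shape (notation assumed here); it implements the density, both forms of the distribution function, and the Laplace transform so they can be compared.

```python
import math


def erlang_pdf(n, mu, t):
    """Erlang density f(n, mu, t) with shape n and rate mu."""
    return (mu * t) ** (n - 1) / math.factorial(n - 1) * mu * math.exp(-mu * t)


def erlang_cdf(n, mu, t):
    """Erlang distribution function via the Poisson-sum form."""
    tail = sum((mu * t) ** j / math.factorial(j) for j in range(n))
    return 1.0 - tail * math.exp(-mu * t)


def erlang_cdf_from_pdfs(n, mu, t):
    """Same distribution function, written as a sum of Erlang densities."""
    return 1.0 - sum(erlang_pdf(j, mu, t) for j in range(1, n + 1)) / mu


def erlang_laplace(n, mu, s):
    """Laplace transform f*(n, mu, s) of the Erlang density."""
    return (mu / (s + mu)) ** n
```

The second form of the distribution function is the one that makes the later residual-time expressions collapse into sums of Erlang densities.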

27

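• The reverse residual time $\tau_m$ of tm has:
  – density function: $r(n, \mu, \tau_m)$
  – distribution function: $R(n, \mu, \tau_m)$
  – Laplace transform: $r^*(n, \mu, s)$

• And, since $E[t_m] = n/\mu$, we can say the following:

$r(n, \mu, \tau_m) = \dfrac{1}{n} \sum_{j=1}^{n} f(j, \mu, \tau_m)$

$R(n, \mu, \tau_m) = \dfrac{1}{n} \sum_{j=1}^{n} F(j, \mu, \tau_m)$

$r^*(n, \mu, s) = \dfrac{\mu}{n s}\left[1 - \left(\dfrac{\mu}{s + \mu}\right)^{n}\right]$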
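Combining the Erlang residual-time transform with the Algorithm 1 result (obsolete probability equals r*_m evaluated at the checkpoint rate) gives a closed form that can be checked by sampling. The sketch below is illustrative: function names are hypothetical, `lam` is the checkpoint rate and `mu` the Erlang rate (notation assumed here), and the sampler relies on the reverse residual density being an equal-weight mixture of Erlang(j, mu) densities for j = 1..n.

```python
import random


def alpha_alg1_erlang(n, mu, lam):
    """Obsolete-record probability for Algorithm 1: r*(n, mu, lam)."""
    return mu / (n * lam) * (1.0 - (mu / (lam + mu)) ** n)


def simulate_alpha_alg1(n, mu, lam, trials, seed=0):
    """Estimate Pr[tau_c > tau_m] by sampling both reverse residual times."""
    rng = random.Random(seed)
    obsolete = 0
    for _ in range(trials):
        # tau_m: pick j uniformly from 1..n, then draw an Erlang(j, mu) value.
        j = rng.randint(1, n)
        tau_m = sum(rng.expovariate(mu) for _ in range(j))
        # tau_c: exponential timer, so the time since the last checkpoint is Exp(lam).
        tau_c = rng.expovariate(lam)
        if tau_c > tau_m:
            obsolete += 1
    return obsolete / trials
```

For `n = 2`, `mu = 2.0`, `lam = 1.0` the closed form gives 5/9, and the simulated estimate should land close to that value.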

28

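• For Algorithm 2, consider the two scenarios:
  – A registration happens before the timeout
    • Checkpoint happens at the time of the timeout
  – A registration does NOT happen before the timeout
    • In this case, we wait until the next registration to checkpoint

• To derive $\alpha_{II}$, we only need to consider the first case…

$f_{c1}(t_c) = \dfrac{p_1}{n} \sum_{m=1}^{n} \lambda e^{-\lambda t_c} F(m, \mu, t_c) + p_2\,\lambda e^{-\lambda t_c} F(n, \mu, t_c) = \dfrac{p_1}{n} \sum_{m=1}^{n} g(m, \mu, t_c) + p_2\,g(n, \mu, t_c)$

• …where:

$g(i, \mu, t) = \lambda e^{-\lambda t} F(i, \mu, t) = \lambda e^{-\lambda t} - \sum_{k=1}^{i} \dfrac{\lambda \mu^{k-1}}{(\lambda + \mu)^{k}}\,f(k, \lambda + \mu, t)$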

29

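• Then, the density function for the reverse residual time corresponding to $g(i, \mu, t)$ is $h(i, \mu, \tau)$:

[Equation: $h(i, \mu, \tau)$, a double sum over $k = 1, \ldots, i$ and $j = 1, \ldots, k$ of Erlang densities; coefficients not recoverable from the extracted text]

• If we say that $\tau_c$ is the reverse residual time of tc, then the density function for $\tau_c$ is:

$r_c(\tau_c) = \dfrac{p_1}{n} \sum_{m=1}^{n} h(m, \mu, \tau_c) + p_2\,h(n, \mu, \tau_c)$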

30

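• Finally, we can derive $\alpha_{II}$:

$\alpha_{II} = \Pr[\tau_c > \tau_m] = \int_{\tau_m=0}^{\infty} r(n, \mu, \tau_m) \int_{\tau_c=\tau_m}^{\infty} r_c(\tau_c)\,d\tau_c\,d\tau_m$

$\alpha_{II} = \dfrac{p_1}{n} \sum_{m=1}^{n} \int_{\tau_m=0}^{\infty} \int_{\tau_c=\tau_m}^{\infty} r(n, \mu, \tau_m)\,h(m, \mu, \tau_c)\,d\tau_c\,d\tau_m + p_2 \int_{\tau_m=0}^{\infty} \int_{\tau_c=\tau_m}^{\infty} r(n, \mu, \tau_m)\,h(n, \mu, \tau_c)\,d\tau_c\,d\tau_m$

$\alpha_{II} = \dfrac{p_1}{n} \sum_{m=1}^{n} \left[A_3(m) - A_4(m)\right] + p_2 \left[A_3(n) - A_4(n)\right]$

• …where……… (next slide)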

31


• …where:

[Equations: definitions of $A_3(m)$ and $A_4(m)$ as double integrals over $\tau_m$ and $\tau_c$, with $A_4(m)$ expanded as a double sum of auxiliary integrals $B$; not recoverable from the extracted text]

32

Introduction


Modeling the Algorithms (the math part)




33

Algorithm 2: Checkpoint Freq. vs. Registration Freq.
• as registration frequency increases, so does checkpoint frequency
• this is what we’d expect

Algorithm 2’s Checkpoint Cost Improvement over Algorithm 1
• According to the graph, as registrations increase, Algorithm 2 further improves over Algorithm 1.
• ??? This is not what I would expect
  – p. 189: “If registration activities are very frequent, then Alg. 2 behaves exactly the same as Alg. 1”

34

Algorithm 2: Probability of Obsolete Records After HLR Failure
• X axis is $\mu_1/\mu_2$, where $\mu_1$ represents intervals with few registrations and $\mu_2$ represents intervals with many registrations
• Thus, as we move right, registrations become less frequent
• Fewer registrations means less chance of obsolete records; thus, this cost decreases as we move right

Algorithm 2’s Obsolete Record Cost Improvement over Algorithm 1
• Shows that Alg. 2 has a 20-55% improvement over Alg. 1
• This makes sense, because Alg. 2 will checkpoint when a registration occurs if the checkpoint timer has expired… whereas Alg. 1 will have obsolete records in those situations.

35

Introduction


Modeling the Algorithms (the math part)

Performance Evaluation of Algorithms 1 & 2



36

Conclusion

• Per the analytical results, Algorithm 2 improves upon Algorithm 1 in the following ways:
  – 50+% savings in checkpoint cost (E[tc])
  – 20-55% improvement in terms of reducing occurrences of obsolete records ($\alpha$)

• Note that this paper does NOT discuss SGSN / VLR failure and/or recovery
  – all SGSN-based mobile user records are temporary and not backed up
  – other papers discuss SGSN failure restoration (see paper’s references)

37

Comments

• This paper was heavy on the math, light on the explanations from step to step
  – Granted, maybe IEEE gave the author a requirement to fit within 6 pages of their journal? 2x or 3x as long would make it much easier to follow.

• Derek’s recommended prerequisites:
  – know the difference between probability density functions and probability distribution functions
  – know what a Laplace transform is
  – refresh your memory on integrals and derivatives

• If, in fact, simulations were performed, include the details! He apparently omitted them on purpose. Maybe they’re included in his dissertation, thesis, etc…?

38

Thanks!

Any questions?
