The DHCP Failover Protocol: A Formal Perspective
Rui Fan (MIT), Ralph Droms (Cisco Systems), Nancy Griffeth (CUNY), Nancy Lynch (MIT)



Fault Tolerant DHCP

The Dynamic Host Configuration Protocol (DHCP) is a widely deployed protocol for assigning IP addresses and other parameters to clients. DHCP is also important in wireless and mobile settings.

Current implementations use a single DHCP server and are therefore not fault tolerant.

The main challenge in using multiple servers is maintaining a consistent view of assigned addresses across servers, so that no address is allocated twice. Standard database techniques are too slow for this.

The DHCP Failover Protocol [DKS+’03] is a two-server DHCP algorithm that retains the client interface and performance of DHCP.


Our Contributions

We present an algorithm based on [DKS+’03], generalized to an arbitrary number of servers.

We rigorously specify the algorithm and its behavior using TIOA. This helps end users understand and use DHCP.

We decompose the DHCPF problem into independent subproblems. The subproblems can be solved separately, and their solutions composed to solve DHCPF. The decomposition helps us understand and prove the correctness of the algorithm, analyze the effects of network parameters on its performance, and optimize it.

This work demonstrates that a formal, theoretical approach can provide correct, simple and efficient solutions to complex, real-world problems.


Timed I/O Automaton

TIOA is a formal modeling framework for describing distributed systems. It is rigorous and structured, and supports composition, simulation, and other proof and design techniques.

A Timed I/O Automaton (TIOA) [KLSV’05] consists of: a set of states and a set of start states; discrete actions; state transitions, i.e., (state, action, state) triples; and continuous evolutions (trajectories), each a mapping from an interval $[0, t]$ to states.

Scheduling of actions is nondeterministic. An execution is an alternating sequence of trajectories and discrete actions.

Example: a mobile robot. Its state is its position, its discrete actions are changes of destination, and its trajectories are movements toward the destination.
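The robot example can be made concrete with a small simulation. This is a minimal sketch, not the TIOA formalism itself: the class `Robot`, the `speed` bound, and the `run` helper are hypothetical, and nondeterministic scheduling is replaced by an explicit, fixed execution (a list of trajectory and action steps).

```python
from dataclasses import dataclass


@dataclass
class Robot:
    """Toy robot in the spirit of a TIOA: the state is its position."""
    position: float = 0.0
    destination: float = 0.0
    speed: float = 1.0  # hypothetical speed bound (units per second)

    def set_destination(self, d: float) -> None:
        # Discrete action: an instantaneous change of destination.
        self.destination = d

    def trajectory(self, duration: float) -> None:
        # Continuous evolution over [0, duration]: move toward the
        # destination at `speed`, stopping once it is reached.
        gap = self.destination - self.position
        step = min(abs(gap), self.speed * duration)
        self.position += step if gap >= 0 else -step


def run(robot: Robot, execution) -> float:
    """Apply an execution given as ("traj", duration) and
    ("act", destination) steps; return the final position."""
    for kind, value in execution:
        if kind == "traj":
            robot.trajectory(value)
        elif kind == "act":
            robot.set_destination(value)
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return robot.position
```

For example, `run(Robot(), [("traj", 1.0), ("act", 5.0), ("traj", 2.0)])` idles for one second, changes destination to 5, then moves 2 units toward it.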


System Assumptions

Ideally, we want DHCPF to satisfy the following.
Safety property: no IP address is double allocated.
Liveness property: all client commands are executed quickly.

These properties depend on correct behavior of the network and the environment.

Clock assumption: clients and servers have clocks with bounded skew. Let $\epsilon$ be a constant. Then $|\text{clock}_i(t) - t| \le \epsilon$ for every client or server $i$ and every time $t$. Both safety and liveness depend on the clock assumption.


System Assumptions

Stability: let $\Delta$ be a parameter. A time interval $[t, t']$ is $\Delta$-stable if some server is alive throughout $[t-\Delta, t']$ and no server fails or recovers during $[t-\Delta, t']$.

Timeliness: a time interval $[t, t']$ is $\delta$-timely if any message sent during $[t, t'-\delta]$ is delivered within $\delta$ time.

The liveness property depends on having sufficiently long stable and timely time intervals.
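Both definitions can be checked mechanically against a recorded history. The sketch below assumes a hypothetical log format: server failures and recoveries as `(time, server, kind)` events, and messages as `(send time, delivery time or None)` pairs; `initially_alive` is the set of servers alive before the first logged event.

```python
from typing import List, Optional, Set, Tuple

Event = Tuple[float, str, str]            # (time, server, "fail" or "recover")
Message = Tuple[float, Optional[float]]   # (send time, delivery time or None)


def is_stable(t: float, t_end: float, Delta: float,
              initially_alive: Set[str], events: List[Event]) -> bool:
    """[t, t_end] is Delta-stable if no server fails or recovers during
    [t - Delta, t_end] and some server is alive throughout that window."""
    lo = t - Delta
    alive = set(initially_alive)
    for time, server, kind in sorted(events):
        if lo <= time <= t_end:
            return False                  # a failure or recovery inside the window
        if time < lo:                     # replay history before the window
            if kind == "fail":
                alive.discard(server)
            else:
                alive.add(server)
    # No changes inside the window, so the alive set is constant there.
    return bool(alive)


def is_timely(t: float, t_end: float, delta: float,
              messages: List[Message]) -> bool:
    """[t, t_end] is delta-timely if every message sent during
    [t, t_end - delta] is delivered within delta time."""
    for sent, delivered in messages:
        if t <= sent <= t_end - delta:
            if delivered is None or delivered - sent > delta:
                return False
    return True
```

Messages sent after $t' - \delta$ are unconstrained, which matches the definition: they may still be in flight when the interval ends.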


System Assumptions

A failure detector $\Phi$ tells servers which other servers are alive. We model it by $\text{recv}_{\Phi,j}(\text{dead}, j')$ and $\text{recv}_{\Phi,j}(\text{alive}, j')$ actions, where $j$ and $j'$ are servers. It can be implemented by heartbeats, a network administrator, etc.

Let $\rho$ be a parameter. $\Phi$ is $\rho$-perfect if it satisfies:
Accuracy: if $\text{recv}_{\Phi,*}(\text{dead}, j')$ occurs at time $t$, then $j'$ is dead at some time in $[t-\rho, t]$. Likewise for $\text{recv}_{\Phi,*}(\text{alive}, j')$.
Timeliness: for every $j'$, every server $j$ gets a $\text{recv}_{\Phi,j}(\text{dead}, j')$ or $\text{recv}_{\Phi,j}(\text{alive}, j')$ message every $\rho$ seconds.

Failure detectors are used in many distributed algorithms, and are sometimes provably necessary.

Safety depends on a failure detector.
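Since the slides mention heartbeats as one implementation, here is a minimal sketch of a heartbeat-based detector at one server. The class name, the `timeout` parameter, and the use of local receive times are assumptions for illustration; this code alone does not guarantee $\rho$-perfection, which additionally requires bounded message delay, a bounded heartbeat period, and bounded clock skew relative to `timeout`.

```python
from typing import Dict


class HeartbeatDetector:
    """Hypothetical heartbeat failure detector running at one server j."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}  # server -> local time of last heartbeat

    def on_heartbeat(self, server: str, now: float) -> None:
        # Record the local time at which a heartbeat from `server` arrived.
        self.last_seen[server] = now

    def report(self, server: str, now: float) -> str:
        # Analogue of recv(alive, j') / recv(dead, j'): a server is reported
        # alive only if it was heard from within the last `timeout` seconds.
        last = self.last_seen.get(server)
        if last is not None and now - last <= self.timeout:
            return "alive"
        return "dead"
```

Calling `report` for every other server once per period would provide the periodic reports required by the Timeliness condition.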


A Formal Spec of DHCPF

The DHCP client interface and message exchange sequence are as follows, where $\iota$ is an interaction identifier. A client is correct if it executes this message sequence.

We say client $i$ owns an IP address $\alpha$ at time $t$ if $\text{send}_{*,i}(\text{ack}, *, \alpha, \ell)$ occurs before $t$, and $t - \epsilon \le \ell$. This takes into account the clock skew of the client. If $i$ does not own $\alpha$ at $t$, then $i$ is definitely not using $\alpha$ at $t$. This assumes correct clients.

[Figure: client-server message sequence: bcast(discover), send(offer), bcast(request), send(ack); later bcast(renew), send(ack).]
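The ownership definition translates directly into a predicate over a log of acks. This sketch assumes a hypothetical log format in which each ack records its send time, the client, the address, and the lease expiry $\ell$; `eps` plays the role of $\epsilon$.

```python
from typing import List, Tuple

Ack = Tuple[float, str, str, float]  # (send time, client, address, lease expiry)


def owns(acks: List[Ack], client: str, alpha: str, t: float, eps: float) -> bool:
    """Client owns alpha at t if some ack for alpha was sent to it before t
    and t - eps <= the lease expiry carried in that ack."""
    return any(c == client and a == alpha and sent < t and t - eps <= lease
               for sent, c, a, lease in acks)
```

The $\epsilon$ term extends ownership past the nominal expiry: a client whose clock runs slow may keep using $\alpha$ for up to $\epsilon$ real time after $\ell$.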


A Formal Spec of DHCPF

Assume a $\rho$-perfect failure detector and a bound $\epsilon$ on clock skew.

Safety: for every IP address $\alpha$ and at all times $t$, at most one client owns $\alpha$ at $t$.
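Using the ownership definition, safety can be checked over a log of acks: each ack gives its client ownership of $\alpha$ on the interval from the ack until $\ell + \epsilon$, and no two such intervals for different clients may overlap. This is a sketch over the same hypothetical log format as above; renewals by the same client are allowed to overlap.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, Optional, Tuple

Ack = Tuple[float, str, str, float]  # (send time, client, address, lease expiry)


def violates_safety(acks: List[Ack], eps: float) -> Optional[str]:
    """Return an address owned by two distinct clients at a common time,
    or None if the log satisfies the safety property."""
    by_addr = defaultdict(list)
    for sent, client, alpha, lease in acks:
        # The client owns alpha on the half-open interval (sent, lease + eps].
        by_addr[alpha].append((sent, lease + eps, client))
    for alpha, intervals in by_addr.items():
        for (s1, e1, c1), (s2, e2, c2) in combinations(intervals, 2):
            # (s1, e1] and (s2, e2] intersect iff max(s1, s2) < min(e1, e2).
            if c1 != c2 and max(s1, s2) < min(e1, e2):
                return alpha
    return None
```

A new client can therefore be acked for $\alpha$ only after the previous lease plus the skew margin $\epsilon$ has passed.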

Request liveness: suppose time $t$ is $(4\rho + 4\epsilon)$-stable and $\delta$-timely, and client $i$ does $\text{bcast}(\text{discover}, \iota)$ at time $t$. Assume client $i$ is correct and does not fail during $[t, t+4\delta]$. Then:
By time $t+\delta$, every live server receives $i$'s message.
By time $t+2\delta$, either $\text{send}(\text{offer}, \iota, \alpha)$ occurs for some $\alpha$, or for every $\alpha$, either $\alpha$ was offered to some client but not requested, or there is a lease for $\alpha$ that has not expired.
If $\text{send}(\text{offer}, \iota, \alpha)$ occurs, then $\text{send}(\text{ack}, \iota, *, *)$ occurs by time $t+4\delta$.


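A Formal Spec of DHCPF

Renew liveness: suppose time $t$ is $(4\rho + 4\epsilon)$-stable and $\delta$-timely, and client $i$ has a lease for $\alpha$ that is valid until at least time $t+\delta+\epsilon$. Then if $i$ broadcasts renew for $\alpha$ at $t$, $i$ receives an ack for $\alpha$ by time $t+2\delta$.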


DHCPF Algorithm Overview

We break the DHCPF problem into two independent subproblems, Lease and Elect.

Elect: for any IP address $\alpha$, elect a leader server for $\alpha$. Only the leader can lease $\alpha$ to clients. There is at most one leader for $\alpha$ at any time. The leader can change as servers fail and recover.

Lease: the leader gives out leases for $\alpha$. Lease ensures that clients can always request or renew leases for $\alpha$, and that there is no double allocation even if the leader changes.

Lease and Elect run continuously and in parallel. The DHCPF algorithm is the formal composition $\text{Elect} \times \text{Lease}$.


The Elect Algorithm

For any IP address $\alpha$, Elect ensures:
Safety: there is at most one leader server for $\alpha$ at any time.
Liveness: if the execution is currently “nice”, then a leader exists.

The code is for server $j$, with the following variables:
clock: the current clock value at $j$.
live: the set of servers $j$ thinks are alive.
my-addrs: the set of IP addresses $j$ thinks it is leader for.
lead-time[$\alpha$]: the time when $j$ became leader for $\alpha$.
rec-time: the time when $j$ last recovered.


The Elect Algorithm

The basic idea is that the minimum live server should be the leader for each $\alpha$. In fact, a different ordering of servers, and hence a different minimum, can be used for each $\alpha$, for load balancing.
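One way to get a different "minimum" per address is to rank servers by a hash of the (address, server) pair, in the style of rendezvous hashing. This is a swapped-in technique for illustration, not necessarily what the authors use; `rank` and `designated_leader` are hypothetical names.

```python
import hashlib
from typing import Set


def rank(server: str, alpha: str) -> int:
    # Address-specific pseudo-random rank of a server.
    digest = hashlib.sha256(f"{alpha}|{server}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def designated_leader(live: Set[str], alpha: str) -> str:
    # The "min" live server for alpha under the address-specific order;
    # the server name breaks (astronomically unlikely) rank ties.
    return min(live, key=lambda s: (rank(s, alpha), s))
```

A useful property of this choice is that removing a server other than the current minimum never changes the leader for $\alpha$, so only addresses led by a failed server move.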

If $j$ hears that $j'$ is alive: add $j'$ to live. For each $\alpha$, if $j$ is no longer the minimum for $\alpha$, give up leadership of $\alpha$.

If $j$ hears that $j'$ is dead: remove $j'$ from live. For each $\alpha$, if $j$ has become the minimum for $\alpha$ and enough time has passed since its last recovery, become leader for $\alpha$. The time to wait depends on the quality $\rho$ of the failure detector and on the clock skew $\epsilon$.

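[Figure: leadership transitions for $\alpha$ at $j$: become leader when $j$ is the minimum and enough time has passed; give up leadership when $j$ is no longer the minimum.]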
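The two handlers above can be sketched as follows. This is a hypothetical rendering, not the authors' TIOA code: `wait` is an assumed parameter standing in for the post-recovery delay (which the slides say depends on $\rho$ and $\epsilon$), server names are compared as strings, and the plain minimum is used for every address.

```python
from typing import Dict, Iterable, Set


class ElectServer:
    """Hypothetical sketch of Elect at server j."""

    def __init__(self, j: str, addrs: Iterable[str], wait: float, clock: float = 0.0):
        self.j = j
        self.addrs = list(addrs)
        self.wait = wait          # assumed delay after recovery before leading
        self.recover(clock)

    def recover(self, clock: float) -> None:
        # After (re)starting, j knows only itself and leads nothing.
        self.live: Set[str] = {self.j}
        self.my_addrs: Set[str] = set()
        self.lead_time: Dict[str, float] = {}
        self.rec_time = clock

    def is_min(self, alpha: str) -> bool:
        # Plain minimum; a per-address order could be plugged in here.
        return min(self.live) == self.j

    def on_alive(self, other: str, clock: float) -> None:
        self.live.add(other)
        for alpha in list(self.my_addrs):
            if not self.is_min(alpha):
                self.my_addrs.discard(alpha)     # give up leadership of alpha
                self.lead_time.pop(alpha, None)

    def on_dead(self, other: str, clock: float) -> None:
        if other == self.j:
            return                               # ignore reports about itself
        self.live.discard(other)
        for alpha in self.addrs:
            if (alpha not in self.my_addrs and self.is_min(alpha)
                    and clock - self.rec_time >= self.wait):
                self.my_addrs.add(alpha)         # become leader for alpha
                self.lead_time[alpha] = clock
```

Because a $\rho$-perfect detector reports on every server every $\rho$ seconds, a server that was too recently recovered at one dead report gets another chance at a later report.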


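Elect Properties

Assume $\Phi$ is $\rho$-perfect and clock skew is at most $\epsilon$.

Theorem (Safety): at any time, for any address $\alpha$, there is at most one server $j$ with $\alpha \in \text{my-addrs}_j$.

Proof: [Figure: proof sketch with timelines for servers s1 and s2 up to time t, showing s1's dead/alive transitions, the labels "s1 is alive from this point on" and "s2 sees s1, won't become leader", and the excluded scenario "s1, s2 both leaders for $\alpha$".]

Theorem (Liveness): if the current state is $(4\rho + 4\epsilon)$-stable, then for every address $\alpha$, we have $\alpha \in \text{my-addrs}_{\min L}$, where $L$ is the set of currently live servers.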


The Lease Algorithm

To avoid double allocation, the leader should tell the other servers about its leases, in case it fails. However, waiting for acks from the other servers is too slow.

Instead, the leader first gives the client a temporary Maximum Client Lead Time (MCLT) lease, so the client gets a shorter lease than it asked for. While the client is using the MCLT lease, the leader negotiates an acknowledged lease with the other servers. When the client renews, it gets the lease it asked for last time.

In this example, suppose MCLT = 3.

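[Figure: timeline with a client and servers s1 and s2, times 1 to 5 and 10 marked. req(10) is answered with ok(4) while lease(10) is sent to the other server and answered with ack(10); renew(15) is answered with ok(10) while lease(15) is sent and answered with ack(15); renew(20) is answered with ok(15) while lease(20) is sent.]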
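The numbers in this example are reproduced by a simple granting rule: give the client what it asked for, capped at the larger of the lease already acknowledged by all servers and the current time plus MCLT. This rule is our reading of the example, not a quote of the algorithm; `LeaseLeader`, `grant` and `on_all_acked` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LeaseLeader:
    mclt: float
    # alpha -> lease value acknowledged by every other server
    acklease: Dict[str, float] = field(default_factory=dict)

    def grant(self, alpha: str, requested: float, now: float) -> float:
        # The client gets at most MCLT beyond now, unless a longer lease
        # has already been acknowledged by the other servers.
        return min(requested, max(self.acklease.get(alpha, 0.0), now + self.mclt))

    def on_all_acked(self, alpha: str, lease: float) -> None:
        # Every other server has recorded `lease` as a potential lease.
        self.acklease[alpha] = max(self.acklease.get(alpha, 0.0), lease)
```

With MCLT = 3: `grant("a", 10.0, 1.0)` returns 4.0 (the ok(4)); after `on_all_acked("a", 10.0)`, a renew for 15 at time 3 returns 10.0 (the ok(10)), matching the figure.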


The Lease Algorithm

When a new leader takes over, it waits MCLT time, and also until its maximum acknowledged lease expires. This gives an upper bound on the maximum potential lease that the previous leader might have given out.

The leader only gives out a new lease for $\alpha$ when all potential leases have expired.

This is the main idea of [DKS+’03].

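[Figure: timeline for servers s1 and s2, times 1 to 5 marked. req(10) is answered with ok(4) while lease(10) is sent to the other server and answered with ack(10); a later req(8) receives nok.]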
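The takeover rule can be sketched as two small functions. The bound mirrors the update $\max(\text{potlease}[\alpha]_j, \text{clock}_j + \text{MCLT} + 2\epsilon)$ used in the safety proof; the function names and the scenario below (a takeover at clock 2 with MCLT = 3 after lease(10) was acknowledged) are hypothetical illustrations.

```python
def takeover_bound(potlease: float, clock: float, mclt: float, eps: float) -> float:
    """Bound on any lease the previous leader might have granted for alpha:
    its acknowledged potential lease, or an MCLT lease granted just before
    the takeover, widened by the clock-skew margin 2 * eps."""
    return max(potlease, clock + mclt + 2 * eps)


def answer_request(free_at: float, now: float) -> str:
    # Lease alpha to a new client only once every potential lease has expired.
    return "ok" if now >= free_at else "nok"
```

In the figure's spirit: `takeover_bound(10.0, 2.0, 3.0, 0.0)` is 10.0, so a request arriving at time 4 is refused with nok.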


The Lease Algorithm

Variables at server $j$:
potlease[$\alpha$]: the maximum potential lease given out for $\alpha$.
reserved: the set of addresses offered but not requested.
acklease[$\alpha$]: the lease value that $j$ will give for $\alpha$.
$\iota$: an interaction identifier.
write-acks[$\iota$]: the set of servers acknowledging interaction instance $\iota$.

[Code annotations: MCLT lease; negotiate acknowledged lease; give the ack'ed lease; every server increased potlease, so $j$ can increase acklease; wait for max of MCLT and potlease; check $\alpha$ is available.]


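Safety of Elect × Lease

Theorem: $\text{Elect} \times \text{Lease}$ satisfies the safety property of the DHCPF specification.

Proof: a sequence of invariants, proved by induction on the execution. We prove that servers have a good estimate of the maximum lease given out for $\alpha$.

Lemma: for all $j, j'$, if $j \in \text{write-acks}[\iota]_{j'}$, then $\text{potlease}[\alpha]_j$ is at least the lease value written in interaction $\iota$.

Lemma: for all $j, j'$, $\max(\text{potlease}[\alpha]_j,\ \text{clock}_j + \text{MCLT} + 2\epsilon) \ge \text{acklease}[\alpha]_{j'}$.
This is the key invariant of [DKS+’03]. In the inductive step, we only need to consider actions that increase $\text{acklease}[\alpha]_{j'}$.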


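Safety of Elect × Lease

Lemma: let $j^*$ be the leader for $\alpha$. Then $\text{potlease}[\alpha]_{j^*} \ge \text{acklease}[\alpha]_j$ for all $j$.
If the inductive step does not change the leader, we show this using the fact that there is at most one leader for $\alpha$.
If the leader changes, then the new leader $j$ sets $\text{potlease}[\alpha]_j \leftarrow \max(\text{potlease}[\alpha]_j,\ \text{clock}_j + \text{MCLT} + 2\epsilon)$.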
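The leader-change case is a one-step combination of the previous lemma with the takeover update; written out for a new leader $j$ (a sketch of the reasoning, using the notation of the lemmas):

```latex
\begin{align*}
\text{(previous lemma)}\quad & \max\bigl(\mathit{potlease}[\alpha]_j,\ \mathit{clock}_j + \mathit{MCLT} + 2\epsilon\bigr) \ \ge\ \mathit{acklease}[\alpha]_{j'} \quad \text{for all } j' \\
\text{(takeover update)}\quad & \mathit{potlease}[\alpha]_j \leftarrow \max\bigl(\mathit{potlease}[\alpha]_j,\ \mathit{clock}_j + \mathit{MCLT} + 2\epsilon\bigr) \\
\Rightarrow\quad & \mathit{potlease}[\alpha]_j \ \ge\ \mathit{acklease}[\alpha]_{j'} \quad \text{for all } j' \text{, right after } j \text{ becomes leader.}
\end{align*}
```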

Since the leader always knows the maximum lease for $\alpha$, it avoids double allocation during request or renew.


Liveness of Elect × Lease

Hard to state: we need to identify all situations that prevent progress.
Easy to prove: when nothing bad happens, something good happens.

Theorem: $\text{Elect} \times \text{Lease}$ satisfies the request and renew liveness properties of the DHCPF specification.

Proof (request liveness): suppose client $i$ broadcasts discover at time $t$. By time $t+\delta$, every live server gets $i$'s message. Since $t$ is $(4\rho + 4\epsilon)$-stable and $\delta$-timely, every $\alpha$ has a leader. Server $j$ offers $i$ no address only if, for every $\alpha$ that $j$ is leader for, $\alpha$ has been reserved by another client or the lease for $\alpha$ has not expired. If $i$ is offered some addresses $\alpha$, then no other client is offered those addresses, so within $2\delta$ time, $i$ gets an ack for $\alpha$. The proof of renew liveness is similar.


Conclusions

We formally specified and implemented a fault tolerant DHCP algorithm using TIOA.

The algorithm is simple and is based on a decomposition into independent subproblems.

Open questions: Is our decomposition “good”? Does DHCPF need a perfect failure detector? Is the dependence on clock skew and message delay the best possible? Is “goodness” merely a “human”, case-by-case concept, or a more universal one? Perhaps this is not totally far-fetched: Church and Turing formalized computation, and Cook and Levin formalized completeness…


Thank you!