
The SMART Way to Migrate Replicated Stateful Services

Jacob R. Lorch, Atul Adya, Bill Bolosky, Ronnie Chaiken, John Douceur, Jon Howell

Microsoft Research

First EuroSys Conference, 19 April 2006


Replicated stateful services

[Figure: service replicas A, B, and C, coordinated by Paxos]

• Problem: Machine failure leads to unavailability
  – Solution: Replicate the service for fault tolerance
• Problem: Replica state can become inconsistent
  – Solution: Use replicated state machine approach


Migrating replicated services

• Migration: Changing the configuration, i.e., the set of machines running replicas
• Uses of migration
  – Replace failed machines for long-term fault tolerance
  – Load balancing
  – Increasing or decreasing number of replicas

[Figure: machines A, B, C, D, and E]


Limitations of current approaches

• Cannot remove non-failed machines without creating window of vulnerability
  – Can only remove known-failed machines
  – Cannot use migration for load balancing
• Cannot process requests in parallel

Limitations of current approaches addressed by SMART

• Can remove non-failed machines
  – Enables autonomic migration, i.e., migration without human involvement
  – Enables load balancing
• Can do concurrent request processing
• Can perform arbitrary migrations, even ones replacing entire configuration
• Completely described in our paper


Outline

• Introduction
• Background on Paxos
• Limitations of existing approaches
• SMART: Service Migration And Replication Technique
  – Configuration-specific replicas
  – Shared execution modules
• Implementation and evaluation
• Conclusions


Background on Paxos


Background: Paxos overview

• Goal: Every service replica runs the same sequence of requests
  – Deterministic service ensures state changes and replies are consistent
• Approach: Paxos assigns requests to virtual “slots”
  – No two replicas assign different requests to same slot
  – Each replica executes requests in slot order

[Figure: the Paxos protocol assigns requests to numbered slots (1, 2, 3, 4, 5, 6, …); replicas A, B, and C each hold the same sequence of slots]
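The slot rule above can be illustrated with a minimal sketch. The `Replica` class, its method names, and the sample requests are hypothetical and not taken from the talk; the sketch only shows that replicas which learn the same slot assignments end up executing the same sequence, whatever order the decisions arrive in.

```python
class Replica:
    """Hypothetical replica that executes decided requests strictly in slot order."""

    def __init__(self):
        self.decided = {}    # slot number -> request decided for that slot
        self.next_slot = 1   # lowest slot that has not been executed yet
        self.executed = []   # requests in the order they were executed

    def learn_decision(self, slot, request):
        # No two replicas may assign different requests to the same slot.
        if slot in self.decided and self.decided[slot] != request:
            raise ValueError(f"conflicting decision for slot {slot}")
        self.decided[slot] = request
        # Execute every request that is now contiguous with the executed prefix.
        while self.next_slot in self.decided:
            self.executed.append(self.decided[self.next_slot])
            self.next_slot += 1


decisions = [(1, "put x=1"), (2, "put y=2"), (3, "get x")]
a, b = Replica(), Replica()
for slot, request in decisions:            # replica a learns decisions in order
    a.learn_decision(slot, request)
for slot, request in reversed(decisions):  # replica b learns them out of order
    b.learn_decision(slot, request)
```

Replica `b` buffers slots 3 and 2 until slot 1 arrives, then executes all three, so both replicas finish with the same execution order.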


Background: Paxos protocol

• One replica is the leader
• Clients send requests to the leader
• Leader proposes a request by sending PROPOSE message to all replicas
• Each replica logs it and sends a LOGGED message to the leader
• When leader receives LOGGED messages from a majority, it decides it and sends a DECIDED message

[Figure: client Z and server replicas A, B, and C exchanging Req, PROPOSE, LOGGED, and DECIDED messages]
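The leader's side of this exchange might be sketched as follows. The `Leader` class and its method names are assumptions made for illustration; networking, logging to disk, and ballot handling are all left out, and only the majority count that turns a proposal into a decision is shown.

```python
class Leader:
    """Hypothetical leader that counts LOGGED messages per slot."""

    def __init__(self, replicas):
        self.replicas = set(replicas)  # all replicas, including the leader itself
        self.logged = {}               # slot -> replicas that sent LOGGED for it
        self.decided = set()           # slots already decided

    def majority(self):
        return len(self.replicas) // 2 + 1

    def on_logged(self, slot, sender):
        """Record a LOGGED message; return True only when it decides the slot."""
        if sender not in self.replicas or slot in self.decided:
            return False
        self.logged.setdefault(slot, set()).add(sender)
        if len(self.logged[slot]) >= self.majority():
            self.decided.add(slot)
            return True  # at this point the leader would send DECIDED
        return False


leader = Leader({"A", "B", "C"})
first = leader.on_logged(1, "A")      # one of three: not yet a majority
duplicate = leader.on_logged(1, "A")  # same sender again: still one of three
second = leader.on_logged(1, "B")     # two of three: slot 1 is decided
late = leader.on_logged(1, "C")       # arrives after the decision
```

Using a set per slot means a retransmitted LOGGED message from the same replica is not counted twice.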


Background: Paxos leader change

• If leader fails, another replica “elects” itself
• New leader must poll replicas and hear replies from a majority
  – Ensures it learns enough about previous leaders’ actions to avoid conflicting proposals

[Figure: replicas A, B, and C exchanging Poll and Reply messages]
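Why a majority of replies is enough can be checked by brute force: any two majorities of the same replica set share at least one member, so the replicas polled by a new leader always overlap the replicas that logged an earlier proposal. The helper names below are hypothetical, and the check is a small illustration rather than part of the protocol.

```python
from itertools import combinations


def majorities(replicas):
    """Return every subset of replicas that contains more than half of them."""
    members = sorted(replicas)
    need = len(members) // 2 + 1
    return [
        set(subset)
        for size in range(need, len(members) + 1)
        for subset in combinations(members, size)
    ]


def every_pair_intersects(replicas):
    """True if any two majorities have at least one replica in common."""
    quorums = majorities(replicas)
    return all(q1 & q2 for q1 in quorums for q2 in quorums)


three = every_pair_intersects({"A", "B", "C"})
five = every_pair_intersects({"A", "B", "C", "D", "E"})
```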


Background: Paxos migration

• Service state includes current configuration
  – Request that changes that part of the state migrates the service
• Configuration after request n responsible for requests n+α and beyond

[Figure: timeline of slots 79 to 85 on machines A, B, C, and D; the request in slot 80 changes the configuration in the service state from A, B, C to A, B, D, and α marks the delay before the new configuration takes over]
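The n+α rule can be written as a small lookup. The function name, the `changes` mapping, and the value α=3 are assumptions chosen for illustration; only the slot-80 change from A, B, C to A, B, D comes from the slide.

```python
def responsible_config(slot, initial, changes, alpha):
    """Return the configuration responsible for `slot`.

    `changes` maps the slot n of each configuration-changing request to the
    configuration it installs; that configuration takes over at slot n + alpha.
    """
    config = initial
    for n in sorted(changes):
        if n + alpha <= slot:
            config = changes[n]
    return config


old = ("A", "B", "C")
new = ("A", "B", "D")
changes = {80: new}  # the request in slot 80 changes the configuration

still_old = responsible_config(82, old, changes, alpha=3)  # 82 < 80 + 3
now_new = responsible_config(83, old, changes, alpha=3)    # 83 = 80 + 3
```

With α=1 the same call returns the new configuration as early as slot 81.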


Rationale for α

• With α=1, slot n can change the configuration responsible for slot n+1
• Leader can’t propose slot n+1 until n is decided
  – Doesn’t know who to make proposal to, let alone whether it can make proposal at all
• Prevents pipelining of requests
  – Request may wait a network round trip and a disk write

[Figure: client Z sends two requests (Req, Req) to server replicas A, B, and C, which exchange PROPOSE and LOGGED messages]
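The effect of α on pipelining can be made concrete with a little arithmetic. Since the configuration for slot s is fixed by the state after request s-α, a leader that knows the decisions for slots 1 through k knows the configuration for slots up to k+α. The function below is a hypothetical illustration of that window, not code from LibSMART.

```python
def proposable_slots(last_decided, alpha):
    """Slots whose configuration is known once slots 1..last_decided are decided."""
    return list(range(last_decided + 1, last_decided + alpha + 1))


no_pipelining = proposable_slots(79, alpha=1)  # only the very next slot
pipelined = proposable_slots(79, alpha=3)      # three proposals may be in flight
```

With α=1 the window holds a single slot, so each request waits for the previous one to be decided before it can even be proposed.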


Limitations of existing approaches


No request pipelining

• Leader change is complicated
  – How to ensure that new leader knows the right configuration to poll?
  – How to handle some outstanding proposals being from one configuration and some from another?
  – Other problems
• To avoid this complexity, current approaches use α=1
• But, this prevents request pipelining


Window of vulnerability

• Removing a machine creates window of vulnerability
  – Effectively, it induces a failure of the removed replica
  – Consequently, service can become permanently unavailable even if less than half the machines fail
• Considered acceptable since machines only removed when known to a human to have permanently failed
• Not suitable for autonomic migration using imperfect failure detectors, or for load balancing

[Figure: machines A, B, C, and D exchanging PROPOSE, LOGGED, DECIDED, and Poll messages]
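One way to read the window of vulnerability is as a quorum count. The scenario below is an assumed example, not one stated on the slide: the configuration changes from A, B, C to A, B, D, the removed machine C stops participating although it is healthy, and the new machine D does not yet have the service state. The `has_quorum` helper is hypothetical.

```python
def has_quorum(config, usable):
    """True if a majority of `config` is up and holds the service state."""
    return len(set(config) & set(usable)) >= len(config) // 2 + 1


new_config = {"A", "B", "D"}

# Right after the change: C no longer counts, D has no state yet.
usable_after_migration = {"A", "B"}
before_failure = has_quorum(new_config, usable_after_migration)

# A single real failure (machine A) now leaves only B usable.
usable_after_a_fails = {"B"}
after_failure = has_quorum(new_config, usable_after_a_fails)
```

In this reading only one machine has actually failed, yet the new configuration cannot assemble a majority, because removing C behaved like a second failure.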


SMART


Configuration-specific replicas

• Each configuration has its own set of replicas and its own separate instance of Paxos
• Simplifies leader change so we can pipeline requests
  – Election always happens in a static configuration
• No window of vulnerability because a replica can remain alive until next configuration is established

[Figure: machines A, B, and C host Replicas 1A, 1B, and 1C of configuration 1; machines A, B, and D host Replicas 2A, 2B, and 2D of configuration 2]
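A machine that hosts one replica per configuration could be modelled as a table keyed by configuration number. The `Machine` class and its methods are hypothetical; each inbox below merely stands in for that configuration's separate Paxos instance.

```python
class Machine:
    """Hypothetical machine hosting one replica per configuration it belongs to."""

    def __init__(self, name):
        self.name = name
        self.replicas = {}  # configuration id -> inbox of that replica's Paxos instance

    def start_replica(self, config_id):
        self.replicas[config_id] = []

    def stop_replica(self, config_id):
        self.replicas.pop(config_id, None)

    def deliver(self, config_id, message):
        """Route a message to the replica of the given configuration, if any."""
        inbox = self.replicas.get(config_id)
        if inbox is None:
            return False
        inbox.append(message)
        return True


machine_a = Machine("A")
machine_a.start_replica(1)                  # Replica 1A
machine_a.start_replica(2)                  # Replica 2A runs alongside Replica 1A
to_old = machine_a.deliver(1, "PROPOSE")    # old configuration still reachable
machine_a.stop_replica(1)                   # only after configuration 2 is established
to_dead = machine_a.deliver(1, "PROPOSE")   # no replica left for configuration 1
```

Because messages are routed by configuration, an election in configuration 1 never has to reason about the membership of configuration 2.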


SMART migration protocol

• After creating new configuration, send JOIN msgs
• After executing request n+α-1, send FINISHED msgs
  – Tells new replicas where they can get starting state
  – Makes up for possibly lost JOIN messages
• When a majority of successor configuration have their starting state, replica kills itself
• If a machine misses this phase, it can still join later

[Figure: Replicas 1A, 1B, and 1C on machines A, B, and C, and Replica 2D on machine D, exchanging JOIN, FINISHED, READY, PREPARE, and JOIN-REQ messages]
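The rule for when an old replica may shut down can be sketched as a counter over the successor configuration. The sketch assumes that a READY message means "this successor replica has its starting state"; that reading of READY, along with the `OldReplica` class and its method names, is an assumption, and retransmission of FINISHED messages is not modelled.

```python
class OldReplica:
    """Hypothetical replica of the outgoing configuration awaiting retirement."""

    def __init__(self, successor_config):
        self.successor = set(successor_config)
        self.ready = set()  # successor machines known to have starting state
        self.alive = True

    def on_ready(self, sender):
        """Record a READY message; return whether this replica is still alive."""
        if sender in self.successor:
            self.ready.add(sender)
        if len(self.ready) >= len(self.successor) // 2 + 1:
            self.alive = False  # a majority of the successors can carry on
        return self.alive


replica_1c = OldReplica({"A", "B", "D"})
after_d = replica_1c.on_ready("D")  # one of three successors is ready
after_c = replica_1c.on_ready("C")  # C is not in the successor configuration
after_a = replica_1c.on_ready("A")  # two of three: safe to shut down
```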


Shared execution modules

• Configuration-specific replicas have a downside
  – One copy of service state for each replica
  – Need to copy state to new replicas
• Solution: Shared execution modules
  – Divide replica into agreement and execution modules
  – One execution module for all replicas on machine

[Figure: each replica (1A, 1B, 1C on machines A, B, and C; 2A, 2B, 2D on machines A, B, and D) is split into an agreement module and an execution module; the agreement modules on a machine, such as Agreement 1A and Agreement 2A, then share a single execution module]
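The split into agreement and execution modules might look like the sketch below. All class and method names are hypothetical, the service is reduced to a key/value store, and the agreement modules do no real Paxos; the point is only that two configurations on the same machine feed one copy of the state.

```python
class ExecutionModule:
    """Hypothetical per-machine module holding the single copy of service state."""

    def __init__(self):
        self.state = {}
        self.next_slot = 1

    def execute(self, slot, key, value):
        """Apply the request for `slot` if it is the next one; ignore it otherwise."""
        if slot != self.next_slot:
            return False
        self.state[key] = value
        self.next_slot += 1
        return True


class AgreementModule:
    """Hypothetical per-configuration module that hands decisions to execution."""

    def __init__(self, config_id, execution):
        self.config_id = config_id
        self.execution = execution

    def on_decided(self, slot, key, value):
        return self.execution.execute(slot, key, value)


shared = ExecutionModule()                 # one execution module on machine A
agreement_1a = AgreementModule(1, shared)  # configuration 1
agreement_2a = AgreementModule(2, shared)  # configuration 2, same machine
agreement_1a.on_decided(1, "x", 1)         # decided by configuration 1
agreement_2a.on_decided(2, "x", 2)         # decided by configuration 2
```

Since both agreement modules point at the same `ExecutionModule`, the new configuration's replica on machine A starts with the state already in place and nothing has to be copied.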


Implementation and evaluation

• SMART implemented in a replicated state machine library, LibSMART
  – Lets you build a service as if it were single-machine, then turns it into a replicated, migratable service
• Farsite distributed file system service ported to LibSMART
  – Straightforward because LibSMART uses BFT interface
• Experimental results using simple key/value service
  – Pipelining reduces average client latency by 14%
  – Migration happens quickly, so clients only see a bit of extra latency, less than 30 ms


Conclusions

• Migration is useful for replicated services
  – Long-term fault tolerance, load balancing
• Current approaches to migration have limitations
• SMART removes these limitations by using configuration-specific replicas
  – Can remove live machines, enabling autonomic migration and load balancing
  – Can overlap processing of concurrent requests
• SMART is practical
  – Implementation supports large, complex file system service