
Isis2 Design Choices

A few puzzles to think about when considering use of Isis2 in your work


A Service With Mobile Clients

• Suppose that you are creating a service that will have external clients using web apps or browsers.

– Your goals are to load-balance the requests over your service, but your service depends upon some form of dynamically evolving replicated state.

– The questions that follow relate to how best to use Isis2 as a tool in solving this kind of problem.

[Figure: mobile clients A, B, and C. Your users are remote and mobile; your server will run in a cloud-hosted data center.]


A Service with Mobile Clients

• True or False: A good use of Isis2 would be for direct communication of updates between the client systems.


• False: Isis2 is poorly suited to P2P settings, where there can be a wide variety of communication barriers. The best use of Isis2 is internal to a data center, where the server runs.

Direct peer-to-peer connectivity is often difficult due to firewalls, network address translation, and slow links. This is an issue even within a single household!

We can count on fairly good connections back to the hosted server in the data center.


Client to Server Connectivity

• Which of these is not a good choice?

A. Connect the clients to the data center using a prebuilt web services solution, such as the RESTful service architecture.

B. Employ Visual Studio and tell it you want to create a new WCF application. Build on the automatically created WCF client and server templates.

C. Launch Isis2 in all systems, but have the client applications use the built-in “Client of a group” API in Isis2, and have the group run purely on nodes inside the data center.


• C is a poor choice. By default, Isis2 probably won’t even start correctly in this setting.

– It uses IP multicast to find peers during its startup protocol.

– Using ISIS_HOSTS and ISIS_UNICAST_ONLY you can help Isis2 start in this setting, but the overheads of doing so would be pretty high compared to the WCF or RESTful approach.

– The “Client” API internal to Isis2 is intended for cases where one group is using services from another group, not for mobile external users.


Isis2 can help with…

A. Maintaining seamless connectivity, so that the mobile users never see a disconnection.

B. Maintaining the game state, so that every user sees a consistent, dynamically updated state even when connected to different server instances.

C. Real-time coordination, so that activities like multiuser battles are easier to script.


On option A: Isis2 won’t even know about the network links, which will probably use TCP. The Cornell TCP-R technology offers unbreakable TCP links. You could deploy it side by side with Isis2 to create seamless connectivity.

On option C: Isis2 is fast, but it is not a real-time technology. By synchronizing clocks (e.g. using NTP with a good-quality NTP stratum 0 time source) on your servers, you could employ Isis2 as part of a real-time system.


The best option for guaranteed actions is…

• Suppose a mobile user does some action and we want to guarantee that it will be performed exactly once. We’re running Isis2 within our data center on the game servers.

A. Isis2 can automatically handle this through a form of primary-backup coordination.

B. Isis2 lacks a solution to this but provides tools that can be used to create a solution in any of several ways, depending on your specific goals.


On option A: Isis2 won’t even know about the incoming requests, since they will arrive as WCF or REST events, delivered as upcalls to individual group members. Also, Isis2 lacks a built-in “do this fault-tolerantly” option.


Take an action fault-tolerantly

• Suppose our group has members {P,Q,R}

• Some request arrives at member P from a client, and we wish to perform it exactly once even if failures occur. Which option is best?

A. P should relay the request to the whole group, e.g. using g.OrderedSend(). If a client timeout occurs, the client can reissue the request.

B. We will need to use the Isis2 g.SafeSend() disk durability option to solve this problem.


On option B: The SafeSend() protocol in Isis2 is used when a group is employed as a “wrapper” around replicas of a database external to the group (e.g. a replicated MySQL database, or an Oracle database). For gaming applications, running over a replicated durable database would be too slow, so this is not a good design for the application we have in mind.


Relaying a Request

• In the previous question we decided that P should relay the request, but that if P fails, the mobile client system might reissue it.

A. In this situation, Isis2 would automatically sense a reissued request. Thus if P uses OrderedSend to relay client request X, but then the client asks Q to relay the same request, it will only be delivered once.

B. Isis2 cannot sense this form of duplication. Application code of your own would be needed to sense duplicate requests and perform them just once.


When designing your gaming application, give each request a unique id. Then, if the group receives a duplicated request, you can just replay the same response, under the assumption that the mobile application timed out for some reason and missed the original response.
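The unique-id approach can be sketched in a few lines. This is a minimal, self-contained illustration, not Isis2 code: the class name `RequestDeduplicator`, the `deliver` method, and the string payloads are all hypothetical, and the sketch assumes every member runs the same logic in whatever handler receives the relayed request.

```python
from typing import Callable, Dict


class RequestDeduplicator:
    """Remembers the response for each request id so a retry is replayed, not re-executed.

    Hypothetical helper for illustration; not part of Isis2.
    """

    def __init__(self, handler: Callable[[str], str]) -> None:
        self._handler = handler               # performs the real action
        self._responses: Dict[str, str] = {}  # request id -> cached response

    def deliver(self, request_id: str, payload: str) -> str:
        # A repeated id means the client reissued the request (for example
        # after a timeout): replay the stored response, do not act again.
        if request_id in self._responses:
            return self._responses[request_id]
        response = self._handler(payload)
        self._responses[request_id] = response
        return response
```

In this sketch the table grows without bound; a real service would presumably expire old entries once a client can no longer retry.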


Sensing Failures

• The best way to sense failures would be

A. Let Isis2 do this automatically. You are unlikely to do better and Isis2 will be very fast in any case.

B. One by one ask what failures can occur. For each case try and design a super-fast failure handling solution, which could include telling Isis2 that one of the group members has failed.

C. Connect your service to the Amazon EC2 fault sensing and reporting framework.


On option A: Isis2 rapidly senses and resends lost messages internal to the data center, so that one case will be handled automatically. But outright failures of the group members will be sensed slowly, after 45-90s by default.

On option C: Surprisingly, there is no EC2 fault sensing and reporting framework. Most gaming applications end up designing a rapid sensing framework of their own.
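One common shape for a home-grown rapid sensing framework is a heartbeat monitor. The sketch below is only an illustration under stated assumptions: the class `HeartbeatMonitor`, its methods, and the timeout value used in the usage example are hypothetical, and the step of telling Isis2 that a member has failed is left out because the slides do not name the call for it.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Suspects a member when no heartbeat has arrived within timeout_s seconds.

    Hypothetical helper for illustration; not part of Isis2.
    """

    def __init__(self, timeout_s: float) -> None:
        self._timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}  # member name -> last heartbeat time

    def heartbeat(self, member: str, now: float) -> None:
        # Record the arrival time of a heartbeat from this member.
        self._last_seen[member] = now

    def suspects(self, now: float) -> List[str]:
        # Members whose last heartbeat is older than the timeout, sorted by name.
        return sorted(
            m for m, t in self._last_seen.items() if now - t > self._timeout_s
        )
```

The caller supplies the current time (for example from `time.monotonic()`), which keeps the logic easy to test. A timeout well under the 45-90s default mentioned above is the point of building such a monitor, at the cost of more false suspicions on a loaded machine.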


Real-Time In Isis2

• Your gaming system needs a kind of real-time “pulse” that will trigger periodic actions by all the members. But you want consistency!

A. Have one leader track the time and then use g.Send() to trigger the pulse

B. Same as A but use g.RawSend() for better speed

C. Synchronize time across the whole group, and just have each group member take actions at the pre-agreed “pulse time” points


On option A: The CAP theorem tells us that we have a tradeoff here. g.Send() is always consistent, and will normally be very fast. If consistency matters, this is probably the best way to achieve it.

On option B: RawSend() won’t necessarily reach every member. So it has steadier delivery timing, but some members might fail to pulse (e.g. if a message is lost, RawSend() won’t try to recover it).

On option C: This gives the best timing but completely lacks any kind of strong consistency. Also, keep in mind that on shared, virtualized platforms like EC2, even with NTP one may have trouble synchronizing clocks to better than 25-50ms. By renting heavy-weight EC2 instances you can reduce the risk of disruptive scheduling delays.
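For option C, each member can compute the pre-agreed pulse times locally from its own clock. The sketch below assumes the members agreed in advance on a pulse period and measure time as integer milliseconds since a shared epoch; the function names are hypothetical.

```python
def next_pulse_ms(now_ms: int, period_ms: int) -> int:
    """Return the earliest pulse time (a multiple of period_ms) that is >= now_ms."""
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")
    # Round now_ms up to the next multiple of the period.
    return ((now_ms + period_ms - 1) // period_ms) * period_ms


def pulse_index(pulse_time_ms: int, period_ms: int) -> int:
    """Number a pulse so members can label the actions they take at that pulse."""
    return pulse_time_ms // period_ms
```

Two members whose clocks read 1240 and 1249 both compute the same next pulse for a 250 ms period, but each fires when its own clock reaches that value, so the real instants differ by the clock skew.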


Duplicated Computing

• Certain gaming requests require fairly heavy computing. We want to have two group members perform each such request for fault-tolerance, but how should they be picked?

A. Relay the request via OrderedSend, then on receipt, use the group view to select 2 members. They compute the identical answer because data is consistent and both reply. The client takes the first reply and ignores the duplicate.

B. Have the external client just send the same request twice. Again, the client just takes the first reply.


On option A: For example, take the request-id and hash it to a number k in 0…N-1. Then have group members k and k+1 (mod N) run the operation for this request.

On option B: This could work, but keep in mind that the two requests might end up assigned to the same group member. It is hard to completely control the EC2 load-balancer!
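The hashing scheme suggested for option A can be sketched as follows. The function name `pick_workers` is hypothetical, and the group view is modeled as a plain list of member names that every member holds in the same order. The sketch uses SHA-256 rather than Python's built-in `hash()`, because the built-in string hash is randomized per process and members would then disagree on k.

```python
import hashlib
from typing import List, Tuple


def pick_workers(request_id: str, view: List[str]) -> Tuple[str, str]:
    """Pick members k and k+1 (mod N) of the view, where k is a hash of the request id."""
    n = len(view)
    if n < 2:
        raise ValueError("need at least two members to duplicate the computation")
    # A stable hash: every member computes the same k for the same request id.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    k = int.from_bytes(digest[:8], "big") % n
    return view[k], view[(k + 1) % n]
```

Each member calls this on receipt of the relayed request and runs the operation only if its own name is one of the two returned.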


TCP-R

• We mentioned the Cornell TCP-R technology. The role of TCP-R is:

A. To allow a group member to “take over” a TCP endpoint seamlessly, thus allowing transparent fail-over or migration of computing roles.

B. To enhance performance of TCP for real-time and gaming uses by changing flow-control behavior.

C. To allow a TCP connection to terminate at a group of endpoints, like the members of an Isis2 group. All endpoints would deliver identical data.


On option A: When used correctly, a new server (perhaps a backup) can “splice” a new TCP connection to an already-open one that connected to some prior server (perhaps a primary that crashed). TCP-R ensures that not a byte is duplicated or lost, but it does require application help: code you write to checkpoint the TCP-R state and the data sent/received on the connection.

On option B: In fact there are a number of special versions of TCP for real-time settings. However, to use them on systems like EC2 you would need to run them as application-layer libraries, which is a little tricky to do. The same can be said of TCP-R: none of these options are transparent.

On option C: There are also TCP fault-tolerance solutions that work this way.


TCP-R in action

[Figure labels: tcp connection, TCP-R black box, tcp, Initial Server, checkpoints, Standby Server, new tcp connection]

Mobile client sees no disruption at all, and the spliced TCP connection looks identical to the old one. Not a byte is duplicated or lost in either direction.


Isis2 plus TCP-R

• When we say that the application could combine these, we mean that one could use TCP-R to talk to a server group that uses Isis2 internally to maintain replicated state

– The replicated state functions as the checkpoint

– However, this is still not at all transparent

  • You must deploy TCP-R and Isis2

  • Your server must still include TCP-R state in the data replicated in the group, and must checkpoint at the proper points in time (as per the TCP-R user manual)


Backup takes over

• Consider a general setting in which a group replicates state such as “actions the external users have requested” or “the game state”

• Now a member fails and a backup takes over

A. With Isis2 this is transparent and seamless

B. Isis2 delivers events that can trigger the take-over but the backup will still need to “figure out” what the member had done prior to failing


The new-view event tells you who failed, and you also know that any multicasts sent prior to the failure either have been delivered, or were completely erased by the crash. But the backup would often need to query the “external world” to know if actions the failed process was performing had succeeded or not, e.g. if it was updating a database or activating a piece of hardware or performing other kinds of “external” actions.
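The take-over reasoning above can be sketched as plain bookkeeping. Everything here is an assumption made for illustration: views are modeled as lists of member names, and the replicated state is assumed to contain a hypothetical `owner_of` table (request id to owning member) and a `completed` set of request ids whose completion was multicast before the crash.

```python
from typing import Dict, List, Set, Tuple


def failed_members(old_view: List[str], new_view: List[str]) -> Set[str]:
    """Members present in the old view but missing from the new one."""
    return set(old_view) - set(new_view)


def takeover_plan(
    failed: Set[str],
    owner_of: Dict[str, str],
    completed: Set[str],
) -> Tuple[List[str], List[str]]:
    """Split the failed members' requests into (already done, must verify externally)."""
    done: List[str] = []
    verify: List[str] = []
    for request_id in sorted(owner_of):
        if owner_of[request_id] not in failed:
            continue  # still owned by a live member
        if request_id in completed:
            done.append(request_id)    # completion was replicated before the crash
        else:
            verify.append(request_id)  # outcome unknown: ask the external world
    return done, verify
```

The second list is the part Isis2 cannot resolve for you: for each of those requests the backup has to check the database, device, or other external system before deciding whether to redo the action.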


Out-of-Band Tool

• The Isis2 OOB (out of band file transfer) tool:

A. Is used to copy memory-mapped files from node to node, at locations where an Isis2 application has group members.

B. Is helpful when dealing with remote clients that are using web services to send data outside of the Isis2 system

C. Provides a way for an application to implement a control layer that oversees some other communication technology, such as with SDN


On option A: Isis2 multicast works best for small objects, so with the OOB tools, you can move gigabyte objects as memory-mapped files. The multicasts talk about file names and sizes, but the data itself is moved externally to the group, at very high data rates using a form of nearly direct DMA transfer from source to destination(s).

On option B: Although your application can certainly use WCF or RESTful technology to support remote mobile clients, Isis2 wouldn’t have any direct knowledge about them. The OOB technology only works between members of an Isis2 process group.

On option C: Although it would certainly be possible to build new tools similar to the OOB tool for managing a software defined network, we haven’t tried doing that yet with Isis2.



When deleting an OOB replica…

• Suppose that in group {P,Q,R,…} P initially has some large object “X” and uses OOB replication to create new replicas at Q and R.

A. The copy at P can be deleted in the same OOBRereplicate request that created the copies at Q and R

B. The copy at P should not be deleted until after the copies for Q and R have been made



OOB for State Transfer

• When using the OOB tool to accelerate a state transfer, which of the following is not true?

A. One option is to put the state in a mapped file, transfer it via OOB, and have the state transfer itself just point to the mapped file.

B. One option is to pre-transfer state, then have the state transfer include just the delta of updates that may have occurred after that pre-transfer

C. OOB cannot be used in this case because the process is not yet a member of the group

On option C: There are several ways to work around the “must be a member” limitation. One can do the OOB transfer in some other group, created just for the purpose, or can perform the OOB ReReplicate “during” the state transfer event.
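Option B above, pre-transferring the state and then sending only the delta, can be illustrated without any Isis2 calls. The sketch assumes the state is a key-value map built from a log of numbered updates; the `Update` tuple layout and both function names are hypothetical.

```python
from typing import Dict, List, Tuple

Update = Tuple[int, str, str]  # (sequence number, key, value)


def apply_updates(state: Dict[str, str], updates: List[Update]) -> Dict[str, str]:
    """Return a copy of state with the updates applied in sequence order."""
    result = dict(state)
    for _seq, key, value in sorted(updates):
        result[key] = value
    return result


def delta_since(log: List[Update], snapshot_seq: int) -> List[Update]:
    """Updates that occurred after the pre-transferred snapshot was taken."""
    return [u for u in log if u[0] > snapshot_seq]
```

The joining member receives the large snapshot ahead of time (for example as a mapped file), and the state transfer itself then carries only the result of `delta_since`, which is typically small.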