Upload
quynh
View
33
Download
1
Tags:
Embed Size (px)
DESCRIPTION
A Brief Introduction To High Assurance Cloud Computing With Isis2. Cornell University. Ken Birman. Isis 2 System. A prebuilt technology that automates many of the hard tasks involved in replicating services and the data on which they depend Targets cloud computing settings - PowerPoint PPT Presentation
Citation preview
A BRIEF INTRODUCTION TO HIGH ASSURANCE CLOUD COMPUTING WITH ISIS2Ken Birman
1
Cornell University
2
Isis2 System A prebuilt technology that automates
many of the hard tasks involved in replicating services and the data on which they depend
Targets cloud computing settings Available in open-source from
isis2.codeplex.com Intended to be easy to use… … but lacks commercial support, and also
lacks some of the polish of a commercial product
3
Isis2 System
Elasticity (sudden scale changes) Potentially heavily loads High node failure rates Concurrent (multithreaded) apps
Long scheduling delays, resource contention Bursts of message loss Need for very rapid response times Community skeptical of “assurance
properties”
C# library (but callable from any .NET language) offering replication techniques for cloud computing developers
Based on a model that fuses virtual synchrony and state machine replication models
Goal is to make it easy for users to deal with some very hard problems seen in distributed settings
4
Supported languages… The main library is written in .NET / C#
…and so it is best used from C# Evolved from Java
IronPython also works, and you can use a version of C++ called C++/CLI
.NET runs on Windows Our developers use Visual Studio
… but also can be used on Linux You work with Mono and use the Mono developer
environment or compile using “mcs” at the command line
5
Other ways to access Isis2… New “outboard” option
A recent addition…. It allows native C++ and C users to access a subset of the Isis2 functionality
Initial focus is on file replication for memory-mapped files and other big memory-mapped objects
This outboard feature can be accessed from a Isis2 command-line tool To do so you run the Isis2 system as a server Commands to it can then be issued from any
language
6
Why Isis2? The new version of Isis is a completely new
system, but builds on a long history First version was the Isis Toolkit (~1985-1996) That system was used to create the French Air
Traffic Control system, the NYSE trading platform, the US Navy AEGIS core communications library and many other mission-critical applications
After the Isis Toolkit, we created Horus and Ensemble (focus was on ties to formal methods)
Isis2 is our latest and best technology. Pronounced “Isis two”, not “Isis squared”
7
What does Isis2 do? A library to help you create highly resilient,
secure, scalable applications. The methods automate important tasks
You can run your program on a few machines of your own, or on a big cluster, or in an enterprise network, or even on a cloud platform like Amazon EC2
Isis2 is a “specialist” in solving problems involving data replication within “groups” of programs
8
Imagine that some collection of programs are running on some set of machines.Data is said to be replicated if each of them has a copy, and sees the updates to that copy
Key Concept: Replication
9
Time
Replication in pictures This is an example of a non-replicated
server with a client who issues some form of request
Update the monitoring and alarmscriteria for Mrs. Marsh as follows…
Confirmed
Response delay seen by end-user would
include Internet latencies
Service response
delay
Service instance
10
Replication in pictures Here we replicate data on multiple
programs. The group of programs offers better reliability than a program, and could perform better tooUpdate the monitoring and
alarmscriteria for Mrs. Marsh as follows…
Confirmed
Response delay seen by end-user would
include Internet latencies
Service response
delay
Service group
11
Things to notice Although there is a group of programs, the client still
talks to some representative. Standard “web services” requests work in the usual way
The group members are just normal programs and could be running on the same machines, or different machines. They could be running the same code, or different code. They cooperate to replicate data No variables or memory is physically shared by the programs.
Instead, each has a private replica. When an update
happens, a message is sent to tell all the group members update their copies
12
The big picture…
Steps in using Isis2
13
First questions Step back and draw a sketch of your
application Client systems, if it has remote access The service you want to build that will use
Isis2 Data files the service may need, or
databases Other subsystems or services it will interact
with
14
Our example from earlier
Response
Request
Computes
Service group
Updates that modify
replicated data
15
Where will the service run? Will you run your replicated service…
Next to its clients, on the same machines? On EC2, RackSpace, GooglePlex, MSFT
Azure, ….? In a cluster of computers up the hallway?
How will clients connect to it? Web Services remote method invocation? CORBA RPC? TCP connections?
16
Now build a non-replicated instance
Before adding Isis2 mechanisms, build a single instance of your server and test access to it from your client, or whatever else might access it
Debug this code. Adding Isis2 functionality after the fact isn’t
hard and may really be much easier than using it from the outset!
But planning ahead can help: using C#, C++/CLI or IronPython will be far easier than other options
17
… some people get stuck at step 1! Building a cloud-hosted service involves
steps you may never have tried before For example, with Visual Studio, you create
a new kind of service instance, and need to tell it which methods are remotely available, and then need to install and run it on your machine, which may involve firewall changes to allow clients to access it…
… Isis2 can’t really help with that. But it isn’t terribly hard. Just do a small step at a time following the instructions for the package you decide to use
18
Why not native C or native C++? We do support a stripped-down way to use Isis2
from these and other languages, but you only can access part of the functionality
We’ll discuss these features later, but not today
If you know Java, you will find it easy to migrate to C#, and we would recommend that route C# works on both Windows and Linux (via Mono)
19
Designing the replicated functions Now you are ready to design the replication
features of your service Ask what data needs to be replicated? How will it be loaded initially? How will it be
updated? Will you do load-balanced queries (if so they can just
access any service member), or parallel queries? What synchronization or coordination is needed?
Start with a good image of how you want it to work
20
State machine model Think about a state machine
Some sort of object or class, with state in it (data) The state is updated deterministically with events
occur (updates…), and it can be checkpointed and later loaded back in from the checkpoint
With Isis2 we simply replicate a state machine! You design an object or subsystem that has the
replicated data in it The updates become multicasts, totally ordered,
and also group membership changes
21
Replication in pictures By the time you are ready to code, you’ll
have a picture similar to this one, specific to your setting
Update the monitoring and alarmscriteria for Mrs. Marsh as follows…
Confirmed
Response delay seen by end-user would
include Internet latencies
Service response
delay
Service group
Hosted on EC2
Ordered multicast
to replicate update
22
From this picture… Now your needs are more concrete
How will I represent the replicated state in my group? How can I copy the current data to a joining member? I don’t want to reply to the client until all the replicas
have been updated. What’s the best way to do that? I’m worried about speed. What will limit response
delays? How fast can a service like this run? This set of MOOC modules will help you answer
such questions, but experiments and optimization of your solution are certainly needed too…
23
(some data…)
Let’s replicate!
24
How would you solve this problem? Assume you have access to a client-
server technology, like the ones used to build web services
You would need a list of group members Then when an update happens, the
program receiving the request could update the others
25
Replication version 0
X=5 X=5 X=5 X=5
Set x=x+150 X=155 X=155
X=155
X=155
26
Things you’ll need to think about
Keep track of group membership. Can it change? If new members join, how do they learn the initial value of
x, or other “replicated state”? When someone joins, how do we update the group
membership list? If a member fails, how are they dropped from the group?
Implement the 1-to-(N-1) “multicast” Handle reliability issues: what if a request times out? Handle security: should we use SSL? Something else? What if two conflicting updates happen concurrently?
27
This sounds hard! … in fact it is very hard
Any form of replication is equivalent to a famous problem called distributed consensus
There are many published papers on this topic; you’ll want to use a good solution
Once you solve the basic issues you’ll still face many challenges of performance, scale, portability, respecting rules for the particular runtime setting…
28
We say that replicated data is consistent if multiple users accessing it can’t detect whether or not it was replicated.
Key Concept: Consistency
29
Consistent replicated data
X=5 X=5 X=5 X=5
Set x=x+150 X=155 X=155
X=155
X=155
Set x=22
X=22X=22
X=22
X=22
30
Inconsistent replicated data
X=5 X=5 X=5 X=5
Set x=x+150 X=155 X=155
X=155
X=155
Set x=22
X=22X=22
X=22
X=22
Thinks x=155Think
s x=22
Thinks
x=22
Thinks
x=22
Isis2 never lets this happen. It automatically
enforces ordering for you
31
Possible sources of inconsistency? Conflicting updates could reach members in different
orders
Maybe someone dropped an update (network message loss), or applied one twice.
Update sender could have failed while sending the updates, so that some copies were sent, but others weren’t sent
Confusion about membership: perhaps some member was joining and the update initiator didn’t realize it, hence didn’t send the update to it, or the initial state was “wrong”
32
A subtle risk Failure handling poses hard problems!
Suppose that process A is supposed to send an update reliably to processes B, C and D
A might use TCP, or some other reliable protocol. But if process C fails, A needs to give up.
What if the network temporarily fails, and A can’t reach C? This is indistinguishable from C failing, except that later, C will be accessible again!
33
Split brain syndrome We say that the network “partitioning”
issue causes a “split brain” problem
Some members of our group think that A and B the healthy members. Others think that C and D are healthy and that A and B have failed. Which “side” is right? Who should be in charge if this group is doing
something critical, like deciding which plane can land on a particular runway?
34
Split brain syndrome
Inconsistency can be very dangerous!
Flight AA 27 clear to land on
runway 3-B
Flight US 1827 clear to take off on runway 3-B
35
Split brain syndromeFlight US 1827 clear to take off on runway 3-B
A B C D
Consider this air traffic control “server”. It helps make sure runways are used safely.
36
Split brain syndrome
Flight AA 27 clear to land on
runway 3-B
Flight US 1827 clear to take off on runway 3-B
A B C D
Temporary network failure: A and B can talk to each other but can’t reach
C or D, and vice-versa
Isis2 never lets this happen. It automatically
prevents split-brain behavior
37
Other settings that need consistency Medical systems that give information about
patient status and current drug regimen Smart power grid system that controls power
infrastructure components such as transformers Self-driving vehicles that coordinate while driving
at high speed on highways Control systems for chemical plants and refineries Corporate accounting systems, and payroll
… you can make quite a long list.
38
Cloud computing: CAP theorem The most common tools for building cloud computing
solutions don’t help with consistency They treat the property as a special need and assume
you’ll find your own ways to do this This reflects the so-called CAP theorem
“You can have at most two out of these three:Consistency, Availability and Partition Tolerance.” Eric Brewer, Berkeley
Cloud platforms assume you are not sophisticated enoughto make tradeoffs. They weaken consistency to get the fastest possible response times under the widest possible range of operating conditions.
39
By using a preexisting library like Isis2 you don’t need to solve these problems yourselfThe system solves them for you
Isis2 to the rescue!
In Egyptian mythology, Isis rescued Osiris after he was torn to pieces in an epic battle with Horus. She restored him to life and he went on to rule the underworld. Their child,
Set, later defeated Horus and banished him. The story illustrates the value of fault-
tolerance
40
Isis2 makes developer’s life easier
Formal model permits us to achieve correctness
Isis2 is based on protocols that can be expressed mathematically and proved correct
Think of Isis2 as a collection of modules, each with rigorously stated properties
Isis2 implementation needs to be fast, lean, easy to use
Developer must see it as easier to use Isis2 than to build from scratch
Seek great performance under “cloudy conditions”
Forced to anticipate many styles of use
Benefits of Using Formal model
Importance of Sound Engineering
Isis2 makes developer’s life easier
Group g = new Group(“myGroup”);Dictionary<string,double> Values = new
Dictionary<string,double>();g.ViewHandlers += delegate(View v) {
Console.Title = “myGroup members: “+v.members;};g.Handlers[UPDATE] += delegate(string s, double v) { Values[s] = v;};g.Handlers[LOOKUP] += delegate(string s) { g.Reply(Values[s]);};g.Join();
g.OrderedSend(UPDATE, “Harry”, 20.75);
List<double> resultlist = new List<double>();nr = g.Query(ALL, LOOKUP, “Harry”, EOL, resultlist);
First sets up group
Join makes this entity a member. State transfer isn’t shown
Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
Easy to request security (g.SetSecure), persistence
“Consistency” model dictates the ordering aseen for event upcalls and the assumptions user can make
41
Isis2 makes developer’s life easier
Group g = new Group(“myGroup”);Dictionary<string,double> Values = new
Dictionary<string,double>();g.ViewHandlers += delegate(View v) {
Console.Title = “myGroup members: “+v.members;};g.Handlers[UPDATE] += delegate(string s, double v) { Values[s] = v;};g.Handlers[LOOKUP] += delegate(string s) { g.Reply(Values[s]);};g.Join();
g.OrderedSend(UPDATE, “Harry”, 20.75);
List<double> resultlist = new List<double>();nr = g.Query(ALL, LOOKUP, “Harry”, EOL, resultlist);
First sets up group
Join makes this entity a member. State transfer isn’t shown
Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
Easy to request security (g.SetSecure), persistence
“Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make
42
Isis2 makes developer’s life easier
Group g = new Group(“myGroup”);Dictionary<string,double> Values = new
Dictionary<string,double>();g.ViewHandlers += delegate(View v) {
Console.Title = “myGroup members: “+v.members;};g.Handlers[UPDATE] += delegate(string s, double v) { Values[s] = v;};g.Handlers[LOOKUP] += delegate(string s) { g.Reply(Values[s]);};g.Join();
g.OrderedSend(UPDATE, “Harry”, 20.75);
List<double> resultlist = new List<double>();nr = g.Query(ALL, LOOKUP, “Harry”, EOL, resultlist);
First sets up group
Join makes this entity a member. State transfer isn’t shown
Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
Easy to request security (g.SetSecure), persistence
“Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make
43
Isis2 makes developer’s life easier
Group g = new Group(“myGroup”);Dictionary<string,double> Values = new
Dictionary<string,double>();g.ViewHandlers += delegate(View v) {
Console.Title = “myGroup members: “+v.members;};g.Handlers[UPDATE] += delegate(string s, double v) { Values[s] = v;};g.Handlers[LOOKUP] += delegate(string s) { g.Reply(Values[s]);};g.Join();
g.OrderedSend(UPDATE, “Harry”, 20.75);
List<double> resultlist = new List<double>();nr = g.Query(ALL, LOOKUP, “Harry”, EOL, resultlist);
First sets up group
Join makes this entity a member. State transfer isn’t shown
Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
Easy to request security (g.SetSecure), persistence
“Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make
44
Isis2 makes developer’s life easier
Group g = new Group(“myGroup”);Dictionary<string,double> Values = new
Dictionary<string,double>();g.ViewHandlers += delegate(View v) {
Console.Title = “myGroup members: “+v.members;};g.Handlers[UPDATE] += delegate(string s, double v) { Values[s] = v;};g.Handlers[LOOKUP] += delegate(string s) { g.Reply(Values[s]);};g.Join();
g.OrderedSend(UPDATE, “Harry”, 20.75);
List<double> resultlist = new List<double>();nr = g.Query(ALL, LOOKUP, “Harry”, EOL, resultlist);
First sets up group
Join makes this entity a member. State transfer isn’t shown
Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
Easy to request security (g.SetSecure), persistence
“Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make
45
46
Each replica is a completely standard program with its own private, local variables. Nothing is “physically shared”!So in particular, the variable “Values” is local: each program has its own copy of this variableThe different copies get the same messages, in the same order. This allows them to apply the same updates.
Key Concept: Local Data
47
C# Dictionary Generic Type C# supports generic types
Objects in which some other type is a parameter:
Dictionary<string,double> Values = new Dictionary<string,double>();
A C# Dictionary maps a key to a value. This Dictionary maps strings to doubles.
The variable name is Values. Each program has its own private copy.
Isis2 makes developer’s life easier
Group g = new Group(“myGroup”);Dictionary<string,double> Values = new
Dictionary<string,double>();g.ViewHandlers += delegate(View v) {
Console.Title = “myGroup members: “+v.members;};g.Handlers[UPDATE] += delegate(string s, double v) { Values[s] = v;};g.Handlers[LOOKUP] += delegate(string s) { g.Reply(Values[s]);};g.Join();
g.OrderedSend(UPDATE, “Harry”, 20.75);
List<double> resultlist = new List<double>();nr = g.Query(ALL, LOOKUP, “Harry”, EOL, resultlist);
First sets up group
Join makes this entity a member. State transfer isn’t shown
Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
Easy to request security (g.SetSecure), persistence
“Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make
48
Isis2 makes developer’s life easier
Group g = new Group(“myGroup”);Dictionary<string,double> Values = new
Dictionary<string,double>();g.ViewHandlers += delegate(View v) {
Console.Title = “myGroup members: “+v.members;};g.Handlers[UPDATE] += delegate(string s, double v) { Values[s] = v;};g.Handlers[LOOKUP] += delegate(string s) { g.Reply(Values[s]);};g.Join();
g.Send(UPDATE, “Harry”, 20.75);
List<double> resultlist = new List<double>();nr = g.Query(ALL, LOOKUP, “Harry”, EOL, resultlist);
First sets up group
Join makes this entity a member. State transfer isn’t shown
Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
Easy to request security (g.SetSecure), persistence
“Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make
49
50
Concept: A “multi-query” Our lookup is
Multicast to the group All members respond
A chance for parallelism Each can do part of the
job: e.g. search 1/nth of a database
Reduces response delays
Lookup “Harry” in the Ithaca phone directory
Front end
With n replicas... ... we get an n times speedup!
Names with Harry in them: ....
Isis2 makes developer’s life easier
Group g = new Group(“myGroup”);Dictionary<string,double> Values = new
Dictionary<string,double>();g.ViewHandlers += delegate(View v) {
Console.Title = “myGroup members: “+v.members;}; g.Handlers[UPDATE] += delegate(string s, double v) { Values[s] = v;};g.Handlers[LOOKUP] += delegate(string s) { g.Reply(Values[s]);};g.Join();
g.OrderedSend(UPDATE, “Harry”, 20.75);
List<double> resultlist = new List<double>();nr = g.OrderedQuery(ALL, LOOKUP, “Harry”, EOL,
resultlist);
First sets up group
Join makes this entity a member. State transfer isn’t shown
Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
Easy to request security (g.SetSecure), persistence
“Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make
51
Tells Isis2 which kind of multicast order and durability you need
52
RPC versus multi-query In a remote
procedure call, the client asks the server to do something A message is sent Server replies in a
second message Result returned
much as from a local method
With a multi-query A multicast is sent Servers respond A list of results is
returned
If the group had N members when the query was done, we get N responses
53
But all the replies will be the same! In our example, Values is identical in all replicas
… so what would be the point in asking for ALL replies?
In fact Isis2 has many options You could just ask one group member to do the query.
With N members, the group can handle N times the load You could ask the k’th member to search just 1/k’th of
the Values array. This would be N times faster You could combine these ideas and treat your group as
if it had N members in N/K subsets of size K each…
Adding security: Just one line!
Group g = new Group(“myGroup”);Dictionary<string,double> Values = new
Dictionary<string,double>();g.ViewHandlers += delegate(View v) {
Console.Title = “myGroup members: “+v.members;};g.Handlers[UPDATE] += delegate(string s, double v) { Values[s] = v;};g.Handlers[LOOKUP] += delegate(string s) { g.Reply(Values[s]);};g.SetSecure(myKey);g.Join();
g.OrderedSend(UPDATE, “Harry”, 20.75);
List<double> resultlist = new List<double>();nr = g.OrderedQuery(ALL, LOOKUP, “Harry”, EOL, resultlist);
First sets up group
Join makes this entity a member. State transfer isn’t shown
Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
Easy to request security, persistence, UNICAST_ONLY, etc
“Consistency” model dictates the ordering seen for event upcalls and the assumptions user can make
54
55
Picking a multicast primitive Optimistic delivery
RawSend, Send, CausalSend, OrderedSend
Use Flush(k) prior to talking to an external entity like a client
Very fast, ideal for services with “soft state”
Very pessimistic SafeSend: a version of
Paxos, like OrderedSend but with stronger durability
Active “disk logger” for strongest assurance
Slow, but suitable for updating a replicated database or persistent external filesWe’ll discuss these issues in detail later
56
Our example was overly simple it didn’t show the “state transfer” code
Corresponds to the “white arrows” in time-line figure In Isis2 we have a way
to make checkpoints State transfer: Some
active member makes acheckpoint, and the joinerloads the state from it.
The code looks like other operations in our example Checkpoints can also be used to save group state
during periods when all members are inactive
p
q
r
s
t
Time: 0 10 20 30 40 50 60 70
Time
57
Some uses for process groups To replicate data
maintained by the members in memory
To replicate actions taken on an external service such as a replicated database
To ensure that all replicas are configured the same way
To coordinate the processing of requests and load-balance
To offer a way to parallelize processing by having each group member do part of the work
Fault-tolerance via a backup scheme
58
Isis2 Summary A library that you can invoke from a normal
program written in a normal way
It does the work of creating groups and sending multicasts and ensuring that the consistency model will be enforced
The developer just tells it what to do. She thinks about a parallel distributed application. Virtual synchrony eliminates many hard problems
59
Why not build it yourself from scratch?
Isis2 user
object
Isis2 user
object
Isis2 user
object
Isis2 library
Group instances and multicast protocolsFlow Control
Membership Oracle
Large Group Layer UDP overlayDr. Multicast Platform Security
Reliable Sending Fragmentation Group Security
Sense Runtime EnvironmentSelf-stabilizing
Bootstrap ProtocolSocket Mgt/Send/Rcv
SendCausalSend
OrderedSendSafeSendQuery....
Message Library “Wrapped” locks Bounded Buffers
Oracle Membership
Group membership
Report suspected failures
Other groupmembers
SafeSend and Send are two of the protocol components hosted over what we call the large-
scale properties sandbox. The sandbox addresses issues like flow control, security, etc.
All protocols share and benefit from those properties
These systems are complex, especially if you want to run on platforms like EC2
By using Isis2 you “inherit” 30 years of research on how to make it work
OOB File Replication Key-Value Store
Views
The SandBox itself is mostly composed of “convergent” protocols that use
probabilistic methods
60
Isis2: Cornell’s cloud solution
Isis2 makes it easy to write programs in a standard way, using the methods in its library, that replicate data and can coordinate their actions The system automates tasks needed to
ensure consistency Without the system, consistency, security
and other such issues are very hard to solve But you still need to decide where to run
your programs, make sure they can find one-another, etc. Isis2 focuses mostly on keeping them consistent.