Towards Dynamic Collaboration Architectures

Goopeel Chung
Westfield State College, Department of Computer and Information Science
Westfield, MA 01086
413 572 5714
[email protected]

Prasun Dewan
University of North Carolina, Department of Computer Science
Chapel Hill, NC 27599
919 962 1823
[email protected]

ABSTRACT

In this paper, we introduce the concept of dynamically changing between centralized, replicated, and hybrid collaboration architectures. It is implemented by providing users a function that dynamically changes the mapping between user-interface and program components. We decompose the function into more primitive commands that are executed autonomously by individual users. These commands require a mechanism to dynamically replicate user-interface and program components on a user’s site. We present a logging approach for implementing the mechanism that records input (output) messages sent to one incarnation of a program (user-interface) component, and replays the recorded messages to a different incarnation of the component. Preliminary experiments with an implementation of the mechanism show that response and completion times can improve by dynamically changing the architecture to adapt to changes to the set of users in a collaboration session involving a mix of mobile and stationary devices.

Categories and Subject Descriptors C.2.4 [Distributed Systems]: Distributed applications, client/server. C.4 [Performance of Systems]: Design studies, performance attributes. C.5.5 [Servers]

General Terms Algorithms, Measurement, Performance, Design.

Keywords Collaboration architecture, application sharing, latecomers, mobile collaboration, ad-hoc collaboration.

1. INTRODUCTION An important issue in the design of a collaborative system is its architecture [1], which determines the nature of the components of the system and the placement of these components on the computers of the various users participating in a collaborative session. In general, each of these computers executes a user-interface component (such as a window system or a toolkit), which communicates with a local or remote program component depending on whether the architecture is replicated or centralized (Figure 1). In the centralized architecture, all of the users interact with a single program, which resides on one of the users’ workstations. The program processes each user’s input, and distributes all output to all of the users’ workstations, thereby creating a replica of the user interface on each of these workstations. The replicated architecture, on the other hand, replicates the program on all of the users’ workstations, and has each user interact with the local program replica. It synchronizes the program replicas by broadcasting each user’s input to all of them.
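To make the two message flows concrete, the following minimal Java sketch (illustrative only; the class and method names are invented, not taken from any system discussed here) models the architectures of Figure 1: the central program computes output once and distributes it to every user-interface component, whereas each replica receives a broadcast of each user's input and computes output for its local user interface.

```java
import java.util.ArrayList;
import java.util.List;

// Placeholder message types; a real system would carry window-system events here.
record Input(String event) {}
record Output(String display) {}

interface UserInterfaceComponent {
    void render(Output out);            // display output on this user's workstation
}

// (a) Centralized: one program processes every user's input and multicasts the output.
class CentralProgram {
    private final List<UserInterfaceComponent> uis = new ArrayList<>();
    void attach(UserInterfaceComponent ui) { uis.add(ui); }
    void onInput(Input in) {
        Output out = compute(in);       // executed once, on the hosting workstation
        for (UserInterfaceComponent ui : uis) ui.render(out);
    }
    Output compute(Input in) { return new Output("state after " + in.event()); }
}

// (b) Replicated: each user's input is broadcast to all replicas; each replica
// computes the output for its own local user interface.
class ProgramReplica {
    private final List<ProgramReplica> peers = new ArrayList<>();
    private final UserInterfaceComponent localUi;
    ProgramReplica(UserInterfaceComponent localUi) { this.localUi = localUi; }
    void connect(ProgramReplica peer) { peers.add(peer); }
    void onLocalInput(Input in) {
        for (ProgramReplica peer : peers) peer.onRemoteInput(in);
        apply(in);
    }
    void onRemoteInput(Input in) { apply(in); }
    private void apply(Input in) { localUi.render(new Output("state after " + in.event())); }
}
```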

Figure 1. Centralized and Replicated Architectures

Each architecture has its pros and cons [1]. An advantage of the replicated architecture is that the “replicas” need not share their complete state if programmer-defined code multicasts selected input events. A disadvantage is that each replicated operation must be idempotent, that is, multiple executions of it must be equivalent to a single execution. Another disadvantage is that, in general, floor control is needed to ensure consistency, though in some cases application-specific operation transformations can be used instead. Qualitative and experimental comparisons of the centralized and replicated architectures [2, 3] have found that each architecture offers better performance under certain conditions. In particular, the centralized architecture offers better performance when (a) the computer on which the central program component executes is much more powerful than the


other computers participating in the collaboration, (b) the central site is connected by fast network connections to each of the other sites, (c) the user at the central site provides a proportionately large amount of input, and (d) the number of users is large.

Thus, designers of commercial collaborative systems have chosen the architecture based on the kind of collaborations they target. For example, Groove chooses the replicated “peer to peer” architecture as it is designed for highly interactive collaborations among small groups of users who may have slow or intermittent connectivity to each other. LiveMeeting, on the other hand, offers a centralized architecture as it is designed for presentations to large numbers of users connected by relatively high-speed networks. Some systems such as WebEx offer both architectures depending on the collaborative tool. A more interesting approach to mixing the two architectures is to offer the hybrid architecture [4], which can consist of both centralized and replicated sub-architectures (Figure 2). Such an architecture allows the program component to both be replicated and serve multiple user-interface components. The replicas together form a replicated sub-architecture, and all user-interface components connected to a replica form a centralized sub-architecture. As the replicated and centralized architectures are special cases of this architecture, a system supporting it gives users the flexibility to statically configure the architecture of a collaborative tool at the start of a collaborative session with the tool. This is more flexible than binding the architecture at tool design time.

Figure 2. Hybrid Architecture

An even more flexible approach is to allow the architecture of the collaborative tool to change dynamically at execution time. There are several reasons for exploring this alternative. Here we will focus on changing the architecture in response to changes to the set of users interacting with the tool. As mentioned above, the ideal architecture depends on network connections, computers, distribution of user input, and number of users. All four conditions can change as new users enter or leave the collaboration. To illustrate, consider a collaborative session started between user1 and user2, where user1 has a much more powerful computer. Then we might initially configure the centralized architecture of Figure 1(a). Now, suppose user1 is replaced by user3, whose computer is as powerful as that of user2. If the architecture were static, then user1’s computer would need to continue participating in the session as it hosts the central program component. In a mobile collaboration scenario, it may not be possible for user1 to leave his computer in the collaboration, as he might travel with his computer to some other task. It would be useful to support a transition to the replicated configuration of Figure 1(b), allowing user1’s computer to leave the session.

In this paper, we present the motivation, design, implementation, and evaluation of a (first-cut) mechanism for supporting dynamic transitions between instances of the replicated, centralized, and hybrid architectures. These transitions make sense only for those collaborations whose replicated and centralized semantics are the same. Therefore, we assume each application operation is idempotent, executed by each replica, and issued by the user with the floor. Moreover, like most replicated systems, we assume each site that may execute a replica has the code to run it.

We have seen above motivation for transitioning from a centralized architecture to a replicated architecture. Section 2 presents a more detailed scenario motivating additional kinds of architectural transitions. Section 3 shows that a dynamic architecture requires a function that dynamically changes the mapping between user-interface and program components. Section 4 decomposes the function into more primitive commands that are executed autonomously by individual users. Section 5 shows that these commands require a mechanism to dynamically replicate user-interface and program components on a user’s site. Section 6 outlines a logging approach for implementing the mechanism that records input (output) messages sent to one incarnation of a program (user-interface) component, and replays the recorded messages to a different incarnation of the component. The idea of dynamic architecture adaptations is not entirely new – current systems provide a limited form of these adaptations. Section 7 compares our work with relevant previous research. Section 8 presents preliminary experiments with an implementation of the mechanism showing that response and completion times can improve by dynamically changing the architecture to adapt to changes to the set of users in a collaboration session involving mobile and stationary devices. Finally, Section 9 presents conclusions and future work.

2. PERVASIVE COLLABORATION There are several reasons for supporting dynamic architecture changes – the one we use to motivate our work here is pervasive collaboration. The adjective “pervasive” is used today to qualify many kinds of activities. For example, there is work today to bring us pervasive computing, communication, and file sharing. In all of these cases, the goal is to support the activity independent of location, device (e.g. cellphone, palmtop, handheld, laptop), network (e.g. LAN, WAN, Ad-hoc network), and application (e.g. whiteboard, chat). Thus, we use the term pervasive collaboration to refer to collaboration that is independent of these factors. Current collaboration infrastructures do not support such sharing. Therefore, we use our imagination in the scenario given below to illustrate the shape such sharing can take. Assumpta, Benu, Cathy, and Dimitri are sharing a whiteboard and a discussion tool (Figure 3) to work out the layout of the demonstrations and Internet-access room at the upcoming pervasive collaboration conference. The layout is the joint responsibility of Assumpta and Benu.

Figure 3. Sharing a Drawing and Chat Tool

Phase 1: Desktop-based collaboration: Both of them are currently at UNC but are about to catch a flight to take them to the conference, where they will finalize the layout. They have together created an initial layout, and are currently receiving the opinions of Cathy of University of Wisconsin and Dimitri of Purdue, who designed the Internet-access room at the previous conference. The four collaborators take turns in giving comments and reacting to them. All of them use (state-of-the-art) desktop computers. Because all computers are equally powerful, the number of users is small, all of them provide input, and they are at three different sites connected by wide-area connections, they use the replicated architecture (Figure 4).

Phase 2: Ad-hoc Handheld-based collaboration: As the discussion winds down, Assumpta and Benu realize it is time to drive to the airport. The four users go offline, temporarily suspending the coupling among their screens. In addition, Assumpta and Benu transfer their session state to the handheld computers they will take to the conference. Once they are seated in the airplane, they resume their collaboration to incorporate the changes suggested by Cathy and Dimitri, using a wireless ad-hoc network for communication. As the two computers are of equal (but lesser than before) computing power, they again use a replicated architecture (Figure 5). However, it is a different instance of the replicated architecture than the one used in the previous phase (Figure 4). Thus, this phase change is an example of the Replicated → Replicated dynamic architecture transition.

Figure 4. Replication for Wide-Area Desktop Sharing

Figure 5. Replication for Ad-Hoc Handheld Collaboration

Phase 3: Multi-device Collaboration: When they reach the hotel, they continue to use their handhelds, and include Cathy and Dimitri again in the collaborative session, and also Enrique, who is the hotel representative and uses his state-of-the-art laptop. Cathy is at her daughter’s soccer game, using her palmtop, while Dimitri is attending his son’s swimming lessons, and is using his cell phone. Because the palmtop does not have a keyboard, Cathy uses it mainly for the drawing tool and an audio-based chat. Similarly, because the cell phone does not have a pointing device, Dimitri uses it mainly for chatting and viewing a cell-phone-adapted view of the drawing. (Though offering reduced interaction in comparison to desktop computers, the palmtop and cell phone offer more capabilities than the alternative of a traditional phone.) As the two remote users are connected by wide-area connections, they have local replicas (Figure 6). As the three local users are connected by fast connections and one of them has a much more powerful machine, they use a centralized sub-architecture rooted at the laptop (Figure 6). Thus, this phase change is an example of the Replicated → Hybrid dynamic architecture transition.

Phase 4: Incremental Leaving of Users Hosting Replicas: Cathy and Dimitri now leave with their replicas (Figure 7), using the Hybrid → Hybrid and Hybrid → Centralized transitions. Next, Enrique decides to leave. In existing centralized systems, the user hosting the central program component is not allowed to leave until the collaboration ends. The Centralized → Replicated transition, however, allows the architecture to change to the replicated architecture of Figure 8, allowing Enrique to leave.

Figure 6. Hybrid Multi-Device Collaboration

Figure 7. Users with Private Replicas Leave

Figure 8. User with Shared Replica Leaves

No existing generic collaborative system supports any of these transitions. We describe below an approach for supporting transitions between arbitrary centralized, hybrid, and replicated architectures.

3. ARCHITECTURE ADAPTATION So far, we have used pictures to describe the three kinds of architectures and the transitions among them. Here, we do so more formally.

Let U be the set of user interface components created for the set of users in a collaborative session, that is, U = {ui | 1≤i≤n, n is the number of users, and ui is the user interface component running on useri’s computer}. Let P be the set of program replicas that can run on the computers, that is, P = {pi | 1≤i≤n, n is the number of users, and pi is the program replica that can run on useri’s computer}. An architecture implementing the collaborative session maps each user interface in U to one of the program replicas in P for processing input and receiving output. Therefore, it is a many-to-one mapping ƒ from U to P, ƒ: U → P, where ƒ(ui) = pj, ui∈U, pj∈P, if pj generates output messages for ui in response to ui’s input messages. In order to synchronize all of the program replicas in ƒ(U), input messages from a user interface component ui are broadcast to all program replicas in ƒ(U).

However, not all possible such mappings are supported by our current mechanism. As mentioned above, we address only the set of centralized, replicated and hybrid architectures. An architecture in which a program component has remote user-interface components mapped to it but not the local user-interface component is not a member of this set, and thus not supported by us. More formally, we impose the following constraint on the architecture: pj∈ƒ(U)→ƒ(uj)=pj.

A mapping change can occur when a subset of user interface components gets mapped to program replicas different from the current ones. In addition, it can occur when the composition of users changes – i.e. when latecomers join or existing users leave the collaboration session. Given two consecutive mappings ƒi: Ui → Pi and ƒi+1: Ui+1 → Pi+1, we formally define a mapping change ƒi ⇒ ƒi+1 as follows: ƒi ⇒ ƒi+1 iff Ui ≠ Ui+1 (and hence Pi ≠ Pi+1), or ∃ u∈(Ui∩Ui+1) such that ƒi(u) ≠ ƒi+1(u).

A collaboration session S supporting dynamic architecture transitions can be defined as a sequence of mappings S = <ƒ1, ƒ2, ƒ3, …, ƒm>, where ƒi ⇒ ƒi+1, 1 ≤ i < m. Such a session requires a function ti,i+1 that transforms ƒi to ƒi+1 – i.e. ƒi+1 = ti,i+1 o ƒi. To be general, it must allow transitions between arbitrary supported architectures.
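These definitions translate directly into a small data structure. The following Java sketch (illustrative only; the names are not taken from the implementation described later) represents a mapping ƒ as a map from user indices to replica indices, checks the constraint pj∈ƒ(U) → ƒ(uj)=pj, and tests whether two consecutive mappings constitute a mapping change.

```java
import java.util.Map;
import java.util.Set;

// A user-interface component u_i and program replica p_i are identified by the
// index i of the user on whose computer they run.
class ArchitectureMapping {
    // f maps each user-interface index to the index of the replica that serves it.
    private final Map<Integer, Integer> f;
    ArchitectureMapping(Map<Integer, Integer> f) { this.f = Map.copyOf(f); }

    Set<Integer> users()        { return f.keySet(); }             // U
    Set<Integer> usedReplicas() { return Set.copyOf(f.values()); } // f(U)

    // Constraint: if p_j serves anyone, it must also serve its own local u_j.
    boolean satisfiesConstraint() {
        return usedReplicas().stream().allMatch(j -> j.equals(f.get(j)));
    }

    // f_i => f_{i+1} iff the user sets differ, or some common u is remapped.
    boolean isMappingChangeTo(ArchitectureMapping next) {
        if (!users().equals(next.users())) return true;
        return users().stream().anyMatch(u -> !f.get(u).equals(next.f.get(u)));
    }
}
```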

4. USER COMMANDS Before we discuss how such a function may be implemented, let us consider the user commands for invoking it. One can imagine providing a single user command that specifies the new mapping. However, such a command has at least two problems. First, it is hard to use, as the complete new mapping must be specified rather than the mapping change. Moreover, it raises security concerns. For example, should I be able to change the mapping of your user-interface component? Instead of providing a single high-level mapping-changing command ti,i+1, we provide five basic commands, which can be executed autonomously by the users. As we are targeting architecture adaptations in response to user-composition changes, these commands extend the traditional join/quit commands:

Join as Slave (of Specific Master). Abbreviated as jsj,k, it is used when a latecomer userj wants to use a pre-existing up-to-date program replica pk.

Become Master. Abbreviated as bmj, it is used when userj wants to use the local program replica pj instead of the current remote program replica.

Join as Master. Abbreviated as jmj, it is used when a latecomer userj wants to use the local program replica pj.

Become Slave (of Specific Master). Abbreviated as bsj,k, it is used when userj wants to switch from his current program replica to a remote up-to-date program replica pk.

Quit. Abbreviated as qtj, it is used when userj leaves the collaboration session.

We now show how the five basic commands can be used to make a transition between two arbitrary architectures supported by our approach. We illustrate our algorithm using Figure 9, which shows how these five commands are used to transition between the first and final architectures in the figure. The basic idea is to first employ a series, JM, of join-as-master commands, then a series, BM, of become-master commands, next a series, JS, of join-as-slave commands, and finally a series, BSQT, mixing become-slave and quit commands. More formally, given ƒi: Ui → Pi and ƒi+1: Ui+1 → Pi+1, ti,i+1 = BSQT o JS o BM o JM. We formally describe below how these four series are chosen based on the old architecture, ƒi, and the new architecture, ƒi+1.

JM = jmg(1) o jmg(2) o … o jmg(m), where pg(j) ∈ (ƒi+1(Ui+1) - Pi), m = |ƒi+1(Ui+1) - Pi|, and g(j1) ≠ g(j2) if j1 ≠ j2.

With JM (= jm6 in Figure 9), ti,i+1 identifies those latecomers that want to use local program replicas (user6 in Figure 9), and allows them to join the collaboration session by dynamically replicating the user interface and program components on their computers, and mapping the user interface components to their respective local program replicas.

Figure 9. Decomposing an architecture transition. Starting from ƒi, the commands jm6, bm3, js7,6, qt2, bs1,3, bs4,3, and qt5 are applied in turn, yielding ƒi+1 = qt5 o bs4,3 o bs1,3 o qt2 o js7,6 o bm3 o jm6 o ƒi.

BM = bmg(1) o bmg(2) o … o bmg(n), where pg(j) ∈ ((ƒi+1(Ui+1) ∩ Pi) - ƒi(Ui)), n = |(ƒi+1(Ui+1) ∩ Pi) - ƒi(Ui)|, and g(j1) ≠ g(j2) if j1 ≠ j2.

Before BM is applied, all of the program replicas in ƒi+1(Ui+1) have been synchronized except for those in ((ƒi+1(Ui+1) ∩ Pi) - ƒi(Ui)) – i.e. program replicas that are promoted from slaves in ƒi to masters in ƒi+1 (p3 in Figure 9). With BM (= bm3 in Figure 9), ti,i+1 dynamically synchronizes these remaining program replicas, and maps their respective local user interface components to them.

JS = jsg(1),g′(1) o jsg(2),g′(2) o … o jsg(p),g′(p), where ug(j) ∈ SUi+1 = {ug(j) | ug(j) ∈ (Ui+1 - Ui), ƒi+1(ug(j)) = pg′(j), and g(j) ≠ g′(j)}, p = |SUi+1|, and g(j1) ≠ g(j2) if j1 ≠ j2.

With JS (= js7,6 in Figure 9), ti,i+1 takes the other latecomers that want to use remote program replicas (user7 in Figure 9), and allows them to join the collaboration session by dynamically replicating the user interface components on their computers, and making appropriate mappings for the user interface components.

BSQT = bsqtg(1) o bsqtg(2) o … o bsqtg(q), where ug(j) ∈ SUi = {ug(j) | [ug(j) ∈ (Ui ∩ Ui+1), ƒi(ug(j)) ≠ ƒi+1(ug(j)) = pg′(j), and g(j) ≠ g′(j)], or [ug(j) ∈ (Ui - Ui+1)]}, q = |SUi|, bsqtg(j) = bsg(j),g′(j) if ug(j) ∈ (Ui ∩ Ui+1), bsqtg(j) = qtg(j) if ug(j) ∈ (Ui - Ui+1), j1 < j2 if ƒi(ug(j1)) = ƒi(ug(j2)) = pg(j1), and g(j1) ≠ g(j2) if j1 ≠ j2.

Finally, we use become-slave commands to execute mapping changes for user interface components in Ui that are mapped to different remote program replicas in ƒi+1, and quit commands to remove mappings for user interface components no longer in Ui+1. With BSQT (= qt5 o bs4,3 o bs1,3 o qt2 in Figure 9), ti,i+1 executes the mapping changes for the two kinds of user interface components. However, BSQT has to be careful in choosing the order in which it executes the mapping changes. This is because bs requires that, in order to change the mapping of a uj mapped to a local program replica (i.e. pj), uj should be the only user interface component mapped to pj. Similarly, qt requires that, in order to remove the mapping of a uj mapped to a local program replica (i.e. pj), uj should be the only user interface component mapped to pj. In order to honor these pre-conditions, BSQT carefully mixes bs’s and qt’s to make sure that commands for removing or changing remote mappings (= qt2 and bs4,3 in Figure 9) are applied before those involving local mappings (= bs1,3 and qt5 in Figure 9).
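Read operationally, the four series amount to a simple planning procedure over ƒi and ƒi+1. The sketch below is an illustrative paraphrase of the definitions above, not the paper's implementation; it emits commands in execution order (JM, BM, JS, then BSQT) and simplifies the BSQT ordering constraint to "all remote remappings and quits before any local ones", which still satisfies the stated pre-conditions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// fi and fi1 map each user index in U_i / U_{i+1} to the index of the replica serving it.
class TransitionPlanner {

    static List<String> plan(Map<Integer, Integer> fi, Map<Integer, Integer> fi1) {
        List<String> cmds = new ArrayList<>();
        Set<Integer> oldUsers = fi.keySet(), newUsers = fi1.keySet();
        Set<Integer> oldMasters = new HashSet<>(fi.values());   // f_i(U_i)
        Set<Integer> newMasters = new HashSet<>(fi1.values());  // f_{i+1}(U_{i+1})

        // JM: replicas used in f_{i+1} whose owner was not in U_i (so p_j is not in P_i):
        // latecomers that join as masters.
        for (int p : newMasters)
            if (!oldUsers.contains(p)) cmds.add("jm" + p);

        // BM: existing users whose local replica is promoted from slave to master.
        for (int p : newMasters)
            if (oldUsers.contains(p) && !oldMasters.contains(p)) cmds.add("bm" + p);

        // JS: latecomers that attach to a remote master.
        for (int u : newUsers)
            if (!oldUsers.contains(u) && fi1.get(u) != u) cmds.add("js" + u + "," + fi1.get(u));

        // BSQT: remappings of existing users and quits of departing users; users currently
        // mapped to a remote replica are handled before users mapped to their own local
        // replica (which must be its sole client before bs/qt may run on it).
        List<String> remote = new ArrayList<>(), local = new ArrayList<>();
        for (int u : oldUsers) {
            String cmd = null;
            if (!newUsers.contains(u)) cmd = "qt" + u;
            else if (!fi.get(u).equals(fi1.get(u)) && fi1.get(u) != u)
                cmd = "bs" + u + "," + fi1.get(u);
            if (cmd != null) (fi.get(u) == u ? local : remote).add(cmd);
        }
        cmds.addAll(remote);
        cmds.addAll(local);
        return cmds;
    }
}
```

For the transition of Figure 9, this procedure produces the same commands as the decomposition in the text, with the BSQT commands in the order qt2, bs4,3, bs1,3, qt5.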

5. SHARED FUNCTIONS Each of these commands can be implemented independently, but this approach does not recognize that they must perform some common tasks. For example, both join-as-master and become-master commands must load a new program replica on the machine executing the command. We identify below six (implementation) functions that are shared by and sufficient to implement the five user commands.

The shared functions are:

ruj: dynamically replicates the user interface component on userj’s computer.

rpj: dynamically replicates the program on userj’s computer.

cmj,k: creates the mapping from uj to pk.

rmj: removes the current mapping of uj.

addj: adds uj and pj to U and P respectively.

removej: removes uj and pj from U and P respectively.

Using these sub-functions, we can form each of the five basic mapping-changing commands as follows.

jsj,k = cmj,k o addj o ruj: It dynamically replicates a user interface component to userj’s computer, adds uj and pj to U and P respectively, and maps uj to pk.

bmj = cmj,j o rmj o rpj: It dynamically synchronizes pj with other up-to-date program replicas, and changes the current mapping of uj to pj.

jmj = cmj,j o rpj o addj o ruj: It dynamically replicates a user interface component to userj’s computer, adds uj and pj to U and P respectively, dynamically synchronizes pj with other up-to-date program replicas, and finally maps uj to pj. Here, the order between rpj and addj may be swapped depending on the implementation. We execute addj first because our implementation of the function, which we also use for jsj,k, assumes a fresh copy of pj.

bsj,k = cmj,k o rmj: It changes the current mapping of up-to-date uj to up-to-date pk.

qtj = removej o rmj: It removes the current mapping of uj, and removes uj and pj from U and P respectively.

In the case of both bsj,k and qtj , if uj is currently mapped to the local program replica (i.e. pj), uj should be the only user interface component mapped to pj because of the mapping constraint that ƒ(uj)=pj if pj∈ƒ(U), as discussed earlier.
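The compositions above can be transcribed almost literally into code. In the following illustrative sketch (the interface and method names are invented), the six shared sub-functions are left abstract, and each user command simply calls them in the order given by its composition, rightmost factor first.

```java
// The six shared sub-functions, left abstract here.
interface SharedFunctions {
    void replicateUi(int j);            // ru_j
    void replicateProgram(int j);       // rp_j (includes synchronizing the new replica)
    void createMapping(int j, int k);   // cm_{j,k}
    void removeMapping(int j);          // rm_j
    void add(int j);                    // add_j: add u_j and p_j to U and P
    void remove(int j);                 // remove_j
}

class UserCommands {
    private final SharedFunctions s;
    UserCommands(SharedFunctions s) { this.s = s; }

    // js_{j,k} = cm_{j,k} o add_j o ru_j
    void joinAsSlave(int j, int k) { s.replicateUi(j); s.add(j); s.createMapping(j, k); }

    // bm_j = cm_{j,j} o rm_j o rp_j
    void becomeMaster(int j) { s.replicateProgram(j); s.removeMapping(j); s.createMapping(j, j); }

    // jm_j = cm_{j,j} o rp_j o add_j o ru_j (add_j before rp_j, as discussed above)
    void joinAsMaster(int j) { s.replicateUi(j); s.add(j); s.replicateProgram(j); s.createMapping(j, j); }

    // bs_{j,k} = cm_{j,k} o rm_j
    void becomeSlave(int j, int k) { s.removeMapping(j); s.createMapping(j, k); }

    // qt_j = remove_j o rm_j
    void quit(int j) { s.removeMapping(j); s.remove(j); }
}
```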

To support dynamic joins and leaves, a centralized system must offer the cmj,k o addj o ruj and removej o rmj operations, respectively; and a replicated system must offer the cmj,j o rpj o addj o ruj and removej o rmj operations, respectively. The important and surprising result of the discussion above is that by decomposing these operations into the six shared sub-functions, and then composing them to create the five user commands, arbitrary transitions between centralized, replicated, and hybrid architectures can be supported! Thus our more general mechanism requires a more modular but not larger implementation.

The two complex shared sub-functions here are dynamic replication of the user-interface and program components. We consider in the next section how they are implemented.

6. DYNAMIC REPLICATION There are many possible approaches for implementing the two functions, with their pros and cons, as mentioned later. We describe here the approach we have implemented, whose novelty is its use of logging. It is designed to support dynamic architecture adaptations as an independent service for arbitrary client collaboration systems. Under this approach, dynamic adaptations are implemented by a software module called logger, which records I/O messages from and replays logged messages to the client system, which is called a loggable. Figure 10 shows the relationship between the logger and loggable.


Figure 10. Logger/Loggable Relationship

The logger is an extension of the distributor of [5] and thus is independent of the client collaboration system because it defines a generic format for the I/O messages. Like the client-specific translator of [5], the loggable is expected to translate between these messages and the actual I/O messages communicated between the program and user-interface components, as shown in Figure 10. Each site participating in the collaboration has a loggable and logger module. The loggable exchanges (generic) I/O messages with only the local logger – it never communicates directly with remote loggers. The logger modules on different computers communicate with each other to allow recorded I/O messages to be replayed at arbitrary sites. As we see below, the logger modules have a client-server relationship among themselves – a client module simply forwards (receives) a message to (from) a distinguished server module, which actually records (replays) the message. As we also see below, this relationship changes dynamically in response to architecture changes. Forwarding messages to a centralized logger does not diminish the advantages of the replicated architecture as logging is done asynchronously in a thread that is separate from the input-processing application thread. Thus, the user of a replica still gets the benefit of local processing of input. Nonetheless, it may be possible to improve the performance of the record and replay operations by creating a peer-to-peer logger architecture, which is beyond the scope of this work, as is the more ambitious goal of recursively using the logger to change its own architecture!
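One way to picture this relationship is as a pair of interfaces: the loggable translates the client system's actual I/O protocol to and from generic messages, and the per-site logger records, forwards, and replays those messages. The Java sketch below is illustrative only; the types and signatures are assumptions, not the actual API of the implementation.

```java
// Generic I/O messages exchanged between a loggable and its local logger.
// The loggable translates the client system's actual protocol to and from these.
interface GenericMessage {}
record GenericInput(byte[] payload) implements GenericMessage {}
record GenericOutput(byte[] payload) implements GenericMessage {}

// Client-specific adapter wrapped around an existing collaboration (or single-user) system.
interface Loggable {
    void deliver(GenericMessage m);   // a live or replayed message from the local logger
    boolean readyForNextReplay();     // flow control: do not replay before the client is ready
}

// Per-site logging module; loggers form a client-server relationship among themselves,
// with distinguished input and output loggers holding the actual logs.
interface Logger {
    void fromLoggable(GenericMessage m);  // record locally or forward to the distinguished logger
    void replayTo(Loggable latecomer);    // replay the appropriate log to a new component
}
```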

6.1 Dynamic User-Interface Replication In order to support dynamic replication of a user-interface component, the logger records output messages sent to existing user-interface components, and later, replays them to the new user-interface component, as shown in Figure 11. One of the loggers records in an output log the sequence of output messages sent to a user interface component (1). When a new user-interface component is to be created, the logger replays the recorded output messages to the latecomer’s user interface component through the logger-loggable pair attached to it (2). We have incorporated the two functions, logging and replaying of output messages, in a centralized module called the output log, which is part of a distinguished logger called the output logger. We initially choose as the output logger the logger that starts the collaboration session. If the output logger leaves the collaboration session, we dynamically move the output log to one of the remaining loggers, and make it the output logger.

Figure 11. Output Logging and Replay

The output logger creates a copy of every output message it receives from the local loggable, and sends the copy to the output log installed within it. The output log, in turn, records the output message. When a latecomer joins the collaboration session, the latecomer’s logger initiates replay of recorded output messages by sending a “replay” message to the output log. The output log, then, replays the recorded output messages to the new user interface component through the latecomer’s logger and loggable.
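A minimal version of the output log is just a list of recorded generic output messages plus a replay operation. The sketch below is illustrative, reusing the assumed GenericOutput and Loggable types from the previous sketch; it shows the record step performed by the output logger and the replay step triggered on behalf of a latecomer.

```java
import java.util.ArrayList;
import java.util.List;

// Centralized output log held inside the distinguished output logger (illustrative).
class OutputLog {
    private final List<GenericOutput> recorded = new ArrayList<>();

    // (1) The output logger copies every output message from its local loggable here.
    synchronized void record(GenericOutput out) {
        recorded.add(out);
    }

    // (2) A latecomer's logger requests replay; the recorded outputs are pushed to the
    // new user-interface component through that site's logger/loggable pair. A real
    // implementation would also wait for the loggable's readiness signal (Section 6.3).
    synchronized void replayTo(Loggable latecomerLoggable) {
        for (GenericOutput out : recorded) {
            latecomerLoggable.deliver(out);
        }
    }
}
```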

6.2 Program Replication Dynamic replication of a program component is similarly done except that we now record and replay input messages destined to program components instead of output messages destined to user-interface components.

Figure 12. Input Logging and Replay

As shown in Figure 12, an input log records input messages from the loggables (1). When a new program replica is to be brought up, the input log replays the recorded messages to the replica through the logger-loggable pair attached to it (2). During replay of input messages to the new replica, output messages from the replica to the local user-interface component must be suppressed, as the user interface may already be up-to-date. This happens, for example, if the local site executes the become-master command. If the local user-interface component must also be synchronized, which happens when the join-master command is executed at the site, then the output log described above is replayed to it.
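The input log is symmetric, with one addition: while old input is being replayed to a new replica, the output the replica produces is discarded rather than shown to the (possibly already up-to-date) local user interface. The sketch below, which reuses the types of the earlier sketches, is an illustrative simplification that suppresses output for the whole duration of a synchronous replay; the output-counting refinement needed when replay and live play interleave is discussed in Section 6.3.

```java
import java.util.ArrayList;
import java.util.List;

// Centralized input log held inside the distinguished input logger (illustrative).
class InputLog {
    private final List<GenericInput> recorded = new ArrayList<>();
    private boolean replaying = false;

    synchronized void record(GenericInput in) {
        recorded.add(in);
    }

    synchronized void replayTo(Loggable newReplica) {
        replaying = true;
        try {
            for (GenericInput in : recorded) {
                newReplica.deliver(in);   // the new replica re-executes the old input
            }
        } finally {
            replaying = false;
        }
    }

    // Consulted by the logger for every output message the local replica produces.
    synchronized boolean shouldSuppressOutput() {
        return replaying;
    }
}
```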

As in the output case, we have incorporated the two functions of logging and replaying input messages in a centralized module called an input log, which is part of a distinguished logger called the input logger. Moreover, as in the output case, we initially choose as the input logger the logger that starts the collaboration session. This works because this logger is a master logger initially, and thus has access to up-to-date input messages. It creates a copy of every input message it receives from the local loggable, and sends the copy to the input log installed within it. As in the output case, we must migrate the input log when the input logger’s site leaves the collaboration. In addition, we must also move it when the site gets demoted to a slave. The reason is that a slave site does not see all input events – in particular, it does not receive input entered at remote sites. In summary, when the current input logger is demoted to a slave logger, or leaves the collaboration session, we choose as the next distinguished logger one of the master loggers in the next mapping, and transfer the input log to the new distinguished logger. As this discussion implies, the input and output loggers may reside at different sites.

6.3 Other Issues The logging approach described so far raises several issues, which we cannot comprehensively address in this paper because of space limitations. Below, we briefly describe them and our current approach to addressing them.

• We must ensure that a message is not sent to a dynamically replicated program or user-interface component before it is ready to receive the message. For example, an X client should not be sent an input event regarding a window before it has executed the code to create the window – otherwise, it can crash. Therefore, we allow the loggable to tell the logger when it is ready to receive the next replay message.

• As mentioned earlier, when replaying the input log to a dynamically replicated program component, its output messages must be discarded as the local user-interface component might already be up-to-date. On the other hand, output produced in response to new input fed to the program must be displayed. There is no general way to detect the transition from old to new output for two reasons. First, it is sometimes difficult to determine with which input event an output event is associated. Second, some clients produce different output events in response to the same set of input events as they compress output. Our current approach is to count the number of output messages to detect the transition. In order to deal with programs that do perform compression, we make some assumptions about the kind of compression done, which allow us to accommodate all of the systems we studied.

• Recording and replaying all of the messages provided by the loggable is not efficient, practically limiting the use of logging to short-term collaboration sessions. Therefore, we use a technique to reduce the log size (and thus the time to replay it) to the order of the number of “state objects” of an up-to-date software component [6] (a sketch of this compression appears after this list). By “state objects”, we mean the objects that, we deduce from logged messages, a software component maintains to implement its behavior. In addition, we rely on the loggable to only send state-changing messages to the logger. For example, we expect the loggable to not send cursor positions if they do not change program state. Finally, we do not discard the input log at a site when it is demoted to a slave. Next time it is promoted to a master, it needs to receive only the input entered since it was demoted.

• Even with log compression, applications such as a chat program that must maintain the entire history of the collaboration can create a large log. Concurrent play/replay [5] ensures that collaboration is not paused when this log is replayed to a new replica. This technique ensures that the old replica continues to play live messages until the new replica has replayed all of the recorded messages, with the replay and play occurring concurrently.
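The object-based compression mentioned in the third bullet can be sketched as follows, under the simplifying assumption that every state-changing message carries the complete new state of exactly one state object: only the latest message per object is retained, so the compressed log grows with the number of state objects rather than with the number of messages. The class below is illustrative, not the implementation measured in Section 8.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative object-based log compression: only the most recent message per
// state object is kept, so log size tracks the number of state objects.
class CompressedLog<K, M> {
    // Insertion order is preserved so that replay visits objects in recency order.
    private final Map<K, M> latestPerObject = new LinkedHashMap<>();

    synchronized void record(K stateObjectId, M message) {
        latestPerObject.remove(stateObjectId);   // move the object to the end
        latestPerObject.put(stateObjectId, message);
    }

    synchronized Iterable<M> replayOrder() {
        return latestPerObject.values();
    }
}
```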

7. RELATED WORK There has been significant research whose goal is to vary the architecture according to resource constraints. A good example in CSCW is DACIA [7], which addresses implementation of reconfigurable collaborative applications. Our work has specifically addressed automatic transitions between the centralized, hybrid, and replicated architectures. This idea is also not entirely new. Most collaborative systems, both centralized and replicated, support latecomer accommodation, essentially allowing the architecture to transition from one instance of the centralized or replicated architecture to another instance of the same kind of architecture. In the centralized case, however, the location of the central program replica does not normally change – most systems only allow new user-interface components to be added. An exception is the XTV shared X system, which allows dynamic migration of the centralized X program [5]. The Presence-AR spreadsheet of the Advanced Reality Corporation promises dynamic transition from the replicated to the centralized architecture as the number of users increases beyond some threshold.

The idea of using logging to support state replication is also not new, both in collaborative systems and the general area of distributed systems. For example, XTV uses logging for replicating both the user-interface component [8] and the program component [5]. Examples of log-based distributed systems include Coda [9] and Cygnus [10]. Coda was developed to support mobile access to shared data. When a mobile client disconnects from the network, it logs local changes, and transmits the log to other sites sharing the data when connection to the network is next established. Cygnus uses logging for fault tolerance. Instead of addressing total network disconnection, it addresses cases where a client’s network connection is still intact, but only the remote server providing a service to the client is no longer available. To prepare for possible disconnection from the current server, Cygnus logs client operations. When the current server is no longer available, Cygnus locates another server that provides the same functionality as the previous one, and allows the client to continue to use the functionality after synchronizing the new server through replay of the logged operations.

An alternative to log-based state creation is the image copy method, which copies the image of an incarnation of an object to a remote location. Operating systems supporting process migration use this approach. Process migration has been an active area of research for load balancing in distributed systems. Some of the primary factors considered for migrating processes are the processor loads and communication loads. When a processor in a distributed environment is overloaded, process migration is used to more or less balance loads on distributed processors by moving some processes on the overloaded processor to less utilized ones. Load balancing also tries to reduce communication traffic between processors by putting intensely communicating processes on the same site, thereby decreasing host-to-host communication. Many process migration systems such as MOS [11] use migration-aware kernels to automatically extract and ship process state information to the destination processor, where the process context is rebuilt before resuming execution.

The image copy method is more efficient than the logging method for lazy synchronization because the former transfers the object state, while the latter takes a round-about way of transferring what causes state changes. However, process migration can be supported only in operating systems that support location-independent resource references (e.g. socket descriptors). Most current operating systems do not have this property – the operating systems that supported migration were research systems. It is questionable whether users will take the rather radical step of replacing existing kernels for collaboration.

The image copy method has also been used to support latecomer accommodation in the Suite collaborative system [12], which automatically replicates the shared state on the computer of the latecomer. However, this approach assumes a non-standard architecture for a shared application that exposes the shared state to the collaboration system. The GroupKit [13] replicated collaborative system provides a more flexible but less automatic scheme for latecomer accommodation that requires an existing replica to transfer the collaboration state to a latecomer’s replica. The latecomer accommodation service is flexible in that the existing replica can support image-copy or logging or something that mixes the two. However, it has to be implemented manually by the application programmer.

From the point of view of this paper, the pros and cons of the image-copy and logging approaches are not that important. What is more important is the idea of providing dynamic transitions between arbitrary centralized, hybrid, and replicated architectures, which none of the existing systems support. It is conceivable that a system supporting image-copy could also support such transitions.

8. EVALUATION There are at least two reasons for supporting dynamic architecture adaptations. First, they are necessary to dynamically create certain user configurations that would not be possible otherwise – for example, to allow the site hosting a central program component to leave the collaboration without destroying the session. Second, they can increase the performance of the collaboration – specifically, they can decrease the task completion and response times. We describe below an experiment to verify this claim.

In this experiment, we considered a collaborative session in which a third user with a more powerful computer dynamically joins a session involving two existing users in which the application is centralized on the first user’s computer. We compared two cases: one in which the architecture remains centralized and another where the central program moves to the new user’s computer. We refer to them as the centralized and adaptive cases, respectively. Both cases are dynamic in that the user-interface component must be dynamically replicated. However, the second case is more dynamic in that it requires the program component to migrate. Either case can be used to support the desired user-configuration change. The purpose of this experiment was to show that, of the various architectural transitions supporting a user-configuration change, some can perform better than others.

To get variance in computing powers, we used three different kinds of computers for user1, user2, and user3, respectively: a 206MHz palmtop, a 133MHz laptop, and a 667MHz desktop. The palmtop runs SavaJe, a Java-based operating system, which actually makes it the slowest of the computers we used. In particular, its Java execution registered a performance about two times slower than that of the laptop computer in our experiment.

What matters here is the relative rather than the absolute processing power of the three computers. In this experiment, we have one computer with twice, and another with ten times, the effective performance of the slowest computer, which is a spread one can expect in a collaboration involving mobile and desktop computers.

As the computing power matters mainly when the application is compute-intensive, we chose a checkers program that evaluated moves three levels deep. For the task, we assumed that the game is played by the first two users (against the computer) and the latecomer simply observes it. Our next challenge was to devise a task sequence log that could be used to compare the two architecture cases. The log matters because the task completion time depends on the relative number of commands executed by the users. The only log available to us was one from MITRE [14], but it was for a drawing task. We morphed this log to our task by mapping drawing operations to checkers operations. In both architecture sequences, the third user joins the session right after the 7th input message is processed. In the absence of a wide variety of standard collaboration benchmarks, this is the best we could do. In particular, we believe our approach is better than creating synthetic logs, which is an accepted method of evaluating the performance of collaborative/distributed systems. Moreover, as the goal of this paper is to argue for the general need to research dynamic adaptations rather than to be the definitive word in this area, we believe our approximation is useful. As mentioned above, improving performance is only one reason for supporting dynamic architecture transitions. Accommodating user-configuration changes is another, perhaps more compelling, reason. In fact, as also mentioned above, accommodating a new user in this experiment requires a dynamic architecture transition in both the centralized and adaptive cases. Nonetheless, this experiment makes a case for creating a wide variety of collaboration benchmarks, which will help determine which of the many possible dynamic transitions should be chosen.

Another important experimental factor, as mentioned before, is the network delay between the various computers. All three computers were connected by a 10 Mbps LAN. However, we simulated LAN, WAN and modem connections from the latecomer to the other two computers by adding appropriate network delays to the messages communicated with it. To base the delays on reality, we used the average ping round trip times to actual user sites. We considered four cases:

(a) LAN: The third user is directly connected to the local LAN. In this case, we added no extra delay to the messages communicated with it.

(b) Germany: The third user is directly connected to a network in Germany. In this case we imposed a 72ms delay, which is half of the average round trip time between our LAN and a site in Germany.

(c) Germany through modem: The third user is connected to a network in Germany through a modem connection. To determine the appropriate delay to add in this case, we measured the average round trip time of the modem connection to our LAN, and added its half to the Germany delay, which comes to 162ms.

(d) India: The India case was the same as the Germany case except that the third user was assumed to be in India. The network delay in this case is 370ms.
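Emulating the WAN and modem cases on a LAN only requires holding each message exchanged with the latecomer for the chosen one-way delay before delivering it. The sketch below shows one simple way to do this (illustrative only; the actual experimental harness may differ), using a single-threaded scheduler so that message order is preserved.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustrative delay injector: every message bound for the simulated remote site is
// delivered after a fixed one-way delay (e.g. 72 ms for the Germany case).
class DelayedLink<M> {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long oneWayDelayMillis;
    private final Consumer<M> destination;

    DelayedLink(long oneWayDelayMillis, Consumer<M> destination) {
        this.oneWayDelayMillis = oneWayDelayMillis;
        this.destination = destination;
    }

    void send(M message) {
        // A single scheduler thread with equal delays keeps messages in FIFO order.
        scheduler.schedule(() -> destination.accept(message), oneWayDelayMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() { scheduler.shutdown(); }
}
```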

Figure 13 shows the results of the experiment. In particular, it shows that, based on the network delay, the centralized or adaptive case offers better performance. It shows that the adaptive case performs better over a small range of network delays. Had the latecomer also provided input instead of simply being an observer, this range would have increased. On the other hand, had the replica been originally centralized on user2’s computer, which is twice as fast as user1’s computer, this range would have decreased.

Figure 13. Response, Task Completion and Architecture Configuration Times. All times are in ms, for network delays of 0 ms, 72 ms, 162 ms, and 370 ms from the desktop to the others:
(a) user1’s response time: centralized 284.50, 286.80, 291.39, 288.89; adaptive 170.26, 265.20, 431.93, 723.00
(b) user2’s response time: centralized 159.63, 159.04, 156.56, 163.58; adaptive 98.18, 211.09, 347.66, 690.75
(c) task completion time: centralized 10097.70, 10183.96, 10216.10, 10955.63; adaptive 8222.23, 12047.66, 16686.53, 28114.20
(d) architecture configuration time: centralized 2668.70, 2797.30, 2831.73, 3399.30; adaptive 3377.06, 3712.90, 3677.63, 4782.06

Figure 13(d) compares architecture re-configuration costs included in the task completion times. In the centralized case, the cost includes registering new users and replaying the output log to update the latecomer’s user interface. In the adaptive case, the cost includes all of the centralized case’s cost, plus replaying the input log to update program replicas, and updating the information on dynamic mapping changes. As we see in the experiment, the differences between the two costs are low for this application and task sequence.

This experiment and discussion about it verify the claim that, of the various architectural transitions supporting a user-configuration change, some can perform better than others.

Another issue in the evaluation of this research is the programming cost of interfacing with our log-based mechanism. In order to determine this cost, we composed the logger with 11 different loggables, which include a text field editor, an outline editor, a multiuser design pattern editor, an object-based drawing editor, a Body Mass Index (BMI) calculator, the checkers program, a bike route calculator, JavaBeans, Java’s Swing, XTV, and Virtual Network Computing (VNC). Some of these, such as XTV and VNC, are collaborative systems to which we simply added the ability to make dynamic architecture transitions. Others such as Swing and JavaBeans are single-user systems, to which we also added the ability to interact with multiple users. We measured the amount of code needed to translate between the specific I/O protocol supported by the loggable and the abstract protocol defined by the logger. In most cases, this code was less than 500 lines. The only exceptions were Swing and XTV, which required 1336 and 3600 lines of code, respectively. These numbers are high because of the large number of I/O calls supported by these systems and the fact that XTV and our logger are written in different languages – C and Java, respectively. For an approximate idea of how much effort would be required to manually implement dynamic transitions, we counted the lines of Java code written for the logger (7000). Thus, in most cases, the cost of interfacing with the logger is small in comparison to the cost of implementing its service.

Yet another issue is the size of the logs, which of course depends on the duration of the collaboration and the application. An experiment with an X bitmap editor found that after 2000 X requests had been made, the sizes of the uncompressed and compressed input logs were 150K and 60K, respectively. More importantly, the uncompressed log kept increasing linearly with the number of requests while the compressed log stabilized to 60K. Thus, log size is an important concern and the compression provided by object-based logging is effective.

9. CONCLUSIONS & FUTURE WORK This paper makes several contributions. It introduces, defines, and motivates the notion of making dynamic transitions between arbitrary centralized, replicated, and hybrid architectures, presents a set of individual user commands that are sufficient to specify all of these transitions, identifies a set of functions that are shared by the implementation of these commands, outlines a log-based approach to implementing the two most complex of these functions, compares this implementation approach with previous work, shows that dynamic architecture adaptations are necessary to enable certain kinds of user-configuration changes and can also improve the performance of the collaboration, and also demonstrates that the log-based implementation does not impose substantial overhead.

As indicated by the title of the paper, its goal is to motivate the research of dynamic adaptations as a first-class issue. There are several future directions including analytical equations for the performance of alternative architectures, a large range of benchmarks and experiments to validate these equations, policies for automatically choosing the architecture based on these equations and experiments, and ability to tolerate faults during log record and replay.

10. ACKNOWLEDGMENTS This research was funded in part by Microsoft and NSF grants ANI 0229998, EIA 03-03590, and IIS 0312328. The comments of the referees and Sasa Junuzovic strengthened the paper.

11. REFERENCES [1] Dewan, P., Architectures for Collaborative Applications, Trends in Software: Computer Supported Cooperative Work, Volume 7, 1998, pages 165-194.

[2] Ahuja, S., Ensor, J.R., and Lucco, S.E. A comparison of application sharing mechanisms in real-time desktop conferencing systems. in ACM Conference on Office Information Systems. 1990.

[3] Chung, G. and Dewan, P. Flexible Support for Application-Sharing Architecture. in Proc. European Conference on Computer Supported Cooperative Work. 2001. Bonn.

[4] Lantz, K.A., et al., Reference Models, Window Systems, and Concurrency. Computer Graphics, April 1987. 21(2).

[5] Chung, G. and Dewan, P. A Mechanism for Supporting Client Migration in a Shared Window System. in Proceedings of the Ninth Conference on User Interface Software and Technology. October 1996.

[6] Chung, G., Dewan, P. and Rajaram, S., Generic and Composable Latecomer Accommodation Service for Centralized Shared Systems, IFIP Working Conference on Engineering for Human-Computer Interaction, September 1998.

[7] Little, R. & Prakash, A., Developing Adaptive Groupware Applications Using a Mobile Component Framework, Proceedings of ACM Computer Supported Cooperative Work, 2000.

[8] Chung, G., Jeffay, K., and Abdel-Wahab, H. Accommodating Latecomers in Shared Window Systems. IEEE Computer, January 1993. 26(1): p. 72-73.

[9] Kistler, J.J. and Satyanarayanan, M. Disconnected Operation in the Coda File System. ACM Transactions on Computer Systems, February 1992. 10(1): p. 3-25.

[10] Chang, R.N. and Ravishankar, C.V. A Service Acquisition Mechanism for Server-Based Heterogeneous Distributed Systems. in IEEE Transactions on Parallel and Distributed Systems. February 1994.

[11] Barak, A. and Litman, A. Mos: A Multicomputer Distributed Operating System. Software Practice and Experience, August 1985. 15(8): p. 725-737.

[12] Dewan, P. and Choudhary, R. A High-Level and Flexible Framework for Implementing Multiuser User Interfaces. ACM Transactions on Information Systems, October 1992. 10(4): p. 345-380.

[13] Roseman, M. and Greenberg, S. Building Real-Time Groupware with GroupKit, A Groupware Toolkit. ACM Transactions on Computer-Human Interaction, 1996. 3(1).

[14] Cugini, J., et al. Methodology for Evaluation of Collaboration Systems. in http://www.aantd.nist.gov/~icv-ewg/documents/meth_index.html.
