
Remote Control: Distributed Application Configuration, Management, and Visualization with Plush

Jeannie Albrecht
Department of Computer Science, Williams College
[email protected]

Ryan Braud, Darren Dao, Nikolay Topilski, Christopher Tuttle, Alex C. Snoeren, and Amin Vahdat
Department of Computer Science and Engineering, University of California, San Diego
{rbraud, hdao, ntopilsk, ctuttle, snoeren, vahdat}@cs.ucsd.edu

Abstract

Support for distributed application management in large-scale networked environments remains in its early stages. Although a number of solutions exist for subtasks of application deployment, monitoring, maintenance, and visualization in distributed environments, few tools provide a unified framework for application management. Many of the existing tools address the management needs of a single type of application or service that runs in a specific environment, and these tools are not adaptable enough to be used for other applications or platforms. In this paper, we present the design and implementation of Plush, a fully configurable application management infrastructure designed to meet the general requirements of several different classes of distributed applications and execution environments. Plush allows developers to specifically define the flow of control needed by their computations using application building blocks. Through an extensible resource management interface, Plush supports execution in a variety of environments, including both live deployment platforms and emulated clusters. To gain an understanding of how Plush manages different classes of distributed applications, we take a closer look at specific applications and evaluate how Plush provides support for each.

1 Introduction

Managing distributed applications involves deploying, configuring, executing, and debugging software running on multiple computers simultaneously. Particularly for applications running on resources that are spread across the wide-area, distributed application management is a time-consuming and error-prone process. After the initial deployment of the software, the applications need mechanisms for detecting and recovering from the inevitable failures and problems endemic to distributed environments. To achieve availability and reliability, applications must be carefully monitored and controlled to ensure continued operation and sustained performance. Operators in charge of deploying and managing these applications face a daunting list of challenges: discovering and acquiring appropriate resources for hosting the application, distributing the necessary software, and appropriately configuring the resources (and re-configuring them if operating conditions change). It is not surprising, then, that a number of tools have been developed to address various aspects of the process in distributed environments, but no solution yet flexibly automates the application deployment and management process across all environments.

Presently, most researchers who want to evaluate their applications in wide-area distributed environments take one of three management approaches. On PlanetLab [6, 26], service operators address deployment and monitoring in an ad hoc, application-specific fashion using customized scripts. Grid researchers, on the other hand, leverage one or more toolkits (such as the Globus Toolkit [12]) for application development and deployment. These toolkits often require tight integration with not only the infrastructure, but the application itself. Hence, applications must be custom tailored for a given toolkit, and cannot easily be run in other environments. Similarly, system administrators who are responsible for configuring resources in machine-room settings often use remote execution tools such as cfengine [9] for managing and configuring networks of machines. As in the other two approaches, however, the configuration files are tailored to a specific environment and a particular set of resources, and thus are not easily extended to other platforms.

Motivated by the limitations of existing approaches, we believe that a unified set of abstractions for achieving availability, scalability, and fault tolerance can be applied to a broad range of distributed applications, shielding developers from some of the complexities of large-scale networked environments. The primary goal of our research is to understand these abstractions and define interfaces for specifying and managing distributed computations run in any execution environment. We are not trying to build another toolkit for managing distributed applications. Rather, we hope to define the way users think about their applications, regardless of their target platform. We took inspiration from classical operating systems like UNIX [28] which defined the standard abstractions for managing applications: files, processes, pipes, etc. For most users, communication with these abstractions is simplified through the use of a shell or command-line interpreter. Of course, distributed computations are both more difficult to specify, because of heterogeneous hardware and software bases, and more difficult to manage, because of failure conditions and variable host and network attributes. Further, many distributed testbeds do not provide global file system abstractions, which complicates data management.

To this end, we present Plush [27], a generic application management infrastructure that provides a unified set of abstractions for specifying, deploying, and monitoring distributed applications. Although Plush was initially designed to support applications running on PlanetLab [2], Plush now provides extensions that allow users to manage distributed applications in a variety of computing environments. Plush users describe distributed computations using an extensible application specification language. In contrast to other application management systems, however, the language allows users to customize various aspects of the deployment life cycle to fit the needs of an application and its target infrastructure. Users can, for example, specify a particular resource discovery service to use during application deployment. Plush also provides extensive failure management support to automatically adapt to failures in the application and the underlying computational infrastructure. Users interact with Plush through a simple command-line interface or a graphical user interface (GUI). Additionally, Plush exports an XML-RPC interface that allows users to programmatically integrate their applications with Plush if desired.
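As a rough illustration of the programmatic path, the sketch below drives a Plush controller over XML-RPC from Python. The port, method names, and argument shapes are hypothetical placeholders; the paper does not document the actual RPC surface, so this is only a sketch of the integration pattern.

```python
# Hypothetical sketch of using Plush's XML-RPC interface from a script.
# The URL, port, and method names below are illustrative assumptions,
# not Plush's documented API.
import xmlrpc.client

# Address of the controller's XML-RPC listener (port is an assumption).
controller = xmlrpc.client.ServerProxy("http://localhost:5738/")

# Load an XML application specification, start it, and poll its status.
with open("file_distribution.xml") as f:
    controller.loadSpecification(f.read())   # placeholder method name
controller.runApplication()                  # placeholder method name
print(controller.getStatus())                # placeholder method name
```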

Plush provides abstractions for managing resource discovery and acquisition, software distribution, and process execution in a variety of distributed environments. Applications are specified using combinations of Plush application “building blocks” that define a custom control flow. Once an application is running, Plush monitors it for failures or application-level errors for the duration of its execution. Upon detecting a problem, Plush performs a number of user-configurable recovery actions, such as restarting the application, automatically reconfiguring it, or even searching for alternate resources. For applications requiring wide-area synchronization, Plush provides several efficient synchronization primitives in the form of partial barriers, which help applications achieve better performance and robustness in failure-prone environments [1].

The remainder of this paper discusses the architecture of Plush. We motivate the design in Section 2 by enumerating a set of general requirements for managing distributed applications. Section 3 details the design and implementation of Plush. We provide specific application case studies and uses of Plush in Section 4 before discussing related work in Section 5, and wrapping up in Section 6.

2 Application Management Requirements

To better understand the requirements of a distributed application management framework, we first consider how we might run a specific application in a widely-used distributed environment. In particular, we investigate the process of running SWORD [23], a publicly-available resource discovery service, on PlanetLab. SWORD uses a distributed hash table (DHT) for storing data, and aims to run on as many hosts as possible, as long as the hosts provide some minimum level of availability (since SWORD provides a service to other PlanetLab users). Before starting SWORD, we have to find and gain access to PlanetLab machines capable of hosting the service. Since SWORD is most concerned with reliability, it does not necessarily need powerful machines, but it must avoid nodes that frequently perform poorly over a relatively long time. We locate reliable machines using a tool like CoMon [25], which monitors resource usage on PlanetLab, and then we install the SWORD software on those machines. This involves downloading the SWORD software package on each host individually, unpacking the software, and installing any software dependencies, including a Java Runtime Environment. After the software has been installed on all of the selected machines, we start the SWORD execution. Recall that reliability is important to SWORD, so if an error or failure occurs at any point, we need to quickly detect it (perhaps using custom scripts and cron jobs) and restore the service to maintain high availability.

Running SWORD on PlanetLab is an example of a specific distributed application deployment. The low-level details of managing distributed applications in general largely depend on the characteristics of the target application and environment. For example, long-running services such as SWORD prefer reliable machines and attempt to dynamically recover from failures to ensure high availability. On the other hand, short-lived scientific parallel applications (e.g., EMAN [18]) prefer powerful machines with high bandwidth/low latency network connections. Long term reliability is not a huge concern for these applications, since they have short execution times. At a high level, however, if we ignore the complexities associated with resource management, the requirements for managing distributed applications are largely similar for all applications and environments. Rather than reinvent the same infrastructure for each class separately, our goal is to identify common abstractions that support the execution of many types of distributed applications, and to build an application-management infrastructure that supports the general requirements of all applications. In this section, we identify these general requirements for distributed application management.

Specification. A generic application controller must allow application operators to customize the control flow for each application. This specification is an abstraction that describes distributed computations. A specification identifies all aspects of the execution and environment needed to successfully deploy, manage, and maintain an application, including the software required to run the application, the processes that will run on each machine, the resources required to achieve the desired performance, and any environment-specific execution parameters. User credentials for resources must also be included in the application specification in order to obtain access to resources. To manage complex multi-phased computations, such as scientific parallel applications, the specification must support application synchronization requirements. Similarly, distributing computations among pools of machines requires a way to specify a workflow—a collection of tasks that must be completed in a given order—within an application specification.

The complexity of distributed applications varies greatly from simple, single-process applications to elaborate, parallel applications. Thus the challenge is to define a specification language abstraction that provides enough expressibility for complex distributed applications, but is not too complicated for single-process computations. In short, the language must be simple enough for novice application developers to understand, yet expose enough advanced functionality to run complex scenarios.

Resource Discovery and Acquisition. Another key abstraction in distributed applications is resources. Put simply, resources are computing devices that are connected to a network and are capable of hosting an application. Because resources in distributed environments are often heterogeneous, application developers naturally want to find the resource set that best satisfies the demands of their application. Even if hardware is largely homogeneous, dynamic resource characteristics such as available bandwidth or CPU load can vary over time. The goal of resource discovery is to find the best current set of resources for the distributed application as described in the specification. In environments that support dynamic virtual machine instantiation [5, 30], these resources may not exist in advance. Thus, resource discovery in this case involves finding the appropriate physical machines to host the virtual machine configurations.

Resource discovery systems often interact directly with resource acquisition systems. Resource acquisition involves obtaining a lease or permission to use the desired resources. Depending on the execution environment, acquisition can take a number of forms. For example, to support advanced resource reservations as in a batch pool, resource acquisition is responsible for submitting a resource request and subsequently obtaining a lease from the scheduler. In virtual machine environments, resource acquisition may involve instantiating virtual machines, verifying their successful creation, and gathering the appropriate information (e.g., IP address, authentication keys) required for access. The challenge facing an application management framework is to provide a generic resource-management interface. Ultimately, the complexities associated with creating and gaining access to physical or virtual resources should be hidden from the application developer.

Deployment. Upon obtaining an appropriate set of resources, the application-deployment abstraction defines the steps required to prepare the resources with the correct software and data files, and run any necessary executables to start the application. This involves copying, unpacking, and installing the software on the target hosts. The application controller must support a variety of different file-transfer mechanisms for each environment, and should react to failures that occur during the transfer of software or in starting executables.

One important aspect of application deployment is configuring the requested number of resources with compatible versions of the software. Ensuring that a minimum number of resources are available and correctly configured for a computation may involve requesting new resources from the resource discovery and acquisition systems to compensate for failures that occur at startup. Further, many applications require some form of synchronization across hosts to guarantee that various phases of computation start at approximately the same time. Thus, the application controller must provide mechanisms for loose synchronization.

Maintenance. Perhaps the most difficult requirement for managing distributed applications is monitoring and maintaining an application after execution begins. Thus, another abstraction that the application controller must define is support for customizable application maintenance. One key aspect of maintenance is application and resource monitoring, which involves probing hosts for failure due to network outages or hardware malfunctions, and querying applications for indications of failure (often requiring hooks into application-specific code for observing the progress of an execution). Such monitoring allows for more specific error reporting and simplifies the debugging process.

In some cases, system failures may result in a situation where application requirements can no longer be met. For example, if an application is initially configured to be deployed on 50 resources, but only 48 can be contacted at a certain point in time, the application controller should adapt the application, if possible, and continue executing with only 48 machines. Similarly, different applications have different policies and requirements with respect to failure recovery. Some applications may be able to simply restart a failed process on a single host, while others may require the entire execution to abort in the case of failure. Thus, in addition to the other features previously described, the application controller should support a variety of options for failure recovery.

3 Plush: Design and Implementation

We now describe Plush, an extensible distributed application controller, designed to address the requirements of large-scale distributed application management discussed in Section 2. To directly monitor and control distributed applications, Plush itself must be distributed. Plush uses a client-server architecture, with clients running on each resource (e.g., machine) involved in the application. The Plush server, called the controller, interprets input from the user (i.e., the person running the application) and sends messages on behalf of the user over an overlay network (typically a tree) to Plush clients. The controller, typically run from the user's workstation, directs the flow of control throughout the life of the distributed application. The clients run alongside each application component across the network and perform actions based upon instructions received from the controller.

Figure 1a shows an overview of the Plush controller architecture. (The client architecture is symmetric to the controller with only minor differences in functionality.) The architecture consists of three main sub-systems: the application specification, core functional units, and user interface. Plush parses the application specification provided by the user and stores internal data structures and objects specific to the application being run.


[Figure 1 appears here as two diagrams, (a) and (b), described in the caption below.]

Figure 1: (a) The architecture of Plush. The user interface is shown above the rest of the architecture and contains methods for interacting with all boxes in the lower sub-systems of Plush. Boxes below the user interface and above the dotted line indicate objects defined within the application specification abstraction. Boxes below the line represent the core functional units of Plush. (b) Example file-distribution application comprised of application, component, process, and barrier blocks in Plush. Arrows indicate control-flow dependencies (i.e., X → Y implies that X must complete before Y starts).

The core functional units then manipulate and act on the objects defined by the application specification to run the application. The functional units also store authentication information, monitor physical machines, handle event and timer actions, and maintain the communication infrastructure that enables the controller to query the status of the distributed application on the clients. The user interface provides the functionality needed to interact with the other parts of the architecture, allowing the user to maintain and manipulate the application during execution. In this section, we describe the design and implementation details of each of the Plush sub-systems. (Note that the components within the sub-systems are highlighted using boldface throughout the remainder of this section.)

3.1 Application Specification

Developing a complete, yet accessible, application specification language was one of the principal challenges in this work. Our approach, which has evolved over the past three years, consists of combinations of five different abstractions:

1. Process blocks - Describe the processes executed on each machine involved in an application. The process abstraction includes runtime parameters, path variables, runtime environment details, file and process I/O information, and the specific commands needed to start a process on a remote machine.

2. Barrier blocks - Describe the barriers that are used to synchronize the various phases of execution within a distributed application.

3. Workflow blocks - Describe the flow of data in a distributed computation, including how the data should be processed. Workflow blocks may contain process and barrier blocks. For example, a workflow block might describe a set of input files over which a process or barrier block will iterate during execution.

4. Component blocks - Describe the groups of resources required to run the application. This includes expectations specific to a set of metrics for the target resources. In the case of compute nodes in a cluster, for example, these metrics might include maximum load requirements and minimum free memory requirements. Components also define required software configurations, installation instructions, and any authentication information needed to access the resources. Component blocks may contain workflow blocks, process blocks, and barrier blocks.

5. Application blocks - Describe high-level information about a distributed application. This includes one or many component blocks, as well as attributes to help automate failure recovery.

To better illustrate the use of these blocks in Plush, consider building the specification for a simple file-distribution application as shown in Figure 1b. This application consists of two groups of machines. One group, the senders, stores the files, and the second group, the receivers, attempts to retrieve the files from the senders. The goal of the application is to experiment with the use of an overlay network to send files from the senders to the receivers using some new file-distribution protocol. In this example, there are two phases of execution. In the first phase, all senders and receivers join the overlay before any transfers begin, and the senders must prepare the files for transfer before the receivers start receiving files. In the second phase, the receivers begin receiving the files. No new senders or receivers are allowed to join the network during the second phase.


The first step in building the corresponding Plush application specification for our new file-distribution protocol is to define an application block. The application block defines general characteristics about the application including liveness properties and failure detection and recovery options, which determine default failure recovery behavior. For this example, we choose the behavior “restart-on-failure,” which attempts to restart the failed application instance on a single host, since it is not necessary to abort the entire application across all hosts if only a single failure occurs.

The application block also contains one or many component blocks that describe the groups of resources (i.e., machines) required to run the application. Our application consists of a set of senders and a set of receivers, and two separate component blocks describe the two groups of machines. The sender component block defines the location and installation instructions for the sender software, and includes authentication information to access the resources. Similarly, the receiver component block defines the receiver software package. In our example, it may be desirable to require that all machines in the sender group have a processor speed of at least 1 GHz, and each sender should have sufficient bandwidth for sending files to multiple receivers at once. These types of machine-specific requirements are included in the component blocks. Within each component block, a combination of workflow, process, and barrier blocks describes the distributed computation. (Although our example does not use workflow blocks, they are used in applications where data files must be distributed and iteratively processed.)

Plush process blocks describe the specific commands required to execute the application. Most process blocks depend on the successful installation of software packages defined in the component blocks. Users specify the commands required to start a given process, and actions to take upon process exit. The exit policies create a Plush process monitor that oversees the execution of a specific process. Our example has several process blocks. In the sender component, process blocks define processes for preparing the files, joining the overlay, and sending the files. Similarly, the receiver component contains process blocks for joining the overlay and receiving the files.

Some applications operate in phases, producing output files in early stages that are used as input files in later stages. To ensure all hosts start each phase of computation only after the previous phase completes, barrier blocks define loose synchronization semantics between process and workflow blocks. In our example, a barrier ensures that all receivers and senders join the overlay in phase one before beginning the file transfer in phase two. Note that although each barrier block is uniquely defined within a component block, it is possible for the same barrier to be referenced in multiple component blocks. Our example uses barrier blocks within each component block that refer to the same barrier, which means that the application will wait for all receivers and senders to reach the barrier before allowing either component to start sending or receiving files.


In Figure 1b, the outer application block contains our two component blocks that run in parallel (since there are no arrows indicating control-flow dependencies between them). Within the component blocks, the different phases are separated by the bootstrap barrier that is defined by barrier block 1. Component block 1, which describes the senders, contains process blocks 1 and 2 that define perl scripts that run in parallel during phase one, synchronize on the barrier in barrier block 1, and then proceed to process block 3 in phase two, which sends the files. Component block 2, which describes the receivers, runs process block 1 in phase one, synchronizes on the barrier in barrier block 1, and then proceeds to process block 2 in phase two, which runs the process that receives the files. In our implementation, the blocks are represented by XML that is parsed by the Plush controller when the application is run. We show an example of the XML in Section 4.

We designed the Plush application specification to support a variety of execution patterns. With the blocks described above, Plush supports the arbitrary combination of processes, barriers, and workflows, provided that the flow of control between them forms a directed acyclic graph. Using predecessor tags in Plush, users specify the flow of control and define whether processes run in parallel or sequentially. Arrows between blocks in Figure 1b, for example, indicate the predecessor dependencies. (Process blocks 1 and 2 in component block 1 will run in parallel before blocking at the bootstrap barrier, and then the execution will continue on to process block 3 after the bootstrap barrier releases.) Internally, Plush stores the blocks in a hierarchical data structure, and references specific blocks in a manner similar to referencing absolute paths in a UNIX file system. Figure 1b shows the unique path names for each block from our file-distribution example. This naming abstraction also simplifies coordination among remote hosts. Each Plush client maintains an identical local copy of the application specification. Thus, for communication regarding control flow changes, the controller sends the clients messages indicating which “block” is currently being executed, and the clients update their local state information accordingly.
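To make the block structure concrete, the fragment below sketches how the file-distribution example of Figure 1b might be written down. The element and attribute names are illustrative only; the actual Plush XML schema is not reproduced here (an example appears in Section 4), but the nesting and the predecessor relationships mirror Figure 1b.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: element and attribute names approximate the
     block abstractions of Figure 1b, not the exact Plush schema. -->
<plush>
  <application name="app" failure_policy="restart-on-failure">
    <!-- Senders (component block 1): prepare files and join the overlay,
         synchronize on the bootstrap barrier, then send. -->
    <component name="comp1">
      <process name="proc1" cmd="prepare_files.pl"/>
      <process name="proc2" cmd="join_overlay.pl"/>
      <barrier name="barr1" id="bootstrap_barrier" predecessors="proc1 proc2"/>
      <process name="proc3" cmd="send_files.pl" predecessors="barr1"/>
    </component>
    <!-- Receivers (component block 2): join the overlay, synchronize on the
         same bootstrap barrier, then receive. -->
    <component name="comp2">
      <process name="proc1" cmd="join_overlay.pl"/>
      <barrier name="barr1" id="bootstrap_barrier" predecessors="proc1"/>
      <process name="proc2" cmd="receive_files.pl" predecessors="barr1"/>
    </component>
  </application>
</plush>
```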

3.2 Core Functional Units

After parsing the block abstractions defined by the user within the application specification, Plush instantiates a set of core functional units to perform the operations required to configure and deploy the distributed application. Figure 1a shows these units as shaded boxes below the dotted line. The functional units manipulate the objects defined in the application specification to manage distributed applications. In this section, we describe the role of each of these units.

Starting at the highest level, the Plush resource discovery and acquisition unit uses the resource requirements in the component blocks to locate and create (if necessary) resources on behalf of the user. The resource discovery and acquisition unit is responsible for obtaining a valid set, called a matching, of resources that meet the application's demands. To determine this matching, Plush may either call an existing external service to construct a resource pool, such as SWORD or CoMon for PlanetLab, or use a statically defined resource pool based on information provided by the user. The Plush resource matcher then uses the resources in the resource pool to create a matching for the application. All hosts involved in an application run a Plush host monitor that periodically publishes information about the host. The resource discovery and acquisition unit may use this information to help find the best matching. Upon acquiring a resource, a Plush resource manager stores the lease, token, or any necessary user credential needed for accessing that resource to allow Plush to perform actions on behalf of the user in the future.
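A minimal sketch of the matching step follows, assuming the resource pool is a list of per-host records (as a host monitor or CoMon might report) and a component block supplies simple threshold requirements. The record fields and thresholds are illustrative; the real matcher is richer than this.

```python
def find_matching(resource_pool, requirements, num_hosts):
    """Pick hosts from the pool that satisfy simple threshold requirements.

    resource_pool: list of dicts, e.g. {"host": "plab1", "load": 1.2, "free_mem_mb": 512}
    requirements:  dict of thresholds, e.g. {"max_load": 5.0, "min_free_mem_mb": 256}
    """
    candidates = [
        r for r in resource_pool
        if r["load"] <= requirements["max_load"]
        and r["free_mem_mb"] >= requirements["min_free_mem_mb"]
    ]
    # Prefer the least-loaded hosts; fail if the pool cannot satisfy the request.
    candidates.sort(key=lambda r: r["load"])
    if len(candidates) < num_hosts:
        raise RuntimeError("not enough acceptable resources for this component")
    return candidates[:num_hosts]
```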

The remaining functional units in Figure 1a are responsible for application deployment and maintenance. These units connect to resources, install required software, start the execution, and monitor the execution for failures. One important functional unit used for these operations is the Plush barrier manager, which provides advanced synchronization services for Plush and the application itself. In our experience, traditional barriers [17] are not well suited for volatile, wide-area network conditions; the semantics are simply too strict. Instead, Plush uses partial barriers, which are designed to perform better in volatile environments [1]. Partial barriers ensure that the execution makes forward progress in the face of failures, and improve performance in failure-prone environments using relaxed synchronization semantics.
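To make the relaxed semantics concrete, here is a toy, single-process sketch of a partial barrier that releases waiters once a target fraction of participants arrive or once a timeout expires, whichever comes first. The actual primitives in [1] are distributed and considerably richer; the fraction and timeout values are arbitrary.

```python
import threading, time

class PartialBarrier:
    """Toy partial barrier: release when `fraction` of `expected` participants
    arrive, or after `timeout` seconds, whichever comes first."""
    def __init__(self, expected, fraction=0.9, timeout=60.0):
        self.expected, self.fraction = expected, fraction
        self.arrived = 0
        self.deadline = time.time() + timeout
        self.cond = threading.Condition()

    def enter(self):
        with self.cond:
            self.arrived += 1
            self.cond.notify_all()       # wake waiters to re-check the threshold
            while (self.arrived < self.fraction * self.expected
                   and time.time() < self.deadline):
                self.cond.wait(timeout=max(0.0, self.deadline - time.time()))
        # Caller proceeds; stragglers are released when the deadline passes.
```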

The Plush file manager handles all files required by a distributed application. This unit contains information regarding software packages, file transfer methods, installation instructions, and workflow data files. The file manager is responsible for preparing the physical resources for execution using the information provided by the application specification. It monitors the status of file transfers and installations, and if it detects an error or failure, the controller is notified and the resource discovery and acquisition unit may be required to find a new host to replace the failed one.

Once the resources are prepared with the necessary software, the application deployment phase completes by starting the execution. This is accomplished by starting a number of processes on remote hosts. Plush processes are defined within process blocks in the application specification. A Plush process is an abstraction for standard UNIX processes that run on multiple hosts. Processes require information about the runtime environment needed for an execution including the working directory, path, environment variables, file I/O, and the command line arguments.

The two lowest layers of the Plush architecture consist of a communication fabric and the I/O and timer subsystems. The communication fabric handles passing and receiving messages among Plush overlay participants. Participants communicate over TCP connections. The default topology for a Plush overlay is a star, although we also provide support for tree topologies for increased scalability (see Section 3.3.2). In the case of a star topology, all clients connect directly to the controller, which allows for quick failure detection and recovery. The controller sends messages to the clients instructing them to perform certain actions. When the clients complete their tasks, they report back to the controller for further direction. The communication fabric at the controller knows what hosts are involved in a particular application instance, so that the appropriate messages reach all necessary hosts.

At the bottom of all of the other units is the Plush I/O and timer abstraction. As messages are received in the communication fabric, message handlers fire events. These events are sent to the I/O and timer layer and enter a queue. The event loop pulls events off the queue, and calls the appropriate event handler. Timers are a special type of event in Plush that fire at a predefined instant.
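A compact sketch of this event-and-timer pattern is shown below, using a priority queue keyed by firing time so that timers are just events scheduled for a later instant. This is a generic illustration of the pattern, not code from the Plush implementation.

```python
import heapq, time

class EventLoop:
    """Minimal event loop: handlers post events; a timer is an event that
    fires at a predefined instant."""
    def __init__(self):
        self.queue = []   # heap of (fire_time, seq, callback)
        self.seq = 0      # tie-breaker so callbacks never get compared

    def post(self, callback, delay=0.0):
        heapq.heappush(self.queue, (time.time() + delay, self.seq, callback))
        self.seq += 1

    def run(self):
        while self.queue:
            fire_time, _, callback = heapq.heappop(self.queue)
            time.sleep(max(0.0, fire_time - time.time()))  # wait for timer events
            callback()

# Example: a message handler posts an event; a timer fires two seconds later.
loop = EventLoop()
loop.post(lambda: print("handle PREPARED message"))
loop.post(lambda: print("timer fired"), delay=2.0)
loop.run()
```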

3.3 Fault Tolerance and Scalability

Two of the biggest challenges that we encountered during the design of Plush were being robust to failures and scaling to hundreds of machines spread across the wide-area. In this section we explore fault tolerance and scalability in Plush.

3.3.1 Fault tolerance

Plush must be robust to the variety of failures that occur during application execution. When designing Plush, we aimed to provide the functionality needed to detect and recover from most failures without ever needing to involve the user running the application. Rather than enumerate all possible failures that may occur, we will discuss how Plush handles three common failure classes—process, host, and controller failures.

Process failures. When a remote host starts a process defined in a process block, Plush attaches a process monitor to the process. The role of the process monitor is to catch any signals raised by the process, and to react appropriately. When a process exits either due to successful completion or error, the process monitor sends a message to the controller indicating that the process has exited, and includes its exit status. Plush defines a default set of behaviors that occur in response to a variety of exit codes (although these can be overridden within an application specification). The default behaviors include ignoring the failure, restarting only the failed process, restarting the application, or aborting the entire application.

In addition to process failures, Plush also allows users to monitor the status of a process that is still running through a specific type of process monitor called a liveness monitor, whose goal is to detect misbehaving and unresponsive processes that get stuck in loops and never exit. This is especially useful in the case of long-running services that are not closely monitored by the user. To use the liveness monitor, the user specifies a script and a time interval in the process block of the application specification. The liveness monitor wakes up once per time interval and runs the script to test for the liveness of the application, returning either success or failure. If the test fails, the Plush client kills the process, causing the process monitor to be alerted and inform the controller.
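As a rough sketch of the liveness-monitor idea, the loop below periodically runs a user-supplied test script and kills the monitored process if the test fails, so that the process monitor observes the exit and reports it. The script path, interval, and kill signal are illustrative assumptions; the actual Plush client logic is not reproduced here.

```python
import os, signal, subprocess, time

def liveness_monitor(pid, test_script, interval_sec):
    """Run `test_script` every `interval_sec` seconds; if it exits non-zero,
    kill process `pid` so the process monitor reports the failure."""
    while True:
        time.sleep(interval_sec)
        result = subprocess.run([test_script])        # user-supplied liveness test
        if result.returncode != 0:
            os.kill(pid, signal.SIGKILL)              # process monitor sees the exit
            return "killed"
        try:
            os.kill(pid, 0)                           # probe: raises if pid is gone
        except OSError:
            return "exited"                           # process already finished on its own
```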

Remote host failures. Detecting and reacting to process failures is straightforward since the controller is able to communicate information to the client regarding the appropriate recovery action. When a host fails, however, recovering is more difficult. A host may fail for a number of reasons, including network outages, hardware problems, and power loss. Under all of these conditions, the goal of Plush is to quickly detect the problem and reconfigure the application with a new set of resources to continue execution. The Plush controller maintains a list of the last time successful communication occurred with each connected client. If the controller does not hear from a client within a specified time interval, the controller sends a ping to the client. If the controller does not receive a response from the client, we assume host failure. Reliable failure detection is an active area of research; while the simple technique we employ has been sufficient thus far, we intend to leverage advances in this space where appropriate.

There are three possible actions in response to a host failure: restart, rematch, and abort. By default, the controller tries all three actions in order. The first and easiest way to recover from a host failure is to simply reconnect and restart the application on the failed host. This technique works if the host experiences a temporary power or network outage, and is only unreachable for a short period of time. If the controller is unable to reconnect to the host, the next option is to rematch in an attempt to replace the failed host with a different host. In this case, Plush reruns the resource matcher to find a new machine. Depending on the application, the entire execution may need to be restarted across all hosts after the new host joins the control overlay, or the execution may only need to be started on the new host. If the controller is unable to find a new host to replace the failed host, Plush aborts the entire application.
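The escalation policy just described can be summarized in a few lines. In the sketch below, the `try_reconnect` and `find_replacement` callables stand in for the controller's reconnection logic and the resource matcher; they are placeholders, and the surrounding code is illustrative rather than Plush's implementation.

```python
def handle_host_failure(host, app, try_reconnect, find_replacement):
    """Escalate through restart, rematch, and abort, in that order."""
    if try_reconnect(host):               # 1. restart: reconnect to the same host
        app.restart_on(host)
        return "restarted"
    replacement = find_replacement(host)  # 2. rematch: re-run the resource matcher
    if replacement is not None:
        app.replace(host, replacement)
        return "rematched"
    app.abort()                           # 3. abort: no acceptable replacement exists
    return "aborted"
```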

In some applications, it is desirable to mark a host as failed when it becomes overloaded or experiences poor network connectivity. The Plush host monitor that runs on each machine is responsible for periodically informing the controller about each machine's status. If the controller determines that the performance is less than the application tolerates, it marks the host as failed and attempts to rematch. This functionality is a preference specified at startup. Although Plush currently monitors host-level metrics including load and free memory, the technique is easily extended to encompass sophisticated application-level expectations of host viability.

Controller failures. Because the controller is responsible for managing the flow of control across all connected clients, recovering from a failure at the controller is difficult. One solution is to use a simple primary-backup scheme, where multiple controllers increase reliability. All messages sent from the clients and primary controller are sent to the backup controllers as well. If a pre-determined amount of time passes and the backup controllers do not receive any messages from the primary, the primary is assumed to have failed. The first backup becomes the primary, and execution continues.

This strategy has several drawbacks. First, it causes extra messages to be sent over the network, which limits the scalability of Plush. Second, this approach does not perform well when a network partition occurs. During a network partition, multiple controllers may become the primary controller for subsets of the clients initially involved in the application. Once the network partition is resolved, it is difficult to reestablish consistency among all hosts. While we have implemented this architecture, we are currently exploring other possibilities.

3.3.2 Scalability

In addition to fault tolerance, an application controller designed for large-scale environments must scale to hundreds or even thousands of participants. Unfortunately there is a tradeoff between performance and scalability. The solutions that perform the best at moderate scale typically do not scale as well as solutions with lower performance. To balance scalability and performance, Plush provides users with two topological alternatives.

By default, all Plush clients connect directly to the controller, forming a star topology. This architecture scales to approximately 300 remote hosts, limited by the number of file descriptors allowed per process on the controller machine in addition to the bandwidth, CPU, and latency required to communicate with all connected clients. The star topology is easy to maintain, since all clients connect directly to the controller. In the event of a host failure, only the failed host is affected. Further, the time required for the controller to exchange messages with clients is short due to the direct connections.

At larger scales, network and file descriptor limitations at the controller become a bottleneck. To address this, Plush also supports tree topologies. In an effort to reduce the number of hops between the clients and the controller, we construct “bushy” trees, where the depth of the tree is small and each node in the tree has many children. The controller is the root of the tree. The children of the root are chosen to be well-connected and historically reliable hosts whenever possible. Each child of the root acts as a “proxy controller” for the hosts connected to it. These proxy controllers send invitations and receive joins from other hosts, reducing the total number of messages sent back to the root controller. Important messages, such as failure notifications, are still sent back to the root controller. Using the tree topology, we have been able to use Plush to manage an application running on 1000 ModelNet [29] emulated hosts, as well as an application running on 500 PlanetLab clients. We believe that Plush has the ability to scale by another order of magnitude with the current design.

While the tree topology has many benefits over the star topology, it also introduces several new problems with respect to host failures and tree maintenance. In the star topology, a host failure is simple to recover from since it only involves one host. In the tree topology, however, if a non-leaf host fails, all children of the failed host must find a new parent. Depending on the number of hosts affected, a reconfiguration involving several hosts often has a significant impact on performance. Our current implementation tries to minimize the probability of this type of failure by making intelligent decisions during tree construction. For example, in the case of ModelNet, many virtual hosts (and Plush clients) reside on the same physical machine. When constructing the tree in Plush, only one client per physical machine connects directly to the controller and becomes the proxy controller. The remaining clients running on the same physical machine become children of the proxy controller. In the wide area, similar decisions are made by placing hosts that are geographically close together under the same parent. This decreases the number of hops and latency between leaf nodes and their parent, minimizing the chance of network failures.
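A small sketch of this grouping heuristic appears below: clients that share a site key (the same physical machine for ModelNet, or the same geographic site in the wide area) are placed under one proxy controller chosen from that group. The `site_key` function and the choice of the first group member as proxy are assumptions made for illustration.

```python
from collections import defaultdict

def build_tree(controller, clients, site_key):
    """Group clients by `site_key(client)`; one member of each group connects
    to the controller as a proxy, and the rest become that proxy's children."""
    groups = defaultdict(list)
    for c in clients:
        groups[site_key(c)].append(c)

    children = {controller: []}
    for members in groups.values():
        proxy, rest = members[0], members[1:]
        children[controller].append(proxy)  # proxy connects directly to the root
        children[proxy] = rest              # remaining clients join via the proxy
    return children

# Example: group by hostname prefix as a stand-in for "same site".
tree = build_tree("controller", ["siteA-1", "siteA-2", "siteB-1"],
                  site_key=lambda h: h.split("-")[0])
```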

3.4 Running An Application Using Plush

In this section, we will discuss how the architectural components of Plush interact to run a distributed application. When starting Plush, the user's workstation becomes the controller. The user submits an application specification to the Plush controller. The controller parses the specification, and internally creates the objects shown above the dotted line in Figure 1a.

After parsing the application specification, the controller runs the resource discovery and acquisition unit to find a suitable set of resources that meet the requirements specified in the component blocks. Upon locating the necessary resources, the resource manager stores the required access and authentication information. The controller then attempts to connect to each remote host. If the Plush client is not already running, the controller initiates a bootstrapping procedure to copy the Plush client binary to the remote host, and then uses SSH to connect to the remote host and start the client process. Once the client process is running, the controller establishes a TCP connection to the remote host, and transmits an INVITE message to the host to join the Plush overlay (which is either a star or tree as discussed in Section 3.3.2).

If a Plush client agrees to run the application, the client sends a JOIN message back to the controller accepting the invitation. Next, the controller sends a PREPARE message to the new client, which contains a copy of the application specification (XML representation). The client parses the application specification, starts a local host monitor, sends a PREPARED message back to the controller, and waits for further instruction. Once enough hosts join the overlay and agree to run the application, the controller initiates the beginning of the application deployment stage by sending a GO message to all connected clients. The file managers then begin installing the requested software and preparing the hosts for execution.
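The exchange above (INVITE, JOIN, PREPARE, PREPARED, GO) can be summarized as a small client-side dispatcher. The message names come from the text; the surrounding structure, the `client` object, and the `send` callback are illustrative stand-ins rather than the actual Plush client code.

```python
def handle_message(client, msg_type, payload, send):
    """Client-side dispatch for the control messages described above.
    `client` is a placeholder object holding per-host state; `send`
    transmits a reply back to the controller."""
    if msg_type == "INVITE":
        send("JOIN")                 # accept the invitation to join the overlay
    elif msg_type == "PREPARE":
        client.spec = payload        # XML application specification from the controller
        client.start_host_monitor()  # begin publishing local host status
        send("PREPARED")
    elif msg_type == "GO":
        client.install_software()    # deployment stage: file manager installs packages
        client.run_current_block()   # start executing the current block of the spec
```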

In most applications, the controller instructs the hosts to begin execution after all hosts have completed the software installation. (Synchronizing the beginning of the execution is not required if the application does not need all hosts to start simultaneously.) Since each client has now created an exact copy of the controller's application specification, the controller and clients exchange messages about the application's progress using the block naming abstraction (i.e., /app/comp1/proc1) to identify the status of the execution. For barriers, a barrier manager running on the controller determines when it is appropriate for hosts to be released from the barriers.

Upon detecting a failure, clients notify the controller, and the controller attempts to recover from it according to the actions enumerated in the user's application specification. Since many failures are application-specific, Plush exports optional callbacks to the application itself to determine the appropriate reaction for some failure conditions. When the application completes (or upon a user command), Plush stops all associated processes, transfers output data back to the controller's local disk if desired, performs user-specified cleanup actions on the resources, disconnects the resources from the overlay by closing the TCP connections, and stops the Plush client processes.

3.5 User Interface

Plush aims to support a variety of applications being run by users with a wide range of expertise in building and managing distributed applications. Thus, Plush provides three interfaces, each of which provides users with techniques for interacting with their applications. We describe the functionality of each user interface in this section.

Figure 1a shows the user interface above all other parts of Plush. In reality, the user interacts with every box shown in the figure through the user interface. For example, the user forces the resource discovery and acquisition unit to find a new set of resources using a terminal command. We designed Plush in this way to provide maximum control over the application. At any point, the user can override a default Plush behavior. The effect is a customizable application controller that supports a variety of distributed applications.

3.5.1 Graphical User Interface

In an effort to simplify the creation of application specifications and help visualize the status of executions running on resources around the world, we implemented a graphical user interface for Plush called Nebula. In particular, we designed Nebula (as shown in Figures 2 and 3) to simplify the process of specifying and managing applications running across the PlanetLab wide-area testbed. Plush obtains data from the PlanetLab Central (PLC) database to determine what hosts a user has access to, and Nebula uses this information to plot the sites on the map. To start using Nebula, users have the option of building their Plush application specification from scratch or loading a preexisting XML document representing an application specification. Upon loading the application specification, the user runs the application by clicking the Run button from the Plush toolbar, which causes Plush to start locating and acquiring resources.

The main Nebula window contains four tabs that show different information about the user's application. In the “World View” tab, users see an image of a world map with colored dots indicating PlanetLab hosts. Different colored dots on the map indicate sites involved in the current application. In Figure 2a, the dots (which range from red to green on a user's screen) show PlanetLab sites involved in the current application. The grey dots are other available PlanetLab sites that are not currently being used by Plush. As the application proceeds through the different phases of execution, the sites change color, allowing the user to visualize the progress of their application.


Figure 2: (a) Nebula World View tab showing an application running on PlanetLab. Different colored dots indicate PlanetLab sites in various stages of execution. (b) Nebula Application View tab displaying a Plush application specification.

When failures occur, the impacted sites turn red, giving the user an immediate visual indication of the problem. Similarly, green dots indicate that the application is executing correctly. If a user wishes to establish an SSH connection directly to a particular resource, they can simply right-click on a host in the map and choose the SSH option from the pop-up menu. This opens a new tab in Nebula containing an SSH terminal to the host. Users can also mark hosts as failed by right-clicking and choosing the Fail option from the pop-up menu if they are able to determine failure more quickly than Plush's automated techniques. Failed hosts are completely removed from the execution.

Users retrieve more detailed usage statistics and monitoring information about specific hosts (such as CPU load, free memory, or bandwidth usage) by double clicking on the individual sites in the map. This opens a second window that displays real-time graphs based on data retrieved from resource monitoring tools, as shown in the bottom right corner of Figure 2a. The second smaller window displays a graph of the CPU or memory usage, and the status of the application on each host. Plush currently provides built-in support for monitoring CoMon [25] data on PlanetLab machines, which is the source of the CPU and memory data. Additionally, if the user wishes to view the CPU usage or percentage of free memory available across all hosts, there is a menu item under the PlanetLab menu that changes the colors of the dots on the map such that red means high CPU usage or low free memory, and green indicates low CPU usage and high free memory. Users can also add and remove hosts to their PlanetLab account (or slice in PlanetLab terminology) directly by highlighting regions of the map and choosing the appropriate menu option from the PlanetLab menu. Additionally, users can renew their PlanetLab slice from Nebula.

The second tab in the Nebula main window is the "Application View."

Figure 3: Nebula Host View tab showing PlanetLab resources. This tab allows users to select multiple hosts at once and run shell commands on the selected resources. The text-box at the bottom shows the output from the command.

The Application View tab, shown in Figure 2b, allows users to build Plush application specifications using the blocks described in Section 3.1. Alternatively, users may load an existing XML file describing an application specification by choosing the Load Application menu option under the File menu. There is also an option to save a new application specification to an XML file for later use. After creating or loading an application specification, the Run button located on the Application View tab starts the application. The Plush blocks in the application specification change to green during the execution of the application to indicate progress. After an application begins execution, users have the option to "force" an application to skip ahead to the next phase of execution (which corresponds to releasing a synchronization barrier), or to abort an application and terminate execution across all resources. Once the application aborts or completes execution, the user may either save their application specification, disconnect from the Plush communication mesh, restart the same application, or load and run a new application by choosing the appropriate option from the File menu.

The third tab is the "Resource View" tab. This tab is blank until an application starts running. During execution, this tab lists the specific machines that are currently involved in the application. If failures occur during execution, the list of machines is updated dynamically, such that the Resource View tab always contains an accurate listing of the machines that are in use. The resources are separated into components, so that the user knows which resources are assigned to which tasks in their application.

The fourth tab in Nebula is called the "Host View" tab, shown in Figure 3. This tab contains a table that displays the hostnames of all available resources. In the right column, the status of each host is shown. The purpose of this tab is to give users another alternative for visualizing the status of an executing application. The status of the host in the right column corresponds to the color of the dot in the "World View" tab. This tab also allows users to run shell commands simultaneously on several resources and view the output. As shown in Figure 3, users can select multiple hosts at once and run a command, and the output is shown in the text-box at the bottom of the window. Note that hosts do not have to be involved in an application in order to take advantage of this feature. Plush will connect to any available resources and run commands on behalf of the user. Just as in the World View tab, right-clicking on hosts in the Host View tab opens a pop-up menu that enables users to SSH directly to the hosts.

3.5.2 Command-line Interface

Motivated by the popularity and familiarity of the shell interface in UNIX, Plush further streamlines the develop-deploy-debug cycle for distributed application management through a simple command-line interface where users deploy, run, monitor, and debug their distributed applications running on hundreds of remote machines. Plush combines the functionality of a distributed shell with the power of an application controller to provide a robust execution environment for users to run their applications. From a user's standpoint, the Plush terminal looks like a shell. Plush supports several commands for monitoring the state of an execution, as well as commands for manipulating the application specification during execution. Table 1 shows some of the available commands.

3.5.3 Programmatic Interface

Many commands that are available via the Plush command-line interface are also exported via an XML-RPC interface, delivering similar functionality to the command-line for those who desire programmatic access. This allows Plush to be scripted and used for remote execution and automated application management.

Command            Description
load <file>        Load application specification
connect <host>     Connect to host and start client
disconnect         Close all connections and clients
info nodes         Print all resource information
info mesh          Print communication fabric status info
info control       Print application control state info
run                Start the application (after load)
shell <cmd>        Run "cmd" on all connected resources

Table 1: Sample Plush terminal commands.

class PlushXmlRpcServer extends XmlRpcServer {
    void plushAddNode(HashMap properties);
    void plushRemoveNode(string hostname);
    string plushTestConnection();
    void plushCreateResources();
    void plushLoadApp(string filename);
    void plushRunApp();
    void plushDisconnectApp(string hostname);
    void plushQuit();
    void plushFailHost(string hostname);
    void setXmlRpcClientUrl(string clientUrl);
}

class PlushXmlRpcCallback extends XmlRpcClient {
    void sendPlanetLabSlices();
    void sendSliceNodes(string slice);
    void sendAllPlanetLabNodes();
    void sendApplicationExit();
    void sendHostStatus(string host);
    void sendBlockStatus(string block);
    void sendResourceMatching(HashMap matching);
}

Figure 4: Plush XML-RPC API.

The XML-RPC interface also enables the use of external services for resource discovery, creation, and acquisition within Plush. In addition to the commands that Plush exports, external services and programs may register themselves with Plush so that the controller can send callbacks to the XML-RPC client when various actions occur during the application's execution.

Figure 4 shows the Plush XML-RPC API. The functions shown in the PlushXmlRpcServer class are available to users who wish to access Plush programmatically in scripts, or for external resource discovery and acquisition services that need to add and remove resources from the Plush resource pool. The plushAddNode(HashMap) and plushRemoveNode(string) calls add and remove nodes from the resource pool, respectively. setXmlRpcClientUrl(string) registers XML-RPC clients for callbacks, while plushTestConnection() simply tests the connection to the Plush server and returns "Hello World." The remaining function calls in the class mimic the behavior of the corresponding command-line operations.
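
As a concrete illustration, the following sketch drives a Plush controller through a few of the functions listed in Figure 4 from a short Python script. It is a minimal example only: the controller's XML-RPC URL and port, the callback URL, and the unprefixed method names are assumptions, since the paper does not specify how the endpoint is addressed.

    import xmlrpc.client

    # Hypothetical endpoints; the paper does not specify the controller's
    # XML-RPC address, so these URLs are assumptions.
    PLUSH_CONTROLLER_URL = "http://localhost:5555/RPC2"
    CALLBACK_URL = "http://localhost:5556/RPC2"

    def run_application(spec_file):
        plush = xmlrpc.client.ServerProxy(PLUSH_CONTROLLER_URL)

        # Sanity-check the connection; per Figure 4 this returns "Hello World."
        print(plush.plushTestConnection())

        # Register this script's own XML-RPC URL so the controller can
        # deliver callbacks (sendApplicationExit, sendHostStatus, ...) later.
        plush.setXmlRpcClientUrl(CALLBACK_URL)

        # Mirror the "load" and "run" terminal commands from Table 1.
        plush.plushLoadApp(spec_file)
        plush.plushRunApp()

    if __name__ == "__main__":
        run_application("sword.xml")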

Aside from resource discovery and acquisition services, the XML-RPC API allows for the implementation of different user interfaces for Plush.



Since almost all of the Plush terminal commands are available as XML-RPC function calls, users are free to implement their own customized, environment-specific user interface without understanding or modifying the internals of the Plush implementation. This is beneficial because it gives users more flexibility to develop in the programming language of their choice. Most mainstream programming languages have support for XML-RPC, and hence users are able to develop interfaces for Plush in any language capable of handling XML-RPC. For example, Nebula is implemented in Java, and uses the XML-RPC interface shown in Figure 4 to interact with a Plush controller. To increase the functionality and simplify the development of these interfaces, the Plush XML-RPC server has the ability to make callbacks to programs that register with the Plush controller via setXmlRpcClientUrl(string). Some of the more common callback functions are shown in the bottom half of Figure 4 in the class PlushXmlRpcCallback. Note that these callbacks are only useful if the programmatic client implements the corresponding functions.
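
The sketch below shows what such a programmatic client might look like in Python: it exposes handlers for two of the callback functions from Figure 4 and registers its own URL with the controller. Only the function names come from the paper; the host and port choices and the exact argument and return conventions of the callbacks are assumptions.

    import xmlrpc.client
    from xmlrpc.server import SimpleXMLRPCServer

    # Hypothetical addresses; neither endpoint is fixed by the paper.
    PLUSH_CONTROLLER_URL = "http://localhost:5555/RPC2"
    CALLBACK_HOST, CALLBACK_PORT = "localhost", 5556
    CALLBACK_URL = "http://%s:%d/RPC2" % (CALLBACK_HOST, CALLBACK_PORT)

    # Handlers named after the PlushXmlRpcCallback functions in Figure 4.
    def sendHostStatus(host):
        print("host status changed:", host)
        return 0

    def sendApplicationExit():
        print("application finished")
        return 0

    server = SimpleXMLRPCServer((CALLBACK_HOST, CALLBACK_PORT), allow_none=True)
    server.register_function(sendHostStatus)
    server.register_function(sendApplicationExit)

    # Tell the controller where to deliver callbacks, then wait for them.
    xmlrpc.client.ServerProxy(PLUSH_CONTROLLER_URL).setXmlRpcClientUrl(CALLBACK_URL)
    server.serve_forever()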

3.6 Implementation Details

Plush is a publicly available software package consisting of over 60,000 lines of C++ code. Plush depends on several C++ libraries, including those provided by xmlrpc-c, curl, xml2, zlib, math, openssl, readline, curses, boost, and pthreads. The command-line interface also depends on packages for lex and yacc (we use flex and bison). In addition to the main C++ codebase, Plush uses several simple Perl scripts for interacting with the PlanetLab Central database and bootstrapping resources. Plush runs on most UNIX-based platforms, including Linux, FreeBSD, and Mac OS X, and a single Plush controller can manage clients running on different operating systems. The only prerequisite for using Plush on a resource is the ability to SSH to the resource. Currently Plush is being used to manage applications on PlanetLab, ModelNet, and Xen virtual machines [5] in our research cluster.

Nebula consists of approximately 25,000 lines of Java code. Nebula communicates with Plush using the XML-RPC interface. XML-RPC is implemented in Nebula using the Apache XML-RPC client and server packages. In addition, Nebula uses the JOGL implementation of the OpenGL graphics package for Java. Nebula runs in any computing environment that supports Java, including Windows, Linux, FreeBSD, and Mac OS X, among others. Note that since Nebula and Plush communicate solely via XML-RPC, it is not necessary to run Nebula on the same physical machine as the Plush controller.

4 Usage Scenarios

One of the primary goals of our work is to build a generic application management framework that supports execution in any execution environment. This is mainly accomplished through the Plush resource abstraction. In Plush, resources are computing devices capable of hosting applications, such as physical machines, emulated hosts, or virtual machines. To show that Plush achieves this goal, in this section we take a closer look at specific uses of Plush in different distributed computing environments, including a live deployment testbed, an emulated network, and a cluster of virtual machines.

4.1 PlanetLab Live Deployment

To demonstrate Plush's ability to manage the live deployment of applications, we revisit our previous example from Section 2, and show how Plush manages SWORD [23] on PlanetLab. Recall that SWORD is a resource discovery service that relies on host monitors running on each PlanetLab machine to report information periodically about their resource usage. This data is stored in a DHT (distributed hash table), and later accessed by SWORD clients to respond to requests for groups of resources that have specific characteristics. SWORD is a service that helps PlanetLab users find the best set of resources based on the priorities and requirements specified, and is an example of a long-running Internet service.

The XML application specification for SWORD is shown in Figure 5. Note that this specification could be built using Nebula, in which case the user would never have to edit the XML directly. The top half of the specification in Figure 5 defines the SWORD software package and the component (resource group) required for the application. Notice that SWORD uses one component consisting of hosts assigned to the ucsd_sword PlanetLab slice. An interesting feature of this component definition is the "num_hosts" tag. Since SWORD is a service that wants to run on as many nodes as possible, a range of acceptable values is used rather than a single number. In this case, as long as 10 hosts are available, Plush will continue managing SWORD. Since the max is set to 800, Plush will not look for more than 800 resources to host SWORD. Since PlanetLab contains fewer than 800 hosts, this means that SWORD will attempt to run on all PlanetLab resources. The lower half of the application specification defines the application block, component block, and process block that describe the SWORD execution.

The application block contains a few key features that help Plush react to failures more efficiently for long-running services. When defining the application block object for SWORD, we include special "service" and "reconnectinterval" attributes. The service attribute tells the Plush controller that SWORD is a long-running service and requires different default behaviors for initialization and failure recovery. For example, during application initialization the controller does not wait for all participants to install the software before starting the application on all hosts simultaneously. Instead, the controller instructs individual clients to start the application as soon as they finish installing the software, since there is no reason to synchronize the execution across all hosts. Further, if a process fails when the service attribute has been specified, the controller attempts to restart SWORD on that host without aborting the entire application. The reconnectinterval attribute specifies the period of time the controller waits before rerunning the resource discovery and acquisition unit. For long-running services, hosts often fail and recover during execution.



<?xml version="1.0" encoding="utf-8"?>
<plush>
  <project name="sword">
    <software name="sword_software" type="tar">
      <package name="sword.tar" type="web">
        <path>http://plush.ucsd.edu/sword.tar</path>
        <dest>sword.tar</dest>
      </package>
    </software>
    <component name="sword_participants">
      <rspec>
        <num_hosts min="10" max="800"/>
      </rspec>
      <resources>
        <resource type="planetlab" group="ucsd_sword"/>
      </resources>
      <software name="sword_software"/>
    </component>
    <applicationblock name="sword_app_block" service="1"
                      reconnectinterval="300">
      <execution>
        <componentblock name="participants">
          <component name="sword_participants"/>
          <processblock name="sword">
            <process name="sword_run">
              <path>dd/planetlab/run-sword</path>
            </process>
          </processblock>
        </componentblock>
      </execution>
    </applicationblock>
  </project>
</plush>

Figure 5: SWORD application specification.

The reconnectinterval attribute tells the controller to check for new hosts that have come alive since the last run of the resource discovery unit. The controller also unsets any hosts that had previously been marked as "failed" at this time. This is the controller's way of "refreshing" the list of available hosts. The controller continues to search for new hosts until reaching the maximum num_hosts value, which is 800 in our case.

4.1.1 Evaluating Fault Tolerance

To demonstrate Plush's ability to automatically recover from host failures for long-running services, we ran SWORD on PlanetLab with 100 randomly chosen hosts, as shown in Figure 6. The host set includes machines behind DSL links as well as hosts from other continents. When Plush starts the application, the controller starts the Plush client on 100 randomly chosen PlanetLab machines, and they each begin downloading the SWORD software package (38 MB). It takes approximately 1000 seconds for all hosts to successfully download, install, and start SWORD. At time t = 1250s, we kill the SWORD process on 20 randomly chosen hosts to simulate host failure. Normally, Plush would automatically try to restart the SWORD process on these hosts. However, we disabled this feature to simulate host failures and force a rematching. The remote Plush clients notify the controller that the hosts have failed, and the controller begins to find replacements for the failed machines. The replacement hosts join the Plush overlay and start downloading the SWORD software. As before, Plush chooses the replacements randomly, and low bandwidth/high latency links have a great impact on the time it takes to fully recover from the host failure. At t = 2200s, the service is restored on 100 machines.

Figure 6: SWORD running on 100 randomly chosen PlanetLab hosts. The figure plots the number of running and failed hosts against elapsed time in seconds. At t=1250 seconds, we fail 20 hosts. The Plush controller finds new hosts, which start the Plush client process and begin downloading and installing the SWORD software. Service is fully restored at approximately t=2200 seconds.

Using Plush to manage long-running services like SWORD alleviates the burden of manually probing for failures and configuring/reconfiguring hosts. Further, Plush interfaces directly with the PlanetLab Central API, which means that users can automatically add hosts to their slice and renew their slice using Plush. This is beneficial since services typically want to run on as many PlanetLab hosts as possible, including any new hosts that come online. In addition, Plush simplifies the task of debugging problems by providing a single point of control for all connected PlanetLab hosts. Thus, if a user wants to view the memory consumption of their service across all connected hosts, a single Plush command retrieves this information, making it easier to maintain and monitor a service running on hundreds of resources scattered around the world.

4.2 ModelNet Emulation

Aside from PlanetLab resources, Plush also supports running applications on virtual hosts in emulated environments. In this section we discuss how Plush supports using ModelNet [29] emulated resources to host applications. In addition, we discuss how a batch scheduler uses the Plush programmatic interface to perform remote job execution.

Mission is a simple batch scheduler used to manage the execution of jobs that run on ModelNet in our research cluster. ModelNet is a network emulation environment that consists of one or more Linux edge nodes and a set of FreeBSD core machines running a specialized ModelNet kernel. The code running on the edge hosts routes packets through the core machines, where the packets are subjected to the delay, bandwidth, and loss specified in a target topology. A single physical machine hosts multiple "virtual" IP addresses that act as emulated resources on the Linux edge hosts.



To set up the ModelNet computing environment with the target topology, two phases of execution are required: deploy and run. Before running any applications, the user must first deploy the desired topology on each physical machine, including the FreeBSD core. The deploy process essentially instantiates the emulated hosts and installs the desired topology on all machines. Then, after setting a few environment variables, the user is free to run applications on the emulated hosts using virtual IP addresses, just as applications are run on physical machines using real IP addresses.

A single ModelNet experiment typically consumes almost all of the computing resources available on the physical machines involved. Thus, when running an experiment, it is essential to restrict access to the machines so that only one experiment is running at a time. Further, there are a limited number of FreeBSD core machines running the ModelNet kernel available, and access to these hosts must also be arbitrated. Mission is a batch scheduler developed locally to help accomplish this goal by allowing the resources to be efficiently shared among multiple users. ModelNet users submit their jobs to the Mission job queue, and as the machines become available, Mission pulls jobs off the queue and runs them on behalf of the user, ensuring that no two jobs are run simultaneously.

A Mission job submission has two components: a Plush application specification and a resource directory file. For ModelNet, the directory file contains information about both the physical and virtual (emulated) resources on which the ModelNet experiment will run. In the resource directory file, some entries include two extra parameters, "vip" and "vn", which define the virtual IP address and virtual number (similar to a hostname) for the emulated resources. In addition to the directory file that is used to populate the Plush resource pool, users also submit an application specification describing the application they wish to run on the emulated topology to the Mission server. This application specification contains two component blocks separated by a synchronization barrier. The first component block describes the processes that run on the physical machines during the deployment phase (where the emulated topology is instantiated). The second component block defines the processes associated with the target application. When the controller starts the Plush clients on the emulated hosts, it specifies extra command line arguments that are defined in the directory file by the "vip" and "vn" attributes. This sets the appropriate ModelNet environment variables, ensuring that all commands run on that client on behalf of the user inherit those settings.

When a user submits a Plush application specification and directory file to Mission, the Mission server parses the directory file to identify which resources are needed to host the application. When those resources become available for use, Mission starts a Plush controller on behalf of the user using the Plush XML-RPC interface. Mission passes Plush the directory file and application specification, and continues to interact throughout the execution of the application via XML-RPC. After Plush notifies Mission that the execution has ended, Mission kills the Plush process and reports back to the user with the results. Any terminal output that is generated is emailed to the user.

Plush jobs are currently being submitted to Mission on a daily basis at UCSD. These jobs include experimental content distribution protocols, distributed model checking systems, and other distributed applications of varying complexity. Mission users benefit from Plush's automated execution capabilities. Users simply submit their jobs to Mission and receive an email when their task is complete. They do not have to spend time configuring their environment or starting the execution. Individual host errors that occur during execution are aggregated into one message and returned to the user in the email. Logfiles are collected in a public directory on a common file system and labeled with a job ID, so that users are free to inspect the output from individual hosts if desired.

4.3 Virtual Machine Deployment

In all of the examples discussed above, the pool of resources available to Plush is known at startup. In the PlanetLab examples, Plush uses slice information to determine the set of user-accessible hosts. For ModelNet, the emulated topology includes specific information about the virtual hosts to be created, and this information is passed to Plush in the directory file. We next describe how Plush manages applications in environments that have no fixed set of machines, but instead provide underlying capabilities to create and destroy resources on demand.

Shirako [16] is a utility computing framework. Through programmatic interfaces, Shirako allows users to create dynamic on-demand clusters of resources, including storage, network paths, physical servers, and virtual machines. Shirako is based on a resource leasing abstraction, enabling users to negotiate access to resources. Usher [21] is a virtual machine scheduling system for cluster environments. It allows users to create their own virtual machines or clusters. When a user requests a virtual machine, Usher uses data collected by virtual machine monitors to make informed decisions about when and where the virtual machine should run.

We have extended Plush to interface with both Shirako and Usher. Through its XML-RPC interface, Plush interacts with the Shirako and Usher servers. As resources are created and destroyed, the resource pool in Plush is updated to include the current set of leased resources. Using this dynamic resource pool, Plush manages applications running on potentially temporary virtual machines in the same way that applications are managed in static environments like PlanetLab. Thus, using the resource abstractions provided by Plush, users are able to run their applications on PlanetLab, ModelNet, or clusters of virtual machines without ever having to worry about the underlying details of the environment.

To support dynamic resource creation and management, we augment the Plush application specification with a description of the desired virtual machines, as shown in Figure 7. Specifically, the Plush application specification needs to include information about the desired attributes of the resources so that this information can be passed on to either Shirako or Usher.



<?xml version="1.0" encoding="utf-8"?>
<plush>
  <project name="simple">
    <component name="Group1">
      <rspec>
        <num_hosts>10</num_hosts>
        <shirako>
          <num_hosts>10</num_hosts>
          <type>1</type>
          <memory>200</memory>
          <bandwidth>200</bandwidth>
          <cpu>50</cpu>
          <leaselength>600</leaselength>
          <server>http://shirako.cs.duke.edu:20000</server>
        </shirako>
      </rspec>
      <resources>
        <resource type="ssh" group="shirako"/>
      </resources>
    </component>
  </project>
</plush>

Figure 7: Plush component definition containing Shirako resources. The resource description contains a lease parameter which tells Shirako how long the user needs the resources.

Shirako and Usher currently create Xen [5] virtual machines (as indicated by the "type" flag in the resource description) with the CPU speed, memory, disk space, and maximum bandwidth specified in the resource request. As the Plush controller parses the application specification, it stores the resource description. Then, when the create resource command is issued either via the terminal interface or programmatically through XML-RPC, Plush contacts the appropriate Shirako or Usher server and submits the resource request. Once the resources are ready for use, Plush is informed via an XML-RPC callback that also contains contact information for the new resources. This callback updates the Plush resource pool, and the user is free to start applications on the new resources by issuing the run command to the Plush controller.
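
A rough sketch of this sequence, driven through the XML-RPC functions from Figure 4, appears below. The controller endpoint and the specification filename are assumptions, and for brevity the script does not wait for the resource-creation callback before starting the run; a real client would block until the resource pool has been updated.

    import xmlrpc.client

    # Hypothetical controller endpoint; not specified in the paper.
    plush = xmlrpc.client.ServerProxy("http://localhost:5555/RPC2")

    # Load a specification like the one in Figure 7, which embeds the
    # Shirako resource description (type, memory, bandwidth, cpu, lease).
    plush.plushLoadApp("shirako_simple.xml")

    # Ask Plush to contact the Shirako (or Usher) server and submit the
    # resource request; the controller is later notified via an XML-RPC
    # callback once the leased virtual machines are ready.
    plush.plushCreateResources()

    # Equivalent to typing "run" at the Plush terminal.
    plush.plushRunApp()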

Though the integration of Plush and Usher is still in its early stages, Plush is being used by Shirako users regularly at Duke University. While Shirako multiplexes resources on behalf of users, it does not provide any abstractions or functionality for using the resources once they have been created. On the other hand, Plush provides abstractions for managing distributed applications on remote machines, but provides no support for multiplexing resources. A "resource" is merely an abstraction in Plush to describe a machine (physical or virtual) that can host a distributed application. Resources can be added and removed from the application's resource pool, but Plush relies on external mechanisms (like Shirako and Usher) for the creation and destruction of resources.

The integration of Shirako and Plush allows users to seamlessly leverage the functionality of both systems. While Shirako provides a web interface for creating and destroying resources, it does not provide an interface for using the new resources, so Shirako users benefit from the interactivity provided by the Plush shell. Researchers at Duke are currently using Plush to orchestrate workflows of batch tasks and perform data staging for scientific applications, including BLAST [3], on virtual machine clusters managed by Shirako [14].

5 Related Work

The functionality that Plush provides is related to work in a variety of areas. With respect to remote job execution, there are several tools available that provide a subset of the features that Plush supports, including cfengine [9], gexec [10], and vxargs [20]. The difference between Plush and these tools is that Plush provides more than just remote job execution. Plush also supports mechanisms for failure recovery and automatic reconfiguration due to changing conditions. In general, the pluggable aspect of Plush allows for the use of existing tools for actions like resource discovery and allocation, which provides more advanced functionality than most remote job execution tools.

From the user's point of view, the Plush command-line is similar to distributed shell systems such as GridShell [31] and GCEShell [22]. These tools provide a user-friendly language abstraction layer that supports script processing. Both tools are designed to work in Grid environments. Plush provides functionality similar to GridShell and GCEShell, but unlike these tools, Plush works in a variety of environments.

In addition to remote job execution tools and distributed shells, projects like the PlanetLab Application Manager (appmanager) [15] and SmartFrog [13] focus specifically on managing distributed applications. appmanager is a tool for maintaining long-running services and does not provide support for short-lived executions. SmartFrog [13] is a framework for describing, deploying, and controlling distributed applications. It consists of a collection of daemons that manage distributed applications and a description language to describe the applications. Unlike Plush, SmartFrog is not a turnkey solution, but rather a framework for building configurable systems. Applications must adhere to a specific API to take advantage of SmartFrog's features.

There are also several commercially available products that perform functions similar to Plush. Namely, Opsware [24] and Appistry [4] provide software solutions for distributed application management. Opsware System 6 allows customers to visualize many aspects of their systems, and automates software management of complex, multi-tiered applications. The Appistry Enterprise Application Fabric strives to deliver application scalability, dependability, and manageability in grid computing environments. In comparison to Plush, both of these tools focus more on enterprise application versioning and package management, and less on providing support for interacting with experimental distributed systems.

The Grid community has several application management projects with goals similar to Plush, including Condor [8] and GrADS/vGrADS [7]. Condor is a workload management system for compute-intensive jobs that is designed to deploy and manage distributed executions. Where Plush is designed to deploy and manage naturally distributed tasks with resources spread across several sites, Condor is optimized for leveraging underutilized cycles in desktop machines within an organization, where each job is parallelizable and compute-bound.



GrADS/vGrADS [7] provides a set of programming tools and an execution environment for easing program development in computational grids. GrADS focuses specifically on applications whose resource requirements change during execution. The task deployment process in GrADS is similar to that of Plush. Once the application starts execution, GrADS maintains resource requirements for compute-intensive scientific applications through a stop/migrate/restart cycle. Plush, on the other hand, supports a far broader range of recovery actions.

Within the realm of workflow management, there are tools that provide more advanced functionality than Plush. For example, GridFlow [11], Kepler [19], and the other tools described in [32] are designed for advanced workflow management in Grid environments. The main difference between these tools and Plush is that they focus solely on workflow management schemes. Thus they are not well suited for managing applications that do not contain workflows, such as long-running services.

Lastly, the Globus Toolkit [12] is a framework for building Grid systems and applications, and is perhaps the most widely used software package for Grid development. Some components of Globus provide functionality similar to Plush. With respect to our application specification language, the Globus Resource Specification Language (RSL) provides an abstract language for describing resources that is similar in design to our language. The Globus Resource Allocation Manager (GRAM) processes requests for resources, allocates the resources, and manages active jobs in Grid environments, providing much of the same functionality as Plush. The biggest difference between Plush and Globus is that Plush provides a user-friendly shell interface where users directly interact with their applications. Globus, on the other hand, is a framework, and each application must use the APIs to create the desired functionality. In the future, we plan to integrate Plush with some of the Globus tools, such as GRAM and RSL. In this scenario Plush will act as a front-end user interface for the tools available in Globus.

6 Conclusion

Plush is an extensible application control infrastructure designed to meet the demands of a variety of distributed applications. Plush provides abstractions for resource discovery and acquisition, software installation, process execution, and failure recovery in distributed environments. When an error is detected, Plush has the ability to perform several application-specific actions, including restarting the computation, finding a new set of resources, or attempting to adapt the application to continue execution and maintain liveness. In addition, Plush provides new relaxed synchronization primitives that help applications achieve good throughput even in unpredictable wide-area conditions where traditional synchronization primitives are too strict to be effective.

Plush is in daily use by researchers worldwide, and user feedback has been largely positive. Most users find Plush to be an "extremely useful tool" that provides a user-friendly interface to a powerful and adaptable application control infrastructure. Other users claim that Plush is "flexible enough to work across many administrative domains (something that typical scripts do not do)." Further, unlike many related tools, Plush does not require applications to adhere to a specific API, making it easy to run distributed applications in a variety of environments. Our users tell us that Plush is "fairly easy to get installed and setup on a new machine. The structure of the application specification largely makes sense and is easy to modify and adapt."

Although Plush has been in development for over three years now, we still have some features that need improvement. One important area for future enhancements is error reporting. Debugging applications is inherently difficult in distributed environments. We try to make it easier for researchers using Plush to locate and diagnose errors, but this is a difficult task. For example, one user says that "when things go wrong with the experiment, it's often difficult to figure out what happened. The debug output occasionally does not include enough information to find the source of the problem." We are currently investigating ways to allow application-specific error reporting, and ultimately simplify the task of debugging distributed applications in volatile environments.

7 Author Biographies

Jeannie Albrecht is an Assistant Professor of Computer Science at Williams College in Williamstown, Massachusetts. She received her Ph.D. in Computer Science from the University of California, San Diego in June 2007 under the supervision of Amin Vahdat and Alex C. Snoeren.

Ryan Braud is a fourth-year doctoral student at the University of California, San Diego, where he works under the direction of Amin Vahdat in the Systems and Networking research group. He received his B.S. in Computer Science and Mathematics from the University of Maryland, College Park in 2004.

Darren Dao is an M.S. student at the University of California, San Diego, where he works under the direction of Amin Vahdat in the Systems and Networking group. He received his B.S. in Computer Science from the University of California, San Diego in 2006.

Nikolay Topilski is a graduate student at the University of California, San Diego, where he works under the direction of Amin Vahdat in the Systems and Networking research group. He received his B.S. in Computer Science from the University of California, San Diego in 2002.

Christopher Tuttle is a Software Engineer at Google in Mountain View, California. He received his M.S. in Computer Science from the University of California, San Diego in December 2005 under the supervision of Alex C. Snoeren.

Alex C. Snoeren is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, San Diego. He received his Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2003 under the supervision of Hari Balakrishnan and M. Frans Kaashoek.

15


Amin Vahdat is a Professor in the Department of Computer Science and Engineering and the Director of the Center for Networked Systems at the University of California, San Diego. He received his Ph.D. in Computer Science from the University of California, Berkeley in 1998 under the supervision of Thomas Anderson. Before joining UCSD in January 2004, he was on the faculty at Duke University from 1999-2003.

References

[1] J. Albrecht, C. Tuttle, A. C. Snoeren, and A. Vahdat. "Loose Synchronization for Large-Scale Networked Systems," Proceedings of the USENIX Annual Technical Conference (USENIX), 2006.
[2] J. Albrecht, C. Tuttle, A. C. Snoeren, and A. Vahdat. "PlanetLab Application Management Using Plush," ACM Operating Systems Review (OSR), Vol. 40, Num. 1, 2006.
[3] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. "Basic Local Alignment Search Tool," Journal of Molecular Biology, Vol. 215, 1990.
[4] Appistry, http://www.appistry.com/.
[5] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. "Xen and the Art of Virtualization," Proceedings of the ACM Symposium on Operating System Principles (SOSP), 2003.
[6] A. Bavier, M. Bowman, B. Chun, D. Culler, S. Karlin, S. Muir, L. Peterson, T. Roscoe, T. Spalink, and M. Wawrzoniak. "Operating Systems Support for Planetary-Scale Network Services," Proceedings of the ACM/USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2004.
[7] F. Berman, H. Casanova, A. Chien, K. Cooper, H. Dail, A. Dasgupta, W. Deng, J. Dongarra, L. Johnsson, K. Kennedy, C. Koelbel, B. Liu, X. Liu, A. Mandal, G. Marin, M. Mazina, J. Mellor-Crummey, C. Mendes, A. Olugbile, M. Patel, D. Reed, Z. Shi, O. Sievert, H. Xia, and A. YarKhan. "New Grid Scheduling and Rescheduling Methods in the GrADS Project," International Journal of Parallel Programming (IJPP), Vol. 33, Num. 2-3, 2005.
[8] A. Bricker, M. Litzkow, and M. Livny. "Condor Technical Summary," Technical Report 1069, University of Wisconsin-Madison, CS Department, 1991.
[9] M. Burgess. "Cfengine: A Site Configuration Engine," USENIX Computing Systems, Vol. 8, Num. 3, 1995.
[10] B. Chun. gexec, http://www.theether.org/gexec/.
[11] J. Cao, S. Jarvis, S. Saini, and G. Nudd. "GridFlow: Workflow Management for Grid Computing," Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGrid), 2003.
[12] I. Foster. A Globus Toolkit Primer, 2005, http://www.globus.org/toolkit/docs/4.0/key/GT4Primer_0.6.pdf.
[13] P. Goldsack, J. Guijarro, A. Lain, G. Mecheneau, P. Murray, and P. Toft. "SmartFrog: Configuration and Automatic Ignition of Distributed Applications," HP Openview University Association Conference (HP OVUA), 2003.
[14] L. Grit, D. Irwin, V. Marupadi, P. Shivam, A. Yumerefendi, J. Chase, and J. Albrecht. "Harnessing Virtual Machine Resource Control for Job Management," Proceedings of the Workshop on System-level Virtualization for High Performance Computing (HPCVirt), 2007.
[15] R. Huebsch. PlanetLab Application Manager, http://appmanager.berkeley.intel-research.net.
[16] D. Irwin, J. Chase, L. Grit, A. Yumerefendi, D. Becker, and K. G. Yocum. "Sharing Networked Resources with Brokered Leases," Proceedings of the USENIX Annual Technical Conference (USENIX), 2006.
[17] H. F. Jordan. "A Special Purpose Architecture for Finite Element Analysis," Proceedings of the International Conference on Parallel Processing (ICPP), 1978.
[18] S. Ludtke, P. Baldwin, and W. Chiu. "EMAN: Semiautomated Software for High-Resolution Single-Particle Reconstructions," Journal of Structural Biology, Vol. 122, 1999.
[19] B. Ludäscher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger-Frank, M. Jones, E. A. Lee, J. Tao, and Y. Zhao. "Scientific Workflow Management and the Kepler System," Concurrency and Computation: Practice and Experience, Special Issue on Scientific Workflows (CC:P&E), Vol. 18, Num. 10, 2005.
[20] Y. Mao. vxargs, http://dharma.cis.upenn.edu/planetlab/vxargs/.
[21] M. McNett, D. Gupta, A. Vahdat, and G. M. Voelker. "Usher: An Extensible Framework for Managing Clusters of Virtual Machines," Proceedings of the USENIX Large Installation System Administration Conference (LISA), 2007.
[22] M. A. Nacar, M. Pierce, and G. C. Fox. "Developing a Secure Grid Computing Environment Shell Engine: Containers and Services," Neural, Parallel, and Scientific Computations (NPSC), Vol. 12, 2004.
[23] D. Oppenheimer, J. Albrecht, D. Patterson, and A. Vahdat. "Design and Implementation Tradeoffs for Wide-Area Resource Discovery," Proceedings of the IEEE Symposium on High Performance Distributed Computing (HPDC), 2005.
[24] Opsware, http://www.opsware.com/.
[25] K. Park and V. S. Pai. "CoMon: A Mostly-Scalable Monitoring System for PlanetLab," ACM Operating Systems Review (OSR), Vol. 40, Num. 1, 2006.
[26] L. Peterson, T. Anderson, D. Culler, and T. Roscoe. "A Blueprint for Introducing Disruptive Technology into the Internet," Proceedings of the ACM Workshop on Hot Topics in Networks (HotNets), 2002.
[27] Plush, http://plush.ucsd.edu.
[28] D. M. Ritchie and K. Thompson. "The UNIX Time-Sharing System," Communications of the Association for Computing Machinery (CACM), Vol. 17, Num. 7, 1974.
[29] A. Vahdat, K. Yocum, K. Walsh, P. Mahadevan, D. Kostic, J. Chase, and D. Becker. "Scalability and Accuracy in a Large-Scale Network Emulator," Proceedings of the ACM/USENIX Symposium on Operating System Design and Implementation (OSDI), 2002.
[30] C. A. Waldspurger. "Memory Resource Management in VMware ESX Server," Proceedings of the ACM/USENIX Symposium on Operating System Design and Implementation (OSDI), 2002.
[31] E. Walker, T. Minyard, and J. Boisseau. "GridShell: A Login Shell for Orchestrating and Coordinating Applications in a Grid Enabled Environment," Proceedings of the International Conference on Computing, Communications and Control Technologies (CCCT), 2004.
[32] J. Yu and R. Buyya. "A Taxonomy of Workflow Management Systems for Grid Computing," Journal of Grid Computing (JGC), Vol. 3, Num. 3-4, 2005.
