IEEE TRANSACTIONS ON COMPUTERS, VOL. 52, NO. 2, FEBRUARY 2003

Building Survivable Services Using Redundancy and Adaptation

Matti A. Hiltunen, Member, IEEE Computer Society, Richard D. Schlichting, Fellow, IEEE, and Carlos A. Ugarte, Student Member, IEEE

Abstract: Survivable systems, that is, systems that can continue to provide service despite failures, intrusions, and other threats, are increasingly needed in a wide variety of civilian and military application areas. As a step toward realizing such systems, this paper advocates the use of redundancy and adaptation to build survivable services that can provide core functionality for implementing survivability in networked environments. An approach to building such services using these techniques is described and a concrete example involving a survivable communication service is given. This service is based on Cactus, a system for building highly configurable network protocols that offers the flexibility needed to easily add redundant and adaptive components. Initial performance results for a prototype implementation of the communication service built using Cactus/C 2.1 running on Linux are also given.

Index Terms: Survivability, dependability, trustworthiness, redundancy, adaptation, intrusion tolerance, distributed systems.


1 INTRODUCTION

A survivable system is one that is able to continue providing service in a timely manner even if significant portions are incapacitated by attacks or accidents [4]. The challenges in building such systems are significant, especially if they are part of a large public network such as the Internet. In addition to having to deal with network and machine failures, a survivable system must have facilities to protect against threats and intrusions of different types, to detect intrusions when they occur, and to react to intrusions and repair damage. As such, survivability builds on research in security, reliability, fault tolerance, safety, and availability, as well as the combination and interaction of these different properties [53]. Note that, while we use the term “survivability” to emphasize certain system attributes, either “dependability” [40] or “trustworthiness” [51] could also be used.

This paper focuses on using the key enabling techniques of redundancy and adaptation to build survivable services that provide core functionality for implementing survivability in a networked environment. Such a service may provide, for instance, survivable or intrusion tolerant interprocess communication or data storage. Redundancy involves using extra resources to reduce the chance that an incident will compromise the entire system and can be used for data, communication, or in the form of application of multiple security techniques. For example, for a survivable communication service, redundancy might involve implementing message integrity by calculating redundant independent signatures or implementing confidentiality by encrypting the message with a combination of algorithms with keys established using different methods. Adaptation is the ability of software to modify its behavior at runtime and can be used as a technique to react to a suspected intrusion or to change execution unpredictably to complicate attacks. For example, for a communication service, adaptation might involve replacing a compromised encryption or key distribution method with a functionally equivalent method or periodically changing the keys used for existing secure channels. These two techniques are then used in combination with other techniques such as intrusion detection [18], [44], data dispersion [7], [20], [23], [39], firewalls [12], and deception [14] to construct a survivable system.

The primary goal of this paper is to present an approach to building survivable services based on redundancy and adaptation. To do so, we first describe various ways in which survivability can be enhanced by using these techniques. We then illustrate the approach by giving a concrete example: the construction of a survivable communication service using a secure service called SecComm as a starting point. SecComm is a highly configurable communication service based on Cactus [33] that provides various alternatives for security properties such as privacy, authenticity, and integrity. This example also relates to our secondary goal of highlighting the type of system support needed to build a service of this type. In this case, the flexibility afforded by a system such as Cactus greatly simplifies the task of adding redundant and adaptive components to SecComm. Performance results from a prototype implementation built using Cactus/C 2.1 running on Linux are given to quantify the cost of survivability in this context.


- M.A. Hiltunen and R.D. Schlichting are with AT&T Labs-Research, 180 Park Ave., Florham Park, NJ 07932. E-mail: {hiltunen, rick}@research.att.com.

- C.A. Ugarte is with the Department of Computer Science, University of Arizona, Tucson, AZ 85721. E-mail: [email protected].

Manuscript received 1 Feb. 2002; revised 7 Sept. 2002; accepted 20 Sept. 2002. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 117445.

0018-9340/03/$17.00 © 2003 IEEE. Published by the IEEE Computer Society.

2 USING REDUNDANCY AND ADAPTATION

2.1 Overview

A survivable service designed to operate across multiple hosts in a network must continue to implement some or all of its functionality despite external impacts, such as failures or intrusions, while maintaining security guarantees, such as confidentiality and integrity. Survivability has many parallels with fault tolerance and, as such, many of the techniques used for fault tolerance can be adapted for survivability. With fault tolerance, the assumption is that any component can fail and the goal is to increase the probability that a system as a whole can continue to operate correctly despite the failure. Similarly, with survivability, the assumption is that any security mechanism can be compromised and the goal is to increase the probability that a system as a whole can continue to operate in an uncompromised fashion despite the attack.

This section explores the use of redundancy and adaptation, both techniques long associated with fault tolerance, in the context of survivability. Of course, as mentioned above, these techniques alone do not solve the problem, but rather should be used in combination with other techniques to develop a comprehensive solution.

2.2 Redundancy

Redundancy in fault tolerance usually takes the form of either space redundancy, such as replication of data or computation, or time redundancy, such as repeated execution or repeated message transmission. Redundancy that has proven useful for survivability includes techniques such as layered protection and data fragmentation and replication. An application of layered protection may involve, for example, encrypting critical files, which can protect data from an intruder even if the operating system itself is compromised [25]. The technique of data fragmentation and replication, which can improve availability as well as protect data from intruders, has been used in numerous systems [20], [23], [26], [39]. Threshold cryptography [19], in which a key is fragmented and the fragments replicated so that any k out of n fragments are required to access encrypted information, is based on a similar idea.

Another type of redundancy that has been explored less is the use of redundant methods. The basic idea is simple: with redundant methods enforcing a given security attribute such as privacy or integrity, the attribute should remain valid if at least one of the methods remains uncompromised. For example, to tolerate an attack against a public key infrastructure (PKI), a service might use two completely different authentication mechanisms (e.g., PKI and Kerberos). The same principle can be applied to communication security, where a message can be encrypted using combinations of different methods and signed using different signature algorithms. The value of using multiple different methods in this way is that it introduces diversity and can reduce the dangers associated with a vulnerability in any given single method that might be exploited by a systematic attack across multiple machines.

Redundancy techniques can be used to realize a number of different survivability principles. For example, redundancy helps avoid single points of vulnerability, including vulnerabilities and weaknesses in distributed security algorithms. The use of different combinations of redundant methods also introduces artificial diversity and unpredictability, which reduces the chances that an attack will be successful. In short, redundancy can help increase the survivability of a service by increasing the probability that it remains operational with appropriate security guarantees despite portions being compromised.

While redundancy can be a useful survivability technique, as with fault tolerance, its effectiveness depends on the details of how it is used. One important goal is maximizing the independence of the redundant elements, where two elements A and B are independent if compromising A provides no information that makes it easier to compromise B and vice versa. For example, if two independent methods, m1 and m2, are used to authenticate a user, breaking m1 does not make it easier to break m2. A simple example of nonindependence is when two encryption methods use the same key since, if one method is broken or the key stolen, privacy is compromised. This type of independence is very much analogous to the fault tolerance concept of independent failure modes for redundant hardware or software components. Components are independent in this sense when the failure of one component does not affect the correct execution of any other component.

As with fault tolerance, the idea of maximizing independence applies to many different aspects of a system’s security architecture. In addition to affecting the choice of keys or passwords as described above, it applies to the use of cryptographic techniques, to the location in which different keys are stored, to the methods used for creating and distributing keys, and even to the choice of which parts of a system different administrators can access. Determining the degree of independence to use in each case depends on a cost-benefit analysis, that is, ascertaining how expensive it is to increase independence versus how likely it is that a particular aspect of the architecture will be attacked.

Another factor that determines effectiveness is the way in which the redundant elements are combined. For example, assume the goal is to protect a file using redundant encryption and two possible approaches are considered. In approach A, the first half of the file is encrypted using one method and the other half using the other method, while, in approach B, the whole file is encrypted using both methods sequentially. With approach A, an intruder can break the halves independently and knows when each method has been compromised, so the total effort required is proportional to the sum of the efforts required to break each method. On the other hand, with approach B, an intruder does not know whether either method has been compromised until both have been broken, so the effort required is proportional to the product of the efforts needed to break each method. Note, however, that using techniques in combination can often have unexpected subtleties. In the case of multiple encryption, for instance, if the methods form an algebraic group, breaking the double encryption is as easy as breaking single encryption [9]. In addition, more sophisticated attacks can be used to break multiple encryption faster than brute force attacks, such as the “meet in the middle” attack used against double DES (2DES) [42].
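The effort estimates in this paragraph can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that breaking a method means exhaustively searching its key space, it uses the 56-bit DES key size as the example, and the function names are ours rather than part of any system described here. For the meet-in-the-middle case it uses the standard estimate of roughly one pass over each key space plus a stored table with one entry per key of one method.

```python
from typing import Tuple

# Rough attacker work estimates for combining two encryption methods.
# Assumption: breaking a method means exhaustive search of its key space.

def effort_split_file(keys_a: int, keys_b: int) -> int:
    """Approach A: each half is broken on its own, so the efforts add."""
    return keys_a + keys_b

def effort_sequential(keys_a: int, keys_b: int) -> int:
    """Approach B under naive brute force: every key pair is tried."""
    return keys_a * keys_b

def effort_meet_in_middle(keys_a: int, keys_b: int) -> Tuple[int, int]:
    """Approach B under a meet-in-the-middle attack (needs known plaintext).
    Returns (cipher operations, table entries stored)."""
    return keys_a + keys_b, min(keys_a, keys_b)

DES_KEYS = 2 ** 56
# For a power of two, bit_length() - 1 is its base-2 logarithm.
print("split file:        2^%d" % (effort_split_file(DES_KEYS, DES_KEYS).bit_length() - 1))
print("sequential:        2^%d" % (effort_sequential(DES_KEYS, DES_KEYS).bit_length() - 1))
print("meet in the middle: 2^%d" % (effort_meet_in_middle(DES_KEYS, DES_KEYS)[0].bit_length() - 1))
```

Under these assumptions, two 56-bit methods cost about 2^57 operations in approach A and 2^112 in approach B against naive brute force, but only about 2^57 operations (with 2^56 stored entries) against meet-in-the-middle, which illustrates why sequential composition does not automatically deliver the multiplicative benefit.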

While the cost of redundancy can typically be measured experimentally, the benefit, i.e., the increase in survivability, is much harder to quantify. Unlike fault tolerance, where the failure rate of a component can be measured and used to determine the necessary level of redundancy to achieve a given reliability, with survivability the attack rates vary unpredictably and adversaries constantly develop new attack techniques that can invalidate previous assumptions about the level of security provided by specific techniques. Indeed, quantifying the survivability of a system requires solving a number of difficult research challenges beyond the scope of this paper, many of which are the focus of research efforts elsewhere. These include quantifying attacker behavior [37], analysis of the reliability, latency, and cost-benefit for a system given that the probabilities for failures and successful intrusions are known for each component [36], model-based quantification of survivability metrics [50], and analysis of the security achievable using multiple cryptographic techniques [2].

The use of redundancy, especially method redundancy, to enhance survivability is explored further in the context of a survivable communication service in Section 3. In this service, the ability of Cactus to configure modules in flexible ways is exploited to allow redundant methods to be used to ensure such security attributes as privacy, authenticity, and integrity.

2.3 Adaptation

Adaptive software, i.e., software that changes its behavior at runtime, has been used in a number of contexts. For example, the Transmission Control Protocol (TCP) of the Internet protocol suite uses adaptive mechanisms for flow control, retransmission, and congestion control [35]. Other examples include adaptive media access control [13], adaptive encoding and compression [24], and adaptive routing algorithms [5]. Adaptation has typically been limited to changing parameter values such as timeout or transmission window size, but more recent work addresses algorithmic adaptations where a software component changes the algorithms used to implement its service at runtime [11], [47]. Many fault tolerance techniques can be viewed as specialized forms of adaptive behavior [10], [27], [31].

Like redundancy, adaptation is a useful technique for survivable systems and can be used to realize a number of survivability principles. For example:

- A service such as a communication protocol that changes its behavior deterministically increases artificial diversity, while a service that adapts nondeterministically can also increase its unpredictability.

- Adaptation can be used to implement graceful degradation; for example, if portions of the service have been compromised by an intrusion, the service may be able to adapt to an operating mode that excludes those portions but still satisfies some level of client requirements.

- Adaptation can be used to deal with changes in security and survivability requirements, as well as to respond to detected or suspected attacks; an example of the former is when stronger encryption may be required, while an example of the latter is terminating a suspected connection [8].

Coordination among software components on a given host or across hosts might be required, depending on the specific context. For example, if the encryption algorithm used in group communication is changed, all group members must be notified and the switchover synchronized so that no messages are lost. Such coordination requires special protocols [11], [47].

The adaptation process itself can be characterized by a general multiphase framework independent of the context in which it is used and the specific actions taken [31]. Specifically, the process is divided into three phases: change detection, agreement, and action. Change detection involves detecting a condition that might require an adaptation. Agreement follows change detection and involves reaching an agreement among the participants of a service on whether an adaptation should be made and what the adaptation should be. The action phase involves performing the adaptation itself, i.e., changing parameter values or algorithms, and includes any coordination that might be required. Note that agreement is not required if the adaptation only involves one host or if each host can adapt independently.
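The three phases can be sketched as three methods on a service object. This is a minimal illustration, not the framework of [31] or the Cactus implementation: the class name, the event names, and the policy table mapping events to configurations are all hypothetical, and agreement is reduced to a unanimous vote among callables standing in for remote participants.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class AdaptiveService:
    config: str
    # Hypothetical policy: maps an observed event to a proposed configuration.
    policy: Dict[str, str]
    # One callable per remote participant; returns True if it accepts a proposal.
    participants: List[Callable[[str], bool]] = field(default_factory=list)

    def detect(self, event: str) -> Optional[str]:
        """Phase 1, change detection: does this event call for a new configuration?"""
        proposal = self.policy.get(event)
        return proposal if proposal != self.config else None

    def agree(self, proposal: str) -> bool:
        """Phase 2, agreement: every participant must accept the proposal.
        With no participants (single-host adaptation) this is trivially true."""
        return all(accepts(proposal) for accepts in self.participants)

    def act(self, proposal: str) -> None:
        """Phase 3, action: switch to the new configuration."""
        self.config = proposal

    def handle(self, event: str) -> bool:
        """Run the three phases in order; report whether an adaptation happened."""
        proposal = self.detect(event)
        if proposal is None or not self.agree(proposal):
            return False
        self.act(proposal)
        return True

# Single-host case: no participants, so no agreement round is needed.
service = AdaptiveService("DES", {"intrusion_suspected": "DES+AES"})
service.handle("intrusion_suspected")
print(service.config)
```

Keeping the phases as separate methods mirrors the point made in the text that agreement and coordination can be implemented as reusable components independent of what triggers the adaptation.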

In addition to clarifying the process, these three phases also provide a framework for structuring an adaptive service and for implementing portions of the process as reusable software components. For example, the agreement phase and the coordination required in the action phase can often be implemented as algorithms that can be used in different contexts. The example communication service presented in Section 3 is structured in this way.

Using adaptation techniques as the basis for survivable services requires addressing a number of issues. Perhaps the most important problem is ensuring that the adaptation mechanism itself does not make the service more vulnerable. That is, the mechanism must prevent an intruder from compromising the service by having it deactivate security mechanisms or switch to weaker security mechanisms. For example, suppose that an adaptation mechanism is used for graceful degradation where the system switches to less costly encryption when resources are lost. In this scenario, an intruder may be able to trigger this adaptation by attacking those resources and then gain access to the system by breaking the weaker encryption. This particular problem can be resolved by not weakening security as part of a graceful degradation, but the issue is general. As was the case with redundancy, quantifying the increase in survivability afforded by adaptation is an open research issue that is beyond the scope of this paper.

Like redundancy, adaptation is a core technique used in the survivable communication service example in Section 3. In this case, the ability of Cactus to change execution patterns and module configurations dynamically allows the service to adapt in unpredictable ways and to react to external events.


3 COMMUNICATION: FROM SECURE TO SURVIVABLE

As an example, we demonstrate how the approaches outlined in the previous section can be applied to a secure communication service called SecComm. SecComm provides customizable communication security by allowing its user to choose which security attributes are required for a connection and which algorithms are used to implement these attributes. After giving an overview of SecComm, we describe how the service can be augmented with redundancy and adaptation techniques to make it more survivable.

3.1 SecComm Overview

SecComm is a highly configurable secure communication service with the inherent flexibility needed to realize redundancy and adaptation-based survivability techniques. We assume a typical distributed computing scenario consisting of a collection of machines connected by a local or wide-area communication network. Application-level processes communicate by using a communication subsystem that typically consists of IP, some transport-level protocol such as TCP or UDP, and, potentially, some middleware-level protocols. SecComm provides the abstraction of a bidirectional point-to-point communication channel for each connection that is opened.

SecComm uses established algorithms to provide security guarantees for the following properties:

- Authenticity. Ensures that a receiver can be certain of the identity of the message sender. Can be implemented using public key cryptography, any shared secret, or a trusted intermediary such as Kerberos.

- Privacy. Ensures that only the intended receiver of a message is able to interpret the contents. Can be implemented using any shared secret, public key cryptography, or combinations of methods.

- Integrity. Ensures that the receiver of a message can be certain that the message contents have not been modified during transit. Some authenticity and privacy methods also provide integrity as a side effect if the message format has enough redundancy to detect violations. Additional redundancy can be provided using message digest algorithms such as MD5 [48]. Integrity can be provided without privacy, but, at a minimum, the message digest itself must be protected.

- Nonrepudiation. Ensures that a receiver can be assured that the sender cannot later deny having sent the message. Relies on authenticity provided by public key cryptography and requires that the receiver store the encrypted message as proof.

- Key distribution. The keys needed by previous methods can be established in several ways. The user can distribute keys manually and have the application pass the keys to SecComm. Alternatively, SecComm can establish the keys itself, making use of protocols such as Diffie-Hellman [22] or external Certification Authorities and Key Distribution Centers such as Kerberos [43], [52].

- Replay prevention. Prevents an intruder from gaining an advantage by retransmitting old messages. Can be implemented using timestamps, sequence numbers, or other such nonces in messages. Typically used in conjunction with authenticity, privacy, or integrity since, otherwise, it would be trivial for an intruder to generate a new message that appears to be valid.

- Known plain text attack prevention. Prevents an intruder from utilizing known plain text-based attacks by including additional random information (“salt”) at the beginning of a message.

Each of these properties is associated with one or more software modules that implement the property using different algorithms. SecComm is configurable in the sense that specific modules can be selected based on user and application requirements. Such configurability is useful on two levels. First, it allows the user to determine which properties to use: if a certain property is not required or if it is already implemented by some other system layer, then no module that implements the property needs to be included. Second, it allows the user to select which module to use to implement a particular property. The user can evaluate the modules by various metrics, including the relative level of security and resource utilization.

All modules are syntactically independent from one another by virtue of the execution model supported by Cactus, which means that any combination is syntactically valid. However, the different security properties have certain semantic dependencies and ordering constraints that are reflected as equivalent semantic dependencies and ordering constraints between the modules that implement these properties. These dependencies can be recorded as a configuration graph that can be used to ensure any chosen configuration is semantically consistent [29]. Configuration constraints in SecComm are discussed in Section 4.2.
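A consistency check of this kind can be sketched as a lookup over recorded dependencies. The example below is an assumption-laden illustration, not SecComm's actual configuration graph: it encodes only the two dependencies stated in the property list above (nonrepudiation relies on authenticity; replay prevention is typically used together with authenticity, privacy, or integrity), it ignores ordering constraints, and the table and function names are hypothetical.

```python
from typing import Dict, List, Set

# Dependencies taken from the property descriptions in the text.
# The real constraints (Section 4.2, [29]) are not reproduced here.
REQUIRES_ALL: Dict[str, Set[str]] = {
    "nonrepudiation": {"authenticity"},
}
REQUIRES_ANY: Dict[str, Set[str]] = {
    "replay_prevention": {"authenticity", "privacy", "integrity"},
}

def check_configuration(selected: Set[str]) -> List[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems: List[str] = []
    for prop in sorted(selected):
        missing = REQUIRES_ALL.get(prop, set()) - selected
        if missing:
            problems.append(f"{prop} requires {', '.join(sorted(missing))}")
        any_of = REQUIRES_ANY.get(prop)
        if any_of and not (any_of & selected):
            problems.append(f"{prop} requires one of {', '.join(sorted(any_of))}")
    return problems

print(check_configuration({"privacy", "replay_prevention"}))
print(check_configuration({"nonrepudiation"}))
```

Returning the full list of problems rather than failing on the first one makes it easier for a user to repair a configuration in one pass.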

3.2 Redundancy

Security services that provide attributes such as privacy, integrity, and authenticity typically implement each attribute using a single method. For example, in a secure communication service, privacy may be provided by DES and integrity by keyed MD5. In choosing which algorithm to use to satisfy a given security property (e.g., privacy), one normally bases the selection on the tradeoff between an acceptable level of security and an acceptable cost. If more security is deemed necessary, a better algorithm or better version of the same algorithm (e.g., by using a longer key) is chosen; this often results in greater resource utilization.

Although such an approach may be secure in the traditional sense, it is not survivable: once a method is compromised, all security guarantees on the connection related to that attribute are gone. Note that a security method has multiple different points of attack. In the case of secure communication, the encryption algorithm may be broken, the key may be stolen from a user or user’s machine, or the key distribution method may be broken and, thus, the key assigned to a connection may be known by an intruder. Each method has, in essence, multiple single points of vulnerability very much analogous to a single point of system failure when considering fault tolerance attributes. This problem is the same for many other aspects of security, including authentication and access control.

The redundancy techniques described in Section 2 can be used to enhance the survivability of SecComm. These techniques can be used within a single SecComm connection or by creating multiple redundant connections. Within a connection, this is done by using two or more techniques to guarantee an attribute rather than a single method. Given this, an attack may be successful against one method, but the system itself will remain secure if the other methods remain uncompromised. For maximal independence, each method should use a separate key established using different key distribution methods to avoid an attack based on stealing keys or compromising a key distribution method. If multiple SecComm connections are created, the data sent can be fragmented between the connections and different combinations of methods can be used for the different connections. In this case, the SecComm connections can also be established between different pairs of machines and the message data fragmented so that even the full compromise of one machine does not reveal the information or make it impossible to transmit the information. Finally, these machines should be diverse enough that it is unlikely they can be compromised using the same attack.

Here, we focus on redundancy within a single connection since extending the ideas to multiple connections is straightforward. The simplest way to make use of this redundancy is to apply the different methods successively on the same data. For example, a message might first be encrypted using DES and then AES [17]. However, there are many other approaches, including:

- Alternating the order in which methods are applied, e.g., apply DES before AES for some messages and AES before DES for others.

- Applying different methods to different parts of the data, e.g., encrypt different parts of one message or different messages in a stream using different methods.

The first approach is static since every message is processed by the same methods, while the other two are dynamic since different methods can be used in different ways for different messages. The resulting unpredictability is intended to make an adversary’s task more difficult.

Maximizing the independence of the methods and keys used within a single SecComm connection or between multiple connections increases the survivability of the service. Given that different types of attacks can be mounted against a secure communication service, there are different aspects of independence. These include:

- Independence of keys, meaning that keys are different, are generated using different methods, are stored in different locations, and are administered by different administrators.

- Independence of methods, meaning that the methods used are diverse, both within a connection and across connections.

For example, if an attack attempts to determine a key by guessing, then using two independent keys to secure a connection is sufficient to increase survivability. On the other hand, if the attack exploits a weakness in the algorithm, it would be necessary to use two different algorithms. Similarly, if the attack compromises one of the machines that serves as the endpoint of the secure connection, it would be necessary to use connections between multiple pairs of diverse machines to minimize the chance of the attack compromising both.

In the case of encryption methods, independence is difficult to argue rigorously, but the risk of methods not being independent is likely to be minimized if the methods are substantially different or if they encrypt data in different size blocks. It is also possible to develop combinations that attempt to maximize independence by not simply encrypting the same data multiple times, but by combining the methods in different ways. For example, suppose that $m$ is a cleartext message and $E_1$ and $E_2$ are different encryption methods. A ciphertext message $c_m$ could be constructed as $c_m = \{E_1(m \oplus r), E_2(r)\}$, where $\oplus$ is the exclusive-or operation and $r$ is a random bit sequence the same length as the message. The message sender generates $r$ individually for each message sent. The receiver can reconstruct $m$ by decrypting the parts of the message using $E_1$ and $E_2$, respectively, and then performing an exclusive-or operation between the parts to eliminate $r$. Given this method, breaking only $E_1$ or $E_2$ does not produce any useful information, which means that the attacker has to break both simultaneously to compromise the system. As a result, the effort required is multiplicative. Note that the random bit sequence should be truly random or at least difficult to guess or, otherwise, the attacker may not need to break $E_2$.
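The construction of the two-part ciphertext can be sketched in a few lines. The example below is a toy under loud assumptions: the two "ciphers" are hash-based keystream placeholders built from the Python standard library purely so the code is self-contained, they are not secure (among other things, they reuse the keystream across messages), and they are not the methods used by SecComm. Only the splitting and recombination logic is the point; a real deployment would substitute two genuinely different, vetted encryption methods with independent keys.

```python
import hashlib
import secrets
from typing import Callable, Tuple

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_cipher(hash_name: str) -> Callable[[bytes, bytes], bytes]:
    """Placeholder cipher: XOR with a hash-derived keystream (NOT secure).
    Applying it twice with the same key returns the input."""
    def apply(key: bytes, data: bytes) -> bytes:
        stream = bytearray()
        counter = 0
        while len(stream) < len(data):
            stream += hashlib.new(hash_name, key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return xor_bytes(data, bytes(stream[:len(data)]))
    return apply

E1 = toy_cipher("sha256")   # stand-in for the first method
E2 = toy_cipher("sha512")   # stand-in for the second, different method

def combine_encrypt(m: bytes, k1: bytes, k2: bytes) -> Tuple[bytes, bytes]:
    r = secrets.token_bytes(len(m))          # fresh random r for every message
    return E1(k1, xor_bytes(m, r)), E2(k2, r)

def combine_decrypt(cm: Tuple[bytes, bytes], k1: bytes, k2: bytes) -> bytes:
    part1, part2 = cm
    masked = E1(k1, part1)                   # recovers m XOR r
    r = E2(k2, part2)                        # recovers r
    return xor_bytes(masked, r)              # XOR eliminates r

cm = combine_encrypt(b"attack at dawn", b"key-one", b"key-two")
print(combine_decrypt(cm, b"key-one", b"key-two"))
```

Note that the scheme doubles the ciphertext size, since both parts have the length of the original message; that overhead is the price of forcing an attacker to break both methods.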

Determining independence of methods is easier for other security attributes such as message integrity. Let $m$ be the message to be protected and $d_1(m), d_2(m), \ldots$ be different cryptographic message digests of $m$. Since the message digest algorithms operate on the message independently, an attacker would need to compromise each integrity algorithm separately. In this case, the increase in the breaking effort is additive since the attacker knows when each method has been broken.
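Redundant integrity protection of this form can be sketched with keyed digests from the Python standard library. Several choices here are our assumptions rather than the paper's: HMAC is used as the keyed-digest construction, SHA-256 and SHA3-256 stand in for $d_1$ and $d_2$ (the text mentions MD5), each digest gets its own key to keep the methods independent, and the function names are hypothetical.

```python
import hmac
from typing import List, Sequence, Tuple

Method = Tuple[str, bytes]  # (hashlib digest name, key for that digest)

def make_tags(message: bytes, methods: Sequence[Method]) -> List[bytes]:
    """Compute one keyed digest of the message per method."""
    return [hmac.new(key, message, name).digest() for name, key in methods]

def verify_tags(message: bytes, tags: Sequence[bytes], methods: Sequence[Method]) -> bool:
    """Accept the message only if every redundant tag verifies."""
    expected = make_tags(message, methods)
    if len(tags) != len(expected):
        return False
    # Evaluate all comparisons before combining, using constant-time compares.
    results = [hmac.compare_digest(t, e) for t, e in zip(tags, expected)]
    return all(results)

methods = [("sha256", b"first-key"), ("sha3_256", b"second-key")]
tags = make_tags(b"transfer 100", methods)
print(verify_tags(b"transfer 100", tags, methods))
print(verify_tags(b"transfer 900", tags, methods))
```

Requiring all tags to verify means a forger must defeat every digest method, while a legitimate receiver pays only the sum of the individual digest costs.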

3.3 Adaptation

Fine-grained configurability allows SecComm to be tailored to the needs of an application and the execution environment. In many cases, these needs are not static, but rather can change at runtime. For example, the quality and security of a network connection may change as a mobile user changes location. It may also be necessary to change security algorithms that are suspected to have been compromised and introduce new security algorithms to strengthen the system. Adaptation can be applied in these types of situations to increase system survivability.

Adaptation in SecComm can be seen as dynamic configurability of methods augmented with adaptation algorithms to handle the adaptation process. The same library of methods that makes the system configurable is available so that the best method, or combination of methods, can be chosen based on the current needs. The choice of configuration can be made by the application or by the adaptation algorithm. The adaptation algorithm must also handle any monitoring for changes that might lead to an adaptation, any negotiation that might be necessary, and the actual switchover from the old configuration to the new configuration. The current version of SecComm performs adaptation between methods that have been linked statically into the service. While it is possible to load new methods at runtime using dynamic libraries, such an approach has additional security risks that are not addressed in this paper.

As noted above, the first phase in an adaptation is change detection, which triggers the process. The change may occur for any number of different reasons, including:

- Availability of local resources. As processor and memory utilization increases, it may be beneficial to change to an algorithm that uses fewer resources. Similarly, if processor and memory utilization decreases, it may be beneficial to change to a more effective algorithm that uses more resources.

- Availability of network resources. Bandwidth and latency may change, prompting the switch to a different algorithm. Routing changes may also trigger adaptations.

- Intrusion detection. When an intrusion is detected by, for example, an intrusion detection system (IDS), the service may adapt by replacing existing security algorithms with stronger ones. Similarly, if the intruder has managed to compromise completely the service on a given host, any existing connections to that host may need to be terminated.

- User directive. The user may trigger an adaptation directly. The user’s options range from signaling that an adaptation is required to specifying what the new configuration should be.

The task of monitoring to detect changes can be done withinSecComm as a separate module. Alternately, monitoringcan be done externally and a signal sent to SecComm whenan appropriate event is detected.

The second phase is agreement. Within SecComm, there is an evaluation module that determines when an adaptation is needed and what the new configuration should be. In simple adaptations, this decision can be made by a single host and no agreement is needed. In more complex cases, agreement between both hosts may be required. In SecComm, the host that initiates the adaptation also coordinates the agreement. This host is currently assigned statically for each connection, although the assignment could, in principle, be made dynamically.

The third phase is to perform the adaptive action, i.e., to switch the algorithms. One simple approach is as follows: The initiating host signals the other host that an adaptation is about to occur, stops transmitting messages with the old configuration, and awaits confirmation from the second host. Upon reception of the initiator's message, the second host deactivates the old configuration, activates the new configuration, and transmits an acknowledgment message. Once this acknowledgment is received, the initiating host switches from receiving messages using the old configuration to receiving messages using the new configuration.

It is important that no messages are lost during the transition, so the initiating host stops transmitting messages until it is certain that the second host is ready to accept messages using the new configuration. Before it sends the acknowledgment, the second host uses the old configuration to send messages; the initiating host keeps the old configuration active only for message reception. Note that this scheme relies on the underlying protocols to provide reliable ordered delivery. If either of these properties is not guaranteed, a different adaptation protocol must be used.
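The switchover just described can be traced with a small in-memory simulation. The following is only a sketch under stated assumptions: the `Host` class, the message tuples, and the two `deque` channels (standing in for reliable FIFO delivery) are hypothetical illustrations and are not part of SecComm.

```python
from collections import deque

class Host:
    """One endpoint of the connection (hypothetical helper, not SecComm code)."""
    def __init__(self, name):
        self.name = name
        self.send_cfg = "old"     # configuration used for outgoing data
        self.recv_cfg = {"old"}   # configurations accepted for incoming data
        self.held = deque()       # application data held back during switchover
        self.delivered = []

def run_switchover():
    a, b = Host("initiator"), Host("second")
    a_to_b, b_to_a = deque(), deque()   # assumed reliable, ordered channels

    # The second host still has data in flight under the old configuration.
    b_to_a.append(("data", "old", "m1"))

    # Step 1: the initiator announces the adaptation and stops transmitting.
    a_to_b.append(("adapt", None, None))
    a.send_cfg = None
    a.held.append("m2")                 # queued locally, not sent

    # Step 2: the second host switches configurations and acknowledges.
    kind, _, _ = a_to_b.popleft()
    assert kind == "adapt"
    b.send_cfg, b.recv_cfg = "new", {"new"}
    b_to_a.append(("ack", None, None))
    b_to_a.append(("data", "new", "m3"))

    # Step 3: the initiator drains its inbound channel in FIFO order and
    # switches its receive side only when the acknowledgment arrives.
    while b_to_a:
        kind, cfg, body = b_to_a.popleft()
        if kind == "ack":
            a.recv_cfg, a.send_cfg = {"new"}, "new"
            while a.held:
                a_to_b.append(("data", "new", a.held.popleft()))
        else:
            assert cfg in a.recv_cfg, "message arrived under an inactive configuration"
            a.delivered.append(body)

    # The held message reaches the second host under the new configuration.
    while a_to_b:
        kind, cfg, body = a_to_b.popleft()
        assert cfg in b.recv_cfg
        b.delivered.append(body)
    return a, b
```

Because the channels are ordered, every message sent under the old configuration is consumed before the acknowledgment, which is why the initiator can safely drop the old receive configuration at that point.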

One of the main concerns in adding a new module to a secure or survivable system is that no new vulnerabilities be introduced. With respect to adaptations, some of the possible vulnerabilities are:

. Providing information to an adversary. An intruder should not be able to examine the adaptation messages and benefit from them.

. Triggering adaptations. An intruder should not be able to trigger an adaptation.

. Denial of service. An intruder should not be able to prevent an adaptation from completing.

Techniques for addressing these issues in SecComm are discussed in Section 4.

4 IMPLEMENTATION

The traditional security methods offered by SecComm can be combined with techniques that offer redundancy and adaptation to produce a version of the service that is more survivable. Here, we discuss implementation aspects, focusing on how the abstract methods discussed above map into concrete components. The basis for this is Cactus, which allows these components to remain independent, yet interact with each other as needed. We first give an overview of Cactus, then explain in detail how SecComm makes use of these features.

4.1 Cactus

Cactus is a system for constructing configurable network protocols and services where each service property or functional component is implemented as a separate software module called a microprotocol [33]. A customized instance of a service is then created by choosing a collection of microprotocols based on the properties to be enforced and configuring them together with the Cactus runtime system to form a composite protocol that implements the service on each machine. A microprotocol is structured as a collection of event handlers that are executed when a specified event occurs. Events can be raised explicitly by microprotocols or by the Cactus runtime.

The primary event-handling operations are:

. bind(event, handler, order, static_args). Specifies that handler is to be executed when event occurs. order is a numeric value specifying the order in which handler should be executed relative to other handlers bound to the same event. When the handler is executed, the arguments static_args are passed as part of the handler arguments.

. raise(event, dynamic_args, mode, delay). Causes event to be raised after delay time units. If delay is 0, the event is raised immediately. The occurrence of an event causes handlers bound to the event to be executed with dynamic_args (and static_args passed in the bind operation) as arguments. Execution can either block the invoker until the handlers have completed execution (mode = SYNC) or allow the caller to continue (mode = ASYNC).
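To make the semantics of the two operations concrete, the following minimal sketch models them in Python. It is not the Cactus API: the `Runtime` class and its method names are assumptions made for illustration (`raise` is a reserved word in Python, hence `raise_event`), and any raise that is not an immediate synchronous one is simply queued and executed by `run()`.

```python
import heapq
import itertools

class Runtime:
    """Toy event runtime illustrating bind/raise (hypothetical, not Cactus)."""
    def __init__(self):
        self.handlers = {}   # event -> sorted list of (order, seq, handler, static_args)
        self.pending = []    # heap of (due_time, seq, event, dynamic_args)
        self.now = 0
        self._seq = itertools.count()

    def bind(self, event, handler, order, static_args=()):
        entry = (order, next(self._seq), handler, static_args)
        self.handlers.setdefault(event, []).append(entry)
        # Lower order values run first; ties keep binding order.
        self.handlers[event].sort(key=lambda h: h[:2])

    def raise_event(self, event, dynamic_args=(), mode="SYNC", delay=0):
        if mode == "SYNC" and delay == 0:
            # The caller is blocked until all bound handlers have run.
            self._dispatch(event, dynamic_args)
        else:
            # The caller continues; handlers run later from run().
            heapq.heappush(self.pending,
                           (self.now + delay, next(self._seq), event, dynamic_args))

    def _dispatch(self, event, dynamic_args):
        for _, _, handler, static_args in list(self.handlers.get(event, [])):
            handler(*dynamic_args, *static_args)

    def run(self):
        """Execute queued events in order of their due time."""
        while self.pending:
            due, _, event, args = heapq.heappop(self.pending)
            self.now = max(self.now, due)
            self._dispatch(event, args)
```

A handler bound with `order=1` runs before one bound with `order=2` on the same event, regardless of which was bound first, which is the property that lets a configuration control the sequence in which microprotocols see a message.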

Other operations are available for unbinding handlers from events, creating and deleting events, halting event execution, and canceling a delayed event. Handler execution is atomic with respect to concurrency, i.e., a handler is executed to completion before execution of any other handler is started unless the handler voluntarily yields execution by either raising another event synchronously or by invoking a blocking semaphore operation. In the case of a synchronous raise, the handlers bound to the raised event are executed before control returns to the handler that issued the raise. In addition to the flexible event mechanism, Cactus supports shared data that can be accessed by all microprotocols configured into a composite protocol.

Finally, Cactus provides a message abstraction, called dynamic messages, that is designed to facilitate development of configurable services. The main features provided by dynamic messages are named message attributes and a coordination mechanism that only allows a message to be transferred out of a composite protocol when agreed by all microprotocols. Message attributes are a generalization of traditional message headers and have scopes corresponding to a single composite protocol (local), all the protocols on a single machine (stack), and the peer protocols at the sender and receiver (peer). A customizable pack routine concatenates peer attributes to the message body for network transmission or for operations such as encryption and compression. A corresponding unpack routine extracts the peer attributes from a message at the receiver.
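The attribute scopes and the pack/unpack pair can be illustrated with a short sketch. Everything below is assumed for illustration: the `DynamicMessage` class, the JSON encoding, and the 4-byte length prefix are not the Cactus wire format, and the coordination mechanism is omitted.

```python
import json
import struct

class DynamicMessage:
    """Toy message with named attributes in three scopes (hypothetical)."""
    SCOPES = ("local", "stack", "peer")

    def __init__(self, body=b""):
        self.body = body
        self.attrs = {scope: {} for scope in self.SCOPES}

    def set_attr(self, scope, name, value):
        self.attrs[scope][name] = value

    def pack(self):
        """Concatenate only the peer attributes to the body for transmission."""
        header = json.dumps(self.attrs["peer"], sort_keys=True).encode()
        return struct.pack("!I", len(header)) + header + self.body

    @classmethod
    def unpack(cls, wire):
        """Recover the peer attributes and body at the receiver."""
        (hlen,) = struct.unpack("!I", wire[:4])
        msg = cls(wire[4 + hlen:])
        msg.attrs["peer"] = json.loads(wire[4:4 + hlen].decode())
        return msg
```

Local and stack attributes never leave the machine in this sketch, so a receiver sees only what the sender placed in the peer scope.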

The flexibility of Cactus allows abstract service properties and functions to be implemented as independent modules without enforcing artificial ordering between modules as in hierarchical composition frameworks such as the x-kernel [34]. Furthermore, the indirection provided by the event mechanism makes it easy to change the collection of microprotocols dynamically without affecting other microprotocols.

The facilities provided by Cactus are not tied to any specific programming language, architecture, or operating system. Several prototype implementations of Cactus have been constructed, including versions written in C, C++, and Java, running on Linux, Solaris, and other platforms. In addition to SecComm, other prototype services that have been successfully implemented using Cactus or the predecessor Coyote system [6] include group RPC [30], membership [32], and a real-time channel abstraction [33].

4.2 SecComm Implementation

SecComm executes in user space with either IP, UDP, or TCP as the underlying protocol. As described above, SecComm allows fine-grain customization of a range of security attributes including privacy, authenticity, message integrity, replay prevention, nonrepudiation, and key distribution.

4.2.1 Microprotocol Structure

The abstract security attributes and key distribution are implemented by one or more microprotocols. When a number of microprotocols implement variations of the same abstract property, we collectively refer to them as a class of microprotocols. For example, the class of privacy microprotocols includes DESPrivacy, RSAPrivacy, and IDEAPrivacy microprotocols that use the DES, RSA, and IDEA algorithms, respectively. Fig. 1 illustrates the main microprotocol classes and typical event interactions between them.

The design of the SecComm service allows any combination of security microprotocols to be used together in both static and dynamic ways. The ability to use multiple microprotocols within a given class at the same time is one way in which redundancy can be used to support a survivable service. Naturally, there may be configuration constraints between microprotocols that affect which combinations are feasible.

SecComm has four major types of microprotocols: basic security microprotocols that perform simple security transformations such as encryption or integrity checks, key distribution microprotocols that allow the safe exchange of keys used by other microprotocols, meta-security microprotocols that build more complex security protocols using the basic security microprotocols as building blocks, and adaptation microprotocols that allow the configuration of existing basic and meta-security microprotocols to be changed dynamically. An example of a basic security microprotocol is DESPrivacy, which provides privacy of data exchange using the DES algorithm. An example of a meta-security microprotocol is MultiSecurity, which uses multiple basic security microprotocols to provide stronger guarantees. An example of an adaptation microprotocol is SimpleAdaptation, which dynamically switches the microprotocol used to implement a given security property. Basic security and key distribution microprotocols are discussed further below, while meta-security and adaptation microprotocols are described in Sections 4.3 and 4.4, respectively.

Our prototype implementation of SecComm uses the Cryptlib cryptographic package [28] to provide basic cryptographic functionality. Any cryptographic library with the necessary functions could be used, however.


Fig. 1. Microprotocol classes and interactions.

4.2.2 Application Programming Interface

SecComm allows a higher-level service or application to open secure connections and then send and receive messages through these connections. The specific operations exported by SecComm are the following:

. Open(participants, role, config). Opens a session for a new communication connection, where participants is an array identifying the communicating principals, role identifies the role of this participant in opening the connection (active or passive), and config is a configuration that captures the desired security properties of the session.

. Push(msg). Passes a message from a higher-level protocol or application to a SecComm session to be transmitted with the appropriate security attributes to the participants.

. Pop(msg). Passes a message from a lower-level protocol to a SecComm session to be decrypted, checked, and potentially delivered to a higher-level protocol. When the SecComm protocol passes a message to the higher level and authentication is required, it adds a stack attribute that is the ID of the authenticated sender.

. Close(). Closes a SecComm communication session.

. Adapt([config]). Triggers an adaptation. The new configuration is provided via the optional argument config or can be selected internally by the adaptation protocol.

We assume that the participants of the communication connection negotiate the properties for the connection on a higher level. Once negotiated, they are specified in the Open() operation as two ordered lists of microprotocols and their arguments, the first for messages going downward through the composite protocol and the second for messages going upward. Thus, for example, the following specifies that messages going downward are processed first by DES and then by MD5, while messages going upward are processed by the same microprotocols but in the reverse order:

{DES(DESkey), MD5(MD5key), MD5(MD5key), DES(DESkey)}.
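The effect of the two ordered lists can be shown with a small sketch. It makes several assumptions that are not from SecComm: a single-byte XOR stands in for the DES privacy microprotocol (the standard library has no DES), a keyed MD5 tag built with `hmac` stands in for the MD5 integrity microprotocol, and the helper names are hypothetical.

```python
import hashlib
import hmac

def xor_step(key):
    """Stand-in for a DES privacy step; XOR is its own inverse."""
    return lambda data: bytes(b ^ key for b in data)

def md5_down(key):
    """Append a 16-byte keyed MD5 tag to the outgoing data."""
    return lambda data: data + hmac.new(key, data, hashlib.md5).digest()

def md5_up(key):
    """Verify and strip the tag on incoming data."""
    def check(data):
        body, tag = data[:-16], data[-16:]
        expected = hmac.new(key, body, hashlib.md5).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("integrity check failed")
        return body
    return check

def make_config(des_key, md5_key):
    """Return (down, up): privacy then integrity going down, the reverse going up."""
    down = [xor_step(des_key), md5_down(md5_key)]
    up = [md5_up(md5_key), xor_step(des_key)]
    return down, up

def process(steps, data):
    """Apply each step of an ordered list in turn."""
    for step in steps:
        data = step(data)
    return data
```

For instance, `process(up, process(down, b"hello"))` returns the original bytes, while flipping any bit of the transmitted data makes the upward integrity step raise before the privacy step runs.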

Our eventual goal is to develop an approach in which properties are given as formal specifications that are then translated automatically into collections of microprotocols and arguments.

4.2.3 Basic Security Microprotocols

The basic security microprotocols are simple, typically consisting of two event handlers and an initialization section. One of the event handlers is used for the data passing down through the SecComm protocol and the other one is used for data passing up through the protocol. The initialization section of the microprotocol is executed when a new SecComm connection is opened, i.e., when a session is created.

A basic security microprotocol (Fig. 2) typically takes four or five arguments. In this parameter list, dEvnt and uEvnt are events that signify message arrival from an upper-level and a lower-level protocol, respectively. The two handlers in the microprotocol are bound to these events to initiate execution at the appropriate time. The dOrd and uOrd parameters are the relative orders in which this particular security microprotocol is to be applied to messages flowing down and up, respectively. Some algorithms call for the use of keys; those basic security microprotocols that implement such algorithms allow a key to be specified optionally via the key argument.

Note that if the key used by the security microprotocol has not yet been established, the microprotocol raises an event keyMiss that is handled by the key distribution microprotocols. This event is raised synchronously and, thus, the handler is blocked until the associated event handlers have completed execution. This allows the key distribution microprotocols to block the appropriate handler until the key has been established. The design uses event pointers as arguments rather than fixed event names to allow multiple types of configurations, an approach that demonstrates the inherent flexibility provided by an event-based execution model.
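The structure described here, two handlers bound to caller-supplied events plus a synchronous keyMiss raise when no key is available, can be sketched as follows. The `Events` table, the `XorPrivacy` class, and the `holder` dictionary used to return the key are all assumptions for illustration; the actual microprotocol of Fig. 2 is written against the Cactus runtime.

```python
class Events:
    """Tiny synchronous event table (stand-in for the Cactus runtime)."""
    def __init__(self):
        self.table = {}

    def bind(self, event, handler, order=0):
        self.table.setdefault(event, []).append((order, handler))
        self.table[event].sort(key=lambda pair: pair[0])

    def raise_sync(self, event, *args):
        # Returns only after every bound handler has completed.
        for _, handler in list(self.table.get(event, [])):
            handler(*args)

class XorPrivacy:
    """Toy basic security microprotocol taking dEvnt, uEvnt, dOrd, uOrd, key."""
    def __init__(self, ev, d_evnt, u_evnt, d_ord, u_ord, key=None):
        self.ev, self.key = ev, key
        # Initialization section: bind one handler per direction.
        ev.bind(d_evnt, self.on_down, d_ord)
        ev.bind(u_evnt, self.on_up, u_ord)

    def _ensure_key(self):
        if self.key is None:
            holder = {}
            # Blocks this handler until key distribution has filled in the key.
            self.ev.raise_sync("keyMiss", holder)
            self.key = holder["key"]

    def on_down(self, msg):
        self._ensure_key()
        msg["body"] = bytes(b ^ self.key for b in msg["body"])

    def on_up(self, msg):
        self._ensure_key()
        msg["body"] = bytes(b ^ self.key for b in msg["body"])
```

Because the events are passed in as arguments rather than fixed by name, the same class could be instantiated several times on different events, which is the flexibility the text attributes to using event pointers.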

4.2.4 Key Distribution Microprotocols

If the keys used by the secret key cryptographic methods are not agreed upon a priori, they must be established after the communication session is opened. As with the other security properties, we use established algorithms to do key distribution. Each algorithm is implemented as a separate microprotocol and each basic security microprotocol can be associated with a different key distribution microprotocol. This association can be specified through the Open() operation by the application; if omitted, the default key distribution microprotocol is selected.

All key distribution microprotocols support the following operations that are accessed indirectly through the event mechanism:

. KeyRegister(key). A basic security microprotocol that has been supplied with a key must register it with the key distribution microprotocol.

. KeyMiss(key). A basic security microprotocol that is missing a key invokes this operation. Upon completion, the key argument contains the necessary key.

Fig. 2. Generic basic security microprotocol.

By requiring that these two operations be provided, all basic security microprotocols can use the same interface. The other operations (the ones that implement the bulk of the work) can differ from one key distribution microprotocol to the next. We classify key distribution microprotocols based on who is responsible for generating the key; among the potential options are:

. Asymmetric. One communicating principal (e.g., a client or a server) creates a session key and distributes it to the other principals.

. Symmetric. A session key is created using the Diffie-Hellman algorithm.

. External. Some external security principal creates the session key and distributes it to communicating principals (e.g., Kerberos, certification authority).

Note that whether the key distribution protocol is symmetric or asymmetric is orthogonal to the type of encryption method for which the keys are used and, in particular, whether the encryption method itself is symmetric (e.g., DES) or asymmetric (e.g., RSA).
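As an illustration of the symmetric option, in which both principals contribute equally to the session key, the following sketch runs a textbook Diffie-Hellman exchange. The modulus, generator, and hashing of the shared secret with SHA-256 are assumptions chosen for readability; the parameters are far too small for real use and this is not the SecComm microprotocol.

```python
import hashlib
import secrets

P = 2**127 - 1   # a known prime; illustrative only, much too small in practice
G = 3            # assumed generator for the sketch

def dh_keypair():
    """Return (private, public) for one principal."""
    private = secrets.randbelow(P - 3) + 2   # uniform in [2, P - 2]
    return private, pow(G, private, P)

def dh_session_key(private, peer_public):
    """Derive a 32-byte session key from the shared secret."""
    shared = pow(peer_public, private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()
```

Each principal calls `dh_keypair()`, sends only the public value to its peer, and then calls `dh_session_key()` with its own private value and the peer's public value; both arrive at the same key, which could then be used by a symmetric method such as DES or supplied to any other keyed microprotocol.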

4.2.5 Configuration Constraints

A number of factors must be considered when microprotocols are combined into a custom instance of the SecComm service. In particular, there are both algorithmic or property-based constraints that are independent of a particular implementation and implementation constraints that are specific to our Cactus-based prototype. Algorithmic constraints are those that result from the inherent nature of the properties being enforced or the algorithms used. For example, the nonrepudiation microprotocol requires the use of an authenticity microprotocol based on public keys. Similarly, all microprotocols that use a key require either that the key be provided when the session is created or that a key distribution microprotocol be included.

Other algorithmic constraints affect the order in which various security algorithms are applied. For example, all attack prevention microprotocols should execute before privacy, integrity, or authenticity microprotocols at the sender to ensure that the mechanism used for attack prevention is protected from modification. Similarly, nonrepudiation microprotocols should be executed immediately before authentication at the receiver so that only the sender's public key is required to later prove the message was sent by the sender. Other ordering constraints have been identified elsewhere [1], [3].

Implementation constraints are those that result from the specific design of the SecComm microprotocols. Compared with systems that support linear or hierarchical composition models, the nonhierarchical model supported by Cactus introduces minimal implementation constraints on configurability. That is, with Cactus, it is generally possible to implement independent service properties so that this independence is maintained in the microprotocol realization. When extra constraints do get imposed, it is usually because making an extra assumption about which other microprotocols are present significantly simplifies the implementation.

In the current SecComm prototype, the only additional implementation constraint is that each integrity and replay prevention microprotocol can be used at most once in a given configuration. Thus, for example, two instances of MD5Integrity cannot be used together, while MD5Integrity and SHAIntegrity can be. This restriction results from the use of fixed message attribute names for each microprotocol; it could be avoided by dynamically assigning attribute names at startup time.
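A configuration checker for the constraints named in this section might look like the sketch below. It is an assumption-laden illustration: `check_config` is not a SecComm function, the names `DSAAuthenticity` and `TimestampReplay` are hypothetical labels, and the set of key-using microprotocols is illustrative rather than taken from the prototype.

```python
INTEGRITY = {"MD5Integrity", "SHAIntegrity"}
REPLAY = {"TimestampReplay"}             # hypothetical name
PUBLIC_KEY_AUTH = {"DSAAuthenticity"}    # hypothetical name
NEEDS_KEY = {"DESPrivacy", "IDEAPrivacy", "MD5Integrity"}   # illustrative set

def check_config(microprotocols, keys=(), key_distribution=False):
    """Return a list of constraint violations (empty if the config is valid)."""
    names = list(microprotocols)
    present = set(names)
    errors = []
    # Implementation constraint: integrity and replay prevention
    # microprotocols may appear at most once each.
    for name in sorted(present & (INTEGRITY | REPLAY)):
        if names.count(name) > 1:
            errors.append(f"{name} used more than once")
    # Algorithmic constraint: nonrepudiation needs public-key authenticity.
    if "NonRepudiation" in present and not (PUBLIC_KEY_AUTH & present):
        errors.append("NonRepudiation requires public-key authenticity")
    # Algorithmic constraint: a key must be supplied or distributable.
    for name in sorted(present & NEEDS_KEY):
        if name not in keys and not key_distribution:
            errors.append(f"{name} has no key and no key distribution")
    return errors
```

Ordering constraints, such as running attack prevention before privacy at the sender, are not covered by this sketch and would need an additional pass over the ordered lists.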

4.3 Redundancy Using Meta-Security Microprotocols

The basic SecComm design supports redundancy implicitly since multiple microprotocols for a given security property can be included in the same service. For example, multiple encryption microprotocols can be configured together to provide privacy through redundant methods. However, the meta-security microprotocols have been designed specifically for implementing more complex redundancy schemes by allowing basic security microprotocols to be combined in sophisticated ways. For example, a meta-security microprotocol may apply multiple or alternating basic security microprotocols to a message.

The basic structure of a meta-security microprotocol is shown in Fig. 3. In this design, the microprotocol is passed, as arguments downBasicEvnts and upBasicEvnts, vectors of down and up events that correspond to the events to which handlers in the basic microprotocols have been bound.

Examples of different specific meta-security microprotocols include:

. MultiSecurity. Applies multiple basic security protocols to a message in sequence.

. AltSecurity. Applies one security microprotocol to each message, with the method chosen successively from a specified list. If the sequence of methods is deterministic or agreed upon by the sender and receiver, no additional information is required, provided that the underlying communication is reliable and maintains FIFO ordering.

. RandomAltSecurity. Similar to AltSecurity but uses a randomly chosen method for each message. Each message must carry an identifier that can be used by the receiver to determine which method to use to decrypt the message.

. ExpansionSecurity. Uses the technique mentioned in Section 3.2 that xors the message body with a random bit sequence and encrypts the result (part one) and the random bit sequence (part two) with given basic security microprotocols.

Fig. 3. Generic meta-security microprotocol.

A meta-security microprotocol can also be configured to use other meta-security microprotocols. For example, by combining AltSecurity with MultiSecurity, we can implement AltMultiSecurity, which alternates between different combinations of multiple encryption methods from one message to the next.
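The meta-security schemes listed above can be sketched with plain functions standing in for basic security microprotocols. The classes below are illustrative assumptions, not the Fig. 3 structure: they call the basic methods directly instead of raising events, and they rely on a toy single-byte XOR that is its own inverse, so the same function serves for both directions.

```python
import itertools
import secrets

def xor_with(key):
    """Toy stand-in for a basic security microprotocol (self-inverse)."""
    return lambda data: bytes(b ^ key for b in data)

class MultiSecurity:
    """Apply every method in sequence going down, in reverse going up."""
    def __init__(self, methods):
        self.methods = list(methods)
    def down(self, data):
        for method in self.methods:
            data = method(data)
        return data
    def up(self, data):
        for method in reversed(self.methods):
            data = method(data)
        return data

class AltSecurity:
    """Apply one method per message, cycling through a fixed list.
    Sender and receiver stay in step only over reliable FIFO delivery."""
    def __init__(self, methods):
        methods = list(methods)
        self._down = itertools.cycle(methods)
        self._up = itertools.cycle(methods)
    def down(self, data):
        return next(self._down)(data)
    def up(self, data):
        return next(self._up)(data)

class ExpansionSecurity:
    """XOR the body with a random pad, then protect both parts separately."""
    def __init__(self, method_one, method_two):
        self.one, self.two = method_one, method_two
    def down(self, data):
        pad = secrets.token_bytes(len(data))
        mixed = bytes(a ^ b for a, b in zip(data, pad))
        return self.one(mixed), self.two(pad)
    def up(self, parts):
        mixed, pad = self.one(parts[0]), self.two(parts[1])
        return bytes(a ^ b for a, b in zip(mixed, pad))
```

With real ciphers each method would need distinct encrypt and decrypt operations, and a RandomAltSecurity variant would additionally have to attach a method identifier to each message.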

The concept of meta-security microprotocols can be applied to increase the survivability of any security property for which using multiple or alternating methods reduces the chance of successful attack. Privacy, authenticity, and message integrity, among others, fall in this category. The SecComm design does not prevent the same idea from being used for other properties such as replay prevention and nonrepudiation, but the benefit for such properties is more questionable. Finally, note that the ease with which such meta-security microprotocols can be constructed is again a direct result of the flexibility provided by Cactus.

Key distribution has security risks analogous to data communication, but with greater potential impact since the compromised key will likely be used for a period of time. Thus, the same redundancy techniques used for data security can also be applied for key distribution security. Multiple key distribution microprotocols can also be used to obtain keys redundantly.

Redundancy and key distribution can mix in different ways. For example, relying on redundant trusted arbitrators to obtain a key in an external scheme can avoid some of the problems that occur if a single arbitrator is used and compromised. Moreover, if the multiple arbitrators are thought to be vulnerable to the same attack, different algorithms can be used.

In the above example, the multiple methods are used collaboratively to obtain the same key. The scheme can also be used to collect different keys, however. The simplest scenario has each key assigned to a separate basic security microprotocol. A more complex configuration would allocate multiple keys to the same microprotocol, which could use alternate keys on a message-by-message basis.

4.4 Adaptation Using Adaptation Microprotocols

The action phase of the adaptation process is implemented in SecComm by adaptation microprotocols in concert with adaptation-aware microprotocols, which are basic security microprotocols augmented with the ability to be activated and deactivated. An adaptation microprotocol handles both the local and remote aspects of adaptation. Local processing involves activating and deactivating the microprotocols associated with the old and new configurations on the local host. Activation is done by binding an event handler to the events associated with messages; deactivation removes such bindings. Activation and deactivation are specific to messages flowing in a given direction (either from the application to the network or vice versa), which allows finer-grain control over the process. The remote aspects of adaptation deal primarily with coordination. An adaptation microprotocol communicates with its peer on the remote host and initiates the exchange of messages that determines when individual microprotocols are activated or deactivated. The particular order depends on the goals of the adaptation.

SimpleAdaptation is an example adaptation microprotocol in which the order of activation and deactivation guarantees that all messages sent using the old configuration are delivered before any of the messages sent using the new configuration. The structure of this microprotocol follows the description in Section 3 and is shown in Fig. 4. Here, the originating host (master) first disables the old configuration for outgoing messages and informs the other host (slave) that an adaptation is occurring. Upon receipt of an adaptation message from the master, the slave sends an acknowledgment, disables the old configuration, and enables the new configuration. Upon receipt of the acknowledgment message from the slave, the master disables the old configuration for incoming messages and enables the new configuration for both incoming and outgoing messages. As noted above, this algorithm requires reliable ordered message delivery. A more symmetric group-oriented version of the protocol that requires more extensive coordination is presented in [11].

Other adaptation algorithms can be constructed to handle different requirements or environments. For example, message delivery might not be reliable or ordered, so the adaptation algorithm might have to retransmit control messages. Another adaptation algorithm might require some form of agreement between both hosts before any adaptation can take place.

Three vulnerabilities arising from adaptation methods were identified in Section 3. The first is the problem of potentially providing additional information to an adversary. For example, since the messages exchanged by the adaptation microprotocol can be distinguished from other messages, it is possible to determine that an adaptation is occurring. The contents of the message need not reveal details, however. In the common case where the protocol is used essentially as a signaling mechanism, all references are generic (e.g., old configuration, new configuration). In other cases where the protocol is used to distribute the details of the new configuration, privacy methods separate from the ones being adapted can be used to secure the exchange.

Fig. 4. SimpleAdaptation microprotocol.

The second issue relates to an adversary potentially triggering security-weakening adaptations. However, these can be prevented by using authentication and existing methods for preventing replay attacks. In combination, these techniques prevent an intruder from injecting valid adaptation requests into the message stream.

The final issue relates to denial of service (DoS) attacks. Specific types of such attacks, such as where the intruder sends a reply before the trusted host it is impersonating, can again be prevented by authentication and replay prevention. Other types, such as where the intruder is able to prevent messages from arriving at their destination, are more difficult to handle. Note, however, that this is a general problem and not one specifically related to our adaptation approach.

5 EXPERIMENTAL RESULTS

A prototype of SecComm with extensions for redundancy and adaptation has been implemented using the C version of Cactus on a cluster of 600 MHz Pentium III PCs running Linux 2.4.7 connected by a 1 Gb Ethernet. This section provides performance results that illustrate the cost of redundancy and adaptation in SecComm.

The current prototype implements a subset of the microprotocols presented in this paper, including privacy microprotocols based on DES, RSA, IDEA, Blowfish, and XOR, integrity microprotocols based on MD5 and SHA, an authentication microprotocol based on DSA, a timestamp-based replay prevention microprotocol, a nonrepudiation microprotocol, two meta-security microprotocols, and one adaptation microprotocol. Other microprotocols are currently being added.

We have conducted a number of experiments using different subsets of microprotocols. Table 1 gives roundtrip times (RTT) in microseconds and the throughput of SecComm for processing outgoing messages using different configurations. The RTT test used 100-byte messages, the throughput test used 1,400-byte messages, and the figures were computed over 1,000 or more roundtrips. The system was lightly loaded during testing and all SecComm configurations use IP as the underlying protocol, with no packet reordering or drops observed during the experiments. In these tests, DESPrivacy uses a 56-bit key running in CFB mode, BlowfishPrivacy uses a 448-bit key running in CFB mode, XORPrivacy uses a 64-bit “key,” and IDEAPrivacy uses a 128-bit key running in CFB mode. The NonRepudiation microprotocol tested ensures that messages are written to disk before the message is delivered to the next level. Other nonrepudiation variants that allow delayed write to disk are naturally less expensive. As a baseline, an average roundtrip time using IP directly on this cluster is 365 μs. The notation “+ microprotocol name” in the table indicates that the named microprotocol is added to the configuration on the previous row of the table.

The entry for base SecComm reflects times for a skeleton version of SecComm that does not use any microprotocols; its additional cost indicates the approximate cost of adding a new protocol to the stack. The cost over IP column indicates the roundtrip time overhead of the configuration compared to using just IP. Similarly, the cost over base column indicates the overhead of the configuration compared to just the base.

The cost over base column provides the most realistic indication of the cost of using redundancy techniques implemented as meta-security microprotocols. For MultiSecurity, these numbers indicate that the cost is roughly equal to the sum of the costs associated with the corresponding microprotocols. For example, the overhead of using MultiSecurity to combine DES and Blowfish is 483 μs, which is actually slightly less than the sum of the costs of DES and Blowfish since the cost of using Cactus mechanisms is amortized over multiple microprotocols. For AltSecurity, the cost is approximately the same as the average cost of the individual microprotocols.

The throughput test measured how much data an application could push through SecComm on the sending host. This experiment measured SecComm in isolation without including either the lower-level protocols (IP and Ethernet) or the network transmission. The measurements were performed by inserting an additional protocol between SecComm and IP that simply drops messages when a throughput test is performed. The base SecComm entry shows the highest achievable throughput, obtained when there is the least overhead and the message data is not inspected. The use of XOR illustrates the cost of using a single microprotocol that does minimal computation on every byte of the message. The established algorithms perform significantly more work and reduce the throughput by greater amounts. The high throughput of NonRepudiation is due to the fact that most of the work of NonRepudiation is done on the receiving host and this test does not capture the throughput at the receiving host. Combining microprotocols for MultiSecurity results in lower throughput that is consistent with the throughput values of the individual microprotocols. For AltSecurity, the throughput is slightly less than the average throughput of the individual microprotocols.

TABLE 1 Cost of Redundancy

We also measured adaptation time and the cost associated with supporting adaptive changes. The experiment involved a simple adaptation that switches between DES and XOR when the user gives an adaptation signal. The delay, which reflects the amount of time that elapsed on the initiating host from the time the adaptation is triggered until the time it completed, was measured at 442 μs. In these initial tests, adaptation messages were not encrypted or otherwise protected, but we anticipate that the additional overhead would be comparable to the overheads given above. To estimate the overhead associated with supporting adaptation, we performed the RTT and throughput tests using a configuration where adaptation was possible but never triggered. In this configuration, the adaptation-aware DES microprotocol was active throughout the lifetime of the connection. The RTT was measured at 668 μs and the throughput at 50.28 Mb/s.

The experiments demonstrate two aspects of adaptation. First, the numbers indicate that the adaptation microprotocol has almost no overhead if the adaptation does not occur, i.e., the only cost is associated with the actual adaptation process. Second, the experiments in which adaptation does occur suggest that the process does not impose significant additional execution overhead or communication delay. Experiments that include protection of the adaptive process itself are continuing and are expected to allow precise quantification of this delay under more realistic conditions.

6 RELATED WORK

The basic idea of using redundancy to increase the survivability of services in networked systems has been used elsewhere. For example, redundancy in the form of fragmentation and scattering has been used for intrusion-tolerant data storage [20], [25], [26]. Replication has also been used for authentication and authorization services [7]. Finally, redundancy in communication has been used in [15] in terms of sending each message along multiple disjoint paths from the sender to the receivers. Although this work does not explicitly address intrusions, the algorithm is designed to tolerate arbitrary failures and ensure message integrity and service availability.

In contrast with the above examples that use data or space redundancy, this paper has focused on using method redundancy. The combination of these two types of redundancies can provide an even higher level of survivability for distributed services. To our knowledge, the only similar services that use method redundancy are intrusion detection systems (IDS) [18]. If an IDS employs redundant detection modules with different detection algorithms, it stands a greater chance of detecting more intrusions and giving fewer false alarms.
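The trade-off between detecting more intrusions and raising fewer false alarms can be illustrated with a small sketch of redundant detectors combined under two policies. The three detectors, their thresholds, and the event fields are hypothetical and are not taken from any IDS cited here; the point is only that an "any" policy favors detection, whereas a "majority" policy favors fewer false alarms.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Detector = Callable[[Event], bool]


def failed_login_detector(event: Event) -> bool:
    """Flags repeated authentication failures (threshold is an assumption)."""
    return int(event.get("failed_logins", 0)) >= 5


def signature_detector(event: Event) -> bool:
    """Flags payloads containing a known-bad string (toy signature)."""
    return "/etc/passwd" in str(event.get("payload", ""))


def rate_detector(event: Event) -> bool:
    """Flags unusually high request rates (threshold is an assumption)."""
    return int(event.get("requests_per_sec", 0)) > 100


def votes(detectors: List[Detector], event: Event) -> int:
    """Count how many redundant detectors flag the event."""
    return sum(1 for detect in detectors if detect(event))


def flag_any(detectors: List[Detector], event: Event) -> bool:
    """Raise an alarm if at least one method fires (more detections)."""
    return votes(detectors, event) >= 1


def flag_majority(detectors: List[Detector], event: Event) -> bool:
    """Raise an alarm only if most methods agree (fewer false alarms)."""
    return 2 * votes(detectors, event) > len(detectors)


DETECTORS = [failed_login_detector, signature_detector, rate_detector]
suspicious = {"payload": "GET /../../etc/passwd", "requests_per_sec": 3}
print(flag_any(DETECTORS, suspicious), flag_majority(DETECTORS, suspicious))
```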

The basic idea of adaptation has also been used in a number of services. Many intrusion-tolerant services adapt when intrusions are detected by terminating suspected connections or quarantining infected machines [8]. The ITUA project [16] proposes using unpredictable adaptations to enhance survivability for replicated servers that use group communication. The unpredictable adaptations include starting new replicas in unpredictable locations and changing the replication policy of a replica group. Finally, although it does not specifically address survivability, [10] presents an adaptive version of [15], where the algorithm is switched to tolerate arbitrary failures only when such a failure is detected. To our knowledge, no other system performs algorithmic adaptations as a reaction to suspected intrusions.

Work specifically related to SecComm can be divided into secure communication standards and other configurable secure communication services. Some degree of customization is supported in several recent standards. For example, IPsec allows a choice of security options, including message integrity and privacy, using a selected cryptographic method [38]. It is also possible to apply multiple security methods to a given communication connection. TLS (Transport Layer Security) [21] offers a choice of privacy (e.g., DES or RC4), integrity (e.g., keyed SHA or MD5), and optional message compression, but does not directly support the use of redundant methods. None of these protocols support runtime adaptation or provide flexible facilities to implement redundant methods as done in SecComm.
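As an illustration of what redundant methods mean for message integrity, the sketch below attaches two keyed digests computed with different algorithms and independent keys, and accepts a message only if both verify. It is a simplified illustration rather than the SecComm implementation: the function names `protect` and `verify` and the packet layout are assumptions, and SHA-256 and SHA3-256 are used as modern stand-ins for the keyed SHA and MD5 options mentioned above.

```python
import hmac

# Two different keyed-digest methods, each with its own key (assumed layout:
# message || tag_sha256 || tag_sha3_256, both tags 32 bytes long).
METHODS = ("sha256", "sha3_256")
TAG_LEN = 32


def protect(message: bytes, keys: dict) -> bytes:
    """Append one tag per redundant integrity method."""
    tags = b"".join(hmac.new(keys[m], message, m).digest() for m in METHODS)
    return message + tags


def verify(packet: bytes, keys: dict) -> bytes:
    """Return the message only if every redundant tag verifies."""
    total = TAG_LEN * len(METHODS)
    if len(packet) < total:
        raise ValueError("packet too short")
    message, tags = packet[:-total], packet[-total:]
    for i, method in enumerate(METHODS):
        expected = hmac.new(keys[method], message, method).digest()
        received = tags[TAG_LEN * i:TAG_LEN * (i + 1)]
        if not hmac.compare_digest(expected, received):
            raise ValueError(f"{method} integrity check failed")
    return message


keys = {"sha256": b"first-secret", "sha3_256": b"second-secret"}
packet = protect(b"transfer 10", keys)
print(verify(packet, keys))
```

With this arrangement, an attacker who compromises one method or one key still cannot forge a message that passes the other check, which is the property method redundancy aims for.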

Configurable secure communication services have been implemented using various configuration frameworks, including the x-kernel [46], Ensemble [49], and the framework described in [45]. All these models are similar in the sense that a communication subsystem is constructed as a directed graph of protocol objects. Although this allows arbitrary combinations of security components, the structure is limiting compared to Cactus and would make it difficult to implement some of our more dynamic redundancy techniques. However, Antigone [41] has adopted an approach similar to Cactus in which microprotocols and composite protocols are used to implement secure group communication with customizable policies, including rekeying and message security. To our knowledge, none of these projects focus on using redundancy and adaptation to enhance survivability.

7 CONCLUSIONS

The use of redundancy and adaptation can increase the survivability of services for networked systems by making them tolerant to intrusions and other attacks and by allowing them to change execution behavior to increase unpredictability or to react to attacks. This paper has discussed the use of these two techniques and presented a concrete example that involves augmenting the SecComm secure communication service with redundant security methods and support for adaptation. This service has been constructed using Cactus, a system that provides the type of flexible interaction and configuration mechanisms needed to build services of this type. Experimental results from a prototype implementation suggest that the performance of the service is proportional to the cost of its constituent methods.

Future work will include further experimentation with the survivable SecComm prototype and the development of a family of adaptation protocols with different execution characteristics. A special focus will be on designing adaptation protocols that are scalable and that minimize the synchronization required to support coordinated changes in execution behavior. We intend to experiment with these protocols for such uses as building location-specific mobile services, as well as within the context of survivability.

ACKNOWLEDGMENTS

Gary Wong implemented the Cactus framework used for the SecComm implementation. He also provided excellent comments and suggestions that improved the paper. Trevor Jim provided useful comments on an earlier version of the paper. This work was supported in part by the US Defense Advanced Research Projects Agency under grant N66001-97-C-8518 and the US National Science Foundation under grant ANI-9979438.

REFERENCES

[1] M. Abadi and R. Needham, “Prudent Engineering Practice for Cryptographic Protocols,” IEEE Trans. Software Eng., vol. 22, no. 1, pp. 6-15, Jan. 1996.

[2] W. Aiello, M. Bellare, G. Di Crescenzo, and R. Venkatesan, “Security Amplification by Composition: The Case of Double-Iterated, Ideal Ciphers,” Proc. Advances in Cryptology: Crypto ’98, H. Krawczyk, ed., 1998.

[3] R. Anderson and R. Needham, “Robustness Principles for Public Key Protocols,” Proc. Crypto ’95, pp. 236-247, 1995.

[4] M. Barbacci, “Survivability in the Age of Vulnerable Systems,” Computer, vol. 29, no. 11, p. 8, Nov. 1996.

[5] P. Bell and K. Jabbour, “Review of Point-to-Point Network Routing Algorithms,” IEEE Comm. Magazine, vol. 24, no. 1, pp. 34-38, 1986.

[6] N. Bhatti, M. Hiltunen, R. Schlichting, and W. Chiu, “Coyote: A System for Constructing Fine-Grain Configurable Communication Services,” ACM Trans. Computer Systems, vol. 16, no. 4, pp. 321-366, Nov. 1998.

[7] L. Blain and Y. Deswarte, “Intrusion-Tolerant Security Server for Delta-4,” Proc. ESPRIT ’90 Conf., pp. 355-370, Nov. 1990.

[8] P. Brutch, T. Brutch, and U. Pooch, “Electronic Quarantine: An Automated Intruder Response Tool,” Proc. Information Survivability Workshop 1998, pp. 23-27, Oct. 1998.

[9] K. Campbell and M. Wiener, “DES Is Not a Group,” Advances in Cryptology: CRYPTO ’92, E. Brickell, ed., pp. 512-520, Aug. 1992.

[10] I. Chang, M. Hiltunen, and R. Schlichting, “Affordable Fault Tolerance through Adaptation,” Parallel and Distributed Processing, J. Rolin, ed., pp. 585-603, Springer, Apr. 1998.

[11] W.-K. Chen, M. Hiltunen, and R. Schlichting, “Constructing Adaptive Software in Distributed Systems,” Proc. 21st Int’l Conf. Distributed Computing Systems, pp. 635-643, Apr. 2001.

[12] W. Cheswick and S. Bellovin, Firewalls and Internet Security. Reading, Mass.: Addison-Wesley, 1994.

[13] M. Choi and C. Krishna, “An Adaptive Algorithm to Ensure Differential Service in a Token-Ring Network,” IEEE Trans. Computers, vol. 39, no. 1, pp. 19-33, Jan. 1990.

[14] F. Cohen et al., “Deception Toolkit,” http://www.all.net/dtk/, 1999.

[15] F. Cristian, H. Aghili, R. Strong, and D. Dolev, “Atomic Broadcast: From Simple Message Diffusion to Byzantine Agreement,” Proc. 15th Symp. Fault-Tolerant Computing, pp. 200-206, June 1985.

[16] M. Cukier, J. Lyons, P. Pandey, H. Ramasamy, W. Sanders, P. Pal, F. Webber, R. Schantz, J. Loyall, R. Watro, M. Atighetchi, and J. Gossett, “Intrusion Tolerance Approaches in ITUA,” FastAbstract in Supplement of the 2001 Int’l Conf. Dependable Systems and Networks, pp. 64-65, July 2001.

[17] J. Daemen and V. Rijmen, “The Block Cipher Rijndael,” Smart Card Research and Applications, J.-J. Quisquater and B. Schneier, eds., pp. 288-296, Springer-Verlag, 2000.

[18] D. Denning, “An Intrusion-Detection Model,” IEEE Trans. Software Eng., vol. 13, no. 2, pp. 222-232, Feb. 1987.

[19] Y. Desmedt and Y. Frankel, “Threshold Cryptosystems,” Proc. Advances in Cryptology Crypto ’89, G. Brassard, ed., pp. 307-315, 1990.

[20] Y. Deswarte, J.-C. Fabre, J.-M. Fray, D. Powell, and P.-G. Ranea, “Saturne: A Distributed Computing System which Tolerates Faults and Intrusions,” Proc. Workshop Future Trends of Distributed Computing Systems, pp. 329-338, Sept. 1990.

[21] T. Dierks and C. Allen, “The TLS Protocol, version 1.0,” RFC (Standards Track) 2246, Jan. 1999.

[22] W. Diffie and M. Hellman, “New Directions in Cryptography,” IEEE Trans. Information Theory, vol. 22, no. 6, pp. 644-654, 1976.

[23] J.-C. Fabre, Y. Deswarte, and B. Randell, “Designing Secure and Reliable Applications Using Fragmentation-Redundancy-Scattering: An Object-Oriented Approach,” Proc. First European Dependable Computing Conf., pp. 21-38, Oct. 1994.

[24] A. Fox, S. Gribble, E. Brewer, and E. Amir, “Adapting to Network and Client Variation via On-Demand, Dynamic Distillation,” Proc. Seventh Architectural Support for Programming Languages and Operating Systems (ASPLOS) Conf., Oct. 1996.

[25] J. Fraga and D. Powell, “A Fault and Intrusion-Tolerant File System,” Proc. IFIP Third Int’l Conf. Computer Security, pp. 203-218, 1985.

[26] J. Fray, Y. Deswarte, and D. Powell, “Intrusion-Tolerance Using Fine-Grain Fragmentation-Scattering,” Proc. 1986 IEEE Symp. Security and Privacy, pp. 194-201, Apr. 1986.

[27] J. Goldberg, I. Greenberg, and T. Lawrence, “Adaptive Fault Tolerance,” Proc. IEEE Workshop Advances in Parallel and Distributed Systems, pp. 127-132, Oct. 1993.

[28] P. Gutmann, “Cryptlib,” Dept. of Computer Science, Univ. of Auckland, 1998.

[29] M. Hiltunen, “Configuration Management for Highly-Customizable Software,” IEE Proc.: Software, vol. 145, no. 5, pp. 180-188, Oct. 1998.

[30] M. Hiltunen and R. Schlichting, “Constructing a Configurable Group RPC Service,” Proc. 15th Int’l Conf. Distributed Computing Systems, pp. 288-295, May 1995.

[31] M. Hiltunen and R. Schlichting, “Adaptive Distributed and Fault-Tolerant Systems,” Computer Systems Science and Eng., vol. 11, no. 5, pp. 125-133, Sept. 1996.

[32] M. Hiltunen and R. Schlichting, “A Configurable Membership Service,” IEEE Trans. Computers, vol. 47, no. 5, pp. 573-586, May 1998.

[33] M. Hiltunen, R. Schlichting, X. Han, M. Cardozo, and R. Das, “Real-Time Dependable Channels: Customizing QoS Attributes for Distributed Systems,” IEEE Trans. Parallel and Distributed Systems, vol. 10, no. 6, pp. 600-612, June 1999.

[34] N. Hutchinson and L. Peterson, “The x-Kernel: An Architecture for Implementing Network Protocols,” IEEE Trans. Software Eng., vol. 17, no. 1, pp. 64-76, Jan. 1991.

[35] V. Jacobson, “Congestion Avoidance and Control,” Proc. SIGCOMM ’88 Symp., pp. 314-332, Aug. 1988.

[36] S. Jha and J. Wing, “Survivability Analysis on Networked Systems,” Proc. 23rd Int’l Conf. Software Eng. (ICSE 2001), pp. 307-317, 2001.

[37] E. Jonsson and T. Olovsson, “A Quantitative Model of the Security Intrusion Process Based on Attacker Behavior,” IEEE Trans. Software Eng., vol. 23, no. 4, pp. 235-245, Apr. 1997.

[38] S. Kent and R. Atkinson, “Security Architecture for the Internet Protocol,” RFC (Standards Track) 2401, Nov. 1998.

[39] H. Kiliccote and P. Khosla, “Borg: A Scalable and Secure Distributed Information System,” Proc. Information Survivability Workshop 1998, pp. 101-105, Oct. 1998.

[40] Dependability: Basic Concepts and Terminology, J.C. Laprie, ed. Vienna: Springer-Verlag, 1992.

[41] P. McDaniel, A. Prakash, and P. Honeyman, “Antigone: A Flexible Framework for Secure Group Communication,” Proc. Eighth USENIX Security Symp., pp. 99-114, Aug. 1999.

[42] R. Merkle and M. Hellman, “On the Security of Multiple Encryption,” Comm. ACM, vol. 24, no. 7, pp. 465-467, July 1981.

[43] B. Neuman and T. Ts’o, “Kerberos: An Authentication Service for Computer Networks,” IEEE Comm. Magazine, vol. 32, no. 9, pp. 33-38, Sept. 1994.

[44] P. Neumann and P. Porras, “Experience with EMERALD to Date,” Proc. First USENIX Workshop Intrusion Detection and Network Monitoring, Apr. 1999.

[45] P. Nikander and A. Karila, “A Java Beans Component Architecture for Cryptographic Protocols,” Proc. Seventh USENIX Security Symp., Jan. 1998.

[46] H. Orman, S. O’Malley, R. Schroeppel, and D. Schwartz, “Paving the Road to Network Security or the Value of Small Cobblestones,” Proc. 1994 Internet Soc. Symp. Network and Distributed System Security, Feb. 1994.

[47] R. van Renesse, K. Birman, M. Hayden, A. Vaysburd, and D. Karr, “Building Adaptive Systems Using Ensemble,” Software Practice and Experience, vol. 28, no. 9, pp. 963-979, July 1998.

[48] R. Rivest, “The MD5 Message-Digest Algorithm,” RFC 1321, Apr. 1992.

[49] O. Rodeh, K. Birman, M. Hayden, Z. Xiao, and D. Dolev, “The Architecture and Performance of Security Protocols in the Ensemble Group Communication System,” Technical Report TR98-1703, Dept. of Computer Science, Cornell Univ., Dec. 1998.

[50] W. Sanders, M. Cukier, F. Webber, P. Pal, and R. Watro, “Probabilistic Validation of Intrusion Tolerance,” FastAbstract in Supplement of the 2002 Int’l Conf. Dependable Systems and Networks, pp. B 78-79, June 2002.

[51] Trust in Cyberspace, F. Schneider, ed. Washington, D.C.: Committee on Information Systems Trustworthiness, Nat’l Research Council, Nat’l Academy Press, Sept. 1998.

[52] J. Steiner, C. Neuman, and J. Schiller, “Kerberos: An Authentication Service for Open Network Systems,” USENIX Conf. Proc., pp. 191-202, Winter 1988.

[53] J. Voas, G. McGraw, and A. Ghosh, “Reducing Uncertainty about Survivability,” Proc. 1997 Information Survivability Workshop, Feb. 1997.

Matti A. Hiltunen received the MS degree in computer science from the University of Helsinki in 1989 and the PhD degree in computer science from the University of Arizona in 1996. He is currently a senior technical staff member in the Dependable Distributed Computing Department at AT&T Labs-Research in Florham Park, New Jersey. He is a member of the ACM and the IEEE Computer Society. His current research interests include dependability, performance, timeliness, and security in distributed systems and networks.

Richard D. Schlichting received the BA degree in mathematics and history from the College of William and Mary and the MS and PhD degrees in computer science from Cornell University. He is currently head of the Dependable Distributed Computing Department at AT&T Labs-Research in Florham Park, New Jersey. He was on the faculty at the University of Arizona from 1982-2000 and spent sabbaticals in Japan in 1990 at the Tokyo Institute of Technology and in 1996-1997 at the Hitachi Central Research Lab. He is an ACM fellow, an IEEE fellow, and a member of IFIP Working Group 10.4 on Dependable Computing and Fault Tolerance. He is on the editorial board of the IEEE Transactions on Parallel and Distributed Systems and has been active in the IEEE Computer Society Technical Committee on Fault-Tolerant Computing, serving as chair from 1998-1999. His research interests include distributed systems, highly dependable computing, and networks.

Carlos A. Ugarte received the BS degree in computer science from the Georgia Institute of Technology and the MS degree in computer science from the University of Arizona. He is currently working toward the PhD degree in computer science at the University of Arizona. His research interests are in distributed systems and networks. He is a student member of the IEEE.


