
Un-identical federate replication structure for improving performance of HLA-based simulations


Simulation Modelling Practice and Theory 48 (2014) 112–128


journal homepage: www.elsevier.com/locate/simpat


http://dx.doi.org/10.1016/j.simpat.2014.06.016
1569-190X/© 2014 Elsevier B.V. All rights reserved.

⇑ Corresponding author.
E-mail addresses: [email protected] (Z. Li), [email protected] (W. Cai), [email protected] (S.J. Turner).

Zengxiang Li a,⇑, Wentong Cai b, Stephen John Turner b

a Institute of High Performance Computing, 1 Fusionopolis Way, Singapore 138632, Singapore
b Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore

Article info

Article history:
Received 22 October 2013
Received in revised form 19 June 2014
Accepted 24 June 2014

Keywords:
Parallel and distributed simulation
High Level Architecture
Time synchronization
Performance
Replication
Software diversity

Abstract

The execution of an HLA-based simulation (federation) is usually time consuming, as it usually involves a number of compute-intensive simulation components (federates). To improve simulation performance, an un-identical federate replication structure is proposed in this article. For the same federate, multiple replicas are developed in a software diversity manner by employing different synchronization approaches. The simulation performance is improved by always choosing the fastest replica to represent the federate in the federation. The replication structure is implemented in a transparent manner without increasing federation scale. Message exchange and time management mechanisms are developed to handle the different behaviors of those un-identical replicas. Correctness of the replication structure is proved in theory and verified by experiments. The experimental results have also shown that the un-identical federate replication structure achieves significant performance enhancement with good scalability and marginal overhead.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

The High Level Architecture (HLA), an IEEE standard [1], provides a framework for building a parallel and distributed simulation (federation) by re-using and inter-operating a group of simulation components (federates). HLA-based simulations, i.e., parallel and distributed simulations developed following the HLA standard, are usually developed to study the problems of a complex system, e.g., supply chains, computer networks, and ecological systems. Their executions are usually time consuming, as they involve a number of compute-intensive federates which simulate the corresponding subsystems in the desired detail and fidelity. In addition, a simulation is usually executed many times with various parameter inputs in order to obtain statistical simulation results. Moreover, real-time responses are required in most symbiotic simulations [2], which are embedded in the real system. Therefore, accelerating simulation executions is very important to make simulation economical and attractive to industry and research communities.

In order to improve simulation performance, an un-identical federate replication structure is proposed in this article, inspired by the concepts of active replication and software diversity [3]. Using active replication, the computation is performed at each replica independently and concurrently. Using software diversity, also known as N-Version Programming [4], multiple functionally equivalent programs are developed independently following the same initial specifications. For HLA-based simulations, we propose to develop un-identical replicas for the same federate using either a conservative (e.g., CMB protocol [5,6]) or an optimistic (e.g., Time Warp protocol [7]) synchronization approach. Therefore, the replicas are likely to exhibit different runtime performances during the simulation execution. Consequently, simulation performance can be improved by always choosing the fastest replica to represent the federate in the federation.

Using active replication, the execution cost (e.g., the number of computers) may increase dramatically. Fortunately, computing resources can be obtained more and more easily in modern computing environments, e.g., multi-core or many-core processors and on-demand virtual machine instances on the Cloud. Hence, it is meaningful to improve simulation performance relying on active replication in spite of the increased resource consumption. Furthermore, active replication can also be used for fault-tolerance purposes [8].

There are two reasons to design federate replicas in a software diversity manner by employing different synchronization approaches. Firstly, simulation developers usually hesitate over the choice of synchronization approach, as it is difficult to predict the performance of each approach for simulation executions with different parameters and inputs [9]. The un-identical federate replication structure relieves simulation developers of this burden by taking advantage of both conservative and optimistic synchronization approaches in the same simulation execution. Secondly, conservative and optimistic synchronization go towards two extremes: the former preserves the causality constraint by blocking fast federates, while the latter allows federates to progress freely but may roll back execution when a causality error occurs. Hence, their performance difference is generally significant, and thus the replication structure is expected to achieve significant performance improvement. Most real-world simulations employ conservative synchronization due to its simplicity. Additional costs are therefore introduced to develop an optimistic (OPT) replica for the same federate. Fortunately, some approaches and frameworks have been proposed to relieve simulation developers of the burden of handling complex optimistic synchronization details. Wang et al. [10] have proposed a rollback controller using a middleware approach to handle the complex rollback procedure on behalf of the simulation model. Santoro and Quaglia [11] have implemented a MAgic State Manager (MASM) to handle state management issues for optimistic synchronization in a way completely transparent to the federate itself. They have further designed and implemented a Time Management Converter (TiMaC) to perform mapping of the conservative HLA synchronization interface onto the optimistic one [12]. Such a mapping allows transparent optimistic execution for simulation models originally designed using conservative synchronization. Besides state-based optimistic execution, μsik [13] and Rensselaer's Optimistic Simulation System (ROSS) [14] support reverse computation-based optimistic execution. Furthermore, LaPre et al. [15] have proposed a tool capable of automatically emitting reverse event handlers, with execution speed comparable to that of hand-written code.

Traditionally, replicas of a federate are treated as individual federates. As a result, federation scale, communication traffic and time synchronization overhead increase accordingly. It is also difficult to keep replicas of the same federate consistent. In contrast, the un-identical federate replication structure is designed in a transparent manner. Replicas of the same federate are connected to the federation through a middleware called the replication manager, which masks the presence of multiple replicas of the federate without increasing federation scale. Replicas employing different synchronization approaches behave differently with respect to message exchange and time advancement. Message exchange and time management mechanisms are developed to handle the different behaviors and keep replicas eventually consistent, i.e., processing the same events in time stamp (TS) order. Consequently, the simulation execution with our replication structure will produce the same results as a normal simulation execution using either conservative or optimistic synchronization.

As a follow-up to [16], this article (i) describes the principles of the un-identical federate replication structure using a middleware approach; (ii) revises the implementation of the message exchange and time management mechanisms; (iii) proves that the same simulation results are produced with and without the replication structure; and (iv) reports more experimental results regarding performance improvement, scalability and overhead.

This article is structured as follows: Section 2 briefly introduces HLA-based simulations. Sections 3 and 4 respectively provide the implementation details and correctness proof of the un-identical federate replication structure. Section 5 introduces the P-hold simulation model and reports the experimental results. Related work on improving simulation performance is discussed in Section 6. Section 7 concludes the article and outlines future work.

2. HLA-based simulations

In an HLA-based simulation, federates participate in a federation execution through the underlying Runtime Infrastructure (RTI), as shown in Fig. 1. Several groups of management services are defined in the HLA standard. They are usually implemented by the RTI through corresponding functional modules. The communication between federates and the RTI follows the interfaces defined in the HLA standard. Federates invoke services provided by the RTI through the RTI ambassador (RTI-Amb), while the RTI-Amb delivers callbacks to federates. The RTI-Amb works as a communication agent of the federate, taking care of its need to exchange messages with other federates and to request time advancement along the federation's time axis.

Fig. 1. Overview of an HLA-based simulation.

Object management services are responsible for exchanging messages. For instance, the sending federate sends a message by invoking the Send Interaction RTI service, and the message is delivered to subscribed federates in the form of a Receive Interaction callback. Each message carries a time stamp (TS), denoted as Receive Virtual Time (RVT), which indicates when the message must be processed by the receiving federate. The message is referred to as an external event, while an event generated by the federate itself is referred to as an internal event.

Time management services are responsible for synchronizing the time advancement of each joined federate [17], and thus ensure that internal and external events are processed in TS order. Two values, i.e., simulation time and logical time, are used to describe federate execution progress. The simulation time of a federate is equal to the TS of the event which is currently being processed. It is maintained and advanced by the federate independently. In contrast, the logical time of a federate is determined by all federates together. It is maintained and advanced by the RTI after federation-wide synchronization. The updated logical time, i.e., the Last Granted Time (LGT) issued by the RTI, is delivered to the federate through a "timeAdvanceGrant" callback, which completes the time advancement request initiated by the federate. In addition, the lookahead (LA) defined in [18] indicates that a message can affect the receiving federate only after an LA time period. That is, the RVT of a message is not smaller than the simulation time of the sending federate plus LA. The RTI prevents advancing a federate's logical time beyond its Greatest Available Logical Time (GALT), which expresses the lower bound on the RVT of any future message destined for the federate. Hence, the RTI will never deliver to the federate a future message with RVT smaller than its logical time. In the meantime, the federate will never generate a message with RVT smaller than LGT + LA.
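The two lookahead bounds stated above can be written as simple checks. The following is a minimal sketch only; the function names are hypothetical and are not part of the HLA interface.

```python
def respects_lookahead(rvt: float, sender_sim_time: float, lookahead: float) -> bool:
    """True if the RVT is not smaller than the sender's simulation time plus LA."""
    return rvt >= sender_sim_time + lookahead


def earliest_future_rvt(lgt: float, lookahead: float) -> float:
    """Lower bound (LGT + LA) on the RVT of any message the federate may still generate."""
    return lgt + lookahead
```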

The HLA standard supports both conservative and optimistic synchronization. Conservative synchronization strictly preserves the causality constraint. The RTI never delivers to a conservative (CON) federate any message with RVT greater than its logical time, and the CON federate is not allowed to process events with TS greater than its logical time. Therefore, the simulation time of a CON federate is never greater than its logical time. The Next Message Request (NMR) service is usually used by a CON federate for time advancement requests. The logical time is advanced to the TS of the earliest internal or external event only if that TS is smaller than the GALT of the federate.

In contrast, optimistic synchronization allows an optimistic (OPT) federate to process events without any restriction for better exploitation of parallelism. The Flush Queue Request (FQR) service is usually used by an OPT federate for time advancement requests. It forces the RTI to deliver all buffered messages to the OPT federate. On receiving a straggler message whose RVT is smaller than its simulation time, the federate needs to roll back its execution. Besides restoring the execution state, the OPT federate may also invoke the Retract service to un-send incorrect messages which were sent previously. If the messages have already been delivered to the receiving federates, the RTI informs those federates to remove the effect of the incorrect messages through Request Retraction (RR) callbacks. This may cause secondary execution rollbacks in the receiving federates. The FQR time advancement request can always be granted immediately, without waiting for other federates to advance. The granted logical time is equal to the smaller of the GALT of the federate and the TS of its earliest internal or external event. Since the RTI never delivers messages with RVT smaller than its logical time to the federate, the OPT federate never rolls back execution to before its logical time. Therefore, logical time can be used for fossil collection on saved states. In summary, the simulation time of an OPT federate can be greater than its logical time and may decrease during an execution rollback.
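The difference between the two grant rules can be illustrated with a small sketch. It is a simplification of the text above, not the HLA API: the function names are hypothetical, and a blocked conservative request is modelled by returning None.

```python
from typing import Optional


def nmr_grant(earliest_event_ts: float, galt: float) -> Optional[float]:
    """Conservative (NMR-style) request.

    The logical time advances to the TS of the earliest event only if
    that TS is smaller than GALT; otherwise the federate stays blocked.
    """
    return earliest_event_ts if earliest_event_ts < galt else None


def fqr_grant(earliest_event_ts: float, galt: float) -> float:
    """Optimistic (FQR-style) request.

    Always granted immediately, to the smaller of GALT and the TS of
    the earliest internal or external event.
    """
    return min(galt, earliest_event_ts)
```

With GALT = 8, an earliest event at TS 9 leaves the conservative federate blocked, while the optimistic federate is granted to 8 at once and may keep processing events beyond that time.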

3. Un-identical federate replication

3.1. Overview

Traditionally, the RTI-Amb is provided as a library linked to each federate, as shown in Fig. 1. Because of this tight coupling, the federate and its RTI-Amb have to be replicated as one unit. Therefore, each replica is treated as an individual federate in the federation, and the federation scale increases significantly. As a result, the communication traffic among federates and the overhead of federation-wide time synchronization increase dramatically. What is worse, for each object simulated by a federate, there are multiple active instances in the replicas. It is very difficult to keep these instances consistent and to ensure that the simulation execution is correct.

In contrast, our un-identical federate replication structure, as shown in Fig. 2, is implemented in a transparent manner using a middleware approach. A middleware called the replication manager is inserted to break the tight coupling between the federate and the RTI-Amb. Replicas of the same federate are connected to the replication manager, and through it are connected to the federation via an RTI-Amb. The replication manager provides HLA interfaces to federate replicas, and processes and relays messages between the federate replicas and the RTI. It is able to support concurrent executions of the replicas while keeping them consistent. It is also able to mask the presence of the multiple replicas of the federate without increasing federation scale.

Although replicas of the same federate simulate multiple instances of an object, the replication manager only sends out one object instance registration request to the RTI. So, there is only one instance of this object in the federation. The replication structure is transparent to the federate itself. Its replicas do not know of the existence of the replication manager or of other replicas of the same federate. The replication of a federate is also transparent to remote federates. They have no idea how many replicas are executed for the federate or which replica of the federate they are actually communicating with. Since the replication manager connects to the RTI as if it were the federate, the replication structure is also transparent to the RTI.

Fig. 2. Overview of un-identical federate replication structure.

Similarly, Barrier [19] and No-Barrier [20] replication structures have been developed in a transparent manner using a middleware approach. Barrier replication degrades simulation performance significantly, as the execution is held back by the slowest replica for each RTI service. No-Barrier replication improves simulation performance by always choosing the fastest replica to represent the federate in the federation. Both Barrier and No-Barrier replication structures require replicas of the same federate to be Piecewise Equivalent. That is, replicas with the same initial state and the same incoming callbacks must process the same events, update to the same federate state, and invoke the same RTI services.

However, CON and OPT replicas of the same federate do not satisfy the Piecewise Equivalent restriction. They may invoke different RTI services for time advancement requests. CON replicas process events conservatively, so they modify federate state and generate messages correctly. In contrast, OPT replicas may modify federate state incorrectly and cancel the incorrect modification later. They may also generate incorrect messages and retract them later. Furthermore, messages (callbacks) are delivered to CON and OPT replicas in different manners. Correct messages are delivered to CON replicas in TS order. In contrast, correct messages, incorrect messages, and retractions may be delivered to OPT replicas out of TS order.

Therefore, No-Barrier replication can improve simulation performance only under the restriction that replicas of the same federate employ the same synchronization approach. The replicas may have different performance due to different computing power and network latency, or different synchronization optimization methods, e.g., infrequent state saving in optimistic synchronization [20]. In contrast, our un-identical federate replication structure, as shown in Fig. 2, allows replicas of the same federate to employ either a conservative or an optimistic synchronization approach. They are denoted as CON or OPT replicas respectively. As the performance difference between CON and OPT replicas is usually more significant, our replication structure is expected to achieve more significant performance improvement.

In the un-identical replication structure, CON and OPT replicas are required to be Committed Equivalent, that is, replicas with the same initial state should be updated to the same federate state and generate the same messages while processing the same committed events (i.e., the events with TS < logical time). This is a weaker restriction than the Piecewise Equivalent restriction. The replicas of the same federate are eventually consistent if they process the same committed events in TS order. To achieve this, the replication manager should perform the following tasks:

• Provide a message exchange mechanism to support exchanging messages for federate replicas. One and only one instance of each correct message generated by replicas should be delivered to all replicas of the receiving federates. The incorrect messages and their retractions should be delivered properly to ensure that the effects of incorrect messages are removed.
• Provide a time management mechanism to perform federation-wide time synchronization and to coordinate time advancements of federate replicas. It grants the time advancement requests from replicas respectively and delivers eligible callbacks to replicas according to their corresponding logical time and synchronization approaches.

As shown in Fig. 2, the replication manager deployed with message exchange and time management mechanisms is a transparent middleware between the federate and the RTI. Generally, it can be implemented as a wrapper of the RTI. However, for efficiency, the replication manager can be embedded within the RTI-Amb as a replication module. In this way, the communication between the replication manager and the RTI-Amb can be avoided. Furthermore, the replication module can cooperate with other modules in the RTI-Amb seamlessly, sharing some functions and data structures. This is purely an implementation choice. Since all function calls and data accesses, as shown in Figs. 4 and 8, could be replaced by the standard HLA interfaces, there is no reason why the replication manager could not be implemented as a wrapper of the RTI.


For simplicity, the current version of the replication module supports only two replicas, i.e., a CON and an OPT replica, for each federate. Some additional variables and flags are defined in Table 1. Since CON and OPT replicas have different runtime performance, they may have different requested times (ConReqTime and OptReqTime) and different logical times (ConLGT and OptLGT). LGT, the logical time of the federate, equals the larger of ConLGT and OptLGT, i.e., LGT = Max(ConLGT, OptLGT). A replica is in the time advancing state if its corresponding flag, i.e., ConInTAS or OptInTAS, is true. Otherwise, it is in the time granted state.
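As an illustration, the variables and flags of Table 1 could be grouped into one record, with LGT derived rather than stored. This is a sketch under assumptions: the class name, the snake_case field names and the zero defaults are hypothetical and do not come from the article.

```python
from dataclasses import dataclass


@dataclass
class ReplicationState:
    """Per-federate bookkeeping kept by the replication module (names are illustrative)."""
    con_req_time: float = 0.0   # ConReqTime: requested time from the CON replica
    opt_req_time: float = 0.0   # OptReqTime: requested time from the OPT replica
    con_lgt: float = 0.0        # ConLGT: last granted time of the CON replica
    opt_lgt: float = 0.0        # OptLGT: last granted time of the OPT replica
    con_in_tas: bool = False    # ConInTAS: True while the CON replica is time advancing
    opt_in_tas: bool = False    # OptInTAS: True while the OPT replica is time advancing

    @property
    def lgt(self) -> float:
        """LGT of the federate: the larger of the two replica logical times."""
        return max(self.con_lgt, self.opt_lgt)
```

Deriving LGT from the two replica values means it cannot drift out of step with them, which matches the idea that the faster replica represents the federate.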

3.2. Message exchange mechanism

The message exchange mechanism is shown in Fig. 3. The solid lines represent the trace of outgoing messages sent from the federate, while the dashed lines show the flow of incoming messages destined for the federate. The message exchange mechanism requires the replication module to cooperate with the existing Object Management (OM) module in the original RTI-Amb. The OM module is responsible for exchanging messages and retractions with other federates, whereas the replication module is responsible for distributing incoming messages to replicas of the federate and filtering redundant outgoing messages generated by the replicas.

3.2.1. Mechanism for handling incoming messages

According to the HLA standard [1], different callbacks are eligible to be delivered to CON and OPT federates. Hence, two individual time-stamp-ordered (TSO) queues are deployed: one for the CON replica (ConTSOQueue) and the other for the OPT replica (OptTSOQueue). When a message (either a correct or an incorrect message) is received by the OM module (step R1), it is transferred to the replication module. Then, a callback is generated and buffered in both TSO queues (step R2). The callbacks buffered in the TSO queues are sorted by their RVTs. ConTSOHead (or OptTSOHead) refers to the minimum RVT of the callbacks in the ConTSOQueue (or OptTSOQueue). Finally, the buffered callbacks are delivered to the corresponding replicas and removed from the TSO queues (step R3). Only the callbacks in the ConTSOQueue with RVTs ≤ ConLGT are delivered to the CON replica, whereas all callbacks in the OptTSOQueue are delivered to the OPT replica.
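Steps R2 and R3 can be sketched with two priority queues keyed by RVT. The class and method names below are hypothetical, the sequence counter is only an assumed tie-breaker for equal RVTs, and returning infinity for an empty queue head is an assumption rather than something stated in the article.

```python
import heapq
import math
from itertools import count


class IncomingQueues:
    """Two RVT-ordered callback queues, one per replica (illustrative sketch)."""

    def __init__(self):
        self.con_tso = []      # ConTSOQueue: heap of (rvt, seq, message)
        self.opt_tso = []      # OptTSOQueue: heap of (rvt, seq, message)
        self._seq = count()    # tie-breaker so equal RVTs keep arrival order

    def buffer(self, rvt, message):
        """Step R2: buffer a callback for the message in both TSO queues."""
        entry = (rvt, next(self._seq), message)
        heapq.heappush(self.con_tso, entry)
        heapq.heappush(self.opt_tso, entry)

    @property
    def con_tso_head(self):
        """ConTSOHead: minimum RVT in the ConTSOQueue (infinity if empty)."""
        return self.con_tso[0][0] if self.con_tso else math.inf

    def deliver_to_con(self, con_lgt):
        """Step R3 for CON: deliver only callbacks with RVT <= ConLGT."""
        delivered = []
        while self.con_tso and self.con_tso[0][0] <= con_lgt:
            delivered.append(heapq.heappop(self.con_tso)[2])
        return delivered

    def deliver_to_opt(self):
        """Step R3 for OPT: deliver every buffered callback."""
        delivered = []
        while self.opt_tso:
            delivered.append(heapq.heappop(self.opt_tso)[2])
        return delivered
```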

When a retraction is received, it is also transferred to the replication module. If the corresponding incorrect message is still kept in a TSO queue, it is annihilated by the retraction. Otherwise, an RR callback is generated and eventually delivered to the corresponding replica to remove the effect of the incorrect message. As proved in the Appendix, the callbacks generated from incorrect messages and their retractions are annihilated in the ConTSOQueue instead of being delivered to the CON replica. In contrast, they might be delivered to the OPT replica, where the effects of the incorrect messages are removed using state saving and rollback mechanisms.
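The annihilation rule can be sketched per replica queue as follows. This is an assumed simplification: messages are identified by a hypothetical message id, and the queue is a plain list of (RVT, id) pairs rather than the actual TSO queue structure.

```python
def handle_retraction(tso_queue, msg_id):
    """Apply a retraction to one replica's buffered callbacks.

    tso_queue: list of (rvt, msg_id) callbacks not yet delivered to the replica.
    Returns None if the incorrect message was still buffered (it is annihilated),
    otherwise an ("RR", msg_id) callback to be delivered to the replica.
    """
    for index, (_rvt, buffered_id) in enumerate(tso_queue):
        if buffered_id == msg_id:
            del tso_queue[index]      # message and retraction annihilate each other
            return None
    return ("RR", msg_id)             # message already delivered: request retraction
```

For a message that is still waiting in the ConTSOQueue but has already been delivered to the OPT replica, the call on the CON queue annihilates it silently, while the call on the OPT queue produces an RR callback.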

In addition, two individual Non-message Callback Queues (not shown in the figure) are deployed: one for the CON replica to buffer the Time Advance Grant (TAG) callbacks, and the other for the OPT replica to buffer the TAG and RR callbacks.

3.2.2. Mechanism for handling outgoing messages

Outgoing messages are generated by both CON and OPT replicas (step S1). To filter the redundant messages, two message filters, ConMsgFilter and OptMsgFilter, are deployed for the CON replica and the OPT replica respectively. The messages buffered in the message filters are sorted by their RVTs. ConMsgHead (or OptMsgHead) denotes the minimum RVT of the messages in the ConMsgFilter (or OptMsgFilter). When a message is generated by a replica, the replication module searches for the same message (i.e., one with the same RVT and message content) in the message filter of the other replica (step S2). If the same message is found, it must already have been generated by the other replica. So, the new message is discarded and the matching message is removed from the message filter of the other replica. Otherwise (that is, if the same message is not found), the message is forwarded to the OM module and buffered in the message filter of the replica (step S3). In this way, the first instance of each correct message is sent to the receiving federates, while the second one is discarded. The incorrect messages generated by the OPT replica are kept in the OptMsgFilter after they are forwarded to the OM module. They are removed from the OptMsgFilter when their corresponding retractions are generated and forwarded to the OM module.
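Steps S2 and S3 amount to a first-copy-wins filter. The sketch below is illustrative only: the class and its members are hypothetical, the OM module is stood in for by plain lists, and the filters are unordered sets, so the RVT ordering and repeated identical messages are not modelled.

```python
class OutgoingFilter:
    """Forwards the first copy of each message and drops the duplicate (sketch)."""

    def __init__(self):
        self.filters = {"CON": set(), "OPT": set()}  # ConMsgFilter / OptMsgFilter
        self.sent = []         # messages forwarded to the OM module
        self.retractions = []  # retractions forwarded to the OM module

    def send(self, replica, rvt, content):
        """Return True if the message is forwarded, False if it is discarded."""
        other = "OPT" if replica == "CON" else "CON"
        key = (rvt, content)
        if key in self.filters[other]:        # step S2: other replica sent it first
            self.filters[other].remove(key)
            return False
        self.filters[replica].add(key)        # step S3: buffer and forward
        self.sent.append(key)
        return True

    def retract(self, rvt, content):
        """OPT replica retracts an incorrect message it sent earlier."""
        key = (rvt, content)
        self.filters["OPT"].discard(key)
        self.retractions.append(key)
```

Keying on the pair (RVT, content) follows the article's definition of "the same message"; any hashable content works with this sketch.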

Table 1. Additional variables and flags.

| Name | Type | Description |
|---|---|---|
| ConReqTime | Double | Requested time from CON replica |
| OptReqTime | Double | Requested time from OPT replica |
| ConLGT | Double | Last Granted Time of CON replica |
| OptLGT | Double | Last Granted Time of OPT replica |
| LGT | Double | Last Granted Time of Federate |
| ConInTAS | Boolean | True if CON replica is in time advancing state, false if in time granted state |
| OptInTAS | Boolean | True if OPT replica is in time advancing state, false if in time granted state |

Fig. 3. Message exchange mechanism.

Fig. 4. An example of message exchange mechanism.

An example of the message exchange mechanism is shown in Fig. 4. The incoming messages, Msg_k and Msg_k+1, are buffered in the ConTSOQueue and the OptTSOQueue in the form of callbacks. Both are eligible to be delivered to the OPT replica. While processing them, the OPT replica generates Msg_l and Msg_l+1 respectively. These outgoing messages are sent to Federate2, as they have not previously been generated by the CON replica. In contrast, Msg_k+1 is not delivered to the CON replica because its RVT is greater than ConLGT. Since the replicas are Committed Equivalent, the CON replica generates Msg_l while processing Msg_k. The message is simply discarded, as the same message is kept in the OptMsgFilter. Sometime later, Federate2 sends a retraction of Msg_k+1, which has been identified as an incorrect message. The retraction is annihilated with Msg_k+1 in the ConTSOQueue. However, it is delivered to the OPT replica, as Msg_k+1 was removed from the OptTSOQueue when it was delivered to the OPT replica. While processing the retraction, the OPT replica retracts Msg_l+1 and the OM module sends the corresponding retraction to Federate2.

In addition, outgoing messages in the following cases can be simply discarded without searching for the same messages in the message filters:

Remark 1. (i) The message is discarded if its RVT < LGT + LA. (ii) The message generated by OPT replica is discarded if its RVT < min(ConReqTime, ConTSOHead) + LA and ConInTAS is true.

In case (i), the message can be discarded because the other replica must have logical time = LGT and have generated all correct messages with RVTs < LGT + LA. For this reason, OptMsgFilter and ConMsgFilter can be reclaimed once LGT is updated, by removing those messages with RVTs < LGT + LA. In case (ii), if the RTI-Amb will not receive any message with RVT < ConTSOHead, CON replica will be granted to min(ConReqTime, ConTSOHead), and thus, it will not send any message with RVT < min(ConReqTime, ConTSOHead) + LA. Otherwise (i.e., if the RTI-Amb receives a message with RVT < ConTSOHead), CON replica can only generate messages with RVTs < min(ConReqTime, ConTSOHead) + LA while handling the new messages received by the RTI-Amb. These generated messages are obviously different from those messages generated by OPT replica before the new messages are received by the RTI-Amb. In summary, CON replica in time advancing state will not generate the same message as the one generated by OPT replica with RVT < min(ConReqTime, ConTSOHead) + LA. The message generated by OPT replica in case (ii) is a correct message, if CON replica has generated the same message before entering the time advancing state. It can be discarded as the same message generated by CON replica has been sent by the RTI-Amb previously. Otherwise, the message is generated by OPT replica only. It must be an incorrect message and can be discarded.
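The two shortcut conditions of Remark 1 can be written as a single predicate. The following is a minimal Python sketch, not the implementation used in the paper: the `ReplicationState` container and the function name are hypothetical, and ConInTAS is assumed to be true exactly while CON replica is in the time advancing state.

```python
from dataclasses import dataclass


@dataclass
class ReplicationState:
    """Hypothetical container for the variables used by Remark 1."""
    lgt: float           # LGT: last granted time of the federate
    la: float            # LA: lookahead of the federate
    con_req_time: float  # ConReqTime: time requested by CON replica
    con_tso_head: float  # ConTSOHead: smallest TS buffered in ConTSOQueue
    con_in_tas: bool     # ConInTAS: assumed True while CON is time advancing


def can_discard_without_lookup(rvt: float, from_opt: bool,
                               s: ReplicationState) -> bool:
    """Return True if Remark 1 allows dropping an outgoing message
    without searching the message filters."""
    # Case (i): applies to messages generated by either replica.
    if rvt < s.lgt + s.la:
        return True
    # Case (ii): applies only to OPT-generated messages while CON
    # replica is in the time advancing state.
    if from_opt and s.con_in_tas:
        return rvt < min(s.con_req_time, s.con_tso_head) + s.la
    return False
```

Messages that do not satisfy the predicate still have to be compared against the message filters before being sent or discarded.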

3.3. Time management mechanism

The time management mechanism of the un-identical federate replication structure is shown in Fig. 5. The solid lines represent the time advancement requests initiated by CON and OPT replicas of the federate, while the dashed lines show the time advancement grants delivered to the replicas. The time management mechanism requires the replication module to cooperate with the existing Time Management (TM) module in the original RTI-Amb. The TM module calculates GALT values using a synchronous algorithm proposed in [21] based on the Conditional Information (CI) [17]. A CI report is a conditional guarantee ensuring that the sending federate will not send a message with RVT smaller than the reported value, if it does not later receive a message with RVT smaller than the TS of its earliest unprocessed events. The synchronous algorithm calculates the GALT value of a federate as the minimum of the CI values reported by all federates at the same time. The calculated GALT value is the minimum TS of all unprocessed events in the federation at the moment.

The replication module handles the time advancement requests from CON and OPT replicas based on GALT calculation. In the case that GALT is not large enough, the replication module will send time management information (TMInfo) of replicas to the TM module for a new round of GALT calculation. In general, the TMInfo of the faster replica is used to increase the GALT value. With the increased GALT, the time advancement request from the slower replica will be granted immediately. Besides the requested time (i.e., ConReqTime or OptReqTime), the TS of those messages buffered in the TSO queues (i.e., ConTSOQueue or OptTSOQueue) should also be considered when handling the time advancement request and calculating the TMInfo of the corresponding replica. More details are illustrated below in Fig. 7.

As shown in Fig. 6, a GALT calculation may be carried out by the TM module in two cases: (i) receiving TMInfo from the replication module (CalculateGALT() function); and (ii) receiving CI reports from other federates (ReceiveCI() function). CISeqNum (or GALTSeqNum) keeps track of the number of sent CI reports (or calculated GALT values).

On receiving TMInfo, a new CI report may be created. The CI report with CISeqNum attached is denoted as CI_CISeqNum. In the synchronous algorithm [21], it is unnecessary to create a new CI report if the previous CI report has not been used for GALT calculation (i.e., if CISeqNum > GALTSeqNum (Lines 1 and 2 of Fig. 6)). If a new CI report is created, it is sent to other federates by invoking their ReceiveCI() functions (Lines 3–6). In the case that CI reports with the sequence number equal to CISeqNum have been received from all other federates, the value of GALT can be calculated by taking the minimum value of the sent and received CI_CISeqNum (Lines 9–10) and the return value of CalculateGALT() function is true (Line 11). The

Fig. 5. Time management mechanism.


bool CalculateGALT(TMInfo) {
 1.   if (CISeqNum > GALTSeqNum)
 2.       return false;
 3.   CI = min(TMInfo + LA, minTSOfTransientMessages);
 4.   CISeqNum++;
 5.   send CI_CISeqNum to DRCs of other federates
 6.   reset minTSOfTransientMessages;
 7.   if (CI_CISeqNum from a DRC not received yet)
 8.       return false;
 9.   GALT = min(all CI_CISeqNum);
10.   GALTSeqNum = CISeqNum;
11.   return true;
}

void ReceiveCI(sendingFederate, CI_CISeqNum) {
12.   if (CI_CISeqNum from a DRC not received yet)
13.       return;
14.   GALT = min(all CI_CISeqNum);
15.   GALTSeqNum = CISeqNum;
16.   ReplicationModule.GALTUpdated();
}

Fig. 6. GALT calculation (TM Module).


calculated GALT value is denoted as GALT_GALTSeqNum, where GALTSeqNum equals CISeqNum. If the GALT value is not updated, the return value of CalculateGALT() function will be false (Lines 2 and 8). The transient message problem, which might happen in the synchronous algorithm [21], is solved using Grid service acknowledgements in a way similar to the Global Virtual Time (GVT) algorithm described in [22]. The variable minTSOfTransientMessages, which keeps track of the minimum RVT of transient messages and retractions sent by the RTI-Amb, is considered when creating a CI report (Lines 3 and 6). A new value of GALT may also be calculated when a CI_CISeqNum is received from one of the other federates (Lines 12–15). In this case, the TM module informs the replication module about the updated GALT by invoking GALTUpdated() function (Line 16).
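To make the round-based behaviour of Fig. 6 concrete, the following Python sketch mimics the two functions inside a single process. It is an illustration under several assumptions rather than the SOHR code: peers are called directly instead of through Grid services, "reset" of minTSOfTransientMessages is taken to mean setting it to infinity, incoming reports are stored per sequence number, and ReceiveCI() only acts while the module's own report for the current round is still unused. The class and attribute names are hypothetical.

```python
import math
from typing import Callable, Dict, List


class TMModule:
    """Single-process stand-in for the TM module of one RTI-Amb."""

    def __init__(self, name: str, lookahead: float,
                 on_galt_updated: Callable[[float], None] = lambda galt: None):
        self.name = name
        self.la = lookahead
        self.peers: List["TMModule"] = []
        self.ci_seq = 0                   # CISeqNum
        self.galt_seq = 0                 # GALTSeqNum
        self.galt = 0.0
        self.min_ts_transient = math.inf  # minTSOfTransientMessages
        self.reports: Dict[int, Dict[str, float]] = {}  # seq -> sender -> CI
        self.on_galt_updated = on_galt_updated

    def _round_complete(self) -> bool:
        # Own report plus one report from every peer for the current round.
        return len(self.reports.get(self.ci_seq, {})) == len(self.peers) + 1

    def _commit_galt(self) -> None:
        self.galt = min(self.reports[self.ci_seq].values())
        self.galt_seq = self.ci_seq

    def calculate_galt(self, tm_info: float) -> bool:
        if self.ci_seq > self.galt_seq:                     # Lines 1-2
            return False
        ci = min(tm_info + self.la, self.min_ts_transient)  # Line 3
        self.ci_seq += 1                                    # Line 4
        self.reports.setdefault(self.ci_seq, {})[self.name] = ci
        for peer in self.peers:                             # Line 5
            peer.receive_ci(self.name, self.ci_seq, ci)
        self.min_ts_transient = math.inf                    # Line 6
        if not self._round_complete():                      # Lines 7-8
            return False
        self._commit_galt()                                 # Lines 9-10
        return True                                         # Line 11

    def receive_ci(self, sender: str, seq: int, ci: float) -> None:
        self.reports.setdefault(seq, {})[sender] = ci
        # Assumed guard: nothing to do unless our own report is pending.
        if self.ci_seq == self.galt_seq or not self._round_complete():
            return                                          # Lines 12-13
        self._commit_galt()                                 # Lines 14-15
        self.on_galt_updated(self.galt)                     # Line 16
```

With two modules, the first call to `calculate_galt` returns false because the peer has not yet reported for that round; the round completes, and both modules obtain the same GALT, once the peer issues its own report.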

The Next Message Request (NMR) service with parameter t is usually used by a CON federate for time advancement request, where t is typically the TS of the earliest event in the federate. The pseudo code of NMR service is shown in Fig. 7(a). As discussed in case (ii) of Remark 1, when CON replica is in time advancing state, the messages in the OptMsgFilter with RVTs < TMInfo + LA (note that TMInfo = min(ConReqTime, ConTSOHead)) can be identified as incorrect messages and be retracted (Lines 6 and 7 of Fig. 7(a)). The replication module sends TMInfo to the TM module for GALT calculations

Fig. 7. Time advancement of federate replicas (Replication Module).


until either the GALT value is greater than TMInfo or GALT can no longer be updated (Lines 8–12). If the GALT value is greater than TMInfo, ConLGT and LGT are updated; ConMsgFilter and OptMsgFilter are reclaimed; and the NMR request is granted through a TAG callback (Lines 13–17). Unfortunately, the GALT calculation, as shown in Fig. 6, might be blocked if a CI report from one of the federates is not received yet. In this case, the GALT cannot be updated to a value greater than TMInfo immediately. As a result, the NMR request is simply returned without granting the time advancement request.

While CON replica is in the time advancing state, ConReqTime is a constant value, whereas ConTSOHead may change on receiving new messages and retractions. The TM module waits for CI reports from other federates for GALT calculation. Once the GALT value is updated, it informs the replication module by invoking GALTUpdated() function, whose pseudo code is also shown in Fig. 7(a). When the GALT value is large enough, the pending time advancement request of CON replica can be granted. Based on the above discussion, the following remark is obvious:

Remark 2. The following must be true when an NMR request is granted:

\[ \mathit{ConLGT} = \min(\mathit{ConReqTime}, \mathit{ConTSOHead}) \tag{1} \]
\[ \mathit{ConLGT} < \mathit{GALT} \tag{2} \]

The Flush Queue Request (FQR) service is usually used by an OPT federate for time advancement request. The pseudo code of FQR service is shown in Fig. 7(b). In the case that there are RR callbacks buffered in the OPT Non-message Callback Queue, the replication module should deliver them to OPT replica immediately, without advancing its logical time (Lines 5–7 of Fig. 7(b)). The replication module sends TMInfo (i.e., min(OptReqTime, OptTSOHead)) to the TM module for GALT calculations (Lines 8–13). If GALT ≥ TMInfo, the logical time of OPT replica is advanced to the minimum of OptReqTime and OptTSOHead (Lines 16–20). Otherwise, instead of waiting for CI reports from other federates for further GALT calculations, the replication module simply grants OPT replica to advance its logical time to the current value of GALT. Thus, we have the following remark:

Remark 3. An FQR request will always be granted immediately and, when it is granted, the following must be true:

\[ \mathit{OptLGT} = \min(\mathit{OptReqTime}, \mathit{OptTSOHead}, \mathit{GALT}) \tag{3} \]
\[ \mathit{OptLGT} \le \mathit{GALT} \tag{4} \]
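The grant rules stated in Remarks 2 and 3 can be contrasted in a few lines of Python. This is only a sketch of the two conditions; the function names are hypothetical and the surrounding bookkeeping of Fig. 7 (message filter reclamation, callbacks, repeated GALT calculations) is omitted.

```python
from typing import Optional


def nmr_grant(con_req_time: float, con_tso_head: float,
              galt: float) -> Optional[float]:
    """Return the new ConLGT if the NMR request can be granted, else None.

    The request is granted only when GALT is strictly greater than
    TMInfo = min(ConReqTime, ConTSOHead), so ConLGT < GALT holds.
    """
    tm_info = min(con_req_time, con_tso_head)
    return tm_info if galt > tm_info else None


def fqr_grant(opt_req_time: float, opt_tso_head: float,
              galt: float) -> float:
    """Return the new OptLGT; an FQR request is always granted.

    OPT replica advances to TMInfo when GALT permits it and to the
    current GALT otherwise, so OptLGT <= GALT holds.
    """
    return min(opt_req_time, opt_tso_head, galt)
```

For the same inputs, `nmr_grant` may leave CON replica waiting in the time advancing state while `fqr_grant` lets OPT replica proceed up to the current GALT.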

An example of time management mechanism is shown in Fig. 8. When CON replica of Federate1 invokes an NMR request, the replication module requests the TM module to calculate a new value of GALT, as GALT_k-1 ≤ min(ConReqTime, ConTSOHead). Since CI_k has been received from Federate2, GALT_k can be calculated immediately. As GALT_k is still not greater than TMInfo, the replication module requests another GALT calculation. However, the TM module cannot calculate the GALT value before receiving CI_k+1 of Federate2. Hence, the NMR request is returned and CON replica remains in the time advancing

Fig. 8. An example of time management mechanism.


state. After that, the ConTSOHead might change on receiving some messages and retractions; GALT_k+1 is calculated by the TM module on receiving CI_k+1 from Federate2. The new GALT value is passed to the replication module through GALTUpdated() function. Since GALT_k+1 is now greater than TMInfo, the CON replica is granted to advance its logical time to min(ConReqTime, ConTSOHead). When OPT replica of Federate1 invokes an FQR request, the replication module also requests a GALT calculation, as GALT_k+1 is smaller than TMInfo of the OPT replica. Although the TM module cannot calculate GALT_k+2 immediately, the replication module can simply grant OPT replica to advance its logical time to GALT_k+1.

In summary, the un-identical federate replication structure can improve simulation performance as the faster replica is chosen to represent the federate. The messages are delivered to the receiving federates when they are first generated by the faster replica. The faster replica is also more likely to produce a greater TMInfo for GALT calculation. The slower replica is able to benefit from the replication structure and catch up with the faster one easily. If CON replica is slower, its time advancement requests can be granted immediately using the GALT value increased by OPT replica. If OPT replica is slower, it can follow the execution of CON replica without suffering from any execution rollback. On the other hand, the replication structure may introduce some overheads. For instance, message comparisons are needed before sending or discarding the messages generated by replicas.

4. Correctness proofs

We first prove the correctness of the time management mechanism by ensuring that the TM module calculates GALT correctly and the replication module grants correct logical time to federate replicas. Then we prove the correctness of the message exchange mechanism by ensuring that the messages (correct messages, incorrect messages and retractions) are properly sent and received under the control of the time management mechanism.

4.1. Correctness of time management mechanism

Lemma 4.1. GALT value is calculated correctly.

Proof. In the un-identical federate replication structure, the GALT value is calculated as the minimum value of CI reports received from all the federates in the federation with the same sequence number (see Lines 9 and 14 in Fig. 6). As proved below, the value of each CI report sent by an RTI-Amb is the lower bound of RVTs of future outgoing messages and retractions, if the RTI-Amb will not receive any new incoming message or retraction with RVT < CI − LA. Hence, the calculated GALT value represents the lower bound of RVTs of messages and retractions which might be generated in the entire federation. Obviously, it is also the lower bound of RVTs of messages or retractions which might be received by the RTI-Amb in the future. Consequently, we can conclude that the GALT value is calculated correctly.

As mentioned in Section 3.3, a CI report is created in the following two cases (see Fig. 7): (i) TMInfo of CON replica is sent by the replication module via a CalculateGALT() request while processing NMR() or GALTUpdated() function; (ii) TMInfo of OPT replica is sent by the replication module via a CalculateGALT() request while processing FQR() function. For case (i), if the RTI-Amb will not receive any new incoming message or retraction with RVT < CI − LA, we can prove:

• CON replica will not generate any message with RVT < CI. On creating the CI report, we can get (Line 5 of Fig. 7(a) and Line 3 of Fig. 6)

\[ \mathit{CI} \le \min(\mathit{ConReqTime}, \mathit{ConTSOHead}) + \mathit{LA} \tag{5} \]

The future messages generated by CON replica should have RVTs ≥ min(ConReqTime, ConTSOHead) + LA, if the RTI-Amb will not receive any new message or retraction with RVT < CI − LA. Hence, we can derive that CON replica will not generate any message with RVT < CI.

• Messages generated by OPT replica with RVTs < CI are discarded. If OPT replica is slower than CON replica, it might generate some messages with RVTs < CI. According to Inequality (5), these messages must have RVTs < min(ConReqTime, ConTSOHead) + LA. If these messages are generated before the NMR request is granted, they are discarded (see case (ii) of Remark 1). Otherwise, if NMR is granted, ConLGT = min(ConReqTime, ConTSOHead) (see Remark 2). So, these messages are also discarded as their RVTs < ConLGT + LA ≤ LGT + LA (see case (i) of Remark 1).

• The RTI-Amb will not send out retractions with RVTs < CI. As shown in Fig. 7(a), the messages in the OptMsgFilter with RVTs < min(ConReqTime, ConTSOHead) + LA are retracted before the CI report is created. In addition, the messages generated by OPT replica with RVTs < CI are discarded (the second bullet). Hence, we can get OptMsgHead ≥ min(ConReqTime, ConTSOHead) + LA ≥ CI. Since a retraction is sent by the RTI-Amb only if there is a corresponding incorrect message kept in the OptMsgFilter, all retractions should have RVT ≥ OptMsgHead. Consequently, we can derive that the RTI-Amb will never retract messages with RVTs < CI.

For case (ii), if the RTI-Amb will not receive any new incoming message or retraction with RVT < CI − LA, we can prove:


• OPT replica will not generate any message or retraction with RVT < CI. On creating the CI report, we can get (Line 8 of Fig. 7(b) and Line 3 of Fig. 6)

\[ \mathit{CI} \le \min(\mathit{OptReqTime}, \mathit{OptTSOHead}) + \mathit{LA} \tag{6} \]

Since the buffered retractions have been handled (see Lines 5–7 of Fig. 7(b)) and the RTI-Amb will not receive any new messages or retractions with RVT < CI − LA, we can get that OPT replica will never receive any message or retraction with RVT < CI − LA. Hence, OPT replica will not send any message or retraction with RVT < CI. In other words, OPT replica has generated all correct messages with RVTs < CI and retracted all incorrect messages with RVTs < CI.

• Messages generated by CON replica with RVTs < CI are discarded. These messages are correct messages. As described in the first bullet, the same messages must have been generated by OPT replica previously. They have been sent to the receiving federates and are kept in the OptMsgFilter if not garbage collected. Hence, these messages generated by CON replica are discarded either because their RVTs < LGT + LA or because the same messages are found in the OptMsgFilter.

• The RTI-Amb will not send out retractions with RVTs < CI. Again, as described in the first bullet, all incorrect messages with RVTs < CI have been retracted by OPT replica.

In summary, the CI report sent by the RTI-Amb in either case (i) or case (ii) is correct. It represents the lower bound of RVTs of future messages and retractions sent by the RTI-Amb, if the RTI-Amb will not receive any new message or retraction with RVT < CI − LA. □

Since the GALT value is calculated correctly, the following lemma follows directly:

Lemma 4.2. Time advancement requests from replicas are granted correctly. That is, a message with RVT ≤ ConLGT will not be delivered to CON replica in the future; and a message or retraction with RVT < OptLGT will not be delivered to OPT replica in the future.

4.2. Correctness of message exchange mechanism

Lemma 4.3. For a sending federate, one and only one instance of each correct message is sent by RTI-Amb; each incorrect message and its corresponding retraction are either both sent or both discarded by RTI-Amb.

Proof. Since replicas of the same federate are committed equivalent, each correct message should be generated by both CON and OPT replicas. In the case that the correct message is generated by CON replica first, we can get its RVT ≥ ConLGT + LA. Since OPT replica will generate the same message some time later, we can also get its RVT ≥ OptLGT + LA. Hence, the message will be sent by the RTI-Amb to the receiving federates as its RVT ≥ LGT + LA and the same message must not be found in the OptMsgFilter. The same message generated by OPT replica some time later will be discarded either because one of the cases in Remark 1 is true or because the same message can be found in the ConMsgFilter.

In the case that the correct message is generated by OPT replica first, we can similarly get that its RVT ≥ LGT + LA. It is also obvious that the condition ConInTAS && RVT < min(ConReqTime, ConTSOHead) + LA is not true (otherwise, CON replica will not generate the same correct message as the one generated by OPT replica; see the discussion of Remark 1). So, the message will be sent out by the RTI-Amb to the receiving federates, as the conditions for both cases in Remark 1 are not true and the same message must not be found in the ConMsgFilter. The same correct message generated by CON replica some time later will be discarded either because the first case of Remark 1 is true or because the same message can be found in the OptMsgFilter.

So, one and only one instance of each correct message is sent by the RTI-Amb, regardless of whether it is first generated by CON or OPT replica.

If an incorrect message generated by OPT replica is discarded by the RTI-Amb, it will not be buffered in the OptMsgFilter. Consequently, the RTI-Amb will never send out its corresponding retraction. If an incorrect message generated by OPT replica is sent by the RTI-Amb, it will be kept in the OptMsgFilter and its corresponding retraction must be sent before LGT (i.e., either ConLGT or OptLGT) becomes greater than its RVT − LA. Since OPT replica must have processed all the events with TS smaller than OptLGT correctly, it must have retracted all the incorrect messages with RVT < OptLGT + LA. As shown in Fig. 7(a) (Lines 6–8), before ConLGT increases to min(ConReqTime, ConTSOHead), messages kept in the OptMsgFilter are identified as incorrect messages and are retracted if their RVT < min(ConReqTime, ConTSOHead) + LA. Hence, an incorrect message, if kept in the OptMsgFilter, must have been retracted before ConLGT becomes greater than its RVT − LA.

Hence, the incorrect message and its corresponding retraction are either both sent or both discarded by the RTI-Amb. □
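The sending-side behaviour argued in this proof can be sketched as a small Python class. It is a simplified illustration, not the OM module itself: message identity is modelled as a hypothetical (RVT, payload) tuple, only case (i) of Remark 1 is applied, and removing a matched entry from the other replica's filter is an assumption made here.

```python
from typing import List, Set, Tuple

Msg = Tuple[float, str]  # hypothetical message identity: (RVT, payload)


class OutgoingFilter:
    """Simplified stand-in for ConMsgFilter and OptMsgFilter."""

    def __init__(self, lookahead: float) -> None:
        self.la = lookahead
        self.lgt = 0.0
        self.con_filter: Set[Msg] = set()
        self.opt_filter: Set[Msg] = set()
        self.wire: List[Tuple[str, Msg]] = []  # what is actually sent

    def on_message(self, msg: Msg, from_opt: bool) -> bool:
        """Return True if the message is sent, False if it is discarded."""
        if msg[0] < self.lgt + self.la:        # Remark 1, case (i)
            return False
        if from_opt:
            own, other = self.opt_filter, self.con_filter
        else:
            own, other = self.con_filter, self.opt_filter
        if msg in other:                       # already sent for the peer
            other.discard(msg)                 # assumed: entry is consumed
            return False
        own.add(msg)
        self.wire.append(("msg", msg))
        return True

    def on_opt_retraction(self, msg: Msg) -> bool:
        """Send a retraction only if the message was sent and is still kept."""
        if msg not in self.opt_filter:
            return False
        self.opt_filter.remove(msg)
        self.wire.append(("retract", msg))
        return True

    def update_lgt(self, lgt: float) -> None:
        """Reclaim both filters once LGT is updated."""
        self.lgt = lgt
        bound = lgt + self.la
        self.con_filter = {m for m in self.con_filter if m[0] >= bound}
        self.opt_filter = {m for m in self.opt_filter if m[0] >= bound}
```

Replaying the sending side of Fig. 4 with this class sends Msg_l and Msg_l+1 once each when OPT replica generates them, discards the copy of Msg_l generated later by CON replica, and forwards a single retraction of Msg_l+1.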

Lemma 4.4. For a receiving federate, correct messages are delivered to both replicas; and incorrect messages and their corresponding retractions are only sent to OPT replica rather than CON replica.


Proof. The correct messages received by the RTI-Amb are buffered in the TSO queues of CON and OPT replicas. As a result, they will receive the same correct messages. The incorrect messages received by the RTI-Amb are also buffered in the TSO queues of the CON and OPT replicas. According to Lemmas 4.1 and 4.3, we can deduce that the retraction of an incorrect message must have been received by the RTI-Amb when GALT becomes greater than its RVT. Since only those messages with RVT ≤ ConLGT in the ConTSOQueue are eligible to be delivered to the CON replica and ConLGT < GALT (see Remark 2), we can deduce that incorrect messages will not be delivered to CON replica. Hence, all incorrect messages are annihilated with their retractions instead of being delivered to CON replica. In contrast, the incorrect messages kept in the OptTSOQueue might be delivered to OPT replica. Similarly, for an incorrect message, the RTI-Amb must have received its corresponding retraction before GALT becomes greater than its RVT. Hence, the retraction of an incorrect message must have been already delivered to the OPT replica when its OptLGT (OptLGT ≤ GALT, according to Remark 3) is greater than the RVT of the message. Hence, the effects of incorrect messages are removed from OPT replica. □
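The receiving-side behaviour used in this proof can be illustrated with a short Python sketch. It is a simplification under stated assumptions: messages are hypothetical (RVT, payload) tuples, a flush for OPT replica is assumed to deliver everything buffered in the OptTSOQueue, and retractions destined for OPT replica are simply collected in a list instead of being queued as callbacks.

```python
from typing import List, Tuple

Msg = Tuple[float, str]  # hypothetical message identity: (RVT, payload)


class IncomingBuffers:
    """Simplified stand-in for ConTSOQueue and OptTSOQueue."""

    def __init__(self) -> None:
        self.con_tso_queue: List[Msg] = []
        self.opt_tso_queue: List[Msg] = []
        self.opt_pending_retractions: List[Msg] = []

    def receive_message(self, msg: Msg) -> None:
        # Every incoming message is buffered for both replicas.
        self.con_tso_queue.append(msg)
        self.opt_tso_queue.append(msg)

    def receive_retraction(self, msg: Msg) -> None:
        # For CON replica the retraction annihilates the buffered message.
        if msg in self.con_tso_queue:
            self.con_tso_queue.remove(msg)
        # For OPT replica it annihilates the message if still buffered;
        # otherwise the message was already delivered, so the retraction
        # itself has to be delivered.
        if msg in self.opt_tso_queue:
            self.opt_tso_queue.remove(msg)
        else:
            self.opt_pending_retractions.append(msg)

    def deliver_to_con(self, con_lgt: float) -> List[Msg]:
        """Deliver, in TS order, messages with RVT <= ConLGT."""
        ready = sorted(m for m in self.con_tso_queue if m[0] <= con_lgt)
        self.con_tso_queue = [m for m in self.con_tso_queue if m[0] > con_lgt]
        return ready

    def flush_to_opt(self) -> List[Msg]:
        """Deliver everything buffered for OPT replica (assumed FQR effect)."""
        out = sorted(self.opt_tso_queue)
        self.opt_tso_queue = []
        return out
```

In the scenario of Fig. 4, OPT replica receives Msg_k and Msg_k+1 early and later the retraction of Msg_k+1, while CON replica only ever receives Msg_k.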

Based on Lemmas 4.2, 4.3 and 4.4, we have the following theorem:

Theorem 4.5. Replicas of each federate executed on un-identical federate replication structure produce the same simulation results as the federate executed on non-replication structure.

Proof. According to Lemmas 4.3 and 4.4, in the un-identical federate replication structure, correct messages are delivered to CON and OPT replicas of the receiving federates without message loss and duplication. Although some of the incorrect messages might be delivered to OPT replicas, they will be retracted. Hence, the committed events processed in CON and OPT replicas of the same federate are the same correct events. Since the replicas are committed equivalent and their logical times are advanced correctly (see Lemma 4.2), they process committed events in the same manner, following the same TS order as the federate executed on the non-replication structure. That is, replicas of each federate executed on the un-identical federate replication structure produce the same simulation results as the federate executed on the non-replication structure. Therefore, the correctness of the un-identical federate replication structure is proved. □

5. Experiments and results

Experiments are carried out to evaluate the un-identical federate replication structure in terms of performance improvement, while verifying its correctness. Besides that, experiments are also carried out to study the scalability and overhead of the un-identical federate replication structure.

5.1. Experiment design

As discussed at the end of Section 3.1, the replication manager is embedded in RTI-Amb for improving efficiency. If so, RTI-Amb should be modified. For this reason, the un-identical federate replication structure has been implemented based on an open-source Service Oriented HLA RTI (SOHR) [23]. SOHR enables HLA-based simulation executions on Grid through a group of predefined and connected Grid services. A federate is connected to the federation through a Local Service (LS), which plays the role of RTI-Amb in Fig. 1. The replication manager is embedded in the LS. It cooperates with other modules to implement message exchange and time management mechanisms as described in Sections 3.2 and 3.3.

A synthetic P-Hold simulation [24] is used in our experiments. The simulation length is 12,000 time units, excluding another 12,000 time units as the warmup time. It is composed of two symmetric federates, which are denoted as Federate1 and Federate2. Initially, each federate has an internal event. When processing an internal event, in addition to generating another internal event, the federate may send an external event to the other federate. The probability of generating an external event is denoted as PExternEvent. In the case that an external event is received, the federate just processes the external event without generating either an internal or external event.

With an increasing PExternEvent, events are exchanged more frequently between federates, and thus, the probability of execution rollbacks in OPT federates increases. In addition to PExternEvent, other parameters of the P-Hold simulation, e.g., simulation time of generated internal and external events, LAs of federates, event processing time and federate state saving and restoration time in OPT federates, also affect the performance of conservative and optimistic synchronization approaches. The default setting of the parameters is as follows. The simulation time of the generated internal event equals the TS of the processed internal event plus a random number between 10 and 110. The external event is sent using an interaction with RVT equal to the TS of the processed internal event plus LA. The LAs of both federates are set to 1. The message size is fixed at 50 bytes.
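The event generation rule of the benchmark can be sketched as follows. This is an illustrative Python fragment only: the function name is hypothetical, and the "random number between 10 and 110" is assumed here to be drawn from a uniform distribution, which the description above does not specify.

```python
import random
from typing import Optional, Tuple


def process_internal_event(ts: float, p_extern_event: float,
                           lookahead: float,
                           rng: random.Random) -> Tuple[float, Optional[float]]:
    """Process one internal event with time stamp `ts`.

    Returns the TS of the next internal event and, if an external event
    is generated, its RVT (otherwise None).
    """
    # Next internal event: TS plus a random increment in [10, 110]
    # (uniform distribution assumed).
    next_internal_ts = ts + rng.uniform(10, 110)
    # With probability PExternEvent, send an external event whose RVT is
    # the TS of the processed event plus the lookahead.
    external_rvt = ts + lookahead if rng.random() < p_extern_event else None
    return next_internal_ts, external_rvt
```

With `p_extern_event = 0.0` no external event is ever produced, and with `p_extern_event = 1.0` every internal event produces one, which corresponds to the two extreme settings used in the experiments.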

Since experiments are executed in a Grid environment based on SOHR, we configure the P-Hold benchmark with exaggerated workload. The event processing time may further increase as a number of events can be combined for processing [25]. Therefore, spin-loops with 300 ms and 100 ms are used to emulate the processing time of internal and external events respectively. Similarly, spin-loops with 50 ms are used to emulate the state saving and restoration overheads in OPT federates.

Experiments are carried out on a cluster whose computing nodes are each equipped with 2 × dual-core Xeon 3.0 GHz CPUs, 4 GB RAM and Redhat Enterprise 4 OS. Each federate replica and the LS of each federate in SOHR are executed on separate computing nodes.


5.2. Experiment results

Three execution scenarios are investigated in our experiments. In the Conservative and Optimistic scenarios, federates employ conservative and optimistic synchronization respectively. They are executed on the original SOHR without replication structure. In the Replicated scenario, each federate has two replicas: CON and OPT replicas. They are executed concurrently on SOHR with un-identical federate replication structure.

Depending on the execution scenarios, the RTI-Amb (i.e., LS in SOHR) may be connected with a CON replica, an OPT replica, or both CON and OPT replicas of a federate. According to our observation, replicas of the same federate in all execution scenarios produce exactly the same simulation results. This further confirms the correctness of the proposed un-identical federate replication structure. For each execution scenario, the federation execution is repeated five times. Since the execution times of the repeated executions have small variance, only the average execution time is reported in the experiment results.

5.2.1. Performance improvement

Two series of experiments were carried out to illustrate the advantages of the un-identical federate replication structure in terms of performance improvement. In the first series of experiments, the PExternEvent value is constant during the simulation execution. As shown in Fig. 9(a), the Optimistic scenario outperforms the Conservative one when PExternEvent < 50%. The reason is that the federates seldom send external events to each other and only a small number of rollbacks occur in both federates. So, the optimistic approach can exploit parallelism more effectively than the conservative one. With an increasing PExternEvent, the performance advantage of the Optimistic scenario decreases. When PExternEvent = 50%, the Conservative and Optimistic scenarios have comparable performance. When PExternEvent > 50%, the Optimistic scenario performs worse than the Conservative scenario due to the frequent execution rollbacks. The most important observation is that the Replicated scenario can outperform both the Conservative and Optimistic scenarios. For extreme cases, the Replicated scenario performs similarly to the Optimistic scenario when PExternEvent = 0%, and to the Conservative scenario when PExternEvent = 100%.

As shown in the first series of experiments, the Conservative scenario and the Optimistic scenario can outperform each other depending on the value of PExternEvent. If the value of PExternEvent varies in the same execution, the Optimistic scenario can outperform the Conservative one during one execution phase, while the Conservative scenario can outperform the Optimistic one during another execution phase.

In the second series of experiments, we measure the execution times of the federation with variable PExternEvent, which is changed every 2400 simulation time units (i.e., one fifth of the termination time of the P-Hold simulation). Five cases of variable PExternEvent are investigated, as shown in Table 2. The execution times of different execution scenarios are shown in Fig. 9(b). Compared to the better result between the Conservative and Optimistic scenarios, simulation performance is improved by 10% to 16% in the Replicated scenario.

5.2.2. Scalability

To study the scalability of the un-identical federate replication structure, the federation scale is increased from two to eight. Constant PExternEvent (equal to 50%) is used in the simulation executions. The external events are sent to all other federates in the federation.

The execution times of different scenarios with respect to the federation scale are shown in Fig. 10(c). As we can see, the Replicated scenario outperforms other scenarios by 11–17%. The execution times in the Conservative and Optimistic scenarios increase linearly with respect to the federation scale, and the same trend can also be observed for the Replicated scenario. This shows good scalability of the un-identical federate replication structure. Fig. 10(a) reports the average number of execution rollbacks that occurred in the federates. As we can see, OPT replica in the Replicated scenario encounters fewer rollbacks compared with the OPT federate in the Optimistic scenario. Fig. 10(b) reports the number of GALT calculations conducted in

[Figure 9 plots omitted. Panel (a), Constant PExternEvent: execution time (sec) versus PExternEvent (%). Panel (b), Variable PExternEvent: execution time (sec) for Cases 1 to 5. Series in both panels: Conservative, Optimistic, Replicated.]

Fig. 9. Execution times in different scenarios.


Table 2. Variable PExternEvent

| Case | Change pattern of PExternEvent | Description |
|------|--------------------------------|-------------|
| 1 | 0%, 25%, 50%, 75%, 100% | Increasing PExternEvent |
| 2 | 100%, 75%, 50%, 25%, 0% | Decreasing PExternEvent |
| 3 | 0%, 100%, 0%, 100%, 0% | Alternating PExternEvent |
| 4 | 0%, 50%, 100%, 50%, 0% | Highest PExternEvent in the middle |
| 5 | 100%, 50%, 0%, 50%, 100% | Lowest PExternEvent in the middle |

[Figure 10 plots omitted. Panel (a), Execution rollbacks (Federate 1): number of rollbacks versus number of federates (2 to 8), for the Optimistic and Replicated scenarios. Panel (b), GALT calculations: number of GALT calculations versus number of federates (2 to 8), for the Conservative, Optimistic and Replicated scenarios. Panel (c), Federation execution time: execution time (sec) versus number of federates (2 to 8), for the Conservative, Optimistic and Replicated scenarios.]

Fig. 10. Scalability of un-identical federate replication structure.


different scenarios. As we can see, the Replicated scenario requires fewer GALT calculations compared with the Conservative scenario.
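The improvement percentages quoted in the performance and scalability experiments can be read as a relative reduction in execution time against the better of the two single-approach scenarios. The sketch below makes that reading explicit. It is an assumption about how the percentages are computed, the function name is hypothetical, and the numbers in the usage line are invented for illustration rather than taken from the figures.

```python
def improvement_over_best_baseline(conservative: float,
                                   optimistic: float,
                                   replicated: float) -> float:
    """Relative reduction (in percent) of the Replicated execution time
    versus the better (smaller) of the Conservative and Optimistic times.

    Assumed reading of the reported percentages, not a formula from the paper.
    """
    best = min(conservative, optimistic)
    if best <= 0:
        raise ValueError("execution times must be positive")
    return 100.0 * (best - replicated) / best


# Hypothetical execution times in seconds, not measurements from the paper.
print(improvement_over_best_baseline(200.0, 220.0, 170.0))  # 15.0
```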

5.2.3. Overhead

According to the description in Section 3, we can deduce that the major overhead of our proposed un-identical federate replication structure is introduced when comparing the messages generated by federate replicas with the messages kept in message filters. In order to investigate the overhead, two series of experiments are carried out. The federation is composed of two federates, each of which uses a constant PExternEvent equal to 50%.

In the first series of experiments, a group of messages with different RVT may be sent by a federate while processing an internal event. The numbers of generated and exchanged messages in the federation execution are shown in Fig. 11(a). In the Conservative and Optimistic scenarios, the numbers of generated and exchanged messages are the same. According to the features of the synchronization approaches, we can deduce that all messages generated and exchanged in the Conservative scenario are correct messages, whereas in the Optimistic scenario a number of incorrect messages may be generated and exchanged in addition to the correct ones. For this reason, we can observe from Fig. 11(a) that the number of messages generated and exchanged in the Optimistic scenario is much greater than that in the Conservative scenario. The difference in the number of messages between the Optimistic and Conservative scenarios grows with the message group size.

Due to the redundant executions, the number of messages generated in the Replicated scenario is greater than that in the Conservative or Optimistic scenario. However, it is much smaller than the sum of the numbers of messages generated in the Conservative and Optimistic scenarios, which can be deduced from Fig. 11(a). This indicates that fewer incorrect messages are generated by OPT replicas in the Replicated scenario than by the optimistic federates in the Optimistic scenario. We can also observe from Fig. 11(a) that the number of messages exchanged in the Replicated scenario is smaller than that in the Optimistic scenario, but greater than that in the Conservative scenario.


[Figure 11 plots omitted. Panel (a), Number of messages versus message group size (1 to 8): series Conservative, Optimistic, Replicated (Generated), Replicated (Exchanged), Replicated (Compared). Panel (b), Length of message filters versus message group size (1 to 8): series ConMsgFilter, OptMsgFilter. Panel (c), Execution time (sec) versus message group size (1 to 8): series Conservative, Optimistic, Replicated. Panel (d), Execution time (sec) versus message size (50 to 600): series Conservative, Optimistic, Replicated.]

Fig. 11. Overhead of un-identical federate replication structure.


As mentioned above, the comparison between messages generated by federate replicas and messages buffered in message filters is the major overhead of the un-identical federate replication structure. Fortunately, message filters are reclaimed in time, i.e., messages with RVT < LGT + LA are removed from the message filters (as shown in Fig. 7). Therefore, the average lengths of ConMsgFilter and OptMsgFilter are small, as shown in Fig. 11(b). What is more, generated messages might simply be discarded without being compared with the messages in the message filters (see Remark 4.1). In addition, message content comparison is carried out only if the compared messages have the same RVT. Therefore, the number of message content comparisons in the Replicated scenario, illustrated by the "Replicated (Compared)" curve in Fig. 11(a), is reduced to a low value. It is even smaller than the number of messages exchanged in the Conservative scenario, regardless of the message group size. That is, on average, less than one message comparison is needed for each correct message exchanged in the federation.
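The two cost-saving rules described above (reclaiming entries with RVT < LGT + LA, and comparing content only between messages with the same RVT) can be illustrated with a small data structure. This is a hedged sketch rather than the authors' implementation: the class `MessageFilter`, its methods and the comparison counter are hypothetical, and indexing the buffer by RVT is one possible design that happens to enforce the same-RVT rule.

```python
from collections import defaultdict


class MessageFilter:
    """Hypothetical message filter that buffers message contents by RVT."""

    def __init__(self):
        self._by_rvt = defaultdict(list)   # RVT -> list of buffered contents
        self.content_comparisons = 0       # number of content comparisons made

    def add(self, rvt, content):
        """Buffer a message with the given RVT."""
        self._by_rvt[rvt].append(content)

    def reclaim(self, lgt, la):
        """Remove every buffered message with RVT < LGT + LA."""
        bound = lgt + la
        for rvt in [t for t in self._by_rvt if t < bound]:
            del self._by_rvt[rvt]

    def contains(self, rvt, content):
        """Check for an identical message; contents are compared only
        against buffered messages that carry the same RVT."""
        for buffered in self._by_rvt.get(rvt, ()):
            self.content_comparisons += 1
            if buffered == content:
                return True
        return False

    def __len__(self):
        return sum(len(msgs) for msgs in self._by_rvt.values())


# Usage with made-up values.
f = MessageFilter()
f.add(10.0, "a")
f.add(12.0, "b")
f.add(12.0, "c")
found = f.contains(12.0, "c")      # two comparisons, both at RVT 12.0
missing = f.contains(11.0, "a")    # no buffered message at RVT 11.0, no comparison
f.reclaim(lgt=10.0, la=1.5)        # drops the entry with RVT 10.0 < 11.5
```

Because lookups never cross RVT buckets, the comparison counter grows only with the number of buffered messages that share an RVT with the incoming one, which is consistent with the low "Replicated (Compared)" counts reported in Fig. 11(a).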

Last but not least, the federation execution times in the different scenarios are shown in Fig. 11(c). As we can see, the Optimistic scenario performs slightly worse than the Conservative scenario, especially when the message group size is large. This is because the number of messages in the Optimistic scenario increases more significantly than that in the Conservative scenario as the message group size increases (refer to Fig. 11(a)). The most important observation is that the Replicated scenario always outperforms the other scenarios by 6–17%, regardless of the message group size. Furthermore, similar trends can be observed in the curves of the Conservative and Replicated scenarios in Fig. 11(c): the federation execution time increases linearly with the message group size. Therefore, we can claim that our replication structure keeps the overhead at a low level, even if messages are frequently exchanged in the federation.

In the second series of experiments, the message group size is fixed at one, while the message size is increased from 50 bytes to 600 bytes. The execution times of the different execution scenarios are shown in Fig. 11(d). As we can see, the Replicated scenario significantly outperforms the Conservative and Optimistic scenarios regardless of the message size. Furthermore, similar trends can be observed in the curves of these three scenarios: the federation execution time increases slightly with the message size. Therefore, we can claim that the overhead of our replication structure is not significantly affected by the message size.

6. Related work

Much research has been done on evaluating and predicting the performance of conservative and optimistic approaches [9,26,27]. It is generally agreed that a conservative approach cannot outperform an optimistic approach in every situation, and vice versa. To take advantage of different synchronization approaches, a number of hybrid protocols have been proposed since the beginning of the 1990s. Since conservative and optimistic approaches go towards two extremes in constraining event processing, simulation performance can be improved by finding a compromise between them [28]. One can either introduce optimism into a conservative approach (e.g., the optimistic-conservative synchronization [29]), or introduce conservatism into an optimistic approach (e.g., the Moving Time Window protocol [30] and the Adaptive Time Warp protocol [31]). Furthermore, some mechanisms [13,32,33] have also been proposed to enable mixing and switching different synchronization approaches dynamically during the simulation. In these mechanisms, each federate employs only one synchronization approach at any time during the federation execution. In contrast, we propose to replicate a federate using both conservative and optimistic synchronization approaches.

The active replication technique has inspired a new direction for simulation performance improvement. The concept of replicated objects in Time Warp-based simulation was introduced in [34]. The simulation performance can be improved by obtaining states from the local or closest replica of the object. However, the replicated objects are treated as individual objects in the simulation execution, which increases the simulation scale and the associated overhead. Barrier [19] and No-Barrier [20] replication structures have been proposed using transparent middleware approaches. Unfortunately, as explained in Section 3.1, these replication structures cannot handle the different behaviors of replicas employing different synchronization approaches.

7. Conclusions and future work

In this article, we have proposed an un-identical federate replication structure for the purpose of improving the performance of HLA-based simulations. It replicates federates using either a conservative or an optimistic synchronization approach, and chooses the fastest replica to represent the federate in the federation. Using a middleware approach, the un-identical federate replication structure is designed in a transparent manner without increasing the federation scale. It is also able to handle the different behaviors of the un-identical replicas of the same federate, while guaranteeing the correctness of time advancement and message delivery among federates in the federation. A synthetic P-Hold simulation is used to evaluate the un-identical federate replication structure. As the experimental results show, it can significantly accelerate the simulation executions, regardless of the increasing federation scale. In the meantime, its overhead is marginal even when large messages are frequently exchanged among federates in the federation.

The un-identical federate replication structure improves simulation performance using duplicated computing resources. In the future, we will investigate the trade-off between simulation performance improvement and execution cost on modern computing environments, e.g., data center and Cloud. In the meantime, we will also extend the un-identical federate replication structure to tolerate both crash-stop failures [8] and Byzantine failures [35].

References

[1] IEEE, IEEE Standard for Modeling and Simulation (M&S) High Level Architecture (HLA) – 1516-2010 Framework and Rules, 1516.1-2010 Federate Interface Specification and 1516.2-2010 Object Model Template Specification (August 2010).

[2] H. Aydt, S.J. Turner, W. Cai, M.Y.H. Low, Symbiotic simulation systems: an extended definition motivated by symbiosis in biology, in: Procs of Workshop on Principles of Advanced and Distributed Simulation (PADS’08), 2008, pp. 109–116.

[3] M.R. Lyu, J. hong Chen, A. Avizienis, Software diversity metrics and measurements, in: Procs of Computer Software and Applications Conference (COMPSAC’92), 1992, pp. 69–78.

[4] M.R. Lyu, Software Fault Tolerance, John Wiley & Sons Ltd, 1995.

[5] R.E. Bryant, Simulation of Packet Communication Architecture Computer Systems, Tech. rep., Massachusetts Institute of Technology, 1977.

[6] K.M. Chandy, J. Misra, Distributed simulation: a case study in design and verification of distributed programs, IEEE Trans. Software Eng. 5 (5) (1979) 440–452.

[7] D.R. Jefferson, Virtual time, ACM Trans. Progr. Lang. Syst. 7 (3) (1985) 404–425.

[8] Z. Li, W. Cai, S.J. Turner, K. Pan, A replication structure for efficient and fault-tolerant parallel and distributed simulations, in: Procs of Annual Simulation Symposium, 2010, pp. 151:1–151:10.

[9] E. Niewiadomska-Szynkiewicz, A. Sikora, Algorithms for distributed simulation – comparative study, in: Procs of Conf. on Parallel Computing in Electrical Engineering (PARELEC’02), 2002, pp. 261–266.

[10] X. Wang, S.J. Turner, M.Y.H. Low, B.P. Gan, Optimistic synchronization in HLA-based distributed simulation, Simulation 81 (2005) 279–291.

[11] A. Santoro, F. Quaglia, Transparent state management for optimistic synchronization in the high level architecture, Simulation 82 (1) (2006) 5–20.

[12] A. Santoro, F. Quaglia, Transparent optimistic synchronization in the high-level architecture via time-management conversion, ACM Trans. Model. Comput. Simul. 22 (4) (2012) 21:1–21:26.

[13] K.S. Perumalla, μsik – a micro-kernel for parallel/distributed simulation systems, in: Procs of the 19th Workshop on Principles of Advanced and Distributed Simulation (PADS’05), 2005, pp. 59–68.

[14] C. Carothers, D. Bauer Jr., S. Pearce, Ross: a high-performance, low-memory, modular time warp system, J. Parallel Distrib. Comput. 62 (2002) 1648–1669.

[15] J.M. LaPre, E.J. Gonsiorowski, C.D. Carothers, Lorain: a step closer to the pdes holy grail, in: Procs of Conf. on Principles of Advanced Discrete Simulation (PADS’14), 2014, pp. 3–14.

[16] Z. Li, W. Cai, S.J. Turner, K. Pan, Improving performance by replicating simulations with alternative synchronization approaches, in: Procs of Conf. on Winter Simulation (WSC’08), 2008, pp. 1112–1120.

[17] R.M. Fujimoto, Parallel and Distributed Simulation Systems, Wiley Interscience, 2000.

[18] R.M. Fujimoto, Lookahead in parallel discrete event simulation, in: Procs of Conf. on Parallel Processing, 1988, pp. 34–41.

[19] C. Berchtold, M. Hezel, An architecture for fault tolerant HLA-Based simulation, in: Procs of European Simulation Multi-Conference (ESM’01), 2001, pp. 616–620.

[20] F. Quaglia, Software diversity-based active replication as an approach for enhancing the performance of advanced simulation systems, Int. J. Found. Comput. Sci. 18 (2007) 495–515.

[21] K. Pan, S.J. Turner, W. Cai, Z. Li, A hybrid HLA time management algorithm based on both conditional and unconditional information, Simulation 85 (2009) 559–573.

[22] B. Samadi, Distributed Simulation, Algorithms and Performance Analysis, Ph.D. thesis, University of California, Los Angeles, 1985.

[23] K. Pan, S.J. Turner, W. Cai, Z. Li, A service oriented HLA RTI on the Grid, in: Procs of Int. Conf. on Web Services (ICWS’07), 2007, pp. 984–992.

[24] R.M. Fujimoto, Performance of Time Warp under synthetic workloads, in: Procs of Multiconf. on Distributed Simulation, 1990, pp. 23–28.

[25] Y. Liang, S.J. Turner, B.P. Gan, Predictive-conservative synchronization for commercial simulation package interoperability, in: Procs of Conf. on Winter Simulation (WSC’08), 2008, pp. 1103–1111.

[26] A. Ferscha, J. Johnson, S.J. Turner, Distributed simulation performance data mining, Future Gener. Comput. Syst. 18 (2001) 157–174.

[27] S.B. Yoginath, K.S. Perumalla, Empirical evaluation of conservative and optimistic discrete event execution on cloud and VM platforms, in: Procs of Conf. on Principles of Advanced Discrete Simulation (PADS’13), 2013, pp. 201–210.

[28] A. Ferscha, G. Chiola, Self-adaptive logical processes: the probabilistic distributed simulation protocol, in: Procs of Annual Simulation Symposium, 1994, pp. 78–88.

[29] S. Xu, L.F. McGinnis, Optimistic-conservative synchronization in distributed factory simulation, in: Procs of Conf. on Winter Simulation (WSC’06), 2006, pp. 1069–1074.

[30] L.M. Sokol, D.P. Briscoe, A.P. Wieland, MTW: a strategy for scheduling discrete simulation events for concurrent execution, in: Procs of Multiconf. on Distributed Simulation, 1988, pp. 34–42.

[31] D. Ball, S. Hoyt, The adaptive Time-Warp concurrency control algorithm, in: Procs of Multiconf. on Distributed Simulation, 1990, pp. 174–177.

[32] H. Rajaei, R. Ayani, L.-E. Thorelli, The local Time Warp approach to parallel simulation, Simulation Digest 23 (1) (1993) 119–126.

[33] V. Jha, R.L. Bagrodia, A unified framework for conservative and optimistic distributed simulation, Simulation Digest 24 (1) (1994) 12–19.

[34] D. Agrawal, J.R. Agre, Replicated objects in Time Warp simulations, in: Procs of Conf. on Winter Simulation, 1992, pp. 657–664.

[35] Z. Li, W. Cai, S.J. Turner, K. Pan, A three-phases byzantine fault tolerance mechanism for HLA-based simulation, in: Procs of Symp. on Distributed Simulation and Real-Time Applications (DS-RT’10), 2010.