
Byzantine broadcast with fixed disjoint paths


J. Parallel Distrib. Comput. 74 (2014) 3153–3160


journal homepage: www.elsevier.com/locate/jpdc


Alexandre Maurer ∗, Sébastien Tixeuil

Sorbonne Universités, UPMC University of Paris 06, LIP6 - UMR CNRS 7606, Paris, France

Highlights

• We consider the problem of reliably communicating despite Byzantine failures.
• We propose an algorithm designed for sparse networks.
• We give a methodology to determine if two nodes communicate reliably.
• With simulations, we show that our algorithm outperforms previous solutions.

Article info

Article history:
Received 9 March 2014
Received in revised form 11 July 2014
Accepted 29 July 2014
Available online 7 August 2014

Keywords: Byzantine failures; Multihop networks; Asynchronous networks; Reliable broadcast; Fault tolerance; Distributed computing; Protocol; Random failures

Abstract

We consider the problem of reliably broadcasting a message in a multihop network. We assume that some nodes may be Byzantine, and behave arbitrarily. We focus on cryptography-free solutions.

We propose a protocol for sparse networks (such as grids or tori) where the nodes are not aware of their position. Our protocol uses a fixed number of disjoint paths to accept and forward the message to be broadcast. It can be tuned to significantly improve the number of Byzantine nodes tolerated. We present both theoretical analysis and experimental evaluation.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

As modern networks grow larger, they become more likely to fail, sometimes in unforeseen ways. Indeed, nodes can be subject to crashes, attacks, transient bit flips, etc. Many failure and attack models have been proposed, but one of the most general is the Byzantine model proposed by Lamport et al. [15]. The model assumes that faulty nodes can behave arbitrarily. In this paper, we study the problem of reliable communication in a multihop network despite the presence of Byzantine faults. The problem proves difficult since even a single Byzantine node, if not neutralized, can lie to the entire network.

A preliminary version of this work was presented at the DISC conference (Maurer and Tixeuil [23]). The conference version of the paper only provides a simplified version of the protocol, which corresponds to the $(1, H)$ setting of the current protocol (see Section 2.2). This paper provides a fully rewritten text that generalizes the protocol to any setting $(H_1, \dots, H_n)$, with newly developed theoretical analysis and experimental evaluation.

∗ Corresponding author.

E-mail address: [email protected] (A. Maurer).

http://dx.doi.org/10.1016/j.jpdc.2014.07.010
0743-7315/© 2014 Elsevier Inc. All rights reserved.

1.1. Related works

Many Byzantine-robust protocols are based on cryptography [4,8]: the nodes use digital signatures to authenticate the sender across multiple hops. However, cryptography itself is not unconditionally reliable, as shown by the recent Heartbleed bug [30]. According to the defense in depth paradigm [16], a good strategy for critical systems is to use multiple layers of security controls, including non-cryptographic layers. For instance, if the cryptographic security is compromised by a bug or a virus, the non-cryptographic communication layer can be used to safely broadcast a patch or to update cryptographic keys. Another drawback of cryptography is that it requires a centralized infrastructure to initially distribute cryptographic keys. Therefore, if this central weak spot fails, the whole network fails. Here, we would like to have a system where any element can fail independently without compromising the whole system. In this paper, we thus consider non-cryptographic strategies for reliable communication in the presence of Byzantine faults.

Cryptography-free solutions were first studied in fully connected networks [1,15,18,19,25]: a node can directly communicate with any other node, which implies the presence of a channel between each pair of nodes. Therefore, these approaches are hardly scalable, as the number of channels per node can be physically limited. We thus study solutions in multihop networks, where a node must rely on other nodes to broadcast messages.

A notable class of algorithms assumes restrictions on the consequences of Byzantine failures either in space [21,26,29] (nodes far away from Byzantine nodes are not impacted by their behavior) or in time [10,9,11,12,20] (a Byzantine node executes only a limited number of malicious actions before being ignored by correct nodes). Space-local solutions are only applicable to problems where the information from distant nodes is unimportant (for example, vertex coloring, link coloring, or dining philosophers). Time-local solutions presented so far can tolerate at most one Byzantine node in the entire network, and are not able to mask the effect of Byzantine actions (that is, all correct nodes may be perturbed by a Byzantine node at least once). Thus, this approach is not applicable to reliable broadcast.

We now present the solutions with the same setting as our contribution: a multihop network where each node has a unique identifier, and where cryptography is not allowed. They tolerate either a certain number or a certain density of Byzantine failures.

Let us start with the solutions that tolerate a certain number of Byzantine failures. It was shown that, for reliable broadcast in the presence of $k$ Byzantine nodes, it is necessary and sufficient that the network is $(2k + 1)$-connected [7]. This first solution assumes that every node knows the entire topology, and that the scheduling is synchronous. Both requirements have been relaxed in [27]: the topology is unknown and the scheduling is asynchronous. However, in sparse topologies such as grid-shaped networks (where the connectivity is 4), both approaches tolerate only one Byzantine node, independently of the size of the grid.

Now, we present the solutions that tolerate a certain density of Byzantine failures. In dense networks (where each node has a large number of neighbors), this density is represented by the fraction of Byzantine neighbors per node. Broadcast protocols have been proposed for nodes organized on a lattice [2,14]. However, these solutions require much more than 4 neighbors per node to enable reliable broadcast. These results were later generalized to other topologies [28], assuming that each node knows the global topology. A recent paper [17] proved the optimality of this solution for this setting. However, this approach cannot be applied to sparser networks. For instance, in a grid network, only the 8 nodes surrounding the source may accept its message.

All aforementioned results assume a large connectivity or node degree. Therefore, tolerating more Byzantine failures requires increasing the number of channels per node, which limits scalability. To overcome this problem, a probabilistic approach has been proposed in [22–24]. In this setting, the distribution of Byzantine failures is uniformly random. This hypothesis is realistic if we consider that each node has a given probability to fail or to be corrupted by an adversary [3,5,6]. We can also consider distributed hash tables used in the construction of overlay networks, where the identifier of a node joining the network is attributed randomly (therefore, its location in the overlay is random). With these assumptions, those solutions can tolerate a large number of Byzantine failures [22] with a high communication probability. This approach has been generalized to tolerate a constant rate of Byzantine nodes in an unbounded network, despite a bounded node degree [24]. However, both solutions require a global view of the network: each node must know its position in the communication graph. This strong hypothesis is difficult or impossible to satisfy in many types of networks, such as self-organized wireless sensor networks or peer-to-peer overlays.

1.2. Our contribution

In this paper, we consider the case of sparsely connected networks where the nodes do not know their position. We propose a new protocol that both contains and outperforms previous solutions. Our algorithm uses a fixed number of disjoint paths to accept and forward messages. For instance, when a communication probability of 0.99 is required on a 50 × 50 torus, we can tolerate 40 times more Byzantine nodes than previous solutions.

The paper is organized as follows. In Section 2, we describe the principle of our protocol and give the algorithm executed by each correct node. In Section 3, we explain how to determine which nodes always accept correct messages, and only correct messages. In Section 4, we use this method to evaluate and compare the performance of our protocol with simulations.

2. Description of the protocol

2.1. Hypotheses

Let $(V, E)$ be an undirected graph representing the topology of the network. $V$ denotes the nodes of the network. $E$ denotes the neighborhood relationships. A node can only send messages to its neighbors. Some nodes are correct and follow the protocol described below. We consider that all other nodes are totally unpredictable (or Byzantine) and have an arbitrary behavior.

We assume that any message sent is eventually received, and that in an infinite execution, any process is activated infinitely often. However, we make no hypothesis on synchronicity: the processes can be activated in any order. Finally, we consider the oral model: each node has a unique identifier, and when a node receives a message from a neighbor $p$, it knows that $p$ is the author of the message. Therefore, a Byzantine node cannot forge its own identity.

2.2. Informal description

First, we present an informal description of the problem and of our protocol. Each correct node $s$ wants to broadcast a message $s.m_0$ to the rest of the network. In the ideal case, $s$ sends $s.m_0$ to its neighbors, which in turn transmit $s.m_0$ to their own neighbors, and so forth, until every node receives $s.m_0$. We call this an unsecured broadcast.

In our setting however, some nodes can be Byzantine and broadcast false messages (i.e. messages that are not sent by a correct node) in the network. In the following, we say that a correct node accepts $m$ from $s$ when it definitively considers that $m$ is the message broadcast by $s$. We say that $m$ is correct if $m = s.m_0$, and false otherwise. Our objective is to propose a broadcast protocol that maximizes the number of nodes accepting the correct messages.

To limit the diffusion of false messages, we introduce the following mechanism. First, $s.m_0$ is directly accepted and retransmitted by the neighbors of $s$. Then, to accept a message, the other correct nodes must receive confirmations from several distinct nodes, through a fixed number of disjoint paths. For instance, in Fig. 1, the right node accepts a message if and only if it is received through 3 disjoint paths of at most $H_1 = 3$, $H_2 = 4$ and $H_3 = 2$ hops, respectively. The same requirement holds for every correct node. Once the message is accepted, the node retransmits it to more distant nodes, and the same principle is repeated over and over.

This specific setting of the protocol can be described by the tuple $(H_1, H_2, H_3) = (3, 4, 2)$. More generally, a setting of the protocol is described by a tuple $(H_1, \dots, H_n)$, each $H_i$ being a positive integer. The integer $n$ (not to be confused with the number of nodes) and the values $H_i$ are assigned arbitrarily: we do not know a priori their impact on the global performance, which is studied further in Section 4.

The underlying idea is as follows: if the Byzantine nodes are sufficiently spaced, they cannot cooperate to make a correct node accept a false message. Indeed, with setting $(3, 4, 2)$, a correct node can accept the first false message only if there exist 3 distinct Byzantine nodes at distance at most 3, 4 and 2 hops, respectively. This critical case is illustrated in Fig. 2.


Fig. 1. Principle of the protocol: correct case.

Fig. 2. Principle of the protocol: critical case.

However, if, for instance, the third Byzantine node is located at more than 2 hops (e.g. 3 hops), the false message is never accepted. This is illustrated in Fig. 3. This intuitive idea is proved in Theorem 1.

Note that, even if this principle is designed for sparse networks, our approach is also valid in denser networks. However, as previous solutions are less effective in sparse networks, our scheme clearly outperforms previous solutions in this case (see Section 4).

2.3. Preliminaries

The setting of the protocol is described by an $n$-tuple of integers $(H_1, \dots, H_n)$, with $0 \le H_1 \le \dots \le H_n$, known by all correct nodes. These values should be considered as an inherent part of the protocol: they are hard-coded with the rest of the algorithm, and are not to be learned by correct nodes. The choice of the parameters $(H_1, \dots, H_n)$ is discussed further in Section 4. Note that previous solutions [2,14,28,7,27] also have fixed parameters. Let $H = \max_{i \in \{1, \dots, n\}} H_i$.

The correct nodes can send and receive tuples of the form $(s, m, \Omega)$, where $m$ is the message broadcast by $s$ (or pretending to be it) and $\Omega$ is a set containing the identifiers of nodes already visited by the message. This set is used to certify that the paths are actually disjoint. The Byzantine nodes can, of course, forge and forward any message of the form $(s, m, \Omega)$.

Each correct node $p$ holds a dynamic set $p.Rec$, where the tuples $(s, m, \Omega)$ received are recorded. We say that a node multicasts a tuple $(s, m, \Omega)$ when it sends it to all its neighbors.

2.4. Our protocol

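Each correct node $p$ initially multicasts $(p, p.m_0, \emptyset)$, then executes the following algorithm:

• When a tuple $(s, m, \Omega)$ is received from a neighbor $q$:
  – If $q = s$:
    ∗ Accept $m$ from $s$ and multicast $(s, m, \emptyset)$.
  – If $q \notin \Omega$ and $\mathrm{card}(\Omega) < H$:
    ∗ Add $(s, m, \Omega \cup \{q\})$ to $p.Rec$.
    ∗ Multicast $(s, m, \Omega \cup \{q\})$.
• When there exist $s$, $m$ and $(\Omega_1, \dots, \Omega_n)$ such that:
  1. $\forall i \in \{1, \dots, n\}$, we have both $(s, m, \Omega_i) \in p.Rec$ and $\mathrm{card}(\Omega_i) \le H_i$;
  2. $\forall \{i, j\} \subseteq \{1, \dots, n\}$ with $i \neq j$, $\Omega_i \cap \Omega_j = \emptyset$.
  Then, accept $m$ from $s$ and multicast $(s, m, \emptyset)$.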
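The rules above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the `Node` class, the `send` callback and the FIFO scheduler `run` are hypothetical names introduced here. Two choices are assumptions added so that the toy simulation terminates: a tuple already present in `p.Rec` is not multicast again, and each source is accepted at most once. The source is also assumed to treat its own message as accepted.

```python
from collections import deque


class Node:
    """One correct node running the protocol (illustrative sketch)."""

    def __init__(self, ident, neighbors, hops, send):
        self.ident = ident            # unique identifier of this node
        self.neighbors = neighbors    # identifiers of the neighbors
        self.hops = tuple(hops)       # the setting (H_1, ..., H_n)
        self.h_max = max(self.hops)   # H = max_i H_i
        self.send = send              # hypothetical callback send(dest, src, tuple)
        self.rec = set()              # p.Rec: recorded tuples (s, m, frozenset)
        self.accepted = {}            # source identifier -> accepted message

    def multicast(self, tup):
        for q in self.neighbors:
            self.send(q, self.ident, tup)

    def start(self, m0):
        # Assumption: a node regards its own message as already accepted.
        self.accepted[self.ident] = m0
        self.multicast((self.ident, m0, frozenset()))

    def on_receive(self, q, tup):
        s, m, omega = tup
        if q == s:                                    # direct neighbor of the source
            self._accept(s, m)
        if q not in omega and len(omega) < self.h_max:
            new = (s, m, omega | {q})
            if new not in self.rec:                   # assumption: forward each tuple once
                self.rec.add(new)
                self.multicast(new)
                self._try_accept(s, m)

    def _accept(self, s, m):
        if s not in self.accepted:                    # assumption: accept once per source
            self.accepted[s] = m
            self.multicast((s, m, frozenset()))

    def _try_accept(self, s, m):
        sets = [om for (s2, m2, om) in self.rec if s2 == s and m2 == m]
        if self._pick(sets, 0, []):
            self._accept(s, m)

    def _pick(self, sets, i, chosen):
        # Backtracking search for pairwise disjoint sets with card(Omega_i) <= H_i.
        if i == len(self.hops):
            return True
        for om in sets:
            if len(om) <= self.hops[i] and all(om.isdisjoint(c) for c in chosen):
                if self._pick(sets, i + 1, chosen + [om]):
                    return True
        return False


def run(adj, hops, messages):
    """Deliver messages in FIFO order (one possible schedule among many)."""
    queue = deque()

    def send(dest, src, tup):
        queue.append((dest, src, tup))

    nodes = {p: Node(p, adj[p], hops, send) for p in adj}
    for p, m0 in messages.items():
        nodes[p].start(m0)
    while queue:
        dest, src, tup = queue.popleft()
        nodes[dest].on_receive(src, tup)
    return nodes
```

With all nodes correct, calling `run` on a 4-cycle `s-a-v-b` with setting `(1, 1)` lets `v` accept the message of `s` through the two one-hop paths from `a` and `b`. Since only one schedule is explored, such a run says nothing about worst-case behavior.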

Fig. 3. Principle of the protocol: safe case.

3. Protocol properties

In this section, we give a methodology to characterize the pairs of nodes that always communicate reliably, for a given placement of Byzantine nodes.

The section is organized as follows. In 3.1, we explain why this methodology is necessary to correctly perform worst-case analysis of the protocol. In 3.2, we give the condition for safety, that is: no correct node can accept a false message. In 3.3, we give a methodology to characterize which pairs of correct nodes communicate reliably. In 3.4, we show the tightness of our conditions. In 3.5, we show a linear message complexity.

3.1. Motivation

To evaluate the performance of our protocol, a natural idea is to directly simulate it. However, in the presence of Byzantine failures, things are not that obvious. Indeed, simulating the protocol would imply making restrictive hypotheses on the order of activation of nodes, the order of reception of messages and the behavior of Byzantine nodes. This would considerably weaken the adversary model, as nothing guarantees that we encompass the worst possible cases. In particular, as the behavior of faulty nodes is restricted (thus not totally arbitrary), these nodes cannot be called Byzantine anymore.

Therefore, instead of simulating the protocol, we provide a deterministic technique that, for a given correct node $s$ and for a given set of Byzantine nodes, returns a set of correct nodes that always accept the correct message from $s$, independently of the order of execution and of the behavior of Byzantine nodes (see Theorem 2). Thus, with this technique, we can evaluate our protocol in Section 4 without adding restrictive hypotheses.

Note that this technique is not to be computed by correct nodes (which are not aware of the position of Byzantine nodes). It makes use of an omniscient external view of the network. Also, we obtain worst-case results: whenever the adversary is slightly weaker, performance can only improve.

3.2. Safety

We give a condition on the placement of Byzantine nodes to ensure that no correct node ever accepts a false message. The following theorem formalizes the intuitive idea presented in 2.2, and partially shows the correctness of our algorithm. Note that this condition is not sufficient to ensure that the correct nodes actually accept the correct messages: this aspect is studied further in 3.3.

First, let us present some preliminary definitions.

Definition 1 (Path). An $N$-hop path is a sequence of distinct nodes $(p_0, \dots, p_N)$ such that, $\forall i \in \{1, \dots, N\}$, $p_i$ and $p_{i-1}$ are neighbors. We say that this path connects $p_0$ and $p_N$. This path is correct if all its nodes are correct.

Definition 2 (Disjoint paths). Two paths $(p_0, \dots, p_N)$ and $(p'_0, \dots, p'_N)$ are disjoint if $\{p_1, \dots, p_{N-1}\} \cap \{p'_1, \dots, p'_{N-1}\} = \emptyset$.


Note that these paths are internally disjoint, but not necessarily node-disjoint, as the first (resp. last) nodes can be identical. We say that $n$ paths $(X_1, \dots, X_n)$ are disjoint if, $\forall \{i, j\} \subseteq \{1, \dots, n\}$ with $i \neq j$, $X_i$ and $X_j$ are disjoint.
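As a small illustration of Definition 2, the disjointness test can be written as a set intersection on the interior nodes of each path. The function names below are hypothetical, and paths are assumed to be given as tuples of node identifiers.

```python
from itertools import combinations


def internally_disjoint(path_a, path_b):
    """Definition 2: the interior nodes (both endpoints excluded) do not overlap."""
    return set(path_a[1:-1]).isdisjoint(path_b[1:-1])


def all_disjoint(paths):
    """n paths are disjoint if every pair of them is disjoint."""
    return all(internally_disjoint(a, b) for a, b in combinations(paths, 2))
```

For example, `("v", "a", "r1")` and `("v", "b", "r2")` are disjoint even though they share the endpoint `v`, whereas `("v", "a", "r1")` and `("v", "a", "c", "r2")` are not, because both pass through `a`.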

Theorem 1 (Safety). For a given correct node $u$, let $Critical(u)$ be the following proposition: there exist at least $n$ distinct Byzantine nodes $(b_1, \dots, b_n)$ and $n$ disjoint paths $(X_1, \dots, X_n)$ such that, $\forall i \in \{1, \dots, n\}$, $X_i$ is a path of at most $H_i$ hops connecting $u$ and $b_i$.

If, for every correct node $u$, $Critical(u)$ is false, then no correct node ever accepts a false message.

Proof. The proof is by contradiction. Let us suppose the opposite: for every correct node $u$, $Critical(u)$ is false, yet at least one correct node accepts a false message. Let $s$ be a correct node, and let $v$ be the first correct node to accept a message $m \neq s.m_0$ from $s$. In the following, we show that $Critical(v)$ is necessarily true, contradicting the previous statement. This contradiction proves the result.

As $v$ is correct and accepts $m$ from $s$, according to the protocol, there exists $(\Omega_1, \dots, \Omega_n)$ such that, $\forall i \in \{1, \dots, n\}$, $(s, m, \Omega_i) \in v.Rec$ and $\mathrm{card}(\Omega_i) \le H_i$.

Consider now a given index $i \in \{1, \dots, n\}$, and let $q_0 = v$. Let $P^i_k$ be the following proposition: there exists a path $(q_1, \dots, q_k)$, with $\{q_1, \dots, q_k\} \subseteq \Omega_i$, such that $q_{k-1}$ received $(s, m, \Omega_i \setminus \{q_1, \dots, q_k\})$ from $q_k \in \Omega_i \setminus \{q_1, \dots, q_{k-1}\}$. In our notations, $\{q_1, \dots, q_{k-1}\} = \emptyset$ for $k = 1$.

• First, we show that $P^i_1$ is true. According to the protocol, the statement $(s, m, \Omega_i) \in v.Rec$ implies that $v$ received $(s, m, \Omega_i \setminus \{q_1\})$ from a node $q_1 \in \Omega_i$. It is actually possible, as $\mathrm{card}(\Omega_i \setminus \{q_1\}) \le H_i - 1 < H$. So $P^i_1$ is true.
• Now, let us suppose that $P^i_k$ is true, for $k < \mathrm{card}(\Omega_i)$. So $q_k$ sent $(s, m, \Omega_i \setminus \{q_1, \dots, q_k\})$ to $q_{k-1}$. Let us suppose that $q_k$ is correct. Then, according to the protocol, it implies that $q_k$ received $(s, m, \Omega_i \setminus \{q_1, \dots, q_{k+1}\})$ from a node $q_{k+1} \in \Omega_i \setminus \{q_1, \dots, q_k\}$. It is actually possible, as $\mathrm{card}(\Omega_i \setminus \{q_1, \dots, q_{k+1}\}) \le H_i - k - 1 < H$. So either $P^i_{k+1}$ is true or $q_k$ is Byzantine.

Therefore, by induction:

• Either there exists an index $k \in \{1, \dots, \mathrm{card}(\Omega_i) - 1\}$ and a path $(q_1, \dots, q_k)$ such that $q_k$ is Byzantine. Let $b_i = q_k$, and let $X_i$ be the path $(v, q_1, \dots, q_k)$.
• Or $P^i_{\mathrm{card}(\Omega_i)}$ is true, and $q_{\mathrm{card}(\Omega_i)}$ sent $(s, m, \Omega_i \setminus \{q_1, \dots, q_{\mathrm{card}(\Omega_i)}\}) = (s, m, \emptyset)$ to $q_{\mathrm{card}(\Omega_i) - 1}$. The node $q_{\mathrm{card}(\Omega_i)}$ cannot be $s$, as $m \neq s.m_0$. Also, it cannot be a correct node, as $v$ is the first correct node to accept $m$ from $s$. So $q_{\mathrm{card}(\Omega_i)}$ is necessarily Byzantine. Let $b_i = q_{\mathrm{card}(\Omega_i)}$, and let $X_i$ be the path $(v, q_1, \dots, q_{\mathrm{card}(\Omega_i)})$.

In both cases, the path $X_i$ connects $v$ and $b_i$ with at most $\mathrm{card}(\Omega_i) \le H_i$ hops.

According to Definition 2, for $\{i, j\} \subseteq \{1, \dots, n\}$, the paths $X_i$ and $X_j$ are disjoint if $(\Omega_i \setminus \{b_i\}) \cap (\Omega_j \setminus \{b_j\}) = \emptyset$. As $v$ is correct and accepts $m$ from $s$, according to the protocol: $\forall \{i, j\} \subseteq \{1, \dots, n\}$, $\Omega_i \cap \Omega_j = \emptyset$. Therefore, the paths $(X_1, \dots, X_n)$ are disjoint, and the nodes $(b_1, \dots, b_n)$ are distinct. Thus, $Critical(v)$ is true. This contradiction completes the proof.

Note that $Critical(u)$ cannot be true if there are fewer than $n$ distinct Byzantine nodes. Thus, a simple sufficient condition for safety is that the number of Byzantine nodes is at most $n - 1$.
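For small networks, the proposition $Critical(u)$ of Theorem 1 can be checked by exhaustive search. The sketch below is an assumption-laden illustration, not the evaluation procedure of the paper: `simple_paths`, `is_critical` and `is_safe` are hypothetical names, the network is assumed to be given as an adjacency dictionary, and the search enumerates all bounded simple paths, so it is exponential and only suitable for toy examples.

```python
def simple_paths(adj, start, max_hops):
    """Yield every simple path from start with 1 to max_hops hops."""
    stack = [(start,)]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            yield path
        if len(path) - 1 < max_hops:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    stack.append(path + (nxt,))


def is_critical(adj, byz, u, hops):
    """Brute-force test of Critical(u) for the setting hops = (H_1, ..., H_n)."""

    def pick(i, used_inner, used_ends):
        if i == len(hops):
            return True
        for path in simple_paths(adj, u, hops[i]):
            end, inner = path[-1], set(path[1:-1])
            # X_i must end on a Byzantine node not used yet, and its interior
            # must avoid the interiors of the paths already chosen (Definition 2).
            if end in byz and end not in used_ends and inner.isdisjoint(used_inner):
                if pick(i + 1, used_inner | inner, used_ends | {end}):
                    return True
        return False

    return pick(0, frozenset(), frozenset())


def is_safe(adj, byz, hops):
    """Sufficient condition of Theorem 1: Critical(u) is false for every correct u."""
    return all(not is_critical(adj, byz, u, hops) for u in adj if u not in byz)
```

On the 4-cycle `s-a-v-b` with setting `(1, 1)`, node `v` is critical when both `a` and `b` are Byzantine, but not when only `a` is.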

3.3. Reliability

Here, we suppose that the condition of Theorem 1 is satisfied: no false message can be accepted by a correct node. We now consider a given correct node $s$, and give a methodology to characterize a set of nodes reliable for $s$.

Definition 3 (Reliability). For a given correct node $s$ and a given set of Byzantine nodes, a correct node is reliable for $s$ if it always eventually accepts $s.m_0$ from $s$, in any possible execution.

The following theorem makes it possible to construct a set of nodes reliable for $s$ step by step: for a given set $R$ of nodes reliable for $s$, and a given correct node $v$, Theorem 2 tells us if $v$ is also reliable for $s$.

Thus, we proceed as follows:

1. To initialize the construction of $R$, we take $s$ and its correct neighbors, which are reliable for $s$ according to the protocol.

2. We test every correct node $v$ until the condition of Theorem 2 is satisfied, then add this node $v$ to the set ($R \leftarrow R \cup \{v\}$).

3. We repeat the process until we obtain the largest possible set of nodes reliable for $s$.

An implicit condition for starting the construction of $R$ is, of course, that $s$ has at least $n$ correct neighbors. Besides, we do not claim that this construction is always possible.
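The three steps above amount to a fixed-point iteration, which can be sketched as follows for small networks. This is a hedged illustration rather than the authors' tool: `bounded_paths`, `satisfies_theorem2` and `reliable_set` are hypothetical names, the search is exhaustive (exponential), and the safety condition of Theorem 1 is assumed to hold rather than checked. As a conservative reading of the condition, the sketch requires the full sets $\Omega_i$ (all nodes of $X_i$ except $v$) to be pairwise disjoint, which is what the acceptance rule of Section 2.4 needs.

```python
def bounded_paths(adj, start, targets, blocked, max_hops):
    """Yield simple paths from start to a node of targets, with at most
    max_hops hops, that never visit a blocked node."""
    stack = [(start,)]
    while stack:
        path = stack.pop()
        if len(path) - 1 >= max_hops:      # hop budget already spent
            continue
        for nxt in adj[path[-1]]:
            if nxt in blocked or nxt in path:
                continue
            new = path + (nxt,)
            if nxt in targets:
                yield new
            stack.append(new)              # a path may also pass through R


def satisfies_theorem2(adj, byz, reliable, v, hops):
    """Search for n correct paths X_i (at most H_i hops) from v to distinct
    nodes of R, with pairwise disjoint node sets (v excluded)."""

    def pick(i, used):
        if i == len(hops):
            return True
        for path in bounded_paths(adj, v, reliable, set(byz) | used, hops[i]):
            if pick(i + 1, used | set(path[1:])):
                return True
        return False

    return pick(0, frozenset())


def reliable_set(adj, byz, s, hops):
    """Grow R from s and its correct neighbors until no node can be added."""
    correct = set(adj) - set(byz)
    reliable = {s} | {q for q in adj[s] if q in correct}
    changed = True
    while changed:
        changed = False
        for v in sorted(correct - reliable):
            if satisfies_theorem2(adj, byz, reliable, v, hops):
                reliable.add(v)
                changed = True
    return reliable
```

On the 4-cycle `s-a-v-b` with setting `(1, 1)`, every node is reliable for `s` when there is no Byzantine node, while `v` is left out when `b` is Byzantine, since only one correct path then reaches it.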

Theorem 2 (Reliability). Let $s$ be a correct node, and let us suppose that the condition of Theorem 1 is satisfied (for every correct node $u$, $Critical(u)$ is false). Let $R$ be a set of nodes reliable for $s$, and let $v \notin R$ be a correct node. If there exist at least $n$ distinct nodes $\{r_1, \dots, r_n\} \subseteq R$ and $n$ disjoint correct paths $(X_1, \dots, X_n)$ such that, $\forall i \in \{1, \dots, n\}$, $X_i$ is a path of at most $H_i$ hops connecting $v$ and $r_i$, then $v$ is also reliable for $s$.

Proof. According to Theorem 1, no correct node can accept a false message. So, if a correct node accepts a message, this is necessarily a correct message.

We consider a given index $i \in \{1, \dots, n\}$. Let $X_i = (q_0, \dots, q_M)$, with $q_0 = r_i$ and $q_M = v$. By definition, we have $M \le H_i \le H$. Let us prove the following property $P^i_k$ by induction, $\forall k \in \{1, \dots, M\}$: the node $q_k$ eventually receives $(s, s.m_0, \{q_0, \dots, q_{k-2}\})$ from $q_{k-1}$. In our notations, $\{q_0, \dots, q_{k-2}\} = \emptyset$ for $k = 1$.

• First, we show that $P^i_1$ is true. As $R$ is a set of nodes reliable for $s$, according to Definition 3, the node $r_i \in R$ eventually accepts $s.m_0$ from $s$. According to the protocol, it implies that $r_i$ also multicasts $(s, s.m_0, \emptyset)$. So $q_1$ eventually receives $(s, s.m_0, \emptyset)$ from $q_0 = r_i$, and $P^i_1$ is true.
• Now, let us suppose that $P^i_k$ is true for some $k < M$. As $q_{k-1} \notin \{q_0, \dots, q_{k-2}\}$ and $\mathrm{card}(\{q_0, \dots, q_{k-2}\}) < M \le H$, the correct node $q_k$ eventually multicasts $(s, s.m_0, \{q_0, \dots, q_{k-1}\})$. So $q_{k+1}$ eventually receives $(s, s.m_0, \{q_0, \dots, q_{k-1}\})$ from $q_k$, and $P^i_{k+1}$ is true.

So $P^i_M$ is true and the node $q_M = v$ eventually receives $(s, s.m_0, \{q_0, \dots, q_{M-2}\})$ from $q_{M-1}$. As $q_{M-1} \notin \{q_0, \dots, q_{M-2}\}$ and $\mathrm{card}(\{q_0, \dots, q_{M-2}\}) < M \le H$, $v$ eventually adds $(s, s.m_0, \{q_0, \dots, q_{M-1}\})$ to the set $v.Rec$. Let $\Omega_i = \{q_0, \dots, q_{M-1}\}$.

So, $\forall i \in \{1, \dots, n\}$, we have $(s, s.m_0, \Omega_i) \in v.Rec$ and $\mathrm{card}(\Omega_i) \le H_i$. Besides, as the paths $(X_1, \dots, X_n)$ are disjoint, according to Definition 2, $\forall \{i, j\} \subseteq \{1, \dots, n\}$, we have $(\Omega_i \setminus \{r_i\}) \cap (\Omega_j \setminus \{r_j\}) = \emptyset$. Thus, as the nodes $(r_1, \dots, r_n)$ are distinct, we have $\Omega_i \cap \Omega_j = \emptyset$. Therefore, according to the protocol, $v$ eventually accepts $s.m_0$. Thus, $v$ is reliable for $s$.

3.4. Bounds tightness

We now show that the condition for safety (Theorem 1) is tight, and that the methodology to characterize the reliable nodes (Theorem 2) is optimal in a safe network.

Theorem 3 (Bounds Tightness for Theorem 1). If the condition of Theorem 1 is not satisfied, it is impossible to guarantee that the network is safe.


Fig. 4. A 7 × 7 torus network.

Proof. Let us suppose the opposite: the condition of Theorem 1 is not satisfied, yet the network is safe, that is: no correct node can accept a false message. As the condition of Theorem 1 is not satisfied, there exists at least one correct node $u$ such that $Critical(u)$ is true. In other words, there exist at least $n$ distinct Byzantine nodes $(b_1, \dots, b_n)$ and $n$ disjoint paths $(X_1, \dots, X_n)$ such that, $\forall i \in \{1, \dots, n\}$, $X_i$ is a path of at most $H_i$ hops connecting $u$ and $b_i$.

Thus, it is possible that the Byzantine nodes $(b_1, \dots, b_n)$ unanimously multicast $(s, m', \emptyset)$, with $m' \neq m$, where $m$ denotes the correct message of $s$ (so $m'$ is a false message). If so, with a reasoning similar to the proof of Theorem 2, we show that $u$ eventually accepts $m'$ from $s$. Therefore, the network cannot be safe. This contradiction completes the proof.

Theorem 4 (Bounds Tightness for Theorem 2). Suppose that the network is safe, and let $s$ be a correct node. Then, the set constructed with Theorem 2 contains all the nodes reliable for $s$.

Proof. Let us suppose the opposite: the set $R$ constructed with Theorem 2 does not contain all the nodes reliable for $s$. Let $R'$ be the set of nodes reliable for $s$.

Let there be an execution where all the nodes of $R$ accept $s.m_0$ from $s$, but no other correct node has accepted $s.m_0$ from $s$ so far. Such an execution is possible, as the construction of $R$ with Theorem 2 does not require that any node $u \notin R$ accepts $s.m_0$ from $s$.

Let $v$ be the first node of $R' \setminus R$ to accept $s.m_0$ from $s$ in the rest of the execution. Then, by a reasoning similar to the proof of Theorem 1, we show that there must exist at least $n$ distinct nodes $(u_1, \dots, u_n)$ that have previously accepted $s.m_0$ from $s$, and $n$ disjoint correct paths $(X_1, \dots, X_n)$ such that, $\forall i \in \{1, \dots, n\}$, $X_i$ is a path of at most $H_i$ hops connecting $v$ and $u_i$.

As the only correct nodes that have previously accepted $s.m_0$ from $s$ are the nodes of $R$, we have $\{u_1, \dots, u_n\} \subseteq R$, and the condition of Theorem 2 is satisfied for $R$ and $v$. So $v$ could actually be added to $R$, and $R$ is not the largest set of nodes reliable for $s$ that can be constructed with Theorem 2. This contradiction completes the proof.

3.5. Message complexity

We now evaluate the message complexity of our protocol, that is, the number of tuples $(s, m, \Omega)$ sent by the correct nodes. We only consider the case where all nodes are correct, as the Byzantine nodes can send as many messages as they want.

Let $|V|$ be the number of nodes, and let $\Delta$ be the maximal degree of the network, that is, the maximal number of neighbors of a single node. Let $s$ be a correct node. According to our protocol, when a node $v$ accepts $s.m_0$ from $s$, it sends $(s, s.m_0, \emptyset)$ to each neighbor, which makes at most $\Delta$ messages. Then, each neighbor of $v$ multicasts $(s, s.m_0, \{v\})$, which makes at most $\Delta + \Delta^2$ messages. This process is repeated $H$ times, which makes at most $\Delta + \Delta^2 + \dots + \Delta^H = O(\Delta^H)$ messages. So, $O(\Delta^H |V|)$ messages related to $s$ are sent by the protocol, which makes $O(\Delta^{2H} |V|^2)$ messages in total.

Therefore, if we consider that the degree $\Delta$ and the protocol parameter $H$ are bounded (for instance, in a torus network, where $\Delta = 4$), the message complexity is $O(|V|^2)$: the same as an unsecured broadcast protocol (see 2.2).
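The step from the sum of powers of $\Delta$ to $O(\Delta^H)$ follows from the geometric series. The short derivation below assumes $\Delta \ge 2$; the numerical instance uses $H = 3$ purely as an illustration, not as a setting taken from the paper.

```latex
\[
  \sum_{k=1}^{H} \Delta^{k}
  \;=\; \frac{\Delta\,(\Delta^{H} - 1)}{\Delta - 1}
  \;\le\; 2\,\Delta^{H}
  \qquad (\Delta \ge 2),
\]
since $2\Delta^{H}(\Delta - 1) - \Delta(\Delta^{H} - 1)
      = \Delta^{H}(\Delta - 2) + \Delta \ge 0$.
For a torus ($\Delta = 4$) and, say, $H = 3$:
\[
  4 + 16 + 64 = 84 \;\le\; 2 \cdot 4^{3} = 128 .
\]
```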

4. Evaluation of the protocol

In this section, we perform simulations to compare the performance of different settings of our protocol. We detail our motivations, then describe the methodology and comment on the results. Finally, we provide a quantitative comparison with previous solutions, and show the improvement.

4.1. Motivation

We would like to evaluate the performance of different settings (H1, ..., Hn) of our protocol, and also compare them with other existing protocols.

As our protocol is designed for loosely connected networks, we choose torus networks for simulations, where each node has 4 neighbors. Very few Byzantine-robust protocols exist for such sparse networks.

In order to quantify this performance, we assume a random uniform distribution of Byzantine failures. Our metric is the communication probability, that is: for two randomly chosen correct nodes p and q, the probability that q is reliable for p. We choose this probabilistic model for two reasons:

• As stated in the introduction, it is difficult or impossible to achieve perfect reliable broadcast in sparse networks. Therefore, we give up on deterministic guarantees, and aim at probabilistic guarantees: for instance, to achieve a 0.99 communication probability.
• This random model has realistic applications: in a network, each component has a weak yet positive probability to misbehave. Besides, some Byzantine agents can join a peer-to-peer overlay at any moment, or be introduced in a collection of wireless sensors before their deployment in the field.

4.2. Methodology

We perform our evaluation on torus networks (see Definition 4 and Fig. 4). This choice is motivated by the simplicity and the regularity of these topologies. They also have many realistic applications, such as large-scale computation grids for industrial simulations [13].

Definition 4 (Torus). An N × N torus network is a network such that: (i) each node has a unique identifier (i, j), with 1 ≤ i ≤ N and 1 ≤ j ≤ N, and (ii) two nodes (i1, j1) and (i2, j2) are neighbors if and only if one of these two conditions is satisfied:

1. i1 = i2 and |j1 − j2| = 1 or N − 1
2. j1 = j2 and |i1 − i2| = 1 or N − 1.

These topological identifiers (i, j) are not to be confused with the node identifiers used in the protocol: according to our hypotheses, the nodes do not know their position in the network.
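The adjacency rule of Definition 4 can be sketched in Python with modular arithmetic for the wrap-around. The function name `torus_neighbors` is an assumption for this example, and the sketch supposes N ≥ 3 so that the four neighbors are distinct.

```python
def torus_neighbors(node, n):
    """Return the set of neighbors of `node` = (i, j) in an n x n torus.

    Identifiers are 1-based, as in Definition 4. Illustrative helper only;
    it assumes n >= 3 so that the four neighbors are distinct.
    """
    i, j = node

    def wrap(x):
        # Map 0 to n and n + 1 to 1, leaving 1..n unchanged.
        return (x - 1) % n + 1

    return {
        (wrap(i - 1), j),
        (wrap(i + 1), j),
        (i, wrap(j - 1)),
        (i, wrap(j + 1)),
    }


print(sorted(torus_neighbors((1, 1), 10)))
```

These coordinates are only used to build the simulated topology; the protocol itself never reads them.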

We assume a uniform distribution of Byzantine failures: each node has the same probability λ to be Byzantine. We call this probability the Byzantine rate. We want to evaluate, for a given Byzantine rate λ, the communication probability P(λ).

Fig. 5. Simulation results on a 10 × 10 torus.

For this purpose, we use a Monte-Carlo method:

1. We generate several random distributions of Byzantine nodes, with the Byzantine rate λ.

2. For each distribution, we randomly choose two correct nodes s and v. Then, we use Theorems 1 and 2 to characterize the nodes reliable for s (see Definition 3). If v is reliable for s, it necessarily accepts the correct message, and the simulation is a success. Otherwise, it is a failure.

3. With a large number of simulations, the fraction of successes approximates P(λ).
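The three steps above can be sketched as a small Python simulation loop. This is only a skeleton under stated assumptions: the reliability test of Theorems 1 and 2 is not reproduced here, so the sketch takes it as a pluggable predicate `is_reliable`. The predicate shown, `reachable_through_correct`, is a placeholder (plain reachability through correct nodes) and is not the criterion used in the paper; all function names are hypothetical.

```python
import random
from collections import deque


def torus_neighbors(node, n):
    """Neighbors of (i, j) in an n x n torus with 1-based identifiers."""
    i, j = node

    def wrap(x):
        return (x - 1) % n + 1

    return [(wrap(i - 1), j), (wrap(i + 1), j), (i, wrap(j - 1)), (i, wrap(j + 1))]


def reachable_through_correct(s, v, byzantine, n):
    """PLACEHOLDER predicate (assumption): v is reachable from s via correct nodes.

    A faithful simulation would replace this with the characterization
    given by Theorems 1 and 2.
    """
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == v:
            return True
        for w in torus_neighbors(u, n):
            if w not in seen and w not in byzantine:
                seen.add(w)
                queue.append(w)
    return False


def estimate_communication_probability(n, rate, trials, is_reliable, seed=0):
    """Monte-Carlo estimate of P(rate) on an n x n torus."""
    rng = random.Random(seed)
    nodes = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
    successes = valid_trials = 0
    for _ in range(trials):
        # Step 1: each node is Byzantine independently with probability `rate`.
        byzantine = {u for u in nodes if rng.random() < rate}
        correct = [u for u in nodes if u not in byzantine]
        if len(correct) < 2:
            continue  # no pair of correct nodes to sample in this draw
        # Step 2: pick two distinct correct nodes and test reliability.
        s, v = rng.sample(correct, 2)
        valid_trials += 1
        if is_reliable(s, v, byzantine, n):
            successes += 1
    # Step 3: the fraction of successes approximates P(rate).
    return successes / valid_trials if valid_trials else float("nan")


print(estimate_communication_probability(10, 0.05, 1000, reachable_through_correct))
```

Passing the predicate as a parameter keeps the sampling loop independent of the reliability criterion, so the same loop could serve for each setting (H1, ..., Hn).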

4.3. Settings

Now, let us choose a reasonable set of settings (H1, ..., Hn) for our simulations, and explain this choice.

First, we exclude the case n = 1, which corresponds to an unsecured broadcast (see 2.2). Besides, any setting with n > 4 would not work: n disjoint paths are required, and a node has only 4 neighbors. Therefore, we restrict ourselves to n ∈ {2, 3, 4}.

Then, we introduce the notion of minimal setting.

Definition 5 (Minimal Setting). For a given network, and for n ≥ 1, a setting (H1, ..., Hn) is:

• Smaller than a setting (H′1, ..., H′n) if, ∀i ∈ {1, ..., n}, Hi ≤ H′i, and there exists j ∈ {1, ..., n} such that Hj < H′j.
• Covering if, when there are no Byzantine nodes, all nodes are reliable (see Definition 3).
• Minimal if no smaller setting is covering.
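The "smaller than" relation of Definition 5 is a componentwise partial order, so two settings of the same length can be incomparable. The following Python sketch illustrates this; the function name `is_smaller` is an assumption for this example, and the covering test is deliberately left out since it depends on the network.

```python
def is_smaller(a, b):
    """Return True if setting `a` is smaller than setting `b` (Definition 5).

    Both settings are tuples (H1, ..., Hn) of the same length n:
    every component of `a` is <= the matching component of `b`,
    and at least one is strictly smaller. Hypothetical helper.
    """
    if len(a) != len(b):
        raise ValueError("settings must have the same number of parameters")
    pairs = list(zip(a, b))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


print(is_smaller((1, 2, 5), (1, 3, 5)))  # componentwise smaller
print(is_smaller((1, 2, 5), (1, 3, 3)))  # incomparable settings
```

Because the order is only partial, several minimal settings can coexist for the same n, as happens with (1, 2, 5) and (1, 3, 3), neither of which is smaller than the other.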

We therefore choose the minimal settings for a torus network, for n ∈ {2, 3, 4}:

• Setting A: (1, 2)
• Setting B: (1, 2, 5)
• Setting C: (1, 3, 3)
• Setting D: (1, 2, 5, 5).

Fig. 6. Simulation results on a 50 × 50 torus.

This choice is motivated by the two following reasons:

1. Any smaller setting is not covering, and therefore does not enable reliable broadcast.
2. Any greater setting requires longer paths. This would only increase the probability to have Byzantine nodes on the paths, and reduce the safety probability without any compensation.

Of course, we do not claim that this preliminary choice is optimal. Rigorously determining the optimal settings remains a challenging open problem (see Section 5).

4.4. Results

The simulation results are presented in Figs. 5 and 6. Fig. 5 corresponds to a 10 × 10 torus, and Fig. 6 corresponds to a 50 × 50 torus. For each case, we represented the probability that the network is safe (no correct node can accept a false message), then the communication probability P(λ). Note that the scale is different for Figs. 5 and 6.

First, let us comment on the probability that the network is safe, according to Theorem 1.

• This probability increases with the complexity of the setting (2 paths for setting A, 3 paths for settings B and C, 4 paths for setting D). Indeed, as we use more paths, it becomes less likely to have a critical placement of Byzantine nodes (see Fig. 2).
• This probability decreases with the size of the network. Indeed, for a given Byzantine rate, the frequency of critical placements increases with the size of the network. Besides, the disparities between the different settings also increase with the size: the results in Fig. 6 are more dispersed than in Fig. 5.

Now, let us comment on the communication probability.

• There seems to exist an optimal setting of the protocol. Indeed, increasing the number of paths makes the safety conditions of Theorem 1 easier to satisfy, but the reliability conditions of Theorem 2 harder to satisfy. Therefore, there is a compromise to find. This is illustrated in Fig. 5, where setting D (the most complex setting) offers the best safety probability, but also the worst communication probability.

Fig. 7. Comparison of existing protocols on a logarithmic scale (50 × 50 torus).

Fig. 8. Hexagonal torus obtained after channel removal on a torus.

Fig. 9. Simulation results on a 10 × 10 hexagonal torus.

• Besides, the size of the network impacts this optimum. Indeed, in Fig. 6, setting D now offers the best communication probability for Byzantine rates λ > 0.005. However, this is not the case for smaller Byzantine rates.

Therefore, setting C offers the best performance across both networks. We use this setting for the comparison with previous solutions.

4.5. Comparison with previous solutions

Finally, we provide a quantitative comparison with previoussolutions.

According to our hypotheses, the nodes are not aware of their position in the network. Therefore, our protocol must be compared with other protocols that still work despite this constraint and the low degree of the network. Only protocol [27] meets both requirements.

Let us suppose that we want to guarantee a communication probability P(λ) of at least 0.99 on a 50 × 50 torus. Then, we can tolerate a Byzantine rate λ of:

– 4 × 10^−6 with an unsecured broadcast (see 2.2)
– 5 × 10^−5 with protocol [27]
– 2 × 10^−3 with our protocol (improvement of factor 40).

This performance is to be compared with protocol [22], which only works when the nodes know their position in the network. With protocol [22], we can tolerate a Byzantine rate of 8 × 10^−3. As we can see, assuming that the nodes do not know their position still has a cost in terms of performance, and closing this gap remains a challenging open problem.

All these solutions are represented in Fig. 7, on a logarithmicscale.
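The ratios between the tolerated Byzantine rates quoted in this section can be checked with exact rational arithmetic. The sketch below uses only the four rates stated in the text; the dictionary keys and the function name `improvement` are labels chosen for this example.

```python
from fractions import Fraction

# Tolerated Byzantine rates for P >= 0.99 on a 50 x 50 torus, as quoted above.
# The keys are hypothetical labels, not names used by the protocols themselves.
tolerated_rate = {
    "unsecured": Fraction(4, 10**6),
    "protocol_27": Fraction(5, 10**5),
    "ours": Fraction(2, 10**3),
    "protocol_22": Fraction(8, 10**3),  # requires position knowledge
}


def improvement(better: str, baseline: str) -> Fraction:
    """Ratio of the tolerated rates of two solutions."""
    return tolerated_rate[better] / tolerated_rate[baseline]


print(improvement("ours", "protocol_27"))   # factor over protocol [27]
print(improvement("ours", "unsecured"))     # factor over unsecured broadcast
print(improvement("protocol_22", "ours"))   # remaining gap to protocol [22]
```

Using `Fraction` instead of floating-point values avoids rounding noise when dividing such small rates.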

4.6. Application to other topologies: the hexagonal torus

In this section, we discuss the possibility to apply our schemeto other topologies. As an example, we consider the case of thehexagonal torus.

Definition 6 (Hexagonal Torus). An N × N hexagonal torus is an N × N square torus with several channels removed: ∀(i, j) ∈ {1, ..., N − 1} × {1, ..., N}, if i + j is odd, we remove the channel between (i, j) and (i + 1, j).

Therefore, in this topology, each node has 3 neighbors instead of 4. This is illustrated in Fig. 8.
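A literal reading of Definition 6 can be sketched in Python as follows. The function name `hex_torus_neighbors` is an assumption for this example, and the sketch supposes N ≥ 3. Since the definition only ranges over i ∈ {1, ..., N − 1}, the sketch keeps the wrap-around channel between rows N and 1 untouched, which is an interpretation rather than something the text states explicitly.

```python
def hex_torus_neighbors(node, n):
    """Neighbors of (i, j) in an n x n hexagonal torus (literal Definition 6).

    Starts from the square torus and drops the channel between (a, j) and
    (a + 1, j) whenever 1 <= a <= n - 1 and a + j is odd. Hypothetical
    helper for illustration; assumes n >= 3.
    """
    i, j = node

    def wrap(x):
        return (x - 1) % n + 1

    def channel_kept(a):
        # Channel between (a, j) and (wrap(a + 1), j). The wrap-around
        # channel (a == n) is outside the range of Definition 6, so it stays.
        return a == n or (a + j) % 2 == 0

    # Channels along the second coordinate are never removed.
    neighbors = {(i, wrap(j - 1)), (i, wrap(j + 1))}
    if channel_kept(i):
        neighbors.add((wrap(i + 1), j))
    if channel_kept(wrap(i - 1)):
        neighbors.add((wrap(i - 1), j))
    return neighbors


print(sorted(hex_torus_neighbors((2, 1), 10)))
```

Under this reading, every node in rows 2 to N − 1 has exactly 3 neighbors, while some nodes in rows 1 and N keep a fourth one through the wrap-around channel; the sketch does not attempt to adjust for that.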

The minimal settings on this topology (following the methodology used in previous sections) are the following:

• Setting E: (1, 3)
• Setting F: (2, 2)
• Setting G: (1, 3, 7)
• Setting H: (2, 2, 10)
• Setting I: (2, 6, 6).

We performed simulations on a 10 × 10 hexagonal torus. The results are represented in Fig. 9. Settings E and F give exactly the same results, so we represent them on the same plot.

We notice that the setting has very little influence here. As previously, the settings with 3 parameters ensure a better safety than those with only 2 parameters. Yet, the settings with 2 parameters ensure a better communication probability. Indeed, when the setting has 3 parameters, the nodes with Byzantine neighbors cannot accept the correct message, as they have at most 2 correct neighbors.

We also plotted the performance of both unsecured broadcast and solution [27]. To achieve a communication probability of 0.99, our protocol can tolerate a 1.2 × 10^−3 Byzantine rate on a 10 × 10 hexagonal torus (versus 5 × 10^−3 on a 10 × 10 torus). Thus, the removal of channels seems to have a negative impact on performance. However, we still tolerate a Byzantine rate 10 times higher than solution [27].

5. Conclusion

In this paper, we proposed a parameterizable approach for Byzantine broadcast in sparsely connected networks, with a minimal number of hypotheses. We showed that, by properly tuning the parameters of the protocol, we can optimize the fault tolerance in the presence of randomly distributed Byzantine failures, and outperform previous solutions in the same setting. However, the absence of position knowledge still has a strong impact on Byzantine tolerance.

To go further, it would be interesting to experiment with this approach on less regular networks, such as sensor networks, robot networks or peer-to-peer overlays, where the lack of global positioning is a common assumption. Also, a motivating open challenge is to obtain theoretical probabilistic guarantees with global network parameters (diameter, node degree, connectivity, etc.) in order to automatically compute the optimal parameter settings.

References

[1] H. Attiya, J. Welch, Distributed Computing: Fundamentals, Simulations, and Advanced Topics, McGraw-Hill Publishing Company, New York, 1998, 6.

[2] Vartika Bhandari, Nitin H. Vaidya, On reliable broadcast in a radio network, in: Marcos Kawazoe Aguilera, James Aspnes (Eds.), PODC, ACM, 2005, pp. 138–147.

[3] D. Bienstock, Broadcasting with random faults, Discrete Appl. Math. 20 (1988) 1–7.

[4] Miguel Castro, Barbara Liskov, Practical Byzantine fault tolerance, in: OSDI, 1999, pp. 173–186.

[5] Bogdan S. Chlebus, Krzysztof Diks, Andrzej Pelc, Sparse networks supporting efficient reliable broadcasting, Nordic J. Comput. 1 (1994) 332–345.

[6] Krzysztof Diks, Andrzej Pelc, Reliable gossip schemes with random link failures, in: Proc. 28th Ann. Allerton Conf. on Comm. Control and Comp., 1990, pp. 978–987.

[7] D. Dolev, The Byzantine generals strike again, J. Algorithms 3 (1) (1982) 14–30.

[8] Vadim Drabkin, Roy Friedman, Marc Segal, Efficient Byzantine broadcast in wireless ad-hoc networks, in: DSN, IEEE Computer Society, 2005, pp. 160–169.

[9] Swan Dubois, Toshimitsu Masuzawa, Sébastien Tixeuil, On Byzantine containment properties of the min+1 protocol, in: Proceedings of SSS 2010, in: Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, September 2010, New York, NY, USA.

[10] Swan Dubois, Toshimitsu Masuzawa, Sébastien Tixeuil, The impact of topology on Byzantine containment in stabilization, in: Proceedings of DISC 2010, in: Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, September 2010, Boston, Massachusetts, USA.

[11] Swan Dubois, Toshimitsu Masuzawa, Sébastien Tixeuil, Bounding the impact of unbounded attacks in stabilization, IEEE Trans. Parallel Distrib. Syst. (TPDS) (2011).

[12] Swan Dubois, Toshimitsu Masuzawa, Sébastien Tixeuil, Maximum metric spanning tree made Byzantine tolerant, in: David Peleg (Ed.), Proceedings of DISC 2011, in: Lecture Notes in Computer Science (LNCS), Springer, Berlin, Heidelberg, September 2011, Rome, Italy.

[13] Ian T. Foster, Carl Kesselman, Steven Tuecke, The anatomy of the grid: enabling scalable virtual organizations, Int. J. High Perform. Comput. Appl. 15 (3) (2001) 200–222.

[14] Chiu-Yuen Koo, Broadcast in radio networks tolerating Byzantine adversarial behavior, in: Soma Chaudhuri, Shay Kutten (Eds.), PODC, ACM, 2004, pp. 275–282.

[15] Leslie Lamport, Robert E. Shostak, Marshall C. Pease, The Byzantine generals problem, ACM Trans. Program. Lang. Syst. 4 (3) (1982) 382–401.

[16] R. Lippmann, K. Ingols, C. Scott, K. Piwowarski, Validating and restoring defense in depth using attack graphs, in: IEEE Military Communications Conference, 2006.

[17] Chris Litsas, Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas, On the resilience and uniqueness of CPA for secure broadcast, http://eprint.iacr.org/2013/738.pdf.

[18] D. Malkhi, Y. Mansour, M.K. Reiter, Diffusion without false rumors: on propagating updates in a Byzantine environment, Theoret. Comput. Sci. 299 (1–3) (2003) 289–306.

[19] D. Malkhi, M. Reiter, O. Rodeh, Y. Sella, Efficient update diffusion in Byzantine environments, in: The 20th IEEE Symposium on Reliable Distributed Systems, (SRDS'01), IEEE, Washington, Brussels, Tokyo, 2001, pp. 90–98.

[20] Toshimitsu Masuzawa, Sébastien Tixeuil, Bounding the impact of unbounded attacks in stabilization, in: Ajoy Kumar Datta, Maria Gradinariu (Eds.), SSS, in: Lecture Notes in Computer Science, vol. 4280, Springer, 2006, pp. 440–453.

[21] Toshimitsu Masuzawa, Sébastien Tixeuil, Stabilizing link-coloration of arbitrary networks with unbounded Byzantine faults, Int. J. Princ. Appl. Inf. Sci. Technol. (PAIST) 1 (1) (2007) 1–13.

[22] Alexandre Maurer, Sébastien Tixeuil, Limiting Byzantine influence in multihop asynchronous networks, in: Proceedings of the 32nd IEEE International Conference on Distributed Computing Systems, ICDCS 2012, June 2012, pp. 183–192.

[23] Alexandre Maurer, Sébastien Tixeuil, On Byzantine broadcast in loosely connected networks, in: Proceedings of the 26th International Symposium on Distributed Computing, (DISC 2012), in: Lecture Notes in Computer Science, vol. 7611, Springer, 2012, pp. 183–192.

[24] Alexandre Maurer, Sébastien Tixeuil, A scalable Byzantine grid, in: Proceedings of the 14th International Conference on Distributed Computing and Networking, (ICDCN 2013), in: Lecture Notes in Computer Science, vol. 7730, Springer, 2013, pp. 87–101.

[25] Y. Minsky, F.B. Schneider, Tolerating malicious gossip, Distrib. Comput. 16 (1) (2003) 49–68.

[26] Mikhail Nesterenko, Anish Arora, Tolerance to unbounded Byzantine faults, in: 21st Symposium on Reliable Distributed Systems, (SRDS 2002), IEEE Computer Society, 2002, pp. 22–29.

[27] Mikhail Nesterenko, Sébastien Tixeuil, Discovering network topology in the presence of Byzantine nodes, IEEE Trans. Parallel Distrib. Syst. (TPDS) 20 (12) (2009) 1777–1789.

[28] Andrzej Pelc, David Peleg, Broadcasting with locally bounded Byzantine faults, Inform. Process. Lett. 93 (3) (2005) 109–115.

[29] Yusuke Sakurai, Fukuhito Ooshita, Toshimitsu Masuzawa, A self-stabilizing link-coloring protocol resilient to Byzantine faults in tree networks, in: 8th International Conference Principles of Distributed Systems, OPODIS 2004, in: Lecture Notes in Computer Science, vol. 3544, Springer, 2005, pp. 283–298.

[30] The Heartbleed Bug (http://heartbleed.com).

Alexandre Maurer has been a Ph.D. student of Sébastien Tixeuil since 2011. His work concerns tolerating Byzantine failures in sparsely connected networks.

Sébastien Tixeuil received his Ph.D. in 2000 from Université Paris Sud-XI. He is now a Professor at Université Pierre & Marie Curie - Paris 6. His research interests include fault and attack tolerance in networks and distributed systems.