
Notes on Scalable Blockchain Protocols (version 0.0.2)

Vitalik Buterin (Ethereum)

Reviewers and collaborators: Gavin Wood, Vlad Zamfir (Ethereum), Jeff Coleman (Kryptokit)

March 14-31, 2015

Chapter 1

    Introduction

Currently, all blockchain consensus protocols that are actively in use have a critical limitation: every node in the network must process every transaction. Although this requirement ensures a very strong guarantee of security, making it all but impossible for an attacker even with control over every node participating in the consensus process to convince the network of the validity of an invalid transaction, it comes at a heavy cost: every blockchain protocol that works in this way is forced to choose between limiting itself to a low maximum transaction throughput, with a resulting high per-transaction cost, and allowing a high degree of centralization.

What might perhaps seem like a natural solution, splitting the relevant data over a thousand blockchains, has the two fundamental problems that it (i) decreases security, to the point where the security of each individual blockchain on average does not grow at all as total usage of the ecosystem increases, and (ii) massively reduces the potential for interoperability. Other approaches, like merge-mining, seem to have stronger security, though in practice such approaches are either very weak in more subtle ways [1] - and indeed, merged-mined blockchains have successfully been attacked [2] - or simply fail to solve scalability, requiring every consensus participant to process every blockchain and thus every transaction anyway.

There are several reasons why scalable blockchain protocols have proven to be such a difficult problem. First, because we lose the guarantee that a node can personally validate every transaction in a block, nodes now necessarily need to employ statistical and economic means in order to make sure that other blocks are secure; in essence, every node can only fully keep track of at most a few parts of the entire state, and must therefore be a light client, downloading only a small amount of header data and not performing full verification checks, everywhere else. Second, we need to be concerned not only with validity but also with data availability; it is entirely possible for a block to appear that looks valid, and is valid, but for which the auxiliary data is unavailable, leading to a situation where no other validator can effectively validate transactions or produce new blocks since the necessary data is unavailable to them. Finally, transactions must necessarily be processed by different nodes in parallel, and given that it is not possible to parallelize arbitrary computation [3], we must determine the set of restrictions to a state transition function that will best balance parallelizability and utility.

This paper proposes another set of strategies for achieving scalability, as well as some formalisms that can be used when discussing and comparing scalable blockchain protocols. Generally, the strategies revolve around a sample-and-fallback game: in order for any particular block to be valid, require it to be approved by a randomly selected sample of validators, and in case a bad block does come through employ a mechanism in which a node can challenge invalid blocks, reverting the harmful transactions and transferring the malfeasant validators' security deposits to the challengers as a reward unless the challenge is answered by a confirmation of validity from a much greater number of validators. Theoretically, under a Byzantine-fault-tolerant model of security, fallback is unnecessary; sampling only is sufficient. However, if we are operating in an anonymous-validator environment, then it is also considered desirable to have a hard economic guarantee of the minimum quantity of funds that needs to be destroyed in order to cause a certain level of damage to the network; in that case, in order to make the economic argument work we use fallback as a sort of incentivizer-of-last-resort, making cooperation the only subgame-perfect equilibrium.

The particular advantage that our designs provide is a very high degree of generalization and abstraction; although other schemes using technologies like auditable computation [5] and micropayment channels [6] exist for increasing efficiency in specific use cases like processing complex verifications and implementing currencies, our designs require no specific properties of the underlying state transition function except for the demand that most state changes are in some sense localized.

Our basic designs allow for a network consisting solely of nodes bounded by O(N) computational power to process a transaction load of O(N^(2-ε)) (throughout, we use exponents such as N^(1-ε) and N^(2-ε) so that polylogarithmic terms can be absorbed for simplicity), under both Byzantine-fault-tolerant and economic security models; we then also propose an experimental stacking strategy for achieving arbitrary scalability guarantees up to a maximum of O(exp(N/k)) transactional load, although we recommend that for simplicity implementers attempt to perfect the O(N^(2-ε)) designs first.

Chapter 2

    Definitions

    Definition 2.1 (State). A state is a piece of data.

Definition 2.2 (Transaction). A transaction is a piece of data. A transaction typically includes a cryptographic signature from which one can verify its sender, though in this paper we use transaction in some contexts more generally.

Definition 2.3 (State Transition Function). A state transition function is a function APPLY(σ, τ) → σ′ ∪ {∅}, where σ and σ′ are states and τ is a transaction. If APPLY(σ, τ) = ∅, we consider τ invalid in the context of σ. A safe state transition function SAFE(APPLY) = APPLY′ is defined by APPLY′(σ, τ) = {σ if APPLY(σ, τ) = ∅ else APPLY(σ, τ)}.

For simplicity, we use the following notation (a short code sketch follows the list):

APPLY(σ, τ) = σ′ becomes σ + τ = σ′

SAFE(APPLY)(σ, τ) = σ′ becomes σ ++ τ = σ′

Given an ordered list T = [τ1, τ2, τ3 ...] and a state σ, we define σ + T = σ + τ1 + τ2 + ...

Given an ordered list T = [τ1, τ2, τ3 ...] and a state σ, we define σ0 = σ and σi+1 = σi ++ τi+1. We denote with T \ σ the ordered list of all τi ∈ T such that σi + τi ≠ ∅.

T is valid in the context of σ if T = T \ σ

[n] is the set {1, 2, 3 ... n}
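The following minimal Python sketch illustrates this notation; the dict-based state and the toy transfer transaction are assumptions made purely for the example and are not part of any protocol described here.

def apply_tx(state, tx):
    # APPLY(sigma, tau): returns the new state, or None (standing in for the
    # empty set) if tau is invalid in the context of sigma.
    sender, recipient, value = tx
    if state.get(sender, 0) < value:
        return None
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - value
    new_state[recipient] = new_state.get(recipient, 0) + value
    return new_state

def safe_apply_tx(state, tx):
    # SAFE(APPLY): sigma ++ tau never fails; invalid transactions are no-ops.
    result = apply_tx(state, tx)
    return state if result is None else result

def apply_tx_list(state, txs):
    # Sequential application sigma_{i+1} = sigma_i ++ tau_{i+1}, returning the
    # final state together with T \ sigma (the sublist of valid transactions).
    valid = []
    for tx in txs:
        result = apply_tx(state, tx)
        if result is not None:
            valid.append(tx)
            state = result
    return state, valid

genesis = {"alice": 10, "bob": 0}
T = [("alice", "bob", 4), ("alice", "bob", 20), ("bob", "alice", 1)]
final_state, T_valid = apply_tx_list(genesis, T)
# T is valid in the context of genesis only if T == T_valid; here it is not,
# because the 20-unit transfer fails.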

Definition 2.4 (Replay-immunity). A state transition function is replay-immune if, for any σ, τ and T, σ + τ ++ T + τ = ∅. Generally, we assume that all state transition functions used in consensus protocols are replay-immune; otherwise, the protocol is almost certainly vulnerable to either exploits or denial of service attacks.


Note. Replay-immunity is not a sufficient condition for being double-spend-proof. Descriptions of blockchains as anti-replay oracles [4] emphasize the functionality of preventing the successful application of more than one member of a particular class of transactions, where the class is usually defined as the transactions that spend a particular unspent balance, which is different from the kind of replay-immunity that we describe here.

Definition 2.5 (Commutativity). Two transactions or lists of transactions T1, T2 are commutative in the context of σ if σ + T1 + T2 = σ + T2 + T1.

Definition 2.6 (Partitioning scheme). A partitioning scheme is a bijective function P which maps possible values of σ to a tuple (σ1, σ2 ... σn) for a fixed n. For simplicity, we use σ[i] = P(σ)[i], and if S is a set, σ[S] = {i: P(σ)[i] for i ∈ S}. An individual value σ[i] will be called a substate, and a value i will be called a substate index.

Definition 2.7 (Affected area). The affected area of a transaction (or transaction list) T in the context of σ (denoted AFFECTED(σ, T)) is the set of indices S such that (σ + T)[i] ≠ σ[i] for i ∈ S and (σ + T)[i] = σ[i] for i ∉ S.

Definition 2.8 (Observed area). The observed area of a transaction (or transaction list) T in the context of σ (denoted OBSERVED(σ, T)) is defined as the smallest set of indices S such that for any ψ where ψ[S] = σ[S] and ψ′ = ψ + T, we have ψ′[S] = (σ + T)[S] and ψ′[i] = ψ[i] for i ∉ S.

Note. The observed area of a transaction, as defined above, is a superset of the affected area of the transaction.

Theorem 2.0.1. If, in the context of σ, the observed area of T1 is disjoint from the affected area of T2 and vice versa, then T1 and T2 are commutative.

Proof. Suppose a partition of [n] into the subsets a = AFFECTED(σ, T1), b = AFFECTED(σ, T2) and c = [n] \ a \ b. Let us repartition σ as (α, β, γ) for σ[a], σ[b] and σ[c] respectively. Suppose that σ + T1 = (α′, β, γ), and σ + T2 = (α, β′, γ). Because a is outside the observed area of T2 (as it is the affected area of T1), we can deduce from the previous statement that (x, β, γ) + T2 = (x, β′, γ) for any x. Hence, σ + T1 + T2 = (α′, β, γ) + T2 = (α′, β′, γ). Similarly, because b is outside the observed area of T1, σ + T2 + T1 = (α, β′, γ) + T1 = (α′, β′, γ).

Definition 2.9 (Disjointness). Two transactions are disjoint if there exists a partition such that the observed area of each transaction is disjoint from the affected area of the other. If two transactions are disjoint then, as shown above, they are commutative.

Note. Disjointness does not preclude the observed areas of T1 and T2 from intersecting. However, in the rest of this paper, we will generally follow a stricter criterion where observed areas are not allowed to intersect; this does somewhat reduce the effectiveness of our designs, but it carries the benefit of allowing a substantially simplified analysis.
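To make the area definitions and Theorem 2.0.1 concrete, here is a small Python sketch; the account-per-index partition and the transfer transactions are illustrative assumptions, and the observed area of a transfer is taken to be its sender and recipient.

def apply_tx(state, tx):
    sender, recipient, value = tx
    if state.get(sender, 0) < value:
        return None
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - value
    new_state[recipient] = new_state.get(recipient, 0) + value
    return new_state

def affected_area(state, tx):
    # Indices i with (sigma + tau)[i] != sigma[i], under the trivial partition
    # that gives every account its own substate index.
    post = apply_tx(state, tx)
    return {i for i in set(state) | set(post) if state.get(i) != post.get(i)}

state = {"a1": 10, "a2": 10, "b1": 10, "b2": 10}
T1 = ("a1", "a2", 5)   # observes and affects only substates a1, a2
T2 = ("b1", "b2", 3)   # observes and affects only substates b1, b2

# The observed area of each transfer ({sender, recipient}) is disjoint from the
# affected area of the other, so by Theorem 2.0.1 the two must commute.
assert affected_area(state, T1).isdisjoint(affected_area(state, T2))
assert apply_tx(apply_tx(state, T1), T2) == apply_tx(apply_tx(state, T2), T1)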

Definition 2.10 (Block). A block is a piece of data, encoding a collection of transactions typically alongside verification data, a pointer to the previous block, and other miscellaneous metadata. There typically exists a block-level state transition function APPLY′, where APPLY′(σ, β) = σ + T, where T is a list of transactions. We of course denote APPLY′(σ, β) with σ + β. A block is valid if T is valid in the context of σ, and if it satisfies other validity criteria that may be specified by the protocol. In many protocols, there exists a special block finalization function; we model this by stipulating that every block may include a virtual transaction that effects the desired state transition.

Definition 2.11 (Proposer). The proposer of a block is the user that produced the block.

Definition 2.12 (Blockchain). A blockchain is a tuple (G, B) where G is a genesis state and B is an ordered list of blocks. Cryptographic references (ie. hashes) are typically used to maintain the integrity of a blockchain and allow it to be securely represented only by its final block. A blockchain is valid if B \ G = B (ie. all blocks are valid in their respective contexts if applied to G sequentially).

Definition 2.13 (Parent). The parent of a state σ is defined as the state σ1 such that σ1 + β = σ for some β that has actually been produced by some user in the network. For genesis states, we define PARENT(σ) = ∅.

Definition 2.14 (Ancestor). The set of ancestors of a state σ is recursively defined via ANC(σ) = ∅ if PARENT(σ) = ∅ else {PARENT(σ)} ∪ ANC(PARENT(σ)).

Definition 2.15 (Weight function). A weight function is a function W(u, σ) on the set of users u1 ... un (each of which is assumed to have a private and public key) in the context of a state σ, such that for all users in the network, Σ_{i=1}^{n} W(ui, σ) = 1. "k of all weight" will be used to denote the idea of a subset of users such that the sum of their weights is k.

Note. A combination of a weighting function and a threshold 0.5 < t < 1 produces a quorum system.

Definition 2.16 (State root function). A state root function is a collision-proof preimage-proof function R(σ) → x ∈ D for some compactly representable domain D (usually D = {0, 1}^256). R can also be applied to substates.


Definition 2.17 (Partition-friendliness). A state root function is partition-friendly with respect to a particular partitioning scheme if one can compute R(σ) by knowing only R(σ[1]), R(σ[2]) ... R(σ[n]).

Assumption 2.1. Any state root function that will be used by any of the protocols described below will be partition-friendly with respect to any partitioning scheme that will be used by the protocols.
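As an illustration of Definition 2.17 and Assumption 2.1, the sketch below builds a state root purely out of substate roots; the use of SHA-256 over a concatenation of the substate roots (rather than a proper Merkle tree) and the JSON encoding of substates are simplifying assumptions for the example.

import hashlib
import json

def substate_root(substate) -> bytes:
    # R(sigma[i]): hash of a canonical encoding of a single substate.
    return hashlib.sha256(json.dumps(substate, sort_keys=True).encode()).digest()

def root_from_substate_roots(roots) -> bytes:
    # R(sigma) computed only from R(sigma[1]) ... R(sigma[n]); this is exactly
    # what makes the root function partition-friendly for this scheme.
    return hashlib.sha256(b"".join(roots)).digest()

def state_root(substates) -> bytes:
    return root_from_substate_roots([substate_root(s) for s in substates])

substates = [{"alice": 10}, {"bob": 5}, {}]
assert state_root(substates) == root_from_substate_roots(
    [substate_root(s) for s in substates])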

Definition 2.18 (Availability). A state σ is available if, for every i ∈ [n], σ[i] can be readily accessed from the network if needed.

Definition 2.19 (Altruist). An altruist is a user that is willing to faithfully follow the protocol even if they have an opportunity to earn a profit or save on computational expenses by violating it.

Assumption 2.2. There exists a constant 0 < wa < 1 such that at least wa of all weight is controlled by altruists.

Uninfluenceability (I): There exists a constant b such that, for any predicate P which the value v would have at some given future time with probability p assuming everyone honestly follows the protocol, the cost of increasing this probability to p′ > p is bounded below by b·L·(p′ - p), where b is independent of P and L is some economic metric of activity in the state machine.

Uninfluenceability (II): There exist constants k and b such that for any predicate P which the value v would have at some given future time with probability p assuming everyone honestly follows the protocol, an actor controlling less than k of the weight can increase the probability to at most p′ = p · (1 + b).

Example 3.0.2. The cryptoeconomically secure entropy source used in NXT [16] is defined recursively as follows (a code sketch is given after the equations):

E(G) = 0

E(σ + β) = sha256(E(σ) + V(β)) where V(β) is the block proposer of β.
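A minimal Python sketch of the recursion above; the byte encodings and the way a proposer is identified by its public key are illustrative assumptions.

import hashlib

GENESIS_ENTROPY = bytes(32)   # E(G) = 0

def next_entropy(prev_entropy: bytes, proposer_pubkey: bytes) -> bytes:
    # E(sigma + beta) = sha256(E(sigma) + V(beta)), with '+' read as byte
    # concatenation and V(beta) identifying the proposer of beta.
    return hashlib.sha256(prev_entropy + proposer_pubkey).digest()

e = GENESIS_ENTROPY
for proposer in [b"validator-A", b"validator-B", b"validator-A"]:
    e = next_entropy(e, proposer)
print(e.hex())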

Assumption 3.1. For any time interval I, there exists some fixed probability po(I) such that a node randomly selected according to the weight function used to measure a cryptoeconomic state machine's Byzantine fault tolerance can be expected to be offline for at least the next I seconds starting from any particular point in time with at least probability po.

Note. We can derive the above assumption from an altruism assumption by simply stating in the protocol that nodes should randomly drop offline with low probability; however, in practice it is simpler and cleaner to rely only on natural faults.

Note. Combining the two uninfluenceability criteria into one (it is impossible to increase the probability of P from p to p·(1+k) without expending at least b·L·k resources) is likely very difficult; it is hard to avoid having ways to cheaply multiply the probability of low-probability predicates by only acting when you are sure that your action will have an influence on the result.

Note. In practice, the second criterion is important for security, whereas the first criterion is important in order to avoid triggering superlinear returns on capital in systems where randomly selected stakeholders are rewarded.

Lemma 3.0.3. The NXT algorithm described above satisfies the conditions for being a cryptoeconomically secure entropy source.

Proof. To prove unpredictability, we note that the NXT blockchain produces a block every minute, and so the update v ← sha256(v + V(β)) takes place once a minute. During each round of updating, there is a probability 1 - po(60) that the primary signer will be online, and po(60) that the signer will be offline and thus a secondary signer will need to produce the block. Hence, after -1/log2(1 - po(60)) blocks, there is a probability p ≤ 1/2 that the resulting value will be the "default value" obtained from updating v with the primary signers' public keys at each block, and a probability ≥ 1/2 that the resulting value will be different. We model 512 iterations of this process as a tree, with all leaves being probability distributions over sequences of 512 public keys of signers, where all probability distributions are disjoint (ie. no sequence appears with probability greater than zero in multiple leaves). By the random-oracle assumption of sha256, we thus know that we have a set of 2^512 independently randomly sampled probability distributions over {0, 1}^256, and so each value will be selected an expected 2^256 times, with standard deviation 2^128. Hence, the probability distribution is statistically indistinguishable from a random distribution.

To show that the first uninfluenceability criterion holds true, note that the only way to manipulate the result is for the block proposer to disappear, leading to another proposer taking over. However, this action is costly for the proposer, as the proposer loses a block reward. The optimal strategy is to disappear with probability 0 < q

Note. The above algorithm is only one of a class of algorithms that try to derive entropy from individual faults. There are also other approaches for deriving entropy, including N-of-N commit-reveal protocols, relying on at least one of the participants to be altruistic and refuse to share their entropy with the other participants. It is an open problem to determine if one can come up with cryptoeconomically secure entropy sources cheaper than proof-of-work that rely purely on a rationality assumption.

Note. The above algorithm is NOT downward-uninfluenceable. If there is a predicate P with low probability, then one can influence its probability down to p·wa by simply bribing all validators who would have created a block that changed the entropy value to something satisfying P to instead sit their turn out and let the next validator make a block, almost certainly not triggering P. The wa multiplier comes from the fact that we are assuming a subset of validators that cannot be bribed. However, for predicates with medium probability, one can derive a certain degree of resistance to downward influence simply by applying the upward-uninfluenceability guarantees to P.

Definition 3.5 (zk-SNARK scheme). A zk-SNARK scheme is a tuple of three functions G, P, V, where:

G(prog) → k generates a verification key from a program.

P(prog, Is, Ip) → π generates a proof π that prog(Is, Ip), for some secret inputs Is and public inputs Ip, is equal to its actual output, o.

V(k, Ip, o, π) → {0, 1} verifies a proof, accepting it only if π is actually a proof that prog(Is, Ip) = o where G(prog) = k.

Additionally, π should reveal no information about the value of Is. Schemes exist [17] to perform G and P in a time equal to O(N·log^k(N)) for a small k where N is the number of execution steps, and V can be computed in some cases in polylogarithmic time and in some cases in constant time.

Note. We provide algorithms which achieve scalability without zk-SNARKs; however, we will show how zk-SNARKs can also be used in scalable blockchain protocols.

Definition 3.6 (Decentralized oracle scheme). A decentralized oracle scheme is a mechanism which asks a set of participants to provide an answer to a subjective question (eg. "what was the temperature in San Francisco on 2015 Jan 9?"), and attempts to incentivize the participants to answer correctly. The output of the mechanism is generally considered to be the majority answer, and as input the scheme usually requires an economic subsidy.

Example 3.0.3. A simple decentralized oracle scheme, called SchellingCoin, has the following rules:

Each of N participants must vote either 1 or 0 on a question (a scalar-valued or multi-valued question is modeled as a series of binary questions).

A participant whose vote aligns with the majority vote receives a reward of P.

A participant whose vote does not align with the majority vote receives a reward of 0.

SchellingCoin is vulnerable to zero-cost equilibrium flips, and so under a formal definition of economic security it cannot be viewed as having any security margin at all, though the difficulty of the inherent coordination problem may still allow the scheme to function in practice. An attacker with a budget greater than P and the ability to credibly commit to providing a bribe under certain conditions in the future can, assuming rationality of the participants, effect an equilibrium flip toward a wrong answer at no cost [19].
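The sketch below tabulates a single voter's payoff in the basic SchellingCoin game with and without a conditional bribe of the kind referenced above; the payoff numbers and the bribe condition are illustrative assumptions, intended only to show why voting with the briber becomes dominant even though the bribe is never paid in equilibrium.

P = 10          # reward for voting with the majority
EPSILON = 1     # extra amount offered by the briber

def payoff(my_vote, majority_vote, bribe_active):
    # The bribe pays P + EPSILON for voting "wrong", but only in the case
    # where the majority nevertheless votes "right".
    base = P if my_vote == majority_vote else 0
    bribe = P + EPSILON if (bribe_active and my_vote == "wrong"
                            and majority_vote == "right") else 0
    return base + bribe

for majority in ("right", "wrong"):
    for mine in ("right", "wrong"):
        print(f"majority={majority:5s} my_vote={mine:5s} "
              f"no_bribe={payoff(mine, majority, False):3d} "
              f"with_bribe={payoff(mine, majority, True):3d}")
# With the bribe active, voting "wrong" dominates: it pays P + EPSILON if the
# majority votes "right" and P if the majority votes "wrong", so in the new
# equilibrium everyone votes "wrong" and the briber pays nothing.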

Note. There are two ways to make SchellingCoin-like protocols more powerful by increasing their security margin. The first involves a strategy of Sztorcian counter-coordination [20], which attempts to naturally set the quantity of funds at stake in proportion to the level of controversy in the question; in the limit, people who do not vote alongside the majority answer lose their entire security deposit. The game-theoretic argument is that if a medium-sized bribe to vote incorrectly is offered, then voters will vote for the correct answer with some of their funds and for the incorrect answer with some of their funds, leading to an equilibrium where the majority of the voting power is still in favor of the correct answer but the users also manage to steal some of the bribe.

This approach accepts that attackers with a budget even larger than the size of everyone's security deposits will be able to cause everyone to flip to voting incorrectly as an equilibrium, but advocates note that attackers that large will also be able to arbitrarily disrupt the underlying blockchain consensus anyway.

The second is to fall back to subjective resolution: if the majority of voters vote incorrectly, then it is up to the users to simply reject that block as invalid and go along with the block that they see as correct. This reliance on human judgement at the last level makes the budget required for any kind of zero-cost equilibrium flip attack essentially infinite, instead requiring the attacker to bribe everyone outright and even then resulting in only a moderately large inconvenience for users, not a total failure. The process does impose inconveniences on users, but properly built designs that use decentralized oracle schemes will use the mechanism only as the last step of a fallback game in a similar function as nuclear deterrence; in practice it will almost never be used. The security claim essentially becomes something like "by paying $100 million, an attacker has the ability to force users to visit their favorite internet forum and locate a 32-byte hash representing the new fork of the chain to switch to".

Definition 3.7 (Distributed hash table). A distributed hash table is a mechanism which gives network-connected nodes access to a function GET(ρ, i) → (V, σ[i]), where (V, σ[i]) is the Merkle branch of σ[i] against the root ρ = R(σ). GET should work with overwhelmingly high probability provided that (i) (V, σ[i]) has at least once been in the hands of at least one altruist, and (ii) the total quantity of data stored in the DHT is at most N·n·k where N is the computational and data storage capacity of a node, n is the number of nodes and k is a constant.

Note. The Merkle-branch-based definition of DHT given above can be deduced easily from the more conventional definition of providing a function GET(h) → x where h = H(x) for some hash function, given that Merkle tree protocols consist simply of repeated reverse-hash-lookup queries. The definition used above is more relevant to our scalable blockchain use cases.

Example 3.0.4. Kademlia [12] is an example of a distributed hash table. A more direct implementation of our desired formalism with Merkle branches will be available from IPFS [13].
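A Python sketch of the Merkle-branch-based GET interface from Definition 3.7; the binary tree layout, the duplicate-last-node padding rule and the JSON leaf encoding are assumptions made for the example, and a real DHT would of course spread storage over many nodes rather than keep the whole tree in one process.

import hashlib, json

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(substate) -> bytes:
    return H(json.dumps(substate, sort_keys=True).encode())

def build_tree(substates):
    # Returns a list of levels, level 0 being the (padded) leaf hashes and the
    # last level containing only the root R(sigma).
    level = [leaf_hash(s) for s in substates]
    if len(level) % 2:
        level.append(level[-1])
    levels = [level]
    while len(level) > 1:
        level = [H(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        if len(level) % 2 and len(level) > 1:
            level.append(level[-1])
        levels.append(level)
    return levels

def get_branch(levels, i):
    # GET(R(sigma), i): the sibling hashes needed to verify sigma[i], bottom-up.
    branch = []
    for level in levels[:-1]:
        branch.append(level[i ^ 1])
        i //= 2
    return branch

def verify_branch(root, i, value_hash, branch):
    h = value_hash
    for sibling in branch:
        h = H(h + sibling) if i % 2 == 0 else H(sibling + h)
        i //= 2
    return h == root

substates = [{"acct": k, "bal": 7 * k} for k in range(6)]
levels = build_tree(substates)
root = levels[-1][0]
branch = get_branch(levels, 3)
assert verify_branch(root, 3, leaf_hash(substates[3]), branch)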

Chapter 4

    Global Variables

    In the rest of this document we will use the following variables:

    N - the maximum computational power of a node

    L - the level of economic activity in the network

m - the size of the randomly selected pool of validators that need to validate each block (usually 50 or 135)

    wa - the portion of weight controlled by altruists

τ will represent transactions, β will represent blocks and block headers depending on context, and B will represent super-blocks.

Chapter 5

A Byzantine-fault-tolerant Scalable Consensus Algorithm

Suppose a construction where the state is partitioned into n ∈ O(N^(1-ε)) substates, and each substate is itself partitioned into very small parts (eg. one per account). We define three levels of objects (a structural code sketch follows the list):

A super-block is a block in the top-level consensus process. The header chain is the blockchain of super-blocks.

    A block is a package containing:

T, a list of transactions

D, the set of Merkle branches for the observed area of T on the fine-grained sub-partition level, ie. where the value proven by each Merkle branch is only constant-size

A header, consisting of:

AP, a map {i: R(σ[i]) for i ∈ OBSERVED(σ, T)} (the prior substate roots)

AN, a map {i: R((σ + T)[i]) for i ∈ OBSERVED(σ, T)} (the new substate roots)

S = [s1 ... sm], an array of signatures

H(T)

H(D)

    A transaction is a transaction as before.
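A structural sketch of the three object levels in Python; the field names mirror the description above, while the concrete types (bytes for hashes and signatures, opaque transaction blobs) are assumptions made for illustration.

from dataclasses import dataclass
from typing import Dict, List

Hash = bytes
Signature = bytes

@dataclass
class BlockHeader:
    ap: Dict[int, Hash]          # {i: R(sigma[i]) for i in OBSERVED(sigma, T)}
    an: Dict[int, Hash]          # {i: R((sigma + T)[i]) for i in OBSERVED(sigma, T)}
    signatures: List[Signature]  # S = [s1 ... sm]
    tx_root: Hash                # H(T)
    branch_root: Hash            # H(D)

@dataclass
class Block:
    transactions: List[bytes]     # T
    merkle_branches: List[bytes]  # D, constant-size proofs for the observed area
    header: BlockHeader

@dataclass
class SuperBlock:
    headers: List[BlockHeader]    # only block headers enter top-level consensus
    entropy: Hash                 # contribution to E(sigma)
    metadata: bytes               # validator set changes, timestamp, prev hash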

We use any non-scalable consensus algorithm (eg. Tendermint [21]) for processing super-blocks, except that we define a custom top-level state transition function APPLY*. For the state ψ processed by APPLY*, we use ψ = {i: R(σ[i]) for i ∈ [n]} + E(σ) + V(σ), where E(σ) is a cryptoeconomically secure entropy source in σ and V(σ) is a set of validators registered in σ. From the point of view of APPLY*, we define a block header β as being valid in the context of ψ if the following are all true:

For all i ∈ AP, ψ[i] = AP[i].

For at least 2m/3 values of i ∈ [m], s[i] is a valid signature when checked against SAMPLE(W, E(σ) + A, m)[i], where W(σ, ui) = 1/|V(σ)| if ui ∈ V(σ) else 0, and A is the address or public key of the block proposer (ie. the block has at least 2m/3 valid signatures out of a randomly selected pool of m validators).

If β is valid, we set ψ′ by:

For all i ∈ OBSERVED(σ, β), ψ′[i] = AN[i]

For all i ∉ OBSERVED(σ, β), ψ′[i] = ψ[i]

The top-level state transition function has an additional validity constraint: for the super-block to be valid, the observed areas of all blocks βi must be disjoint from each other.

We do not formally put it into the top-level validation protocol, but we separately state that a validator is only supposed to sign a block if the block meets what we will call second-level validity criteria for β (a code sketch of both levels of checking follows the list):

Every transaction in T is valid in its context when applied to σ.

D, AP, and AN are produced correctly.
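The sketch below shows the shape of the two layers of checking; SAMPLE is modeled as repeated hashing over the entropy, the signature check is passed in as a callable, and all data layouts are simplifying assumptions rather than a concrete wire format.

import hashlib

M = 135   # size of the sampled validator pool

def sample_validators(validators, entropy: bytes, proposer: bytes, m: int = M):
    # SAMPLE(W, E(sigma) + A, m): picks m validators uniformly from the
    # registered set, matching W(sigma, u) = 1/|V(sigma)| for u in V(sigma).
    picks = []
    for i in range(m):
        h = hashlib.sha256(entropy + proposer + i.to_bytes(4, "big")).digest()
        picks.append(validators[int.from_bytes(h, "big") % len(validators)])
    return picks

def header_valid(header, top_state, validators, entropy, check_sig):
    # Top-level criterion 1: prior roots must match the current top-level state.
    if any(top_state.get(i) != root for i, root in header["ap"].items()):
        return False
    # Top-level criterion 2: at least 2m/3 valid signatures from the sample.
    pool = sample_validators(validators, entropy, header["proposer"])
    good = sum(1 for v, sig in zip(pool, header["signatures"])
               if sig is not None and check_sig(v, sig, header))
    return 3 * good >= 2 * M

def superblock_valid(headers, top_state, validators, entropy, check_sig):
    # Additional top-level constraint: observed areas must be pairwise disjoint.
    seen = set()
    for h in headers:
        area = set(h["an"])
        if area & seen:
            return False
        seen |= area
        if not header_valid(h, top_state, validators, entropy, check_sig):
            return False
    return True

# The second-level criteria (every transaction in T valid against sigma, and
# D, AP, AN produced correctly) are checked by the sampled validators before
# they sign, not by the top-level state transition function.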

Lemma 5.0.4. Assuming less than 1/3 of validators are Byzantine, a block that is valid under the above top-level state transition function will produce a top-level state ψ′ such that, if we define σ′ by σ′[i] = R^(-1)(ψ′[i]), then σ′ = σ + T1 + T2 + ... where Ti is the set of transactions in βi.

Proof. Suppose that all βi satisfy the second-level validity criteria. Then, note that each substate in σ is only modified by at most one Ti, and so if the super-block includes k blocks we can model the state as a tuple (O, A1, A2, A3 ... Ak), where Ai is the affected area of Ti and O is the remaining part of σ. The state after T1 will be (O, A′1, A2, A3 ... Ak), by disjointness the state after T2 will be (O, A′1, A′2, A3 ... Ak), and so forth until the final state is (O, A′1, A′2, A′3 ... A′k). If we use the top-level state transition function on ψ, we can partition ψ similarly into (O, B1, B2, B3 ... Bk), and a similar progression will take place, with state roots matching up at every step. Hence the final state at this top-level state transition will be made up of the roots of the substates in σ′.

Now, suppose that one βi does not satisfy the second-level criteria. Then, validators following the protocol will consider it invalid, and so with very high probability 2m/3 validators will not be willing to sign off on it, and so the block will not satisfy the top-level criteria.

Hence, we know that the resulting σ′ will represent a state which is valid, in the sense that it can be described as a series of transactions applied to the genesis. This is the validity criterion that we will repeatedly use in order to prove that validity is preserved by other mechanisms that we introduce such as revert mechanics.

Because a sampling scheme is inherently probabilistic, it is theoretically prone to failure, and so we will precisely calculate the probability that the protocol will fail. Assuming a 96-bit security margin, we can consider a probability negligible if the average time between failures is longer than the time it takes for an attacker to make 2^96 computations. If we assume that the cryptoeconomically secure entropy source updates every super-block, and a super-block comes every 20 seconds, then one can reasonably expect a hostile computing cluster to be able to perform 2^48 work within that time, so we will target a failure probability of 2^(-48) (this implies roughly one failure per 445979 years of being under attack). We can determine the failure probability with the binomial distribution formula:

PE(n, k, p) = p^k · (1 - p)^(n-k) · n! / (k! · (n-k)!)

Where PE(n, k, p) is the probability that an event that occurs with probability p in an attempt will achieve exactly k occurrences out of n attempts. For convenience, we define PGE(n, k, p) = Σ_{i=k}^{n} PE(n, i, p), ie. the probability of at least k occurrences.

We note that PGE(135, 90, 1/3) = 2^(-48.37), so 135 nodes will suffice for a k = 1/3 fault tolerance. If we are willing to lower to k = 1/5, then PGE(39, 31, 1/5) = 2^(-48.58), so 39 nodes will be sufficient. Additionally, note that if we want greater efficiency under normal circumstances, then a bimodal scheme where either 44 out of a particular 50 nodes (PGE(50, 44, 1/3) = 2^(-49.22)) or 90 out of a particular 135 nodes is a sufficient validity condition will allow the cheaper but equally safe condition to work in the (usual) case where there are almost no malicious nodes, nearly everyone is online and the network is not under attack, and fall back to the more robust but bulkier validation rule under tougher conditions.
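The failure probabilities quoted above can be reproduced directly; the short script below evaluates the binomial tail exactly with rational arithmetic and reports the base-2 exponent.

from fractions import Fraction
from math import comb, log2

def pge(n, k, p):
    # Probability of at least k occurrences out of n independent attempts.
    p = Fraction(p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for n, k, p in [(135, 90, Fraction(1, 3)),
                (39, 31, Fraction(1, 5)),
                (50, 44, Fraction(1, 3))]:
    print(f"PGE({n}, {k}, {p}) = 2^{log2(pge(n, k, p)):.2f}")
# The exponents should come out near -48.37, -48.58 and -49.22, matching the
# figures quoted above.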

Additionally, note that the combinatoric formula is only valid if the entropy used to determine validator sets is actually random, or at least close enough for our purposes. For this, our second uninfluenceability criterion, stipulating that an attacker with less than 1/3 of all validators will be able to influence it by at most a reasonably small linear amount (in the NXT case, by a factor of 3/2), is crucial, showing that with such a powerful attacker the probability of a successful attack remains below 2^(-47.4). The first uninfluenceability criterion is important in order to avoid excessive superlinear returns resulting from super-block proposers manipulating the entropy in order to increase the chance that their validators will be involved in verifying blocks, but does not affect security.

To see the role of the unpredictability criterion, note that the protocol as described here provides ex ante Byzantine fault-tolerance, but not ex post Byzantine fault tolerance; after the validators are selected, only that small fixed number of validators acting maliciously will compromise the system. If entropy becomes unpredictable very slowly (or never at all), then the algorithm is only Byzantine fault-tolerant against faults very far in advance, leading to potential practical vulnerabilities like an attacker locating the 135 validators that will be validating their block one year from now and taking the time to locate and hack into 90 of them.

    Lemma 5.0.5. The above protocol is scalable.

    Proof. The load of a validator consists of two parts:

    validating the header chain

    validating the blocks

For the header chain, note that the size of each super-block is n + k1·b + k2, where n is the size of the state partition (as each index can be included as an observed area at most once), b is the number of blocks, k1 is a constant representing the size of H(T) + H(D) + [s1 ... sm], and k2 is a constant for miscellaneous super-block data (eg. validators entering and leaving, the entropy source, timestamp, previous block hash). We can require n ∈ O(N^(1-ε)) and b ≤ n, so b ∈ O(N^(1-ε)). One can come up with an algorithm to evaluate the top-level validity of the block in less than O(N) time; showing that observed areas do not intersect as a set non-intersection algorithm can take up to O(k·log(k)) time for k items, and O(N^(1-ε)·log(N^(1-ε))) < O(N), and the individual block validations should simply take O(N^(1-ε)) time.

For block validation, suppose that there are v ∈ O(N^(1-ε)) validators. Each super-block may contain a maximum of n blocks, and n ∈ O(N^(1-ε)). Thus, in total, validators must process at most m·n blocks (as each block must be processed by m validators), and so each validator will on average end up processing m·n/v ∈ m·O(1) blocks. If each block has O(N^(1-ε)) transactions, then the computational load of this is m·O(N^(1-ε)) < O(N). Hence, the total load of a validator is less than O(N) while the network processes a total of O(N^(2-ε)) transactions.

From a bandwidth perspective, the header chain is sent to everyone, but for blocks the block proposer need only send the block to the validators directly, so once again the load per validator is O(N^(1-ε)). The only data that a validator needs to verify the block is the observed areas on the sub-partition level, which are provided with proofs inside of D.

    The incentives inside such a scheme would be provided as follows:

Transactions can pay transaction fees to the producer of the block that includes them.

Blocks can pay transaction fees to the proposer of the super-block that includes them, as well as to the validators that will sign off on them.

The proposer of the super-block can pay transaction fees to the validator pool of the super-block; this fee can be mandatory (ie. protocol-enforced) or enforced by validators refusing to sign a block with a fee that is too small for them.

The protocol may optionally add a mandatory fee to transactions and blocks, and this fee can get destroyed. The protocol may also add an ex nihilo reward to the validators of the super-block in place of requiring it to be paid from transaction fees.

We can also require validators to have security deposits of size O(N^(1-ε)) which are destroyed if the validators are malfeasant, as is standard in modern proof-of-stake consensus schemes; this provides the basis for an economic security margin. Note that the fees paid to validators, as well as the deposit size of a validator, are stored in the header chain alongside the identity of the validator; only when a validator leaves the pool does the fee balance get added to the main state.

Chapter 6

    Fallback Schemes

The previous algorithm provided a statistical guarantee of data availability and validity by simply assuming that at least two thirds of validators are honest; however, it carries an inherent fragility: if, for any reason, a small sample of nodes actually does end up signing off on an invalid block, then the system will need to hard-fork once the fault is discovered. This results in two practical weaknesses. First, larger safety factors are required on the sample sizes in order to ensure that a sample is absolutely never malicious. Second, the economic security of the algorithm, under our formal definition, approaches zero as L increases if defined as a fraction of L. An attacker will only need to bribe m out of a total O(N^(1-ε)) validators in order to produce an invalid block that enters the system. Hence, as N increases, there is no fixed bound b of proportionality between the weight and the security margin; the ratio goes lower the larger the system gets. Sampling provides security against random faults, but not against targeted faults, which bribery can theoretically achieve.

One approach to dealing with the latter issue is to not bother with economic security, and simply rely on Byzantine-fault-tolerance guarantees to determine security. In order to ensure that the beneficence of the network is maintained over time, we ask validators to censor attempts to join the validator pool by parties that appear to be unreliable or likely to be malfeasant. In some cases, particularly regulated financial systems, such restrictions are necessary in any case and so we need to go no further [22]. If we are trying to create a more open network, however, then such a solution is unsatisfactory, and so we arguably need not just Byzantine fault tolerance but also regulation via economic incentives.

If economic security is indeed desired, one natural solution that may be proposed is a fallback game: allow any validator to challenge a particular block's validity by putting some quantity of funds at stake (if no funds are at stake, then an attacker can increase load on the network arbitrarily at no cost), and ask the header chain to be the validator-of-last-resort. If the data for the block is available, then this is easy: the validator provides the full block to the header chain, and the header chain validates the block; if the block is valid, then the challenger loses some portion of their security deposit, and if the block is invalid then the block is reverted and the validators that signed the block lose their entire security deposit (and a portion of that deposit goes to the challenger as a reward). The game that arises from this scenario can be seen as follows:

Above we assume 1 is the block reward, x is an expected private payoff to the validator of an invalid block getting into the blockchain (all we know about this payoff is that it is less than zero; it could take any magnitude depending on the situation of the particular validator and ultimately does not matter for the purposes of the game-theoretic argument), 100 is the security deposit, 10 is the whistleblower's reward and 10 is the penalty for whistleblowing falsely (ie. on a valid block). Note that the block producer plays first. If the block producer makes an invalid block, then the challenger will challenge, and so the block producer will earn -100; but by making a valid block the block producer will earn 1. Hence, the block producer will make a valid block, and in that case the challenger will earn a higher profit of 0 rather than -10 by not challenging, and so will not challenge. Thus providing a valid block is a subgame-perfect equilibrium.

There are two problems with this scheme as described above. The first problem is a technical one: although the scheme remains scalable, Byzantine fault tolerant and economically secure, and even economically secure under a Byzantine failure or Byzantine-fault-tolerant under an economic attack, it is not scalable under an economic attack. The reason is that an attacker can spend L/O(N^(1-ε)) resources by bribing m of the O(N^(1-ε)) validators in order to elevate any block to the header level, and so by spending b·L effort the attacker can elevate a total of O(N^(1-ε)) blocks. As soon as the size of a block is at least O(N^(2ε)) (which happens at L ∈ O(N^(1+ε))), this means that by spending b·L effort the attacker can make the protocol non-scalable.

The solution that we propose is a multi-round escalation game: a challenger proposes a transaction to point out an invalid previous block, and that transaction will be validated by 2m validators. If either the challenger or a validator is unhappy with the result, they can then make another challenge, providing twice as high a deposit but resulting in the block being reviewed by 4m validators. Another appeal is possible with a 4x deposit, which will be reviewed by 8m validators, and so on until all validators participate. Game-theoretically analyzing this, one can find that an attacker that spends b·L resources can at most increase a node's computational load by a constant factor, and recursive elimination of dominated strategies leads to honesty being a subgame-perfect equilibrium in the normal case.
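A small sketch of the escalation schedule; the constants are illustrative. The point is that deposit and reviewer count double together, so the amount of extra validator attention an attacker can buy per unit of spend stays bounded.

M = 135                  # base sample size
BASE_DEPOSIT = 1.0       # deposit for the first challenge, in arbitrary units
TOTAL_VALIDATORS = 10_000

def escalation_schedule():
    # Yields (round, deposit at stake, validators reviewing) until the whole
    # validator set participates.
    rnd, deposit, reviewers = 0, BASE_DEPOSIT, 2 * M
    while reviewers < TOTAL_VALIDATORS:
        yield rnd, deposit, reviewers
        rnd, deposit, reviewers = rnd + 1, 2 * deposit, 2 * reviewers
    yield rnd, deposit, TOTAL_VALIDATORS

for rnd, deposit, reviewers in escalation_schedule():
    print(f"round {rnd}: deposit {deposit:g}, reviewed by {reviewers} validators")
# The reviewers-per-deposit ratio is constant across rounds, so each unit spent
# on keeping the dispute alive buys only a bounded amount of extra review work.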

An alternative approach for ensuring validity is to simply use a succinct cryptographic proof in place of a cryptoeconomic one, seemingly removing the need for fallback schemes entirely. A zk-SNARK can theoretically be attached to the header of a block, proving that all of the transactions in the block are valid and that the final state roots are correct. However, this has a few problems. First, zk-SNARKs rely on much less robust cryptographic assumptions than hashes and signatures; particularly, it is not known if zk-SNARKs are even possible to secure against quantum computers, whereas strong candidates for quantum-proof digital signatures are well-known [23] [24] [25].

Chapter 7

    Data availability

The more difficult challenge in implementing a fallback scheme is that there are in fact two ways in which a block can be invalid. First, the block may incorrectly process transactions and have an invalid final substate root. Second, however, an attacker may simply create a (valid or invalid) block and refuse to publish the contents, making it impossible for the network to learn the complete final state. Validating data availability is, unfortunately, a fundamentally harder problem than validating correctness for a simple reason: data availability is quasi-subjective at scale. Although one can determine that a particular piece of data is available at a particular time by trying to download it, one cannot with perfect precision verify whether data is available if one does not have enough bandwidth to download all of it, and one cannot determine at all whether or not data was available at a previous time if one was not paying attention during that time.

A particular consequence of this is that validating data availability through cascading fallback, as described in the previous section above, is not so simple: the attacker can always cheat the system and destroy others' deposits by providing a block with unavailable data at first, waiting for a fallback game to start, and then destroying the security deposits of challengers by suddenly providing the data mid-game. Although such an attack does not compromise security, it does allow the attacker to siphon resources away from challengers, leading to an equilibrium in which it is not rational to challenge, at which point the attacker will be able to create and propagate bad blocks unimpeded.

In order to make clear the gravity of the problem, we will start off by showing several categories of attacks that are possible if data availability cannot be effectively ensured.

Example 7.0.5 (Unprovable theft attack). An attacker creates a block with transactions T1 ... Tn, where T1 ... Tn-1 are legitimate and applied properly but Tn is an invalid transaction which seems to transfer $1 million from an arbitrary wealthy user to the attacker. The attacker makes T1 ... Tn-1 available, and makes the final state available, but does not make Tn available. Once the state is confirmed by a sufficient number of future blocks, the attacker then spends the $1 million to purchase other untraceable digital assets.

Here, note that there is no way to prove invalidity, because theoretically Tn very easily could be a transaction legitimately transferring $1 million from the millionaire to the attacker, but the unavailability of the data prevents the network from making the judgement.

Example 7.0.6 (Sudden millionaire attack). An attacker creates a block with transactions T1 ... Tn, where T1 ... Tn-1 are applied properly but Tn is an invalid transaction. The attacker then makes T1 ... Tn-1 available, and makes most of the final state available, with the exception of one branch where the attacker has given themselves $1 million out of nowhere. Two years later, the attacker reveals this branch, and the network is forced to either accept the transaction or revert two years of history.

Note. One way to try to solve both attacks above is to require every valid transaction to show a valid source of its income. However, this leads to a recursive buck-passing problem: what if the attacker performs a sudden millionaire attack to create a new unexplained account A1, then creates a block with an unavailable transaction sending funds from A1 to A2 (possibly a set of multiple accounts), then A3, and so forth until finally collecting the funds at An, and then finally reveals all transactions upon receiving An. The total dependency tree of a transaction may well have total size Ω(N). Peter Todd's tree-chains discussion offers the solution [27] of limiting the protocol to handling a fixed number (possibly Θ(N)) of coins, each one with a linear history; however, this solution is not sufficiently generalizable and abstract for our purposes and carries high constant-factor overhead.

Example 7.0.7 (Payment denial attack). An attacker creates a block with an unavailable transaction which appears to modify another party's account (or spend their UTXO). This prevents the other party from ever spending their funds, because they can no longer prove validity.

Note. Even zk-SNARK protocols cannot easily get around the above problem, as it is information-theoretically impossible for any protocol, regardless of cryptographic assumptions, to fully specify the set of accounts that are modified in a particular super-block in less than O(N) space if the set itself has Ω(N) Kolmogorov complexity.

Example 7.0.8 (Double-spending gotcha attack). Suppose that a protocol tries to circumvent payment-denial attacks by making payment verification a challenge-response process: a transaction is valid unless someone can show that it is a double-spend. Now, an attacker can create a block containing a legitimate but unavailable transaction that spends $1 million from account A1 to A2, also controlled by them. One year later, the attacker sends $1 million from A1 (which no one can prove no longer has the money, because the data is unavailable) to A3. One year after that, the attacker reveals the original transaction. Should the network revert a year of history, or allow the attacker to keep their $2 million?

Note. Zk-SNARK protocols are not a solution here either, because of the same information-theoretic issues.

In order to remove the possibility of such issues in the general case, we will try to target a hard guarantee: given a general assumption that once data is available at all it is guaranteed to remain available forever due to a combination of the DHT and altruist-operated archival services, we will attempt to ensure that, for all blocks β, it is the case that either β is available or β is reverted.

The largest game-theoretic threat to fallback schemes handling the problem by themselves is, as discussed above, the crying-wolf attack: an attacker repeatedly produces valid blocks with unavailable data, waits for a fallback process to start reverting them, and then immediately publishes the data, thereby discrediting the agents that contested the block in the eyes of all other viewers that did not have their eyes on that particular block at the time that the incident was taking place. In order to prevent the process of contesting the block from becoming an attack vector, the process of contesting must be costly if it ends up being a false alarm; hence, eventually all challengers will be wary of wasting their resources, allowing the attacker to carry out unavailable-data attacks with impunity.

We can formally model this as follows, letting cc be the cost of raising a challenge, rc the reward of challenging successfully, cb the cost of producing an errant block, cv the cost of viewing a block, B the block size, n the number of blocks and nv the number of validators.

    We know that n is proportional to nv, since each validator can validateat most a fixed number of blocks.

We know that cb, cv and B are proportional to each other, since the block size is proportional to L/n where n is the number of blocks, the cost of viewing a block is equal to the effort spent downloading it, and cb is bounded above by a validator deposit, which is also proportional to L/n. There is no reason to make cb less costly than needed, so setting cb to the entire deposit is prudent.

We know that cc must be at least proportional to L/n = B, as otherwise an attacker of size L would be able to repeatedly challenge every block. Note that cc proportional to B also allows an attacker of size b·L for some b to challenge every block; however, the degree of fallback escalation which the attacker can afford is bounded by a constant, whereas if cc < O(B) then the degree of escalation leads to a strictly complexity-theoretically greater level of attention on each block - and if this increased level was scalable then the protocol would have had it by default in the first place.

We know that rc ≤ cb, as otherwise producing errant blocks and then calling them out would be a profitable strategy. Since cb is proportional to B, we know that rc is proportional to B.

Hence, cc is proportional to rc, so there is some constant k = cc/rc.

A node starts off having some prior probability distribution D over the probability p of a random invalid block later revealing data. Assume for simplicity that this prior was derived from Laplacian succession [28] with s instances of later revelation and f instances of non-revelation. After f·k repeats of producing a block with invalid data and later revealing it, a node will rationally assume that the next block will have revelation probability (s + f·k)/(s + f + f·k) > 1 - 1/k.

Hence, it will not be rational for that node to issue a challenge when the next block with unavailable data appears.
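A numerical sketch of the argument; the prior counts and the value of k are illustrative assumptions. After f·k crying-wolf rounds the posterior revelation probability crosses 1 - 1/k, at which point the expected reward of a challenge no longer covers its cost.

def revelation_posterior(s, f, repeats):
    # Laplace-style posterior that the next withheld-data block will later
    # reveal its data, after observing `repeats` additional crying-wolf rounds.
    return (s + repeats) / (s + f + repeats)

k = 5          # the constant k = cc / rc from the model above
s, f = 2, 8    # prior: 2 observed revelations, 8 genuine non-revelations

for repeats in (0, f * k, 4 * f * k):
    p = revelation_posterior(s, f, repeats)
    print(f"after {repeats:3d} crying-wolf rounds: P(reveal) = {p:.3f} "
          f"(threshold 1 - 1/k = {1 - 1/k:.3f})")
# Once P(reveal) exceeds 1 - 1/k, a rational node stops challenging, and
# unavailable blocks go uncontested.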

This model is not intended to cover all possible scalable protocols - and indeed it can't, since we claim that there are in fact solutions; rather, it is intended to illustrate the general difficulty. Challenging this strategy requires irrational sacrifice of resources, either on the part of the challengers, which would end up continuing to challenge blocks despite negative expected return, or on the part of other actors that start issuing blocks with unavailable data without ever publishing the data purely in an attempt to manipulate challengers' posteriors in a positive direction.

We provide two possible sets of solutions to this problem. The first is to simply capitulate to the crying-wolf attack, and explicitly require challenging to be a costly and potentially altruistic activity. The protocol in this case would work as follows:

After a block is published, initialize i = 0. Anyone has the ability to lay down a challenge against the block at cost c within the next Fd blocks (eg. Fd = 20). Challenging happens at the superblock level (ie. it is part of the top-level state transition function).

Once at least k·2^i challenges are laid (eg. k = 5), the block enters purgatory. Once in purgatory, the block remains there until it is either rescued (see below) or it remains there for Pd blocks (eg. Pd = 40), at which point the block is reverted.

Rescuing a block requires a rescue transaction, which requires signatures from two thirds of a random set of 2^(i+1)·m validators (eg. 180 of 270 at i = 0). This also happens at the superblock level, and the producer and validators of a successful rescue transaction are given a small reward (eg. c/(3m) per validator plus c/3 to the producer).

Once a block has been rescued, the Fd timer restarts, i is incremented (i ← i + 1), and the block can be challenged again. Note that each time the cycle repeats, the required number of challenges and the required number of validators on a rescue doubles.

If a block is rescued by the entire network, then it can be considered final.

If a block is challenged but never rescued, then the security deposits of the block producer and all validators that signed the block are destroyed, and equally split among challengers.

Under normal circumstances, challenging is thus profitable; it is only during a crying-wolf attack (which has a very large constant-factor cost, perhaps even enough to make it not the weakest link in all but the largest O(N^(2-ε)) systems) that it becomes altruistic. Because anyone can challenge, one cannot bribe the challengers not to challenge; in fact, doing so will simply motivate the challenger to challenge again using a different identity. An attacker with at most n·c resources can force at most 2·n·m validators to check availability on a block, and so if we set c ∈ O(N^(1-ε)), we can see that an attacker with b·L ∈ O(N^(2-ε)) resources will be able to force 2·m·O(N^(1-ε)) validators to check a single block, increasing their computational load by a constant factor; hence, the system cannot be subjected to a denial-of-service attack except at cost b·L, where the value of b depends on c (there is a linear tradeoff between requiring more altruism and being more susceptible to denial-of-service attacks).

We also add a rule that the producer of a block is slightly penalized if it is challenged sufficiently, even if it is later vindicated. This allows us to establish a bound on the ratio of losses to altruists to losses to the attacker, allowing a bound to be determined on the minimum required budget of an attacker assuming altruists are willing to lose L·k·wa for some constant k. Note that rescuing legitimately is a very low-risk activity; since a validator should possess the entire data of a block themselves, they have the ability to convince any larger sample of validity if necessary.

A possible scheme for removing the need for altruism even in extremis works as follows. First, we require the proposer for each superblock to compute a small amount of proof of work, perhaps with per-superblock cost w = r·0.01 where r is the superblock reward. Proof of work is required to provide a highly secure cryptoeconomic entropy source which is completely unpredictable; alternatives such as NXT can far too easily be foretold in advance. From this value, we then select a random substate, and we ask all top-level validators to check availability on the most recent block in that substate. If the block is indeed unavailable, and the block has been challenged, then the challengers split a reward of size rc = w·0.49 ∈ O(N^(2-ε)). The superblock proposer also gets a reward of equal size.

The scheme can be thought of as an inverted application of a standard result from law and economics: one can achieve compliance with arbitrarily low enforcement cost by simply decreasing the level of enforcement, and thus the probability of getting caught, but simultaneously increasing the fine [26] - except, in our case, we are performing the exact opposite task of a police department, as we are trying to incentivize a virtuous, usually-costly action with the occasional application of a reward. The load on each validator is at most the cost of downloading one block, O(N^(1-ε)), and each substate will have a probability of at least 1/O(N^(1-ε)) of being randomly inspected; hence the scheme does not compromise scalability. However, assuming that a top-level challenge has at least a fixed probability p of correctly determining that an unavailable block is unavailable before the attacker manages to broadcast it across the network, this provides an average incentive of rc/O(N^(1-ε)) ∈ O(N) in order to challenge, making it statistically worthwhile to challenge invalid blocks despite the cost.

Note that it is expensive for the challenger to manipulate the proof of work either upward or downward. Once the challenger has computed the proof of work successfully, a bribe of size rc ∈ O(N^(2-ε)) is required to dissuade him from starting the top-level challenge procedure for the entire block, and with judicious choice of constants one can make the bribe exceed our desired threshold b·L. If the challenger wants to increase the probability of triggering a top-level challenge in order to earn the rc reward, possibly in collusion with the challengers, then note that even if the attacker has placed an invalid block on all substates, re-computing the proof of work has cost w and maximum expected collective reward w·0.98.

Given that the expected reward of challenging is now w/N^(1-ε), we can impose a cost of challenging of at most the same value, and so the level of denial-of-service protection we get against challenges is proportional to the quantity of proof of work employed. One can increase the ratio drastically, at the cost of limiting ourselves to a more qualified security bound, by setting rc = k·w for some k > 1, making proof-of-work manipulation profitable only if at least 1/(2k) of substates currently have an invalid block in them (due to the extreme cost of producing an invalid block, we can hence make k quite high).

The second set of solutions to this problem involves changing the question asked to the fallback game: instead of asking the fallback game to adjudicate on the question of whether the data is available, have it adjudicate on the question of whether the data was available. This would then be combined with some kind of strategy which each node would use to efficiently determine whether or not every challenge was valid as soon as it is made. There are two most promising contenders for such strategies. First, one can require the data of a block to be erasure coded, such that the data can be fully recovered if at least 50% is available; this allows each node to probabilistically evaluate total availability by randomly accessing a sample (without erasure coding, a 99% available block can easily slip through scrutiny). Second, one can require each node to independently pick a random sample of challenges to evaluate, and provide the Merkle root of the set of evaluations to other nodes. Other nodes would then probabilistically check the correctness of the Merkle roots that they receive through sampling, and would then themselves send out a Merkle root of all Merkle roots that they consider trustworthy. This can theoretically be made resistant to bribery because transmission of Merkle roots is private and so there is no way for a node to prove to a briber that they sent a compromised Merkle root.

For simplicity, we recommend not applying either of the more advanced strategies due to their greater complexity, and simply relying on altruism in extremis, noting that attempts to bribe validators have not in practice proven to be a problem in cryptoeconomic systems. If "crying wolf" attacks become a problem, particularly in the more highly scalable variants we describe in a later section, then we prefer the random block selection game due to its greater simplicity.


Chapter 8

    Reverting

A critical mechanism used in the previous sections is the concept of reverting a block if it is found to be invalid. However, it is necessary to come up with a scheme to do so with minimal disruption, reverting only what needs to be reverted while keeping the remaining state transitions intact. To do so, we propose the following mechanism. When an invalid block is found and agreed upon, construct a dependency cone of all substates that could have been affected by that block, and revert each substate to a prior safe state. The algorithm for this relies on a primitive we will call GDC (get dependency cone), which takes a target state $\sigma$, an errant block $\beta_0$ inside a super-block $B$, and the post-super-block state $\sigma_0 = \sigma_{-1} + B$, and outputs a partial mapping of index to substate. We define GDC as follows:

$$GDC(\sigma, \beta_0, \sigma_0) = \emptyset \quad \text{if } \sigma \in ANC(\sigma_0)$$

$$GDC(\sigma_0, \beta_0, \sigma_0) = OBSERVED(\sigma_{-1}, \beta_0)$$

$$GDC(\sigma + B, \beta_0, \sigma_0) = \bigcup_{\beta \in B \,:\, GDC(\sigma, \beta_0, \sigma_0) \cap OBSERVED(\sigma, \beta) \neq \emptyset} OBSERVED(\sigma, \beta) \;\cup\; GDC(\sigma, \beta_0, \sigma_0)$$

Assuming an invalid block $\beta_0$ inside a super-block $B$ with prior state $\sigma_{pre}$, and assuming that the current state is $\sigma_f$, we use $GDC(\sigma_f, \beta_0, \sigma_{pre} + B)$ to get the total set of substates to be reverted. Then, for each substate index $i$ in that set, we locate the state $\sigma_i$ such that $i \in GDC(\sigma_i + B_i, \beta_0, \sigma_{pre} + B)$ but $i \notin GDC(\sigma_i, \beta_0, \sigma_{pre} + B)$ (where $B_i$ is the super-block applied on top of $\sigma_i$), and simply revert $\sigma_f[i] \rightarrow \sigma_i[i]$, ie. revert every substate to just before the point where it got into the cone of potentially invalid blocks.
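The following is a toy sketch of the dependency-cone computation and the revert rule, under the simplifying assumption that each block is represented just by the set of substate indices it observes (taken to include its affected area) and that the chain after the errant block is a list of super-blocks; the data layout and names are illustrative, not a specification.

```python
# Sketch of GDC and the revert rule, using a toy representation: each block is a
# set of substate indices it observes/affects, and the chain after the errant
# block is a list of super-blocks, each a list of such blocks.

def dependency_cone(errant_observed, super_blocks):
    """Return {substate index: super-block height at which it entered the cone}."""
    cone = set(errant_observed)
    entered_at = {i: 0 for i in cone}          # height 0 = the errant super-block
    for height, blocks in enumerate(super_blocks, start=1):
        newly = set()
        for block_area in blocks:
            if block_area & cone:              # block touched the cone -> absorb it
                newly |= block_area - cone
        for i in newly:
            entered_at[i] = height
        cone |= newly
    return entered_at

def revert(states, entered_at, final_state):
    """Roll each tainted substate back to its value just before it entered the cone.

    states[h][i] is substate i's value just before super-block h was applied.
    """
    reverted = dict(final_state)
    for i, h in entered_at.items():
        reverted[i] = states[h][i]
    return reverted
```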

    Lemma 8.0.6. The state obtained after reverting as above is valid.

Proof. For the sake of simplicity, suppose that every super-block contains a block modifying every substate; if not, then we assume a phantom null block for each unaffected substate $i$, with observed and affected area $\{i\}$ and leaving the root $R(\sigma[i])$ unchanged. First, note that the area of the dependency cone during each super-block corresponds exactly to the combined area of some set of blocks; it cannot partially include any block, because the definition states that if the area of a block is partially included it must be fully included, and it cannot include area unoccupied by blocks, because of our simplifying assumption. Then, for each super-block $B_j$, let $D(B_j)$ be the set of blocks inside it that are in the dependency cone of that super-block's post-state, and $U(B_j)$ be the set of blocks not in the dependency cone. If $\sigma_f = \sigma_{pre} + B_1 + B_2 + \ldots$, the post-revert state $\sigma_f'$ will correspond exactly to $\sigma_{pre} + U(B_1) + U(B_2) + \ldots$

    We will show why this is true by illustration. Consider a sequence ofstates with a set of blocks updating the state during each super-block:

    Now, suppose that one of those blocks is invalid. Then, if we manage torevert immediately in the next block, the state will be moved to the red linehere:

    And if we revert later, the state will be moved to the red line here:

Notice that one can apply the set of still-valid blocks sequentially to $\sigma_{pre}$ to obtain $\sigma_f'$ in all cases.

It is important to note that the revert process is scalable only if the number of substates that an attacker can revert with a single invalid block is bounded, assuming some bound on the amount of time that it takes for the bad block to be detected and for the revert to be included. Otherwise, the attacker can annoy all users while paying less than $b \cdot L$ in cost, and so the algorithm will no longer be scalable under attack. In order to achieve this, we have two solutions. First, we can specify that each block can have a maximum area size of $k$ substates for some fixed $k$; the number of substates reverted, assuming a revert after $l$ blocks, will then be bounded above by $k \cdot l$. The other approach is to require blocks with area larger than $k$ substates to have proportionately more validators; eg. a block with an area of $4k$ substates should require $\frac{8m}{3}$ signatures out of a pool of $4m$ validators in order to be valid. Both strategies ensure scalability under attack.
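A sketch of the second rule is below, following the $\frac{8m}{3}$-of-$4m$ example above; the rounding behaviour and the particular numbers in the demo are assumptions.

```python
# Sketch of the "proportionately more validators" rule: a block covering more
# than k substates draws a proportionately larger validator pool, of which 2/3
# must sign. Rounding and the demo numbers are illustrative assumptions.
from math import ceil

def signature_requirement(area, k, m):
    """Validator pool size and signature threshold for a block covering `area`
    substates, given base area k and base pool size m."""
    multiples = max(1, ceil(area / k))      # blocks no larger than k use the base pool
    pool = multiples * m
    required = ceil(2 * pool / 3)           # 2/3 of the (enlarged) pool must sign
    return pool, required

if __name__ == "__main__":
    # Area 4k with m = 135: a 540-validator pool, 360 signatures (= 8m/3).
    print(signature_requirement(area=4 * 10, k=10, m=135))
```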


Chapter 9

    Stacking

The above schemes provide scalability up to $O(N^2)$. But can we go higher than that? As it turns out, the answer is yes; we can in fact achieve scalability of $O(N^d)$ for arbitrary $d$ while satisfying both economic and Byzantine-fault-tolerance guarantees. We accomplish this by layering the scheme on top of itself. Essentially, we define a generalized transform $T(APPLY, VT, N, d)$, where $APPLY$ is a state transition function, $VT(F, s) \rightarrow F'$ is a function which takes a state transition function $F$ and a maximum header chain size $s$ and outputs the top-level state transition function $F'$ (eg. the algorithms we defined above can be seen as implementations of $VT$), $N$ is the maximum computational load and $d$ is the depth.

    We define T as follows:

$$T(APPLY, VT, 2) = VT(APPLY, N)$$

$$T(APPLY, VT, d) = T(VT(APPLY, N^{d-1}), VT, d-1) \quad \text{for } d > 2$$
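A structural sketch of this recursion is below; $VT$ is treated as an opaque wrapper, and the exponent follows the reconstruction of the definition given above, so the whole thing should be read as illustrative rather than normative.

```python
# Structural sketch of the stacking transform. `vt` stands in for the scalable
# transforms defined in earlier chapters and is treated as an opaque function
# here; the exponents mirror the definition above and are otherwise illustrative.

def stack(apply_fn, vt, N, d):
    """Wrap `apply_fn` d-1 times, widening the header-chain budget at each layer."""
    if d == 2:
        return vt(apply_fn, N)
    # Wrap the inner layers first, then hand the result to one more VT layer.
    return stack(vt(apply_fn, N ** (d - 1)), vt, N, d - 1)

if __name__ == "__main__":
    # Toy VT: just records how the layers nest.
    def toy_vt(inner, budget):
        return {"wraps": inner, "budget": budget}
    print(stack("APPLY", toy_vt, N=100, d=3))
```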

    We maintain the restriction that all objects produced at any point musthave a soft maximum size, above which the number of validators requiredto agree on that object increases proportionately.

The definition is simple, and so somewhat obscures the complex inner workings of what is going on. Hence, let us provide an example. Consider a transform of the fallback-plus-subjective-resolution scheme provided above with $d = 3$. There would now be four levels of data structures. At the bottom level, we have the ultimate transactions that are affecting the state. Then, on the third level we have blocks containing those transactions. To validate blocks on this third level, there are $O(N^2)$ validators (each with a deposit of size $O(N)$), which are recorded on the second level; a block on the third level must obtain signatures from $\frac{2m}{3}$ out of a pool of $m$ validators obtained from the entire pool of second-level validators. However, the process of verifying that signatures from the correct validator pool are provided itself depends on data that is not shared globally (as it is on the second level), and so that process itself, as a state transition function, needs to be carried out using the two-level scalable protocol, ie. a second-level block would treat the second-level validators accessed by all third-level blocks contained inside of it as part of its observed area. Blocks containing these transactions would themselves be voted on by the header chain, which has $O(N)$ validators, each with a deposit of size $O(N^2)$.

At this point, the algorithm almost has $O(N^3)$ scalability, but not quite. There are two reasons why. First, note that each third-level block independently picks $m$ validators. Hence, if a second-level block has $O(N)$ transactions, and assuming that there are $O(N)$ second-level substates, we can expect a single second-level block to have an observed area of size $1 - \frac{1}{e^m}$ of the whole set of substates - which for even the highly unrealistic $m = 5$ amounts to almost all of it. Hence, there can be at most one second-level block, and so our scalability is right back to $O(N^2)$.

To solve this problem, we make a moderate change: we say that the set of validators for any transaction should always be determined from the state at the start of the super-block. Formally, we say that a super-block has two virtual transactions, one at the start and one at the end, which snapshot the validator pool, and validators are selected from the left (start-of-super-block) pool, not the right (end-of-super-block) pool.


This allows us to have validators be selected from anywhere in the state without including them in any observed areas. From an implementation perspective, this means that a block at any level would need to come with signatures $[s_1 \ldots s_m]$ and also Merkle proofs $[\pi_1 \ldots \pi_m]$, where $\pi_i$ proves that the validator $V_i$ that produced signature $s_i$ is indeed at the correct position in the validator pool of the super-block's pre-state.
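A minimal sketch of that check is below: it verifies a Merkle branch from a claimed validator record, at a claimed position, up to the validator-pool root committed in the super-block's pre-state. The hashing layout is an assumption, and the signature check itself is omitted.

```python
# Sketch of checking one validator against the super-block's pre-state: the
# prover supplies the validator's record, its index in the pool, and a Merkle
# branch to the pre-state validator-pool root. Hashing layout is illustrative.
import hashlib

def h(*parts):
    return hashlib.sha256(b"".join(parts)).digest()

def verify_branch(leaf, index, branch, root):
    """Check a Merkle branch from `leaf` at `index` up to `root`."""
    node = h(leaf)
    for sibling in branch:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node == root

def validator_ok(pre_state_pool_root, validator_record, position, branch):
    """Block-level check: the claimed validator really is at `position` in the
    validator pool committed to by the pre-state root (signature check omitted)."""
    return verify_branch(validator_record, position, branch, pre_state_pool_root)

if __name__ == "__main__":
    leaves = [b"v0", b"v1", b"v2", b"v3"]
    level = [h(x) for x in leaves]
    parents = [h(level[0], level[1]), h(level[2], level[3])]
    root = h(parents[0], parents[1])
    branch = [level[3], parents[0]]          # branch for leaf index 2
    print(validator_ok(root, b"v2", 2, branch))   # True
```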

Second, note that fallback mechanisms all eventually lead to all validators in their validator pool processing a particular block, and producing a transaction which contains all of their signatures. Here, our largest validator pool is of size $O(N^2)$, and so at least one node somewhere will have to process $O(N^2)$ signatures. To solve this, we institute an additional rule: when the number of validators reached by a fallback scheme exceeds $O(N)$, the next level of validation will return to $m$ validators, but one level higher.

Now, consider the effort done at each level. In the header chain, we have $O(N)$ validators, and each validator must on average perform top-level verification on $m$ blocks, as before. On the second level, there would be $O(N^2)$ validators and a total of $O(N^2)$ blocks being produced, and so on average each validator would need to perform top-level verification on $m$ blocks. A second-level block producer would select an area $A$ of $k$ second-level substates, and fill the block with third-level blocks that have an effect inside that area; from the second-level validator's point of view, the third-level blocks look just like transactions would under an $O(N^2)$ scheme, except with the difference that the validity condition requires verifying $m$ signatures. A third-level block producer would select an area of third-level substates, and fill the block with transactions that have an effect inside that area. Assuming the second-level block producer includes $O(N)$ third-level blocks, and a third-level block producer includes $O(N)$ transactions, all actors inside the system have less than $O(N)$ load.

Suppose that the scheme is attacked, by means of a successful invalid block with transactions at depth $d$ and header at depth $d-1$. Then, challenges would be placed at depth $d-1$, and eventually the block's time in purgatory would expire and the revert procedure would happen, reverting the substate roots at depth $d-1$. The reversion process would itself require an object at depth $d-1$ in order to process. Because the total economic volume is $O(N^k)$, and there are $O(N^d)$ validators at depth $d$, the cost of bribing $m$ validators at depth $d$ would be $m \cdot N^{k-d}$, and the block would inconvenience roughly $O(N^{k-d})$ users, so a proportionality of attacker cost to network cost is maintained. The same escalation argument as before shows that the scheme does not become unscalable until the attacker starts making economic sacrifices superlinear in the economic activity in the blockchain.

In practice, we expect that the extra complexity in implementing such an ultra-scalable scheme will be sufficiently high that cryptoeconomic state machine developers will stick to simpler two-level scaling models for some time. However, the result is important as it shows that, fundamentally, the level of overhead required in order to have a blockchain of size $L$ is roughly $m \cdot \log(L)$, a very slowly-growing value. With clever use of zk-SNARKs, perhaps even a completely constant-overhead approach for blockchains of arbitrary size can be achieved.


Chapter 10

    Strategy

The above sections described algorithms for converting a state transition function into validity criteria from which a scalable blockchain architecture can be built out of traditional non-scalable components. However, there is also a need to develop higher-level strategies that validators would use in order to allow a maximally expressive set of state transitions to execute effectively.

One major category of strategy that must be developed is for index selection. Each block producer must determine the set of substate indices for which they will keep the state up to date, and for which they are willing to produce blocks. In the event that transactions requiring a larger area appear, groups of block producers will likely need to somehow cooperate in order to be able to produce a block containing the combined area. One possible strategy is for validators to arrange substate indices in a k-dimensional space, and maintain up-to-date knowledge of $\sigma[i]$ for either a radius-$r$ cube or at least two adjacent substates. An alternative approach, specialized to simpler state transition functions involving sending from A to B (such as that of Bitcoin), is to maintain $(i, j)$ pairs and update $i$ and $j$ often. A third approach is adaptive: use some kind of algorithm to determine which substate indices appear together most often, and try to keep track of an increasingly contiguous cluster over time. If transaction senders also adapt their activities to substate boundaries, the co-adaptation process can potentially over time create highly efficient separations.
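As a toy illustration of the adaptive variant, the sketch below counts how often substate indices co-occur in transactions and extends a validator's tracked set toward its most frequent neighbours; the data model is an assumption made purely for illustration.

```python
# Toy sketch of the adaptive index-selection strategy: count how often pairs of
# substate indices appear in the same transaction, then have a validator extend
# its tracked set toward the indices that co-occur most with what it already has.
from collections import Counter
from itertools import combinations

def cooccurrence_counts(transactions):
    counts = Counter()
    for touched in transactions:              # each tx = set of substate indices it touches
        for a, b in combinations(sorted(touched), 2):
            counts[(a, b)] += 1
    return counts

def extend_tracked_set(tracked, counts, extra=1):
    """Add the `extra` indices that co-occur most often with the tracked set."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a in tracked and b not in tracked:
            scores[b] += n
        elif b in tracked and a not in tracked:
            scores[a] += n
    return tracked | {i for i, _ in scores.most_common(extra)}

if __name__ == "__main__":
    txs = [{1, 2}, {1, 2}, {2, 7}, {7, 9}, {1, 2, 7}]
    counts = cooccurrence_counts(txs)
    print(extend_tracked_set({1, 2}, counts, extra=1))   # {1, 2, 7}
```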

Additionally, note that index selection also becomes a concern when one or more blocks is in purgatory. In this case, a block producer producing a block on top of a block in purgatory has to worry that their block may be reverted. Hence, a possible strategy that the block producer may employ in that case is to compute the dependency cones of all presently challenged blocks, and refuse to produce blocks that are fully or partially inside any such dependency cone. In that case, however, the block producer may also wish to check availability on the contested block themselves, and if it is contested create a rescue transaction.

The other category of strategy is for transaction sending, particularly

in the context of more complex state transition functions like that used in Ethereum. If a transaction only affects a few neighboring substates, then an index selection strategy as above will be able to provide easy and global transfer. However, what happens if an action needs to be performed that has a wide-reaching effect, and it is impractical to put it into a single block? In this case, one option is to set up an asynchronous routing meta-protocol: if a transaction affects state $\sigma[i]$ but also needs to effect a change in $\sigma[j]$, then if it is acceptable for the latter change to be asynchronous we can have a message be passed through a network of contracts, first to a contract in a state neighboring $\sigma[i]$, then hopping progressively closer to $\sigma[j]$ during subsequent transaction executions until finally arriving at $\sigma[j]$ and effecting the desired change. Note that this is equivalent to the hypercubes strategy [29] developed for Ethereum in 2014; but instead of being a core part of the protocol, the design has been abstracted even further out, allowing hypercubes to simply be one of the many possible strategies that the protocol can lead to.
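A toy sketch of this hop-by-hop routing over a deliberately simple line topology is below; a real deployment would route over whatever adjacency structure the index-selection strategy produces, as in the hypercube design [29].

```python
# Toy sketch of the asynchronous routing idea: a message destined for substate j
# is carried one hop per block toward j, through substates adjacent in a simple
# line topology. The topology and one-hop-per-block model are illustrative.

def next_hop(current, target):
    """One step along the line of substate indices."""
    if current == target:
        return current
    return current + 1 if target > current else current - 1

def route(source, target):
    """Full hop sequence; each hop would be executed in a later block."""
    path = [source]
    while path[-1] != target:
        path.append(next_hop(path[-1], target))
    return path

if __name__ == "__main__":
    print(route(3, 8))   # [3, 4, 5, 6, 7, 8] -- one asynchronous hop per block
```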

If the primary purpose of a blockchain is to act as a currency system, then as long as sufficient liquidity exists one can get by with a very low amount of interoperability. For example, even if it is only possible to move funds between substates every 1000 blocks, and even then only through a single central hub state (which all nodes are assumed to store as a strategy), that limited movement is enough for the currency units in the various substates to maintain fungibility. Cross-substate payments can be done by means of various decentralized exchange protocols, treating the different substates as different currencies, with the only difference being that the exchange rate will always remain equal to 1. However, it is debatable whether this approach is preferable to the other strategy described above of having block proposers select random $(i, j)$ pairs.

    Finally, there is the protocol-level decision of how to maintain the divi-sions between substates and grow and shrink the total number of substatesif necessary. There are several categories of solution here:

• Employ an adaptive algorithm, perhaps adapting Karger's minimum-cut algorithm [30], in order to split and join substates over time. Note that this can be done at the strategy level, giving the top-level consensus the right to split and join substates arbitrarily, but making it economically efficient for them to make splits that create a minimal cut. One disadvantage of this approach is that if transaction fees are priced so as to disincentivize cross-substate activity, fees for specific operations become unpredictable since substates can be rearranged.

• Create new substates every time existing substates get too large, and incentivize users that marginally care less about the network effect to act in those substates by lowering transaction fee pricing. This removes the ability to make arbitrary splits, but also leads to unpredictable pricing, since prices do need to go up to incentivize users away from high-activity substates - although the pricing would be far more even and less discretionary.

• Allow users to set the location of objects in the state with arbitrary fineness, but with increased cost for higher degrees of fineness. Hence, objects that need to be very close to each other can pay the cost for extreme co-location, whereas other objects can place themselves further apart. This is the strategy taken in Gavin Wood's chain fibers proposal [32].


Chapter 11

    Further optimizations

The algorithms described above are meant to be starting points, and are by no means optimal. The primary areas of optimization are likely to be fivefold:

• Reducing the limits imposed on state transition functions while maintaining identical levels of scalability

• Achieving constant-factor gains by increasing the efficiency of Merkle proofs, block sending protocols, safely reducing the value of $m$, reducing churn, etc.

• Increasing block speed

• Making reverts less harmful by using inclusive blockchain protocols (ie. if $\beta_1 \ldots \beta_n$ is the set of blocks that was reverted, and $\sigma_f$ is the post-revert state, automatically update to $\sigma_f' = \sigma_f + \beta_1 + \ldots + \beta_n$).

• Providing the option for even higher gains of efficiency, but at the cost of less-than-perfect guarantees of atomicity, ie. reducing $m$ to 8 with the associated cost gains, but with the understanding that the action happens within a cordoned-off area of the state where invalid transitions will sometimes happen.

For the first of these, the most promising direction where one can find easy immediate gains is to try to separate observed area from affected area. Theoretically, there is no interference risk if two blocks in a super-block share observed areas, as long as that shared observed area is not any block's affected area. The challenge is (i) figuring out what hard limits to impose on a block in order to make sure that the scheme remains scalable (eg. if there are $\frac{n}{2}$ blocks each using $\{\frac{n}{2} + i\}$ as their affected area and $[1 \ldots \frac{n}{2}]$ as their observed area, the super-block will have $\frac{n^2}{4}$ hashes and will thus be unscalable), and (ii) figuring out how to manage reverts.
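A sketch of the relaxed interference check implied above is given below: observed areas may overlap freely, but no block's affected area may intersect another block's observed area. The representation of blocks as (observed, affected) index-set pairs is an illustrative assumption.

```python
# Sketch of the relaxed interference check: blocks may share observed areas, but
# no block's affected area may intersect any other block's observed area.
# Blocks are represented as (observed set, affected set) pairs.

def superblock_ok(blocks):
    """blocks: list of (observed, affected) index sets."""
    for i, (_, affected_i) in enumerate(blocks):
        for j, (observed_j, _) in enumerate(blocks):
            if i != j and affected_i & observed_j:
                return False
    return True

if __name__ == "__main__":
    a = ({1, 2, 3}, {3})      # observes 1-3, affects 3
    b = ({1, 2, 4}, {4})      # shares observed area {1, 2} with a -- fine
    c = ({3, 5}, {5})         # observes 3, which a affects -- interference
    print(superblock_ok([a, b]))     # True
    print(superblock_ok([a, b, c]))  # False
```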


If one wants to go even further, one can allow the affected area of one block to be the observed area of another block, as long as one can arrange all blocks in an order such that each block is observed before it is affected. But there is a fundamental tradeoff between expressivity and fragility: the more the state transition function is able to process synchronous updates of very many and arbitrary states, the more costly a successful attack becomes if it must be reverted.

    For the second, the largest tradeoff is likely to be that between churn andsecurity. If one can keep the same m nodes validating the same substates fora longer period of time, eg. half an hour, one may be able to reduce networkload. However, keeping a constant pool for too long makes the scheme moreattackable.

For the third, the most likely source of gains will be improvements in the underlying non-scalable consensus algorithm used to maintain consensus on the top-level state $\sigma$. Inclusive blockchain protocols such as those developed by Sompolinsky and Zohar [31] are a potential route for minimizing disruption under ultra-fast block times, and such protocols can also be used for the fourth goal of mitigating the impact of reverts.

For the fifth, an ideal goal would be to have a blockchain design which allows users to pick a point on the entire tradeoff space between cost and probability of failure, essentially capturing in the base protocol concepts like auditable computation [5] (where computation is done by a third party by default, but if an auditor finds the third party provided an incorrect result the auditor can force the execution to be re-done on the blockchain; if the third party is indeed in the wrong, it loses a security deposit from which the auditor will be able to claim a bounty).

    Finally, there is plenty of room for optimization in strategies, figuringout how validators can self-organize in order to maximize the expressivityof the state transition function in practice. The ideal goal is to present aninterface to developers that just works, providing a set of highly robustabstractions that achieve strong security bounds, freeing developers of theneed to think about the underlying architecture, math and economics of theplatform unless absolutely necessary. But this problem can be solved muchlater than the others, and solutions can much more easily continue to beiterated even after the underlying blockchain protocol is developed.


Bibliography

    [1] Peter Todd on merge-mining, Dec 2013: http://sourceforge.net/p/bitcoin/mailman/message/31796158/

[2] What is the story behind the attack on CoiledCoin? (Stack Exchange, 2013): http://bitcoin.stackexchange.com/questions/3472/what-is-the-story-behind-the-attack-on-coiledcoin

[3] Amdahl's and Gustafson's laws and Bernstein's conditions for parallelizability (Wikipedia): http://en.wikipedia.org/wiki/Parallel_computing#Amdahl.27s_law_and_Gustafson.27s_law

    [4] Gregory Maxwell and jl2012, Bitcointalk: https://bitcointalk.org/index.php?topic=277389.45;wap2

[5] Scalability, Part 1: Building on top, Vitalik Buterin, September 2014: https://blog.ethereum.org/2014/09/17/scalability-part-1-building-top/

    [6] Lightning Network: http://lightning.network/

    [7] Vlad Zamfirs Formalizing Decentralized Consensus, coming soon.

[8] Thomas Piketty, Capital in the 21st Century: http://www.amazon.com/Capital-Twenty-First-Century-Thomas-Piketty/dp/1491534656

[9] BAR Fault Tolerance for Cooperative Services, Amitanand S. Aiyer et al: http://cs.brown.edu/courses/csci2950-g/papers/bar.pdf

    [10] Anonymous Byzantine Consensus from Moderately-Hard Puzzles: AModel for Bitcoin, Andrew Miller, Joseph J LaViola Jr: https://socrates1024.s3.amazonaws.com/consensus.pdf

    [11] Merkle trees (Wikipedia): http://en.wikipedia.org/wiki/Merkle_tree

[12] Kademlia: A Peer-to-peer Information System Based on the XOR Metric: http://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf


[13] IPFS: http://ipfs.io/

[14] Vlad Zamfir and Pavel Kravchenko's Proof of Custody: https://docs.google.com/document/d/1F81ulKEZFPIGNEVRsx0H1gl2YRtf0mUMsX011BzSjnY/edit

[15] O(N polylog(N)) Reed-Solomon erasure codes via fast Fourier transforms: http://arxiv.org/pdf/0907.1788.pdf

[16] nxtinside.org, describing the NXT generation signature algorithm: http://nxtinside.org/inside-a-proof-of-stake-cryptocurrency-2/

[17] Succinct Zero-Knowledge for a von Neumann Architecture, Eli Ben-Sasson et al, Aug 2014: https://eprint.iacr.org/2013/879.pdf

[18] SchellingCoin: A Zero-Trust Universal Data Feed, Vitalik Buterin, Mar 2014: https://blog.ethereum.org/2014/03/28/schellingcoin-a-minimal-trust-universal-data-feed/

    [19] The P + Epsilon Attack, Vitalik Buterin, Jan 2015: https://blog.ethereum.org/2015/01/28/p-epsilon-attack/

[20] Truthcoin: Trustless, Decentralized, Censorship-Proof, Incentive-Compatible, Scalable Cryptocurrency Prediction Marketplace, Paul Sztorc, Aug 2014: http://www.truthcoin.info/papers/truthcoin-whitepaper.pdf

    [21] Tendermint: Consensus without mining, Jae Kwon: http://tendermint.com/docs/tendermint.pdf

[22] Permissioned Distributed Ledger Systems, Tim Swanson: http://www.ofnumbers.com/wp-content/uploads/2015/04/Permissioned-distributed-ledgers.pdf

[23] NTRUSign: Digital Signatures Using The NTRU Lattice, Jeffrey Hoffstein et al: http://www.math.brown.edu/~jpipher/NTRUSign_RSA.pdf

    [24] Hash Ladders for Shorter Lamport Signatures, Karl Gluck: https://gist.github.com/karlgluck/8412807

[25] SPHINCS: Practical Stateless Hash-based Signatures, Daniel Bernstein et al: http://sphincs.cr.yp.to/

[26] Economics of Crime, Erling Eide, Paul H Rubin, Joanna Mehlop Shepherd: https://books.google.ca/books?id=gr95IhFHXHoC&pg=PA45


[27] Tree Chains, Peter Todd: https://www.mail-archive.com/[email protected]/msg04388.html

    [28] Laplacian succession: http://understandinguncertainty.org/node/225

[29] Scalability, Part 2: Hypercubes, Vitalik Buterin, October 2014: http://www.reddit.com/r/ethereum/comments/2jvv5d/ethereum_blog_scalability_part_2_hypercubes/

[30] Karger's minimal-cut algorithm, Wikipedia: http://en.wikipedia.org/wiki/Karger%27s_algorithm

[31] Inclusive Blockchain Protocols, Yoad Lewenberg, Yonatan Sompolinsky, Aviv Zohar: http://fc15.ifca.ai/preproceedings/paper_101.pdf

[32] Blockchain Scalability: Chain Fibers Redux: https://blog.ethereum.org/2015/04/05/blockchain-scalability-chain-fibers-redux/
