
Selecting Elliptic Curves for Cryptography:

An Efficiency and Security Analysis

Joppe W. Bos, Craig Costello, Patrick Longa and Michael Naehrig

Microsoft Research, USA

Abstract. We select a set of elliptic curves for cryptography and analyze our selection from a performance and security perspective. This analysis complements recent curve proposals that suggest (twisted) Edwards curves by also considering the Weierstrass model. Working with both Montgomery-friendly and pseudo-Mersenne primes allows us to consider more possibilities, which improves the overall efficiency of base field arithmetic. Our Weierstrass curves are backwards compatible with current implementations of prime order NIST curves, while providing improved efficiency and stronger security properties. We choose algorithms and explicit formulas to demonstrate that our curves support constant-time, exception-free scalar multiplications, thereby offering high practical security in cryptographic applications. Our implementation shows that variable-base scalar multiplication on the new Weierstrass curves at the 128-bit security level is about 1.4 times faster than the recent implementation record on the corresponding NIST curve. For practitioners who are willing to use a different curve model and sacrifice a few bits of security, we present a collection of twisted Edwards curves with particularly efficient arithmetic that are up to 1.43, 1.26 and 1.24 times faster than the new Weierstrass curves at the 128-, 192- and 256-bit security levels, respectively. Finally, we discuss how these curves behave in a real world protocol by considering different scalar multiplication scenarios in the transport layer security (TLS) protocol.

1 Introduction

The first release of a cryptographic standard specifying elliptic curves for use in practice dates back to 2000 [21]. Nowadays, roughly one out of ten systems on the publicly observable internet offers cipher suites in the Secure Shell (SSH) and Transport Layer Security (TLS) protocols that contain elliptic curve based cryptographic algorithms [16]. Most elliptic curve standards recommend curves for different perceived security levels. These curves are either defined over prime fields or binary extension fields; on the internet, however, the deployed elliptic curves are mostly defined over prime fields [16]. This can be partially explained by the increasing skepticism towards the security of elliptic curves defined over binary extension fields (justified by recent progress on solving the discrete logarithm problem on such curves [25]). Therefore, in this work, we only consider elliptic curves defined over prime fields.

Recently, part of the cryptographic community has been looking for alternatives to the currently deployed elliptic curves that may offer better performance and provide stronger overall security (see for example an evaluation of recent curve candidates in [12]). The urge to change curves has been fueled by the recently leaked NSA documents, which suggest the existence of a back door in the Dual Elliptic Curve Deterministic Random Bit Generator [54]. Although cryptographers have suspected this at least as early as in 2007 [51], these recent revelations have fueled a controversy on whether the widely deployed NIST curves [56] should be replaced by curves with a verifiably deterministic generation. Besides such security concerns, there has been significant progress related to both efficiency and security since the initial standardization of elliptic curve cryptography. Notable examples are algorithms protected against certain side-channel attacks, different "special" prime shapes which allow faster modular arithmetic, and a larger set of curve models from which to choose. For example, in 2007, Edwards [24] discovered an interesting normal form for elliptic curves, now called the Edwards model, which was introduced to cryptographic applications by Bernstein and Lange [11]. A generalization of this curve model, known as the twisted Edwards model [7], facilitates the most efficient curve arithmetic [34]. Such (twisted) Edwards curves also have other attractive properties: they may be selected to support a complete addition law and are compatible with the Montgomery model, which supports efficient Montgomery ladder computations [46]. However, twisted Edwards curves cannot have a prime number of rational points over the base field, and they are therefore incompatible with the prime-order Weierstrass curves used in all of the current cryptographic standards [21, 47, 56].

Related Work. The NIST curves [56] have been included in numerous standards (e.g. [21, 47]) and are deployed in many security protocols. The most recent speed record on the NIST curve which aims to provide 128-bit security is due to Gueron and Krasnov [30]. Alternatives to the NIST curves have been suggested by the German working group Brainpool [23]; their curve choices followed additional security requirements, one of which demands verifiably pseudo-random curve generation. Another alternative curve has been proposed by Bernstein [5]; this is a Montgomery curve, called Curve25519, which allows efficient computation of ECDH using the Montgomery ladder at the 128-bit security level. It was later shown by Bernstein et al. [9] that a twisted Edwards curve, birationally equivalent to Curve25519, can be used for efficient elliptic curve signature generation and verification. Recently, Bernstein and Lange started a project to select and analyze secure elliptic curves for use in cryptography: see [12] for a list of the security assessments the project performs and the requirements it imposes. A range of curves, targeting different security levels, is also presented in [12], mostly analogous to Curve25519. Following this, several new curves satisfying the requirements from [12], which facilitate both the twisted Edwards and Montgomery form, were proposed by Aranha et al. [3].

Motivation and Rationale. The new curves presented in [12, 3] are all efficient and secure elliptic curves ready to be used in cryptography. This prompts the question as to why we should perform an efficiency and security analysis for a set of new curves. It is our opinion that not all options for prime fields and elliptic curve models have been considered in the recent curve proposal projects (either because they are overlooked or do not fit the requirements set by the project). Our goal is to rigorously analyze all of these different aspects from both a security and efficiency perspective, in hope that this paper helps practitioners better understand (and correctly implement) the choices that lie in front of them. Abandoning a set of standard curves demands a judicious selection of new curves, since this cannot be done too frequently if widespread adoption is desired. In that light, it is our opinion that one should consider all of the options available. For example, in contrast to [12, 3], our selection includes prime order Weierstrass curves. Just as the almost-prime order twisted Edwards curves have their practical advantages, we argue that there are also benefits to choosing prime order Weierstrass curves: the absence of small torsion simplifies the point/input validation process, and (over a prime field of fixed length) does not sacrifice any bits of security with respect to attacks on the underlying elliptic curve discrete logarithm problem (ECDLP). In addition, such curves are backwards compatible with current implementations supporting NIST curves over prime fields (i.e., no changes are required in protocols), and could be integrated into existing implementations by simply changing the curve constant and (in some cases) field arithmetic¹.

¹ Cryptographic libraries with support for generic-prime field arithmetic (e.g., using Montgomery arithmetic) are fully compatible with the proposed curves.


We investigate the selection of prime moduli that allow efficient modular arithmetic. As in [5, 34, 41, 15, 12, 3], we study pseudo-Mersenne primes of the form $2^\alpha - \gamma$, but also primes of the form $2^\alpha(2^\beta - \gamma) - 1$ that can be used to accelerate Montgomery arithmetic [45] as used in [31, 15]. Following the deterministic selection requirement from [12], we pick two primes of each shape for a given targeted security level: one prime is selected to be slightly smaller than the other, which sacrifices a small amount of ECDLP security in favor of enhanced performance. Note that, as explained in Section 2, for practical considerations we require all primes to be congruent to 3 modulo 4. These primes are used to construct cryptographically suitable curves focusing on (arguably) the two most relevant curve models: short Weierstrass curves with the curve parameter $a$ set to $-3$ and twisted Edwards curves with the curve parameter $a$ set to $-1$. The prime order Weierstrass curves give full ECDLP security over prime fields of a fixed bitlength, while offering good practical performance. On the other hand, the twisted Edwards curves sacrifice a small amount of ECDLP security but facilitate the fastest realization of curve arithmetic [34]. Both types of curves are selected in a deterministic fashion (see Section 3 for the full details) and offer twist-security [5], a property which is useful in certain scenarios.

An important requirement for implementations of modern cryptographic algorithms is a constant run-time when the algorithm computes on secret data to guard against timing attacks [38]. In particular, this potential threat exists for two basic elliptic curve operations: variable-base and fixed-base scalar multiplication. One solution is to use a complete addition law. However, a complete addition law is typically less efficient than the dedicated formulas, which can fail for certain inputs. In Section 4 we outline another solution to this problem for the variable-base case. We show that our algorithms which compute on secret data can never run into any exceptional cases (i.e. produce incorrect results) while using the faster dedicated formulas and ensuring a constant runtime (with the exception of the very last addition; see Section 4.1 for the details). Hence, this solution results in faster implementations compared to the complete solution. In the fixed-base case the situation is more complicated: most efficient algorithms in the literature may potentially run into exceptions. While the use of a complete addition formula suffices to solve the problem on twisted Edwards curves, the high cost of complete additions on Weierstrass curves would degrade performance significantly [18] (see Appendix C.1). To solve this problem, we propose a new formula that works for all possible inputs by exploiting masking techniques. This "complete" addition requires the same number of multiplications and squarings as the unprotected dedicated addition formula and drastically reduces the overhead of protecting scalar multiplication. We comment that the formula is also useful in the context of secure, exception-free multi-scalar multiplications. The reader is referred to Appendix C.1 for more details on the new formula.

We do not claim full security against other attacks such as simple power analysis (SPA); this is left for future work. Nevertheless, we remark that all the selected algorithms have a regular structure as required when implementing countermeasures against certain simple side-channel attacks.

Summary of contributions.

• Analysis of a new set of deterministically selected prime-order Weierstrass curves (see Table 1). These are defined over pseudo-Mersenne and Montgomery-friendly primes whose bitlengths match those of the NIST primes. See Sections 2 and 3.

• Analysis of a new set of deterministically selected composite-order twisted Edwards curves (see Table 2 and Section 3). In contrast to existing curve proposals, the selected curves present (simultaneously) minimal parameter $d$ in the twisted Edwards form and minimal parameter $A$ in isogenous Montgomery curves (minimal in absolute value). See Section 3.3.

• A new, (pseudo-)complete addition algorithm for general curves in short Weierstrass form. This algorithm works for all pairs of inputs and its execution incurs only a small overhead compared to the dedicated addition law. See Section C.1.

• Efficient fixed- and variable-base scalar multiplication algorithms that are proven to be exception-free and facilitate constant-time implementations. These algorithms can achieve such implementations on all prime-order Weierstrass curves and all composite-order twisted Edwards curves. See Section 4.

• A comprehensive software implementation providing timings for various scenarios; this includes performance estimates for the above curves when used in the context of the TLS protocol. See Sections 5 and 6.

Table 1. Summary of our chosen Weierstrass curves of the form $E_b/\mathbb{F}_p : y^2 = x^3 - 3x + b$ defined over $\mathbb{F}_p$ with quadratic twist $E'_b/\mathbb{F}_p : y^2 = x^3 - 3x - b$ and target security level $\lambda$. The group orders $r = \#E_b(\mathbb{F}_p)$ and $r' = \#E'_b(\mathbb{F}_p)$ are both prime, and $r < p$ for all curves. The value under $\rho$ complexity is an estimate for the actual security of the ECDLP against Pollard's $\rho$ method; it is $\log_2(\sqrt{\pi/4} \cdot \sqrt{r})$ rounded to one decimal.

| target security $\lambda$ | curve name | $p$ | $b$ | $\rho$ complexity |
|---|---|---|---|---|
| 128 | w-256-mont | $2^{240}(2^{16} - 88) - 1$ | 85610 | 127.8 |
| 128 | w-254-mont | $2^{240}(2^{14} - 127) - 1$ | -12146 | 126.8 |
| 128 | w-256-mers | $2^{256} - 189$ | 152961 | 127.8 |
| 128 | w-255-mers | $2^{255} - 765$ | -20925 | 127.3 |
| 192 | w-384-mont | $2^{376}(2^{8} - 79) - 1$ | 27798 | 191.5 |
| 192 | w-382-mont | $2^{368}(2^{14} - 5) - 1$ | -133746 | 190.8 |
| 192 | w-384-mers | $2^{384} - 317$ | -34568 | 191.8 |
| 192 | w-383-mers | $2^{383} - 421$ | 97724 | 191.3 |
| 256 | w-512-mont | $2^{496}(2^{16} - 491) - 1$ | 99821 | 255.8 |
| 256 | w-510-mont | $2^{496}(2^{14} - 290) - 1$ | 39053 | 254.8 |
| 256 | w-512-mers | $2^{512} - 569$ | 121243 | 255.8 |
| 256 | w-511-mers | $2^{511} - 481$ | 555482 | 255.3 |
| 256 | w-521-mers | $2^{521} - 1$ | 167884 | 260.3 |

Proposed Curves. Tables 1 and 2 show the curves that we have chosen deterministically according to our security and efficiency criteria. The tables show the target security level, which gives a rough estimate for the desired security in each case. Curve names indicate the curve model (w for the Weierstrass model and ed for the (twisted) Edwards model), the bit length of the underlying base field prime and the type of prime (mont for Montgomery-friendly and mers for pseudo-Mersenne primes). In Appendix D, we provide the trace of Frobenius $t$ for each curve, so the number of $\mathbb{F}_p$-rational points for the curve $E$ and its quadratic twist $E'$ can be computed as $\#E(\mathbb{F}_p) = p + 1 - t$ and $\#E'(\mathbb{F}_p) = p + 1 + t$. More details on the curve choices and their properties are given in Section 3.
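The two relations above, together with the $\rho$ complexity estimate $\log_2(\sqrt{\pi/4} \cdot \sqrt{r})$ from the table captions, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and since the actual group orders $r$ are not listed in this section, the second call approximates $r$ by $p$ (by the Hasse bound the two differ by at most about $2\sqrt{p}$).

```python
import math

def group_orders(p, t):
    """Return (#E(F_p), #E'(F_p)) for a curve with trace of Frobenius t."""
    return p + 1 - t, p + 1 + t

def rho_complexity(r):
    """Estimate log2(sqrt(pi/4) * sqrt(r)), rounded to one decimal."""
    return round(0.5 * (math.log2(math.pi / 4) + math.log2(r)), 1)

# Toy numbers: the orders of a curve and its twist always sum to 2p + 2.
print(group_orders(11, 5))            # (7, 17)

# Assumption: r is approximated by p = 2^256 - 189 (the w-256-mers prime).
print(rho_complexity(2**256 - 189))   # 127.8
```

For the twisted Edwards curves the prime subgroup order is roughly $p/4$, which is why their estimates sit about one bit below those of the Weierstrass curves over the same prime.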

2 Modular Arithmetic - Choosing Primes

Over a prime field $\mathbb{F}_p$ (with $p > 3$ prime), the computation of the elliptic curve group operation boils down to numerous computations modulo $p$. In this section we outline the types of primes that we prefer for efficiency and security considerations, and discuss how the primes are uniquely determined from a fixed security level.


Table 2. Summary of our chosen twisted Edwards curves of the form $E_d/\mathbb{F}_p : -x^2 + y^2 = 1 + dx^2y^2$ defined over $\mathbb{F}_p$, where $d = -(A - 2)/(A + 2)$, and the target security level is $\lambda$. A model for the quadratic twist is $E'_d/\mathbb{F}_p : -x^2 + y^2 = 1 + (1/d)x^2y^2$. The curve $E_d$ is birationally equivalent to the Montgomery curve $E_A/\mathbb{F}_p : y^2 = x^3 + Ax^2 + x$ with quadratic twist $E_{-A}/\mathbb{F}_p : y^2 = x^3 - Ax^2 + x$. The group orders are $\#E_d(\mathbb{F}_p) = 4r$ and $\#E'_d(\mathbb{F}_p) = 4r'$, where $r$ and $r'$ are both prime, and $4r < p$ for all curves. The value $d_0 = -(A + 2)/4 = -1/(d + 1)$ defines a curve with the same group order as that given by $d$, i.e. $\#E_{d_0}(\mathbb{F}_p) = 4r$ and $\#E'_{d_0}(\mathbb{F}_p) = 4r' = \#E_{-(d_0+1)}$. The $\rho$ complexity is an estimate for the actual security of the ECDLP against Pollard's $\rho$ method; it is $\log_2(\sqrt{\pi/4} \cdot \sqrt{r})$ rounded to one decimal.

| target security $\lambda$ | curve name | $p$ | $A$ | $d_0$ | $\rho$ complexity |
|---|---|---|---|---|---|
| 128 | ed-256-mont | $2^{240}(2^{16} - 88) - 1$ | -54314 | 13578 | 126.8 |
| 128 | ed-254-mont | $2^{240}(2^{14} - 127) - 1$ | -55790 | 13947 | 125.8 |
| 128 | ed-256-mers | $2^{256} - 189$ | -61370 | 15342 | 126.8 |
| 128 | ed-255-mers | $2^{255} - 765$ | -240222 | 60055 | 126.3 |
| 192 | ed-384-mont | $2^{376}(2^{8} - 79) - 1$ | -113758 | 28439 | 190.5 |
| 192 | ed-382-mont | $2^{368}(2^{14} - 5) - 1$ | -2870790 | 717698 | 189.8 |
| 192 | ed-384-mers | $2^{384} - 317$ | -1332778 | 333194 | 190.8 |
| 192 | ed-383-mers | $2^{383} - 421$ | -2095962 | 523990 | 190.3 |
| 256 | ed-512-mont | $2^{496}(2^{16} - 491) - 1$ | -305778 | 76444 | 254.8 |
| 256 | ed-510-mont | $2^{496}(2^{14} - 290) - 1$ | -2320506 | 580126 | 253.8 |
| 256 | ed-512-mers | $2^{512} - 569$ | -2550434 | 637608 | 254.8 |
| 256 | ed-511-mers | $2^{511} - 481$ | -4390390 | 1097597 | 254.3 |
| 256 | ed-521-mers | $2^{521} - 1$ | -2201762 | 550440 | 259.3 |

Primes of the form $2^\alpha - \gamma$. Selecting primes of a special form to enhance the performance of the modular reduction is not new. The primes standardized in the digital signature standard [56] have a special form allowing fast reduction based on the work by Solinas [52]. Even faster modular reduction can be achieved by selecting primes of the form $p = 2^\alpha - \gamma$, known as pseudo-Mersenne primes. In this case, the value $\alpha$ is determined by the security parameter and is typically a multiple of 64 (or slightly smaller). The integer $\gamma$ is chosen to be a small positive integer, i.e. significantly smaller than $2^{32}$. Given two integers $x$ and $y$ such that $0 \le x, y < 2^\alpha - \gamma$, one can compute $x \cdot y \bmod (2^\alpha - \gamma)$ by first computing the product and writing this in a radix-$2^\alpha$ system as $x \cdot y = z_h \cdot 2^\alpha + z_\ell$. A first reduction step, based on the shape of the modulus, is $z_h \cdot 2^\alpha + z_\ell \equiv z_\ell + z_h \cdot \gamma = z \pmod{2^\alpha - \gamma}$, where $0 \le z < (\gamma + 1)2^\alpha$. If this step is repeated, the result is such that $0 \le z < 2^\alpha + \gamma^2$, which can finally be brought into the desired range by applying an additional correction modulo $p$ using subtractions. A standard way of enhancing the performance is to use a redundant representation: instead of reducing $z$ to the range $[0, 2^\alpha - \gamma)$, one can often more efficiently reduce $z$ to the range $[0, 2^\alpha)$, or to the range $[0, 2^{2s})$ if $\alpha$ is a few bits smaller than $2s$ (at a target security level of $s$ bits). The latter case can be optimized further by computing exclusively in such a redundant form and performing a sole correction at the end of the scalar multiplication.
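The two folding rounds followed by a final correction can be sketched as follows. This is a minimal illustration on Python integers rather than machine words; the function names are hypothetical, and the final `while` loop is data-dependent, so the sketch is not constant-time and does not use the redundant representation mentioned above.

```python
def reduce_pseudo_mersenne(z, alpha, gamma):
    """Reduce 0 <= z < 2^(2*alpha) modulo p = 2^alpha - gamma by folding."""
    p = (1 << alpha) - gamma
    mask = (1 << alpha) - 1
    # Two folding rounds: z_h * 2^alpha + z_l  ==  z_l + z_h * gamma (mod p).
    for _ in range(2):
        z_h, z_l = z >> alpha, z & mask
        z = z_l + z_h * gamma
    # Now 0 <= z < 2^alpha + gamma^2; finish with plain subtractions.
    while z >= p:
        z -= p
    return z

def mulmod(x, y, alpha, gamma):
    """Multiply x and y modulo 2^alpha - gamma."""
    return reduce_pseudo_mersenne(x * y, alpha, gamma)

alpha, gamma = 256, 189
p = 2**alpha - gamma
print(mulmod(p - 1, p - 2, alpha, gamma))  # 2, since (-1) * (-2) = 2 mod p
```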

Given a security level of $s$ bits, we consider the parameter $\alpha \in \{2s, 2s - 1\}$. Taking $\alpha = 2s$ makes the prime as large as possible, matching one of the requirements to achieve maximal ECDLP security at the $s$-bit security level. Taking $\alpha = 2s - 1$ sacrifices half a bit of ECDLP security in favor of potential enhancements in efficiency, as described above. Thus, fixing $s$ results in two possible values for $\alpha$ and subsequently two primes of the form $2^\alpha - \gamma$: for a fixed $\alpha$, we choose the smallest $\gamma$ such that $2^\alpha - \gamma$ is both prime and congruent to 3 modulo 4 (the rationale behind this congruence condition is discussed below). Following our curve selection criteria, the values $\gamma$ for the curves under analysis are always smaller than $2^{10}$, which makes them attractive for efficient implementation on 16-, 32- and 64-bit platforms.
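The selection rule for $\gamma$ is simple enough to sketch directly. The code below is an illustrative sketch, not the authors' tooling: it uses a probabilistic Miller-Rabin test (so a returned value is prime only with overwhelming probability), and the names `is_probable_prime` and `smallest_gamma` are hypothetical.

```python
import random

def is_probable_prime(n, rounds=32):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    rng = random.Random(0)  # fixed seed so runs are reproducible
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

def smallest_gamma(alpha):
    """Smallest gamma > 0 with 2^alpha - gamma prime and congruent to 3 mod 4."""
    gamma = 1
    while True:
        p = (1 << alpha) - gamma
        if p % 4 == 3 and is_probable_prime(p):
            return gamma
        gamma += 1

print(smallest_gamma(256))  # 189, matching the prime 2^256 - 189 in Table 1
```

Since $2^\alpha \equiv 0 \pmod 4$ for $\alpha \ge 2$, only values $\gamma \equiv 1 \pmod 4$ can pass the congruence check, so a tuned search could step $\gamma$ by 4.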

Primes of the form $2^\alpha(2^\beta - \gamma) - 1$. Another approach to select primes is inspired by Montgomery arithmetic [45]. The idea behind Montgomery multiplication is to replace the relatively expensive divisions by computationally inexpensive logical shifts when computing the modular reduction. Some computations (and storage) can be avoided when primes of the form $p = 2^\alpha(2^\beta - \gamma) - 1$ are used for positive integers $\alpha$, $\beta$ and $\gamma$ (cf. [39, 37, 1, 31, 15]). When the prime $p$ is two bits short of a multiple of the word size $w$ (i.e. $w \mid \alpha + \beta + 2$), one can avoid a conditional subtraction in every multiplication [57].

There are different ways to construct Montgomery-friendly primes: for example, [31] prefers $\gamma$ to be a power of two, while [15] sets $\beta = 64$ and $\gamma$ as small as possible to specifically target 64-bit platforms. We make choices of $\alpha$, $\beta$ and $\gamma$ such that the modular arithmetic can be implemented efficiently on a wide range of platforms. Given a security level of $s$ bits, we consider $\alpha = 8\delta$ and $\beta \in \{2s - \alpha, 2s - 2 - \alpha\}$, and choose $\gamma$ and $\delta$ as the smallest positive integers such that $p = 2^\alpha(2^\beta - \gamma) - 1$ is prime and $\lceil \log_2(p) \rceil = 2s$ (resp. $\lceil \log_2(p) \rceil = 2s - 2$) in the setting of $\beta = 2s - \alpha$ (resp. $\beta = 2s - 2 - \alpha$). We start with $\delta = 1$ and increment it by 1 (if necessary) until $\gamma$ is found. For instance, for $s = 192$ and $\beta = 2s - \alpha$, we observe that $(\delta, \gamma) = (1, 79)$ results in a prime which can be written as

$$2^{376}(2^{8} - 79) - 1 = 2^{352}(2^{32} - 2^{24} \cdot 79) - 1 = 2^{320}(2^{64} - 2^{56} \cdot 79) - 1,$$

for usage on 8-, 32- and 64-bit platforms, respectively. This has the advantage that the reduction step, which has to be computed at every iteration inside the interleaved Montgomery algorithm, can be computed using only a multiply-and-add and an addition instruction. Note that, by construction, primes of this form are always congruent to 3 modulo 4.
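The three representations of the example prime, and the congruence modulo 4, are easy to check numerically. The sketch below also computes the word-level Montgomery constant $\mu = -p^{-1} \bmod 2^w$; that it equals 1 whenever $2^w$ divides $p + 1$ is a standard observation added here for illustration (it is not stated in the text), and the helper name `montgomery_constant` is hypothetical. Modular inversion via `pow` with a negative exponent needs Python 3.8 or later.

```python
def example_prime():
    """The s = 192 example prime 2^376 (2^8 - 79) - 1."""
    return 2**376 * (2**8 - 79) - 1

def montgomery_constant(p, w):
    """mu = -p^(-1) mod 2^w, as used in word-wise Montgomery reduction."""
    return (-pow(p, -1, 1 << w)) % (1 << w)

p = example_prime()
# The same prime, written for 32-bit and 64-bit words.
assert p == 2**352 * (2**32 - 2**24 * 79) - 1
assert p == 2**320 * (2**64 - 2**56 * 79) - 1
assert p % 4 == 3
assert p.bit_length() == 384

# Because p = -1 mod 2^w, the per-word multiplier mu is trivially 1.
for w in (8, 32, 64):
    print(w, montgomery_constant(p, w))  # mu is 1 for every word size
```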

Constant-time modular arithmetic. One of the measures to guard software implementations against various types of side-channel analysis such as timing attacks [38] is to ensure a constant running time. In practice, this often means writing code which does not contain branches depending on secret data. For instance, the interleaved Montgomery multiplication algorithm requires a conditional subtraction at the end. To remove this, we always compute the subtractions and select (mask) the correct value depending on the conditional flag. In the setting of primes of the shape $2^\alpha - \gamma$, one must always compute the worst-case number of reduction rounds in order to ensure constant runtime.
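The "always subtract, then select with a mask" idea can be sketched as below. This only illustrates the branch-free selection pattern: Python's arbitrary-precision integers give no timing guarantees, so a real implementation would apply the same pattern to fixed-width words in C or assembly. The function name and the `nbits` parameter are hypothetical.

```python
def masked_cond_sub(z, p, nbits):
    """Return z - p if z >= p, else z, for 0 <= z < 2p < 2^nbits.

    The subtraction is always computed; a mask derived from the borrow
    selects the result without branching on the (secret) value z.
    """
    diff = z - p
    borrow = (diff >> nbits) & 1   # 1 exactly when diff is negative
    mask = -borrow                 # all-ones if borrow is set, else 0
    return (z & mask) | (diff & ~mask)

print(masked_cond_sub(5, 7, 4))  # 5 (no subtraction needed)
print(masked_cond_sub(9, 7, 4))  # 2 (9 - 7)
```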

Besides the "standard" modular operations, there is also the need for constant-time methods to compute the modular inversion and the modular square roots. In order to compute the inversion modulo a prime $p$, one can use Fermat's little theorem: i.e. compute $a^{p-2} \equiv a^{-1} \pmod{p}$. Since our chosen primes all have a special shape, finding efficient addition chains for this exponentiation is not difficult. For the $n$-bit primes considered in this work, we found that we can always compute the modular inversion using at most $1.11\lceil \log_2(p) \rceil$ modular multiplications and modular squarings. If $p \equiv 3 \pmod{4}$, then one can compute a modular square root $x$ (if it exists) of an element $a$ using $x \equiv a^{(p+1)/4} \pmod{p}$. Since this can be performed efficiently, and in constant-time, we require all of our primes to be congruent to 3 modulo 4.
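Both exponentiations can be sketched with Python's built-in modular `pow`. This is illustrative only: `pow` uses a generic exponentiation rather than the tuned addition chains mentioned above, it is not constant-time, and the function names are hypothetical. The final check in `sqrt_mod` is needed because $a^{(p+1)/4}$ is a square root only when $a$ is a quadratic residue.

```python
P = 2**256 - 189  # the w-256-mers prime; P = 3 (mod 4)

def inv_mod(a, p=P):
    """Inverse of a modulo prime p via Fermat's little theorem: a^(p-2)."""
    return pow(a, p - 2, p)

def sqrt_mod(a, p=P):
    """A square root of a modulo p = 3 (mod 4), or None if none exists."""
    x = pow(a, (p + 1) // 4, p)
    return x if (x * x - a) % p == 0 else None

a = 123456789
print(a * inv_mod(a) % P)                 # 1
print(sqrt_mod(a * a % P) in (a, P - a))  # True
print(sqrt_mod(P - 1))                    # None: -1 is a non-residue when p = 3 (mod 4)
```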

3 Curve Selection

In this section we explain how the curves in Tables 1 and 2 were chosen based on the selection of primes that is outlined in Section 2. In addition to the four primes chosen to target each security level, we also include the Mersenne prime $p = 2^{521} - 1$ to target the 256-bit security level, since Mersenne primes might have a performance benefit. For each chosen prime $p \equiv 3 \pmod{4}$, we provide two curves: one is a prime order short Weierstrass curve, while the other is an almost-prime order twisted Edwards curve.

3.1 Curve selection for Weierstrass curves

For a fixed prime $p$, a specific curve $E_b : y^2 = x^3 - 3x + b$ is uniquely determined by the curve parameter $b \in \mathbb{F}_p \setminus \{\pm 2, 0\}$. Note that, since $p \equiv 3 \pmod{4}$, its non-trivial quadratic twist $E'_b$ has the curve equation $E'_b : y^2 = x^3 - 3x - b$. In order to guarantee twist-security [5], we require both the group orders $r = \#E_b(\mathbb{F}_p)$ and $r' = \#E'_b(\mathbb{F}_p)$ to be prime. We have $r = p + 1 - t$ and $r' = p + 1 + t$ for $|t| \le 2\sqrt{p}$ and demand $|t| > 1$ because curves with $t \in \{0, 1\}$ are weak. Thus, depending on the sign of the trace $t$, either $r > p$, $r' < p$ or $r < p$, $r' > p$. To ease implementation, we demand that $r < p$ for all curves, i.e. we choose the curve with positive trace. To leave no room for manipulating the curve choice, we select all curve parameters deterministically, namely by choosing the integer $b$ with the smallest absolute value that yields a curve with the above properties. Based on these considerations, the selection process is completely explained in accordance with the rigidity condition of [12]. Specifically, we search for a suitable coefficient $b$ by starting with $b = 1$ and incrementing $b$ by one until both $r$ and $r'$ are prime. For each value of $b$, we use the Schoof-Elkies-Atkin (SEA) point counting algorithm [50] in Magma [17] to compute the trace $t$ of $E_b$, such that $r = p + 1 - t$ and $r' = p + 1 + t$. We use the implementation's `early abort' feature that abandons the computation when small factors are found either in the curve's or the twist's group order. Because of the curve model for $E'_b$, the search only considers positive values of $b$ and we select the sign of $b$ to ensure that $r < p$. The resulting curves are summarized in Table 1.
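The shape of this search can be illustrated on toy primes. The sketch below is not the authors' procedure: it replaces SEA point counting in Magma with a naive character-sum count that is only feasible for very small $p$, uses trial division for primality, and all function names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test (toy sizes only)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def legendre(a, p):
    """Legendre symbol (a/p) in {-1, 0, 1} via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def count_points(b, p):
    """#E_b(F_p) for y^2 = x^3 - 3x + b, including the point at infinity."""
    return p + 1 + sum(legendre(x**3 - 3 * x + b, p) for x in range(p))

def search_weierstrass(p):
    """Smallest |b| with r, r' both prime and |t| > 1; returns (b, r, r')."""
    assert p % 4 == 3
    for b in range(1, p):
        if b in (2, p - 2):                # b = +-2 gives a singular curve
            continue
        n = count_points(b, p)
        n_twist = 2 * p + 2 - n            # order of the twist y^2 = x^3 - 3x - b
        t = p + 1 - n
        if abs(t) > 1 and is_prime(n) and is_prime(n_twist):
            # Pick the sign of b so that the curve order is below p.
            return (b, n, n_twist) if n < p else (-b, n_twist, n)
    return None

print(search_weierstrass(11))  # (-1, 7, 17)
```

For $p = 11$ the curve with $b = 1$ has 17 points and its twist has 7, so the sign is flipped to obtain $r = 7 < p$.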

3.2 Curve selection for twisted Edwards (and Montgomery) curves

For a fixed prime $p$, a specific twisted Edwards curve $E_d : -x^2 + y^2 = 1 + dx^2y^2$ is uniquely determined by the curve parameter $d \in \mathbb{F}_p \setminus \{0, -1\}$. Let $A = 2\frac{1-d}{d+1}$, and $B = -(A + 2)$. Theorem 3.2 of [7] shows that the twisted Edwards curve $E$ and the Montgomery curve $By^2 = x^3 + Ax^2 + x$ are birationally equivalent. If $B$ is a square in $\mathbb{F}_p$ (which it is for all our curves), then $E_d$ is birationally equivalent to $E_A : y^2 = x^3 + Ax^2 + x$. As for the Weierstrass curves, we demand $t > 1$ to exclude the weak curves with $t \in \{0, 1\}$ and to ensure that $4r < p$.

Ideally, it would be desirable to have a curve with minimal parameter $d$ in the twisted Edwards form and minimal parameter $A$ in the Montgomery form. Unfortunately, existing curve proposals have been forced to pick one form and optimize it at the expense of the other one. We show in Section 3.3 below that a search minimizing the absolute value of the parameter $d$ would find curves with the same group orders for curve and twist, where the latter corresponds to $-(d + 1)$. This means that a search for minimal absolute value of $d$ will always find positive $d$ first, which corresponds to negative $A$. Our search thus minimizes the absolute values of $A$ and $d$ at the same time.

For each fixed $p$, we start with $A = -6$ and search for $A \in 2 + 4\mathbb{Z}$ (subtracting 4 each time) until $\#E_A = 4r$ and $\#E'_A = 4r'$, where $r$ and $r'$ are both prime. Note that the discussion in Section 3.3 also shows that $B = -(A + 2)$ is always a square in $\mathbb{F}_p$, which means that $E'_A : y^2 = x^3 - Ax^2 + x$ is a model for the non-trivial quadratic twist of $E_A$. Again, for each $A$, we use the SEA algorithm [50] in Magma [17] to compute the trace $t$ of $E$, which determines $\#E_A = p + 1 - t$ and $\#E'_A = p + 1 + t$. Section 3.3 also shows that $A^2 - 4$ is non-square in $\mathbb{F}_p$, which simplifies notions of completeness on $E$ (see [5]). Furthermore, we check that the curve satisfies all conditions posed by [12]; if one of them is not met², we continue with the next value for $A$. We note that the cofactors of 4 are minimal when insisting on an $\mathbb{F}_p$-rational twisted Edwards and/or Montgomery form. The resulting curves are summarized in Table 2.

3.3 Correspondence between minimal A and d for twisted Edwards curves

Table 2 contains a column with values for the parameter $d_0 = -(A + 2)/4$, which can be used for implementing twisted Edwards curves defined over our prime fields. The curve $E_{d_0}/\mathbb{F}_p : -x^2 + y^2 = 1 + d_0x^2y^2$ has the same number of $\mathbb{F}_p$-rational points as the curve $E_d/\mathbb{F}_p : -x^2 + y^2 = 1 + dx^2y^2$ with $d = -(A - 2)/(A + 2)$ and the Montgomery curve $E_A/\mathbb{F}_p : y^2 = x^3 + Ax^2 + x$. Furthermore, the curve $E_{-(d_0+1)}/\mathbb{F}_p : -x^2 + y^2 = 1 - (d_0 + 1)x^2y^2$ has the same number of $\mathbb{F}_p$-rational points as the quadratic twist $E'_d$ and the quadratic twist $E_{-A}$. In this section, we show that this is true in general, and that therefore, the relation between $d_0$ and $A$ shows that the value $d_0$ is the minimal value for $d$ defining $E_d$ such that all the criteria in our curve selection are satisfied if and only if $A$ is the minimal such value for the Montgomery form. This shows that it is not necessary to search for a new set of twisted Edwards curves if one wants to minimize the parameter $d$ instead of the Montgomery parameter $A$. One can simply use the curve defined by $d_0$.
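The relations between $A$, $d$ and $d_0$ can be checked numerically for a row of Table 2. The sketch below is illustrative only (the helper name `edwards_params` is hypothetical); it uses the ed-256-mers row, with $p = 2^{256} - 189$ and $A = -61370$, and verifies $d_0 = -1/(d+1)$ in the equivalent form $d_0(d+1) \equiv -1 \pmod{p}$. Modular inversion via `pow` with a negative exponent needs Python 3.8 or later.

```python
def edwards_params(A, p):
    """Return (d, d0) modulo p for the Montgomery coefficient A."""
    d = (-(A - 2) * pow((A + 2) % p, -1, p)) % p   # d  = -(A - 2)/(A + 2)
    d0 = (-(A + 2) * pow(4, -1, p)) % p            # d0 = -(A + 2)/4
    return d, d0

p = 2**256 - 189      # ed-256-mers prime
A = -61370            # Montgomery coefficient from Table 2
d, d0 = edwards_params(A, p)

print(d0)                            # 15342, the small parameter listed in Table 2
print((d0 * (d + 1)) % p == p - 1)   # True: d0 = -1/(d + 1) in F_p
print(A == -(4 * d0 + 2))            # True
```

Note that $d$ itself is a full-size field element, while $d_0$ is a small integer, which is the practical benefit of working with $E_{d_0}$.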

The following lemma connects the two twisted Edwards curves $E_d$ and $E_{d_0}$ via an isogeny whenever $d_0 = -1/(d + 1)$. It also gives a condition on $d_0$ which determines whether the map is defined over $\mathbb{F}_p$. If this is the case, both curves have the same number of $\mathbb{F}_p$-rational points.

Lemma 1. Let $E_d : -x^2 + y^2 = 1 + dx^2y^2$ be a twisted Edwards curve defined over a prime field $\mathbb{F}_p$ and let $d_0 = -1/(d + 1) \in \mathbb{F}_p$. Then there exists a 4-isogeny $\varphi : E_d \to E_{d_0}$. If $d_0$ is a square in $\mathbb{F}_p$, the isogeny is defined over $\mathbb{F}_p$, in particular $\#E_d(\mathbb{F}_p) = \#E_{d_0}(\mathbb{F}_p)$.

Proof. The isogeny $\varphi$ is one of the isogenies described in Section 3 of [2]. This means it is the composition of maps

$$\varphi = \psi_{-1,-1/(d+1)} \circ \sigma \circ \psi_{-1,d}.$$

The map $\psi_{-1,d}$ is the 2-isogeny $\psi_{-1,d} : E_d \to L_{-d}$ to the Legendre form curve $L_{-d} : y^2 = x(x - 1)(x + d)$ given in [2, Theorem 3.2], and $\psi_{-1,-1/(d+1)} : L_{1/(d+1)} \to E_{-1/(d+1)}$ is the dual of the corresponding isogeny for $1/(d + 1)$. The map $\sigma$ is equal to the isomorphism $\sigma_2\sigma_1 : L_{-d} \to L_{1/(d+1)}$ given in [2, Section 3.2]. The composition $\varphi$ is defined over $\mathbb{F}_p$ if $d_0$ and thus $-(d + 1)$ is a square in $\mathbb{F}_p$. This proves the lemma. □

The next result uses the previous isogeny to show that the original curve $E_d$ and its twist $E'_d$ each have corresponding curves with small parameters $d_0$ and $-(d_0 + 1)$, respectively, which have the same number of $\mathbb{F}_p$-rational points, provided that both these small parameters are squares in $\mathbb{F}_p$.

² The only instance where the first twisted Edwards curve we found did not fulfill all of the safecurves requirements was in the search for ed-383-mers: the constant $A = 1629146$ corresponds to a curve-twist pair with $\#E_A = 4r$ and $\#E'_A = 4r'$, where $r$ and $r'$ are both prime, but the embedding degree of $E_A$ with respect to $r$ is $(r - 1)/188$, which fails to meet the minimum requirement of $(r - 1)/100$ imposed in [12].


Lemma 2. Let $A \in \mathbb{F}_p \setminus \{-2, 2\}$, $d = -(A - 2)/(A + 2)$ and $d_0 = -(A + 2)/4$ such that both $d_0$ and $-(d_0 + 1)$ are squares in $\mathbb{F}_p$. Then $\#E_d(\mathbb{F}_p) = \#E_{d_0}(\mathbb{F}_p)$. Moreover, $\#E'_d(\mathbb{F}_p) = \#E_{-(d_0+1)}(\mathbb{F}_p)$.

Proof. The first part follows from Lemma 1 because $d_0 = -1/(d + 1)$. Since the twist $E'_d = E_{1/d}$, the second part follows from Lemma 1 with $d$ replaced by $1/d$, which means that $d_0$ is replaced by $-(d_0 + 1)$. □
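The two substitutions used in this proof can be written out explicitly. The following short derivation is added for illustration and uses only the definitions of $d$ and $d_0$ from the statement of Lemma 2:

```latex
\begin{align*}
d + 1 &= \frac{-(A - 2) + (A + 2)}{A + 2} = \frac{4}{A + 2}
  &&\Longrightarrow\quad -\frac{1}{d + 1} = -\frac{A + 2}{4} = d_0, \\[4pt]
-\frac{1}{\tfrac{1}{d} + 1} &= -\frac{d}{d + 1}
  = -\left(1 - \frac{1}{d + 1}\right) = -(d_0 + 1)
  &&\text{(replacing $d$ by $1/d$).}
\end{align*}
```

The first line gives the hypothesis of Lemma 1 for the pair $(d, d_0)$; the second shows that the same hypothesis holds for the pair $(1/d, -(d_0 + 1))$.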

Finally, we show that indeed our search criteria, in particular the facts that $p \equiv 3 \pmod{4}$ and that both group orders are not divisible by 8, imply that $d_0$ and $-(d_0 + 1)$ as given in our setting are squares in $\mathbb{F}_p$, which shows that the correspondence above holds.

Lemma 3. Let $p \equiv 3 \pmod{4}$, $d_0 \in \mathbb{F}_p$ and let $E_{d_0} : -x^2 + y^2 = 1 + d_0x^2y^2$ be a twisted Edwards curve such that $\#E_{d_0}(\mathbb{F}_p) = 4r$ and $\#E'_{d_0}(\mathbb{F}_p) = 4r'$ for primes $r$ and $r'$. Then $d_0$ and $-(d_0 + 1)$ are both squares in $\mathbb{F}_p$.

Proof. We first prove that $d_0$ is a square in $\mathbb{F}_p$. Assume that it is not a square. Section 3 in [8] provides an exhaustive description of all points of order 2 and 4 on a twisted Edwards curve. If $d_0$ is not a square, then $-1/d_0$ is a square because $p \equiv 3 \pmod{4}$. Then the full 2-torsion is defined over $\mathbb{F}_p$; it consists of the affine point $(0, -1)$ and two points at infinity $((1 : 0), (\pm\sqrt{-1/d_0}))$ (written as completed points in projective space $\mathbb{P}^1 \times \mathbb{P}^1$, see [8, Section 2.7]). Let $s \in \mathbb{F}_p$ with $s^2 = -1/d_0$; then exactly one of $\pm s$ is a square, assume without loss of generality that it is $s$. Then this value gives 4 affine points $(\pm\sqrt{s}, \pm\sqrt{s})$ (signs chosen independently) of order 4 defined over $\mathbb{F}_p$. The group structure of the 4-torsion on $E_{d_0}$ that is defined over $\mathbb{F}_p$ is thus $\mathbb{Z}_2 \times \mathbb{Z}_4$ and has order 8. Therefore 8 must divide $\#E_{d_0}(\mathbb{F}_p)$, which contradicts our assumption that the group order is $4r$ for $r$ prime. Hence, $d_0$ is a square.

We know that the twist $E'_{d_0}$ is birationally equivalent to $E_{1/d_0}$, and we have already shown that $d_0$ is a square, so $1/d_0$ is a square. We can apply Lemma 1 with $d_0$ replaced by $1/d_0$, which means that $d = -(d_0 + 1)$, and obtain that $\#E_{-(d_0+1)}(\mathbb{F}_p) = \#E_{1/d_0}(\mathbb{F}_p) = 4r'$. Now looking at the 4-torsion defined over $\mathbb{F}_p$ as above yields that $-(d_0 + 1)$ is a square in $\mathbb{F}_p$. □

The minimality of $d_0$. All our selected twisted Edwards curves satisfy the conditions of the previous two lemmas. Therefore, one can choose to work with the isogenous curves defined by $d_0$ or $-(d_0 + 1)$, whichever is more convenient. The isogenous curves and their twists have the same orders as the original curves and their twists. Therefore all conditions required in the curve selection are satisfied with the added benefit of a small $d$-value.

We argue that $d_0$ is of minimal absolute value among the parameters defining a curve that satisfies the search criteria. Assume that $A$ is a coefficient with minimal absolute value that yields a desired curve when minimizing for the Montgomery parameter (like the values for $A$ in our examples). A search that minimizes the absolute value of the parameter $d$ in the twisted Edwards model $E_d$ must find the value $d_0$ (or $-(d_0+1)$) first, since $A = -(4d_0+2)$. Without loss of generality, let $|d_0| < |d_0+1|$, i.e. $d_0 > 0$; otherwise interchange $d_0$ and $-(d_0+1)$. Indeed, assume that a $d_1$ with $|d_1| < |d_0|$ leads to a curve that satisfies all criteria. Let $A_1 = -(4d_1+2)$. By Lemma 3, $d_1$ and $-(d_1+1)$ are squares; then by Lemma 2, $\#E_{d_1}(\mathbb{F}_p) = \#E_{d}(\mathbb{F}_p) = \#E_{A_1}(\mathbb{F}_p)$, where $d = -(A_1-2)/(A_1+2)$, and $\#E'_{d_1}(\mathbb{F}_p) = \#E_{-(d_1+1)}(\mathbb{F}_p) = \#E_{1/d_1}(\mathbb{F}_p) = \#E_{-A_1}(\mathbb{F}_p)$. This means that the curve $E_{A_1}$ satisfies the search criteria.

Since we fixed $d_0 > 0$, we have $A < 0$. By assumption, we have $-A = 4d_0+2 = 4|d_0|+2 > 4|d_1|+2$. Now consider the two cases $d_1 > 0$ and $d_1 < 0$. If $d_1 > 0$, then $A_1 = -(4d_1+2) < 0$, and $|A| = -A > 4|d_1|+2 = 4d_1+2 = -A_1 = |A_1|$, contradicting the minimality of $A$. Similarly, if $d_1 < 0$, then $A_1 > 0$ and $|A| = -A > 4|d_1|+2 > 4|d_1+1|+2 = -4(d_1+1)+2 = -(4d_1+2) = A_1 = |A_1|$, again a contradiction. Overall, this means that $d_0$ must be the coefficient with minimal absolute value.
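The interchange of $d_0$ and $-(d_0+1)$ used in this argument can be checked directly from the relation $A = -(4d_0+2)$. The worked step below is only a sketch; the function notation $A(d)$ is introduced here for illustration.

```latex
% A(d) := -(4d+2) is shorthand introduced for this sketch (an assumption).
A\bigl(-(d_0+1)\bigr) = -\bigl(4\cdot(-(d_0+1)) + 2\bigr) = 4d_0 + 2 = -A(d_0),
\qquad
|A(d)| =
\begin{cases}
4|d| + 2,   & d > 0,\\
4|d+1| + 2, & d < 0 .
\end{cases}
```

Under this reading, replacing $d_0$ by $-(d_0+1)$ replaces $A$ by $-A$, which is consistent with the two cases treated above.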

3.4 Curve properties

In both families of curves, note that for primes of the form $2^\alpha - \gamma$, the bitlengths of $r$ and $r'$ differ by 1, since $|t| \gg \gamma$ in general; for primes of the form $2^\alpha(2^\beta - \gamma) - 1$, the bitlengths of $r$ and $r'$ are always equal when $\gamma \neq 0$. The curves in Table 2 can be used in different curve models: in the twisted Edwards model, in the Montgomery model for implementing Montgomery ladders, and also in the original Edwards model allowing complete addition formulas [11]. The latter can be seen as follows. Since $p \equiv 3 \pmod 4$, $E_A$ is birationally equivalent to an Edwards curve by [7, Theorem 3.4]. Using the maps discussed in [7, Section 3], one can show that $E_A : y^2 = x^3 + Ax^2 + x$ is birationally equivalent to $E_{-1/d} : x^2 + y^2 = 1 - (1/d)x^2y^2$. For all of our curves, $d$ is a square in $\mathbb{F}_p$, so $-1/d$ is not a square, which means that the addition law on $E_{-1/d}$ is complete. All of the curves in Table 2 allow for an efficient map from a subset of their $\mathbb{F}_p$-rational points to bit strings of a certain length, such that they are indistinguishable from uniform random bit strings of the same length (see [10], which is based on [28]). However, note that curves defined over pseudo-Mersenne primes are more suitable for achieving indistinguishability than those over Montgomery-friendly primes because for the latter primes $p$, the value $(p+1)/2$ is further away from a power of 2 (see [10, §2.6]).

The prime-order Weierstrass curves presented in Table 1 are similar in their basic properties to the NIST curves, as they have the same curve model, share the parameter $a = -3$, and include prime fields of the same bit lengths as the ones for the NIST curves [56]. However, we stress that the curves in Table 1 do not allow any room for manipulations, which can be the case when the curve parameter $b$ is allowed to be chosen "randomly". Our curves are twist-secure, do not allow transfers, and have large discriminants (notions used to guard against certain attacks; e.g., see [12]). The work in [55] shows that indistinguishability can also be achieved for our prime-order Weierstrass curves in Table 1; however, the resulting bit strings are twice as large as those that result from applying [10, 28] to the twisted Edwards curves in Table 2.

4 Efficient, Constant-time, and Exceptionless Scalar Multiplications

To protect against certain types of side-channel attacks [38], it is essential that scalar multiplications are computed in constant time. This means that the running time of the algorithm for computing a scalar multiplication $kP$ must be independent of the scalar $k$ and the point $P$. Classical curve arithmetic formulas have exceptional cases, i.e. they do not work for all points. Having conditional statements in the code that check for these cases means the algorithms have a variable running time depending on different input cases, but simply leaving them out might lead to exceptional point attacks that produce wrong results or cause other implementation errors. In this section we outline how constant-time algorithms can be achieved efficiently for our chosen Weierstrass and twisted Edwards curves in two different settings: the variable- and fixed-base scenarios. The variable-base scenario refers to the case in which the base point $P$ can be different for each execution of the algorithm. In the fixed-base case, multiples of a public constant point can be precomputed, which allows different optimization possibilities. In Appendix A we present an algorithm for the double-scalar scenario, which carries out a computation of the form $k_1P_1 + k_2P_2$ (see Algorithm 9). This occurs for example in the verification of ECDSA signatures. In this setting the verification algorithm operates on public inputs only, and there is no need for protecting secret inputs against side-channels. Since the implementation does not have to run in constant time, one can profit from more efficient variable-time algorithms.

We discuss the various cases for implementing scalar multiplication for the different curve models and algorithm choices. We list all algorithms as pseudo-code in Appendix A (scalar multiplication, point validation, precomputation and recoding) and in Appendix B (point operations). The reader is referred to Appendix C for complete details on the selection of explicit formulas. Note that several of these algorithms contain if-statements, which are marked in the pseudo-code according to their nature. For example, some of these statements occur in algorithms that are only run on public inputs and do not need to run in constant time; some of them are implemented in constant time via masking techniques; and some of them are there merely to allow us to represent several algorithms in one pseudo-code algorithm environment and to re-use the different variants in different scenarios. As soon as a specific scenario is chosen, these statements are always executed under the same condition. The remaining if-statements are the ones that, when implemented, introduce data-dependent branches into the algorithms. They occur only in algorithms for point doubling, point addition and merged point doubling/addition, where they correspond to exceptions, i.e. the exceptional cases for which the given formulas are not valid. However, whenever the implementation needs to be constant-time, the conditions for entering these if-statements are always false, such that they are never executed (and can be removed in the code). Below, we argue that indeed no exceptional cases occur and that the proposed algorithms can be implemented to run in constant time (when used as described in the algorithms in Appendix A). Note that the neutral element on Weierstrass curves is the point at infinity, i.e. the point $(0 : 1 : 0)$ in projective coordinates, while on twisted Edwards curves the neutral element is the rational point $(0, 1)$, and in the Montgomery ladder the neutral element is $(X : Z) = (0 : 0)$. In this paper, they are all denoted by $O$.
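The masking techniques mentioned above can be illustrated as follows. This is a minimal sketch and not code from the library described in this paper: the helper names `ct_mask`, `ct_select`, `ct_eq` and `ct_lookup` are hypothetical, words are modelled as 64-bit integers, and Python itself gives no timing guarantees, so the snippet only shows the branch-free data flow.

```python
MASK64 = (1 << 64) - 1  # model 64-bit machine words (assumption for this sketch)

def ct_mask(bit):
    """Expand a bit (0 or 1) into an all-zero or all-one 64-bit mask."""
    return (-bit) & MASK64

def ct_select(bit, a, b):
    """Return a if bit == 1, else b, without branching on bit."""
    m = ct_mask(bit)
    return (a & m) | (b & ~m & MASK64)

def ct_eq(x, y):
    """Return 1 if the 64-bit words x and y are equal, else 0, without branching."""
    d = (x ^ y) & MASK64
    # For nonzero d, either d or its two's complement negation has the top bit set.
    return 1 ^ (((d | (-d & MASK64)) >> 63) & 1)

def ct_lookup(table, index):
    """Read table[index] while touching every entry, hiding the access pattern."""
    result = 0
    for i, entry in enumerate(table):
        result = ct_select(ct_eq(i, index), entry, result)
    return result
```

A lookup such as `ct_lookup([10, 20, 30, 40], 2)` scans all four entries and keeps only the one whose index matches, which is the access pattern described for cache-attack resistant table reads.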

4.1 Weierstrass Scalar Multiplications

Let $E_b/\mathbb{F}_p$ be any of the Weierstrass curves in Table 1, with $r = \#E_b(\mathbb{F}_p)$ prime. Let $k$ be an integer scalar and $P = (x_1, y_1) \in \mathbb{F}_p \times \mathbb{F}_p$. We consider the computation of efficient, constant-time and exception-free scalar multiplications in two scenarios.

The variable-base scenario. On input of the scalar $k$ and variable point $P = (x_1, y_1)$, perform the following steps.

1. Validation: Validate that $k \in [1, r)$ and that $P = (x_1, y_1) \in E_b(\mathbb{F}_p) \setminus \{O\}$ by checking that $y_1^2 = x_1^3 - 3x_1 + b$. Otherwise, return false (see Algorithm 2).

2. Precomputation: For a fixed window size $2 \le w < 10$, compute the $2^{w-2}$ multiples $\{P, 3P, \ldots, (2^{w-1}-1)P\}$ of $P$, and store them in a lookup table. This precomputation can be achieved using one point doubling and $2^{w-2}-1$ point additions³ (see Algorithm 4).

3. Scalar recoding: Convert the scalar $k$ to odd (if even) and recode it into exactly $\lceil \log_2(r)/(w-1) \rceil + 1$ odd, signed, non-zero digits in $\{\pm 1, \pm 3, \ldots, \pm(2^{w-1}-1)\}$ (see Algorithm 6).

4. Evaluation: Compute $kP$ using a fixed window with the precomputed values from the previous step. This requires exactly $(w-1)\lceil \log_2(r)/(w-1) \rceil$ point doublings and $\lceil \log_2(r)/(w-1) \rceil$ point additions, or $(w-2)\lceil \log_2(r)/(w-1) \rceil + 1$ point doublings, $\lceil \log_2(r)/(w-1) \rceil - 1$ point doubling-additions and one addition when $w > 2$. Note that every time an addition is performed, we also negate the selected point in the look-up table, and choose the correct one according to the sign of the digit in the recoded scalar. This is repeated until the last iteration, when crucially, the final addition is performed via a "complete masked" addition (see Appendix C.1). The final result is negated if the original value of $k$ was even.

3 Except for when $w = 2$, where this comes for free.

This can be computed as outlined in Algorithm 1 in Appendix A.
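The recoding and evaluation steps above can be sketched on plain integers. The following is an illustration under stated assumptions, not a transcription of Algorithms 1 and 6: the function names are hypothetical, an even scalar is assumed to be made odd by replacing $k$ with $r - k$ (which is consistent with negating the final result), the digit rule is one standard way to obtain odd signed digits, and the point $P$ is modelled by the integer 1 so that only the running multiple $z$ is tracked.

```python
def recode_odd_signed(k, r, w):
    """Recode k in [1, r) into t+1 odd signed digits (least significant first).

    Assumes r is an odd prime, so ceil(log2(r)) equals r.bit_length().
    Returns (digits, was_even)."""
    assert 2 <= w < 10 and 1 <= k < r and r % 2 == 1
    t = -(-r.bit_length() // (w - 1))       # ceil(log2(r) / (w-1))
    was_even = (k % 2 == 0)
    if was_even:                            # a real implementation would mask, not branch
        k = r - k                           # r odd and k even, so r - k is odd
    digits = []
    for _ in range(t):
        d = (k % (1 << w)) - (1 << (w - 1)) # odd digit in [-(2^(w-1)-1), 2^(w-1)-1]
        digits.append(d)
        k = (k - d) >> (w - 1)              # exact division; the result stays odd
    digits.append(k)                        # most significant digit, positive
    return digits, was_even

def evaluate_integer_model(digits, w):
    """Fixed-window evaluation on integers: w-1 doublings, then one signed addition."""
    z = digits[-1]
    trace = [z]
    for d in reversed(digits[:-1]):
        z = (z << (w - 1)) + d
        trace.append(z)
    return z, trace
```

In this model the list `trace` records the running multiple after each block of $w-1$ doublings and one addition, which makes it easy to observe that the value does not decrease along the way.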

Proposition 4. When computing variable-base scalar multiplications on any of the Weierstrass curves in Table 1 using Algorithm 1 to implement the steps above, no exceptions occur.

Before proving the proposition, we fix notation to partition the non-zero points in a prime order subgroup of the group $E_b(\mathbb{F}_p)$. For a fixed point $P \in E_b(\mathbb{F}_p) \setminus \{O\}$, the map $[1, r) \to E_b(\mathbb{F}_p) \setminus \{O\}$, $k \mapsto kP$ is a bijection. It induces a partition $E_b(\mathbb{F}_p) \setminus \{O\} = S_{\mathrm{odd}} \cup S_{\mathrm{even}}$ into two equally sized sets, where $S_{\mathrm{odd}} = \{kP \mid k \in [1, r) \text{ odd}\}$ and $S_{\mathrm{even}} = \{kP \mid k \in [1, r) \text{ even}\}$. Let $T = \{P, 3P, \ldots, (2^{w-1}-1)P\} \subset S_{\mathrm{odd}}$ and $T^{-1} = \{(r-1)P, (r-3)P, \ldots, (r-(2^{w-1}-1))P\} \subset S_{\mathrm{even}}$. The set $T^{-1}$ contains the inverses of the points in the set $T$.

Proof. To exclude any exceptions in the course of Algorithm 1, we consider all of its doubling, addition and merged doubling/addition operations. First of all, it is easy to see that all doubling and addition steps for building the look-up table are exception-free. Note that the look-up table consists exactly of the points in the set $T$ defined above. The precomputation as shown in Algorithm 4 starts by doubling $P$ with Algorithm 10. The algorithm works for the point at infinity $O$ when defined as $(0 : Y_1 : 0)$ with $Y_1 \neq 0$, but the case $P = O$ is excluded by point validation, and it does not have any exceptions since there are no points of order 2 in the group $E(\mathbb{F}_p)$. The points for the look-up table are then computed by adding $2P \in S_{\mathrm{even}}$ to points from $T \subset S_{\mathrm{odd}}$ only, i.e. the input points to the additions are always different and do not include $O$. Also $-2P = (r-2)P$ is not among these points because $2^{w-1}-1 < r-2$ (note $2 \le w < 10$).

The operations in the evaluation stage depend on the recoding of the scalar $k$, which at this point in the algorithm satisfies $0 < k < r$. Let $t = \lceil \log_2(r)/(w-1) \rceil$, then with notation as in Algorithm 1, the scalar can be written as

$$k = \sum_{i=0}^{t} s_i |k_i| 2^{(w-1)i},$$

where $s_i \in \{-1, 1\}$ and $k_i \in \mathbb{Z}$ with $0 < |k_i| < 2^{w-1}$. The recoding used here guarantees $k_t > 0$ such that $s_t = 1$ and $|k_t| = k_t$. Throughout the evaluation stage, the variable $Q$ is used to denote the running value during the algorithm. At any stage, there is some $z \in [0, r)$ such that $Q = zP$. Let $z_1 > 0$ and $z_2 = 2^{w-1}z_1 \pm z_0$ with $z_0 \in \{1, 3, \ldots, 2^{w-1}-1\}$, then $z_2 \ge z_1$. If $z_1 > 1$, we even have $z_2 > z_1$. This means that whenever a positive integer is doubled $w-1$ times and then an integer corresponding to one of the elements in the look-up table is either added or subtracted, the result cannot be smaller than the original integer. Thus, in the evaluation stage of Algorithm 1, after each sequence of $w-1$ doublings and one addition step, the value $z$ of the running point $Q$ cannot decrease.

The evaluation stage begins with choosing an element from the lookup table $T$ and assigning it to $Q$. After the first assignment, we have $z \in \{1, 3, \ldots, 2^{w-1}-1\}$. All the doubling operations in Lines 11, 14 and 18 of Algorithm 1 are done using Algorithm 10. Therefore, for the same reasons as explained above, there are no exceptions possible in these steps. The last addition in Line 19 is done with a complete addition formula and hence also does not have any exceptional cases. It now suffices to ensure that all remaining addition steps (i.e. in Lines 12 and 15) do not run into exceptions.

First, assume that an exceptional case occurs in one of the additions in Step 15, which computes $Q + R$ for $R \in T \cup T^{-1}$ using Algorithm 12. Note that none of the doubling steps can ever output $O$ because there are no points of order 2, and $O$ is never input to any of them since the running value $Q$ always has $1 < z < r$ for all points input to doubling steps prior to any of the additions in Step 15. Thus the only exceptional cases that could occur in this algorithm are the cases where $Q = \pm R$. This means that either $Q \in T$ or $Q \in T^{-1}$. Since $Q$ is the output of a non-trivial doubling operation, we have $Q \in S_{\mathrm{even}}$, which excludes $Q \in T$ and means that $Q \in T^{-1}$. Therefore, $Q = zP$ with $z \ge r - (2^{w-1}-1)$. After each addition in Step 15 there are always $w-1$ doublings that follow. Hence, the minimal value for $z$ that can occur after the exceptional addition and the following doublings is $2^{w-1}(r - 2(2^{w-1}-1))$. The addition of a table element immediately after these doublings can bring down this value to the minimal $z_{\min} = 2^{w-1}(r - 2(2^{w-1}-1)) - (2^{w-1}-1) = 2^{w-1}r - (2^w+1)(2^{w-1}-1)$. This value is larger than $r$, because otherwise it follows that $r \le 2^w+1$, which is not true for any of our curves. Given the observation that a positive integer does not decrease after any sequence of $w-1$ doublings and a following addition of an integer corresponding to a look-up table element, the scalar $k$ cannot be reached any more as the final value for $z$ after the exceptional addition. This contradicts any exceptions in the additions of Step 15.
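The claim that $z_{\min}$ exceeds $r$ unless $r \le 2^w + 1$ follows from a one-line factorization. The following worked step is a sketch added for the reader and uses only the quantities defined in the paragraph above.

```latex
z_{\min} - r
  = 2^{w-1} r - (2^{w}+1)(2^{w-1}-1) - r
  = (2^{w-1}-1)\bigl(r - (2^{w}+1)\bigr).
% For w \ge 2 the factor 2^{w-1}-1 is at least 1, so
% z_{\min} \le r would force r \le 2^{w}+1.
```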

Next, assume that an exception occurs in one of the steps in Line 12 of Algorithm 1. This step is a merged doubling and addition step and is computed via Algorithm 11. The algorithm computes $2Q + R$ for $R \in T \cup T^{-1}$ as $(Q + R) + Q$. For the same reasons as above, the input point $Q$ cannot be equal to $O$. Since $R \in T \cup T^{-1}$, we have $R \neq O$. The first addition $Q + R$ could have the same exceptions as the additions in Step 15 treated in the previous paragraph. This means that an exception can only be $Q \in T^{-1}$ as above, and again we look at the minimal value $z_{\min}$ after carrying out the exceptional addition, the addition of $Q$ and the following $w-1$ doublings and subsequent addition (also the steps including the merged doubling and addition algorithm can be treated as such). This value is $z_{\min} \ge 2^{w-1} \cdot (2r - 3(2^{w-1}-1)) - (2^{w-1}-1) = 2^w r - (3 \cdot 2^{w-1}+1)(2^{w-1}-1)$. Again, this value is larger than $r$, because otherwise we would have $r \le 3 \cdot 2^{w-1}+1$, which does not hold for our curve parameters. As above, this means that the scalar $k < r$ cannot be reached as the final value of $z$, contradicting any exception in the first addition in $(Q + R) + Q$. Finally, we assume that there is an exception in the second addition. We have already excluded $Q = O$ and $Q + R = O$. Hence, the only two possibilities for an exception are $Q + R = Q$ or $Q + R = -Q$. The first condition means that $R = O$, which is not possible since $R \in T \cup T^{-1}$. We are thus left with the condition $2Q = -R$ and hence either $2Q \in T$ or $2Q \in T^{-1}$. Since $2Q \in S_{\mathrm{even}}$, it cannot be in $T$, which leaves $2Q \in T^{-1}$. This means that $2z \ge r - (2^{w-1}-1)$. The minimal value $z_{\min}$ after the computation $(Q + R) + Q$ and the following $w-1$ doublings and another addition is $z_{\min} \ge 2^{w-1}(r - 2(2^{w-1}-1)) - (2^{w-1}-1) = 2^{w-1}r - (2^w+1)(2^{w-1}-1)$. Again, this value is larger than $r$, leaving no way to achieve the scalar $k$ during the remaining computation. This excludes all exceptions in Line 12 and therefore all exceptions in Algorithm 1. □

Given that the recoding always produces a fixed length for the scalar, this means that after a successful validation step, we do not execute any conditional statements.


The fixed-base scenario. In this setting, the point $P$ is fixed (e.g., as a public parameter of the system), so multiples of $P$ can be precomputed offline and used to speed up the online computation of $kP$. In terms of performance, it might be difficult to select the "optimal" size of the precomputed table. A larger table with more multiples of $P$ typically means a reduced number of elliptic curve operations at runtime, but such tables might result in cache misses, which can incur a performance penalty. Moreover, when one wants to extract elements from this table in a cache-attack resistant manner, one should access every element and mask out the correct value to avoid leaking access patterns. Hence, using a larger table implies an increased access cost for every table lookup.

This is not the only problem with large precomputed tables. As far as we know, one cannot show (for all inputs) that a current active point in the fixed-base scalar multiplication will not be the same (or have an opposite sign) as one of the many precomputed values. Although this might happen only with extremely low probability, such that honest parties may never encounter this by accident, active adversaries could manipulate such scalar/point combinations to force exceptions. This means that, unlike the variable-base multiplication, the implementation of the group law must cover exceptional cases. One solution is to use complete formulas (which have no exceptional cases). Unfortunately, the complete Weierstrass formulas from [18] (see Appendix C.1) are expensive compared to their incomplete counterparts, and using these would incur a much larger relative penalty than the complete formulas on (twisted) Edwards curves do. Another possible solution is to always compute two candidates for the addition, $C_1 = 2P$ and $C_2 = P + L$, and select (in a constant-time manner) $C_1$ if $P = L$, $O$ if $P = -L$, $L$ if $P = O$, $P$ if $L = O$, and $C_2$ otherwise. At first glance this approach seemingly increases the cost of an addition to be at least that of computing both an addition and a doubling. However, we present a solution which achieves this same behavior without increasing the number of modular multiplications or squarings required in a dedicated point addition (see Algorithm 18). The idea is to exploit the similarity in the doubling and addition routines by masking out the correct operands first, and using these as inputs to the arithmetic operations. This allows us to keep the total number of multiplications and squarings exactly the same as when computing just the dedicated point addition. Hence, Algorithm 18 works for any input points, does not have any exceptional cases and has roughly the same run-time as a dedicated point addition.
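The case selection described above can be illustrated on a toy curve. The sketch below is not Algorithm 18: it works in affine coordinates rather than the projective formulas of Appendix C.1, the curve $y^2 = x^3 - 3x + 5$ over $\mathbb{F}_{103}$ is a hypothetical parameter choice made only for illustration, and the names `inv`, `select` and `complete_add` are assumptions. It does follow the same idea of masking the operands first and then running one shared formula, with the special cases selected arithmetically at the end.

```python
# Toy parameters (hypothetical): y^2 = x^3 - 3x + 5 over F_103.
p, a, b = 103, -3, 5
O = (0, 0, 1)  # points are (x, y, inf); inf = 1 marks the neutral element

def inv(x):
    """Inverse via Fermat's little theorem; maps 0 to 0 instead of raising."""
    return pow(x % p, p - 2, p)

def select(bit, u, v):
    """Return u if bit == 1 else v, using arithmetic instead of a branch."""
    return tuple((bit * ui + (1 - bit) * vi) % p for ui, vi in zip(u, v))

def complete_add(P, L):
    """Add P and L, covering P = L, P = -L, P = O and L = O by selection."""
    x1, y1, inf1 = P
    x2, y2, inf2 = L
    same_x = int((x1 - x2) % p == 0)
    same_y = int((y1 - y2) % p == 0)
    opp_y = int((y1 + y2) % p == 0)
    is_dbl = same_x & same_y
    is_neg = same_x & opp_y
    # Mask the operands of the slope first, then run a single shared formula.
    num = select(is_dbl, (3 * x1 * x1 + a,), (y2 - y1,))[0]
    den = select(is_dbl, (2 * y1,), (x2 - x1,))[0]
    lam = num * inv(den) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    R = (x3, y3, 0)
    R = select(is_neg, O, R)   # P = -L gives O
    R = select(inf2, P, R)     # L = O gives P
    R = select(inf1, L, R)     # P = O gives L
    return R
```

Because `inv` maps 0 to 0, the shared formula never raises an error on exceptional inputs; it merely produces a value that the final selections discard.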

For a scalar $k$ and the fixed point $P = (x_1, y_1)$, we make use of these formulas to perform the following steps.

Offline computation.

1. Point validation: Validate that $P = (x_1, y_1) \in E_b(\mathbb{F}_p) \setminus \{O\}$ by checking that $y_1^2 = x_1^3 - 3x_1 + b$. Otherwise, return false (see Algorithm 2).

2. Precomputation: For a fixed window size $2 \le w < 10$, compute $v > 0$ tables of $2^{w-1}$ points (each) for the mLSB-set comb method (see Line 2 of Algorithm 7). Convert all points in the lookup table to affine form.

Online computation.

3. Scalar validation: Validate that the scalar $k \in [1, r)$. Let the maximum bit-length of all valid scalars be $t = \lceil \log_2(r) \rceil$.

4. Recoding: Convert the scalar $k$ to odd (if even) and recode it into the mLSB-set representation (see Algorithm 8).

5. Evaluation: Using the precomputed values from the offline precomputation, compute $kP$ with exactly $\lceil t/(w \cdot v) \rceil - 1$ point doublings and $v\lceil t/(w \cdot v) \rceil - 1$ point additions⁴. All point additions are computed using the "complete masked" approach in Algorithm 18 in Appendix C.1. The final result is negated if the original value of $k$ was even.

This approach is outlined in Algorithm 7 in Appendix A.
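The operation counts stated in the evaluation step can be tabulated with a small helper. This is a sketch only: the function name `comb_cost` is hypothetical, and the value $t = 256$ used in the usage note is an illustrative assumption rather than a parameter taken from Table 1.

```python
def comb_cost(t, w, v):
    """Operation counts for the fixed-base evaluation step as stated in the text.

    Returns (doublings, additions, table_points)."""
    e = -(-t // (w * v))                  # ceil(t / (w*v))
    doublings = e - 1
    additions = v * e - 1
    if t % (w * v) == 0:                  # footnote 4: one extra addition when w*v divides t
        additions += 1
    table_points = v * (1 << (w - 1))     # v tables of 2^(w-1) points each
    return doublings, additions, table_points
```

For example, `comb_cost(256, 6, 3)` gives 14 doublings, 44 additions and a table of 96 points. If each affine point were stored as two 32-byte coordinates (an assumption), such a table would occupy 6144 bytes.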

Proposition 5. When computing fixed-base scalar multiplications on any of the Weierstrass curves in Table 1 using Algorithm 7 to implement the steps above, no exceptions occur.

Proof. Following the proof of Proposition 4, point doublings computed via Algorithm 10 do not fail for any rational points in $E_b(\mathbb{F}_p)$ for any of the curves $E_b$ in Table 1. Furthermore, Algorithm 10 also correctly computes doublings at the point at infinity, $O$. Thus, no exceptions can arise in point doublings; and, since all online additions are implemented using the "complete" masking technique described in Appendix C.1, it follows that no exceptions can arise at any stage of the online computation (offline computations can also make use of this technique if necessary). □

4.2 Twisted Edwards Scalar Multiplications

Let $E_d/\mathbb{F}_p : -x^2 + y^2 = 1 + dx^2y^2$ be any of the twisted Edwards curves in Table 2, with $\#E(\mathbb{F}_p) = 4r$ for $r$ prime. In a similar vein to [33, 5], we avoid small subgroup attacks by requiring all scalar multiplications to include a cofactor 4. Thus, the scalar is updated as $k := 4k$ for the original $k \in [1, r)$, and we let $P = (x_1, y_1)$ be in $\mathbb{F}_p \times \mathbb{F}_p$.

The variable-base scenario. On input of $k$ and (variable) $P = (x_1, y_1) \in \mathbb{F}_p \times \mathbb{F}_p$, we perform the following steps.

1. Validation: Validate that $k \in [4 \cdot 1, 4 \cdot 2, \ldots, 4(r-1)]$. Validate that $P = (x_1, y_1) \in E_d(\mathbb{F}_p) \setminus \{O\}$ by checking that $-x_1^2 + y_1^2 = 1 + dx_1^2y_1^2$ and that $P \neq (0, 1) = O$ (see Algorithm 3). Otherwise, return false.

2. Clear torsion: Compute $Q \leftarrow [4]P$ using two consecutive doublings (as in Algorithm 3).

3. Revalidation: Validate that (the projective point) $Q \neq O$. If not, reject.

4. Precomputation: Compute the $2^{w-2}$ odd, positive multiples $\{Q, 3Q, \ldots, (2^{w-1}-1)Q\}$ of $Q$, and store them in a lookup table. This precomputation can be achieved using one point doubling and $2^{w-2}-1$ point additions⁵ (see Algorithm 4).

5. Scalar recoding: Using a window size of $2 \le w < 10$, convert the updated scalar $k := k/4 \in [1, r-1]$ to odd (if even) and recode it into exactly $\lceil \log_2(r)/(w-1) \rceil + 1$ odd, signed, non-zero digits in $\{\pm 1, \pm 3, \ldots, \pm(2^{w-1}-1)\}$ (see Algorithm 6).

6. Evaluation: Compute $kP$ as $kQ$, using exactly $(w-1)\lceil \log_2(r)/(w-1) \rceil$ point doublings and $\lceil \log_2(r)/(w-1) \rceil$ point additions. Note that every time an addition is performed, we also negate the selected point in the look-up table, and choose the correct one according to the sign of the digit in the recoded scalar. This is repeated until the last iteration, when crucially, the final addition is performed using the unified formula in [34, Eq. (5)]. The final result is negated if the original value of $k$ was even.

This computation is given in Algorithm 1 in Appendix A.

4 We note that this cost increases by a single point addition when $wv \mid t$, since an extra precomputed point is needed in this case.

5 Again, except for when w = 2, where this comes for free.


Proposition 6. When computing variable-base scalar multiplications on any of the twisted Edwards curves in Table 2 using Algorithm 1 to implement the steps above, no exceptions occur.

Proof. The first 3 steps (validation, clear torsion, and revalidation) detailed in Section 4.2 ensure that the point $Q$ has large prime order $r$. Furthermore, only elements of $\langle Q \rangle$ are encountered after the revalidation stage, meaning that Corollary 1 from [34] can be invoked to say that the additions in Algorithm 15 (from [34], but extended according to the representation suggested in [31]) will never fail to add points $P$ and $Q$ of odd order, except when $P = Q$. This corollary also tells us that the formulas for point doubling in Algorithm 14 never fail for points of odd order. Similar to the addition formulas, these doubling formulas, which are from [7], are extended according to [31]. Thus, the proof from this point is identical to the proof of Proposition 4: we partition the elements in $\langle Q \rangle \setminus \{O\}$ into $S_{\mathrm{odd}}$ and $S_{\mathrm{even}}$ to categorize the elements in the look-up table, and use this to show that the running value that is input into point additions can never be equal to an element in the look-up table, except possibly in the final addition, where we use the formula in [34, Eq. (5)], which is slightly slower, but is exception-free in $\langle Q \rangle$. □

The fixed-base scenario. Let $P = (x_1, y_1) \in \mathbb{F}_p \times \mathbb{F}_p$ be a fixed point and let the updated scalar $k := 4k$ be an integer that is a multiple of the cofactor 4. Then perform the following steps.

Offline computation.

1. Validation: Validate that $k \in [4 \cdot 1, 4 \cdot 2, \ldots, 4(r-1)]$. Validate that $P = (x_1, y_1) \in E_d(\mathbb{F}_p) \setminus \{O\}$ by checking that $-x_1^2 + y_1^2 = 1 + dx_1^2y_1^2$ and that $P \neq (0, 1) = O$ (see Algorithm 3). Otherwise, return false.

2. Clear torsion: Compute $Q \leftarrow [4]P$ using two consecutive doublings (see Algorithm 3).

3. Revalidation: Validate that $Q \neq O$. If not, reject.

4. Precomputation: For a fixed window size $2 \le w < 10$, compute $v > 0$ tables of $2^{w-1}$ points (each) for the mLSB-set comb method (see Line 2 of Algorithm 7). Convert all points in the lookup table to affine form.

Online computation.

5. Recoding: Convert the updated scalar $k := k/4$ to odd (if even) and recode it into the mLSB-set representation (see Algorithm 8).

6. Evaluation: Using the precomputed values from the offline precomputation, compute $kP$ as $kQ$ with exactly $\lceil t/(w \cdot v) \rceil - 1$ point doublings and $v\lceil t/(w \cdot v) \rceil - 1$ point additions⁶. Every one of these additions is computed using the unified formulas from [34, Eq. (5)]. The final result is negated if the original value of $k$ was even.

Algorithm 7 in Appendix A outlines this computation.

Proposition 7. When computing fixed-base scalar multiplications on any of the twisted Edwards curves in Table 2 using Algorithm 7 to implement the steps above, no exceptions occur.

Proof. As in the proof of Proposition 6, we start by noting that the (updated) point $Q$ has odd order $r$, and that we only compute on elements in $\langle Q \rangle$. The only algorithm we use for online additions corresponds to the formulas in [34, Eq. (5)], which do not fail for any pair of inputs in $\langle Q \rangle$. Additionally, the only algorithm we use for doublings is Algorithm 14 (from [7]), which is also exception-free on all inputs from $\langle Q \rangle$. □

6 Again, we note that when $wv \mid t$, an extra precomputed point is needed.


4.3 The Montgomery Ladder

Let $E_A/\mathbb{F}_p : y^2 = x^3 + Ax^2 + x$ be the Montgomery form of any of the curves in Table 2, with $\#E_A(\mathbb{F}_p) = 4r$, for $r$ a large prime. Since the Montgomery ladder is not compatible with the recoding techniques discussed in Section 4, we take the following route to guarantee a fixed-length scalar. For all $k \in [1, r-1]$, we use the updated scalar $k := 4(\alpha r + k)$, where $\alpha$ is the smallest positive integer such that $\alpha r + 1$ and $(\alpha+1)r - 1$ have the same bitlength; $\alpha$ is specific to $r$, but for each of the curves in Table 2 we have $\alpha \in \{1, 2, 3\}$. Note that scalar multiplication by the updated scalar corresponds to scalar multiplication by $4k$ (for the original $k$) on $E_A$, which thwarts small subgroup attacks in the same vein as the twisted Edwards scalar multiplications in Section 4.2.

On input of k and x1 ∈ Fp, we perform the following steps.

1. Scalar validation: First validate that $k \in 4\mathbb{Z}$, and then that the integer $k/4 \in [\alpha r + 1, (\alpha+1)r - 1]$. Otherwise, reject.

2. Evaluation: Process the scalar by inputting $k$ and $(x_1 : 1)$ into the standard $(X : Z)$-only Montgomery ladder routine [46, §10], with constant $(A+2)/4$ in the addition formula. Since $k = 4(\alpha r + k)$, this can be done by inputting the fixed-length scalar $k/4 = \alpha r + k$ and $(x_1 : 1)$ into the Montgomery ladder to give $(X_1 : Z_1)$, before finishing with two repeated, standalone Montgomery doublings of $(X_1 : Z_1)$ to give $(X : Z) = 4(X_1 : Z_1)$.

3. Normalize: If Z = 0, return O, otherwise return x1 = X/Z.
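The choice of $\alpha$ and the scalar validation in Step 1 can be sketched as follows. The function names are hypothetical and the toy values of $r$ in the tests are not curve parameters; the code only mirrors the definition of $\alpha$ as the smallest positive integer for which $\alpha r + 1$ and $(\alpha+1)r - 1$ have the same bitlength.

```python
def find_alpha(r):
    """Smallest positive alpha with equal bitlengths of alpha*r + 1 and (alpha+1)*r - 1."""
    alpha = 1
    while (alpha * r + 1).bit_length() != ((alpha + 1) * r - 1).bit_length():
        alpha += 1
    return alpha

def ladder_scalar(k, r):
    """Map k in [1, r-1] to the updated fixed-length scalar 4*(alpha*r + k)."""
    assert 1 <= k <= r - 1
    return 4 * (find_alpha(r) * r + k)

def validate_ladder_scalar(k_updated, r):
    """Step 1: accept only multiples of 4 whose quotient lies in [alpha*r + 1, (alpha+1)*r - 1]."""
    alpha = find_alpha(r)
    return k_updated % 4 == 0 and alpha * r + 1 <= k_updated // 4 <= (alpha + 1) * r - 1
```

Since every integer between $\alpha r + 1$ and $(\alpha+1)r - 1$ has the same bitlength as the two endpoints, all scalars produced by `ladder_scalar` lead to the same number of ladder iterations.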

Notice that there is no validation of the input coordinate $x_1 \in \mathbb{F}_p$, i.e. we do not check whether $x_1^3 + Ax_1^2 + x_1$ is a square in $\mathbb{F}_p$, so that $x_1$ corresponds to a point (or points) on $E_A$. Avoiding this check in the presence of twist-security is due to Bernstein (cf. [5]), since even if $x_1$ corresponds to a point on the quadratic twist $E'_A$, the output of the Montgomery ladder corresponds to a scalar multiplication on $E'_A$, because scalar multiplications on both curves use the same constant $(A+2)/4$. In this case, multiplication by the updated scalar $4(\alpha r + k)$ on $E'_A$ no longer corresponds to the scalar $4k$, but rather to the scalar $4k'$, where $k' \equiv (\alpha r + k) \bmod r'$ for $\#E'_A(\mathbb{F}_p) = 4r'$. This is not a problem in practice since the cofactor of 4 still clears torsion on the twist, and the twist-security ensures that the discrete logarithm problem has a similar difficulty in $E'_A(\mathbb{F}_p)$ as it does in $E_A(\mathbb{F}_p)$. Following the arguments developed in [4] (see also [5, App. A-B]), it could be possible to prove that no exceptions can occur in Montgomery ladder implementations of the curves in Table 2 that follow Steps 1-3 above, subject to addressing the issues below.

It should first be pointed out that the lack of validation means that there are some scalar/point combinations which could produce exceptions. For example, suppose $k$ is chosen as the unique integer less than $r'$ such that $k \equiv -\alpha r \bmod r'$. If $k$ is also less than $r$, then $k := 4(\alpha r + k)$ is a valid scalar according to Step 1 above. But, if an unvalidated $x$-coordinate, say $x'_1$, corresponds to a point $P'_1$ on $E'_A$, then $kP'_1 = O$, because $(\alpha r + k) \equiv 0 \bmod r'$; note that outputting $O$ in Step 3 above could leak information to an attacker. Furthermore, in practice these ladder implementations are often used in conjunction with non-ladder implementations on (most likely a twisted Edwards model of) the same curve (see Section 6). In such a scenario, the refined forms of the scalars in this section do not match the forms of the scalars in Section 4.2, so if the scalars above were to be used on the twisted Edwards form of $E_A$, then Proposition 6 and Proposition 7 no longer provide any guarantees. More specifically, if an implementation synchronizes the inherently larger Montgomery ladder scalars above to also be used on the twisted Edwards curve, then the argument of $k \in [4, 8, \ldots, 4(r-1)]$ that was used in the proof of Proposition 6 no longer holds when $\alpha > 0$. Roughly speaking, the fact that $k/4$ is now outside the range $[1, r-1]$ means that the running multiple of an input point can now reach the dangerous stage of a scalar multiplication (which we handle by using complete additions) before the final addition.

In the Montgomery ladder implementation of Curve25519 [5], and in the complementary Edwards "Ed25519" implementation [9], it seems that the above problems are overcome by restricting the set of permissible scalars to be of a lesser cardinality than the prime subgroup order. Namely, Curve25519 has $r, r' > 2^{252}$, with all scalars being of the form $8 \cdot (2^{251} + k)$ for $k \in [0, 2^{251}-1]$. As well as guaranteeing that all of the possible scalars have the same bitlength, this prevents the existence of a scalar that is congruent to $0 \bmod r$ or $0 \bmod r'$. On the other hand, it also means that for a fixed base point $P$ of order $r$ on Ed25519, fewer than half of the elements in $\langle P \rangle$ are possible outputs when computing scalar multiplications of $P$.
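The properties claimed for this scalar form can be checked numerically for the order $r$. The snippet below is a sketch: the constant is the publicly documented prime subgroup order of Curve25519, the helper name `scalar_form` is hypothetical, and the twist order $r'$ is not checked here.

```python
# Prime subgroup order of Curve25519 (a publicly documented constant).
r = 2**252 + 27742317777372353535851937790883648493

def scalar_form(k):
    """Scalar of the form 8 * (2^251 + k) for k in [0, 2^251 - 1]."""
    assert 0 <= k < 2**251
    return 8 * (2**251 + k)

lo, hi = scalar_form(0), scalar_form(2**251 - 1)

# All permissible scalars lie in [lo, hi], so they share one bitlength.
same_length = lo.bit_length() == hi.bit_length() == 255

# A scalar divisible by r would equal m*r with m <= hi // r, but m*r is never a
# multiple of 8 for such m because r is odd and m < 8.
no_zero_mod_r = all((m * r) % 8 != 0 for m in range(1, hi // r + 1))

# There are 2^251 permissible scalars, fewer than half of the r subgroup elements.
fewer_than_half = 2 * 2**251 < r
```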

As one potential alternative, we remark that a hybrid solution which uses both Montgomery and twisted Edwards scalar multiplications could parse scalars differently: $k \in [0, r-1]$ could be modified to $4(\alpha r + k)$ in the Montgomery implementation, but modified to $4k$ in the twisted Edwards implementation. If, in addition, all $x$-coordinates were validated in Step 1 of the Montgomery ladder routine⁷, then this may well be enough to prove that all scalar multiplications compute correctly and without exception: Proposition 6 would then apply directly to the twisted Edwards part, while the techniques in [5, 4] could be used to prove the Montgomery ladder part.
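The $x$-coordinate validation mentioned here amounts to one modular exponentiation. A minimal sketch follows, assuming the rejection rule $(x_1^3 + Ax_1^2 + x_1)^{(p-1)/2} = -1$; the function name `is_valid_x` and the toy parameters in the usage note are hypothetical.

```python
def is_valid_x(x1, A, p):
    """Accept x1 unless x1^3 + A*x1^2 + x1 is a non-square modulo the odd prime p.

    Uses Euler's criterion: a nonzero value v is a non-square iff v^((p-1)/2) == -1 mod p."""
    rhs = (x1 * x1 * x1 + A * x1 * x1 + x1) % p
    return pow(rhs, (p - 1) // 2, p) != p - 1
```

On a toy field such as $p = 103$ with $A = 6$, the function accepts exactly those $x_1$ for which some $y$ with $y^2 = x_1^3 + Ax_1^2 + x_1$ exists, which can be confirmed by brute force.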

5 Implementation Results

To evaluate the performance of the selected curves, we developed a software library⁸ that includes support for three scenarios: variable-base, fixed-base and double-scalar multiplication. The library can perform arithmetic on $a = -1$ twisted Edwards, $a = -3$ Weierstrass, and Montgomery curves and supports all of the new curves from Section 3 (see Tables 1 and 2). The implementation of the library is largely in the C programming language, with the modular arithmetic implemented in x64 assembly. We plan to add support for other popular platforms like the ARM architecture, and explore implementation options using vector instructions and methods such as Karatsuba multiplication [36] for larger moduli sizes.

Table 3 shows the performance details of scalar multiplication in the three scenarios of interest. Variable-base scalar multiplication is computed with the fixed-window method (see Algorithm 1 in Appendix A) using window width $w = 6$, except for twisted Edwards at the 256-bit security level which uses $w = 7$. Fixed-base scalar multiplication was computed using the mLSB-set method (see Algorithm 7 in Appendix A) using parameters $w = 6$ and $v = 3$ for the Weierstrass curves and $w = 5$ and $v = 4$ for the twisted Edwards curves. These values correspond to precomputed tables of sizes: 6KB, 9KB and 12KB at the 128-, 192- and 256-bit security levels, respectively. Double-base scalar multiplication was computed using the wNAF method with interleaving (see Algorithm 9 in Appendix A) using window width $w_1 = 6$ for the variable base and $w_2 = 7$ for the fixed base. The latter corresponds to precomputed tables with sizes: 2KB, 3KB and 4KB for Weierstrass curves at the 128-, 192- and 256-bit security levels, respectively, and 3KB, 4.5KB and 6KB for twisted Edwards curves at the 128-, 192- and 256-bit security levels, respectively. The results (expressed in

7 Validating that $x_1 \in \mathbb{F}_p$ corresponds to $E_A$ would incur the small relative cost of an exponentiation and a few multiplications: namely, we reject $x_1$ if $(x_1^3 + A x_1^2 + x_1)^{(p-1)/2} = -1$.

8 A version of the library that supports a subset of the curves presented in this work is available at http://research.microsoft.com/en-us/downloads/149804d4-b5f5-496f-9a17-a013b242c02d/.


Table 3. Experimental results for variable-base, fixed-base and double-scalar multiplication. The results (rounded to thousand cycles) are the average of $10^4$ runs of the scalar multiplication including the final modular inversion to convert the result to its affine form. These results have been obtained on a 3.4GHz Intel Core i7-2600 Sandy Bridge processor with Intel's Turbo Boost and Hyper-threading disabled. The library was compiled with Visual Studio 2012 on Windows 7 OS.

security level   curve name     variable-base   fixed-base   double-scalar

128              w-256-mont           280,000      115,000         287,000
128              w-256-mers           281,000      119,000         288,000
128              ed-256-mont          234,000       93,000         237,000
128              ed-254-mont          196,000       76,000         198,000
128              ed-256-mers          234,000       95,000         237,000
128              ed-255-mers          226,000       94,000         228,000
128              m-254-mont           224,000          N/A             N/A
128              m-255-mers           245,000          N/A             N/A

192              w-384-mont           795,000      281,000         812,000
192              w-384-mers           744,000      277,000         761,000
192              ed-384-mont          659,000      243,000         672,000
192              ed-382-mont          591,000      225,000         605,000
192              ed-384-mers          617,000      231,000         624,000
192              ed-383-mers          596,000      227,000         607,000
192              m-382-mont           672,000          N/A             N/A
192              m-383-mers           666,000          N/A             N/A

256              w-512-mont         1,762,000      576,000       1,821,000
256              w-512-mers         1,543,000      517,000       1,592,000
256              w-521-mers         1,881,000            −               −
256              ed-512-mont        1,461,000      504,000       1,507,000
256              ed-510-mont        1,338,000      466,000       1,369,000
256              ed-512-mers        1,293,000      446,000       1,320,000
256              ed-511-mers        1,241,000      436,000       1,274,000
256              ed-521-mers        1,552,000            −               −
256              m-510-mont         1,496,000          N/A             N/A
256              m-511-mers         1,392,000          N/A             N/A
256              m-521-mers         1,729,000            −               −

terms of computer cycles) were obtained by running and averaging $10^4$ iterations of each computation on an Intel Core i7-2600 (Sandy Bridge) processor with Intel's Turbo Boost and Hyper-threading disabled. The variable- and fixed-base scalar multiplication routines have a constant running time, which guards against various types of timing attacks [38, 20], including cache attacks [49] (e.g., see [19] in the asymmetric setting). This means that no conditional branches on secret data or secret indexes for table lookups are allowed in the implementations.
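The precomputed-table sizes quoted for the fixed-base and double-scalar methods can be reproduced with simple arithmetic. The sketch below uses the point counts $v \cdot 2^{w-1}$ (mLSB-set) and $2^{w-2}$ (wNAF with window $w_2$). The number of stored coordinates per point (2 for Weierstrass, 3 for twisted Edwards) and the field-element sizes (32, 48 and 64 bytes) are assumptions chosen because they match the quoted sizes; they are not stated in this section.

```python
def fixed_base_points(w, v):
    """Number of precomputed points for the mLSB-set method."""
    return v * 2 ** (w - 1)


def wnaf_points(w):
    """Number of precomputed odd multiples for a width-w NAF table."""
    return 2 ** (w - 2)


def table_kib(points, coords_per_point, field_bytes):
    """Table size in KB (1 KB = 1024 bytes); coords_per_point is an assumption."""
    return points * coords_per_point * field_bytes / 1024


# 128-bit level, fixed base: Weierstrass (w=6, v=3) and twisted Edwards (w=5, v=4).
weierstrass_kb = table_kib(fixed_base_points(6, 3), 2, 32)
edwards_kb = table_kib(fixed_base_points(5, 4), 3, 32)
```

Under these assumptions both fixed-base configurations occupy the same storage at a given security level, which is consistent with a single size being quoted per level.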

Our results suggest that reducing the size of the pseudo-Mersenne primes does not have a significant effect on the performance: the running time is reduced by a factor below 1.04, at the expense of roughly half a bit of ECDLP security. However, using slightly smaller moduli in the setting of the Montgomery-friendly primes does pay off: the running time is reduced by a factor 1.19, 1.12, and 1.09 at the 128-, 192-, and 256-bit security level, respectively. This performance difference between pseudo-Mersenne and Montgomery-friendly primes can be explained by the fact that the final constant-time conditional subtraction in Montgomery multiplication can be omitted when reducing the modulus size appropriately. The size-reduced Montgomery-friendly primes are the best choice (with respect to performance) at the 128- and 192-bit security levels while the size-reduced pseudo-Mersenne prime is faster for the 256-bit security level. For full-word length moduli, Montgomery-friendly and pseudo-Mersenne primes achieve similar performance at the 128-bit security level, whereas full-word length pseudo-Mersenne moduli are the best option for the 192- and 256-bit security levels. The better performance of pseudo-Mersenne primes at high security levels can be explained by the inherent higher register pressure in our Montgomery-friendly implementations which results in more load and store operations for large moduli sizes. The faster arithmetic operations in the base field translate directly to optimizations in the different scenarios for the scalar multiplication. For the sake of comparison, we also include performance numbers for curves defined over the Mersenne prime $p = 2^{521} - 1$ (at the 256-bit security level). The performance of w-521-mers and ed-521-mers compared to our corresponding 512-bit Weierstrass and twisted Edwards curves at the 256-bit security level (w-512-mers and ed-512-mers) degrades by a factor 1.20. This is mainly due to the higher cost of modular multiplication, which performs computations over elements containing one more computer word in comparison with the 512-bit prime options.
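To illustrate why a slightly smaller modulus lets Montgomery multiplication skip its final conditional subtraction, the sketch below uses a common sufficient condition, assumed here: $4p < R$, with operands kept in $[0, 2p)$. Then $(ab + mp)/R < (4p^2 + Rp)/R < 2p$, so the result stays in the same range. The modulus $2^{61} - 1$ with $R = 2^{64}$ is a toy choice, not one of the primes from this paper, and the function names are hypothetical. The modular inverse via `pow(p, -1, R)` needs Python 3.8 or later.

```python
def make_montgomery(p, word_bits):
    """Return mont_mul for odd p with R = 2**word_bits and 4p < R (assumed condition)."""
    R = 1 << word_bits
    if p % 2 == 0 or 4 * p >= R:
        raise ValueError("need an odd modulus with 4p < R")
    p_neg_inv = (-pow(p, -1, R)) % R  # -p^(-1) mod R

    def mont_mul(a, b):
        # Inputs in [0, 2p); output in [0, 2p), congruent to a*b*R^(-1) mod p.
        t = a * b
        m = (t * p_neg_inv) & (R - 1)
        # t + m*p is divisible by R; note that no "if u >= p: u -= p" follows.
        return (t + m * p) >> word_bits

    return mont_mul


# Usage with the toy modulus: results are only reduced below 2p, not below p.
p = 2**61 - 1
mont_mul = make_montgomery(p, 64)
u = mont_mul(2 * p - 1, 2 * p - 1)
```

A full reduction into $[0, p)$ is then only needed once, at the very end of a computation.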

In the setting of the variable-base scalar multiplication our twisted Edwards implementation consistently outperforms the Montgomery ladder. This does not come as a surprise since our twisted Edwards curves allow one to use the efficient curve arithmetic from [34] and the efficient fixed-window method which exploits precomputations. Note that the state-of-the-art Montgomery ladder implementation of Curve25519 [5] is 1.15 times faster than our ladder algorithm at the 128-bit security level (when considering the benchmark machine "sandy" [13]). This can largely be explained by the significant level of code optimization that went into the implementation of [5], which includes the fine-tuning of the full curve arithmetic at the assembly level. Still, without resorting to full use of assembly, our best result using twisted Edwards form (196,000 cycles) is virtually equivalent to the Montgomery ladder performance numbers from Curve25519 (194,000 cycles).

The recent software record for the NIST P-256 curve [30] computes a variable-base scalar multiplication in 400 thousand cycles on a Sandy Bridge CPU. Our curve w-256-mont offers better security properties and results in a 1.43 times reduction of the running time compared to [30]. When switching from prime order Weierstrass curves using full size moduli to composite order twisted Edwards curves with size-reduced moduli one can expect a reduction in the running time by a factor between 1.24 and 1.43 at the price of a slight decrease in ECDLP security.
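The speed-up factors quoted above follow directly from the variable-base cycle counts in Table 3 and the 400 thousand cycles reported for [30]. The short sketch below recomputes them; pairing the fastest full-size Weierstrass curve with the fastest size-reduced twisted Edwards curve at each security level is our reading of the comparison, and the dictionary and function names are illustrative.

```python
# Variable-base cycle counts from Table 3, plus the NIST P-256 figure from [30].
CYCLES = {
    "nist-p256": 400_000,
    "w-256-mont": 280_000,
    "ed-254-mont": 196_000,
    "w-384-mers": 744_000,
    "ed-382-mont": 591_000,
    "w-512-mers": 1_543_000,
    "ed-511-mers": 1_241_000,
}


def speedup(slower, faster):
    """Ratio of running times: how many times faster `faster` is."""
    return CYCLES[slower] / CYCLES[faster]


for slow, fast in [("nist-p256", "w-256-mont"), ("w-256-mont", "ed-254-mont"),
                   ("w-384-mers", "ed-382-mont"), ("w-512-mers", "ed-511-mers")]:
    print(f"{fast} vs {slow}: {speedup(slow, fast):.2f}x")
```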

6 Real-World Protocols

Although significant research has been devoted to optimizing the most popular ECC operation (the variable-base scalar multiplication), in real-world cryptographic solutions it is often not as simple as computing just a single scalar multiplication with an unknown base. Cryptographic protocols typically require a combination of different types of scalar multiplications including fixed-base, variable-base and multiple-scalar operations. In this section we study the TLS protocol, more specifically the computation of the TLS handshake using the ECDHE-ECDSA cipher suite. We outline the impact of using different curve and coordinate systems in practice.

TLS with perfect forward secrecy. Support for using elliptic curves in the TLS protocol is specified in RFC 4492 [14]. The cipher suites specified in this RFC use the elliptic curve Diffie-Hellman (ECDH) key exchange, whose keys may either be long-term or ephemeral. We focus our analysis on the latter case (denoted by ECDHE) since it offers perfect forward


Table 4. Cost estimates for the TLS handshake using the ECDHE-ECDSA cipher suite for different security levels where we consider the elliptic curve scalar multiplications. Costs in cycles are estimated using the performance numbers from Table 3. Estimates for the total cost correspond to the full handshake ECDHE-ECDSA involving authentication in both the server and client side. We assume the use of precomputed tables with 96 and 64 points to accelerate fixed-base scalar multiplication on the Weierstrass and twisted Edwards curves, respectively. Similarly, we assume the use of precomputed tables with 32 points to accelerate double-base scalar multiplication (where one base is fixed). For comparison we state performance numbers for NIST P-256 [30] which uses 150KB of storage, and signature performance numbers when using EdDSA [9] (obtained from the benchmark machine "sandy" [13]). We consider that point transmission (T) in the key-exchange can be performed in uncompressed (U) or compressed (C) form.

                                                            estimated cost (in cycles)
security level   curve names                         T     ECDHE       ECDSA sign   ECDSA ver   total cost

128              w-256-mont                          U       395,000      115,000     287,000      797,000
128              w-256-mont                          C       416,000      115,000     287,000      818,000
128              ed-254-mont                         U       272,000       76,000     198,000      546,000
128              ed-254-mont                         C       288,000       76,000     198,000      562,000
128              hybrid ed-254-mont + m-254-mont     U/C     300,000       76,000     198,000      574,000
128              NIST P-256 [30]                     U       490,000       90,000     530,000    1,110,000
128              Ed25519 [9]                         C           N/A       69,000     225,000          N/A

192              w-384-mers                          U     1,021,000      277,000     761,000    2,059,000
192              w-384-mers                          C     1,078,000      277,000     761,000    2,116,000
192              ed-382-mont                         U       816,000      225,000     605,000    1,646,000
192              ed-382-mont                         C       865,000      225,000     605,000    1,695,000
192              hybrid ed-382-mont + m-382-mont     U/C     897,000      225,000     605,000    1,727,000

256              w-512-mers                          U     2,060,000      517,000   1,592,000    4,169,000
256              w-512-mers                          C     2,168,000      517,000   1,592,000    4,277,000
256              ed-511-mers                         U     1,677,000      436,000   1,274,000    3,387,000
256              ed-511-mers                         C     1,779,000      436,000   1,274,000    3,489,000
256              hybrid ed-511-mers + m-511-mers     U/C   1,828,000      436,000   1,274,000    3,538,000

secrecy. Besides the usage of elliptic curves in the DH key exchange, TLS certificates contain a public key that the server uses to authenticate itself: this is an ECDSA public key for the case of the ECDHE-ECDSA cipher suite. The TLS handshake, using the ECDHE-ECDSA cipher suite, consists of three main components: the ECDSA signature generation (fixed-base scalar multiplication), ECDSA signature verification (double-base scalar multiplication), and ECDHE (one fixed-base and one variable-base scalar multiplication). We consider Weierstrass and twisted Edwards curves separately, with and without point compression. The cost of decompressing a point in Weierstrass and twisted Edwards form is stated in Table 7 (where we follow the approach described in [9] to decompress points on twisted Edwards curves).

When using Weierstrass curves the situation is not complicated: transmitting compressed points costs a single conversion while no additional cost is needed when transmitting uncompressed points. In the setting of twisted Edwards curves there are more possibilities. The simplest approach is to only use the Montgomery form; however, this is expensive since the Montgomery ladder cannot take advantage of the fixed-base setting. One might consider a hybrid solution: computing the fixed-base scalar multiplication using the birationally equivalent twisted Edwards curve while computing the variable-base scalar multiplication using the Montgomery ladder. In such a hybrid solution the protocol should specify if the coordinates are transmitted in (compressed) twisted Edwards or Montgomery coordinates (which are already in compressed form). When using such a hybrid solution in the setting of ECDHE, transmitting the points in Montgomery form is best (see Table 7). The cost for the conversion (between Montgomery and twisted Edwards) is roughly the same as when only using twisted Edwards curves and transmitting compressed points. Our performance numbers suggest that the approach using only twisted Edwards is slightly faster than such a hybrid approach using the Montgomery ladder, while it avoids conversions between coordinate systems. Furthermore, our Montgomery ladder implementations do not include the extra validation step discussed at the end of Section 4.3; if incorporated, this would incur additional overhead.

Table 4 gives the cost estimates for the separate components and total cost of the TLS handshake using the ECDHE-ECDSA cipher suite for different security levels. The results show that, with twisted Edwards curves, the ECDHE and full ECDHE-ECDSA computations are approximately a factor 1.45, 1.25 and 1.23 faster than with the Weierstrass curves at the 128-, 192- and 256-bit security levels, respectively. We also include the results from [30] when using NIST P-256. In [30] the fixed-base scalar multiplication is implemented using a relatively large (slightly over 150KB) lookup table. It is unclear whether this implementation accesses the lookup table in a cache-attack resistant manner, whether the dedicated addition formula used takes care of exceptions, and, if so, whether this is done in constant time. This might explain the faster implementation results. In order to compare to the state-of-the-art software implementation of twisted Edwards curves we also include the results from EdDSA using Ed25519 [9] (obtained from the "sandy" benchmark machine [13]). Note that [9] only computes signatures; when computing ECDH one could use the approach as described in [5] which uses the Montgomery ladder. In order to achieve perfect forward secrecy (ECDHE), the implementation can compute the fixed-base scalar multiplication using the Montgomery ladder (which is slow) or convert the point and compute the fixed-base scalar multiplication using the corresponding twisted Edwards curve (using a hybrid approach).
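The way the Table 4 estimates are assembled from Table 3 can be sketched as follows: ECDHE costs one fixed-base plus one variable-base scalar multiplication, ECDSA signing costs one fixed-base, and ECDSA verification one double-scalar multiplication. The function and dictionary names are hypothetical. The decompression cost is left as a parameter because Table 7 is not reproduced here; the value 21,000 used in the usage line is simply the difference between the compressed and uncompressed w-256-mont rows of Table 4.

```python
# Cycle counts from Table 3: (variable-base, fixed-base, double-scalar).
TABLE3 = {
    "w-256-mont": (280_000, 115_000, 287_000),
    "ed-254-mont": (196_000, 76_000, 198_000),
    "m-254-mont": (224_000, None, None),
    "w-384-mers": (744_000, 277_000, 761_000),
    "ed-382-mont": (591_000, 225_000, 605_000),
    "m-382-mont": (672_000, None, None),
    "w-512-mers": (1_543_000, 517_000, 1_592_000),
    "ed-511-mers": (1_241_000, 436_000, 1_274_000),
    "m-511-mers": (1_392_000, None, None),
}


def handshake_estimate(curve, ladder_curve=None, decompression=0):
    """Estimate ECDHE-ECDSA handshake costs; ladder_curve selects the hybrid variant."""
    variable, fixed, double = TABLE3[curve]
    if ladder_curve is not None:
        # Hybrid: the variable-base part runs on the Montgomery ladder instead.
        variable = TABLE3[ladder_curve][0]
    ecdhe = fixed + variable + decompression
    sign, verify = fixed, double
    return {"ecdhe": ecdhe, "sign": sign, "verify": verify,
            "total": ecdhe + sign + verify}


uncompressed = handshake_estimate("w-256-mont")
compressed = handshake_estimate("w-256-mont", decompression=21_000)
hybrid = handshake_estimate("ed-254-mont", ladder_curve="m-254-mont")
```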

7 Conclusions

In this paper we have presented new elliptic curves for cryptography targeting the 128-, 192-, and 256-bit security levels. By considering different choices for the base field arithmetic, pseudo-Mersenne and Montgomery-friendly primes, we deterministically selected efficient twisted Edwards curves as well as traditional Weierstrass curves. Instead of resorting to the slower complete formulas, we show how to compute efficient scalar multiplications by using constant-time, exceptionless, dedicated group operations. For the cases in which they are not guaranteed to be exceptionless, we have proposed an efficient "complete" addition formula based on masking techniques for Weierstrass curves. Our implementation of the scalar multiplication in the three most widely deployed scenarios shows that our new backwards compatible Weierstrass curves offer enhanced security properties while improving the performance compared to the standard NIST Weierstrass curves. At the expense of at most a few bits of ECDLP security, our new twisted Edwards curves offer a performance increase of a factor 1.2 to 1.4 compared to our new Weierstrass curves. We demonstrated the potential cryptographic impact by showing cost estimates for these curves inside the TLS handshake protocol.

Acknowledgments. We thank Niels Ferguson, Thorsten Kleinjung, Dan Shumow and Greg Zaverucha for their valuable feedback, comments, and help.


References

1. T. Acar and D. Shumow. Modular reduction without pre-computation for special moduli. Technical report, Microsoft Research, 2010.

2. O. Ahmadi and R. Granger. On isogeny classes of Edwards curves over finite fields. Journal of Number Theory, 132(6):1337–1358, 2012.

3. D. F. Aranha, P. S. L. M. Barreto, G. C. C. F. Pereira, and J. E. Ricardini. A note on high-security general-purpose elliptic curves. Cryptology ePrint Archive, Report 2013/647, 2013. http://eprint.iacr.org/.

4. D. J. Bernstein. Can we avoid tests for zero in fast elliptic-curve arithmetic?, 2006. http://cr.yp.to/papers.html#curvezero.

5. D. J. Bernstein. Curve25519: New Diffie-Hellman speed records. In M. Yung, Y. Dodis, A. Kiayias, and T. Malkin, editors, Public Key Cryptography – PKC 2006, volume 3958 of LNCS, pages 207–228. Springer, Heidelberg, 2006.

6. D. J. Bernstein. Counting points as a video game, 2010. Slides of a talk given at Counting Points: Theory, Algorithms and Practice, April 19, University of Montreal: http://cr.yp.to/talks/2010.04.19/slides.pdf.

7. D. J. Bernstein, P. Birkner, M. Joye, T. Lange, and C. Peters. Twisted Edwards curves. In S. Vaudenay, editor, AFRICACRYPT, volume 5023 of LNCS, pages 389–405. Springer, 2008.

8. D. J. Bernstein, P. Birkner, T. Lange, and C. Peters. ECM using Edwards curves. Math. Comput., 82(282), 2013.

9. D. J. Bernstein, N. Duif, T. Lange, P. Schwabe, and B.-Y. Yang. High-speed high-security signatures. Journal of Cryptographic Engineering, 2(2):77–89, 2012.

10. D. J. Bernstein, M. Hamburg, A. Krasnova, and T. Lange. Elligator: Elliptic-curve points indistinguishable from uniform random strings. In ACM Conference on Computer and Communications Security, 2013.

11. D. J. Bernstein and T. Lange. Faster addition and doubling on elliptic curves. In K. Kurosawa, editor, ASIACRYPT, volume 4833 of LNCS, pages 29–50. Springer, 2007.

12. D. J. Bernstein and T. Lange. SafeCurves: choosing safe curves for elliptic-curve cryptography. http://safecurves.cr.yp.to, accessed 16 October 2013.

13. D. J. Bernstein and T. Lange (editors). eBACS: ECRYPT Benchmarking of Cryptographic Systems. http://bench.cr.yp.to, accessed February 3rd 2014.

14. S. Blake-Wilson, N. Bolyard, V. Gupta, C. Hawk, and B. Moeller. Elliptic curve cryptography (ECC) cipher suites for transport layer security (TLS). RFC 4492, 2006.

15. J. W. Bos, C. Costello, H. Hisil, and K. Lauter. Fast cryptography in genus 2. In T. Johansson and P. Q. Nguyen, editors, EUROCRYPT, volume 7881 of LNCS, pages 194–210. Springer, 2013.

16. J. W. Bos, J. A. Halderman, N. Heninger, J. Moore, M. Naehrig, and E. Wustrow. Elliptic curve cryptography in practice (to appear). In Financial Cryptography and Data Security, LNCS. Springer, 2014.

17. W. Bosma, J. Cannon, and C. Playoust. The Magma algebra system. I. The user language. J. Symbolic Comput., 24(3-4):235–265, 1997. Computational algebra and number theory (London, 1993).

18. W. Bosma and H. W. Lenstra. Complete systems of two addition laws for elliptic curves. Journal of Number Theory, 53(2):229–240, 1995.

19. B. B. Brumley and R. M. Hakala. Cache-timing template attacks. In M. Matsui, editor, ASIACRYPT, volume 5912 of LNCS, pages 667–684. Springer, 2009.

20. D. Brumley and D. Boneh. Remote timing attacks are practical. In S. Mangard and F.-X. Standaert, editors, Proceedings of the 12th USENIX Security Symposium, volume 6225 of LNCS, pages 80–94. Springer, 2003.

21. Certicom Research. Standards for efficient cryptography 2: Recommended elliptic curve domain parameters. Standard SEC2, Certicom, 2000.

22. D. Chudnovsky and G. Chudnovsky. Sequences of numbers generated by addition in formal groups and new primality and factorization tests. Advances in Applied Mathematics, 7(4):385–434, 1986.

23. ECC Brainpool. ECC Brainpool Standard Curves and Curve Generation. http://www.ecc-brainpool.org/download/Domain-parameters.pdf, 2005.

24. H. M. Edwards. A normal form for elliptic curves. Bulletin of the American Mathematical Society, 44:393–422, July 2007.

25. J.-C. Faugère, L. Perret, C. Petit, and G. Renault. Improving the complexity of index calculus algorithms in elliptic curves over binary fields. In D. Pointcheval and T. Johansson, editors, EUROCRYPT, volume 7237 of LNCS, pages 27–44. Springer, 2012.

26. A. Faz-Hernández, P. Longa, and A. Sánchez. Efficient and secure algorithms for GLV-based scalar multiplication and their implementation on GLV-GLS curves (extended version). In Cryptology ePrint Archive, Report 2013/158, 2013. Available at: http://eprint.iacr.org/2013/158.


27. M. Feng, B. Zhu, M. Xu, and S. Li. Efficient comb elliptic curve multiplication methods resistant to power analysis. In Cryptology ePrint Archive, Report 2005/222, 2005. Available at: http://eprint.iacr.org/2005/222.

28. P.-A. Fouque, A. Joux, and M. Tibouchi. Injective encodings to elliptic curves. In C. Boyd and L. Simpson, editors, ACISP, volume 7959 of LNCS, pages 203–218. Springer, 2013.

29. R. P. Gallant, R. J. Lambert, and S. A. Vanstone. Faster point multiplication on elliptic curves with efficient endomorphisms. In J. Kilian, editor, CRYPTO, volume 2139 of LNCS, pages 190–200. Springer, 2001.

30. S. Gueron and V. Krasnov. Fast prime field elliptic curve cryptography with 256 bit primes. Cryptology ePrint Archive, Report 2013/816, 2013. http://eprint.iacr.org/.

31. M. Hamburg. Fast and compact elliptic-curve cryptography. Cryptology ePrint Archive, Report 2012/309, 2012. http://eprint.iacr.org/.

32. M. Hamburg. Twisting Edwards curves with isogenies. Cryptology ePrint Archive, Report 2014/027, 2014. http://eprint.iacr.org/.

33. D. Hankerson, A. Menezes, and S. Vanstone. Guide to elliptic curve cryptography. Springer Verlag, 2004.

34. H. Hisil, K. K.-H. Wong, G. Carter, and E. Dawson. Twisted Edwards curves revisited. In J. Pieprzyk, editor, Asiacrypt 2008, volume 5350 of LNCS, pages 326–343. Springer, Heidelberg, 2008.

35. M. Joye and M. Tunstall. Exponent recoding and regular exponentiation algorithms. In M. Joye, editor, Proceedings of Africacrypt 2003, volume 5580 of LNCS, pages 334–349. Springer, 2009.

36. A. Karatsuba and Y. Ofman. Multiplication of multidigit numbers on automata. In Soviet physics doklady, volume 7, page 595, 1963.

37. M. Knežević, F. Vercauteren, and I. Verbauwhede. Speeding up bipartite modular multiplication. In M. Hasan and T. Helleseth, editors, Arithmetic of Finite Fields – WAIFI 2010, volume 6087 of LNCS, pages 166–179. Springer Berlin / Heidelberg, 2010.

38. P. C. Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In N. Koblitz, editor, Crypto 1996, volume 1109 of LNCS, pages 104–113. Springer, Heidelberg, 1996.

39. A. K. Lenstra. Generating RSA moduli with a predetermined portion. In K. Ohta and D. Pei, editors, Asiacrypt'98, volume 1514 of LNCS, pages 1–10. Springer Berlin / Heidelberg, 1998.

40. C. H. Lim and P. J. Lee. More flexible exponentiation with precomputation. In Y. Desmedt, editor, CRYPTO, volume 839 of LNCS, pages 95–107. Springer, 1994.

41. P. Longa and C. Gebotys. Efficient techniques for high-speed elliptic curve cryptography. In S. Mangard and F.-X. Standaert, editors, Proceedings of CHES 2010, volume 6225 of LNCS, pages 80–94. Springer, 2010.

42. P. Longa and A. Miri. New composite operations and precomputation scheme for elliptic curve cryptosystems over prime fields. In R. Cramer, editor, Proceedings of PKC 2008, volume 4939 of LNCS, pages 229–247. Springer, 2008.

43. N. Meloni. New point addition formulae for ECC applications. In C. Carlet and B. Sunar, editors, Workshop on Arithmetic of Finite Fields (WAIFI), volume 4547 of LNCS, pages 189–201. Springer, 2007.

44. B. Möller. Algorithms for multi-exponentiation. In S. Vaudenay and A. M. Youssef, editors, Selected Areas in Cryptography, volume 2259 of LNCS, pages 165–180. Springer, 2001.

45. P. L. Montgomery. Modular multiplication without trial division. Mathematics of Computation, 44(170):519–521, April 1985.

46. P. L. Montgomery. Speeding the Pollard and elliptic curve methods of factorization. Mathematics of Computation, 48(177):243–264, 1987.

47. National Security Agency. Fact sheet NSA Suite B Cryptography. http://www.nsa.gov/ia/programs/suiteb_cryptography/index.shtml, 2009.

48. K. Okeya and T. Takagi. The width-w NAF method provides small memory and fast elliptic curve scalars multiplications against side-channel attacks. In M. Joye, editor, Proceedings of CT-RSA 2003, volume 2612 of LNCS, pages 328–342. Springer, 2003.

49. D. A. Osvik, A. Shamir, and E. Tromer. Cache attacks and countermeasures: The case of AES. In D. Pointcheval, editor, CT-RSA, volume 3860 of LNCS, pages 1–20. Springer, 2006.

50. R. Schoof. Counting points on elliptic curves over finite fields. Journal de théorie des nombres de Bordeaux, 7(1):219–254, 1995.

51. D. Shumow and N. Ferguson. On the possibility of a back door in the NIST SP800-90 dual ec prng. http://rump2007.cr.yp.to/15-shumow.pdf, 2007.

52. J. A. Solinas. Generalized Mersenne numbers. Technical Report CORR 99–39, Centre for Applied Cryptographic Research, University of Waterloo, 1999.

53. J. A. Solinas. Efficient arithmetic on Koblitz curves. Designs, Codes and Cryptography, 19(195–249), 2000.


54. The New York Times. Government announces steps to restore confidence on encryption standards. http://bits.blogs.nytimes.com/2013/09/10/government-announces-steps-to-restore-confidence-on-encryption-standards, 2013.

55. M. Tibouchi. Elligator squared: Uniform points on elliptic curves of prime order as uniform random strings. Cryptology ePrint Archive, Report 2014/043, 2014. http://eprint.iacr.org/.

56. U.S. Department of Commerce/National Institute of Standards and Technology. Digital Signature Standard (DSS). FIPS-186-4, 2013. http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf.

57. C. D. Walter. Montgomery exponentiation needs no final subtractions. Electronics Letters, 35(21):1831–1832, 1999.

A Algorithms for Scalar Multiplication

Algorithms for variable-base scalar multiplication. Algorithm 1 computes scalar multiplication for the variable-base scenario using the fixed-window method from [48]. We refer to Sections 4.1 and 4.2 for details on its usage with Weierstrass and twisted Edwards curves, respectively. The computation of this operation mainly consists of four different stages: input and point validation, precomputation, recoding and evaluation. Input and point validation are computed at the very beginning of the execution using Algorithm 2 for Weierstrass curves and Algorithm 3 for twisted Edwards curves. In particular, Algorithm 3 performs two doublings over the input point in twisted Edwards to ensure that subsequent computations are performed in the large prime order subgroup (avoiding small subgroup attacks). We remark that it is the protocol implementer's responsibility to ensure that timing differences during the detection of errors do not leak sensitive information to an attacker. In the precomputation stage, the implementer should first select a window width $2 \le w < 10$ according to efficiency and/or memory considerations. For example, selecting $w = 6$ for 256-, 384- and 512-bit scalar multiplication was found to achieve optimal performance in our implementations of Weierstrass curves. Precomputation is then computed by successively executing $P + 2P + 2P + \ldots + 2P$ with $2^{w-2} - 1$ point additions and storing the intermediate results. Explicit schemes are given in Algorithms 4 and 5 for $a = -3$ Weierstrass and $a = -1$ twisted Edwards curves, respectively. In the recoding stage, we use a variant of the regular recoding by [35] that ensures fixed length (see Algorithm 6). Since Algorithm 6 only recodes odd integers, we include a conversion step at Step 6 to deal with even values. The corresponding correction is performed at Step 20. These computations should be executed in constant time to protect against timing attacks. For example, a constant time execution of Step 6 could be implemented as follows (assuming a two's complement representation in which $-1 \equiv$ 0xFF...FF, and bitlength(odd) = bitlength(k)):

odd = −(k AND 1) {If k is odd then odd = 0xFF...FF else odd = 0}
k′ = r − k
k = (odd AND (k XOR k′)) XOR k′ {If odd = 0 then k = r − k}
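The masked selection for Step 6 of Algorithm 1 (replace an even $k$ by $r - k$, keep an odd $k$) can be sketched in executable form as follows. Fixed-width two's complement words are emulated with an explicit bit mask; the function name and the `bits` parameter are hypothetical, and Python integers are not constant time, so the sketch only shows the branch-free logic.

```python
def make_scalar_odd(k, r, bits):
    """Return (k_out, odd): k_out = k if k is odd, r - k if k is even.

    odd is all ones (as a `bits`-wide word) when k is odd and zero when k is even.
    """
    word = (1 << bits) - 1
    odd = (-(k & 1)) & word            # 0xFF...FF if k is odd, 0 if k is even
    k_alt = (r - k) & word             # candidate replacement value r - k
    k_out = (odd & (k ^ k_alt)) ^ k_alt  # select k or k_alt without branching on k
    return k_out, odd


# Usage with a toy odd prime order r = 101 and 8-bit words.
k_out, odd = make_scalar_odd(40, 101, 8)   # k is even, so r - k = 61 is selected
```

Since $(r - k)P = -kP$ for a point $P$ of order $r$, the final negation at Step 20 restores the intended result whenever `odd` is zero.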

The main computation in the evaluation stage consists of $t = \lceil \log_2(r)/(w-1) \rceil$ iterations each computing $(w-1)$ doublings and one addition with a value from the precomputed table. For $a = -3$ Weierstrass curves, the use of Jacobian coordinates is a popular choice for efficiency reasons. If this is used, then Algorithm 1 can use an efficient merged doubling-addition formula [42] when $w > 2$ by setting DBLADD = true. Other cases, including Weierstrass curves with $w = 2$ or twisted Edwards curves, should use DBLADD = false. Note that the


Algorithm 1 Variable-base scalar multiplication using the fixed-window method.

Input: Scalar $k \in [0, r)$ and point $P = (x, y) \in E(\mathbb{F}_p)$, where $\#E(\mathbb{F}_p) = h \cdot r$ with co-factor $h \in \mathbb{Z}^+$ and $r$ prime.
Output: $kP$.

1. if $k = 0 \vee k \geq r$ then return ("error: invalid scalar") [if: validation]
2. Run point validation and compute $T = 4P$ (for $E_d$) using Algorithm 2 for $E_b$ and Algorithm 3 for $E_d$. If "invalid" return ("error: invalid point"), else set $P = T$ (for $E_d$). [if: validation]

Precomputation Stage:
3. Fix the window width $2 \leq w < 10 \in \mathbb{Z}^+$.
4. Compute $P[i] = (2i+1)P$ for $0 \leq i < 2^{w-2}$ using Algorithm 4 for $E_b$ and Algorithm 5 for $E_d$.

Recoding Stage:
5. odd $= k \bmod 2$
6. if odd $= 0$ then $k = r - k$ [if: masked constant time]
7. Recode $k$ to $(k_t, \ldots, k_0) = (s_t \cdot |k_t|, \ldots, s_0 \cdot |k_0|)$ using Algorithm 6, where $t = \lceil \log_2(r)/(w-1) \rceil$ and $s_j$ are the signs of the recoded digits.

Evaluation Stage:
8. $Q = s_t P[(|k_t| - 1)/2]$
9. for $i = (t-1)$ to 1
10.   if DBLADD = true $\wedge$ $w \neq 2$ then [if: algorithm variant]
11.     $Q = 2^{(w-2)}Q$ (Use Alg. 10)
12.     $Q = 2Q + s_i P[(|k_i| - 1)/2]$ (Use Alg. 11)
13.   else
14.     $Q = 2^{(w-1)}Q$ (Use Alg. 10 for $E_b$ and Alg. 14 for $E_d$)
15.     $Q = Q + s_i P[(|k_i| - 1)/2]$ (Use Alg. 12 for $E_b$ and Alg. 15 for $E_d$)
16.   end if
17. end for
18. $Q = 2^{(w-1)}Q$ (Use Alg. 10 for $E_b$ and Alg. 14 for $E_d$)
19. $Q = Q \oplus s_0 P[(|k_0| - 1)/2]$ (Use Alg. 19 for $E_b$ and Alg. 17 for $E_d$)
20. if odd $= 0$ then $Q = -Q$ [if: masked constant time]
21. Convert $Q$ to affine coordinates $(x, y)$.
22. return $Q$.

Algorithm 2 Point validation for the Weierstrass curves $E_b/\mathbb{F}_p : y^2 = x^3 - 3x + b$ in Table 1.

Input: Point $P = (x_1, y_1)$.
Output: "Valid" or "invalid" point.

1. if $P = \mathcal{O}$ then return ("invalid") [if: validation]
2. if $x_1 \notin [0, p-1] \vee y_1 \notin [0, p-1]$ then return ("invalid") [if: validation]
3. if $y_1^2 \neq x_1^3 - 3x_1 + b \pmod{p}$ then return ("invalid") [if: validation]
4. return ("valid").

evaluation of DBLADD is used to simplify the description of the algorithm. An implementation might instead use separate functions for twisted Edwards and Weierstrass curves. Following the recommendations from Section 4, the last addition should be performed with a unified formula (denoted by $\oplus$) in order to avoid exceptions and it has been separated from the main loop; see Steps 18 and 19. To achieve constant-time execution, the points from the precomputed table should be extracted by doing a full pass over all the points in the lookup table and masking the correct value with the index $(|k_i| - 1)/2$. Finally, a suitable conversion to affine coordinates may be computed at Step 21 (if required).
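The full-pass table extraction described above can be sketched as follows. Every entry is read on every lookup, and a mask derived from comparing the loop position with the secret index keeps only the wanted entry. Points are modelled as tuples of non-negative integers; the function name, the `bits` parameter and the assumption that indices fit in 63 bits are illustrative choices, and Python itself gives no constant-time guarantee.

```python
def ct_select(table, index, bits=256):
    """Return table[index] by scanning the whole table and masking each entry."""
    full = (1 << bits) - 1
    result = [0] * len(table[0])
    for j, entry in enumerate(table):
        diff = j ^ index
        equal = ((diff - 1) >> 63) & 1   # 1 if diff == 0, else 0 (diff < 2**63 assumed)
        mask = (-equal) & full           # all ones for the wanted entry, zero otherwise
        for c, coord in enumerate(entry):
            result[c] |= coord & mask
    return tuple(result)


# Usage: P[i] stands for (2i+1)P; a recoded digit k_i = -7 selects index (|k_i| - 1) / 2 = 3.
table = [(2 * i + 1, 100 + i, 7 * i) for i in range(16)]  # placeholder "points"
digit = -7
entry = ct_select(table, (abs(digit) - 1) // 2)
```

The sign of the digit would then be applied to the extracted point with a similar masked negation rather than a branch.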

Algorithm 7 computes scalar multiplication for the fixed-base scenario using the modified LSB-set method [26] (denoted by mLSB-set), which combines the comb method [40] and LSB-set recoding [27]. Refer to Sections 4.1 and 4.2 for details on the use of the method with Weierstrass and twisted Edwards curves, respectively. This operation consists of computations


Algorithm 3 Combined point validation and torsion clearing for the twisted Edwards curves $E_d : -x^2 + y^2 = 1 + dx^2y^2$ in Table 2.

Input: Point $P = (x_1, y_1)$.
Output: "Invalid", or "valid" and a point $T$ of prime order $r$.

1. if $x_1 \notin [0, p-1] \vee y_1 \notin [0, p-1]$ then return ("invalid") [if: validation]
2. if $-x_1^2 + y_1^2 \neq 1 + dx_1^2y_1^2 \pmod{p}$ then return ("invalid") [if: validation]
3. if $(x_1, y_1) = (0, 1)$ then return ("invalid") [if: validation]
4. Compute $T = 4P$
5. if $T = \mathcal{O}$ then return ("invalid") [if: validation]
6. return ("valid") and $T$.

Algorithm 4 Precomputation scheme for the Weierstrass curves $E_b/\mathbb{F}_p : y^2 = x^3 - 3x + b$ in Table 1.

Input: Point $P = (x_1, y_1) \in E_b(\mathbb{F}_p) \setminus \{\mathcal{O}\}$ of prime order $r$ and window width $2 \leq w < 10$.
Output: $P[i] = (2i+1)P$ for $0 \leq i < 2^{w-2}$, in Chudnovsky coordinates.

1. Given $P = (x_1, y_1)$, compute $Q = 2P$ in Jacobian coordinates ($Q = (\mathbf{X}_2, \mathbf{Y}_2, \mathbf{Z}_2)$) and convert $P$ to Chudnovsky coordinates ($P = (X, Y, Z_1, Z_2, Z_3)$) such that the new $P$ and $Q$ have the same Z-coordinate (read from left to right, top to bottom):
   $t_2 = 1$, $t_1 = x_1^2$, $t_1 = t_1 - t_2$, $t_2 = t_1/2$, $t_1 = t_1 + t_2$,
   $Z_2 = y_1^2$, $X = x_1 \cdot Z_2$, $Z_3 = Z_2 \cdot y_1$, $t_2 = t_1^2$, $t_2 = t_2 - X$,
   $\mathbf{X}_2 = t_2 - X$, $\mathbf{Z}_2 = y_1$, $Z_1 = y_1$, $Y = Z_2^2$, $t_2 = X - \mathbf{X}_2$,
   $t_3 = t_1 \cdot t_2$, $\mathbf{Y}_2 = t_3 - Y$.
2. $P[0] = P$
3. for $i = 1$ to $2^{w-2} - 1$ do
4.   Given $Q = (X_1, Y_1, Z)$ and $P[i-1] = (X_2, Y_2, Z, Z^2, Z^3)$ compute $P[i] = Q + P[i-1] = (X_3, Y_3, Z_{3,1}, Z_{3,2}, Z_{3,3})$ and update the representation of $Q = (\mathbf{X}, \mathbf{Y}, \mathbf{Z})$ such that $\mathbf{Z} = Z_{3,1}$ (read from left to right, top to bottom):
     $t_1 = X_2 - X_1$, $Z_{3,1} = Z \cdot t_1$, $\mathbf{Z} = Z_{3,1}$, $t_2 = t_1^2$, $Z_{3,2} = Z_{3,1}^2$,
     $t_3 = t_1 \cdot t_2$, $\mathbf{X} = X_1 \cdot t_2$, $t_1 = Y_2 - Y_1$, $X_3 = t_1^2$, $Z_{3,3} = Z_{3,1} \cdot Z_{3,2}$,
     $X_3 = X_3 - t_3$, $X_3 = X_3 - \mathbf{X}$, $X_3 = X_3 - \mathbf{X}$, $t_2 = \mathbf{X} - X_3$, $t_1 = t_1 \cdot t_2$,
     $\mathbf{Y} = Y_1 \cdot t_3$, $Y_3 = t_1 - \mathbf{Y}$.
5. return $P[i]$ ($= (2i+1)P$ for $0 \leq i < 2^{w-2}$).

executed offline, which involve point validation and precomputing multiples of the known input point, and computations executed online, which involve scalar validation, recoding and evaluation stages. As before, point validation for twisted Edwards curves using Algorithm 3 during the offline phase performs two doublings on the input point to ensure that the computation takes place in the large prime order subgroup. Again, it is the protocol implementer's responsibility to ensure that timing differences during the detection of errors do not leak sensitive information to an attacker. The implementer should choose a window width 2 ≤ w < 10 and a table parameter v ≥ 1 according to efficiency and/or memory constraints, taking into account that the mLSB-set method requires v · 2^(w−1) precomputed points. For example, selecting w = 6 and v = 3 for 256-bit scalar multiplication was found to achieve optimal performance in our implementations of Weierstrass curves when storage is constrained to 6KB. During the online computation, the recoded scalar obtained from Algorithm 8 has a fixed length, which enables a fully regular execution when the representation is set up as described at Step 7. Since Algorithm 8 only recodes odd integers, we include a conversion step at Step 6 to deal with even values. The corresponding correction is performed at Step 13. In the evaluation stage, the main computation consists of e − 1 = ⌈⌈log2(r)⌉/(wv)⌉ − 1 iterations, each computing one doubling and v additions with a value from the precomputed table. Following Section 4, the


Algorithm 5 Precomputation scheme for twisted Edwards curves (Ed).
Input: Point P = (x1, y1) ∈ Ed(Fp) of prime order r and window width w ≥ 2 (w ∈ Z+).
Output: P[i] = (2i + 1)P, for 0 ≤ i < 2^(w−2), in extended homogeneous coordinates (X + Y, Y − X, 2Z, 2T).
1. Given P = (x1, y1), compute Q = 2P = (X2, Y2, Z2, T2) (where Q is represented using (X + Y, Y − X, Z, T)),

and update P = (X,Y,Z,T) in the representation (X + Y, Y −X, 2Z, 2T ) (read from left to right, top tobottom):X = x21, t1 = y21 , t2 = X+ t1, t1 = t1 −X, Y = x1 · y1,t3 = Y +Y, Y2 = t1 · t2, T2 = t2 · t3, t2 = 2, t2 = t2 − t1,Z2 = t1 · t2, t1 = t2 · t3, Z = x1, T = x1 ·Y, X2 = t1 + Y2,Y2 = Y2 − t1, t1 = X+Y, Y = Y −X, X = t1, Z = Z+ Z,T = T+T.

2. P [0] = P3. for i = 1 to 2(w−2) − 1 do4. Given P [i − 1] = (X2, Y2, Z2, T2) (represented using (X + Y, Y −X, 2Z, 2T )) and Q = (X1, Y1, Z1, T1)

(represented using (X + Y, Y −X,Z, T )) compute P [i] = Q+ P [i− 1], where P [i] = (X3, Y3, Z3, T3) isrepresented as (X + Y, Y −X, 2Z, 2T ) (read from left to right, top to bottom):t1 = T2 · Z1, t2 = T1 · Z2, t3 = t2 − t1, t1 = t1 + t2, t2 = t1 · t3,T3 = t2 + t2, t2 = X1 · Y2, X3 = Y1 ·X2, Y3 = t2 −X3, t2 = X3 + t2,X3 = Y3 · t1, Z3 = Y3 · t2, t1 = t3 · t2, Y3 = t1 −X3, X3 = X3 + t1,Z3 = Z3 + Z3.

5. return P [i] (= (2i+ 1)P for 0 ≤ i < 2w−2).

Algorithm 6 Protected odd-only recoding algorithm for the fixed-window representation.
Input: odd integer k ∈ [1, r⟩ and window width w ≥ 2, where r is the prime order of the targeted elliptic curve (sub)group.
Output: (k_t, ..., k_0), where k_i ∈ {±1, ±3, ..., ±(2^(w−1) − 1)} and t = ⌈log2(r)/(w − 1)⌉.
1. t = ⌈log2(r)/(w − 1)⌉
2. for i = 0 to (t − 1) do
3.   k_i = (k mod 2^w) − 2^(w−1)
4.   k = (k − k_i)/2^(w−1)
5. k_t = k
6. return (k_t, ..., k_0).
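As a sanity check on Algorithm 6, the recoding can be written directly in Python. This is a sketch under assumptions: the function name `recode_fixed_window` is hypothetical, `r.bit_length()` stands in for ⌈log2(r)⌉ (valid for an odd prime r), and the code makes no attempt at the "protected" (constant-time) execution that the algorithm title asks for.

```python
def recode_fixed_window(k, r, w):
    """Recode odd k in [1, r) into t + 1 odd signed digits, least significant first."""
    assert k % 2 == 1 and 1 <= k < r and w >= 2
    t = -(-r.bit_length() // (w - 1))      # Step 1: t = ceil(log2(r) / (w - 1))
    digits = []
    for _ in range(t):                     # Steps 2-4
        k_i = (k % (1 << w)) - (1 << (w - 1))
        k = (k - k_i) >> (w - 1)           # exact division by 2^(w-1)
        digits.append(k_i)
    digits.append(k)                       # Step 5: k_t
    return digits                          # digits[i] = k_i
```

Because every digit is odd and non-zero, the evaluation loop can perform the same sequence of doublings and additions for every scalar; the value is recovered as the sum of k_i · 2^((w−1)·i).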

additions should be performed with a unified formula (denoted by ⊕) to avoid exceptions. Note that, as described in the variable-base case, all the conditional computations using "if" statements as well as the extraction of points from the precomputed table should be executed in constant time in order to protect against timing attacks (with the exception of Step 3, which depends on public parameters; any potential leak through the detection of errors at Step 4 should be assessed by the protocol's implementer). Finally, a suitable conversion to affine coordinates may be computed at Step 14 (if required).
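The trade-off between the window width w and the table parameter v in the fixed-base case can be made concrete with a few lines of arithmetic. The helper below is a sketch: the function name is hypothetical, and the 64-byte point size assumes affine points (x, y) with 32-byte coordinates at the 256-bit level, which is an assumption rather than a figure stated in the text.

```python
def mlsb_set_parameters(t, w, v, point_bytes):
    """Return (table points, table bytes, main-loop iterations) for the mLSB-set method."""
    e = -(-t // (w * v))              # e = ceil(t / (w*v)), with t = ceil(log2(r))
    points = v * 2 ** (w - 1)         # v * 2^(w-1) precomputed points
    return points, points * point_bytes, e - 1


# w = 6, v = 3 at the 256-bit level, affine points of 2 * 32 bytes (assumed)
print(mlsb_set_parameters(256, 6, 3, 64))
```

Under these assumptions the table holds 96 points in 6144 bytes, which agrees with the 6KB storage bound mentioned for w = 6 and v = 3, and the main loop runs 14 times.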

Algorithm 9 computes double-scalar multiplication, which is typically found in signature verification schemes, and uses the width-w non-adjacent form [53] with interleaving [29, 44]. We assume that one of the input points is known in advance (P2) whereas the other one is a variable base (P1). Hence, we distinguish two phases: offline, which involves validation of P2 and a precomputation stage using the value w2; and online, which involves scalar validation, point validation of P1 and precomputation (using w1), recoding and evaluation stages. Again, point validation for twisted Edwards curves with Algorithm 3 performs two doublings on the input points to ensure computation in the large prime order subgroup. The precomputation for both input points is performed as in the variable-base scenario using Algorithms 4 and 5 for a = −3 Weierstrass and a = −1 twisted Edwards curves, respectively. However, the


Algorithm 7 Protected fixed-base scalar multiplication using the modified LSB-set comb method.
Input: A point P = (x, y) ∈ E(Fp), where #E = h·r with co-factor h ∈ Z+ and r prime, a scalar k = (k_{t−1}, ..., k_0)_2 ∈ [0, r⟩, where t = ⌈log2(r)⌉, window width w ≥ 2, and table parameter v ≥ 1, such that e = ⌈t/(wv)⌉, d = ev and ℓ = dw.
Output: kP.
Offline computation:
Precomputation stage:
1. Run point validation and compute T = 4P (for Ed) using Algorithm 2 for Eb and Algorithm 3 for Ed. If "invalid" return ("error: invalid point"), else set P = T (for Ed). [if: validation]
2. Compute P[i][j] = 2^(ej)·(1 + i_0·2^d + ... + i_{w−2}·2^((w−1)d))·P for all 0 ≤ i < 2^(w−1) and 0 ≤ j < v, where i = (i_{w−2}, ..., i_0)_2.
3. if wv | t then compute 2^(wd)·P. [if: algorithm variant]
Online computation:
4. if k = 0 ∨ k ≥ r then return ("error: invalid scalar") [if: validation]
Recoding stage:
5. odd = k mod 2
6. if odd = 0 then k = r − k [if: masked constant time]
7. Pad k with dw − t zeros to the left and convert it to the mLSB-set representation using Algorithm 8 such that k = (c, b_{ℓ−1}, ..., b_0)_mLSB-set. Set the digit columns T_{i,j} = |Σ_{m=0}^{w−2} b_{(m+1)d+ei+j}·2^m| with signs s_{i,j} = b_{ei+j} for all 0 ≤ i < v and 0 ≤ j < e.
Evaluation stage:
8. Q = Σ_{i=0}^{v−1} s_{i,e−1}·P[T_{i,e−1}][i]
9. for i = e − 2 to 0 do
10.   Q = 2Q (Use Alg. 10 for Eb and Alg. 14 for Ed)
11.   Q = Q ⊕ Σ_{j=0}^{v−1} s_{j,i}·P[T_{j,i}][j] (Use Alg. 18 for Eb and Alg. 17 for Ed)
12. if wv | t ∧ c = 1 then Q = Q + 2^(wd)·P [if: masked constant time]
13. if odd = 0 then Q = −Q [if: masked constant time]
14. Convert Q to affine coordinates (x, y).
15. return Q

Algorithm 8 Protected odd-only recoding algorithm for the modified LSB-set representation.
Input: An odd ℓ-bit integer k = (k_{ℓ−1}, ..., k_0)_2 ∈ [1, r⟩, window width w ≥ 2 and table parameter v ≥ 1, where r is the prime order of the targeted elliptic curve group such that e = ⌈t/(wv)⌉, d = ev and ℓ = dw, where t = ⌈log2(r)⌉.
Output: (c, b_{ℓ−1}, ..., b_0)_mLSB-set, where b_i ∈ {1, −1} if 0 ≤ i < d and b_i ∈ {0, b_{i mod d}} if d ≤ i < ℓ. If wv | t then c ∈ {0, 1}, otherwise c = 0.
1. b_{d−1} = 1
2. for i = 0 to (d − 2) do
3.   b_i = 2·k_{i+1} − 1
4. c = ⌊k/2^d⌋
5. for i = d to (ℓ − 1) do
6.   b_i = b_{i mod d} · c_0
7.   c = ⌊c/2⌋ − ⌊b_i/2⌋
8. return (c, b_{ℓ−1}, ..., b_0)_mLSB-set.
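Algorithm 8 can be checked with a short Python transcription. The sketch assumes that c_0 in Step 6 denotes the least significant bit of c and that ⌊·⌋ is the usual floor (so ⌊−1/2⌋ = −1); the function name is hypothetical and no constant-time protection is attempted.

```python
def recode_mlsb_set(k, t, w, v):
    """Recode an odd integer 0 < k < 2^t into (c, [b_0, ..., b_{l-1}])."""
    assert k % 2 == 1 and 0 < k < (1 << t)
    e = -(-t // (w * v))                        # e = ceil(t / (w*v))
    d = e * v
    l = d * w
    b = [0] * l
    b[d - 1] = 1                                # Step 1
    for i in range(d - 1):                      # Steps 2-3: b_i = 2*k_{i+1} - 1
        b[i] = 2 * ((k >> (i + 1)) & 1) - 1
    c = k >> d                                  # Step 4
    for i in range(d, l):                       # Steps 5-7
        b[i] = b[i % d] * (c & 1)
        c = (c >> 1) - (b[i] // 2)              # Python's // is a floor division
    return c, b
```

The recoded digits satisfy k = c · 2^ℓ + Σ b_i · 2^i, with the lowest d digits always non-zero, which is what makes the execution pattern of the evaluation stage regular.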

implementer has additional freedom in the selection of w2 since the precomputation for the fixed base is done offline. For example, we found that using w1 = 6 and w2 = 7 results in optimal performance in our implementations of Weierstrass curves when storage was restricted to 2KB, 3KB and 4KB for the 128-, 192- and 256-bit security levels, respectively. In the online computation, recoding of the scalars is performed using [33, Algorithm 3.35]. Accordingly, the evaluation stage consists of ⌈log2(r)⌉ + 1 iterations, each consisting of one doubling and at most two additions (one per precomputed table). As in the variable-base case, for a = −3 Weierstrass curves using Jacobian coordinates one may use the merged doubling-addition formula [42] by setting DBLADD = true. A suitable conversion to affine coordinates may be computed at Step 38 (if required).


Algorithm 9 Double-scalar multiplication using the width-w NAF with interleaving. (If-statements are not marked because this algorithm is not assumed to be constant-time.)
Input: Scalars k1 and k2 ∈ [0, r⟩ and points P1 and P2 ∈ E(Fp), where #E = h·r with co-factor h ∈ Z+ and r prime.
Output: k1·P1 + k2·P2.
Offline computation:
Precomputation stage:
1. Run point validation over P2 and compute T = 4P2 (for Ed) using Algorithm 2 for Eb and Algorithm 3 for Ed. If "invalid" return ("error: invalid point"), else set P2 = T (for Ed).
2. Fix the window width w2 ≥ 2 (w2 ∈ Z+).
3. Compute P2[i] = (2i + 1)P2 for 0 ≤ i < 2^(w2−2) using Algorithm 4 for Eb and Algorithm 5 for Ed.
Online computation:
4. if (k1 = 0 ∨ k1 ≥ r) ∨ (k2 = 0 ∨ k2 ≥ r) then return ("error: invalid scalar")
5. Run point validation over P1 and compute T = 4P1 (for Ed) using Algorithm 2 for Eb and Algorithm 3 for Ed. If "invalid" return ("error: invalid point"), else set P1 = T (for Ed).
Precomputation stage:
6. Fix the window width w1 ≥ 2 (w1 ∈ Z+).
7. Compute P1[i] = (2i + 1)P1 for 0 ≤ i < 2^(w1−2) using Algorithm 4 for Eb and Algorithm 5 for Ed.
Recoding stage:
8. Recode k1 to (k_{1,i−1}, k_{1,i−2}, ..., k_{1,0})_wNAF using [33, Algorithm 3.35] and pad it with ⌈log2(r)⌉ − i + 1 zeros to the left.
9. Recode k2 to (k_{2,j−1}, k_{2,j−2}, ..., k_{2,0})_wNAF using [33, Algorithm 3.35] and pad it with ⌈log2(r)⌉ − j + 1 zeros to the left.
Evaluation stage:

10. for i = dlog2(r)e to 011. if DBLADD = true then12. if k1,i = 0 then13. Q = 2Q (Use Algorithm 10)

14. else if k1,i > 0 then15. Q = 2Q+ P1[k1,i/2] (Use Algorithm 11)

16. else if k1,i < 0 then17. Q = 2Q− P1[(−k1,i)/2] (Use Algorithm 11)

18. end if

19. if k2,i > 0 then20. Q = Q+ P2[k2,i/2] (Use Algorithm 13)

21. else if k2,i < 0 then22. Q = Q− P2[(−k2,i)/2] (Use Algorithm 13)

23. end if

24. else

25. Q = 2Q (Use Algorithm 14)

26. else if k1,i > 0 then27. Q = Q+ P1[k1,i/2] (Use Algorithm 15)


30. end if

31. if k2,i > 0 then32. Q = Q+ P2[k2,i/2] (Use Algorithm 16)


35. end if

36. end if

37. end for

38. Convert Q to a�ne coordinates (x, y).39. return Q.


B Algorithms for Point Operations

Algorithm 10 Point doubling using Jacobian coordinates on Weierstrass curves (Eb).
Input: P = (X1, Y1, Z1) ∈ Eb(Fp) in Jacobian coordinates.
Output: 2P = (X2, Y2, Z2) ∈ Eb(Fp) in Jacobian coordinates.
1. t1 = Z1^2
2. t2 = X1 + t1
3. t1 = X1 − t1
4. t1 = t1 × t2
5. t2 = t1/2
6. t1 = t1 + t2
7. t2 = Y1^2
8. t3 = X1 × t2
9. t4 = t1^2
10. t4 = t4 − t3
11. X2 = t4 − t3
12. Z2 = Y1 × Z1
13. t2 = t2^2
14. t4 = t3 − X2
15. t1 = t1 × t4
16. Y2 = t1 − t2
17. return 2P = (X2, Y2, Z2).
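The operation sequence of Algorithm 10 can be transcribed line by line and compared against the textbook affine doubling. The sketch below uses assumed toy parameters, the prime 10007 and the curve y^2 = x^3 − 3x + 7 (which contains the point (2, 3)); these are not parameters from Table 1, and the helper names are hypothetical.

```python
P_MOD = 10007        # toy prime (assumption; not one of the primes in Table 1)


def half(a):
    """Divide by 2 modulo the odd prime P_MOD without an inversion."""
    return (a + (a & 1) * P_MOD) >> 1


def jacobian_double(X1, Y1, Z1):
    """Steps 1-17 of Algorithm 10 for a = -3 curves."""
    t1 = Z1 * Z1 % P_MOD
    t2 = (X1 + t1) % P_MOD
    t1 = (X1 - t1) % P_MOD
    t1 = t1 * t2 % P_MOD           # X1^2 - Z1^4
    t2 = half(t1)
    t1 = (t1 + t2) % P_MOD         # 3(X1^2 - Z1^4)/2
    t2 = Y1 * Y1 % P_MOD
    t3 = X1 * t2 % P_MOD
    t4 = t1 * t1 % P_MOD
    t4 = (t4 - t3) % P_MOD
    X2 = (t4 - t3) % P_MOD
    Z2 = Y1 * Z1 % P_MOD
    t2 = t2 * t2 % P_MOD
    t4 = (t3 - X2) % P_MOD
    t1 = t1 * t4 % P_MOD
    Y2 = (t1 - t2) % P_MOD
    return X2, Y2, Z2


def to_affine(X, Y, Z):
    """Map the Jacobian tuple (X : Y : Z) to the affine point (X/Z^2, Y/Z^3)."""
    zi = pow(Z, -1, P_MOD)
    return X * zi * zi % P_MOD, Y * zi * zi * zi % P_MOD
```

Counting the operations in `jacobian_double` gives 4 multiplications, 4 squarings, 2 additions, 5 subtractions and 1 halving, in line with the first row of Table 5.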

Algorithm 11 Merged point doubling-addition using Jacobian/Chudnovsky coordinates on Weierstrass curves (Eb).
Input: P, Q ∈ Eb(Fp) such that P = (X1, Y1, Z1) is in Jacobian coordinates and Q = (X2, Y2, Z2, Z2^2, Z2^3) is in Chudnovsky coordinates.
Output: 2P + Q = (X4, Y4, Z4) ∈ Eb(Fp) in Jacobian coordinates.
1. if P = O then return Q [if: exception]
2. if Q = O then use Algorithm 10 to compute and return 2P [if: exception]
3. t1 = Z1^2
4. t2 = Z1 × t1
5. t3 = Z2^3 × Y1
6. t2 = Y2 × t2
7. t2 = t2 − t3
8. t4 = Z2^2 × X1
9. t1 = t1 × X2
10. t1 = t1 − t4
11. if t1 = 0 then [if: exception]
12.   if t2 = 0 then [if: exception]
13.     Use Alg. 10 to compute R = 2P
14.     Use Alg. 12 to compute and return R + Q
15.   else return P
16. Z4 = Z1 × Z2
17. Z4 = t1 × Z4
18. t5 = t1^2
19. t1 = t1 × t5
20. X4 = t4 × t5
21. t4 = t2^2
22. t4 = t4 − t1
23. t4 = t4 − X4
24. t4 = t4 − X4
25. t4 = t4 − X4
26. if t4 = 0 then return (O) [if: exception]
27. Y4 = t1 × t3
28. t1 = t2 × t4
29. t1 = t1 + Y4
30. t1 = t1 + Y4
31. Z4 = Z4 × t4
32. t2 = t4^2
33. t4 = t2 × t4
34. t2 = t2 × X4
35. t3 = t1^2
36. t3 = t3 − t4
37. t3 = t3 − t2
38. X4 = t3 − t2
39. t3 = X4 − t2
40. t4 = t4 × Y4
41. t1 = t1 × t3
42. Y4 = t1 − t4
43. return 2P + Q = (X4, Y4, Z4).


Algorithm 12 Point addition using Jacobian/Chudnovsky coordinates on Weierstrass curves(Eb).

Input: P,Q ∈ Eb(Fp) such that P = (X1, Y1, Z1) is in Jacobian coordinates and Q = (X2, Y2, Z2, Z22 , Z

32 ) is

in Chudnosvky coordinates.Output: P +Q = (X3, Y3, Z3) ∈ Eb(Fp) in Jacobian coordinates.


2. if Q = O then return P [if: exception]

3. t1 = Z21

4. t2 = Z1 × t15. t3 = Z3

2 × Y1

6. t2 = Y2 × t27. t2 = t2 − t38. t4 = Z2

2 ×X1

9. t1 = t1 ×X2

10. t1 = t1 − t411. if t1 = 0 then [if: exception]


13. Use Alg. 10 to compute and return 2P .14. else return (O)

15. Z3 = Z1 × Z2

16. Z3 = t1 × Z3

17. t5 = t2118. t1 = t1 × t519. t4 = t4 × t520. t5 = t2221. t5 = t5 − t122. t5 = t5 − t423. X3 = t5 − t424. t4 = t4 −X3

25. t4 = t2 × t426. t1 = t1 × t327. Y3 = t4 − t128. return P +Q = (X3, Y3, Z3).

Algorithm 13 Point addition using Jacobian/a�ne coordinates on Weierstrass curves (Eb).

Input: P,Q ∈ Eb(Fp) such that P = (X1, Y1, Z1) is in Jacobian coordinates and Q = (x2, y2) is in a�necoordinates.

Output: P +Q = (X3, Y3, Z3) ∈ Eb(Fp) in Jacobian coordinates.



3. t1 = Z21

4. t2 = Z1 × t15. t1 = t1 × x26. t2 = t2 × y27. t1 = t1 −X1

8. t2 = t2 − Y1



11. Use Alg. 10 to compute and return 2P .12. else return (O)13. Z3 = Z1 × t1

14. t3 = t2115. t4 = t1 × t316. t3 = X1 × t317. t1 = t3 + t318. X3 = t2219. X3 = X3 − t120. X3 = X3 − t421. t3 = t3 −X3

22. t3 = t2 × t323. t4 = t4 × Y1

24. Y3 = t3 − t425. return P +Q = (X3, Y3, Z3).

Algorithm 14 Point doubling using homogeneous/extended homogeneous coordinates on Edwards curves (Ed).
Input: P = (X1, Y1, Z1) ∈ Ed(Fp).
Output: 2P = (X2, Y2, Z2, T2,a, T2,b) ∈ Ed(Fp).
1. t1 = X1^2
2. t2 = Y1^2
3. T2,b = t1 + t2
4. t1 = t2 − t1
5. t2 = Y1 + Y1
6. T2,a = X1 × t2
7. Y2 = t1 × T2,b
8. t2 = Z1^2
9. t2 = t2 + t2
10. t2 = t2 − t1
11. Z2 = t1 × t2
12. X2 = t2 × T2,a
13. return 2P = (X2, Y2, Z2, T2,a, T2,b).


Algorithm 15 Point addition using extended homogeneous coordinates on Edwards curves(Ed).Input: P,Q ∈ Ed(Fp) such that P = (X1, Y1, Z1, T1,a, T1,b) and Q = (X2 + Y2, Y2 −X2, 2Z2, 2T2).Output: P +Q = (X3, Y3, Z3, T3,a, T3,b) ∈ Ed(Fp).


2. if P = O then [if: exception]

3. t1 = (X2 + Y2)− (Y2 −X2)4. t1 = t1/25. Y3 = (Y2 −X2) + t16. X3 = t17. Z3 = (2Z2)/28. T3,a = (2T2)/29. T3,b = 110. T3,a = T1,a × T1,b

11. t1 = (2T2)× Z1

12. t2 = T3,a × (2Z2)

13. T3,a = t2 − t114. T3,b = t1 + t215. t2 = X1 + Y1

16. t1 = (Y2 −X2)× t217. t2 = Y1 −X1

18. t2 = (X2 + Y2)× t219. Z3 = t1 − t220. t1 = t1 + t221. X3 = T3,b × Z3

22. Z3 = Z3 × t123. Y3 = T3,a × t124. return P +Q = (X3, Y3, Z3, T3,a, T3,b).

Algorithm 16 Point addition using extended homogeneous/extended a�ne coordinates onEdwards curves (Ed).Input: P,Q ∈ Ed(Fp) such that P = (X1, Y1, Z1, T1,a, T1,b) and Q = (x2 + y2, y2 − x2, 2t2).Output: P +Q = (X3, Y3, Z3, T3,a, T3,b) ∈ Ed(Fp).


2. if P = O then [if: exception]

3. t1 = (x2 + y2)− (y2 − x2)4. t1 = t1/25. Y3 = (y2 − x2) + t16. X3 = t17. Z3 = 18. T3,a = (2t2)/29. T3,b = 110. T3,a = T1,a × T1,b

11. t1 = (2T2)× Z1

12. t2 = T3,a + T3,a

13. T3,a = t2 − t114. T3,b = t1 + t215. t2 = X1 + Y1

16. t1 = (Y2 −X2)× t217. t2 = Y1 −X1

18. t2 = (X2 + Y2)× t219. Z3 = t1 − t220. t1 = t1 + t221. X3 = T3,b × Z3

22. Z3 = Z3 × t123. Y3 = T3,a × t124. return P +Q = (X3, Y3, Z3, T3,a, T3,b).

Algorithm 17 Unified point addition using extended homogeneous coordinates on Edwards curves (Ed).
Input: P, Q ∈ Ed(Fp) such that P = (X1, Y1, Z1, T1,a, T1,b) and Q = (X2 + Y2, Y2 − X2, 2Z2, 2T2).
Output: P + Q = (X3, Y3, Z3, T3,a, T3,b) ∈ Ed(Fp).
1. T3,a = T1,a × T1,b
2. if 2Z2 = 2 then [if: exception]
3.   t1 = Z1 + Z1 {Q is in affine coordinates}
4. else
5.   t1 = (2Z2) × Z1
6. t2 = T3,a × (2T2)
7. T3,a = t2 × d
8. t3 = t1 − T3,a
9. t1 = t1 + T3,a
10. t2 = X1 + Y1
11. T3,a = (X2 + Y2) × t2
12. t2 = Y1 − X1
13. X3 = (Y2 − X2) × t2
14. T3,b = T3,a − X3
15. T3,a = T3,a + X3
16. X3 = T3,b × t3
17. Z3 = t3 × t1
18. Y3 = T3,a × t1
19. return P + Q = (X3, Y3, Z3, T3,a, T3,b).


C Implementing the Group Law

Weierstrass curves. It is standard to represent points on Eb : y^2 = x^3 − 3x + b using Jacobian coordinates [21, 47, 56]: for non-zero Z ∈ Fp, the tuple (X : Y : Z) is used to represent the affine point (X/Z^2, Y/Z^3) on Eb. There are many different variants of the Jacobian formulas originally proposed in [22]. In our implementation we use the doubling formula from [41] (see Algorithm 10). Point additions are usually performed between a running point and a point from a (precomputed) look-up table. Typically, it is advantageous to leave the precomputed points in projective form for variable-base computations, and to convert them (offline) to their affine form for fixed-base computations. When elements in the table are stored in affine coordinates, point addition is performed using mixed Jacobian/affine coordinates using, for example, the formula presented in [33] (see Algorithm 13). There are cases in which exceptions in the formulas might arise. This is the case, for example, for fixed-base scalar multiplication. To achieve constant-time execution, we devised a complete formula based on masking that works for point addition, doubling, inverses and the point at infinity (see Algorithm 18). If points from the precomputed table are stored in projective coordinates, we use Chudnovsky coordinates to represent the affine point (X/Z^2, Y/Z^3) ∈ Eb by the projective tuple (X : Y : Z : Z^2 : Z^3). The corresponding addition formula is given as Algorithm 12. More efficiently, whenever a doubling is followed by an addition (as in the main loop of the variable-base scalar multiplication; see Algorithm 1) one can use a merged doubling-addition formula [42] that is based on the special addition with the same Z-coordinate from [43] (see Algorithm 11). The different costs of the point formulas used in our implementation can be found in Table 5. Finally, the exact routine to perform the precomputation for the variable-base scenario is outlined in Algorithm 4. The scheme uses a straightforward variant of the general formulas, including the special addition from [43].

Twisted Edwards curves. Hisil et al. [34] derive efficient formulas for additions on (special) twisted Edwards curves [7] by representing affine points (X/Z, Y/Z) on Ed : −x^2 + y^2 = 1 + d·x^2·y^2 by the projective tuple (X : Y : Z : T), where T = XY/Z. Hamburg [31] proposes to represent such a projective point using five elements: (X : Y : Z : T1 : T2), where T = T1·T2. This has the advantage of avoiding a required look-ahead when computing the elliptic curve scalar multiplication using the techniques from [34]. If the addition formulas are "dedicated", they do not work for doubling but are usually more efficient. The details of the dedicated additions used in our implementation are outlined in Algorithms 15 and 16. For settings that might trigger exceptions in the formulas (e.g., fixed-base scalar multiplication), one can use the unified addition formula proposed in [34] (see Algorithm 17). The algorithm for point doubling on Ed is given in Algorithm 14: this extends the formula from [7] by using the five-element representation as suggested in [31].

When storing precomputed points, we follow the caching techniques described in [34]: we store affine points as (x + y, y − x, 2t) with t = xy, or projective points as (X + Y : Y − X : 2Z : 2T) with T = XY/Z, both of which can speed up the scalar multiplication computation. Just as in the case of the Weierstrass curves above, it is usually advantageous to leave the precomputed points in projective form for variable-base computations, and to convert them (offline) to their affine form for fixed-base computations. The explicit routine that performs the precomputation for the variable-base scenario is outlined in Algorithm 5. The costs of the different formulas used in our implementation are displayed in Table 5.


Table 5. An overview of the number of modular operations required to implement the group law for a = −3 Weierstrass, a = −1 twisted Edwards and Montgomery curves using different coordinate systems. The Weierstrass point doubling works on Jacobian coordinates, while the point addition formula takes as input one point in Jacobian (Jac) coordinates and the other in either affine (aff) or (projective) Chudnovsky (Chud) coordinates. We also show a merged double-and-add approach which computes R = 2P + Q, where R and P are in Jacobian and Q in Chudnovsky coordinates. The complete addition formulas also include the number of table look-ups (denoted by #lut) that are required for their realization. The Edwards doubling uses the five-element projective coordinates (X : Y : Z : T1 : T2). The Edwards addition adds a point in five-element projective coordinates (X : Y : Z : T1 : T2) to a point in four-element projective coordinates (X + Y : Y − X : 2Z : 2T) (proj.) or in three-element extended affine coordinates (x + y, y − x, 2t) (aff.), and returns a point in five-element coordinates. The cost of a single step of the Montgomery ladder (which computes a doubling and a differential addition) is stated as well.

operation                                ref        #mul  #sqr  #mulc  #add  #sub  #div2  #lut  see

Weierstrass double                       [41]          4     4      0     2     5      1     0  Algorithm 10
Weierstrass add:
  Jac + Chud → Jac                       [22]         11     3      0     0     7      0     0  Algorithm 12
  Jac + aff. → Jac                       [33]          8     3      0     1     6      0     0  Algorithm 13
Weierstrass dbl-add                      [42]         16     5      0     2    11      0     0  Algorithm 11
(Complete) Jac + aff → Jac               This work     8     3      0     2     8      1     1  Algorithm 18
(Complete) Jac + Jac → Jac               This work    12     4      0     2     8      1     1  Algorithm 19

Edwards doubling                         [7]           4     3      0     3     2      0     0  Algorithm 14
Edwards addition proj.                   [34]          8     0      0     3     3      0     0  Algorithm 15
Edwards addition aff.                    [34]          7     0      0     4     3      0     0  Algorithm 15
(Unified) Edwards addition proj.         [34]          9     0      0     3     3      0     0  Algorithm 17
(Unified) Edwards addition aff.          [34]          8     0      0     4     3      0     0  Algorithm 17

Montgomery ladder step (double-and-add)  [46]          5     4      1     4     4      0     0

C.1 Complete Addition Laws

An elliptic curve addition law is said to be complete if it correctly computes the group operation regardless of the two input points. Although employing such an addition law on its own can simplify the task of the implementer, it usually incurs a performance penalty. This is because the fastest formulas available for a particular curve model, which work fine for most input pairs, tend to fail on certain inputs. However, it is often the case that implementers can safely exploit the speed of such incomplete formulas by correctly dealing with all possible exceptions, or by designing the scalar multiplication routine such that exceptions can never arise. All of the twisted Edwards curves presented in this paper can make use of the complete addition law in [11] by working on the birationally equivalent Edwards model E_{−1/d} : x^2 + y^2 = 1 − (1/d)·x^2·y^2. However, the complete formulas are slower than the fastest formulas on the twisted Edwards curve [34]. Even when working on an Edwards curve with complete formulas, an implementation of the scalar multiplication could still be sped up by mapping to a different curve, while retaining the complete formulas for all other operations. One could, for example, follow the approach suggested in [32] and use an isogeny to the twisted Edwards curve E_{−1/d−1} : x^2 + y^2 = 1 − (1/d + 1)·x^2·y^2; or use the birational equivalence to E : −x^2 + y^2 = 1 + d·x^2·y^2.

The situation for the prime order Weierstrass curves in this paper is more complicated. As pointed out by Bosma and Lenstra [18], the best that we can do for general elliptic curves is as follows: on input of two points P1 and P2, we must compute two candidate sums, P3 and P3′, for which we can only be guaranteed that at least one of them is a correct projective representation for P1 + P2. In the case that precisely one of P3 and P3′ correctly corresponds to P1 + P2, the other candidate has all of its coordinates equal to zero; although this makes it straightforward to write a constant-time routine for complete additions, it also means that computing complete additions in this way is much more costly than computing incomplete additions.

For the sake of comparison, we present the simplified version of the complete formulas (see footnote 9) from [18], which are specialized to short Weierstrass curves of the form E : y^2 = x^3 + ax + b. For two input points P1 = (X1 : Y1 : Z1) and P2 = (X2 : Y2 : Z2) in homogeneous projective space, the two candidate sums P3 = (X3 : Y3 : Z3) and P3′ = (X3′ : Y3′ : Z3′) are computed as

\begin{equation}
\begin{aligned}
X_3 &= (X_1Y_2 - X_2Y_1)(Y_1Z_2 + Y_2Z_1) - (X_1Z_2 - X_2Z_1)\bigl(a(X_1Z_2 + X_2Z_1) + 3bZ_1Z_2 - Y_1Y_2\bigr); \\
Y_3 &= -(3X_1X_2 + aZ_1Z_2)(X_1Y_2 - X_2Y_1) + (Y_1Z_2 - Y_2Z_1)\bigl(a(X_1Z_2 + X_2Z_1) + 3bZ_1Z_2 - Y_1Y_2\bigr); \\
Z_3 &= (3X_1X_2 + aZ_1Z_2)(X_1Z_2 - X_2Z_1) - (Y_1Z_2 + Y_2Z_1)(Y_1Z_2 - Y_2Z_1); \\
X_3' &= -(X_1Y_2 + X_2Y_1)\bigl(a(X_1Z_2 + X_2Z_1) + 3bZ_1Z_2 - Y_1Y_2\bigr) - (Y_1Z_2 + Y_2Z_1)\bigl(3b(X_1Z_2 + X_2Z_1) + a(X_1X_2 - aZ_1Z_2)\bigr); \\
Y_3' &= Y_1^2Y_2^2 + 3aX_1^2X_2^2 - 2a^2X_1X_2Z_1Z_2 - (a^3 + 9b^2)Z_1^2Z_2^2 + (X_1Z_2 + X_2Z_1)\bigl(3b(3X_1X_2 - aZ_1Z_2) - a^2(X_2Z_1 + X_1Z_2)\bigr); \\
Z_3' &= (3X_1X_2 + aZ_1Z_2)(X_1Y_2 + X_2Y_1) + (Y_1Z_2 + Y_2Z_1)\bigl(Y_1Y_2 + 3bZ_1Z_2 + a(X_1Z_2 + X_2Z_1)\bigr).
\end{aligned}
\tag{1}
\end{equation}

In the case of a = −3 short Weierstrass curves, like the prime order curves in this paper, we found that the computations in (1) require at most 22 multiplications (see footnote 10), 3 multiplications by b, and one multiplication by b^2 − 3. The adaptation of the formulas to points in Jacobian coordinates can be achieved in the obvious way at an additional cost of 6 multiplications and 3 squarings: preceding (1), we can transform from Jacobian coordinates to homogeneous coordinates by taking Xi ← Xi · Zi and then Zi ← Zi^3 for i = 1, 2; and, following the correct choice of P3 = (X3 : Y3 : Z3), we can move back to Jacobian coordinates by taking X3 ← X3 · Z3 and then Y3 ← Y3 · Z3^2.

Although the formulas in (1) are mathematically satisfactory, their computation costs around twice as much as an incomplete addition (see Table 5), which renders them far from satisfactory in cryptographic applications. On the other hand, the work-around we present in Algorithm 19 and Algorithm 18, while perhaps not as mathematically elegant, is equivalent for all practical purposes and incurs a much smaller overhead over the incomplete formulas. In particular, there are no additional multiplications or squarings (on top of those incurred during an incomplete addition) required when performing a complete addition via this masking approach.

As briefly discussed in Section 4.1, the idea is to exploit the similarity between the sequences of operations computed in a doubling and an addition. On input of P and Q, one would ordinarily compute the doubling 2P and the (non-unified) addition P + Q and mask out the correct result at the end, depending on whether P = Q. However, the detection of P = Q (or not) can be achieved much earlier in projective space using only a few operations that are common to both doublings and non-unified additions; see Line 17 (resp. Line 12) in Algorithm 19 (resp. Algorithm 18). After this detection, the required operation (doubling or addition) is achieved by masking the correct inputs and outputs through a sequence of subsequent computations, namely those which overlap in the explicit formulas for point doublings and additions. Of course, in the case that one or both of P or Q is O, or that P = −Q, these superfluous computations are still computed in constant time such that the correct result is masked out in a cache-attack resistant manner.
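The early case detection used by the masking approach can be illustrated with the first steps of the mixed Jacobian/affine variant. The sketch below is a simplified model under assumptions: the toy prime 10007, the representation of O by Z1 = 0, and the helper names are all hypothetical, and the arithmetic `select` only imitates masking, since Python offers no constant-time guarantee.

```python
P_MOD = 10007  # toy prime (assumption)


def is_zero(a):
    """1 if a == 0 (mod P_MOD), else 0."""
    return int(a % P_MOD == 0)


def select(bit, a, b):
    """Return a if bit == 1, else b, using arithmetic instead of a branch."""
    return bit * a + (1 - bit) * b


def addition_case(P, Q):
    """Result index: 0 -> O, 1 -> Q, 2 -> 2P, 3 -> P + Q (P Jacobian, Q affine)."""
    X1, Y1, Z1 = P
    x2, y2 = Q
    t2 = Z1 * Z1 % P_MOD
    t3 = Z1 * t2 % P_MOD
    t1 = (x2 * t2 - X1) % P_MOD                      # zero iff the x-coordinates agree
    t4 = (y2 * t3 - Y1) % P_MOD                      # zero iff the y-coordinates agree
    index = 3
    same_x = is_zero(t1)
    index = select(same_x, 0, index)                 # P = -Q, result is O
    index = select(same_x & is_zero(t4), 2, index)   # P = Q, result is 2P
    index = select(is_zero(Z1), 1, index)            # P = O, result is Q
    return index
```

The two differences t1 and t4 are needed by the incomplete addition anyway, which is why the detection adds no multiplications or squarings; the index then drives the final masked table read.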

9 We also corrected some typos in [18] that were pointed out in [6].
10 We did not optimize (1) aggressively; we simply grouped common subexpressions and employed obvious operation scheduling. It is likely that there are faster routes.


Algorithm 18 Complete (mixed) addition using masking and Jacobian/a�ne coordinates onprime-order Weierstrass curves Eb.Input: P,Q ∈ Eb(Fp) such that P = (X1, Y1, Z1) is in Jacobian coordinates and Q = (x2, y2) is in a�ne

coordinates.Output: R = P +Q ∈ Eb(Fp) in Jacobian coordinates. Computations marked with [*] are implemented in

constant time using masking.

1. T [0] = O {T [i] = (Xi, Yi, Zi) for 0 ≤ i < 4}2. T [1] = Q3. t2 = Z2

1

4. t3 = Z1 × t25. t1 = x2 × t26. t4 = y2 × t37. t1 = t1 −X1

8. t4 = t4 − Y1

9. index = 310. if t1 = 0 then [*]

11. index = 0 {R = O}12. if t4 = 0 then index = 2 {R = 2P} [*]

13. if P = O then index = 1 {R = Q} [*]

14. mask = 015. if index = 3 then mask = 1 [*]

{case P +Q, else any other case}16. t3 = X1 + t217. t6 = X1 − t218. if mask = 0 then t2 = Y1 else t2 = t1 [*]

19. t5 = t2220. t1 = X1 × t521. Z2 = Z1 × t2

22. Z3 = Z2

23. if mask 6= 0 then t3 = t2 [*]

24. if mask 6= 0 then t6 = t5 [*]

25. t2 = t3 × t626. t3 = t2/227. t3 = t2 + t328. if mask 6= 0 then t3 = t4 [*]

29. t4 = t2330. t4 = t4 − t131. X2 = t4 − t132. X3 = X2 − t233. if mask = 0 then t4 = X2 else t4 = X3 [*]

34. t1 = t1 − t435. t4 = t3 × t136. if mask = 0 then t1 = t5 else t1 = Y1 [*]

37. if mask = 0 then t2 = t5 [*]

38. t3 = t1 × t239. Y2 = t4 − t340. Y3 = Y2

41. R = P [index] (= (Xindex, Yindex, Zindex)) [*]

42. return R


Algorithm 19 Complete (projective) addition using masking and Jacobian coordinates onprime-order Weierstrass curves Eb.Input: P,Q ∈ Eb(Fp) such that P = (X1, Y1, Z1) and Q = (X2, Y2, Z2) are in Jacobian coordinates.Output: R = P +Q ∈ Eb(Fp) in Jacobian coordinates. Computations marked with [*] are implemented in

constant time using masking.

1. T [0] = O {T [i] = (Xi, Yi, Zi) for 0 ≤ i < 5}2. T [1] = Q3. T [4] = P4. t2 = Z2

1

5. t3 = Z1 × t26. t1 = X2 × t27. t4 = Y2 × t38. t3 = Z2

2

9. t5 = Z2 × t310. t7 = X1 × t311. t8 = Y1 × t512. t1 = t1 − t713. t4 = t4 − t814. index = 315. if t1 = 0 then [*]

16. index = 0 {R = O}17. if t4 = 0 then index = 2 {R = 2P} [*]

18. if P = O then index = 1 {R = Q} [*]

19. if Q = O then index = 4 {R = P} [*]

20. mask = 021. if index = 3 then mask = 1

{case P +Q, else any other case} [*]

22. t3 = X1 + t223. t6 = X1 − t224. if mask = 0 then t2 = Y1 else t2 = t1 [*]

25. t5 = t2226. if mask = 0 then t7 = X1 [*]

27. t1 = t5 × t728. Z2 = Z1 × t229. Z3 = Z2 × Z2

30. if mask 6= 0 then t3 = t2 [*]

31. if mask 6= 0 then t6 = t5 [*]

32. t2 = t3 × t633. t3 = t2/234. t3 = t2 + t335. if mask 6= 0 then t3 = t4 [*]

36. t4 = t2337. t4 = t4 − t138. X2 = t4 − t139. X3 = X2 − t240. if mask = 0 then t4 = X2 else t4 = X3 [*]

41. t1 = t1 − t442. t4 = t3 × t143. if mask = 0 then t1 = t5 else t1 = t8 [*]

44. if mask = 0 then t2 = t5 [*]

45. t3 = t1 × t246. Y2 = t4 − t347. Y3 = Y2

48. R = T [index] (= (Xindex, Yindex, Zindex)) [*]

49. return R


D Traces of Frobenius

Table 6. The traces of Frobenius t for the curves in Tables 1 and 2. Compute group orders as #E(Fp) = p + 1 − t and #E′(Fp) = p + 1 + t for E ∈ {Eb, Ed}.

curve Eb trace

w-256-mont 0x3AE8AEC191AF8B462EF3A1E5867A815

w-254-mont 0x147E7415F25C8A3F905BE63B507207C1

w-256-mers 0x1BC37D8A15D9A39FDF54DFD6B8AE571F

w-255-mers 0x79B5C7D7C52D4C2054705367C3A6B219

w-384-mont 0x456480EB358AEDAC85B1232C7583BE25D641B76B4D671145

w-382-mont 0x5914E300B421DEB28C4CDE002717D32E9F54797FC144CFE3

w-384-mers 0x29E150E114A2977E412562C2B3C81D859FB27E0984F19D0B

w-383-mers 0x563507EB575EE952604F4BFCABE8550CE6D6803F4485BABD

w-512-mont 0x9C757286D118AFD67F9B550F47B6719E20C2C66AF9B128C46C69D70E81670237

w-510-mont 0x46EB93321EAF10CC8B854D62E19A8C272DD216A1CDDCFC0C5FF4DFF6790565D3

w-512-mers 0xA4C35B046B187CE4B03DA712682F4239C4A974C99F832DBC31EAC0C6FBCCA86B

w-511-mers 0x724105C0A12627C65D2B01900AE91780572C19A95F06605E0FEFA08C4C462C81

w-521-mers 0xEC68BAF7857C1B89CA8C6EEA5B33425078F28632FC263E5626E091FCE826B47F7

curve Ed trace

ed-256-mont 0x13AAD11411E6330DA649B44849C4E1154

ed-254-mont 0x51AB3E4DD0A7413C5430B004EE459CE4

ed-256-mers 0x106556A94BD650E6C691EC643BB752C90

ed-255-mers 0x8C3961E84965F3454ED8B84BEF244F30

ed-384-mont 0x2A4BE076C762D8C9825225944DFC2407E406C7167336DD94

ed-382-mont 0xB394157AB7C8FA209CFA7E8EDF87E5F659DFF2586830167C

ed-384-mers 0x4CA0BB84A976997697B17EE9C7182C6EB8A4A3823EF64630

ed-383-mers 0x3BBDA3EC630981110CAA5E0D854D777E40050C4F9160DDE8

ed-512-mont 0xCCC0A98C8F32E3CBBF3E7EBB024842CB2099437935363F81733ADE04D1C927EC

ed-510-mont 0xA0C4BB860F4395023A482F564F6E7DFD280CF7DBA06996F4DE9F78C8324AB93C

ed-512-mers 0x1606BDFD840951119676E1EC2EDAAE83C8C56803CD1FFC1DAC61CB8D3D283F7A4

ed-511-mers 0x560F2F9F46F87459155B3C6E1CEDD9236AF63E504E83379AC20F45C1CBAF41DC

ed-521-mers 0x3FDBE33AA148655EB858032064704A4970AD77413EC18570EF44ABE8020D9AFAC
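The relation between the trace of Frobenius and the two group orders can be checked by brute force on a small example. The sketch below counts points on a toy curve y^2 = x^3 − 3x + 7 over F_23; the prime, the coefficient b and the function name are assumptions for illustration and are unrelated to the curves in Table 6.

```python
def curve_orders(p, b):
    """Brute-force #E(F_p) for E: y^2 = x^3 - 3x + b; return (#E, trace t, #E')."""
    roots = {}                                   # value -> number of square roots
    for y in range(p):
        roots[y * y % p] = roots.get(y * y % p, 0) + 1
    n = 1                                        # start with the point at infinity
    for x in range(p):
        n += roots.get((x * x * x - 3 * x + b) % p, 0)
    t = p + 1 - n                                # trace of Frobenius
    return n, t, p + 1 + t                       # twist order is p + 1 + t


n, t, n_twist = curve_orders(23, 7)
print(n, t, n_twist)
```

For the curves in Table 6 the point count is of course not obtained by enumeration; the tabulated trace is simply substituted into p + 1 − t and p + 1 + t.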


E Costs of Point Conversion

Table 7. The cost of converting points when using the curves from Tables 1 and 2. This is used for point decompression and for converting between twisted Edwards and Montgomery form (and vice versa). The cost is expressed in the number of exponentiations (exp), multiplications (mul), multiplications by constants (mulc) and squarings (squ). Let EA/Fp : v^2 = u^3 + A·u^2 + u and Ed/Fp : −x^2 + y^2 = 1 + d·x^2·y^2 with B = −(A + 2) a square in Fp and d = −(A − 2)/(A + 2). Let (X : Y : Z) be the projective coordinates for E. We follow the approach described in [9] to decompress twisted Edwards points.

conversion            formula                                           cost

Edwards to Montgomery
(x, y) to (u)         u = (1 + y)(1 − y)^(p−2)                          1 exp, 1 mul
(y) to (u)            u = (1 + y)(1 − y)^(p−2)                          1 exp, 1 mul
(X : Y : Z) to (u)    u = (Z + Y)(Z − Y)^(p−2)                          1 exp, 1 mul

Edwards to Edwards
(y) to (x, y)         a = y^2 − 1                                       1 exp, 3 mul, 2 squ
                      b = d·y^2 + 1
                      x = ab(ab^3)^((p−3)/4)

Montgomery to Edwards
(u) to (y)            y = (u − 1)(u + 1)^(p−2)                          1 exp, 1 mul
(u) to (x, y)         x = u·√B·(u^3 + A·u^2 + u)^((3p−5)/4)             2 exp, 2 mul, 1 mulc, 1 squ
                      y = (u − 1)(u + 1)^(p−2)

Weierstrass to Weierstrass
(x) to (x, y)         y = (x^3 − 3x + b)^(p−2)                          1 exp, 1 mul, 1 squ
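The y-to-u and u-to-y rows of Table 7 are mutually inverse maps, each costing one exponentiation (the inversion) and one multiplication. A small round-trip check, assuming a toy prime 10007 and hypothetical function names, might look as follows.

```python
P_MOD = 10007   # toy prime (assumption)


def inv(a):
    """Field inversion written as the exponentiation a^(p-2) mod p."""
    return pow(a % P_MOD, P_MOD - 2, P_MOD)


def edwards_y_to_montgomery_u(y):
    """u = (1 + y)(1 - y)^(p-2); undefined for y = 1."""
    return (1 + y) * inv(1 - y) % P_MOD


def montgomery_u_to_edwards_y(u):
    """y = (u - 1)(u + 1)^(p-2); undefined for u = -1."""
    return (u - 1) * inv(u + 1) % P_MOD
```

Only the y-coordinate is needed to move to the Montgomery u-coordinate, which is why the (x, y) and (y) rows of the table share the same formula and cost.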
