Transcript Collision Attacks: Breaking Authentication in TLS, IKE, … · 2015. 11. 1. · from the draft version of the TLS 1.3 protocol. Outline Section II introduces transcript

Transcript Collision Attacks:Breaking Authentication in TLS, IKE, and SSH

(Working Draft: November 2, 2015)

Karthikeyan BhargavanINRIA

Gaetan LeurentINRIA

Abstract—In response to recent high-profile attacksthat exploit hash function collisions, software vendorshave started to phase out the use of MD5 and SHA-1 in third-party digital signature applications such asX.509 certificates. However, weak hash functions continueto be used in various cryptographic constructions withinmainstream protocols such as TLS, IKE, and SSH, becausepractitioners argue that their use in these protocols reliesonly on second preimage resistance, and hence is unaf-fected by collisions. This paper systematically investigatesand debunks this argument.

We identify a new class of transcript collision attackson key exchange protocols that rely on efficient collision-finding algorithms on the underlying hash construc-tions. We implement and demonstrate concrete credential-forwarding attacks on TLS 1.2 client authentication, TLS1.3 server authentication, and TLS channel bindings. Wedescribe almost-practical impersonation and downgradeattacks in IKEv2 and SSH-2. As far as we know, theseare the first collision-based attacks on these popularcryptographic protocols. Our analysis demonstrates theurgent need for disabling all uses of weak hash functionsin mainstream protocols, and our recommendations havebeen incorporated into the latest TLS 1.3 draft.

I. INTRODUCTION

Hash functions, such as MD5 and SHA-1, are widelyused to build authentication and integrity mechanismsin cryptographic protocols. They are used within public-key certificates, digital signatures, message authentica-tion codes (MAC), and key derivation functions (KDF).However, recent practical attacks on MD5 and almost-practical attacks on SHA-1 have led researchers andpractitioners to question whether these usages of hashfunctions in popular protocols are still secure.

The first collision on MD5 was demonstrated in2005 [39], and since then, collision-finding algorithmshave gotten much better. Simple MD5 collisions cannow be found in seconds on a standard desktop. In re-sponse, protocol experts reviewed the use of MD5 in In-ternet protocols such as Transport Layer Security (TLS)

and IPsec [16], [15], [3]. Despite some disagreementon the long-term impact of collisions, they concludedthat most uses of hash functions in these protocolswere not affected by collisions. Consequently, MD5continues to be supported (alongside newer, strongerhash algorithms) in protocols like TLS and IPsec.

In 2009, an MD5 collision was used to create arogue CA certificate [37], hence breaking the securityof certificate-based authentication in many protocols. Avariant of this attack was used by the Flame malwareto disguise itself as a valid Windows Update securitypatch [35]. Due to these high-profile attacks, thereis now consensus among certification authorities andsoftware vendors to stop issuing and accepting newMD5 certificates. Learning from the MD5 experience,software vendors are also pro-actively phasing out SHA-1 certificates, since collisions on SHA-1 are believed tobe almost practical [36].

This leaves open the question of what to do aboutother uses of MD5 and SHA-1 in popular crypto-graphic protocols. Practitioners commonly believe thatcollisions only affect non-repudiable signatures (likecertificates), but that signatures and MACs used withinprotocols are safe as long as they include unpredictablecontents, such as nonces [16], [15].12In these cases,protocol folklore says that a second preimage attackwould be required to break these protocols, and suchattacks are still considered hard, even for MD5.

Conversely, theoretical cryptographers routinely as-sume collision-resistance in proofs of security forthese protocols. For example, various recent proofs ofTLS [17], [21], [11] assume collision-resistance eventhough the most popular hash functions used in TLSare MD5 and SHA-1. Whom shall we believe? Either itis the case that cryptographic proofs of these protocolsare based on too-strong (i.e. false) assumptions that

1http://www.rtfm.com/dimacs.pdf2http://events.iaik.tugraz.at/HashWorkshop07/slides/ekr

Indigestion.pdf

http://www.rtfm.com/dimacs.pdf

http://events.iaik.tugraz.at/HashWorkshop07/slides/ekr_Indigestion.pdf

http://events.iaik.tugraz.at/HashWorkshop07/slides/ekr_Indigestion.pdf

Fig. 1. SIGMA’: A mutually-authenticated key exchange protocol

should be weakened, or that practitioners are wrong andcollision resistance is required for protocol security.

This paper seeks to clarify this situation by systemat-ically investigating the use of hash functions in the keyexchanges underlying various versions of TLS, IPsec,and SSH. We demonstrate that, contrary to commonbelief, collisions can be used to break fundamental secu-rity guarantees of these protocols. We describe a genericclass of attacks called transcript collision attacks, anddetail concrete instances of these attacks against real-world applications. In particular, we demonstrate howa man-in-the-middle attacker can impersonate TLS 1.2clients, TLS 1.3 servers, and IKEv2 initiators. We alsoshow how a network attacker can downgrade DTLS [33]and SSH-2 [40] connections to use weak ciphers. Weimplement proofs-of-concept exploit demos for threeof these attacks to demonstrate their practicality, andprovide attack complexities for the others. We believethat ours are the first hash collision-based attacks on thecryptographic constructions within these protocols.

We do not claim to have found all transcript col-lision attacks in these protocols; nor do we think thatour attack implementations are the most efficient. Still,our results already provide enough evidence for usto strongly recommend that weak hash functions likeMD5 and SHA-1 should be immediately disabled fromInternet protocols. Partly due to recommendations byus and other researchers, these hash functions and otherweak constructions based on them have been removedfrom the draft version of the TLS 1.3 protocol.

Outline Section II introduces transcript collision at-tacks on authenticated key exchange protocols. Sec-tion III outlines the state-of-the-art in collision-findingalgorithms for MD5, SHA-1, and their concatenation.Section IV describes concrete attacks on various ver-sions of TLS and describes three proof-of-concept de-mos. Section V describes concrete attacks on IKE andSSH. Section VI summarizes our results and concludes.

II. TRANSCRIPT COLLISION ATTACKS ONAUTHENTICATED KEY EXCHANGE

Authenticated Key Exchange (AKE) protocols areexecuted between two parties, usually called client and

server or initiator and responder, in order to establisha shared session key that can be used to encrypt sub-sequent messages. A typical example is the SIGMA’protocol depicted in Figure 1. This protocol is a variantof the basic SIGMA (sign-and-mac) protocol from [20]which served as the inspiration for the key exchangesused in many protocols including IKE, OTR, and JFK.

In SIGMA’, the initiator A first sends a messagem1 to B, consisting of Diffie-Hellman public value gx,along with some protocol-specific parameters infoA thatmay include, for example, a nonce, a protocol version,a proposed ciphersuite, etc. B responds with a messagem2 containing its own Diffie-Hellman public value gyand some parameters infoB . A and B have now com-pleted an anonymous Diffie-Hellman exchange and cancompute the shared secret gxy and use it to derive thesession key. However, before using the session key, theyauthenticate each other by exchanging digital signaturesover the protocol transcript hash(m1 |m2) using theirlong-term signing keys (skA, skB). (Digital signaturealgorithms typically hash their arguments before sign-ing them, and we have chosen to make this hashingexplict in our presentation of SIGMA’.) By signing thetranscript, A and B verify that they agree upon all theelements of the key exchange, and in particular, that anetwork attacker has not tampered with the messages.Finally, A and B also prove to each other that they knowthe session key gxy by exchanging MACs computedwith this key over their own identities.

Like other AKE protocols, SIGMA’ aims to preventmessage tampering, peer impersonation, and sessionkey leakage, even if the network and other clients andservers are under the control of the adversary. Formally,authenticating the transcript guarantees matching con-versations, that is, that the two parties agree on eachothers identity and other important protocol parameters.

Transcript Collision Attacks The alert reader willnotice that SIGMA’ does not in fact guarantee that Aand B agree on the message sequence m1 |m2; it onlyguarantees that they agree on the hash of this sequence.What if a network attacker were to tamper with the mes-sages, so that A and B see different message sequencesbut the hashes of the two sequences is the same? In thatcase, the protocol will proceed to completion but theintegrity and authentication guarantees no longer hold.

Figure 2 illustrates such an attack. The man-in-the-middle (MitM) intercepts messages sent between A andB. It sends its own message m′1 = gx

′ |info′A to B andit sends it own response m′2 = gy

′|info′B to A. Supposeit can choose these messages such that the authenticatedtranscripts match:

hash(m1 |m′2) = hash(m′1 |m2)

2

Fig. 2. Man-in-the-middle credential forwarding attack on SIGMA’.The attacker creates a transcript collision by tampering with themessages shown in red. At the end of the protocol, the client andserver have seemingly authenticated each other, but the attacker knowsboth connection keys, and hence can read or write any data.

We call this a transcript collision. Now, MitM cansimply forward A’s signature over this transcript to Band vice versa. A and B will accept the signaturessince the hashed transcripts match and the signing keysare correct. However, the MitM knows the session keys(gx

′y, gxy′) on both connections (since it knows x′, y′).

Hence, the MitM has fully hijacked both connectionsand can now send messages to B pretending to be Aand to A pretending to be B. This is an impersonationattack that breaks peer authentication.

Synchronizing Transcripts In many protocols, mes-sage boundaries are clearly demarcated by prependingeach message with its length. So, when concatenatingm1|m2, the end of message m1 cannot be confused withthe beginning of m2. If this were not the case, then inSIGMA’, the MitM could choose:

m′1 = gx′ |[infoA |gy

′]︸︷︷︸

info′A

m′2 = gy′ |[gy |infoB ]︸︷︷︸

info′B

Then, m1 |m′2 = m′1 |m2 and the transcript collisionwould be accomplished. However, if each message wereprepended with its length, these transcripts would notmatch because the lengths of m1 and m′1.

In the rest of this paper, we will assume that eachmessage (and each message field) is prefixed with itslength, so that such trivial attacks do not occur. Instead,we will try to implement transcript collisions by relyingon collisions in the underlying hash function.

A Generic Transcript Collision The main challengein implementing the attack in Figure 2 is that the MitMhas to compute the messages m′1 and m′2 after receivingm1 but before the responder has sent its response m2.

The feasibility of the attack depends on the contents andformats of these messages.

Suppose the responder B always sends the samemessage m2 for every request; that is, it uses the same(static) Diffie-Hellman value gy and same parametersinfoB . (This situation occurs, for example in protocolslike QUIC, where the server uses a static configuration.)In that case, after receiving m1, the MitM can compute atranscript collision by finding x′, y′, info′A, info

′B such

that hash(m1 |m′2) = hash(m′1 |m2). The amount ofwork required to find such a collision depends on thehash function. As we will see in the next section, suchgeneric collisions require 2N/2 work for hash functionsthat produce N bits. Hence, for MD5, such a collisionwould require the MitM to compute 264 MD5 hashes,which may well be achievable by powerful adversaries.As a point of comparison, the current Bitcoin network isable to compute up to 259 SHA-256 hashes per second3,and SHA-256 is typically 14 times slower than MD5.

A Chosen-Prefix Transcript Collision We now con-sider a more efficient attack that works even whenB sends an unpredictable m2 containing a fresh(ephemeral) Diffie-Hellman value gy and a previouslyunknown infoB . However, we assume that the lengthof m2 (M ) is fixed and known to MitM. Moreover,suppose that in the second message of SIGMA’, infoBis allowed to have arbitrary length and arbitrary con-tents. That is, even if infoB has junk data at the end,A will accept the message. Specifically, suppose thatinfoB = lenB |dataB where dataB is opaque data thatwill be ignored by A. (We will see several examplesof such “collision-friendly” messages in TLS, IKE, andSSH.) Finally, we assume that the hash function uses theMerkle-Damgard construction [28], [7], so that it obeysthe length extension property: if hash(x) = hash(y)then hash(x|z) = hash(y |z). (In general, this propertyonly holds when the lengths of x, y are equal and amultiple of the hash function block size.)

Under all these conditions, MitM can compute atranscript collision by finding two collision bitstringsC1, C2 each with N bytes, such that:

hash(m1 |[gy′|

info′B︷︸︸︷

len ′B |C1 |−]︸︷︷︸m′

2

) = hash([gx′|

info′A︷︸︸︷

C2 ]︸︷︷︸m′

1

)

where len ′B = N +M . Note that we have left emptyspace (written −) of size M bytes that still needs tobe filled after C1 in info′B . As we will see in the nextsection, this kind of collision is called a chosen-prefixcollision and is typically achievable with far less workthan a generic collision. For example, a chosen-prefix

3https://blockchain.info/de/charts/hash-rate

3

https://blockchain.info/de/charts/hash-rate

collision in MD5 requires the MitM to compute about239 MD5 hashes, which takes only a few GPU hours.

After receiving m1 from A and computing C1, C2,MitM now sends m′1 to B. When B responds to m2

(of size M ) bytes, MitM now stuffs m2 at the end ofinfo′B (in place of −) and sends m′2 to A. Due to thelength extension property, we have:

hash(m1 |[gy′|

info′B︷︸︸︷

len ′B |C1 |m2]︸︷︷︸m′

2

) = hash([gx′|

info′A︷︸︸︷

C2 ]︸︷︷︸m′

1

|m2)

That is, the MitM has obtained a transcript collision andthe impersonation attack succeeds.

The attack here exploits hash collisions in combi-nation with flexible protocol-specific message formats,and as we will see, this is one of the main novel tricksthat we use to mount various attacks in this paper.

Other Transcript Collisions The transcript collisionsdescribed above are not the only attacks possible onsuch protocols. In some cases, MitM may not be ableto use its own Diffie-Hellman values gx

′, gy

′but it may

still be able to tamper with the protocol parameters (e.g.ciphersuites) in info′A, info

′B . In such cases, the MitM

does not have full control over either connection (i.e.it cannot impersonate A or B) because it does knowthe session keys, but it may still be able to downgradethe protocol parameters to use weak, breakable ciphers.For example, by choosing an obsolete protocol versionor 512-bit export cryptography in info′A, the MitM maybe able to force A and B into using weak session keysthat can be broken with some computation.

In other cases, the message format may lend itselfto simpler common-prefix collisions that require evenless work than chosen-prefix collisions. Such collisionson MD5 can be found in seconds even on standarddesktops. In the next section, we will discuss thesedifferent types of collisions in mode detail, and in theremainder of the paper, we will exploit them to mounttranscript collision attacks on real-world protocols.

III. HASH FUNCTION CRYPTANALYSIS

A hash function H : {0, 1}∗ → {0, 1}N maps arbi-trary length binary strings to strings of N bits. Broadlyspeaking, a cryptographic hash function is expectedto behave like a randomly selected function from theset of all functions from {0, 1}∗ to {0, 1}N ; buildinginput/output values with specific properties should be ashard for H as for a random function. More concretely,a cryptographic hash functions should meet four goals:

• Preimage resistance: Given a target value H ,it should be hard to find x such that H(x) = H

• Second-preimage resistance: Given an inputx, it should be hard to find a second input x′ 6=x such that H(x′) = H(x)

• Chosen-prefix-collision resistance: Given pre-fixes P and P ′, it should be hard to find a pairof values x, x′ such that H(P ′ |x′) = H(P |x).

• Collision resistance: For a hash function H , itshould be hard to find a pair of inputs x 6= x′

such that H(x′) = H(x).

The best generic preimage or second-preimage attack isa brute-force attack where the adversary has to try about2N random inputs in order to find a preimage. There-fore, a good cryptographic hash function should havea preimage and second-preimage security of 2N . Onthe other hand, there is a generic collision and chosen-prefix-collision attack with complexity 2N/2 becauseof the birthday paradox. If an adversary computes theimage of 2N/2 random inputs, this defines about 2N

pairs of inputs, and there is a high probability that oneof these pairs is a collision.

Generic collision attacks While a basic collisionattack requires to compute and store 2N/2 images ofthe hash function, it is possible to mount a parallel andmemory-less attack with a very small overhead. Themain idea was introduced by Pollard as the Rho algo-rithm for factorization [29] and discrete logarithms [30],and was later generalized to collision search. The hashfunction is first restricted from {0, 1}∗ → {0, 1}N to{0, 1}N → {0, 1}N , so that it can be iterated. Aftersome number of steps, a chain of iterations reaches acycle, and the graph will have the shape of the greekletter ρ. On average, the cycle has length O(2N/2) andis reached after O(2N/2) steps. The point where thetail of the meets with the cycle reveals a collision inthe hash function. It can be detected in time O(2N/2)with little or no memory, using various cycle detectionmethods, such as Floyd’s algorithm [19] (also known astortoise and hare).

Some variants of this attack using distinguishedpoints can be parallelized efficiently. We now describea parallel version of Pollard’s Lambda algorithm, asdescribed by van Oorschot and Wiener [38], using cCPUs. Each CPU will compute iteration chains of thefunction H , and stop when reaching a distinguishedpoint, that is a point with some easy to test property. Forinstance, we stop a computation when the ending pointsatisfies x < 2N/2αc for some small constant α, so thatthe expected length of a chain is 2N/2/αc. When a chainis finished, we store the starting point, the length, andthe ending point. We generate αc chains in this way, sothat the function has been evaluated about 2N/2 times,and there is a high probability that there was a collision.The important idea of this attack is that if a given point

4

is reached by two different chains, both chain will stopat the same distinguished point. Therefore, we look atthe ending points of the chains, and when a collision isdetected, we restart the chains from the starting point inorder to locate the collision. This attack requires about2N/2 evaluations of H , and a memory of αc when usingc CPUs.

This attack can be tweaked for a chosen-prefix-collision attack using an auxiliary functiong : {0, 1}N → {0, 1}N defined as:

g(x) =

{H(P |x) if x is evenH(P ′ |x) if x is odd.

Collisions in g can be found with the previous tech-niques. With probability 1/2 a collision is g is betweenan even x and an odd x′ (or vice versa), this implies achosen-prefix-collision H(P|x) = H(P ′|x′). An accuratecomplexity analysis is provided in [38]: a collision isexpected to be found after

√πN/2 evaluations. For a

chosen-prefix collision, we expect to find two collisionsin g after

√πN evaluations.

Generic collision attacks are very powerful: theycan use meaningful messages, and can easily be usedfor chosen-prefix-collisions. In addition, they can beefficiently parallelized on large-scale hardware. In prac-tice, an efficient way to mount such an attack withcommodity hardware would be to use general purposeGPU. A single Nvidia GTX 970 card costs about $350,and can perform about 234.5 MD5 computations persecond4; a brute-force collision attack for MD5 wouldrequire 264 evaluations of MD5, i.e. about 68 card-years. One could perform 264 MD5 computation per daywith a budget of 10 million dollars (using 27000 cards).Assuming an electricity consumption of 200 Watts percard, and a price of 10¢/kWh, the running cost wouldbe about $13000 per day (i.e. per collision).

Alternatively, using Amazon EC2 with GPU in-stances, one can get about 2.5GH/s for a cost ofabout 8¢/hr (spot price). This prices a single brute-force MD5 collision at 165000$. Dedicated hardwarewould be significantly more efficient, as shown bythe Bitcoin network, where many miners uses customhardware to compute up to 242.4 SHA-256 hashes persecond.5 The full Bitcoin network has reached a peakhash rate of 259 hashes/s. For a cost estimate, welooked at several providers that offer cloud-based SHA-256 computation for bitcoin-mining6; it typically costless than a thousand dollars to perform 264 hashes.These calculations provide a point of comparison, but

4http://thepasswordproject.com/oclhashcat benchmarking5https://en.bitcoin.it/wiki/Mining hardware comparison6https://hashop.io/purchase/plans

H1

H2

IV1 h1

x0

x′0

x1

x′1

x2

x′2

x3

x′3

x4

x′4

x5

x′5

IV2 h2

M

M ′

Fig. 3. Multi-collision attack

Bitcoin hardware cannot directly be used for MD5, anddesigning custom hardware incurs a large one-time cost.

Concatenation To strengthen protocols against colli-sions in any one hash function, it may be tempting touse a combination of two independent hash functions.For example, TLS versions up to 1.1 use a concatenationof MD5 and SHA-1. While the output length of thisconstruction is 288 bits, it does not offer the securityof a 288-bit hash function. In particular, Joux describeda multi-collision attack that breaks the concatenationof two hash functions with roughly the same effort asbreaking the strongest one of the two [18].

The adversary first finds a collision pair (x0, x′0) forH1, starting from the initialization value of H1. Then itfinds a collision pair (x1, x

′1) starting from H1(x0) =

H1(x′0). This defines 4 messages with the same H1-

digest: x0 |x1, x0 |x′1, x′0 |x1, x′0 |x′1. After N2/2 steps,this defines a set of 2N2/2 messages with the same H1-digest. With high probability, two of these messageshave the same H2-digest as well. Therefore, one canfind a collision in H1|H2 with a complexity only N2/2×2N1/2 +2N2/2. For MD5|SHA-1, this translates to 280,roughly as much as a generic collision attack on SHA-1.

Shortcut collision attacks In the last decade, hashfunction cryptanalysis has been a very active researcharea, and more efficient attacks have been discovered onwidely used hash functions. The best attacks currentlyknown against MD5 and SHA-1 are the following:

• A collision-attack for MD5 with estimatedcomplexity 216 [37]

• A chosen-prefix-collision for MD5 with esti-mated complexity 239 [37]

• A collision-attack for SHA-1 with estimatedcomplexity 261 [36]

• A chosen-prefix-collision for SHA-1 with esti-mated complexity 277 [36]

This allows attacks against MD5|SHA-1 by combin-ing cryptanalytic attacks and the generic multi-collision

5

http://thepasswordproject.com/oclhashcat_benchmarking

https://en.bitcoin.it/wiki/Mining_hardware_comparison

https://hashop.io/purchase/plans

SHA-1

MD5

IV1

IV ′1x′0

x0

h1

IV2

IV ′2

x0

x′0

h2

M ′

M

Fig. 4. CPC attack against MD5|SHA-1

H Collision CPC

Generic 2N/2 2N/2

H1 |H2 2N1/2N2/2 + 2N2/2 2N1/2N2/2 + 2N2/2

MD5 216 239

SHA-1 261 277

MD5|SHA-1 267 277

TABLE I. COMPLEXITY OF FINDING COLLISIONS IN VARIOUSHASH CONSTRUCTIONS

attack. A collision attack can be build for a cost of64 × 261 = 267. For a chosen-prefix-collision, wefirst perform a chosen-prefix-collision against SHA-1,to generate messages (x, x′) such that SHA-1(P |x) =SHA-1(P ′|x′). Then we build a multicollision in SHA-1 starting from this value, and we evaluate MD5 overa set of 264 messages in order to find a collision. Thetotal cost is about 277 + 64× 261 + 264 ≈ 277.

Moreover it has been shown that it is possible tocombine cryptanalytic shortcuts both on SHA-1 andMD5, assuming that collision attacks against SHA-1improve in the future [26]. This may allow collisionattacks against MD5 |SHA-1 with less than 264 work.Table I summarizes the currently-known complexitiesfor computing various hash collisions.

Using hash functions in HMAC Hash function areoften used with a key in order to build a Message Au-thentication Code (MAC) or a Pseudo Random Functionfamily (PRF). In particular HMAC, as defined below, iswidely used in cryptographic protocols, both as a MACand as a PRF:

MACk(M) = H(k ⊕ opad|H(k ⊕ ipad|M))

The security requirement of a MAC is to resistforgeries. An adversary with access to a MAC oracle(but not to the secret key) should not be able to predictthe MAC of a non-queried message. A PRF has astronger security requirement: an adversary with oracleaccess to a randomly selected function Hk in the familyshould not be able to distinguish it from a truly randomfunction. In particular, a secure PRF is a secure MAC.

HMAC has been proved to be a secure PRF (andMAC) even if the hash function is not collision-resistant (assuming that the compression function isstill a PRF) [2]. Therefore, the proper use of HMACcan be secure even based on a weak hash function.However, HMAC only provides this extra security whenthe key is unknown to the adversary and when theHMAC is computed directly over a protected message.For example, the Finished message in TLS is computedas HMACk(H(M)). While this seems secure due tothe use of HMAC, it is actually weak because themessage is first hashed by an unkeyed hash function. Ameaningful collision in H can result in MAC forgery.

A secure MAC does not have to be collision resis-tant. In particular, if the MAC uses a weak hash functionand a key that is known to the attacker, then collisionsin the MAC can be found easily.7 If the hash functionis strong, but the MAC is truncated to 96 bits (e.g. asin the TLS Finished message), then we still get a goodlevel of security as a MAC, but collisions can be foundrelatively easily (with about 248 work).

Other MAC Constructions Other than HMAC, thereare several well-known constructions to build a MACor a PRF family from a hash function H:

• Secret-prefix MAC:MACk(M) = H(k |M)

• Secret-suffix MAC:MACk(M) = H(M |k)

• Envelope MAC:MACk(M) = H(k |M |k)

•

Variant of secret-prefix MAC are used in HTTP di-gest (RFC2069); variants of secret-suffix MAC are usedin APOP (RFC1939) and HTTP Digest (RFC2647). Theauthenticated Diffie-Hellman key exchange in SSH-2also uses a variant of secret-suffix MAC.

All these constructions are good PRFs and MACs(up to 2N/2 queries) when used with a random ora-cle, and there is a generic forgery attack using 2N/2

queries if the hash function has an N -bit internal state.However, it is important to note that constructions usingthe key only after processing the message (such asthe secret-prefix MAC) are much less secure. Indeed,collisions in the hash function directly lead to forgeriesindependently of the key. In particular, these collisionscan be computed offline, and do not require any queriesto a MAC oracle. If the hash function is susceptible toshortcut collision attacks, this makes the MAC clearlyinsecure.

7See e.g. http://dankaminsky.com/2015/05/07/the-little-mac-attack/

6

http://dankaminsky.com/2015/05/07/the-little-mac-attack/

Some MACs constructions that just concatenate themessage and the secret key (e.g. secret-suffix MAC andenvelope MAC) are also susceptible to a key-recoveryattack based on collisions [32]. In particular this kind ofattack have been shown to be practical against APOP,a MD5 based challenge-response authentication schemebased on secret-suffix MAC [24].

Randomized hashing One way to strengthen the useof weak hash functions in signatures is via randomizedhashing, as introduced by Halevi and Krawczyk in [13].This requires a small modification of the signing algo-rithm: rather than applying the signature to a hash ofthe message as σ(H(M)), the signer uses a family of(enhanced) target collision resistant hash function andpicks a random salt r to select a function Hr in the fam-ily. Then, the signature is computed as r, σ(r|Hr(M)),or r, σ(Hr(M)) if the hash function is enhanced targetcollision resistant. For a concrete construction of anenhanced target collision resistant hash function, Haleviand Krawczyk introduce the RMX constuction. Thisconstruction can potentially prevent an adversary fromexploiting hash collisions to forge signatures, but it isnot yet used in any mainstream Internet protocol.

IV. TRANSCRIPT COLLISION ATTACKS ON TLS

The Transport Layer Security protocol (TLS) [8] isperhaps the most widely used secure channel protocol.Its most visible application is to secure connectionsbetween web browsers and HTTPS websites, but itis also used to secure email, Wi-Fi networks, VirtualPrivate Networks (VPNs), etc. Various versions of theprotocol are still used on the Internet and they differin some important cryptographic details, although themessage structures remain quite similar. Many versionsof TLS are used on the Internet; the latest releasedversion is TLS 1.2 [8], while TLS 1.3 [9] is currentlyundergoing standardization at the IETF.

A. The TLS Handshake Protocol

Figure 5 depicts a typical handshake in TLS (inversions 1.0 to 1.2). The client first sends a hellomessage CH that contains a fresh random client noncenc and various protocol parameters exc, including theprotocol version, supported list of ciphersuites, andvarious protocol extensions. Each extension is prefixedby its length and can contain a payload of up to 216

bytes. Notably, the client hello may include extensionsthat the server does not understand or support, and theserver will ignore them.

The server responds to the client hello with a seriesof messages (from SH to SHD). The server hello SHcontains a fresh server nonce ns and parameters exs,including the server’s chosen version, ciphersuite, andprotocol extensions. In most ciphersuites, the server then

Fig. 5. TLS: A mutually-authenticated DHE handshake. Fields shownin red indicate parts of the handshake that can contain arbitrary-lengthopaque data (useful for stuffing collision blocks). Handshake logs(log1, log2, log3) refer to the concatenation of all messages up to(and including) the current one. Messages SCR,CC,CCV are optionaland only appear when client certificate authentication is used. NPN isoptional and only appears when the client and server support the next-protocol-negotiation extension. The tls-unique channel bindingis a connection identifier that may be used by applications to binduser authentication tokens, such as cookies and passwords, to theunderlying TLS channel to prevent credential forwarding.

sends its public-key certificate SC. In Ephemeral Diffie-Hellman (DHE) ciphersuites, SC is followed by a serverkey exchange message SKE that contains an ephemeralpublic value gy along with a description of the Diffie-Hellman group chosen by the server, including the primep and generator g. The server signs these values toprotect then from tampering and to prove that it knowsthe private key (skS) for the certificate:

sign(skS , hash(nc |ns |p|g |gy))

The signature and hash algorithm used for this signatureis chosen by the server based on its certificate as well asthe supported algorithms indicated by the client withinan optional signature-algorithms extension inthe client hello. In TLS versions before 1.2, the hashalgorithm was fixed to be MD5 |SHA-1 but TLS 1.2allows clients and servers to choose any hash algorithmthey both support (MD5, SHA-1, SHA-256, etc.) Hencein TLS 1.2, each digital signature is prefixed withidentifiers for the chosen signature and hash algorithm.

If the server wants the client to authenticate itselfwith a public-key certificate, it then sends a certificaterequest message SCR indicating the certificate types andsignature algorithms it supports, as well as an optionallist of distinguished names dn for the client certificationauthorities that it trusts. As with hello extensions, eachdistinguished name can be 216 bytes long and cancontain arbitrary data that the client will ignore if it doesnot recognize the name. The server’s message flight then

7

ends with the server hello done message SHD.

The client then sends its own certificate CC if theserver asked for it, and its own Diffie-Hellman keyshare gx in a client key exchange message CKE. If theclient sent a certificate, it must prove that it knows theprivate key skC by sending a client certificate verifyCCV message with a signature over the full messagelog up to this point in the protocol:

sign(skC , hash(CH|SH|SC|SKE|SCR|SHD|CC|CKE))︸︷︷︸

log1

At this point, the client and server both derive a ses-sion master secret ms and authenticated encryption keysfor both directions (k1, k2). The client sends a changecipher spec message to indicate that the subsequent mes-sages it sends will be encrypted (with k1.) This messageis not technically part of the handshake protocol anddoes not appear in the authenticated transcript, and soit is not shown in Figure 5.

If the client and server both indicate support for thenext-protocol-negotiation extension [23] in their hellomessages, the client then sends an encrypted extensionsmessage NPN containing a selected application layerprotocol (e.g. http/1.1 or spdy/3). The protocolname is ASCII-encoded and then padded to the nearestmultiple of 32 bytes (to avoid leaking information viathe encrypted message length.)

The client then sends an encrypted finished messageCFIN containing a MAC of the full handshake log log2using the master secret ms . In TLS 1.0 and 1.1, thisMAC is computed using a combination of HMAC-MD5and HMAC-SHA-1, whereas in TLS 1.2, it uses HMAC-SHA-256. In all these versions, the result of the MACis then truncated to 12 bytes (96 bits):

mac96(ms, hash(CH|SH|SC|SKE|SCR|SHD|CC|CKE|CCV|NPN))︸︷︷︸

log2

When a server receives CFIN, it verifies that the clientagrees with it on the full message log and on the mastersecret. It responds by sending its own change cipherspec message to turn on encryption and a server finishedmessage SFIN that contains a 96-bit MAC over the fullhandshake log log3 using the master secret ms .

At the end of the handshake, both client and serverhave authenticated each other, proved knowledge of themaster secret, and agreed upon the message log. Theycan now start encrypting application data to each otherusing the connection keys (k1, k2).

In most common TLS usage scenarios, clients arenot authenticated using certificates. The handshake au-thenticates only the server and the client-side user is

authenticated within the application using a challenge-response protocol based on a password or some otherbearer token (e.g. HTTP cookie). Such application-levelauthentication protocols are known to be vulnerable toa general class of credential forwarding attacks unlessthe application-level credential is channel bound to theTLS connection (e.g. see [5]. In such attacks, a client Cconnects to a malicious server M and authenticates withsome credential over TLS, but M forwards the authenti-cation message over another TLS channel to S, therebylogging in as C at S. The attack is prevented if theauthentication protocol embeds a unique identifier forthe underlying TLS channel, so that a message sent overone channel cannot be forwarded over another. One suchidentifier, called tls-unique, defined in [1], uses thecontents of the CFIN message as a unique identifierfor the TLS connection. This tls-unique channelbinding is used by a number of emerging application-level authentication protocols, such as SCRAM [27],FIDO [14], and Token Binding [31], specifically toavoid credential-forwarding attacks.

Despite all these protections built into the protocol,in the rest of this section, we will show how to usetranscript collision attacks to mount various attacks onTLS and its variants.

B. Breaking TLS 1.2 Client Authenticationusing a Chosen-Prefix Transcript Collision

Suppose a client C uses the same certificate toconnect to two different servers A and S. We show thatif A is malicious, it can force C to create a signature(in CCV) that A can use to impersonate C at S, asdepicted in Figure 6. Here, A acts as a man-in-the-middle between C and S. Note, however, that A usesits own certificate certa and does not rely on knowingany long-term secrets belonging to C or S.

Recall that the client signs the transcript hash(logc1);so the key idea of the attack is to compute a collisionbetween this client-side transcript and the server-sidetranscript hash(logs1), even though the two connectionssee different message sequences.

When the MitM A receives a client hello from C,it responds with its own hello SH′, certificate SC′, keyexchange SKE′. It then initiates a connection with theserver S by sending a carefully crafted client helloCH′. A now runs both connections in parallel. It willreceive a hello SH, certificate SC, key exchange SKE,and certificate request SCR from S. We assume that thelength of these messages SH|SC|SKE|SCR is fixed (=M )and is known in advance to A.

Note that A needs to choose CH′ before it receivesany messages from S. A can compute CH′ and SCR′

as follows. A uses a chosen-prefix collision to find two

8

Fig. 6. Man-in-the-middle client signature forwarding attack on TLS 1.2. The client C connects to a malicious server A and offers toauthenticate with its certificate certC . The attacker A computes a chosen-prefix collision on the client signature transcript hash(log1), anduses it to impersonate the client at a different server S. Messages that the attacker controls are labeled in red. A sends a bogus Diffie-Hellmangroup (k2 − k, k) to C; we use k = g here for simplicity.

bitstrings (C1, C2) of length N bytes each such that C1

appears within the last distinguished name dn′ in SCR′

and C2 appears within the last extension ex′c in CH′:

hash(CH|SH′ |SC′ |SKE′ |SCR′(C1 |−︸︷︷︸dn′

))

= hash(CH′(nc, C2︸︷︷︸ex′

c

))

Furthermore, we set the length of dn′ in SCR′ to beN+M , so that it still has M bytes (denoted by −) thatneed to be filled in after C1.

Now, A sends CH′ to S, receives SH|SC|SKE|SCR inresponse, and stuffs these messages into the remainingM bytes in SCR′ and sends it to C. At this point thehash of the message logs in the two connections coin-cide, assuming that the hash function satisfies the length

extension property of Merkle-Damgard constructions:

hash(CH|SH|SC′ |SKE′ |SCR′(C1 |SH|SC|SKE|SCR︸︷︷︸

dn′

))

= hash(CH′(nc, C2︸︷︷︸ex′

c

)|SH|SC|SKE|SCR)

From this message onwards, the hash of the handshakelog in both connections will remain the same.

A then forwards the sever hello done SHD to C. Inresponse, C sends a certificate CC, a key exchange CKE,and a certificate verify CCV that contains a signatureover the transcript hash(logc1) which is now the sameas hash(logs1). A simply forwards these messages to S,pretending to be C, and S accepts these messages.

Controlling the master secret Even though S hasaccept C’s certificate on its connection with A, Acannot complete the connection unless it knows themaster secret on its connection with S. The mastersecret is computed from gxy so A needs to know thex corresponding the gx that C sent in its key exchange

9

message CKE. In order to accomplish this task, we relyon a key forcing attack in the DHE handshake.

When A sends SKE′ to C, it does not send avalid Diffie-Hellman group (p, g). Instead, it chooses anarbitrary public value k = gx

′and sets p = k2 − k and

g = k. This p value is clearly not a prime, and it has theproperty that no matter what private value x is generatedby C, we will have gx mod p = k. Hence, by choosingsuch a bogus Diffie-Hellman group, A can force C tosend a CKE with a public value that it controls.

To complete the attack, we assume that S alwaysuses the same Diffie-Hellman group (p, g). A choosessome x′ and sets k = gx

′mod p. It then sends SKE′

to C with the bogus group (k2 − k, k) and the publicvalue k. Now, the CKE sent by C will contain k, and Awill forward it to S. A will then forward C’s signatureCCV as usual. The master secret between A and S willbe derived from gx

′y mod p, but A knows x′ and hencecan compute this value. Consequently, A can completethe handshake and impersonate C at S.

We observe that the attack here relies on the clientnot validating the Diffie-Hellman groups it receivesfrom the server. From our experiments, we find thatmost TLS libraries do not validate the groups theyreceive in the server key exchange, probably becausechecking for primality is expensive. In some libraries,the value k2−k is rejected because it is an even number.In those cases, we find that we can use p = k2− 1 andwith 50% probability, the client will compute gx = k,allowing the attack to succeed. This weakness in TLS-DHE has been noted before [6] and a new protocolextension aims to fix it by allowing only well-knownDiffie-Hellman groups [12]. However, an optional ex-tension cannot prevent our attack scenario, since Acould always pretend to not support the extension andmount the attack anyway.

Note that the attack only relies on DHE betweenC and A; the connection between A and S can useECDHE or RSA and the attack would still work. Inother words, such transcript collisions can also be usedto mount cross-protocol attacks in the sense of [25].

Attack Complexity The transcript collision attackrequires A to compute a chosen-prefix collision forthe hash function used in the client signature. In TLSversions before 1.2, the default hash function is aconcatenation of MD5 and SHA-1 and hence requirescomputing 277 MD5 and SHA-1 hashes. In TLS 1.2,if the signature uses SHA-1, the cost is 277 hashes.Remarkably, TLS 1.2 also allows RSA-MD5 signatures,and for such signatures, the cost of the collision isonly 239 MD5 hashes. Below, we describe our proof-of-concept implementation that relies on RSA-MD5.

Note that these cost estimates are per-connection be-cause the collision needs to be computed once for eachclient nonce nc. Usually, these nonces are generatedwith a strong random number generator. However, insome cases the client random can become predictabledue to implementation bugs (e.g. see CVE-2015-0285in OpenSSL). We also observe that it is commonlybelieved that these nonces only need to be unique,not unpredictable. For example, the OpenSSL libraryuses RAND_pseudo_bytes to generate the client andserver random, whereas it uses RAND_bytes to gener-ate other key material; the former succeeds even whenthere is not enough entropy in the system. If the clientnonce were predictable, or if it were to be repeated withhigh frequency, the collision can be computed offline atleisure, making even SHA-1 collisions feasible. Eventhough our attack below does not rely on predictablenonces, it offers yet another justification for the needfor strongly random nonces in TLS.

Implementing a Proof-Of-Concept To implement theattack, we need a client that is willing to sign withRSA-MD5 and a server that is willing to accept suchsignatures. We found a number of TLS libraries thatsupport RSA-MD5, including GnuTLS (commonly usedin curl, wget, and git) and JSSE (the default TLS libraryfor Java clients and servers).8 For our demo, we set up aman-in-the-middle attack between a Java HTTPS clientand Java HTTPS server in their default configurations.

The MitM implements Figure 6. In order to setup thecollision while preserving the TLS message formats, theattacker needs to carefully set the length fields in variousplaces in CH′ and SCR′. For example, in CH′ it needs toset consistent lengths for the full hello message, for theextensions field, and for the last extension. Furthermore,the MitM needs to make sure that the two prefixes havea length that is a multiple of the MD5 block size (512bits). To achieve this, we fill up the last extension in CH′and the last distinguished name in SCR′ with enoughzero bytes until the prefixes are block-aligned.

To compute the chosen-prefix collision, we used anunmodified version of the HashClash software [34] andran it on a single Amazon “large” GPU instance with16 cores. The computation took 5 hours and cost about$2.50 and generated a 10 block collision (N = 640).The time taken is comparable with Stevens et al.’sestimate: the chosen-collision attack should require 239

hash computations, or 35 core-hours [37]. However,the problem is highly parallelizable; chosen-prefix col-lisions could be found much faster using a large cluster,by using GPUs more aggressively, or by using custom

8Other libraries that support RSA-MD5 are PolarSSL/mbedTLS,BouncyCastle, NSS clients. Mono/CyaSSL do not support serversignatures. SChannel/SecureTransport appear not to support RSA-MD5.

10

Fig. 7. TLS 1.3: A mutually-authenticated DHE handshake basedon draft 7 of the specification. The client sends its key share CKSbefore the server SKS in order to finish the handshake in one roundtrip. CKS can contain more than one key share and the server willignore the ones (e.g. go) that it does not use. The server signs thefull handshake transcript hash(log2) in a new SCVmessage.

hardware. Our attack uses stock HashClash and showsthat the attack is already feasible at modest cost.

In our proof-of-concept demo, A accepts the clienthello and then keeps the client-side TLS connectionalive until a collision has been found. Most TLS con-nections can be kept alive by sending regular warningalerts; Java clients are willing to keep the connectionopen indefinitely. Keeping the client waiting for hoursis not always practical, but we note that some TLSclients (such as git) are often used to perform long-running unsupervised connections to web services APIs,and long connection times may not be noticed. In anycase, the collision search can be significantly sped up.

Once the collision has been found, A connects to Sto completes the attack and is able to impersonate Cat S and read and write data that only C should haveaccess to. Hence, the demo shows that A is able to breakTLS 1.2 client authentication between mainstream TLSclients and servers. The precise transcript collision traceis too long to include with this paper but will be madeavailable on a website linked to the final version of thispaper.

C. Breaking TLS 1.3 Server Authenticationusing a Chosen-Prefix Transcript Collision

The key to our attack above on TLS 1.2 clientauthentication is that the client is willing to sign the hashof the full message log, and the format of various TLSmessages is flexible enough to allow the attacker to stuff

Fig. 8. Man-in-the-middle server signature forwarding attack on TLS1.3. The client C tries to connect to S, but the attacker intercepts thisconnection and uses a chosen-prefix collision on the server signaturetranscript to impersonate the server at the client.

meaningless collision blocks and server-side messagesinto them. A similar attack would not work on TLS1.2 server authentication because the server signaturetranscript does not contain flexible-size elements. Theonly part of the signed value that the attacker maycontrol is the client nonce nc which is fixed-length.

This does not mean that TLS 1.2 server signaturesare designed well; by not signing enough, the serverbecomes vulnerable to various downgrade attacks suchas Logjam.9 In response to such attacks, TLS 1.3 isbeing redesigned to use the full handshake log for bothclient and server signatures.

Figure 7 illustrates the standard one-round-trip (1-RTT) message flow in the current draft (version 7) ofthe TLS 1.3 specification. In comparison to TLS 1.2,this protocol introduces three new messages and flipsthe order in which the key exchange messages are sent.Immediately after the client hello, the client sends aclient key share message CKS that contains a list of keyshares gx, go, . . ., one for each Diffie-Hellman group(and elliptic curve) that the client wishes to support.For example, a client may offer key shares for 2048-bit, 4096-bit and 8192-bit groups, allowing the serverto choose which size group it wants to use.

9http://weakdh.org

11

http://weakdh.org

In response, the server sends its server hello fol-lowed by a server key share SKS message that containsits share gy for one of these groups. Unlike the TLS 1.2server key exchange, SKS does not contain a signature.Instead, the server sends a new server certificate verifymessage SCV after the certificate request SCR thatincludes a signature over the hash of the full messagelog up to this point (log2). Another departure fromTLS 1.2 is that all handshake messages after SKS areencrypted, in order to protect the privacy of the clientand server certificates from passive attackers.

We demonstrate a chosen-prefix transcript collisionon the TLS 1.3 server signature (Figure 8), similar tothe attack on TLS 1.2 client authentication. The client Cwants to connect to S, but its messages are interceptedby a network attacker A. After A receives the client’sCKS, it responds with its own hello SH′, server keyshare SKS′(gy

′), and forwards the real server’s certifi-

cate SC(certS). Since A knows y′, it now knows thehandshake encryption keys on the client-side.

At this point, A connects to S by sending a clienthello CH. The next messages on the two connectionswill be a CKS′ to the server and a SCR′ to the client.To compute these messages, A finds a chosen-prefixcollision C1, C2 of length N bytes each such that C1

appears within the last distinguished name of SCR′ andC2 appears as the last key share in CKS′:

hash(CH|CKS|SH′ |SC|SKE′ |SCR′(C1 |−︸︷︷︸dn′

))

= hash(CH|CKS(gx′, C2︸︷︷︸

go′

))

We assume that the server S will choose the first groupin CKS′ and ignore the second key share go

′. We also

assume that the server S will respond to CH withmessages SH, SKS, and SC of known length M .

In SCR′, we set the length of dn′ to N + M sothat there is room for M more bytes after C1. OnceA receives SH, SKS, and SC from S, it stuffs thesemessages within its certificate request SCR′ to C. Atthis point, the handshake logs at the client logc2 and atthe server logs2 have the same hash, due to the lengthextension property of the hash function.

hash(CH|SH′ |SKS′ |SC|SCR′(C1 |SH|SKS|SC︸︷︷︸

dn′

))

= hash(CH|CKS(gx′, C2︸︷︷︸

go′

)|SH|SKS|SC)

Since the transcripts collide, A can forward S’s signa-ture in the SCV to C, which will accept the signature.A will need to decrypt and reencrypt the SCV, but itcan do so because it knows the encryption keys on both

connections. A then completes the handshake with C bysending its own finished message SFIN and acceptingC’s certificate and finished messages. A does not needits connection with S any more; it has successfullyhijacked C’s connection to S.

Implementing a Proof-Of-Concept As with clientauthentication, the complexity of the attack dependson the hash algorithm being used. Like TLS 1.2, thecurrent draft of TLS 1.3 allows RSA-MD5 signatures.However, since TLS 1.3 is not yet implemented, westudied TLS 1.2 servers in order to estimate whetherreal-world servers would be willing to sign their SKEmessages with RSA-MD5. Internet-wide scans showthat about 29% of the Alexa top 1 million websitessupport RSA-MD5 signatures.10 This subset includespopular websites like microsoft.com and paypal.com.

A second question is whether TLS clients wouldaccept RSA-MD5 signatures. Most popular web serversand TLS libraries do not offer RSA-MD5 as one ofthe supported signature algorithm in the client hello.This might lead one to believe that they would notaccept RSA-MD5 server signatures. However, we foundand reported security bugs in NSS (the library used byFirefox and some versions of Chrome) and in GnuTLS(commonly used in curl and git). These libraries (andapplications that rely on them) incorrectly accept RSA-MD5 signatures even though they claim not to supportthem. For example, Firefox will accept an RSA-MD5signature from a website, even though it is not supposedto. Furthermore, non-web applications such as Javaclients and servers routinely offer and accept RSA-MD5 signatures. Consequently, we believe that if TLS1.3 were to be implemented today, it is quite likelythat its clients and servers would support RSA-MD5signatures and therefore be vulnerable to our chosen-prefix transcript collision attack.

We wrote a proof-of-concept attack demo based onour own simple prototype implementation of TLS 1.3that signs with RSA-MD5. As with TLS 1.2 client au-thentication, we used HashClash to compute the chosen-prefix collision in roughly 5 hours on an Amazon GPUinstance, and demonstrated that a man-in-the-middlecould impersonate the server as shown in Figure 8.

D. Breaking the tls-unique Channel Bindingusing a Generic Transcript Collision

Suppose an application-level authentication protocolat C binds its login credential to the tls-uniquechannel binding [1], so that when the credential issent from C to A, it cannot be used by A at S. Wedemonstrate how the attacker A could use a genericcollision to break this protection.

10https://securitypitfalls.wordpress.com/2015/06/20/may-2015-scan-results/

12

microsoft.com

paypal.com

https://securitypitfalls.wordpress.com/2015/06/20/may-2015-scan-results/

https://securitypitfalls.wordpress.com/2015/06/20/may-2015-scan-results/

Fig. 9. Man-in-the-middle credential forwarding attack ontls-unique channel binding. The attack uses a transcript collisionto impersonate the client to the server.

Figure 9 depicts the attack. It follows the generalpattern of the TLS 1.2 client authentication attack,except that it relies on a collision on the transcriptMAC in the client finished message, rather than acollision in the hash function. The client C connectsto the MitM A who then opens a new connection to S.The attacker sends a SKE′ to C that contains a bogusgroup (k2 − 1, k), thereby forcing the client to sendkx mod (k2 − 1) = k in its client key exchange CKE.On the server side, the attacker can send its own CKE′

containing any Diffie-Hellman value. Hence, the MitMknows the master secrets msc,mss and connection keyson both connections.

The goal of the attacker is to make sure thatthe contents of the client finished message (i.e. thetls-unique) coincide on both connections:

mac96(msc, logc2) = mac96(ms

s, logs2)

To accomplish this transcript collision, the attacker care-fully crafts the certificate request SCR′ on the client-sideand the NPN message on the server side. In particular,it computes a generic collision (C1, C2) of length Nbytes each such that C1 is the last distinguished namein SCR′ and C2 is the padding in the NPN message(after the protocol name “http/1.1”):

mac96(msc,CH|SH|SC′ |SKE′ |SCR′( C1︸︷︷︸

dn

)|

SHD|CC|CKE) =mac96(ms

s,CH|SH|SC|SKE|SHD|CKE|NPN(‘‘http/1.1’’|C2︸︷︷︸

exn

))

Note that we have carefully set things up so that theattacker can compute the collision as late as possible,that is, before sending the SCR′ and NPN messages,at which point all other messages in the transcriptare already fixed. Once this collision is found, theMitM sends these two messages on the correspondingconnections and completes the handshakes. A can thenimpersonate C at S by forwarding any application-levelchannel-bound credentials sent by C (for A) to S.

Implementing a Proof-Of-Concept We implementeda man-in-the-middle attacker to demonstrate the attack.We used an OpenSSL client as C and the main Googlewebsite as S, since this website supports the next-protocol-negotiation protocol extension. After receivingthe client hello CH from the client and the server hellodone SHD from the server-side, the MitM runs a genericcollision search to compute SCR′ and NPN.

For the collision search, we implemented the TLSPRF mac96 function using the CUDA framework forNVIDIA GPUs. In TLS versions up to 1.1, this con-struction is built using MD5 and SHA-1; in TLS 1.2the construction uses SHA-256. However, the strengthof the hash function is immaterial because what we areattacking is the truncated 96-bit MAC. The underlyinghash function does not matter. Following the analysisexplain in Section III, it should require about 248

computations on average to get a collision.

For our demo, we used the TLS 1.1 PRF becauseit was faster than the TLS 1.2 PRF. After optimizingthe block alignment of the various elements, the com-putation of a Finished message required the evaluationof 6 MD5 compression functions, and 6 SHA1 com-pression functions per attempted collision block. TheCUDA implementation took 20 days to find a collision,using four Tesla K20Xm GPUs (2688 CUDA coreseach, with compute capability 3.5). Our demonstrationevaluated about 249.9 Finished messages before findingthis collision, which was rather unlucky, since it shouldtake half that time on average. We estimate that thiscollision would take about 20 days on a Amazonlarge GPU instance. We note that the generic collisionis completely parallelizable (even more so than thechosen-prefix collision) and hence the time for findinga collision can be brought down to an arbitrarily smallnumber by throwing enough computational power at it.Our demo only shows that finding such collisions isfeasible, even with stock GPUs. We plan to publish thecolliding transcripts with the final version of this paper.

Once the collision is found, the MitM can completethe handshake and forward any token bound credentialssent by the client to the Google website, which willaccept it, thereby defeating the protection offered bythe tls-unique channel binding.

13

Fig. 10. Man-in-the-middle downgrade attack on DTLS. The attackeruses a transcript collision to downgrade the connection to weakciphers.

Truncated HMAC is not collision-resistant Amore general lesson to be taken from our attack ontls-unique is that there are many uses of HMACin cryptographic protocols that are not protected fromcollisions in the underlying hash function. For example,although HMAC-MD5 may be a good MAC, it is notcollision-resistant when the key is known to the attacker.Similarly, when HMAC-SHA256 is truncated to 96 bits,it may still be a good MAC, but it is certainly not agood hash function (since collisions can still be foundin 248 steps). Consequently, when inspecting the useof hash functions in Internet protocols, it would be amistake to assume that all uses of HMAC are safe;it is important to look both at the mechanism and itsintended security goal. In some cases, we may needHMAC to be both a MAC and a good hash function.At least in the context of TLS, our results on the useof HMAC may seem surprising to many. Our attackappears to direct contradict, for example, the analysisof the Finished message in [3].

A similar lesson holds for cryptographic proofs.A similar attack to the tls-unique attack aboveapplies to the TLS renegotiation indication counter-measure (but with 296 work) if we assume that theTLS PRF may have collisions. However, the proof ofthis countermeasure in [11] fails to assume that thePRF is collision-resistant even when the attacker knowsthe master secret. This assumption is necessary for theproof, and in private communication, the authors haveacknowledged that this needs to be fixed.

E. Downgrading DTLSusing a Common-Prefix Transcript Collision

The DTLS protocol [33] uses a cookie-based mech-anism as a protection against denial-of-service attacks.

In response to a client hello CH, the server can send ahello verify request message HVR containing a cookie.The client is meant to restart the handshake by sendingthe exact same client hello message but with this cookieincluded in it. Since the HVR is not authenticated, thecookie field offers a network attacker a venue to injectarbitrary data into the transcript. We found a downgradeattack that exploits this feature. We do not detail it herefor lack of space but it is similar to the cookie-basedattack on IKEv2 in the next section.

The attack is depicted in Figure 10. The networkattacker A finds a collision between two cookies C1

and C2 such that the hash of the client hello from Cmatches the hash of the tampered client hello sent by theattacker to S. Consequently, the attacker can downgradethe connection to use an old protocol version or aweak ciphersuite, or it can disable important protocolextensions. This is a chosen-prefix collision, but we notethat the prefix before the colliding bitstrings C1 andC2 is the same. Therefore, the attacker can rely on acommon-prefix collision to efficiently compute C1 andC2 and since the first two bytes (containing the cookielength) of these bitstring is likely to be different, we canuse the different lengths to delete and hence downgradeciphersuites and protocol extensions. We elide detailsof this attack for lack of space, but it is similar to theattack on IKEv2 in the next section.

F. Predictable Nonces Break TLS 1.2 Server Authenti-cation

Although TLS 1.2 server signatures are not vulner-able to collisions, servers that use predictable servernonces ns can be attacked. For example a recent TLSvariant called Snap-Start [22] allows the client to choosethe server nonce. Suppose a TLS server were to agreeto sign a (EC)DHE key exchange where all valuesexcept the client nonce were predictable. For example,the server nonce may be chosen by the client andthe Diffie-Hellman group and public value may bereused. In this case, a malicious client can compute ageneric collision on the server signature transcript toimpersonate the server to any client. For RSA-MD5signatures (supported by 29% of popular websites), thisattack would take 264 MD5 hashes. We do not knowof any deployed variant or implementation of TLS 1.2that exhibits this attack, but it serves to warn protocoldesigners against tampering with the use of stronglyrandom nonces in the protocol without taking due care.

V. TRANSCRIPT COLLISIONS IN IKE AND SSH

Although the bulk of this paper has focused on colli-sions in TLS, similar attacks apply to other mainstreamprotocols like IKEv1, IKEv2, and SSH. We describetwo exemplary attacks here.

14

Fig. 11. IKEv2: A mutually-authenticated key exchange. Messageparts colored in red can have arbitrary length.

Fig. 12. Man-in-the-middle initiator impersonation attack on IKEv2.

A. Breaking IKEv2 Initiator Authenticationusing a Common-Prefix Transcript Collision

Figure 11 depicts the IKEv2 authenticated key ex-change protocol, which is similar to the SIGMA’ proto-col discussed in Section II. The initiator first sends anSA_INIT request containing its Diffie-Hellman valuegx, nonce ni, and proposed cryptographic parametersSAi, infoi. The responder typically replies with its ownpublic value gy , nonce nr and parameters SAr, infor.Alternatively, the responder may send a cookie ck,thereby asking the initiator to restart the protocol bysending the same SA_INIT request but with ck in-cluded in it (typically as the first payload).

After the SA_INIT exchange, the initiator andresponder authenticate each other by signing a portionof the message transcript. Notably the initiator signs:

hash([ck |SAi |gx |ni |infoi]︸︷︷︸m1

|nr |mac(ki,IDi))

Figure 12 depicts an attack on IKEv2 initiator au-thentication that relies on a transcript collision on thissignature. The network attacker intercepts the SA_INITrequest from I to R and responds with a cookie ck. Theinitiator I restarts the key exchange by including ck inthe new SA_INIT request (m1). However, the attacker

has chosen ck in a way that the hash of m1 is thesame as the hash of a tampered SA_INIT request m′1that contains the attacker’s Diffie-Hellman public valuegx

′. The attacker sends this tampered request m′1 to

the responder and upon receiving a response, it tamperswith the response to replace R’s Diffie-Hellman key gy

with its own key gy′. Note that the attacker does not

tamper with the nonces ni, nr.

At this point, the attacker knows the shared secretsgx

′y, gxy′

and encryption keys on the two connections.Moreover the hash used in the signature transcriptcollides all the way to the mac(ki,IDi). To completethe attack, the attacker must ensure that ki is that sameat I and R. It can ensure this by choosing x′, y′ suchthat gx

′y = gxy′

(as discussed below). Thereafter, it canforward I’s signature to R and hence impersonate I .

Implementing the Attack To implement the attack,we must first find a collision between m1 and m′1.We observe that in IKEv2 the length of the cookie issupposed to be at most 64 octets but we found that manyimplementations allow cookies of up to 216 bytes. Wecan use this flexibility in computing long collisions.

The attacker finds two length-prefixed bitstrings(C1, C2) of N bytes each such that

hash(C1 |−︸︷︷︸ck

) = hash( C2︸︷︷︸ck′

)

where the length of ck is set to N +M , that is, ck hasM empty bytes ready to fill in. Further, let’s assumethat M is the length of the bitstring SA′i|gx

′ |ni that theattacker wants to send to R in its tampered SA_INITrequest. The idea is that the attacker can now stuff thetampered message into ck, and can stuff the originalmessage into info′i to obtain a transcript collision:

hash([C1 |SA′i |gx

′|ni]︸︷︷︸

ck

|SAi |gx |ni |infoi

︸︷︷︸m1

|−) =

hash([C2]︸︷︷︸ck

|SA′i |gx

′|ni |SAi |gx |ni |infoi︸︷︷︸

info′i︸︷︷︸

m1

|−)

The collision (C1, C2) can be found easily as achosen-prefix collision attack. Since the collision occursbefore any unpredictable value has been included in themessage, it can be computed offline; that is, it does nothave to be computed while a connection is live. Suchcollisions are easy to compute for MD5, but even thoughMD5 signatures are allowed by the standard, they arenot supported by the IKEv2 implementations we lookedat. However, all IKEv2 implementations must supportSHA-1, so collisions in SHA-1 would break initiatorauthentication in most IKEv2 deployments.

15

We also observe that the only difference in thetwo prefixes is the length of the cookie; even a singlebit difference in the length is enough to mount theattack. Hence, what we need is almost a common-prefixcollision attack. We believe it should be possible todesign efficient algorithms to find such collisions, butour research in this direction is ongoing.

The final step to implement the attack is to ensurethat gxy

′= gx

′y . To achieve this, we rely on a smallsubgroup confinement attack. To see a simple example,suppose the attacker chose x′ = y′ = 0; then the twoshared secrets would have the value 1. This specific so-lution would not work in practice because most IKEv2implementations validate the received Diffie-Hellmanpublic value to ensure that it is larger than 1 and smallerthan p − 1. However, many IKEv2 implementationssupport the Diffie-Hellman groups 22-24 that are knownto have many small subgroups. These implementationsdo not validate the incoming public value, and henceare susceptible to similar small subgroup confinementattacks, as discussed in [5]. To complete our transcriptcollision attack, the MitM can use one such smallsubgroup to ensure that the shared values on the twoconnections are the same with high probability.

Attacks on IKEv1 We do not detail attacks on IKEv1for lack of space, but since IKEv1 uses MD5, it isalso vulnerable to target collision attacks. In particular,the initiator’s signature in IKEv1 is over the HMAC-MD5 of several protocol fields, some of which can betampered by the attacker. We found a generic transcriptcollision attack on this HMAC-MD5 value that allowsa signature forwarding attack whereby the attacker canimpersonate the initiator at the responder.

B. Downgrading SSH-2with a Chosen-Prefix Transcript Collision

Figure 13 depicts the SSH-2 [40] protocol, whichimplements yet another variation of an authenticatedDiffie-Hellman protocol. The client and server exchangeidentification strings Vc, Vs, negotiate protocol param-eters Ic, Is, and perform a Diffie-Hellman exchangegx, gy . To authenticate the exchange, clients and serverssign a session hash, defined as:

H = hash(Vc |Vs |Ic |Is |pkS |gx |gy |gxy)

We show that a target collision on this hash value canallow downgrade attaks.

Figure 14 depicts a downgrade attack on SSH-2.The network attacker tampers with the key exchangemessage Ic in one direction and with Is in the other. Itchooses their values in a way such that the following

Fig. 13. SSH-2: Key exchange and user authentication.

Fig. 14. Man-in-the-middle downgrade attack on SSH-2. The attackuses a transcript collision to downgrade the connection to weakciphers.

hashes coincide

hash(Vc |Vs |Ic |C1 |−︸︷︷︸I′s

) = hash(Vc |Vs | C2︸︷︷︸I′c

)

Using this collision, we leave enough space empty inI ′s to stuff the real Is inside. Consequently the sessionhashes on the two sides coincide and the connectionis completed. In this attack, the MitM does not tamperwith the Diffie-Hellman values and hence it does notknow the connection keys. However, it manages totamper with both Ic and Is, and can therefore down-grade the negotiate ciphersuite to a weak cryptographicalgorithm that the attacker knows how to break.

Implementing the target collision for SSH-2 requires

16

a chosen-prefix attack on SHA-1 which is still consid-ered impractical (at least 277 work). Moreover, since thetwo tampered fields Ic and Is are meant to be strings(not bitstrings), we cannot use arbitrary collisions. Still,we find this attack to be an interesting illustration ofthe use of transcipt collisions for downgrade attacks.

SSH GEX Client Impersonation Collision-based im-personation attacks on SSH-2 are difficult to mountdue to the peculiarity of the session hash construction.For example, by placing the shared secret gxy at theend, a number of obvious attacks become difficult toimplement. Although it makes collisions more difficult,we note that this construction is not particularly secure;since it includes the shared secret the session hashneeds to be non-leaking in addition to being collision-resistant [4]. Moreover, if the SSH server reuses itsDiffie-Hellman public value, this secret-suffix MACconstruction becomes vulnerable to key recovery attackslike the one on APOP [24].

However, other variations of SSH allow for moretampering, enabling new attacks. The SSH Diffie-Hellman Group Exchange protocol [10] allows SSHservers to choose any Diffie-Hellman group for use inthe key exchange. Hence, like in our TLS attacks, aman-in-the-middle attacker can send a bogus or weakgroup to the client—one in which the discrete logproblem is easy to solve. In this case, a new chosen-prefix transcript collision attack on the session hashallows an attacker to mount a user impersonation attackby forwarding the user’s signature over a differentconnection. Such attacks are particularly useful for SSH,because users typically have the same public key at mul-tiple servers, some of which may be more trustworthythan others. However, as with the downgrade attack, theattack relies on computing a SHA-1 collision, which isstill an open problem in the academic literature.

VI. CONCLUSIONS

Table II summarizes the attacks discussed in thispaper. Notably, we have found a number of unsafeuses of MD5 in various Internet protocols, yieldingexploitable chosen-prefix and generic collision attacks.We also found several unsafe uses of SHA-1 that willbecome dangerous when more efficient collision-findingalgorithms for SHA-1 are discovered. In all cases, thecomplexity of the target collision attack is significantlylower than the estimated work for a second preimageattack on the underlying hash function. This definitivelysettles the debate on whether the security of mainstreamcryptographic protocols depend on collision resistance.The answer is yes, cryptographers were right. Exceptin very few cases, mainstream protocols do requirecollision resistance for protection against man-in-the-middle transcript collision attacks.

Our attacks allow downgrades as well as client andserver impersonation against real-world protocol imple-mentations. In response to our reports, Mozilla NSSand GnuTLS are disabling RSA-MD5 signatures fromTLS. The TLS 1.3 protocol (draft 7) removed the use oftruncated HMAC for finished messages (based on ourrecommendation) and the use of RSA-MD5 signatureswill be removed from the next draft (in response todiscussions in the working group.) We are reaching outto various TLS websites that support RSA-MD5, andwe will report in more detail on our disclosure andits impact in the final version of this papaer. We arealso in the process of reaching out to designers andimplementers of Token Binding, FIDO, IPsec and SSH.

We have focused on transport-layer protocols, butsimilar collisions likely occur in other authenticatedkey exchange protocols (OTR, TextSecure, Tor, etc.)and in application-level authentication protocols (EAP,SASL, etc.). We encourage protocol designers and im-plementers to inspect their protocols for all uses ofhash functions. We strongly recommend that weak hashfunctions like MD5 and SHA-1 should not just bedeprecated; they should be forcefully disabled.

REFERENCES

[1] J. Altman, N. Williams, and L. Zhu. Channel bindings for TLS.IETF RFC 5929, 2010.

[2] M. Bellare. New proofs for NMAC and HMAC: securitywithout collision-resistance. In CRYPTO 2006, pages 602–619,2006.

[3] S. Bellovin and E. Rescorla. Deploying a new hash algorithm.In NDSS, 2006.

[4] F. Bergsma, B. Dowling, F. Kohlar, J. Schwenk, and D. Stebila.Multi-ciphersuite security of the secure shell (ssh) protocol.In Proceedings of the 2014 ACM SIGSAC Conference onComputer and Communications Security, CCS ’14, pages 369–381, New York, NY, USA, 2014. ACM.

[5] K. Bhargavan, A. Delignat-Lavaud, and A. Pironti. Verifiedcontributive channel bindings for compound authentication. InNDSS, Feb 2015.

[6] K. Bhargavan, A. D. Lavaud, C. Fournet, A. Pironti, and P.-Y. Strub. Triple handshakes and cookie cutters: Breaking andfixing authentication over TLS. In IEEE S&P (Oakland), 2014.

[7] I. B. Damgard. A design principle for hash functions. InCRYPTO’89, pages 416–427. Springer, 1990.

[8] T. Dierks and E. Rescorla. The Transport Layer Security (TLS)Protocol Version 1.2. IETF RFC 5246, 2008.

[9] T. Dierks and E. Rescorla. The Transport Layer Security (TLS)Protocol Version 1.3. Internet Draft, 2014.

[10] M. Friedl, N. Provos, and W. Simpson. Diffie-Hellman GroupExchange for the Secure Shell (SSH) Transport Layer Protocol.IETF RFC 4419, 2006.

[11] F. Giesen, F. Kohlar, and D. Stebila. On the security of TLSrenegotiation. In ACM CCS, 2013.

[12] D. Gillmor. Negotiated Finite Field Diffie-Hellman EphemeralParameters for TLS. Internet Draft, 2015.

[13] S. Halevi and H. Krawczyk. Strengthening digital signaturesvia randomized hashing. In CRYPTO, pages 41–59, 2006.

17

Protocol Property Mechanism Attack Collision Type Work Preimage GPU timeTLS 1.2 Client Auth RSA-MD5 Impersonation Chosen Prefix 239 2128 5hTLS 1.3 Server Auth RSA-MD5 Impersonation Chosen Prefix 239 2128 5hTLS 1.* Channel Binding HMAC (96 bits) Impersonation Generic 248 2256 80dDTLS 1.0 Handshake Integrity MD5|SHA-1 Downgrade Common Prefix 267 2160

TLS 1.1 Handshake Integrity MD5|SHA-1 Downgrade Chosen Prefix 277 2160

IKE v2 Initiator Auth RSA-MD5 Impersonation Common/Chosen Prefix 216 − 239 (offline) 2128 1s-5hIKE v2 Initiator Auth RSA-SHA-1 Impersonation Common/Chosen Prefix 261 − 267 (offline) 2160 20 years (est.)IKE v1 Initiator Auth HMAC-MD5 Impersonation Generic 264 2128 140 years (est.)SSH-2 Integrity SHA-1 Downgrade Chosen Prefix 277 2160

SSH GEX User Auth RSA-SHA-1 Impersonation Chosen Prefix 277 2160

TABLE II. SUMMARY OF TRANSCRIPT COLLISION ATTACKS ON INTERNET PROTOCOLS

[14] B. Hill, D. Baghdasaryan, B. Blanke, R. Lindemann, andJ. Hodges. FIDO UAF Application API and Transport BindingSpecification v1.0. Draft Specification, 2015.

[15] P. Hoffman. Use of Hash Algorithms in Internet Key Exchange(IKE) and IPsec. IETF RFC 4894, 2007.

[16] P. Hoffman and B. Schneier. Attacks on Cryptographic Hashesin Internet Protocols. IETF RFC 4270, 2005.

[17] T. Jager, F. Kohlar, S. Schage, and J. Schwenk. On the securityof TLS-DHE in the standard model. In CRYPTO, 2012.

[18] A. Joux. Multicollisions in iterated hash functions. applicationto cascaded constructions. In M. K. Franklin, editor, CRYPTO,volume 3152 of Lecture Notes in Computer Science, pages 306–316. Springer, 2004.

[19] D. Knuth. Seminumerical algorithms, volume 2 of the art ofcomputer programming, 1981.

[20] H. Krawczyk. Sigma: The sign-and-macapproach to authen-ticated diffie-hellman and its use in the ike protocols. In Ad-vances in Cryptology-CRYPTO 2003, pages 400–425. Springer,2003.

[21] H. Krawczyk, K. G. Paterson, and H. Wee. On the security ofthe TLS protocol: A systematic analysis. In CRYPTO, 2013.

[22] A. Langley. Transport Layer Security (TLS) Snap Start. InternetDraft, 2010.

[23] A. Langley. Transport Layer Security (TLS) Next ProtocolNegotiation Extension. Internet Draft, 2012.

[24] G. Leurent. Practical key-recovery attack against apop, an md5-based challenge-response authentication. IJACT, 1(1):32–46,2008.

[25] N. Mavrogiannopoulos, F. Vercauteren, V. Velichkov, andB. Preneel. A cross-protocol attack on the TLS protocol. InACM CCS, 2012.

[26] F. Mendel, C. Rechberger, and M. Schlaffer. MD5 is weakerthan weak: Attacks on concatenated combiners. In M. Matsui,editor, ASIACRYPT, volume 5912 of Lecture Notes in ComputerScience, pages 144–161. Springer, 2009.

[27] A. Menon-Sen, N. Williams, A. Melnikov, and C. New-man. Salted Challenge Response Authentication Mechanism(SCRAM) SASL and GSS-API Mechanisms. IETF RFC 5802,2010.

[28] R. C. Merkle. A certified digital signature. In Advances inCryptologyCRYPTO89 Proceedings, pages 218–238. Springer,1990.

[29] J. M. Pollard. A monte carlo method for factorization. BITNumerical Mathematics, 15(3):331–334, 1975.

[30] J. M. Pollard. Monte carlo methods for index computation.Mathematics of computation, 32(143):918–924, 1978.

[31] A. Popov, M. Nystroem, D. Balfanz, and A. Langley. TheToken Binding Protocol Version 1.0. Internet Draft, 2015.

[32] B. Preneel and P. C. van Oorschot. On the security of two MAC

algorithms. In U. M. Maurer, editor, Advances in Cryptology -EUROCRYPT ’96, International Conference on the Theory andApplication of Cryptographic Techniques, Saragossa, Spain,May 12-16, 1996, Proceeding, volume 1070 of Lecture Notesin Computer Science, pages 19–32. Springer, 1996.

[33] E. Rescorla and N. Modadugu. Datagram Transport LayerSecurity Version 1.2. IETF RFC 6347, 2012.

[34] M. Stevens. Project hashclash. https://marc-stevens.nl/p/hashclash/.

[35] M. Stevens. Counter-cryptanalysis. In R. Canetti and J. A.Garay, editors, CRYPTO, volume 8042 of Lecture Notes inComputer Science, pages 129–146. Springer, 2013.

[36] M. Stevens. New collision attacks on SHA-1 based on optimaljoint local-collision analysis. In T. Johansson and P. Q.Nguyen, editors, EUROCRYPT, volume 7881 of Lecture Notesin Computer Science, pages 245–261. Springer, 2013.

[37] M. Stevens, A. K. Lenstra, and B. de Weger. Chosen-prefixcollisions for MD5 and applications. IJACT, 2(4):322–359,2012.

[38] P. C. van Oorschot and M. J. Wiener. Parallel collision searchwith cryptanalytic applications. J. Cryptology, 12(1):1–28,1999.

[39] X. Wang and H. Yu. How to break MD5 and other hashfunctions. In R. Cramer, editor, EUROCRYPT, volume 3494of Lecture Notes in Computer Science, pages 19–35. Springer,2005.

[40] T. Ylonen and C. Lonvick. The Secure Shell (SSH) TransportLayer Protocol. RFC 4253 (Proposed Standard), 2006.

18

https://marc-stevens.nl/p/hashclash/

https://marc-stevens.nl/p/hashclash/

Documents

Transcript Collision Attacks: Breaking Authentication in TLS, IKE, … · 2015. 11. 1. · from the draft version of the TLS 1.3 protocol. Outline Section II introduces transcript