4 Digital Signature and Hash Functions

4.1 Digital Signature

The digital signature technique is widely used to detect unauthorized
modification of data and to authenticate the identity of the signatory.
Applications for digital signatures range from digital certificates for secure
e-commerce to legal signing of contracts to secure software updates. Together
with key establishment over insecure channels, they form the most important
instance of public-key cryptography. [12] They are essential for secure
transactions over insecure or open networks. Digital signature schemes are
mostly used in cryptographic protocols to provide services such as entity
authentication, authenticated key transport and key agreement. [88]

Digital signatures share some functionality with handwritten signatures.

In particular, they provide a method to assure that a message is authentic to

one user, i.e., it in fact originates from the person who claims to have

generated the message. However, they actually provide much more

functionality. [80,105,106]

4.1.1 Introduction and need for digital signature

For years, people have used various types of signatures to associate their
identities with documents. In the Middle Ages, a nobleman sealed a document
with a wax imprint of his insignia. The assumption was that the noble was the
only person able to reproduce the insignia. In modern transactions, credit card
slips are signed, and the salesperson is supposed to verify the signature by
comparing it with the signature on the card. With the development of electronic
commerce and electronic documents, these methods no longer suffice. [107,108]
In this fast-paced technological world, the importance of information and
communication systems is escalating with the increasing significance and
quantity of data that is transmitted to minimize operational cost and provide
enhanced services [80,75]. Unfortunately, the vulnerability of systems and data
is rising sharply due to a variety of threats, such as unauthorized access and
use, destruction, alteration and misappropriation. Cryptography is the
foundation of all data and information security. The Internet has changed the
way business and transactions are conducted, as it allows information to be
exchanged more quickly and easily. The digital signature is also a
cryptographic protocol. [7,74,71]

Digital signature software helps organizations sustain signer

authenticity, accountability, data integrity, and non-repudiation of documents

and transactions, as shown in Figure 14.

Figure 14: Security Advantages of Digital Signatures

The crypto schemes had two main goals: either to encrypt data (e.g.,

with AES, 3DES or RSA encryption) or to establish a shared key (e.g., with

the Diffie–Hellman or elliptic curve key exchange). One might be tempted to

think that we are now in a position to satisfy any security needs that arise in

practice. However, there are many other security needs besides encryption and

key exchange, which are in fact termed security services. [12] These security

services are [108]:

• Signer authentication: If public and private keys are associated with

an identified signer, the digital signature attributes the message to the

signer. The digital signature cannot be forged, unless the signer loses

control of the private key.

• Message authentication: A digital signature identifies the signed
message with far greater certainty and precision than paper signatures.
Verification reveals any tampering, since the comparison of hash results
shows whether the message is the same as when it was signed.

• Non-repudiation: Creating a digital signature requires the signer to use
his private key. This alerts the signer that he is consummating a
transaction with legal consequences, decreasing the chances of
litigation later on.

• Integrity: Digital signature creation and verification processes provide

a high level of assurance that the digital signature is that of the signer.

Compared to tedious and labor intensive paper methods, such as

checking signature cards, digital signatures yield a high degree of

assurance without adding resources for processing.

Electronic transactions are gaining in importance around the globe because
they significantly reduce the need for paper documentation while providing the
opportunity for tremendous efficiency and productivity gains. As a result,
digital signatures are poised to enter the mainstream as a primary vehicle for
establishing trust for a wide variety of electronic transactions. [73,24] The
information handled in electronic transactions is valuable and sensitive and
must be protected against tampering by malicious third parties (who are neither
the senders nor the recipients of the information). Sometimes, there is also a
need to prevent the information, or items related to it (such as the date and
time it was created, sent and received), from being tampered with by the sender
and/or the recipient. [80]

A digital signature is a checksum which depends on the time period

during which it was produced. [71] It is computed using a set of rules and a set

of parameters such that the identity of the signatory and integrity of the data

can be verified. [86]

Basically, the idea behind digital signatures is the same as that behind
handwritten signatures, which are traditionally used to validate and
authenticate paper documents. A digital signature is an electronic analogue of
a written signature; it can be used to provide assurance that the claimed
signatory signed the information. A major difference between handwritten and
digital signatures is that a digital signature cannot be constant; it must be a
function of the document that is signed. [88] A signature is used to
authenticate the fact that one promised something that he cannot take back
later. For electronic documents, a similar mechanism is necessary. Digital
signatures, which are nothing but strings of ones and zeroes generated by a
digital signature algorithm, serve the purpose of validation and authentication
of electronic documents. Validation refers to the process of certifying the
contents of the document, while authentication refers to the process of
certifying the sender of the document. [80] In addition, a digital signature
may be used to detect whether or not the information was modified after it was
signed, i.e. to verify the integrity of the signed data. [109] In a situation
where there is not complete trust between sender and receiver, something more
than authentication is needed, and the most attractive solution to this problem
is the digital signature. Digital signatures are mainly used in e-mail,
electronic data interchange, software distribution, and other applications that
require data integrity assurance and data origin authentication. Wireless
protocols such as HiperLAN/2 [110] and WAP [111] have specified security layers
in which digital signature algorithms are applied for authentication purposes.
[88]

A digital signature must have certain salient features: it must verify the
author and the date and time of the signature; it must authenticate the
contents at the time of the signature; it must be verifiable by third parties,
to resolve disputes; it must be a bit pattern that depends on the message being
signed; it must use some information unique to the sender, to prevent both
forgery and denial; it must be relatively easy to produce, recognize and verify,
but computationally infeasible to forge; and it must have legitimate concern
[73].

A digital signature can also be used to verify that information has not been
altered after it was signed. A digital signature is an electronic signature
that can be used in all imaginable types of electronic transfer. Digital
signatures differ significantly from other electronic signatures in terms of
process and results. These differences make digital signatures more serviceable
for legal purposes [73].

Digital signatures are based on mathematical algorithms which

include a signature generation process and a signature verification process. A

signatory uses the generation process to generate a digital signature on data; a

verifier uses the verification process to verify the authenticity of the signature.

These require the signature holder to have a key-pair (one private and one

public key) for signing and verification. [7,24,109]

The basic idea of digital signatures is that each signer has a unique key
called the private key, together with a corresponding key called the public
key. Whenever the signer has to authenticate a document, he creates a bit
string called the signature by applying his private key to the message or to a
hashed image of the message, as shown in Figure 15. The user who receives this
message then applies the signer's public key to the signature and checks the
validity of the bit string, as shown in Figure 16. If the receiver is convinced
that the document was signed by the legitimate signer, he accepts the document.
If a dispute later arises between sender and receiver regarding the validity of
the document, a third party inspects the signature and verifies it using the
public key of the signer. [14,104,106]

The notion of digital signatures goes back to the beginning of public-key
cryptography. In their landmark paper, Whitfield Diffie and Martin Hellman
introduced the idea that someone could form a digital signature using
public-key cryptography that anyone else could verify but which no one else
could generate. [5] Since then, RSA [24] has become the most proven and most
popular scheme, and has achieved the widest adoption by standards bodies and in
practice. The RSA digital signature scheme offers several compelling benefits,
most notably security, continued use of existing RSA keys and hardware
accelerators, and straightforward, localized software change. [73] Therefore,
the RSA digital signature scheme is discussed here.

It has three phases, namely (1) Key Generation, (2) Signature Generation and
(3) Signature Verification. The Key Generation phase is the foundation for the
other two.

4.1.2 Key-pair Generation

To generate the key pair for the digital signature, the RSA algorithm, the
most widely used public-key cryptography algorithm in the world, is used. The
idea is that it is relatively easy to multiply prime numbers but much more
difficult to factor their product. Multiplication can be computed in polynomial
time, whereas the running time of the best known classical factoring algorithms
grows faster than any polynomial in the size of the numbers. The algorithm is
as follows:

a. Select p, q such that p and q are both prime and $p \neq q$.

b. Calculate $n = p \times q$.

c. Calculate $\phi(n) = (p - 1) \times (q - 1)$.

d. Select an integer e such that $\gcd(\phi(n), e) = 1$ and $1 < e < \phi(n)$.

e. Calculate $d = e^{-1} \bmod \phi(n)$, i.e. $ed \equiv 1 \pmod{\phi(n)}$.

f. Public key KU = {e, n}.

g. Private key KR = {d, n}.

Figure 24 shows the data flow diagram of key-pair generation.
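The key-generation steps a to g can be illustrated with a minimal Python sketch. The primes p = 61 and q = 53 and the exponent e = 17 are illustrative assumptions chosen only to keep the numbers readable, and the function name generate_keypair is hypothetical; real keys use primes that are hundreds of digits long and are checked for primality.

```python
from math import gcd

def generate_keypair(p: int, q: int, e: int):
    """Toy RSA key generation following steps a-g (p and q are assumed prime)."""
    assert p != q                               # step a: two distinct primes
    n = p * q                                   # step b
    phi = (p - 1) * (q - 1)                     # step c
    assert 1 < e < phi and gcd(phi, e) == 1     # step d
    d = pow(e, -1, phi)                         # step e: modular inverse (Python 3.8+)
    return (e, n), (d, n)                       # steps f, g: KU = {e, n}, KR = {d, n}

public_key, private_key = generate_keypair(61, 53, 17)
print(public_key, private_key)                  # (17, 3233) (2753, 3233)
```

The built-in three-argument pow with exponent -1 computes the modular inverse directly, so no separate extended Euclidean routine is needed in this sketch.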

4.1.3 Sign Generation

a. Given a message m, we apply a suitable hash function H (MD5,
SHA1 or SHA2) to obtain the hash result M = H(m).

b. To sign the message m, we require M < n and compute the signature
$S = M^d \bmod n$, where d is the private key of the signer.

Figure 27 shows the data flow diagram of the signature generation module in
the application RSAAPP.

4.1.4 Sign Verification

a. To verify the message m, we use the digital signature S to compute
$M = S^e \bmod n$, where e is the public key of the signer.

b. Then we compute M' = H(m) from the received message and compare it with M.

c. If both are the same, the message is authentic; otherwise it has been
tampered with.

Figure 15: Signature Generation (the message m is passed through the hash
function (MD5, SHA) to give the message digest M = H(m); the digest is signed
with the signer's private key, $S = M^d \bmod n$; the message and the digital
signature are sent to the verifier)

Figure 28 shows the data flow diagram of the signature verification module in
the application RSAAPP.
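Signature generation and verification can be sketched together as follows. This is textbook RSA without any padding scheme, shown only to make the formulas concrete; it is not the RSAAPP implementation. The two Mersenne primes, the exponent e = 65537, the choice of SHA-1 and the function names are all assumptions made for the example; the primes are chosen so that the 160-bit digest satisfies M < n.

```python
import hashlib
from math import gcd

# Assumed toy parameters: two Mersenne primes, far too small for real use.
p, q = 2**127 - 1, 2**89 - 1
n = p * q
phi = (p - 1) * (q - 1)
e = 65537
assert gcd(e, phi) == 1
d = pow(e, -1, phi)

def digest(message: bytes) -> int:
    """M = H(m) as an integer; SHA-1 yields 160 bits, so M < n holds here."""
    return int.from_bytes(hashlib.sha1(message).digest(), "big")

def sign(message: bytes) -> int:
    return pow(digest(message), d, n)         # S = M^d mod n

def verify(message: bytes, signature: int) -> bool:
    recovered = pow(signature, e, n)          # M = S^e mod n
    return recovered == digest(message)       # compare with M' = H(m)

s = sign(b"pay 100 to Bob")
print(verify(b"pay 100 to Bob", s))           # True
print(verify(b"pay 900 to Bob", s))           # False
```

The verifier never needs d: it only recomputes the hash of the received message and compares it with the value recovered from the signature.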

4.1.5 Digital Signature Standard

The National Institute of Standards and Technology (NIST) published Federal
Information Processing Standards Publication 186-3 (FIPS PUB 186-3), known as
the Digital Signature Standard (DSS), in June 2009. The DSS specifies the
Digital Signature Algorithm (DSA), which is used together with a hash function
of the Secure Hash Algorithm (SHA) family and is appropriate for applications
requiring a digital rather than written signature. The DSA digital signature is
a pair of large numbers represented in a computer as strings of binary digits.
The digital signature is computed using a set of rules (i.e., the DSA) and a
set of parameters such that the identity of the signatory and the integrity of
the data can be verified. The DSA provides the capability to generate and
verify signatures. Signature generation makes use of a private key to generate
a digital signature. Signature verification makes use of a public key, which
corresponds to, but is not the same as, the private key. Each user possesses a
private and public key pair. Public keys are assumed to be known to the public
in general. Anyone can verify the signature of a user by employing that user's
public key. Signature generation can be performed only by the possessor of the
user's private key.

Figure 16: Signature Verification (the message and the digital signature are
received from the signer; the message is passed through the hash function (MD5,
SHA) to give the message digest, and the RSA verify function decides whether
the signature is valid)

A hash function is used in the signature generation process to obtain a

condensed version of data, called a message digest (Fig. 7). The message

digest is then input to the DSA to generate the digital signature. The digital

signature is sent to the intended verifier along with the signed data (often

called the message). The verifier of the message and signature verifies the

signature by using the sender's public key. The same hash function must also

be used in the verification process. The hash functions are specified in

separate standards: MD5 in Internet Engineering Task Force (IETF) Request for

Comments (RFC) 1321, and SHA in the Secure Hash Standard (SHS), FIPS PUB

180. Similar procedures may be used to generate

and verify signatures for stored as well as transmitted data.

The DSA authenticates the integrity of the signed data and the identity of the

signatory. The DSA may also be used in proving to a third party that data was

actually signed by the generator of the signature. The DSA is intended for use

in electronic mail, electronic funds transfer, electronic data exchange, software

distribution, data storage, and other applications which require data integrity

assurance and data origin authentication.

The DSA may be implemented in software, firmware, hardware, or any

combination thereof. NIST is developing a validation program to test

implementations for conformance to this standard.

4.1.6 Future Directions

The incompatibility of signature schemes limits interoperability. Some steps
have already been taken to address this concern. A gradual upgrade process is
needed, which will eventually yield the benefits. The primary area for
interoperability is the padding format itself, but for full interoperability,
additional aspects must be compatible as well, such as the choice of hash
function and the range of key sizes supported.

RSA is highly secure, but brute force is one of the known ways to attack it.
This attack can easily be defeated by simply increasing the key size. However,
this approach can lead to a number of problems: increased processing time
(decryption time increases approximately 8-fold as key sizes double);
computational overhead (the computation required to perform the public-key and
private-key transformations); and increased key storage requirements (RSA
private and public keys require significant amounts of memory). Therefore, more
attention is needed on how key pairs are generated and how private keys are
stored. These assurance issues are the ones where industry, banking, and
government standards often need to diverge; the padding format itself need not
be a cause of incompatibility.

4.2 Hash Functions

A basic component of many cryptographic algorithms is what is known as a hash
function. [107] Cryptographic hash functions cannot be thought of outside
mathematics. Hash functions are an important cryptographic primitive; they use
no key and are widely used in protocols. [12] A hash function takes a message
of any length as input and transforms it into a short, fixed-length bit string
called a hash value, a message digest, a checksum, or a digital fingerprint.
For a particular message, the digest can be seen as the fingerprint of the
message, i.e. a practically unique representation of it. [84,112] When a hash
function satisfies certain non-invertibility properties, it can be used to make
many algorithms more efficient. A typical hash function takes a variable-length
message and produces a fixed-length hash. Given the hash, it is computationally
infeasible to find a message with that hash; in fact, one should not be able to
determine any usable information about a message with that hash, not even a
single bit. Hash functions are used to digest or condense a message down to a
fixed size, which can then be signed, in a way that makes finding other
messages with the same hash extremely difficult (so the signature would not
apply easily to other messages).

In digital signatures, one important technique is hashing, because hashing
produces a message digest that is a small and practically unique representation
(a bit like a sophisticated checksum) of the complete message. Hashing
algorithms are one-way functions, i.e. it is computationally infeasible to
derive the message from the digest. The main reasons for producing a message
digest are: (1) the integrity of the message being sent is preserved, since any
alteration of the message will be detected; (2) the digital signature is
applied to the digest, which is usually considerably smaller than the message
itself; (3) hashing algorithms are generally much faster than encryption
algorithms, particularly asymmetric ones. [105]

A hash function is a function $f: D \to R$, where the domain
$D = \{0, 1\}^*$, which means that the elements of the domain are binary
strings of variable length, and the range $R = \{0, 1\}^n$ for some $n \geq 1$,
which means that the elements of the range are binary strings of fixed length.
So, f is a function which takes as input a message M of any size and produces a
fixed-length hash result h of size n. A hash function f is referred to as a
compression function when its domain D is finite, in other words, when the
function f takes as input a fixed-length message and produces a shorter
fixed-length output. [84]
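The fixed-length property can be observed directly with Python's standard hashlib module. SHA-256 is used here only as one assumed instance of such a function, giving n = 256; the helper name digest_bits is hypothetical.

```python
import hashlib

def digest_bits(message: bytes) -> int:
    """Length in bits of the SHA-256 digest of the given message."""
    return len(hashlib.sha256(message).digest()) * 8

# Inputs of very different lengths all map to n = 256 output bits.
for message in (b"", b"abc", b"a" * 1_000_000):
    print(len(message), "bytes in ->", digest_bits(message), "bits out")
```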

A cryptographic hash function H is a hash function with additional

security properties:

1. H should accept a block of data of any size as input.

2. H should produce a fixed-length output no matter what the length of the

input data is.

3. H should behave like a random function while being deterministic and
efficiently reproducible: its output should look like a random string of fixed
length, yet whenever the same input is given, H should always produce the same
output.

4. Given a message M, it is easy to compute its corresponding digest h,
meaning that h can be computed in time O(n), where n is the length of the input
message; this makes hardware and software implementations cheap and practical.

5. Given a message digest h, it is computationally difficult to find M such

that H(M) = h. This is called the one-way or pre-image resistance property.

It simply means that one should not be capable of recovering the original

message from its hash value.

6. Given a message M1, it is computationally infeasible to find another

message M2 ≠ M1 with H(M1) = H(M2). This is called the weak collision

resistance or second preimage resistance property.

7. It is computationally infeasible to find any pair of distinct messages

(M1, M2) such that H(M1) = H(M2). This is referred to as the strong

collision resistance property.

These properties are required in order to prevent or withstand certain

types of attacks which may render a cryptographic hash function useless and

insecure. In addition to producing a “digital fingerprint” of a message M that is

unique and to providing strong collision resistance, a cryptographic hash

function should also be highly sensitive to the smallest change in the input

message, such that a change as small as a single bit in the input message

should produce a large change in the hash value of the message. Note that a

message in this context can be a binary text file, audio file, or executable

program. [84]
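This sensitivity to small input changes can be observed with a short sketch. SHA-256 and the two sample messages are assumptions made for the example; the messages differ in exactly one bit (the final character "0" versus "1"), and the helper name bit_difference is hypothetical.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of differing bits between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

changed = bit_difference(b"transfer 100", b"transfer 101")
print(changed, "of 256 output bits differ")
```

For a well-behaved hash function one would expect roughly half of the output bits to change.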


The security of a hash function does not originate in keeping the hash
function itself secret, but comes from its ability to produce one-way hash
values together with the property of being collision-free. We speak of a
collision when two or more different messages result in exactly the same hash
value. So far we have talked about hash functions used without a key, but hash
functions can also be used with a secret key shared between sender and
receiver, in which case the function is called a message authentication code or
MAC. [81,113]
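A keyed hash can be illustrated with HMAC from Python's standard hmac module. HMAC is one common way of building a MAC from a hash function and is chosen here as an assumption; the key, the messages and the helper name check are hypothetical.

```python
import hashlib
import hmac

key = b"shared-secret-key"            # hypothetical key known to both parties
message = b"transfer 100 to Bob"

# The sender computes the tag over the message with the shared key.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

def check(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

print(check(key, message, tag))                   # True
print(check(key, b"transfer 900 to Bob", tag))    # False
```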

4.2.1 Standard Cryptographic hash functions

Cryptographic hash functions come in different shapes and sizes. There are
basically two main categories of hash functions: hash functions that depend on
a key for their computation, usually known as Message Authentication Codes or
MACs, and hash functions that do not depend on a key for their computation,
generally known as un-keyed hash functions or simply hash functions.

All well-known hash functions are either based on a block cipher or on
modular arithmetic. But before stepping into their details, we study a popular
method used to build the most popular hash functions, the Merkle-Damgård
construction.

Figure 17: The three security properties of hash functions (preimage
resistance: given h(x), find x; second preimage resistance: given x1, find x2
with h(x1) = h(x2); collision resistance: find any x1, x2 with h(x1) = h(x2))

The Merkle-Damgård construction

Named after its two inventors, the American Ralph C. Merkle and the Dane Ivan
Damgård, the Merkle-Damgård structure defines a generic step-by-step procedure
for deriving a fixed-length output value from a variable-length input value.
The process is depicted in Figure 18 below.

The main building blocks of the Merkle-Damgård structure are:

IV: Initialization Vector or Initial Value is a fixed value used as the chaining

variable for the very first iteration.

f: the compression function or one-way hash function which is either specially

designed for hashing or based on a block cipher. The compression function

generally takes an input of fixed length and produces an output of fixed length.

Finalisation: an output transformation function which usually reduces further

the length of the output value of the last iteration.

Hash: the message digest or the hash result.

As we can see from Figure 18, the entire message to be hashed is first
divided into n blocks of equal length. The actual length of the message blocks
depends on the requirements set by the compression function f. The message is
always padded so that its length is a multiple of some specific number. The
padding is done by adding, after the last bit of the last message block, a
single 1-bit followed by the necessary number of 0-bits. The length padding,
which consists of appending a k-bit representation of the length in bits of the
original message (that is, the message before any padding has been applied),
takes place in such a way that the length bits are the last bits of the padded
message before it is processed by the compression function. Every block is
processed by the compression function in the same iterative manner.

Figure 18: The Merkle-Damgård construction (message blocks 1 to n, followed by
the length padding, are fed in turn to the compression function f, starting
from the IV; the last output passes through the finalisation function to give
the hash)

The compression function always takes two inputs in each step or

iteration, a message block and a chaining variable. In the first iteration, the

chaining variable is the IV or Initialization Vector. It is given, together with

the first block of message, as inputs to the compression function. The output

of the compression function f in the first iteration is the chaining variable in

the second iteration. The output of the compression function f in the i-th

iteration is the chaining variable in the (i + 1)-th iteration, and so on until we

reach the last iteration.

In the last iteration, the output of the compression function is used as

an input to a finalization function which reduces further the length of the final

output value from the compression function (however, in some cases the

finalization function is not present and the output value of the compression

function f in the last iteration is used as the final hash result).

In general, for a message M consisting of t blocks
$M_0, M_1, \ldots, M_{t-1}$, the computation of the message digest can be
defined as follows:

$H_0 = IV$,

$H_{i+1} = f(H_i, M_i)$ for $0 \leq i < t$,

$H(M) = \mathrm{Finalisation}(H_t)$
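The iteration can be made concrete with a toy Python sketch. Everything specific in it is an assumption made for illustration: the 8-byte block size, the all-zero IV, the big-endian length field, and the stand-in compression function f (a truncated SHA-256, used only so that the example runs). No finalisation step is applied, which corresponds to the case where the last chaining value is used directly as the hash.

```python
import hashlib

BLOCK = 8                     # block size in bytes (assumed for this toy)
IV = bytes(8)                 # all-zero initial chaining value (assumed)

def f(h: bytes, block: bytes) -> bytes:
    """Stand-in compression function: 16 bytes in, 8 bytes out."""
    return hashlib.sha256(h + block).digest()[:8]

def pad(message: bytes) -> bytes:
    """Append a 1-bit, then 0-bits, then the 64-bit message length in bits."""
    padded = message + b"\x80"
    padded += bytes((-len(padded) - 8) % BLOCK)
    return padded + (8 * len(message)).to_bytes(8, "big")

def md_hash(message: bytes) -> bytes:
    data = pad(message)
    h = IV                                    # H_0 = IV
    for i in range(0, len(data), BLOCK):
        h = f(h, data[i:i + BLOCK])           # H_{i+1} = f(H_i, M_i)
    return h                                  # last chaining value, no finalisation

print(md_hash(b"abc").hex())
```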

In fact, the Merkle-Damgård construction is the single most important reason
why it is wrong (dangerous, reckless, ignorant) to think of hash functions as
black boxes. The iterative construction was designed to meet a very modest
goal, that of extending the domain of a collision-resistant function, and
should not be expected to provide security guarantees beyond that. [78]

The Message Digest-5 (MD5) Algorithm [114,115,116]

MD5 is an algorithm that is used to verify data integrity through the
creation of a 128-bit message digest from input data (which may be a message of
any length); the digest is claimed to be as unique to that specific data as a
fingerprint is to a specific individual. MD5, which was developed by Professor
Ronald L. Rivest of MIT in 1992 [117], is intended for use with digital
signature applications, which require that large files be compressed by a
secure method before being encrypted with a secret key under a public-key
cryptosystem. MD5 is specified in Internet Engineering Task Force (IETF)
Request for Comments (RFC) 1321. According to that document, it is
“computationally infeasible” that any two messages that have been input to the
MD5 algorithm could have the same message digest as output, or that a false
message could be created from knowledge of the message digest. MD5 is the third
message digest algorithm created by Rivest. All three (the others are MD2 and
MD4) have similar structures, but MD2 was optimized for 8-bit machines, whereas
the two later algorithms are optimized for 32-bit machines. The MD5 algorithm
is an extension of MD4 [118], which critical review found to be fast, but
possibly not absolutely secure. In comparison, MD5 is not quite as fast as the
MD4 algorithm, but was designed to offer much more assurance of data security.
This is a classical example where security is favoured at the expense of speed.

The C implementation of MD5 used in this application is based largely

on the following description from the original MD5 RFC 1321 by its inventor

Ronald Rivest. [117]

The algorithm takes as input a message of arbitrary length and

produces as output a 128-bit “fingerprint” or “message digest” of the input. It

is conjectured that it is computationally infeasible to produce two messages

having the same message digest, or to produce any message having a given

prespecified target message digest. The MD5 algorithm is intended for digital

signature applications, where a large file must be “compressed” in a secure

manner before being encrypted with a private (secret) key under a public-key

cryptosystem such as RSA. Figure 19 depicts the way the input message is

turned into a 128-bit message digest.

Figure 19 : The MD5 Algorithm

We begin by supposing that we have a b-bit message as input, and that

we wish to find its message digest. Here b is an arbitrary nonnegative integer;

b may be zero, it need not be a multiple of eight, and it may be arbitrarily

large. We imagine the bits of the message written down as follows:

m_0 m_1 ... m_{b-1}

The following five steps are performed to compute the message digest

of the message.



Step 1. Append Padding Bits

The message is "padded" (extended) so that its length (in bits) is

congruent to 448, modulo 512. That is, the message is extended so that it is

just 64 bits shy of being a multiple of 512 bits long. Padding is always

performed, even if the length of the message is already congruent to 448,

modulo 512. Padding is performed as follows: a single “1” bit is appended to

the message, and then “0” bits are appended so that the length in bits of the

padded message becomes congruent to 448, modulo 512. In all, at least one bit

and at most 512 bits are appended.

Step 2. Append Length

A 64-bit representation of b (the length of the message before the

padding bits were added) is appended to the result of the previous step. In the

unlikely event that b is greater than 2^64, then only the low-order 64 bits of

b are used. (These bits are appended as two 32-bit words and appended low-

order word first in accordance with the previous conventions.)

At this point the resulting message (after padding with bits and with b)

has a length that is an exact multiple of 512 bits. Equivalently, this message

has a length that is an exact multiple of 16 (32-bit) words. Let M[0 ... N-1]

denote the words of the resulting message, where N is a multiple of 16.
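Steps 1 and 2 can be sketched in Python for the common case of a message made up of whole bytes (the description also allows bit lengths that are not multiples of eight, which this sketch does not cover). The function name md5_pad is hypothetical.

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Steps 1 and 2 for byte-aligned messages: pad to 448 mod 512 bits, append length."""
    bit_length = 8 * len(message)
    padded = message + b"\x80"                           # the "1" bit plus seven "0" bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)   # "0" bits up to 448 mod 512
    padded += struct.pack("<Q", bit_length % 2**64)      # 64-bit length, low-order word first
    return padded

for size in (0, 55, 56, 64):
    print(size, "->", len(md5_pad(b"a" * size)), "bytes")
# 0 -> 64, 55 -> 64, 56 -> 128, 64 -> 128
```

A 56-byte message is already congruent to 448 modulo 512 bits, yet it still receives a full extra block of padding, which matches the rule that padding is always performed.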

Step 3. Initialize MD Buffer

A four-word buffer (A, B, C, D) is used to compute the message digest. Here
each of A, B, C, D is a 32-bit register. These registers are initialized to the
following values (in hexadecimal, low-order bytes first):

word A: 01 23 45 67

word B: 89 ab cd ef

word C: fe dc ba 98

word D: 76 54 32 10
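Because the register values are listed low-order byte first, the corresponding 32-bit integers read differently from the byte strings. The short sketch below, which uses only Python's standard struct module, converts the four listed byte sequences into the integer constants that typically appear in implementations.

```python
import struct

# The four initial values as listed in Step 3, low-order byte first.
listed = {"A": "01 23 45 67", "B": "89 ab cd ef",
          "C": "fe dc ba 98", "D": "76 54 32 10"}

# Interpret each 4-byte sequence as a little-endian unsigned 32-bit word.
registers = {name: struct.unpack("<I", bytes.fromhex(value))[0]
             for name, value in listed.items()}

for name, word in registers.items():
    print(f"{name} = 0x{word:08x}")
# A = 0x67452301, B = 0xefcdab89, C = 0x98badcfe, D = 0x10325476
```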

Step 4. Process Message in 16-Word Blocks

We first define four auxiliary functions that each take as input three

32-bit words and produce as output one 32-bit word.

F(X, Y, Z) = (X AND Y) OR ((NOT X) AND Z)

G(X, Y, Z) = (X AND Z) OR (Y AND (NOT Z))

H(X, Y, Z) = X XOR Y XOR Z

I(X, Y, Z) = Y XOR (X OR (NOT Z))

In each bit position F acts as a conditional: if X then Y else Z. The

function F could have been defined using + instead of OR since XY and

not(X)Z will never have 1's in the same bit position. It is interesting to note

that if the bits of X, Y, and Z are independent and unbiased, then each bit of

F(X,Y,Z) will be independent and unbiased.

The functions G, H, and I are similar to the function F, in that they act

in “bitwise parallel” to produce their output from the bits of X, Y, and Z, in

such a manner that if the corresponding bits of X, Y, and Z are independent

and unbiased, then each bit of G(X,Y,Z), H(X,Y,Z), and I(X,Y,Z) will be

independent and unbiased. Note that the function H is the bit-wise “xor” or

“parity” function of its inputs.

This step uses a 64-element table T[1 ... 64] constructed from the sine

function. Let T[i] denote the i-th element of the table, which is equal to the

integer part of 4294967296 times abs(sin(i)), where i is in radians.
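The four auxiliary functions and the sine-derived table can be written down directly in Python. The mask constant and the random sample words are assumptions of this sketch; Python integers are unbounded, so results are masked to 32 bits explicitly.

```python
import math
import random

MASK = 0xFFFFFFFF   # keep every result within 32 bits

def F(x, y, z): return ((x & y) | (~x & z)) & MASK
def G(x, y, z): return ((x & z) | (y & ~z)) & MASK
def H(x, y, z): return (x ^ y ^ z) & MASK
def I(x, y, z): return (y ^ (x | ~z)) & MASK

# T[1..64]: integer part of 4294967296 * abs(sin(i)), with i in radians.
T = [None] + [int(4294967296 * abs(math.sin(i))) for i in range(1, 65)]

x, y, z = (random.getrandbits(32) for _ in range(3))
# F selects y where x has a 1-bit and z elsewhere, so OR and + give the same word.
assert F(x, y, z) == ((x & y) + (~x & z & MASK)) & MASK
print(hex(T[1]), hex(T[64]))   # 0xd76aa478 0xeb86d391
```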


Page 109

Do the following:

/* Process each 16-word block. */

For i = 0 to N/16-1 do

/* Copy block i into X. */

For j = 0 to 15 do

Set X[j] to M[i*16+j].

end /* of loop on j */

/* Save A as AA, B as BB, C as CC, and D as DD. */

AA = A

BB = B

CC = C

DD = D

/* Round 1. */

/* Let [abcd k s i] denote the operation

a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */

/* Do the following 16 operations. */

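[ABCD 0 7 1] [DABC 1 12 2] [CDAB 2 17 3] [BCDA 3 22 4]

[ABCD 4 7 5] [DABC 5 12 6] [CDAB 6 17 7] [BCDA 7 22 8]

[ABCD 8 7 9] [DABC 9 12 10] [CDAB 10 17 11] [BCDA 11 22 12]

[ABCD 12 7 13] [DABC 13 12 14] [CDAB 14 17 15] [BCDA 15 22 16]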

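/* Round 2. */

/* Let [abcd k s i] denote the operation

a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */

/* Do the following 16 operations. */

[ABCD 1 5 17] [DABC 6 9 18] [CDAB 11 14 19] [BCDA 0 20 20]

[ABCD 5 5 21] [DABC 10 9 22] [CDAB 15 14 23] [BCDA 4 20 24]

[ABCD 9 5 25] [DABC 14 9 26] [CDAB 3 14 27] [BCDA 8 20 28]

[ABCD 13 5 29] [DABC 2 9 30] [CDAB 7 14 31] [BCDA 12 20 32]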

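/* Round 3. */

/* Let [abcd k s i] denote the operation

a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */

/* Do the following 16 operations. */

[ABCD 5 4 33] [DABC 8 11 34] [CDAB 11 16 35] [BCDA 14 23 36]

[ABCD 1 4 37] [DABC 4 11 38] [CDAB 7 16 39] [BCDA 10 23 40]

[ABCD 13 4 41] [DABC 0 11 42] [CDAB 3 16 43] [BCDA 6 23 44]

[ABCD 9 4 45] [DABC 12 11 46] [CDAB 15 16 47] [BCDA 2 23 48]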

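/* Round 4. */

/* Let [abcd k s i] denote the operation

a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */

/* Do the following 16 operations. */

[ABCD 0 6 49] [DABC 7 10 50] [CDAB 14 15 51] [BCDA 5 21 52]

[ABCD 12 6 53] [DABC 3 10 54] [CDAB 10 15 55] [BCDA 1 21 56]

[ABCD 8 6 57] [DABC 15 10 58] [CDAB 6 15 59] [BCDA 13 21 60]

[ABCD 4 6 61] [DABC 11 10 62] [CDAB 2 15 63] [BCDA 9 21 64]

/* Then perform the following additions. (That is, increment each of the four registers by the value it had before this block was started.) */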

A = A + AA

B = B + BB

C = C + CC

D = D + DD

end /* of loop on i */

Step 5. Output

The message digest produced as output is A, B, C, D. That is, we begin

with the low-order byte of A, and end with the high-order byte of D.
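The five steps can be put together in a compact Python sketch. This is an illustration under stated assumptions, not the implementation used in this work: input is assumed to be byte-aligned, the padding follows Steps 1 and 2 (a 1 bit, zero bits, then the 64-bit length with the low-order word first), and the word index k of each round is computed from closed-form expressions (i, 5i+1, 3i+5 and 7i modulo 16) rather than listed operation by operation. The names md5_sketch, rotl, SHIFTS and T are ours.

```python
import math
import struct

# Shift amounts s for the 64 operations, one pattern of four per round.
SHIFTS = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4
          + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)
# T[i] for i = 1..64, stored 0-based here.
T = [int(4294967296 * abs(math.sin(i + 1))) for i in range(64)]

def rotl(x: int, s: int) -> int:
    # Circular left shift of a 32-bit word (the <<< operator).
    return ((x << s) | (x >> (32 - s))) & 0xFFFFFFFF

def md5_sketch(message: bytes) -> str:
    # Steps 1 and 2: pad to 448 mod 512 bits, then append the 64-bit length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack("<Q", (8 * len(message)) & 0xFFFFFFFFFFFFFFFF)

    # Step 3: initialize the buffer (A, B, C, D).
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]

    # Step 4: process each 16-word block.
    for off in range(0, len(padded), 64):
        x = struct.unpack("<16I", padded[off:off + 64])
        a, b, c, d = state
        for i in range(64):
            if i < 16:      # Round 1 uses F
                f, k = (b & c) | (~b & d), i
            elif i < 32:    # Round 2 uses G
                f, k = (b & d) | (c & ~d), (5 * i + 1) % 16
            elif i < 48:    # Round 3 uses H
                f, k = b ^ c ^ d, (3 * i + 5) % 16
            else:           # Round 4 uses I
                f, k = c ^ (b | ~d), (7 * i) % 16
            total = (a + f + x[k] + T[i]) & 0xFFFFFFFF
            a, b, c, d = d, (b + rotl(total, SHIFTS[i])) & 0xFFFFFFFF, b, c
        # Add the saved values AA, BB, CC, DD back in.
        state = [(s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d))]

    # Step 5: output A, B, C, D, low-order byte first.
    return struct.pack("<4I", *state).hex()

print(md5_sketch(b"abc"))
```

The sketch favours readability over speed; a production system would normally call a vetted library routine instead.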

The MD5 message-digest algorithm is simple to implement, and provides a "fingerprint" or message digest of a message of arbitrary length. The original specification conjectures that the difficulty of coming up with two messages having the same message digest is on the order of $2^{64}$ operations, and that the difficulty of coming up with any message having a given message digest is on the order of $2^{128}$ operations. The specification also notes that, although the algorithm had been carefully scrutinized for weaknesses, it was at that time relatively new, and that further security analysis was justified, as is the case with any new proposal of this sort.

The following are the differences between MD4 and MD5 [7]:

1. A fourth round has been added.

2. Each step now has a unique additive constant.

3. The function g in round 2 was changed from (XY v XZ v YZ) to (XZ v

Y not(Z)) to make g less symmetric.

4. Each step now adds in the result of the previous step. This promotes a

faster "avalanche effect".


5. The order in which input words are accessed in rounds 2 and 3 is

changed, to make these patterns less like each other.

6. The shift amounts in each round have been approximately optimized, to

yield a faster "avalanche effect." The shifts in different rounds are

distinct.

4.2.1.1.1 Security of MD5

The MD5 algorithm has the interesting property that every bit of the output is a function of every bit of the input. The repeated use of the primitive functions, the additive constants T[i], and the circular left shifts, which differ from round to round, together produce a well-mixed hash result.

This makes it very unlikely that two messages that show a similar regularity will have the same hash result.

Ron Rivest conjectures that MD5 is as strong as possible for a 128-bit hash code. That is, finding two messages having the same hash value should require about $2^{64}$ operations, a consequence of the birthday paradox explained earlier, and finding a message that matches a given message digest (the pre-image attack) should require about $2^{128}$ operations.

However, this is only a conjecture, not a proven result, and any method that takes fewer operations shows the MD5 algorithm to be less secure than originally believed.
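The birthday bound can be observed on a deliberately weakened hash. The following Python sketch is a toy illustration only: it truncates SHA-256 (from the standard hashlib module) to 24 bits, an arbitrary choice made here so that a collision appears after a few thousand trials, roughly $2^{12}$, instead of the roughly $2^{24}$ trials a pre-image search would need. The function names are ours.

```python
import hashlib
from itertools import count

def truncated_digest(data: bytes, n_bytes: int) -> bytes:
    # Toy hash: the first n_bytes of SHA-256 (an illustrative choice).
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int):
    # Remember every digest seen so far and stop at the first repeat.
    seen = {}
    for i in count():
        msg = str(i).encode()
        digest = truncated_digest(msg, n_bytes)
        if digest in seen:
            return seen[digest], msg, i + 1
        seen[digest] = msg

m1, m2, tries = find_collision(3)  # 3 bytes = 24-bit toy hash
print(m1, m2, tries)
```

For a full 128-bit digest the same search would need on the order of $2^{64}$ trials, which is the figure quoted for MD5.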

4.2.1.1.2 Attacks on MD5

More generally, we can distinguish three main categories of attacks on cryptographic hash functions: pre-image attacks, second pre-image attacks and collision attacks. In one way or another, all kinds of attacks fall into one of these categories. This is the main reason why one-way hash functions are required to be pre-image resistant and second pre-image resistant, while collision-resistant hash functions must, in addition to these two properties, be collision resistant. We take a look at some commonly known attacks.


Collision attack in MD5 cryptographic hash function-

In August 2004 Xiaoyun Wang and Hongbo Yu [79] of Shandong University in China published an article in which they describe an algorithm that can find two different sequences of 128 bytes with the same MD5 hash. Their research was motivated by the possibility of finding a colliding pair of messages, each consisting of two blocks. The attack revolves around finding two distinct two-block messages (M0, N0) and (M1, N1), where the first blocks differ only in a predefined constant vector cv1 such that M1 = M0 + cv1, and the second blocks differ in a predefined constant vector cv2 (cv2 = -cv1 modulo $2^{32}$) such that N1 = N0 + cv2 and MD5(M0, N0) = MD5(M1, N1).

However, a considerable number of conditions have to be met for the attack to be successful. Basically, the attack makes use of differentials, that is, differences in the messages that are spread over a length of two message blocks. The first block's difference introduces a small difference in the intermediate state (the chaining value that serves as the initialization vector for the second block), whereas the second block's difference cancels out the difference introduced by the first block. We also note that finding the first blocks (M0, M1) takes about $2^{39}$ MD5 operations, and finding the second blocks (N0, N1) takes about $2^{32}$ MD5 operations. On an IBM P690, the attack takes about one hour to find M0 and M1, and only 15 minutes in the fastest cases. It then takes between 15 seconds and 5 minutes to find the second blocks N0 and N1.

Pure Brute-force Attack-

The pure brute-force attack is one in which all possible words of a certain length are tried until the correct one is found. This attack is guaranteed to work, which is why the length of the hash result is usually chosen so that a brute-force attack becomes impractical or too slow, and thus unattractive.
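How the cost of a brute-force search depends on the digest length can be seen on a toy example. The sketch below is illustrative only: it keeps just the leading 16 bits of SHA-256 (an assumption made here so that the search finishes quickly) and tries candidate inputs one after another until one matches a target digest. All names are ours.

```python
import hashlib
from itertools import count

def truncated_digest(data: bytes, n_bits: int) -> int:
    # Toy hash: the leading n_bits of SHA-256, returned as an integer.
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - n_bits)

def brute_force_preimage(target: int, n_bits: int):
    # Try the candidates "0", "1", "2", ... until one hashes to the target.
    for i in count():
        candidate = str(i).encode()
        if truncated_digest(candidate, n_bits) == target:
            return candidate, i + 1

N_BITS = 16
target = truncated_digest(b"secret message", N_BITS)
found, tries = brute_force_preimage(target, N_BITS)
print(found, tries)
```

Each additional digest bit doubles the expected number of trials, so the same loop is hopeless against a 128-bit or 160-bit digest.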


Secure Hash Algorithms (SHA1 and SHA2)

The Secure Hash Algorithm (SHA) was developed by the National Security Agency (NSA) and published in 1993 by the National Institute of Standards and Technology (NIST) as a U.S. Federal Information Processing Standard (FIPS PUB 180). SHA shares the same building blocks as the MD4 algorithm and takes some cues from MD5. Unlike the MD4 algorithm, the design of SHA introduced, among other things, a new procedure which expands the 16-word message block input to the compression function to an 80-word block.

In 1994, NIST announced that a technical flaw had been found in SHA, and that this flaw made the algorithm less secure than originally believed. No further details were given to the public, only that a small modification had been made to the algorithm, which was now known as SHA-1 and published in FIPS PUB 180-1.

The SHA-2 family of hash algorithms consists of four cryptographic hash functions denoted by SHA-224, SHA-256, SHA-384, and SHA-512. NIST updated its hash function standard to FIPS PUB 180-2 in 2002. This update specified three new hash functions alongside SHA-1, known as SHA-256, SHA-384, and SHA-512. SHA-1 is designed to produce a 160-bit message digest. The other hash functions have the length of their message digest indicated by the number following the prefix “SHA-”; thus SHA-256 produces a 256-bit message digest, SHA-384 produces a 384-bit hash value, and so on. SHA-224, which produces a 224-bit message digest, was added to the standard in 2004 [78,119].
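The digest lengths quoted above can be checked against any standard library that implements these functions. As an illustration (using Python's hashlib module, which is not the C code of this work), the sketch below reports the digest size of each function in bits; the name DIGEST_BITS is ours.

```python
import hashlib

# Digest length in bits for each function named in the text.
DIGEST_BITS = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha1", "sha224", "sha256", "sha384", "sha512")
}

for name, bits in DIGEST_BITS.items():
    print(f"{name}: {bits} bits")
```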

SHA-3 [19] is a hash function formerly called Keccak, chosen in 2012 after a public competition among non-NSA designers. It was designed by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche, building upon RadioGatún. SHA-3 is not meant to replace SHA-2, as no significant attack on SHA-2 has been demonstrated. Because of the successful attacks on MD5 and SHA-0 and the theoretical attacks on SHA-1, NIST perceived a need for an alternative, dissimilar cryptographic hash, which became SHA-3. It supports the same hash lengths as SHA-2, and its internal structure differs significantly from the rest of the SHA family. NIST has said that FIPS 180-5 will include SHA-3. [19]

Here we describe SHA-1 and SHA-2 (SHA-256) in detail, as we use these algorithms in our application.

4.2.1.1.3 SHA1 Algorithm [120,121]

The Secure Hash Algorithm (SHA-1) is required for use with the Digital Signature Algorithm (DSA) as specified in the Digital Signature Standard (DSS) and whenever a secure hash algorithm is required for federal applications. For a message of length less than $2^{64}$ bits, the SHA-1 produces a 160-bit condensed representation of the message called a message digest. The message digest is used during generation of a signature for the message. The SHA-1 is also used to compute a message digest for the received version of the message during the process of verifying the signature. Any change to the message in transit will, with very high probability, result in a different message digest, and the signature will fail to verify. [121]

The SHA-1 algorithm accepts as input a message with a maximum length of $2^{64} - 1$ bits and produces a 160-bit message digest as output. The message is processed by the compression function in 512-bit blocks. Each block is divided further into sixteen 32-bit words denoted by Mt for t = 0, 1, ..., 15. The compression function consists of four rounds, and each round is made up of a sequence of twenty steps. A complete pass of the SHA-1 compression function therefore consists of eighty steps, in which a 512-bit block is used together with a 160-bit chaining variable to finally produce a 160-bit hash value.

The C implementation of SHA-1 used in this application is based largely on the following description from the original SHA-1 specification, FIPS PUB 180-1, published by NIST.


Step 1: Message Padding

The SHA-1 is used to compute a message digest for a message or data file that is provided as input. The message or data file should be considered to be a bit string. The length of the message is the number of bits in the message (the empty message has length 0). If the number of bits in a message is a multiple of 8, for compactness we can represent the message in hex. The purpose of message padding is to make the total length of a padded message a multiple of 512 bits. The SHA-1 sequentially processes blocks of 512 bits when computing the message digest. In a nutshell, a “1” followed by m “0”s followed by a 64-bit integer are appended to the end of the message to produce a padded message of length 512 * n. The 64-bit integer is l, the length of the original message in bits. The padded message is then processed by the SHA-1 as n 512-bit blocks.

The padded message will contain 16 * n words for some n > 0. The

padded message is regarded as a sequence of n blocks M1 , M2, ... , Mn, where

each Mi contains 16 words and M1 contains the first characters (or bits) of the

message.
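The padding rule can be sketched in a few lines of Python. This is a minimal illustration that assumes the message is byte-aligned (its bit length is a multiple of 8), so the appended “1” bit followed by seven “0” bits is the single byte 0x80; the function name sha1_pad is ours.

```python
import struct

def sha1_pad(message: bytes) -> bytes:
    # Append the "1" bit (0x80 for byte-aligned input), then zero bits until
    # 64 bits remain in the last block, then the original length in bits as
    # a 64-bit big-endian integer.
    bit_len = 8 * len(message)
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", bit_len)

padded = sha1_pad(b"abc")
blocks = [padded[i:i + 64] for i in range(0, len(padded), 64)]
print(len(padded) * 8, len(blocks))
print(padded.hex())
```

A 56-byte message no longer leaves room for the length field in its first block, so the same function returns two blocks for it.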

Step 2: Preparing Processing Functions

A sequence of logical functions f0, f1,..., f79 is used in the SHA-1. Each ft, 0 <=

t <= 79, operates on three 32-bit words B, C, D and produces a 32-bit word as

output. ft(B,C,D) is defined as follows: for words B, C, D,

ft(B,C,D) = (B AND C) OR ((NOT B) AND D) ( 0 <= t <= 19)

ft(B,C,D) = B XOR C XOR D (20 <= t <= 39)

ft(B,C,D) = (B AND C) OR (B AND D) OR (C AND D) (40 <= t <= 59)

ft(B,C,D) = B XOR C XOR D (60 <= t <= 79).
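The selection of the logical function by step number can be sketched as a single Python function. This is only an illustration of the definitions above; the function name f and the sample words are ours.

```python
MASK32 = 0xFFFFFFFF

def f(t: int, b: int, c: int, d: int) -> int:
    # Select the logical function for step t (0 <= t <= 79).
    if 0 <= t <= 19:
        return ((b & c) | (~b & d)) & MASK32           # choose: if b then c else d
    if 20 <= t <= 39 or 60 <= t <= 79:
        return (b ^ c ^ d) & MASK32                    # parity
    if 40 <= t <= 59:
        return ((b & c) | (b & d) | (c & d)) & MASK32  # majority
    raise ValueError("t must be in the range 0..79")

b, c, d = 0xFFFF0000, 0x12345678, 0x9ABCDEF0
for t in (0, 20, 40, 60):
    print(t, hex(f(t, b, c, d)))
```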

Step 3: Preparing Processing Constants

A sequence of constant words K0, K1, ... , K79 is used in the SHA-1. In hex these are given by:

Kt = 5A827999 ( 0 <= t <= 19)

Kt = 6ED9EBA1 (20 <= t <= 39)

Kt = 8F1BBCDC (40 <= t <= 59)

Kt = CA62C1D6 (60 <= t <= 79).


Step 4: Computing the Message Digest

The message digest is computed using the final padded message. The

computation uses two buffers, each consisting of five 32-bit words, and a

sequence of eighty 32-bit words. The words of the first 5-word buffer are

labelled A, B, C, D, E. The words of the second 5-word buffer are labelled H0,

H1, H2, H3, H4. The words of the 80-word sequence are labelled W0, W1,...,

W79. A single word buffer TEMP is also employed.

To generate the message digest, the 16-word blocks M1, M2,..., Mn defined in step 1 are processed in order. The processing of each Mi involves 80 steps. Before processing any blocks, the {Hi} are initialized as follows: in hex,

H0 = 67452301

H1 = EFCDAB89

H2 = 98BADCFE

H3 = 10325476

H4 = C3D2E1F0.

Now M1, M2, ... , Mn are processed. To process Mi, we proceed as follows:

a. Divide Mi into 16 words W0, W1, ... , W15, where W0 is the left-most word.

b. For t = 16 to 79 let Wt = S^1(Wt-3 XOR Wt-8 XOR Wt-14 XOR Wt-16).

c. Let A = H0, B = H1, C = H2, D = H3, E = H4.

d. For t = 0 to 79 do

TEMP = S^5(A) + ft(B,C,D) + E + Wt + Kt;

E = D; D = C; C = S^30(B); B = A; A = TEMP;

e. Let H0 = H0 + A, H1 = H1 + B, H2 = H2 + C, H3 = H3 + D, H4 = H4 + E.

Here S^n(X) denotes the circular left shift of the 32-bit word X by n bit positions, and + denotes addition modulo $2^{32}$.

After processing Mn, the message digest is the 160-bit string represented by the 5 words H0 H1 H2 H3 H4.

Step 5: Output

The contents of H0, H1, H2, H3, H4 are returned in sequence as the message digest.
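The whole procedure can be condensed into a short Python sketch. It is an illustration under stated assumptions and not the C implementation used in this work: input is assumed to be byte-aligned, and the names sha1_sketch and rotl are ours. The result is checked against the widely published test vector for the message "abc".

```python
import struct

def rotl(x: int, n: int) -> int:
    # S^n(X): circular left shift of a 32-bit word by n bits.
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_sketch(message: bytes) -> str:
    # Padding: "1" bit, zero bits, then the 64-bit big-endian bit length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", 8 * len(message))

    # Initial values of H0..H4.
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    for off in range(0, len(padded), 64):
        # Expand the sixteen block words into W0..W79.
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h
        for t in range(80):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + e + w[t] + k) & 0xFFFFFFFF
            e, d, c, b, a = d, c, rotl(b, 30), a, temp

        # Add the working variables back into the chaining variables.
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]

    return "".join(f"{x:08x}" for x in h)

print(sha1_sketch(b"abc"))
```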

Figure 20 depicts one iteration within the SHA-1 compression function below.


4.2.1.1.4 Security of SHA-1

(a) Pre-image resistance: The difficulty of producing any message having a given message digest is of the order of $2^{160}$ operations.

(b) Collision resistance: The difficulty of producing two distinct messages having the same message digest is of the order of $2^{80}$ operations.

(c) Speed, simplicity and architecture: Addition in SHA-1 is performed modulo $2^{32}$, thus it is well suited for a 32-bit architecture. Because SHA-1 involves 80 steps and must process a 160-bit buffer (five 32-bit registers), it is slower than MD5 on the same architecture. SHA-1 is also simple to describe and to implement in both software and hardware. Finally, SHA-1 uses a big-endian scheme for interpreting a message as a sequence of 32-bit words.

4.2.1.1.5 Attacks against SHA-1

If no collision can be found in fewer than $2^{80}$ operations, then SHA-1 is considered secure and can still be used. But in 2005 Prof. Xiaoyun Wang [122] announced a differential attack on the SHA-1 hash function; with her recent improvements, this attack is expected to find a hash collision (two messages with the same hash value) with an estimated work of $2^{63}$ operations, rather than the ideal $2^{80}$ operations that should be required for SHA-1 or any good 160-bit hash function. This is a very large computation, and to our knowledge nobody has yet verified Prof. Wang's method by finding a SHA-1 collision, but $2^{63}$ operations is plainly within the realm of feasibility for a high-resource attacker. NIST accepts that Prof. Wang has indeed found a practical collision attack on SHA-1.


Figure 20: One iteration within the SHA-1 compression function. A, B, C, D and E are 32-bit words of the state; F is a nonlinear function that varies; <<< n denotes a left bit rotation by n places; n varies for each operation; Wt is the expanded message word of round t; Kt is the round constant of round t; ⊞ denotes addition modulo $2^{32}$.

Source: http://en.wikipedia.org/wiki/File:SHA1.svg

4.2.1.1.6 SHA2 (SHA-256) Algorithm [119]

In August 2002, FIPS PUB 180-2 became the new Secure Hash Standard, replacing FIPS PUB 180-1. In 2005, security flaws were identified in SHA-1, namely that a mathematical weakness might exist, indicating that a stronger hash function would be desirable [119]. Although SHA-2 bears some similarities to the SHA-1 algorithm, these attacks have not been successfully extended to SHA-2. SHA-256 operates on 32-bit words. It is an iterative, one-way hash function that processes a message to produce a condensed representation, a 256-bit message digest. The algorithm enables the determination of a message’s integrity: any change to the message will, with a very high probability, result in a different message digest. This property is useful in the generation and verification of digital signatures and message authentication codes, and in the generation of random numbers (bits). In our application, SHA-256 is used for digital signatures.

SHA-256 may be used to hash a message, M, having a length of $l$ bits, where $0 \le l < 2^{64}$. The algorithm uses 1) a message schedule of sixty-four 32-bit words, 2) eight working variables of 32 bits each, and 3) a hash value of eight 32-bit words. The final result of SHA-256 is a 256-bit message digest.

The words of the message schedule are labelled $W_0, W_1, \ldots, W_{63}$. The eight working variables are labelled a, b, c, d, e, f, g, and h. The words of the hash value are labelled $H_0^{(i)}, H_1^{(i)}, \ldots, H_7^{(i)}$, which will hold the initial hash value, $H^{(0)}$, replaced by each successive intermediate hash value (after each message block is processed), $H^{(i)}$, and ending with the final hash value, $H^{(N)}$. SHA-256 also uses two temporary words, $T_1$ and $T_2$.

The C implementation of SHA-256 used in this application is based largely on the following description from the original SHA-2 specification, FIPS PUB 180-2, published by NIST.

Step 1: Padding the Message

The message, M, shall be padded before hash computation begins. The purpose of this padding is to ensure that the length of the padded message is a multiple of 512 bits. Suppose that the length of the message, M, is $l$ bits. Append the bit “1” to the end of the message, followed by $k$ zero bits, where $k$ is the smallest non-negative solution to the equation $l + 1 + k \equiv 448 \pmod{512}$. Then append the 64-bit block that is equal to the number $l$ expressed using a binary representation. For example, the (8-bit ASCII) message “abc” has length $8 \times 3 = 24$, so the message is padded with a one bit, then $448 - (24 + 1) = 423$ zero bits, and then the message length, to become the 512-bit padded message. The length of the padded message should now be a multiple of 512 bits.
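The number of zero bits follows directly from the congruence. The short Python sketch below (function names are ours) computes k for a given message length in bits and confirms the worked example for “abc”.

```python
def zero_pad_bits(l: int) -> int:
    # Smallest non-negative k with l + 1 + k congruent to 448 modulo 512.
    return (448 - (l + 1)) % 512

def padded_length_bits(l: int) -> int:
    # Message bits + the "1" bit + k zero bits + the 64-bit length field.
    return l + 1 + zero_pad_bits(l) + 64

for l in (0, 24, 447, 448, 512):
    print(l, zero_pad_bits(l), padded_length_bits(l))
```

A 448-bit message is the first case in which the length field no longer fits in the same block, so 511 zero bits are needed and the padded message occupies two blocks.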


Step 2: Parsing the Padded Message

After a message has been padded, it must be parsed into N m-bit blocks before the hash computation can begin. For SHA-256, the padded message is parsed into N 512-bit blocks, $M^{(1)}, M^{(2)}, \ldots, M^{(N)}$. Since the 512 bits of the input block may be expressed as sixteen 32-bit words, the first 32 bits of message block i are denoted $M_0^{(i)}$, the next 32 bits are $M_1^{(i)}$, and so on up to $M_{15}^{(i)}$.

Step 3: Setting the Initial Hash Value ($H^{(0)}$)

For SHA-256, the initial hash value, $H^{(0)}$, shall consist of the following eight 32-bit words, in hex:

$H_0^{(0)}$ = 6a09e667

$H_1^{(0)}$ = bb67ae85

$H_2^{(0)}$ = 3c6ef372

$H_3^{(0)}$ = a54ff53a

$H_4^{(0)}$ = 510e527f

$H_5^{(0)}$ = 9b05688c

$H_6^{(0)}$ = 1f83d9ab

$H_7^{(0)}$ = 5be0cd19.

These words were obtained by taking the first thirty-two bits of the fractional

parts of the square roots of the first eight prime numbers.
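This derivation can be reproduced with exact integer arithmetic. The following Python sketch is illustrative only (names are ours): it uses the fact that the integer square root of $p \cdot 2^{64}$ equals $\lfloor \sqrt{p} \cdot 2^{32} \rfloor$, whose low 32 bits are the first 32 bits of the fractional part of $\sqrt{p}$.

```python
import math

FIRST_EIGHT_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]

def frac_sqrt_bits(p: int) -> int:
    # isqrt(p * 2^64) = floor(sqrt(p) * 2^32); the low 32 bits of that
    # integer are the first 32 bits of the fractional part of sqrt(p).
    return math.isqrt(p << 64) & 0xFFFFFFFF

H0 = [frac_sqrt_bits(p) for p in FIRST_EIGHT_PRIMES]
print([f"{h:08x}" for h in H0])
```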

Step 4: SHA-256 Hash Computation

SHA-256 uses six logical functions, where each function operates on

32-bit words, which are represented as x, y, and z. The result of each function

is a new 32-bit word.

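$\mathrm{Ch}(x, y, z) = (x \wedge y) \oplus (\neg x \wedge z)$

$\mathrm{Maj}(x, y, z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z)$

$\Sigma_0^{\{256\}}(x) = \mathrm{ROTR}^{2}(x) \oplus \mathrm{ROTR}^{13}(x) \oplus \mathrm{ROTR}^{22}(x)$

$\Sigma_1^{\{256\}}(x) = \mathrm{ROTR}^{6}(x) \oplus \mathrm{ROTR}^{11}(x) \oplus \mathrm{ROTR}^{25}(x)$

$\sigma_0^{\{256\}}(x) = \mathrm{ROTR}^{7}(x) \oplus \mathrm{ROTR}^{18}(x) \oplus \mathrm{SHR}^{3}(x)$

$\sigma_1^{\{256\}}(x) = \mathrm{ROTR}^{17}(x) \oplus \mathrm{ROTR}^{19}(x) \oplus \mathrm{SHR}^{10}(x)$

Here $\wedge$ is bitwise AND, $\oplus$ is bitwise XOR, $\neg$ is bitwise complement, $\mathrm{ROTR}^{n}(x)$ is a circular right rotation of x by n bits, and $\mathrm{SHR}^{n}(x)$ is a right shift of x by n bits.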
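The six logical functions translate almost literally into code. The Python sketch below is a minimal illustration with names of our choosing; words are held as non-negative integers and masked to 32 bits, and the third term of the two lowercase sigma functions is a plain right shift rather than a rotation.

```python
MASK32 = 0xFFFFFFFF

def rotr(x: int, n: int) -> int:
    # Circular right rotation of a 32-bit word by n bits.
    return ((x >> n) | (x << (32 - n))) & MASK32

def shr(x: int, n: int) -> int:
    # Right shift by n bits; vacated high bits become zero.
    return x >> n

def ch(x, y, z):
    return ((x & y) ^ (~x & z)) & MASK32

def maj(x, y, z):
    return (x & y) ^ (x & z) ^ (y & z)

def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)

def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)

def small_sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ shr(x, 3)

def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ shr(x, 10)

print(hex(big_sigma0(1)), hex(small_sigma0(1)))
```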

SHA-256 uses a sequence of sixty-four constant 32-bit words, $K_0^{\{256\}}, K_1^{\{256\}}, \ldots, K_{63}^{\{256\}}$. These words represent the first thirty-two bits of the fractional parts of the cube roots of the first sixty-four prime numbers. In hex, these constant words are (from left to right)

428a2f98 71374491 b5c0fbcf e9b5dba5 3956c25b 59f111f1 923f82a4

ab1c5ed5 d807aa98 12835b01 243185be 550c7dc3 72be5d74 80deb1fe

9bdc06a7 c19bf174 e49b69c1 efbe4786 0fc19dc6 240ca1cc 2de92c6f

4a7484aa 5cb0a9dc 76f988da 983e5152 a831c66d b00327c8 bf597fc7

c6e00bf3 d5a79147 06ca6351 14292967 27b70a85 2e1b2138 4d2c6dfc

53380d13 650a7354 766a0abb 81c2c92e 92722c85 a2bfe8a1 a81a664b

c24b8b70 c76c51a3 d192e819 d6990624 f40e3585 106aa070 19a4c116

1e376c08 2748774c 34b0bcb5 391c0cb3 4ed8aa4a 5b9cca4f 682e6ff3

748f82ee 78a5636f 84c87814 8cc70208 90befffa a4506ceb bef9a3f7

c67178f2.
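The constants can be regenerated from their definition with exact integer arithmetic. The Python sketch below is illustrative only; the helper names are ours, and the integer cube root is found by a simple binary search rather than any library routine.

```python
def first_primes(n: int) -> list:
    # Trial division; adequate for the first 64 primes.
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def integer_cube_root(n: int) -> int:
    # Largest r with r**3 <= n, found by binary search.
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def frac_cbrt_bits(p: int) -> int:
    # The cube root of p * 2^96 is floor(cbrt(p) * 2^32); its low 32 bits
    # are the first 32 bits of the fractional part of cbrt(p).
    return integer_cube_root(p << 96) & 0xFFFFFFFF

K = [frac_cbrt_bits(p) for p in first_primes(64)]
print(f"{K[0]:08x} {K[1]:08x} ... {K[63]:08x}")
```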

After completion of the above steps each message block, $M^{(1)}, \ldots, M^{(N)}$, is processed in order, using the following steps:

For i = 1 to N:

{

1. Prepare the message schedule, $\{W_t\}$:

$W_t = M_t^{(i)}$ for $0 \le t \le 15$

$W_t = \sigma_1^{\{256\}}(W_{t-2}) + W_{t-7} + \sigma_0^{\{256\}}(W_{t-15}) + W_{t-16}$ for $16 \le t \le 63$

2. Initialize the eight working variables, a, b, c, d, e, f, g, and h, with the (i-1)st hash value:

$a = H_0^{(i-1)}$

$b = H_1^{(i-1)}$

$c = H_2^{(i-1)}$

$d = H_3^{(i-1)}$

$e = H_4^{(i-1)}$

$f = H_5^{(i-1)}$

$g = H_6^{(i-1)}$

$h = H_7^{(i-1)}$

3. For t = 0 to 63:

{

$T_1 = h + \Sigma_1^{\{256\}}(e) + \mathrm{Ch}(e, f, g) + K_t^{\{256\}} + W_t$

$T_2 = \Sigma_0^{\{256\}}(a) + \mathrm{Maj}(a, b, c)$

h = g

g = f

f = e

$e = d + T_1$

d = c

c = b

b = a

$a = T_1 + T_2$

}

4. Compute the ith intermediate hash value $H^{(i)}$:

$H_0^{(i)} = a + H_0^{(i-1)}$

$H_1^{(i)} = b + H_1^{(i-1)}$

$H_2^{(i)} = c + H_2^{(i-1)}$

$H_3^{(i)} = d + H_3^{(i-1)}$

$H_4^{(i)} = e + H_4^{(i-1)}$

$H_5^{(i)} = f + H_5^{(i-1)}$

$H_6^{(i)} = g + H_6^{(i-1)}$

$H_7^{(i)} = h + H_7^{(i-1)}$

}

Step 5: Output

After repeating steps one through four a total of N times (i.e., after processing $M^{(N)}$), the resulting 256-bit message digest of the message, M, is

$H_0^{(N)} \,\|\, H_1^{(N)} \,\|\, H_2^{(N)} \,\|\, H_3^{(N)} \,\|\, H_4^{(N)} \,\|\, H_5^{(N)} \,\|\, H_6^{(N)} \,\|\, H_7^{(N)}$
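The complete computation can be sketched in Python as follows. This is an illustration under stated assumptions, not the C implementation used in this work: input is assumed to be byte-aligned, additions are taken modulo $2^{32}$, and the names sha256_sketch, rotr, K and H_INIT are ours. The constants are the sixty-four words and eight initial hash values listed earlier.

```python
import struct

MASK32 = 0xFFFFFFFF
K = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
]
H_INIT = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
          0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]

def rotr(x: int, n: int) -> int:
    # Circular right rotation of a 32-bit word by n bits.
    return ((x >> n) | (x << (32 - n))) & MASK32

def sha256_sketch(message: bytes) -> str:
    # Step 1: padding (byte-aligned input assumed).
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", 8 * len(message))
    # Step 3: initial hash value.
    h = list(H_INIT)
    # Step 2 and Step 4: parse into 512-bit blocks and process each in order.
    for off in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        for t in range(16, 64):
            s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK32)
        a, b, c, d, e, f, g, hh = h
        for t in range(64):
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            t1 = (hh + big_s1 + ch + K[t] + w[t]) & MASK32
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + maj) & MASK32
            hh, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK32, c, b, a, (t1 + t2) & MASK32
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    # Step 5: concatenate the eight words of the final hash value.
    return "".join(f"{x:08x}" for x in h)

print(sha256_sketch(b"abc"))
```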


4.2.1.1.7 Security of SHA-256

There are two meet-in-the-middle preimage attacks against SHA-2 with a reduced number of rounds. The first one attacks 41-round SHA-256 (out of 64 rounds) with a time complexity of $2^{253.5}$ and a space complexity of $2^{16}$. The second one attacks 42-round SHA-256 with a time complexity of $2^{251.7}$ and a space complexity of $2^{12}$. Figure 21 below shows one iteration in a SHA-256 compression function.

Figure 21: One iteration in a SHA-256 compression function. The blue components perform the operations Ch, Maj, $\Sigma_0$ and $\Sigma_1$ defined in Step 4. The red ⊞ is addition modulo $2^{32}$.

Source: http://en.wikipedia.org/wiki/File:SHA2.svg

4.2.2 Comparison of various cryptographic hash functions

Table 17: Comparison of various cryptographic hash functions. The last three columns give the best known attacks (complexity: rounds).

| Algorithm | Year of standard | Output size (bits) | Internal state size | Block size | Length size | Word size | Rounds | Collision | Second preimage | Preimage |
|---|---|---|---|---|---|---|---|---|---|---|
| MD2 | 1989 | 128 | 384 | 128 | - | 32 | 864 | Yes ($2^{63.3}$) | Yes ($2^{73}$) | Yes ($2^{73}$) |
| MD4 | 1990 | 128 | 128 | 512 | 64 | 32 | 48 | Yes (3) | Yes ($2^{64}$) | Yes ($2^{78.4}$) |
| MD5 | 1992 | 128 | 128 | 512 | 64 | 32 | 64 | Yes ($2^{20.96}$) | Yes ($2^{123.4}$) | Yes ($2^{123.4}$) |
| SHA-0 | 1993 | 160 | 160 | 512 | 64 | 32 | 80 | Yes ($2^{33.6}$) | No | No |
| SHA-1 | 1995 | 160 | 160 | 512 | 64 | 32 | 80 | Yes ($2^{51}$) | No | No |
| SHA-256 / SHA-224 | 2002 / 2004 | 256 / 224 | 256 | 512 | 64 | 32 | 64 | Theoretical ($2^{28.5}$:24) | Theoretical ($2^{248.4}$:42) | Theoretical ($2^{248.4}$:42) |
| SHA-512 / SHA-384 | 2002 / 2002 | 512 / 384 | 512 | 1,024 | 128 | 64 | 80 | Theoretical ($2^{32.5}$:24) | Theoretical ($2^{494.6}$:42) | Theoretical ($2^{494.6}$:42) |
| SHA-3 | 2012 | 224/256/384/512 | 1600 | 1600-2*bits | - | 64 | 24 | No | No | No |
| SHA-3-224 | 2012 | 224 | 1600 | 1152 | - | 64 | 24 | No | No | No |
| SHA-3-256 | 2012 | 256 | 1600 | 1088 | - | 64 | 24 | No | No | No |
| SHA-3-384 | 2012 | 384 | 1600 | 832 | - | 64 | 24 | No | No | No |
| SHA-3-512 | 2012 | 512 | 1600 | 576 | - | 64 | 24 | No | No | No |

Note: The internal state here means the "internal hash sum" after each compression of a data block. Most hash algorithms also internally use some additional variables, such as the length of the data compressed so far, since that is needed for the length padding at the end. (Source: Wikipedia)
