
An Area Efficient Universal Cryptography Processor For Smart Cards



    CHAPTER 1

    INTRODUCTION

    1.1 Introduction

Security is a broad topic and covers a multitude of sins. Most security problems are intentionally caused by malicious people trying to gain some benefit or harm someone. The requirements of information security have undergone two major changes in the last two decades. In earlier days, sensitive documents were stored in cabinets with combination locks. With the introduction of computers, the need for automated tools for protecting files and other information became evident. This is especially important for shared systems and for systems connected to data networks or the Internet. The generic term for the collection of tools designed to protect data and thwart hackers is computer security.

In today's digital world, the Internet reaches into almost every human activity, such as banking, payments, and financial transactions, so the importance of network security keeps increasing. Security forms the backbone of today's digital world.

    1.1.1 Aim

To implement an area-efficient universal cryptography processor for smart cards.

    1.1.2 Previous System

    Data Encryption Standard (DES)

1. This is a well-established algorithm that has been used for more than two decades (since 1977) in military and commercial data exchange and storage.

2. The algorithm is designed to encipher and decipher 64-bit blocks of data using a 56-bit key.

3. It uses the two basic techniques of cryptography: confusion and diffusion. Confusion is achieved through the S-box substitutions, and diffusion through the permutations, XOR, and shift operations.

    1.1.3 Present System

    Advanced Encryption Standard (AES)

1. AES, also known as Rijndael, is a block encryption algorithm that encrypts 128-bit blocks of data using the same key for both encryption and decryption.

2. Three versions of the algorithm are available, differing only in the key-generation procedure and in the number of rounds the data is processed through for a complete encryption (decryption).

3. The 128-bit input data is considered as a 4x4 array of 8-bit bytes (also called the state in the algorithm).

    1.2 Objectives

The objective of this project is to develop concurrent, structure-independent fault-detection schemes that achieve reasonable fault coverage. This makes the implementation of AES robust against fault-based attacks while providing high efficiency, with reasonable area and time-complexity overheads.

1.3 Literature Survey

    Xilinx ISE

The Xilinx ISE is a design environment for FPGA products from Xilinx; it is tightly coupled to the architecture of such chips and cannot be used with FPGA products from other vendors.[2] The Xilinx ISE is primarily used for circuit synthesis and design, while the ModelSim logic simulator is used for system-level testing. Other components shipped with the Xilinx ISE include the Embedded Development Kit (EDK), a Software Development Kit (SDK), and ChipScope Pro.

    Verilog:

Verilog HDL is an accepted IEEE standard. The original standard, IEEE 1364-1995, was approved in 1995. IEEE 1364-2001 later made significant improvements to the original standard.

    1. Most popular logic synthesis tools support Verilog HDL. This makes it the language of choice for

    designers.

    2. Verilog HDL is a general-purpose hardware description language that is easy to learn and easy to use. It is

    similar in syntax to the C programming language. Designers with C programming experience will find it

    easy to learn Verilog HDL.

    3. Verilog is both a behavioral and structural language.

    1.4 Organization of Project

    Chapter 2 explains the general theory related to the project.

    Chapter 3 explains the hardware description of the project.

    Chapter 4 explains the software description of the project.

    Chapter 5 gives the result analysis of the project.


    CHAPTER 2

    GENERAL THEORY

    2.1 Introduction to VLSI

    Very-large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining

    thousands of transistors into a single chip. VLSI began in the 1970s when

    complex semiconductor and communication technologies were being developed. The microprocessor is a

    VLSI device. Before the introduction of VLSI technology most ICs had a limited set of functions they could

    perform. An electronic circuit might consist of a CPU, ROM, RAM and other glue logic. VLSI lets IC

    designers add all of these into one chip.

    Overview:

The first semiconductor chips held one transistor each. Subsequent advances added more and more transistors, and as a consequence more individual functions or systems were integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic gates on a single device. Now known retrospectively as small-scale integration (SSI), improvements in technique led to devices with hundreds of logic gates, known as medium-scale integration (MSI), and then to large-scale integration (LSI), i.e., systems with at least a thousand logic gates. Current technology has moved far past this mark, and today's microprocessors have many millions of gates and hundreds of millions of individual transistors.

At one time, there was an effort to name and calibrate various levels of large-scale integration above VLSI. Terms like ultra-large-scale integration (ULSI) were used. But the huge number of gates and transistors available on common devices has rendered such fine distinctions moot. Terms suggesting greater-than-VLSI levels of integration are no longer in widespread use. Even VLSI is now somewhat quaint, given the common assumption that all microprocessors are VLSI or better.

As of early 2008, billion-transistor processors were commercially available, an example being Intel's Montecito Itanium chip. This is expected to become more commonplace as semiconductor fabrication moves from the current generation of 65 nm processes to the next 45 nm generations (while experiencing new challenges such as increased variation across process corners). Another notable example is NVIDIA's 280 series GPU.

This GPU is unique in that its 1.4 billion transistors, capable of a teraflop of performance, are almost entirely dedicated to logic (Itanium's transistor count is largely due to its 24 MB L3 cache). Current designs, as opposed to the earliest devices, use extensive design automation and automated logic synthesis to lay out the transistors, enabling higher levels of complexity in the resulting logic functionality. Certain high-performance logic blocks, like the SRAM cell, are still designed by hand to ensure the highest efficiency (sometimes by bending or breaking established design rules to obtain the last bit of performance by trading stability).

    What is VLSI?

VLSI stands for "Very Large Scale Integration". This is the field that involves packing more and more logic devices into smaller and smaller areas.

1. Simply put, an integrated circuit is many transistors on one chip.
2. VLSI is the design and manufacture of extremely small, complex circuitry using modified semiconductor material.
3. An integrated circuit (IC) may contain millions of transistors, each a few micrometres in size.
4. Applications are wide-ranging: most electronic logic devices.

    2.2 History of Scale Integration:

1. Late 1940s: transistor invented at Bell Labs.
2. Late 1950s: first IC (JK flip-flop by Jack Kilby at TI).
3. Early 1960s: Small Scale Integration (SSI).
4. Late 1960s: Medium Scale Integration (MSI), hundreds of transistors on a chip.
5. Early 1970s: Large Scale Integration (LSI), thousands of transistors on a chip.
6. Early 1980s: VLSI, tens of thousands of transistors on a chip (later hundreds of thousands, and now millions).
7. Ultra LSI is sometimes used for millions of transistors.
8. SSI - Small-Scale Integration (up to 10^2 gates)
9. MSI - Medium-Scale Integration (10^2 to 10^3)
10. LSI - Large-Scale Integration (10^3 to 10^5)
11. VLSI - Very Large-Scale Integration (10^5 to 10^7)
12. ULSI - Ultra Large-Scale Integration (>= 10^7)


    2.3 Advantages of ICs over Discrete Components

While we will concentrate on integrated circuits, the properties of integrated circuits (what we can and cannot efficiently put in an integrated circuit) largely determine the architecture of the entire system. Integrated circuits improve system characteristics in several critical ways. ICs have three key advantages over digital circuits built from discrete components:

    Size:

Integrated circuits are much smaller: both transistors and wires are shrunk to micrometer sizes, compared with the millimeter or centimeter scales of discrete components. Small size leads to advantages in speed and power consumption, since smaller components have smaller parasitic resistances, capacitances, and inductances.

    Speed:

Signals can be switched between logic 0 and logic 1 much more quickly within a chip than between chips. Communication within a chip can occur hundreds of times faster than communication between chips on a printed circuit board. The high speed of circuits on-chip is due to their small size: smaller components and wires have smaller parasitic capacitances to slow down the signals.

    Power consumption:

Logic operations within a chip also take much less power. Once again, lower power consumption is largely due to the small size of circuits on the chip: smaller parasitic capacitances and resistances require less power to drive them.

    VLSI and systems:

    These advantages of integrated circuits translate into advantages at the system level:

    Smaller physical size:

Smallness is often an advantage in itself; consider portable televisions or handheld cellular telephones.


    Lower power consumption:

Replacing a handful of standard parts with a single chip reduces total power consumption. Reducing power consumption has a ripple effect on the rest of the system: a smaller, cheaper power supply can be used, and since less power consumption means less heat, a fan may no longer be necessary.

    Reduced cost:

Reducing the number of components, the power supply requirements, cabinet cost, and so on will inevitably reduce system cost. The ripple effect of integration is such that the cost of a system built from custom ICs can be less, even though the individual ICs cost more than the standard parts they replace.

Understanding why integrated circuit technology has such a profound influence on the design of digital systems requires understanding both the technology of IC manufacturing and the economics of ICs and digital systems.

    Applications:

1. Electronic systems in cars.

2. Digital electronics controlling VCRs.

3. Transaction processing systems, ATMs.

4. Personal computers and workstations.

5. Medical electronic systems.

    2.4 Applications of VLSI

Electronic systems now perform a wide variety of tasks in daily life. In some cases electronic systems have replaced mechanisms that operated mechanically, hydraulically, or by other means; electronics are usually smaller, more flexible, and easier to service. In other cases electronic systems have created totally new applications. Electronic systems perform a variety of tasks, some of them visible, some hidden:

1. Personal entertainment systems such as portable MP3 players and DVD players perform sophisticated algorithms with remarkably little energy.

2. Electronic systems in cars operate stereo systems and displays; they also control fuel injection systems, adjust suspensions to varying terrain, and perform the control functions required for anti-lock braking (ABS) systems.

3. Digital electronics compress and decompress video, even at high-definition data rates, on the fly in consumer electronics.


    4. Low-cost terminals for Web browsing still require sophisticated electronics, despite their dedicated

    function.

    5. Personal computers and workstations provide word-processing, financial analysis, and games.

    Computers include both central processing units (CPUs) and special-purpose hardware for disk access,

    faster screen display, etc.

6. Medical electronic systems measure bodily functions and perform complex processing algorithms to warn about unusual conditions. The availability of these complex systems, far from overwhelming consumers, only creates demand for even more complex systems.

The growing sophistication of applications continually pushes the design and manufacturing of integrated circuits and electronic systems to new levels of complexity.

And perhaps the most amazing characteristic of this collection of systems is its variety: as systems become more complex, we build not a few general-purpose computers but an ever-wider range of special-purpose systems. Our ability to do so is a testament to our growing mastery of both integrated circuit manufacturing and design, but the increasing demands of customers continue to test the limits of design and manufacturing.

    2.5 The main digital VLSI circuit testing problems

Because of the density and complexity of modern integrated circuit chips, exhaustive physical testing of a designed chip is not practical. Certain testing methods must be followed to find the problems in the design. These methods include functional testing, accessing the primary I/O ports of the circuit board, and working within the limited coverage and poor diagnostic facilities available for the board under test. All of these approaches aim mainly at detecting faults on the circuit board itself. They are helpful for finding errors, so the cost of the test equipment decreases, and they have also motivated research into the main problems that occur while testing digital VLSI circuits.

The major problems found so far are as follows:

1. Test generation problems.
2. The input combinatorial problem.
3. The gate-to-I/O-pin ratio problem.


    Test Generation Problems

Automatic test generation on computers can take a very long time, sometimes weeks to months of computation, because of the very high number of gates in modern digital circuits. This affects the number of test patterns and the computation cost borne by the external testing equipment, and the testing time increases for the same reason.

There is one more test-generation problem. Computer algorithms can generate test patterns automatically, but they are well suited only to combinational logic circuits. They do not suit sequential logic circuits, because sequential circuits need more space in memory and the steps followed by the evaluating computation techniques are considerably more difficult.

    The input combinatorial problems

For a combinational logic circuit with N inputs, the number of test vectors needed for exhaustive testing is 2^N. For example, a block with 32 inputs would already need 2^32 (about 4.3 billion) vectors, so for a large device such as a 32-bit microprocessor the number of exhaustive test-pattern vectors is practically unbounded, as the short sketch below illustrates.
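A minimal Python sketch of this exhaustive-testing blow-up, for illustration only (the function name is hypothetical and not part of any test tool):

```python
# Illustration of the exhaustive-testing blow-up: an N-input combinational
# block needs 2**N test vectors for exhaustive testing.
from itertools import product

def exhaustive_vectors(n_inputs):
    # Yields every possible input vector for an n_inputs-wide block.
    return product((0, 1), repeat=n_inputs)

for n in (4, 8, 16, 32):
    print(f"{n} inputs -> {2 ** n} exhaustive test vectors")

print(list(exhaustive_vectors(2)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```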

The gate-to-I/O-pin ratio problem

These days, the number of gates on ICs keeps growing. Because of this rapid increase, it becomes very difficult for the limited number of IC pins to control the internal signals (controllability) and for the output pins to observe the proper signals (observability). The pin count grows much more slowly than the gate count, which worsens both the controllability and observability conditions.

Because of the problems stated above, design engineers were motivated to search for reliable testing circuitry. The obvious solution is a special circuit inserted into the digital VLSI circuit that has to be tested, known as built-in self-test (BIST), because of its special testing ability.

    2.6 Introduction to Cryptography

Security is a broad topic and covers a multitude of sins. Most security problems are intentionally caused by malicious people trying to gain some benefit or harm someone. The requirements of information security have undergone two major changes in the last two decades. In earlier days, sensitive documents were stored in cabinets with combination locks.

With the introduction of computers, the need for automated tools for protecting files and other information became evident. This is especially important for shared systems and for systems connected to data networks or the Internet. The generic term for the collection of tools designed to protect data and thwart hackers is computer security.

    Development of Cryptography

The history of cryptography begins thousands of years ago. Until recent decades, it has been the story of what might be called classic cryptography, that is, methods of encryption that use pen and paper, or perhaps simple mechanical aids. In the early 20th century, the invention of complex mechanical and electromechanical machines, such as the Enigma rotor machine, provided more sophisticated and efficient means of encryption; and the subsequent introduction of electronics and computing has allowed elaborate schemes of still greater complexity, most of which are entirely unsuited to pen and paper.

The development of cryptography has been paralleled by the development of cryptanalysis, the "breaking" of codes and ciphers. The discovery and application, early on, of frequency analysis to the reading of encrypted communications has on occasion altered the course of history. Thus the Zimmermann Telegram triggered the United States' entry into World War I; and Allied reading of Nazi Germany's ciphers shortened World War II, in some evaluations by as much as two years.

    Until the 1970s, secure cryptography was largely the preserve of governments. Two events have since

    brought it squarely into the public domain: the creation of a public encryption standard (DES), and the

    invention of public-key cryptography.

Need for Cryptography

The main uses of cryptography are mentioned below:

1) Privacy or confidentiality
2) Data integrity
3) Authentication
4) Non-repudiation

1. Confidentiality is a service used to keep the content of information from all but those authorized to possess it. Secrecy is a term synonymous with confidentiality and privacy. There are numerous approaches to providing confidentiality, ranging from physical protection to mathematical algorithms which render data unintelligible.


    2. Data integrity is a service which addresses the unauthorized alteration of data. To assure data

    integrity, one must have the ability to detect data manipulation by unauthorized parties. Data

    manipulation includes such things as insertion, deletion, and substitution.

    3. Authentication is a service related to identification. This function applies to both entities and

    information itself. Two parties entering into a communication should identify each other. Information

    delivered over a channel should be authenticated as to origin, date of origin, data content, time sent,

    etc. For these reasons this aspect of cryptography is usually subdivided into two major classes: Entity

    authentication and data origin authentication. Data origin authentication implicitly provides data

    integrity (for if a message is modified, the source has changed).

    4. Non-repudiation is a service which prevents an entity from denying previous commitments or actions.

    When disputes arise due to an entity denying that certain actions were taken, a means to resolve the

    situation is necessary. For example, one entity may authorize the purchase of property by another

    entity and later deny such authorization was granted. A procedure involving a trusted third party is

    needed to resolve the dispute.

A fundamental goal of cryptography is to adequately address these four areas in both theory and practice. Cryptography is about the prevention and detection of cheating and other malicious activities, and about keeping sensitive information secure.

    2.7 Basics of Cryptography

    2.7.1 Encryption

    In cryptography, encryption is the process of encoding messages or information in such a way that

    only authorized parties can read it. Encryption does not of itself prevent interception, but denies the message

    content to the interceptor. In an encryption scheme, the message or information, referred to as plaintext, is

    encrypted using an encryption algorithm, generating ciphertext that can only be read if decrypted.[2] For

    technical reasons, an encryption scheme usually uses a pseudo-random encryption key generated by an

    algorithm. It is in principle possible to decrypt the message without possessing the key, but, for a well-

    designed encryption scheme, large computational resources and skill are required. An authorized recipient can

    easily decrypt the message with the key provided by the originator to recipients, but not to unauthorized

    interceptors.


Block diagram for converting plaintext into ciphertext.

Fig 2.1 Converting plaintext into ciphertext

    2.7.2 Decryption

    Decryption is the process of transforming data that has been rendered unreadable through encryption

    back to its unencrypted form. In decryption, the system extracts and converts the garbled data and transforms

    it to texts and images that are easily understandable not only by the reader but also by the system. Decryption

    may be accomplished manually or automatically. It may also be performed with a set of keys or passwords.

    One of the foremost reasons for implementing an encryption-decryption system is privacy. As

    information travels over the World Wide Web, it becomes subject to scrutiny and access from unauthorized

    individuals or organizations. As a result, data is encrypted to reduce data loss and theft. Some of the common

    items that are encrypted include email messages, text files, images, user data and directories. The person in

    charge of decryption receives a prompt or window in which a password may be entered to access encrypted

    information.

Block diagram for converting ciphertext into plaintext.

Fig 2.2 Converting ciphertext into plaintext

    What Is Cryptography?

Cryptography is the science of using mathematics to encrypt and decrypt data. Cryptography enables you to store sensitive information or transmit it across insecure networks (like the Internet) so that it cannot be read by anyone except the intended recipient.


    While cryptography is the science of securing data, cryptanalysis is the science of analyzing and

    breaking secure communication. Classical cryptanalysis involves an interesting combination of analytical

    reasoning, application of mathematical tools, pattern finding, patience, determination, and luck. Cryptanalysts

    are also called attackers. Cryptology embraces both cryptography and cryptanalysis.

A related discipline is steganography, which is the science of hiding messages rather than making them unreadable. Steganography is not cryptography; it is a form of coding. It relies on the secrecy of the mechanism used to hide the message. If, for example, you encode a secret message by putting each letter as the first letter of the first word of every sentence, it is secret only until someone knows to look for it, and then it provides no security at all.

    How Does Cryptography Work?

    A cryptographic algorithm, or cipher, is a mathematical function used in the encryption and

decryption process. A cryptographic algorithm works in combination with a key (a word, number, or phrase) to encrypt the plaintext. The same plaintext encrypts to different ciphertext with different keys. The

    security of encrypted data is entirely dependent on two things: the strength of the cryptographic algorithm and

    the secrecy of the key. A cryptographic algorithm, plus all possible keys and all the protocols that make it

    work, comprise a cryptosystem. PGP is a cryptosystem.

    2.8 Types of Cryptography

    There are two main types of cryptography:

    Secret key cryptography

    Public key cryptography

    In cryptographic systems, the term key refers to a numerical value used by an algorithm to alter

    information, making that information secure and visible only to individuals who have the corresponding key

    to recover the information.

    2.8.1 Secret Key Cryptography

    Secret key cryptography is also known as symmetric key cryptography. With this type of cryptography,

    both the sender and the receiver know the same secret code, called the key. Messages are encrypted by the

    sender using the key and decrypted by the receiver using the same key.


    This method works well if you are communicating with only a limited number of people, but it

    becomes impractical to exchange secret keys with large numbers of people. In addition, there is also the

    problem of how you communicate the secret key securely.

Block diagram of secret-key cryptography: plaintext -> encryption -> ciphertext -> decryption -> plaintext.

Fig 2.3 Secret-key cryptography

Types of Secret Key Cryptography:

Stream Ciphers:

Stream ciphers operate on a single bit (byte or computer word) at a time and implement some form of feedback mechanism so that the key is constantly changing. Stream ciphers come in several flavors, but two are worth mentioning here:

Self-synchronizing stream ciphers calculate each bit in the keystream as a function of the previous n bits in the keystream.

Synchronous stream ciphers generate the keystream in a fashion independent of the message stream, but use the same keystream-generation function at sender and receiver.

Block Ciphers:

A block cipher encrypts one block of data at a time using the same key on each block. Block ciphers can operate in one of several modes; the following four are the most important (a minimal sketch contrasting the first two follows this list):

Electronic Codebook (ECB) mode
Cipher Block Chaining (CBC) mode
Cipher Feedback (CFB) mode
Output Feedback (OFB) mode
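A minimal Python sketch contrasting ECB and CBC chaining. It assumes a toy one-block "cipher" (a keyed XOR over 8-byte blocks) purely so the chaining effect is visible; a real design would use AES or another standard block cipher.

```python
# Toy comparison of ECB and CBC block-cipher modes. The "block cipher" here is
# just a keyed XOR over 8-byte blocks, chosen only so the chaining is visible.
import os

BLOCK = 8

def toy_encrypt_block(block, key):
    return bytes(b ^ k for b, k in zip(block, key))

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def ecb_encrypt(plaintext, key):
    # ECB: every block is encrypted independently, so identical plaintext
    # blocks give identical ciphertext blocks.
    return b"".join(toy_encrypt_block(plaintext[i:i + BLOCK], key)
                    for i in range(0, len(plaintext), BLOCK))

def cbc_encrypt(plaintext, key, iv):
    # CBC: each plaintext block is XORed with the previous ciphertext block
    # before encryption, hiding repeated plaintext blocks.
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        ct = toy_encrypt_block(xor_bytes(plaintext[i:i + BLOCK], prev), key)
        out.append(ct)
        prev = ct
    return b"".join(out)

key, iv = os.urandom(BLOCK), os.urandom(BLOCK)
msg = b"SAMEBLK!" * 2                    # two identical 8-byte blocks
print(ecb_encrypt(msg, key).hex())       # the two halves are identical
print(cbc_encrypt(msg, key, iv).hex())   # the two halves differ
```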

    Symmetric Key Cryptographic Algorithms:

The symmetric-key cryptographic algorithms are as follows:

    i. DES

    ii. TRIPLE-DES

    iii. BLOWFISH

    iv. IDEA

    v. RC4

    vi. RC5

    vii. TwoFish

    2.8.2 Public Key Cryptography

    Public key cryptography, also called asymmetric encryption, uses a pair of keys for encryption and

    decryption. With public key cryptography, keys work in pairs of matched public and private keys.

    The public key can be freely distributed without compromising the private key, which must be kept secret by

    its owner. Because these keys work only as a pair, encryption initiated with the public key can be decrypted

    only with the corresponding private key. The following example illustrates how public key cryptography

    works:

Ann wants to communicate secretly with Bill. Ann encrypts her message using Bill's public key (which Bill made available to everyone), and Ann sends the scrambled message to Bill.


When Bill receives the message, he uses his private key to unscramble the message so that he can read it.

When Bill sends a reply to Ann, he scrambles the message using Ann's public key.

When Ann receives Bill's reply, she uses her private key to unscramble his message.

The major advantage asymmetric encryption offers over symmetric-key cryptography is that senders and receivers do not have to exchange a secret key beforehand; secure communication is possible using only the public keys.

Block diagram of public-key cryptography: plaintext -> encryption with the public key -> ciphertext -> decryption with the private key -> plaintext.

Fig 2.4 Public-key cryptography

    Public Key Cryptographic Algorithms:

The asymmetric (public-key) cryptographic algorithms are as follows:

    a. RSA

    b. Diffie-Hellman

    c. Elliptic curve
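A minimal sketch of the public/private key relationship using "textbook" RSA with tiny primes. The numbers and helper names are illustrative assumptions only; real RSA uses keys of 2048 bits or more together with a padding scheme such as OAEP.

```python
# "Textbook" RSA with tiny primes, to show the public/private key relationship.

def make_textbook_rsa_keys():
    p, q = 61, 53                    # toy primes
    n = p * q                        # modulus shared by both keys
    phi = (p - 1) * (q - 1)
    e = 17                           # public exponent
    d = pow(e, -1, phi)              # private exponent: e*d = 1 (mod phi)
    return (e, n), (d, n)

def rsa_encrypt(m, public_key):
    e, n = public_key
    return pow(m, e, n)              # c = m^e mod n

def rsa_decrypt(c, private_key):
    d, n = private_key
    return pow(c, d, n)              # m = c^d mod n

public, private = make_textbook_rsa_keys()
message = 42                                   # must be smaller than n
cipher = rsa_encrypt(message, public)
print(cipher, rsa_decrypt(cipher, private))    # the second value is 42 again
```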


    How PGP Works?

    PGP then creates a session key, which is a one-time-only secret key. This key is a random number

    generated from the random movements of your mouse and the keystrokes you type. The session key works

    with a very secure, fast conventional encryption algorithm to encrypt the plaintext; the result is ciphertext.

Once the data is encrypted, the session key is then encrypted to the recipient's public key. This public-key-encrypted session key is transmitted along with the ciphertext to the recipient.

Block diagram of encryption: the plaintext is encrypted with the session key, producing the ciphertext plus the encrypted session key.

Fig 2.5 Encryption

Decryption works in the reverse. The recipient's copy of PGP uses his or her private key to recover the session key, which PGP then uses to decrypt the conventionally encrypted ciphertext.


Block diagram of the decryption process: the recipient's private key is used to decrypt the session key, and the recovered session key is then used to decrypt the ciphertext back into the original plaintext.

Fig 2.6 Decryption
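A minimal sketch of this session-key ("digital envelope") idea, assuming a toy XOR stream as the conventional cipher and textbook RSA with tiny numbers as the public-key step; neither is how PGP is actually implemented, they only show the data flow.

```python
# Toy "digital envelope": a random session key encrypts the bulk data with a
# fast symmetric cipher (repeating-key XOR), and only the session key is
# wrapped byte by byte with the recipient's public key (textbook RSA).
import os

def xor_cipher(data, key):
    # Symmetric toy cipher: XOR with a repeating key (encrypt == decrypt).
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def rsa_apply(value, key):
    exponent, n = key
    return pow(value, exponent, n)   # textbook RSA, no padding

recipient_public, recipient_private = (17, 3233), (2753, 3233)  # toy key pair
session_key = os.urandom(8)

# Sender: encrypt the message with the session key, then wrap the session key.
ciphertext = xor_cipher(b"payroll data", session_key)
wrapped_key = [rsa_apply(b, recipient_public) for b in session_key]

# Recipient: unwrap the session key, then decrypt the message.
recovered_key = bytes(rsa_apply(c, recipient_private) for c in wrapped_key)
print(xor_cipher(ciphertext, recovered_key))   # b'payroll data'
```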

    The combination of the two encryption methods combines the convenience of public-key encryption

    with the speed of conventional encryption. Conventional encryption is about 10,000 times faster than public-

    key encryption. Public-key encryption in turn provides a solution to key distribution and data transmission

    issues. Used together, performance and key distribution are improved without any sacrifice in security.

    Keys:

    1. A key is a value that works with a cryptographic algorithm to produce a specific ciphertext. Keys are

    basically really, really, really big numbers. Key size is measured in bits; the number representing a

    2048-bit key is darn huge. In public-key cryptography, the bigger the key, the more secure the

    ciphertext.

2. However, public key size and conventional cryptography's secret key size are totally unrelated. A

    conventional 80-bit key has the equivalent strength of a 1024-bit public key. A conventional 128-bit

    key is equivalent to a 3000-bit public key. Again, the bigger the key, the more secure, but the

    algorithms used for each type of cryptography are very different and thus comparison is like that of

    apples to oranges.


    Digital Signatures:

    1. A major benefit of public key cryptography is that it provides a method for employing digital

    signatures. Digital signatures let the recipient of information verify the authenticity of the

information's origin, and also verify that the information was not altered while in transit. Thus, public

    key digital signatures provide authentication and data integrity. These features are every bit as

    fundamental to cryptography as privacy, if not more.

    2. A digital signature serves the same purpose as a seal on a document, or a handwritten signature.

    However, because of the way it is created, it is superior to a seal or signature in an important way. A

    digital signature not only attests to the identity of the signer, but it also shows that the contents of the

    information signed have not been modified. A physical seal or handwritten signature cannot do that.

    However, like a physical seal that can be created by anyone with possession of the signet, a digital

    signature can be created by anyone with the private key of that signing keypair.

    3. Some people tend to use signatures more than they use encryption. For example, you may not care if

    anyone knows that you just deposited $1,000 in your account, but you do want to be darn sure it was

    the bank teller you were dealing with.

    4. The basic manner in which digital signatures are created is shown in the following figure. The

    signature algorithm uses your private key to create the signature and the public key to verify it. If the

    information can be decrypted with your public key, then it must have originated with you.

Block diagram of signing and verification: the original text is signed with the private key to produce the signed text, and the signed text is verified with the public key to give the verified text.

Fig 2.7 Private key and public key
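A minimal sketch of sign-with-private-key / verify-with-public-key, reusing the toy textbook-RSA key pair from the earlier sketch. The "digest" here is a toy checksum standing in for a real cryptographic hash such as SHA-256.

```python
# Toy digital signature with the textbook-RSA pair: the signer raises a message
# digest to the private exponent; anyone with the public key can verify it.

def toy_digest(message, n):
    return sum(message) % n                    # toy stand-in for a hash function

def sign(message, private_key):
    d, n = private_key
    return pow(toy_digest(message, n), d, n)   # signature = digest^d mod n

def verify(message, signature, public_key):
    e, n = public_key
    return pow(signature, e, n) == toy_digest(message, n)

public, private = (17, 3233), (2753, 3233)     # toy key pair from earlier
msg = b"deposit $1,000"
sig = sign(msg, private)
print(verify(msg, sig, public))                # True
print(verify(b"deposit $9,000", sig, public))  # False: the content was altered
```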


The advantages of public-key cryptography compared with secret-key cryptography are as follows:

i. The primary advantage of public-key cryptography is increased security and convenience: private keys never need to be transmitted or revealed to anyone. In a secret-key system, by contrast, the secret keys

    must be transmitted (either manually or through a communication channel), and there may be a chance

    that an enemy can discover the secret keys during their transmission.

    ii. Another major advantage of public-key systems is that they can provide a method for digital

    signatures. Authentication via secret-key systems requires the sharing of some secret and sometimes

    requires trust of a third party as well. As a result, a sender can repudiate a previously authenticated

    message by claiming that the shared secret was somehow compromised by one of the parties sharing

    the secret. For example, the Kerberos secret-key authentication system involves a central database that

    keeps copies of the secret keys of all users; an attack on the database would allow widespread forgery.

    Public-key authentication, on the other hand, prevents this type of repudiation; each user has sole

    responsibility for protecting his or her private key. This property of public-key authentication is often

    called non-repudiation.

The disadvantages of public-key cryptography compared with secret-key cryptography are as follows:

    i. A disadvantage of using public-key cryptography for encryption is speed: there are popular secret-key

    encryption methods that are significantly faster than any currently available public-key encryption

    method. Nevertheless, public-key cryptography can be used with secret-key cryptography to get the

    best of both worlds. For encryption, the best solution is to combine public- and secret-key systems in

    order to get both the security advantages of public-key systems and the speed advantages of secret-key

    systems. The public-key system can be used to encrypt a secret key which is used to encrypt the bulk

    of a file or message. Such a protocol is called a digital envelope.

    ii. Public-key cryptography may be vulnerable to impersonation, however, even if users' private keys

    are not available. A successful attack on a certification authority will allow an adversary to

    impersonate whomever the adversary chooses to by using a public-key certificate from the

    compromised authority to bind a key of the adversary's choice to the name of another user.

    iii. In some situations, public-key cryptography is not necessary and secret-key cryptography alone is

    sufficient. This includes environments where secure secret-key agreement can take place, for example

    by users meeting in private. It also includes environments where a single authority knows and

    manages all the keys, e.g., a closed banking system. Since the authority knows everyone's keys

    already, there is not much advantage for some to be "public" and others "private." Also, public-key

    cryptography is usually not necessary in a single-user environment. For example, if you want to keep


    your personal files encrypted, you can do so with any secret-key encryption algorithm using, say, your

    personal password as the secret key. In general, public-key cryptography is best suited for an open

    multi-user environment.

    iv. Public-key cryptography is not meant to replace secret-key cryptography, but rather to supplement it,

    to make it more secure. The first use of public-key techniques was for secure key exchange in an

    otherwise secret-key system; this is still one of its primary functions. Secret-key cryptography remains

    extremely important and is the subject of much ongoing study and research. Some secret-key

    cryptosystems are discussed in the sections on block ciphers and stream ciphers.

    Why Three Encryption Techniques?

The three encryption techniques are used for the following reasons:

a. Hash functions: for data integrity
b. Secret-key cryptography: ideally suited to encrypting messages
c. Public-key cryptography: for key exchange

    Examples:

    1. The ABC Company maintains payroll information for a variety of organizations. This payroll

    information is frequently transmitted over the Internet from participating companies. For security

    reasons, the ABC Company conducts all of its Internet transactions using public key cryptography.

    The company owns both a public and a private encryption key.

    The public key is made available to all participating organizations and in fact is openly

    available to anyone who wants to download it from the ABC website. The private key is kept secure in

    a bank vault at ABC headquarters. When the XYZ Company wants to transmit its payroll data to the

ABC Company, it first encrypts the data using the ABC Company's public key. Once it is encrypted, the scrambled payroll data is transmitted securely over the Internet to the ABC Company's processing department.

    If the information is intercepted along the way, all the interceptors will see is scrambled

    information. Even if they have the public key, which is very possible, they will not be able to

    unscramble the information. Only the private key can do that. Once the information is received by

    ABC, the private key is used to unscramble the information, allowing the processing department to

    process the payroll.

    2. Using symmetric cryptography the ABC Company would have to deliver, through some secure means

    (such as a courier), a copy of its one and only private key. Since the same key is used to both encrypt

    and decrypt the information, both sender and receiver must have a copy.


So if XYZ is a new client for ABC, ABC must send XYZ a copy of the secret key so that XYZ can then encrypt its payroll information and transmit it to ABC. ABC, using the same key, decrypts XYZ's information and processes the payroll data. Since a system is only as strong as its weakest link, key security during transmission becomes as important for XYZ as encrypting the data.

    3. As mentioned earlier, public key cryptography lends itself to a new technology called digital

    signatures. Digital signatures involve a reversing of the normal public/private encryption/decryption

    process. Here is an example that demonstrates its use. Suppose Mary wants to send the ABC company

    a request for a special document. Before the ABC company can send that document, they must be

    assured that the requestor is actually Mary.

A digital signature can verify Mary's identity to ABC in the following way. Mary first encrypts her name using her private key. She then encrypts the request along with the encrypted name using the ABC Company's well-known public key. When the ABC Company receives the message, it decrypts the request using its private key and then decrypts the signature using Mary's well-publicized public key. If the name decrypts successfully, then it must be Mary's signature, since she is the only one who could have encrypted it with her secret private key. The request can be safely processed.

4. Digital signatures are gaining popularity in many Internet transactions involving signature verification, such as contracts and other legal negotiations as well as court documents. Recent enhancements to digital signatures include digital time stamps. A digital timestamp applies a "when" criterion to a digital signature by attaching a widely publicized summary number to the signature.

That summary number is only produced at some given point in time, essentially linking that signature to a certain date and time. It is an especially effective technology since it does not rely on the security of keys.

5. As mentioned earlier, for large documents the use of public-key cryptography alone is prohibitive because transmission speeds are so slow. By using something called a digital envelope, the best of both symmetric (transmission speed) and public-key (security) cryptography can be used. Here is an example of how a digital envelope works. Mary wants to send a very large document to her main office overseas. Because of its sensitivity, Mary believes it should be sent using public-key cryptography, but knows she cannot because it is too large. She decides to use a digital envelope.

    6. Mary first creates a special session key and uses this key to symmetrically encrypt her document. That

    is, she uses a symmetric cryptographic algorithm. She then encrypts the session key with her

organization's public key. So now the document is encrypted using symmetric cryptography and the

    key that encrypted it is encrypted using public key cryptography. The encrypted key is called the

    digital envelope. She then transmits both the key and the document to the main office.


    CHAPTER 3

    HARDWARE DESCRIPTION

    3.1 Advanced encryption Standards

    The Advanced Encryption Standard (AES), also referenced as Rijndael (its original name), is a

    specification for the encryption of electronic data established by the U.S. National Institute of Standards and

    Technology (NIST) in 2001.

    AES is based on the Rijndael cipher developed by two Belgian cryptographers, Joan

    Daemen and Vincent Rijmen, who submitted a proposal to NIST during the AES selection process. Rijndael

    is a family of ciphers with different key and block sizes.

    For AES, NIST selected three members of the Rijndael family, each with a block size of 128 bits, but

    three different key lengths: 128, 192 and 256 bits.

    AES has been adopted by the U.S. government and is now used worldwide. It supersedes the Data

    Encryption Standard (DES), which was published in 1977. The algorithm described by AES is a symmetric-

    key algorithm, meaning the same key is used for both encrypting and decrypting the data.

    In the United States, AES was announced by the NIST as U.S. FIPS PUB 197 (FIPS 197) on November

    26, 2001. This announcement followed a five-year standardization process in which fifteen competing designs

were presented and evaluated, before the Rijndael cipher was selected as the most suitable (see the Advanced Encryption Standard process for more details).

    AES became effective as a federal government standard on May 26, 2002 after approval by

    the Secretary of Commerce. AES is included in the ISO/IEC 18033-3 standard. AES is available in many

    different encryption packages, and is the first publicly accessible and open cipher approved by the National

Security Agency (NSA) for top secret information when used in an NSA-approved cryptographic module.

The name Rijndael (Dutch pronunciation: [ˈrɛindaːl]) is a play on the names of the two inventors (Joan Daemen and Vincent Rijmen). It is also a combination of the Dutch name for the Rhine river and "dale".


    Block diagram of AES.

    Fig 3.1 Block diagram of AES

    i. AES is a block cipher with a block length of 128 bits.

ii. AES allows for three different key lengths: 128, 192, or 256 bits. Most of our discussion will assume that the key length is 128 bits. [With regard to using a key length other than 128 bits, the main thing that changes in AES is how the key schedule is generated from the key, an issue addressed later. The notion of a key schedule in AES is explained below.]

    Block diagram of Advanced Encryption Standards.


    iii. Encryption consists of 10 rounds of processing for 128-bit keys, 12 rounds for 192-bit keys, and 14

    rounds for 256-bit keys.

    iv. Except for the last round in each case, all other rounds are identical.

    v. Each round of processing includes one single-byte based substitution step, a row-wise permutation

    step, a column-wise mixing step, and the addition of the round key. The order in which these four

    steps are executed is different for encryption and decryption.

vi. To appreciate the processing steps used in a single round, it is best to think of a 128-bit block as consisting of a 4 x 4 matrix of bytes, arranged as follows.

vii. Therefore, the first four bytes of a 128-bit input block occupy the first column in the 4 x 4 matrix of bytes. The next four bytes occupy the second column, and so on.

viii. The 4 x 4 matrix of bytes is referred to as the state array.

ix. AES also has the notion of a word. A word consists of four bytes, that is, 32 bits. Therefore, each column of the state array is a word, as is each row.

    x. Each round of processing works on the input state array and produces an output state array.

    xi. The output state array produced by the last round is rearranged into a 128-bit output block.

    xii. Unlike DES, the decryption algorithm differs substantially from the encryption algorithm. Although,

    overall, the same steps are used in encryption and decryption, the order in which the steps are carried

    out is different, as mentioned previously.

    xiii. AES, notified by NIST as a standard in 2001, is a slight variation of the Rijndael cipher invented by

    two Belgian cryptographers Joan Daemen and Vincent Rijmen.

    xiv. Whereas AES requires the block size to be 128 bits, the original Rijndael cipher works with any block

    size (and any key size) that is a multiple of 32 as long as it exceeds 128. The state array for the

different block sizes still has only four rows in the Rijndael cipher. However, the number of columns depends on the size of the block. For example, when the block size is 192, the Rijndael cipher requires a

    state array to consist of 4 rows and 6 columns.

xv. DES was based on the Feistel network. On the other hand, what AES uses

    is a substitution permutation network in a more general sense. Each round of processing in AES

    involves byte-level substitutions followed by word-level permutations. Speaking generally, DES also

    involves substitutions and permutations, except that the permutations are based on the Feistel notion of

    dividing the input block into two halves, processing each half separately, and then swapping the two

    halves.

    xvi. The nature of substitutions and permutations in AES allows for a fast software implementation of the

    algorithm.
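A minimal sketch of the column-major mapping between a 128-bit block and the 4 x 4 state array described in points (vi)-(xi) above; the function names are illustrative only.

```python
# Column-major mapping between a 16-byte block and the 4x4 AES state array,
# and the reverse mapping used to form the 128-bit output block.

def block_to_state(block16):
    # state[row][col]; the block fills the state column by column.
    assert len(block16) == 16
    return [[block16[4 * col + row] for col in range(4)] for row in range(4)]

def state_to_block(state):
    return bytes(state[row][col] for col in range(4) for row in range(4))

block = bytes(range(16))                    # bytes 0..15
state = block_to_state(block)
print(state[0])                             # first row: [0, 4, 8, 12]
print(state_to_block(state) == block)       # True: the round trip is lossless
```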

    The Encryption Key and Its Expansion

i. Assuming a 128-bit key, the key is also arranged in the form of a 4 x 4 matrix of bytes. As with the input block, the first word from the key fills the first column of the matrix, and so on.

ii. The four column words of the key matrix are expanded into a schedule of 44 words (a minimal sketch of how this is done follows the figure below). Each round consumes four words from the key schedule.

    iii. The figure below depicts the arrangement of the encryption key in the form of 4-byte words and the

    expansion of the key into a key schedule consisting of 44 4-byte words.

    Block diagram shows the four words of the original 128-bit key being expanded into a key schedule

    consisting of 44 words

    Fig 3.2 The four words of the original 128-bit key being expanded into a key schedule consisting of 44

    words
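A minimal sketch of the AES-128 key expansion outlined above. It assumes a 256-entry list `sbox` holding the Rijndael S-box is supplied (its construction is sketched in Section 3.3.1); this is an algorithmic illustration, not the project's RTL.

```python
# AES-128 key expansion: 4 key words grow into 44 words, i.e. one 4-word round
# key for the initial AddRoundKey plus one per round. `sbox` is assumed to be
# the standard 256-entry Rijndael S-box and is passed in rather than listed.

def expand_key_128(key16, sbox):
    rcon = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]
    words = [list(key16[4 * i:4 * i + 4]) for i in range(4)]   # w0..w3
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]              # RotWord
            temp = [sbox[b] for b in temp]          # SubWord
            temp[0] ^= rcon[i // 4 - 1]             # XOR the round constant
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return words                                    # 44 four-byte words

# Hypothetical usage: round_keys = expand_key_128(bytes(16), SBOX)
```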


    The Overall Structure Of AES

i. The overall structure of AES encryption/decryption is shown in Fig 3.3.

ii. The number of rounds shown in Fig 3.3, ten, is for the case when the encryption key is 128 bits long.

iii. Before any round-based processing for encryption can begin, the input state array is XORed with the first four words of the key schedule. The same thing happens during decryption, except that now we XOR the ciphertext state array with the last four words of the key schedule.

    iv. For encryption, each round consists of the following four steps: 1) Substitute bytes, 2) Shift rows, 3)

    Mix columns, and 4) Add round key. The last step consists of XORing the output of the previous three

    steps with four words from the key schedule.

    v. For decryption, each round consists of the following four steps: 1) Inverse shift rows, 2) Inverse

    substitute bytes, 3) Add round key, and 4) Inverse mix columns. The third step consists of XORing the

output of the previous two steps with four words from the key schedule. Note the differences between the order in which substitution and shifting operations are carried out in a decryption round vis-a-vis the order in which similar operations are carried out in an encryption round.

    vi. The last round for encryption does not involve the Mix columns step. The last round for decryption

    does not involve the Inverse mix columns step.

    Block diagram of overall structure of AES

    Fig 3.3 The overall structure of AES for the case of 128-bit encryption key


3.2 Overall Flow of the AES Algorithm

    High-level description of the algorithm

1. KeyExpansion: round keys are derived from the cipher key using Rijndael's key schedule. AES requires a separate 128-bit round key block for each round plus one more.

2. InitialRound
   1. AddRoundKey: each byte of the state is combined with a block of the round key using bitwise XOR.

3. Rounds
   1. SubBytes: a non-linear substitution step where each byte is replaced with another according to a lookup table.
   2. ShiftRows: a transposition step where the last three rows of the state are shifted cyclically a certain number of steps.
   3. MixColumns: a mixing operation which operates on the columns of the state, combining the four bytes in each column.
   4. AddRoundKey

4. Final Round (no MixColumns)
   1. SubBytes
   2. ShiftRows
   3. AddRoundKey
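A minimal sketch of the encryption flow listed above, assuming the four step functions and the list of round keys are available (versions of each step are sketched in Section 3.3); this is an algorithmic illustration, not the hardware datapath.

```python
# Encryption flow for AES-128, assuming the step functions and the 11 round
# keys (each a 4x4 byte matrix) are provided; see the step sketches below.

def aes_encrypt_block(state, round_keys,
                      sub_bytes, shift_rows, mix_columns, add_round_key,
                      rounds=10):
    state = add_round_key(state, round_keys[0])     # initial key addition
    for rnd in range(1, rounds):                    # rounds 1..9
        state = sub_bytes(state)
        state = shift_rows(state)
        state = mix_columns(state)
        state = add_round_key(state, round_keys[rnd])
    state = sub_bytes(state)                        # final round: no MixColumns
    state = shift_rows(state)
    return add_round_key(state, round_keys[rounds])
```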


    3.3 Individual blocks

    The Four Steps In Each Round Of Processing

The different steps that are carried out in each round except the last one are shown below.

    Fig 3.4 One round of encryption is shown at left and one round of decryption at right

    3.3.1 The Sub byte step

    In the SubBytes step, each byte in the state matrix is replaced with a SubByte using an

    8-bit substitution box, the Rijndael S-box. This operation provides the non-linearity in the cipher. The S-box

used is derived from the multiplicative inverse over GF(2^8), known to have good non-linearity properties. To

    avoid attacks based on simple algebraic properties, the S-box is constructed by combining the inverse function

with an invertible affine transformation. The S-box is also chosen to avoid any fixed points (and so is a derangement), i.e., S(a) ≠ a for every byte a, and also any "opposite" fixed points, i.e., S(a) ≠ ā, where ā is the bitwise complement of a.

While performing decryption, the Inverse SubBytes step is used, which requires first applying the inverse affine transformation and then finding the multiplicative inverse. In the SubBytes step, each byte a_ij in the state is replaced with its entry in a fixed 8-bit lookup table S: b_ij = S(a_ij).


    Block diagram of SubByte step

    Fig 3.5 Sub-byte step
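A minimal sketch of SubBytes that builds the S-box exactly as described above: multiplicative inverse in GF(2^8) (reduction polynomial 0x11B) followed by the fixed affine transformation with constant 0x63. The helper names are illustrative only.

```python
# SubBytes, with the S-box built from the GF(2^8) inverse plus the affine map.
# The byte 0x00, which has no inverse, maps through the affine constant to 0x63.

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B               # reduce by x^8 + x^4 + x^3 + x + 1
        b >>= 1
    return r

def gf_inverse(a):
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1) if a else 0

def affine(a):
    res = 0
    for i in range(8):
        bit = ((a >> i) ^ (a >> ((i + 4) % 8)) ^ (a >> ((i + 5) % 8)) ^
               (a >> ((i + 6) % 8)) ^ (a >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        res |= bit << i
    return res

SBOX = [affine(gf_inverse(x)) for x in range(256)]

def sub_bytes(state):
    return [[SBOX[b] for b in row] for row in state]

print(hex(SBOX[0x00]), hex(SBOX[0x53]))   # 0x63 0xed, matching the standard table
```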

    3.3.2 The ShiftRows step

    The ShiftRows step operates on the rows of the state; it cyclically shifts the bytes in each row by a

    certain offset. For AES, the first row is left unchanged. Each byte of the second row is shifted one to the left.

    Similarly, the third and fourth rows are shifted by offsets of two and three respectively. For blocks of sizes

128 bits and 192 bits, the shifting pattern is the same: row n is circularly shifted left by n-1 bytes (rows numbered from 1). In this way,

    each column of the output state of the ShiftRows step is composed of bytes from each column of the input

    state. (Rijndael variants with a larger block size have slightly different offsets). For a 256-bit block, the first

    row is unchanged and the shifting for the second, third and fourth row is 1 byte, 3 bytes and 4 bytes

respectively; this change only applies to the Rijndael cipher when used with a 256-bit block, as AES does

    not use 256-bit blocks. The importance of this step is to avoid the columns being linearly independent, in

    which case, AES degenerates into four independent block ciphers.

    In the ShiftRows step, bytes in each row of the state are shifted cyclically to the left. The number of

    places each byte is shifted differs for each row.

Fig 3.6 Shift-row step
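A minimal Python sketch of the AES shift pattern described above (an illustration under the assumption that the state is a 4x4 row-major list of bytes):

    def shift_rows(state):
        # Row 0 unchanged; rows 1, 2 and 3 rotated left by 1, 2 and 3 bytes respectively.
        return [row[i:] + row[:i] for i, row in enumerate(state)]

    def inv_shift_rows(state):
        # Decryption rotates the same rows to the right instead.
        return [row[-i:] + row[:-i] if i else list(row) for i, row in enumerate(state)]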


3.3.3 The MixColumns Step

    In the MixColumns step, the four bytes of each column of the state are combined using an

    invertible linear transformation. The MixColumns function takes four bytes as input and outputs four bytes,

    where each input byte affects all four output bytes. Together

    with ShiftRows, MixColumns provides diffusion in the cipher.

During this operation, each column is multiplied by a fixed matrix:

    | 02 03 01 01 |
    | 01 02 03 01 |
    | 01 01 02 03 |
    | 03 01 01 02 |

Matrix multiplication is composed of multiplication and addition of the entries, and here the multiplication operation can be defined as follows: multiplication by 1 means no change, multiplication by 2 means shifting one bit to the left, and multiplication by 3 means shifting one bit to the left and then performing XOR with the initial unshifted value. After shifting, a conditional XOR with 0x1B should be performed if the shifted value is larger than 0xFF. (These are special cases of the usual multiplication in GF(2^8).) Addition is simply XOR.

In a more general sense, each column is treated as a polynomial over GF(2^8) and is then multiplied modulo x^4 + 1 with a fixed polynomial c(x) = {03}x^3 + {01}x^2 + {01}x + {02}. The coefficients are displayed in their hexadecimal equivalent of the binary representation of bit polynomials from GF(2)[x]. The MixColumns step can thus also be viewed as a multiplication by the particular MDS matrix shown above in the finite field GF(2^8).

In the MixColumns step, each column of the state is multiplied with the fixed polynomial c(x).

    Fig 3.7. Mix-Column step
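The shift-and-conditional-XOR rule for multiplying by 2 and 3, and the column mixing itself, can be illustrated with the following Python sketch (an illustration only; the state is assumed to be a 4x4 row-major list of bytes):

    def xtime(b):
        # Multiply by 2 in GF(2^8): shift left one bit, then XOR with 0x1B on overflow.
        b <<= 1
        return (b ^ 0x1B) & 0xFF if b > 0xFF else b

    def mul3(b):
        # Multiply by 3: multiply by 2, then XOR with the original (unshifted) value.
        return xtime(b) ^ b

    def mix_columns(state):
        out = [[0] * 4 for _ in range(4)]
        for c in range(4):
            a0, a1, a2, a3 = (state[r][c] for r in range(4))
            out[0][c] = xtime(a0) ^ mul3(a1) ^ a2 ^ a3      # matrix row [02 03 01 01]
            out[1][c] = a0 ^ xtime(a1) ^ mul3(a2) ^ a3      # matrix row [01 02 03 01]
            out[2][c] = a0 ^ a1 ^ xtime(a2) ^ mul3(a3)      # matrix row [01 01 02 03]
            out[3][c] = mul3(a0) ^ a1 ^ a2 ^ xtime(a3)      # matrix row [03 01 01 02]
        return out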


3.3.4 The AddRoundKey Step

    In the AddRoundKey step, the subkey is combined with the state. For each round, a subkey is derived

    from the main key using Rijndael's key schedule; each subkey is the same size as the state. The subkey is

    added by combining each byte of the state with the corresponding byte of the subkey using bitwise XOR.

In the AddRoundKey step, each byte of the state is combined with a byte of the round subkey using the XOR operation (⊕).

    Fig 3.8 Addroundkey step
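Because the step is a plain byte-wise XOR, its software illustration is only a few lines of Python (assuming the round key is also held as a 4x4 array of bytes):

    def add_round_key(state, round_key):
        # XOR every byte of the state with the corresponding byte of the round key.
        return [[state[r][c] ^ round_key[r][c] for c in range(4)]
                for r in range(4)]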

    Optimization of the cipher

On systems with 32-bit or larger words, it is possible to speed up execution of this cipher by combining the SubBytes and ShiftRows steps with the MixColumns step by transforming them into a sequence of table lookups. This requires four 256-entry 32-bit tables, and utilizes a total of four kilobytes (4096 bytes) of memory, one kilobyte for each table. A round can then be done with 16 table lookups and 12 32-bit exclusive-or operations, followed by four 32-bit exclusive-or operations in the AddRoundKey step.[11]

    If the resulting four-kilobyte table size is too large for a given target platform, the table lookup

    operation can be performed with a single 256-entry 32-bit (i.e. 1 kilobyte) table by the use of circular rotates.

    Using a byte-oriented approach, it is possible to combine the SubBytes, ShiftRows,

    and MixColumns steps into a single round operation.


    3.3.5 The Key Expansion Algorithm

    i. Each round has its own round key that is derived from the original 128-bit encryption key in the

    manner described in this section. One of the four steps of each round, for both encryption and

    decryption, involves XORing of the round key with the state array.

ii. The AES Key Expansion algorithm is used to derive the 128-bit round key for each round from the original 128-bit encryption key. As you'll see, the logic of the key expansion algorithm is designed to ensure that if you change one bit of the encryption key, it affects the round keys for several rounds.

iii. In the same manner as the 128-bit input block is arranged in the form of a state array, the algorithm first arranges the 16 bytes of the encryption key in the form of a 4 × 4 array of bytes.

iv. The first four bytes of the encryption key constitute the word w0, the next four bytes the word w1, and so on.

v. The algorithm subsequently expands the words [w0, w1, w2, w3] into a 44-word key schedule that can be labeled w0, w1, w2, w3, ..., w43.

    vi. Of these, the words [w0,w1,w2,w3] are bitwise XORed with the input block before the round-based

    processing begins.

    vii. The remaining 40 words of the key schedule are used four words at a time in each of the 10 rounds.

viii. The above two statements are also true for decryption, except that we now reverse the order of the words in the key schedule: the last four words of the key schedule are bitwise XORed with the 128-bit ciphertext block before any round-based processing begins, and the remaining 40 words are then used four at a time in each of the ten rounds of processing.


ix. Now comes the difficult part: how does the Key Expansion Algorithm expand the four words w0, w1, w2, w3 into the 44 words w0, w1, w2, w3, w4, w5, ..., w43?

x. The key expansion algorithm is explained in the next subsection with the help of Figure 3.9. As shown in the figure, the key expansion takes place on a four-word to four-word basis, in the sense that each grouping of four words decides what the next grouping of four words will be. A software sketch of this expansion is given after the figure.

    The block diagram of key expansion algorithm

    Fig 3.9 The key expansion takes place on a four-word to four-word basis.
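The following Python sketch illustrates this expansion for AES-128 (an illustration, not the project's hardware key schedule). It assumes the 256-entry SBOX table whose construction is sketched in Section 3.4, and it returns the 44 words regrouped as 11 round keys, each written column-wise as a 4x4 array so that it lines up with the state array:

    def key_expansion(key):
        # key: 16 bytes. Words w0..w3 come from the key itself; w4..w43 are derived.
        rcon = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]
        w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
        for i in range(4, 44):
            temp = list(w[i - 1])
            if i % 4 == 0:
                temp = temp[1:] + temp[:1]              # RotWord: one-byte circular left shift
                temp = [SBOX[b] for b in temp]          # SubWord: S-box applied to each byte
                temp[0] ^= rcon[i // 4 - 1]             # XOR with the round constant
            w.append([w[i - 4][k] ^ temp[k] for k in range(4)])
        # Regroup the 44 words into 11 round keys; word 4*rk + c fills column c.
        return [[[w[4 * rk + c][r] for c in range(4)] for r in range(4)]
                for rk in range(11)]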

3.4 Construction of the 16 × 16 Arrays

a) The SubBytes Step

1. We first fill each cell of the 16 × 16 table with the byte obtained by joining together its row index and its column index. [The row index of this table runs from hex 0 through hex F. Likewise, the column index runs from hex 0 through hex F.]

2. For example, for the cell located at row index 2 and column index 7, we place hex 0x27 in the cell. At this point the table simply lists all 256 byte values from 0x00 through 0xFF.


3. We next replace the value in each cell by its multiplicative inverse in GF(2^8) based on the irreducible polynomial x^8 + x^4 + x^3 + x + 1.

    4. The hex value 0x00 is replaced by itself since this element has no multiplicative inverse.

5. After the above step, let us represent a byte stored in a cell of the table by b7 b6 b5 b4 b3 b2 b1 b0, where b7 is the MSB and b0 the LSB. For example, the byte stored in the cell (9, 5) of the above table is the multiplicative inverse (MI) of 0x95, which is 0x8A. Therefore, at this point, the bit pattern stored in the cell with row index 9 and column index 5 is 10001010, implying that b7 is 1 and b0 is 0. [Verify the fact that the MI of 0x95 is indeed 0x8A. The polynomial representation of 0x95 (bit pattern: 10010101) is x^7 + x^4 + x^2 + 1, and the same for 0x8A (bit pattern: 10001010) is x^7 + x^3 + x. Now show that the product of these two polynomials modulo the polynomial x^8 + x^4 + x^3 + x + 1 is indeed 1.] For bit scrambling, we next apply the following transformation to each bit bi of the byte stored in a cell of the lookup table:

    b'_i = b_i ⊕ b_((i+4) mod 8) ⊕ b_((i+5) mod 8) ⊕ b_((i+6) mod 8) ⊕ b_((i+7) mod 8) ⊕ c_i

where c_i is the i-th bit of a specially designated byte c whose hex value is 0x63 (c7 c6 c5 c4 c3 c2 c1 c0 = 01100011).

6. The above bit-scrambling step is better visualized as a vector-matrix operation in which the byte is treated as a column vector of its 8 bits, multiplied by a fixed 8 × 8 bit matrix, and then added to a constant vector formed from the byte c. Note that all of the additions in the product of the matrix and the vector are actually XOR operations. [Because of the A·x + b form of this transformation, it is commonly referred to as the affine transformation.]

7. The very important role played by the c byte of value 0x63: consider the following two conditions on the SubBytes step. (1) In order for the byte substitution step to be invertible, the byte-to-byte mapping given to us by the 16 × 16 table must be one-to-one.


    That is, for each input byte, there must be a unique output byte. And, to each output

    byte there must correspond only one input byte. (2) No input byte should map to itself, since a

    byte mapping to itself would weaken the cipher.

8. Taking multiplicative inverses in the construction of the table does give us unique entries in the table for each input byte except for the input byte 0x00, since there is no MI defined for the all-zeros byte. What is interesting is that, if it were not for the c byte, the bit-scrambling step would also leave the input byte 0x00 unchanged. With the affine mapping shown above, the 0x00 input byte is mapped to 0x63. At the same time, it preserves the one-to-one mapping for all other bytes.

9. In addition to ensuring that every input byte is mapped to a different and unique output byte,

    the bit-scrambling step also breaks the correlation between the bits before the substitution and

    the bits after the substitution.

10. The 16 × 16 table created in this manner is called the S-Box. The S-Box is the same for all the

    bytes in the state array.

11. The steps that go into constructing the 16 × 16 lookup table are reversed for the decryption table, meaning that you first apply the reverse of the bit-scrambling operation to each byte, as explained in the next step, and then you take its multiplicative inverse in GF(2^8).

12. For bit scrambling for decryption, you carry out the following bit-level transformation in each cell of the table:

    b'_i = b_((i+2) mod 8) ⊕ b_((i+5) mod 8) ⊕ b_((i+7) mod 8) ⊕ d_i

where d_i is the i-th bit of a specially designated byte d whose hex value is 0x05 (d7 d6 d5 d4 d3 d2 d1 d0 = 00000101). Finally, you replace the byte in the cell by its multiplicative inverse in GF(2^8).

13. The bytes c and d are chosen so that the S-box has no fixed points. That is, we do not want S-box(a) = a for any a. Neither do we want S-box(a) = ā, where ā is the bitwise complement of a.
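The construction described in this subsection can be reproduced with the short Python sketch below: brute-force the multiplicative inverse in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, then apply the bit-scrambling (affine) step with c = 0x63. This is an illustration of how the table arises, not the project's hardware S-box; the spot checks at the end correspond to the worked facts in the text (MI(0x95) = 0x8A and S-box(0x00) = 0x63).

    def gf_mul(a, b):
        # Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
        p = 0
        for _ in range(8):
            if b & 1:
                p ^= a
            b >>= 1
            carry = a & 0x80
            a = (a << 1) & 0xFF
            if carry:
                a ^= 0x1B
        return p

    def affine(b, c=0x63):
        # b'_i = b_i ^ b_(i+4 mod 8) ^ b_(i+5 mod 8) ^ b_(i+6 mod 8) ^ b_(i+7 mod 8) ^ c_i
        out = 0
        for i in range(8):
            bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8)) ^
                   (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (c >> i)) & 1
            out |= bit << i
        return out

    # Build the S-box; 0x00 has no inverse and is carried by the affine step to 0x63.
    SBOX = [0] * 256
    for a in range(256):
        inv = 0 if a == 0 else next(b for b in range(1, 256) if gf_mul(a, b) == 1)
        SBOX[a] = affine(inv)

    # The decryption table is simply the inverse mapping.
    INV_SBOX = [0] * 256
    for a, s in enumerate(SBOX):
        INV_SBOX[s] = a

    assert gf_mul(0x95, 0x8A) == 1     # MI(0x95) is indeed 0x8A
    assert SBOX[0x00] == 0x63          # 0x00 maps to 0x63, so no fixed point there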


    b) The Shift Rows Step

    This is where the matrix representation of the state array becomes important.

i. The ShiftRows transformation consists of (i) not shifting the first row of the state array at

    all; (ii) circularly shifting the second row by one byte to the left; (iii) circularly shifting the

    third row by two bytes to the left; and (iv) circularly shifting the last row by three bytes to

    the left.

ii. This operation on the state array can be represented by

    | s0,0 s0,1 s0,2 s0,3 |        | s0,0 s0,1 s0,2 s0,3 |
    | s1,0 s1,1 s1,2 s1,3 |  --->  | s1,1 s1,2 s1,3 s1,0 |
    | s2,0 s2,1 s2,2 s2,3 |        | s2,2 s2,3 s2,0 s2,1 |
    | s3,0 s3,1 s3,2 s3,3 |        | s3,3 s3,0 s3,1 s3,2 |

iii. Recall again that the input block is written column-wise. That is, the first four bytes of the input block fill the first column of the state array, the next four bytes the second column, etc. As a result, shifting the rows in the manner indicated scrambles up the byte order of the input block.

    iv. For decryption, the corresponding step shifts the rows in exactly the opposite fashion. The

first row is left unchanged, the second row is shifted to the right by one byte, the third row

    to the right by two bytes, and the last row to the right by three bytes, all shifts being

    circular.


    c) The Mix Columns Step

    This step replaces each byte of a column by a function of all the bytes in the same column.

i. More precisely, each byte in a column is replaced by two times that byte, plus three times the next byte, plus the byte that comes next, plus the byte that follows. [Additions in GF(2^8) mean the same thing as XOR, so plus implies XOR.] The words next and follow refer to bytes in the same column, and their meaning is circular, in the sense that the byte that is next to the one in the last row is the one in the first row. [By two times and three times, we mean multiplications in GF(2^8) by the bit patterns 00000010 and 00000011, respectively.]

ii. For the bytes in the first row of the state array, this operation can be stated as

    s'(0,j) = (02 • s(0,j)) ⊕ (03 • s(1,j)) ⊕ s(2,j) ⊕ s(3,j)

iii. For the bytes in the second row of the state array, this operation can be stated as

    s'(1,j) = s(0,j) ⊕ (02 • s(1,j)) ⊕ (03 • s(2,j)) ⊕ s(3,j)

iv. For the bytes in the third row of the state array, this operation can be stated as

    s'(2,j) = s(0,j) ⊕ s(1,j) ⊕ (02 • s(2,j)) ⊕ (03 • s(3,j))

v. And, for the bytes in the fourth row of the state array, this operation can be stated as

    s'(3,j) = (03 • s(0,j)) ⊕ s(1,j) ⊕ s(2,j) ⊕ (02 • s(3,j))

where j = 0, 1, 2, 3 indexes the columns and • denotes multiplication in GF(2^8).


vi. More compactly, the column operations can be shown as

    | 02 03 01 01 |   | s(0,j) |   | s'(0,j) |
    | 01 02 03 01 | x | s(1,j) | = | s'(1,j) |
    | 01 01 02 03 |   | s(2,j) |   | s'(2,j) |
    | 03 01 01 02 |   | s(3,j) |   | s'(3,j) |

where, on the left-hand side, when a row of the leftmost matrix multiplies a column of the state array matrix, the additions involved are meant to be XOR operations.

vii. The corresponding transformation during decryption is given by the inverse matrix:

    | 0E 0B 0D 09 |   | s(0,j) |   | s'(0,j) |
    | 09 0E 0B 0D | x | s(1,j) | = | s'(1,j) |
    | 0D 09 0E 0B |   | s(2,j) |   | s'(2,j) |
    | 0B 0D 09 0E |   | s(3,j) |   | s'(3,j) |

A software sketch of this inverse column mixing is given below.
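A Python sketch of the inverse column mixing, reusing the gf_mul helper from the S-box sketch in Section 3.4(a) (an illustration under the same 4x4 row-major state assumption):

    def inv_mix_columns(state):
        # Each column is multiplied by the inverse matrix shown above.
        out = [[0] * 4 for _ in range(4)]
        for c in range(4):
            a0, a1, a2, a3 = (state[r][c] for r in range(4))
            out[0][c] = gf_mul(a0, 0x0E) ^ gf_mul(a1, 0x0B) ^ gf_mul(a2, 0x0D) ^ gf_mul(a3, 0x09)
            out[1][c] = gf_mul(a0, 0x09) ^ gf_mul(a1, 0x0E) ^ gf_mul(a2, 0x0B) ^ gf_mul(a3, 0x0D)
            out[2][c] = gf_mul(a0, 0x0D) ^ gf_mul(a1, 0x09) ^ gf_mul(a2, 0x0E) ^ gf_mul(a3, 0x0B)
            out[3][c] = gf_mul(a0, 0x0B) ^ gf_mul(a1, 0x0D) ^ gf_mul(a2, 0x09) ^ gf_mul(a3, 0x0E)
        return out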


    CHAPTER 4

    SOFTWARE DESCRIPTION

    4.1 Introduction to Xilinx

    The Xilinx ISE is a design environment for FPGA products from Xilinx, and is tightly-coupled to the

    architecture of such chips, and cannot be used with FPGA products from other vendors.[2] The Xilinx ISE is

    primarily used for circuit synthesis and design, while the ModelSim logic simulator is used for system-level

    testing. Other components shipped with the Xilinx ISE include the Embedded Development Kit (EDK), a

    Software Development Kit (SDK) and ChipScope Pro.

    The main challenging areas in VLSI are performance, cost, testing, area, reliability and power. The

demand for portable computing devices and communication systems is increasing rapidly. These applications require low power dissipation in VLSI circuits [1]. The ability to design, fabricate and test Application

    Specific Integrated Circuits (ASICs) as well as FPGAs with gate count of the order of a few tens of millions

    has led to the development of complex embedded SOC. Hardware components in a SOC may include one or

    more processors, memories and dedicated components for accelerating critical tasks and

    interfaces to various peripherals. One of the approaches for SOC design is the platform based approach. For

    example, the platform FPGAs such as Xilinx Virtex II Pro and Altera Excalibur include custom designed

    fixed programmable processor cores together with millions of gates of reconfigurable logic devices.

    In addition to this, the development of Intellectual Property (IP) cores for the FPGAs for a variety of

    standard functions including processors, enables a multimillion gate FPGA to be configured to contain all the

    components of a platform based FPGA. Development tools such as the Altera System-On-Programmable

    Chip (SOPC) builder enable the integration of IP cores and the user designed custom blocks with the Nios II

    soft-core processor. Soft-core processors are far more flexible than the hard-core processors and they can be

enhanced with custom hardware to optimize them for specific applications. Power dissipation is a challenging problem for today's System-on-Chip (SoC) design and test.

    Evolution of Computer-Aided Digital Design

    Digital circuit design has evolved rapidly over the last 25 years. The earliest digital circuits were

    designed with vacuum tubes and transistors. Integrated circuits were then invented where logic gates were

    placed on a single chip. The first integrated circuit (IC) chips were SSI (Small Scale Integration) chips where


    the gate count was very small. As technologies became sophisticated, designers were able to place circuits

    with hundreds of gates on a chip. These chips were called MSI (Medium Scale Integration) chips. With the

    advent of LSI (Large Scale Integration), designers could put thousands of gates on a single chip. At this point,

    design processes started getting very complicated, and designers felt the need to automate these

processes. Electronic Design Automation (EDA) techniques began to evolve. Chip designers began to use

    circuit and logic simulation techniques to verify the functionality of building blocks of the order of about 100

    transistors. The circuits were still tested on the breadboard, and the layout was done on paper or by hand on a

    graphic computer terminal.

Such tools were earlier referred to simply as CAD tools. Technically, the term Computer-Aided Design (CAD) tools refers to back-end tools that perform functions related to place and route, and layout of the chip. The term Computer-Aided Engineering (CAE) tools refers to tools that are used for front-end processes such as HDL simulation, logic synthesis, and timing analysis. Designers used the terms CAD and CAE interchangeably. Today, the term Electronic Design Automation is used for both CAD and CAE. For the sake of simplicity, all design tools are referred to here as EDA tools.

    With the advent of VLSI (Very Large Scale Integration) technology, designers could design single chips with

    more than 100,000 transistors. Because of the complexity of these circuits, it was not possible to verify these

    circuits on a breadboard. Computer-aided techniques became critical for verification and design of VLSI

    digital circuits. Computer programs to do automatic placement and routing of circuit layouts also became

    popular. The designers were now building gate-level digital circuits manually on graphic terminals. They

    would build small building blocks and then derive higher-level blocks from them. This process would

    continue until they had built the top-level block. Logic simulators came into existence to verify the

    functionality of these circuits before they were fabricated on chip.

    As designs got larger and more complex, logic simulation assumed an important role in the design process.

    Designers could iron out functional bugs in the architecture before the chip was designed further.

    Emergence of HDLs

    For a long time, programming languages such as FORTRAN, Pascal, and C were being used to

    describe computer programs that were sequential in nature. Similarly, in the digital design field, designers felt

    the need for a standard language to describe digital circuits. Thus, Hardware Description Languages (HDLs)

    came into existence. HDLs allowed the designers to model the concurrency of processes found in hardware

    elements. Hardware description languages such as Verilog HDL and VHDL became popular. Verilog HDL

    originated in 1983 at Gateway Design Automation. Later, VHDL was developed under contract from

DARPA. Simulators for both Verilog and VHDL that could simulate large digital circuits quickly gained acceptance from designers.


    Even though HDLs were popular for logic verification, designers had to manually translate the HDL-

    based design into a schematic circuit with interconnections between gates. The advent of logic synthesis in the

    late 1980s changed the design methodology radically. Digital circuits could be described at a register transfer

    level (RTL) by use of an HDL. Thus, the designer had to specify how the data flows between registers and

    how the design processes the data. The details of gates and their interconnections to implement the circuit

    were automatically extracted by logic synthesis tools from the RTL description.

    Thus, logic synthesis pushed the HDLs into the forefront of digital design. Designers no longer had to

    manually place gates to build digital circuits. They could describe complex circuits at an abstract level in

    terms of functionality and data flow by designing those circuits in HDLs. Logic synthesis tools would

    implement the specified functionality in terms of gates and gate interconnections.

    HDLs also began to be used for system-level design. HDLs were used for simulation of system boards,

    interconnect buses, FPGAs (Field Programmable Gate Arrays), and PALs (Programmable Array Logic). A

    common approach is to design each IC chip, using an HDL, and then verify system functionality via

    simulation.

    Today, Verilog HDL is an accepted IEEE standard. In 1995, the original standard IEEE 1364-1995

    was approved. IEEE 1364-2001 is the latest Verilog HDL standard that made significant improvements to the

    original standard.

    4.2 Typical Design Flow

    A typical design flow for designing VLSI IC circuits is shown in Figure 4-1. Un-shaded blocks show

    the level of design representation; shaded blocks show processes in the design flow.


    Block diagram of typical design flow.

    Fig 4.1. Typical Design Flow

    The design flow shown in Figure 4.1 is typically used by designers who use HDLs. In any design,

    specifications are written first. Specifications describe abstractly the functionality, interface, and overall

    architecture of the digital circuit to be designed. At this point, the architects do not need to think about how

    they will implement this circuit.

    A behavioral description is then created to analyze the design in terms of functionality, performance,

    compliance to standards, and other high-level issues. Behavioral descriptions are often written with HDLs.


    New EDA tools have emerged to simulate behavioral descriptions of circuits. These tools combine the

    powerful concepts from HDLs and object oriented languages such as C++. These tools can be used instead of

    writing behavioral descriptions in Verilog HDL.

    The behavioral description is manually converted to an RTL description in an HDL. The designer has

    to describe the data flow that will implement the desired digital circuit. From this point onward, the design

    process is done with the assistance of EDA tools.

    Logic synthesis tools convert the RTL description to a gate-level netlist. A gate-level netlist is a

    description of the circuit in terms of gates and connections between them. Logic synthesis tools ensure that

    the gate-level netlist meets timing, area, and power specifications. The gate-level netlist is input to an

    Automatic Place and Route tool, which creates a layout. The layout is verified and then fabricated on a chip.

    Thus, most digital design activity is concentrated on manually optimizing the RTL description of the

    circuit. After the RTL description is frozen, EDA tools are available to assist the designer in further processes.

    Designing at the RTL level has shrunk the design cycle times from years to a few months. It is also possible to

    do many design iterations in a short period of time.

    Behavioral synthesis tools have begun to emerge recently. These tools can create RTL descriptions

    from a behavioral or algorithmic description of the circuit. As these tools mature, digital circuit design will

    become similar to high-level computer programming. Designers will simply implement the algorithm in an

    HDL at a very abstract level. EDA tools will help the designer convert the behavioral description to a final IC

    chip.

    It is important to note that, although EDA tools are available to automate the processes and cut design

    cycle times, the designer is still the person who controls how the tool will perform. EDA tools are also

susceptible to the "GIGO: Garbage In, Garbage Out" phenomenon. If used improperly, EDA tools will lead to

    inefficient designs. Thus, the designer still needs to understand the nuances of design methodologies, using

    EDA tools to obtain an optimized design.

    Importance of HDLs

    HDLs have many advantages compared to traditional schematic-based design.

    i. Designs can be described at a very abstract level by use of HDLs. Designers can write their RTL

    description without choosing a specific fabrication technology. Logic synthesis tools can automatically

    convert the design to any fabrication technology. If a new technology emerges, designers do not need

    to redesign their circuit.

    ii. They simply input the RTL description to the logic synthesis tool and create a new gate-level net list,

    using the new fabrication technology. The logic synthesis tool will optimize the circuit in area and

    timing for the new technology.


    iii. By describing designs in HDLs, functional verification of the design can be done early in the design

    cycle. Since designers work at the RTL level, they can optimize and modify the RTL description until

    it meets the desired functionality. Most design bugs are eliminated at this point. This cuts down design

    cycle time significantly because the probability of hitting a functional bug at a later time in the gate-

    level net list or physical layout is minimized.

    iv. Designing with HDLs is analogous to computer programming. A textual description with comments is

    an easier way to develop and debug circuits. This also provides a concise representation of the design,

    compared to gate-level schematics. Gate-level schematics are almost incomprehensible for very

    complex designs.

    v. HDL-based design is here to stay.[3] With rapidly increasing complexities of digital circuits and

    increasingly sophisticated EDA tools, HDLs are now the dominant method for large digital designs.

    No digital circuit designer can afford to ignore HDL-based design.

    vi. New tools and languages focused on verification have emerged in the past few years. These languages

    are better suited for functional verification. However, for logic design, HDLs continue as the preferred

    choice.

    Popularity of Verilog HDL

Verilog HDL has evolved as a standard hardware description language. Verilog HDL offers many useful features:

    i. Verilog HDL is a general-purpose hardware description language that is easy to learn and easy to use.

    It is similar in syntax to the C programming language. Designers with C programming experience will

    find it easy to learn Verilog HDL.

    ii. Verilog HDL allows different levels of abstraction to be mixed in the same model. Thus, a designer

    can define a hardware model in terms of switches, gates, RTL, or behavioral code. Also, a designer

    needs to learn only one language for stimulus and hierarchical design.

    iii. Most popular logic synthesis tools support Verilog HDL. This makes it the language of choice for

    designers.

iv. All fabrication vendors provide Verilog HDL libraries for post-logic-synthesis simulation. Thus,

    designing a chip in Verilog HDL allows the widest choice of vendors.

    v. The Programming Language Interface (PLI) is a powerful feature that allows the user to write custom

    C code to interact with the internal data structures of Verilog. Designers can customize a Verilog HDL

    simulator to their needs with the PLI.


    Trends in HDLs

    The speed and complexity of digital circuits have increased rapidly. Designers have responded by

    designing at higher levels of abstraction. Designers have to think only in terms of functionality. EDA tools

    take care of the implementation details. With designer assistance, EDA tools have become sophisticated

    enough to achieve a close-to-optimum implementation.

    The most popular trend currently is to design in HDL at an RTL level, because logic synthesis tools

    can create gate-level net lists from RTL level design. Behavioral synthesis allowed engineers to design

    directly in terms of algorithms and the behavior of the circuit, and then use EDA tools to do the translation

    and optimization in each phase of the design. However, behavioral synthesis did not gain widespread

    acceptance. Today, RTL design continues to be very popular. Verilog HDL is also being constantly enhanced

    to meet the needs of new verification methodologies.

    Formal verification and assertion checking techniques have emerged. Formal verification applies

    formal mathematical techniques to verify the correctness of Verilog HDL descriptions and to establish

    equivalency between RTL and gate-level netlists. However, the need to describe a design in Verilog HDL will

    not go away. Assertion checkers allow checking to be embedded in the RTL code. This is a convenient way to

    do checking in the most important parts of a design.

    New verification languages have also gained rapid acceptance. These languages combine the

    parallelism and hardware constructs from HDLs with the object oriented nature of C++. These languages also

    provide support for automatic stimulus creation, checking, and coverage. However, these languages do not

    replace Verilog HDL. They simply boost the productivity of the verification process. Verilog HDL is still

    needed to describe the design.

    For very high-speed and timing-critical circuits like microprocessors, the gate-level net list provided

    by logic synthesis tools is not optimal. In such cases, designers often mix gate-level description directly into

    the RTL description to achieve optimum results. This practice is opposite to the high-level design paradigm,

    yet it is frequently used for high-speed designs because designers need to squeeze the last bit of timing out of

    circuits, and EDA tools sometimes prove to be insufficient to achieve the desired results.

    Another technique that is used for system-level design is a mixed bottom-up methodology where the

    designers use either existing Verilog HDL modules, basic building blocks, or vendor-supplied core blocks to

    quickly bring up their system simulation. This is done to reduce development costs and compress design

    schedules.

    For example, consider a system that has a CPU, graphics chip, I/O chip, and a system bus. The CPU

    designers would build the next-generation CPU themselves at an RTL level, but they would use behavioral

    models for the graphics chip and the I/O chip and would buy a vendor-supplied model for the system bus.


    Thus, the system-level simulation for the CPU could be up and running very quickly and long before the RTL

    descriptions for the graphics chip and the I/O chip are completed.

    Hierarchical Modeling Concepts

    Before we discuss the details of the Verilog language, we must first understand basic hierarchical

    modeling concepts in digital design. The designer must use a "good" design methodology to do efficient

    Verilog HDL-based design. In this