105
Abstract SEA is a scalable encryption algorithm targeted for small embedded applications. It was initially designed for software implementations in controllers, smart cards, or processors. In this letter, we investigate its performances in recent field-programmable gate array (FPGA) devices. For this purpose, a loop architecture of the block cipher is presented. Beyond its low cost performances, a significant advantage of the proposed architecture is its full flexibility for any parameter of the scalable encryption algorithm, taking advantage of generic VHDL coding. The letter also carefully describes the implementation details allowing us to keep small area requirements. Finally, a comparative performance discussion of SEA with the Advanced Encryption Standard Rijndael and ICEBERG (a cipher purposed for efficient FPGA implementations) is proposed. It illustrates the interest of platform/context-oriented block cipher design and, as far as SEA is concerned, its low area requirements and reasonable efficiency. Scalable encryption algorithm (SEA) is a parametric block cipher for resource constrained systems (e.g., sensor networks, RFIDs) that has been introduced in [1]. It was initially designed as a low-cost encryption/ authentication routine (i.e., with small code size and memory) targeted for processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation, and modular addition). Additionally and contrary to most recent block ciphers (e.g., the DES [2] and AES Rijndael [3], [4]), the algorithm takes the plaintext, key, and the bus sizes as parameters and, therefore, can be straightforwardly adapted to various implementation contexts and/or security requirements. Compared to older solutions for low-cost encryption like tiny encryption algorithm (TEA) [5] or Yuval’s proposal [6], SEA also benefits from a stronger security analysis, derived from recent advances in block cipher design/cryptanalysis. 1

Scalable Encryption Algorithm

Embed Size (px)

Citation preview

Page 1: Scalable Encryption Algorithm

Abstract

SEA is a scalable encryption algorithm targeted for small embedded applications It was initially designed for software implementations in controllers smart cards or processors In this letter we investigate its performances in recent field-programmable gate array (FPGA) devices For this purpose a loop architecture of the block cipher is presented Beyond its low cost performances a significant advantage of the proposed architecture is its full flexibility for any parameter of the scalable encryption algorithm taking advantage of generic VHDL coding The letter also carefully describes the implementation details allowing us to keep small area requirements Finally a comparative performance discussion of SEA with the Advanced Encryption Standard Rijndael and ICEBERG (a cipher purposed for efficient FPGA implementations) is proposed It illustrates the interest of platformcontext-oriented block cipher design and as far as SEA is concerned its low area requirements and reasonable efficiency

Scalable encryption algorithm (SEA) is a parametric block cipher for resource constrained systems (eg sensor networks RFIDs) that has been introduced in [1] It was initially designed as a low-cost encryption authentication routine (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) Additionally and contrary to most recent block ciphers (eg the DES [2] and AES Rijndael [3] [4]) the algorithm takes the plaintext key and the bus sizes as parameters and therefore can be straightforwardly adapted to various implementation contexts andor security requirements Compared to older solutions for low-cost encryption like tiny encryption algorithm (TEA) [5] or Yuvalrsquos proposal [6] SEA also benefits from a stronger security analysis derived from recent advances in block cipher designcryptanalysis

In practice SEA has been proven to be an efficient solution for embedded software applications using microcontrollers but its hardware performances have not yet been investigated Consequently and as a first step towards hardware performance analysis this letter explores the features of a low-cost field-programmable gate array (FPGA) encryption decryption core for SEA In addition to the performance evaluation we show that the algorithmrsquos scalability can be turned into a fully generic VHDL design so that any text key and bus size can be straightforwardly reimplemented without any modification of the hardware description language with standard synthesis and implementation tools

1

CONTENTS

CHAPTER 1 Introduction to VLSI 9 11 Introduction 9 12 VLSI Design Style 10

13 VLSI Design Flow 11 14 VLSI Features 11

CHAPTER 2 Introduction to VHDL 1221 Introduction 1222 Capabilities 1323 Abstraction levels of VHDL 13 24 Basic Terminology 1425 Modeling Techniques for VHDL 1726 Process Statements 1827 Conditional Statements 19 28 Active HDL Overview 2129 Macro language 22 210 Compilation 23 211 Simulation 23 212 X Linix 24

CHAPTER 3 Introduction to SEA 26 31 Specifications 2732 Design properties 3033 Overall Structure 3134 Security Analysis 3135 Performance Analysis 35

CHAPTER 4 An Exposition Of SEA 37 41 Overview of SEA 38

CHAPTER 5 SEA Architecture 39 51 Key Generation 4052 Encryption 4253 Decryption 44

Appendix-I Simulation Results 47Appendix-II Synthesis Reports 50 Appendix- III Implementation 79

Appendix-IV Advantages 80Appendix-V Conclusion 81

Appendix-VI Bibliography 82

2

CH1 INTRODUCTION TO VLSI

The first digital circuit was designed by using electronic components like vacuum tubes and transistors Later Integrated Circuits (ICs) were invented where a designer can be able to place digital circuits on a chip consists of less than 10 gates for an IC called SSI (Small Scale Integration) scale With the advent of new fabrication techniques designer can place more than 100 gates on an IC called MSI (Medium Scale Integration) Using design at this level one can create digital sub blocks (adders multiplexes counters registers and etc) on an IC This level is LSI (Large Scale Integration) using this scale of integration people succeeded to make digital subsystems (Microprocessor IO peripheral devices and etc) on a chip

At this point design process started getting very complicated ie manually conversion from schematic level to gate level or gate level to layout level was becoming somewhat lengthy process and verifying the functionality of digital circuits at various levels became critical This created new challenges to digital designers as well as circuit designers Designers felt need to automate these processes In this process Rapid advances in Software Technology and development of new higher level programming languages taken place People could able to develop CADCAE (Computer Aided DesignComputer Aided Engineering) tools for design electronics circuits with assistance of software programs Functional verification and Logic verification of design can be done using CAD simulation tools with greater efficiency It became very easy to a designer to verify functionality of design at various levels

With advent of new technology ie CMOS (Complementary Metal Oxide Semiconductor) process technology One can fabricate a chip contains more than Million of gates At this point design process still became critical because of manual converting the design from one level to other Using latest CAD tools could solve the problem Existence of logic synthesis tools design engineer can easily translate to higher-level design description to lower levels This way of designing (using CAD tools) is certainly a revolution in electronic industry This may be leading to development of sophisticated electronic products for both consumer as well as business Designing Systems using Hardware always gives best results when compared to software (like Speed Reliability performance and etc) Using CMOS VLSI Design methodology designer could design and fabricate ICs without spending much time when compared to traditional way of designing

3

Specifications

Behavioral Description

RTL Description

Behavioral Simulation

Functional Simulation

Behavioral Synthesis

Logic Synthesis

Gate Level Net list

Constraints

Constraints

Library

AutomaticPampR

Layout

Logic simulation

Fabrication

Lay Out Management

12 TYPICAL IC DESIGN FLOW

4

Micron Technology

Micron SM DSM VDSM

13 MICRON TECHNOLOGY

The micron technology can be classified into 4 categories Evolving from micron technology and extending up to VDSM

Micron Technology The technology up to 10-6 m is the micron

Technology

Submicron Technology The technology below 1um is known as the Submicron technology It generally ranges up to 036m

DSM(Deep Sub Micron technology) The technology extending up to 018m is DSM

VDSM(Very Deep Sub Micron technology) The presently used technology is VDSM It ranges up to 009um

14 FEATURES

5

21 INTRODUCTION TO VHDL

VHDL is acronym for VHSIC hardware Description languageVHSIC is acronym for very high speed Integrated Circuits It is a hardware description language that can be used to model a digital system at many levels of abstraction ranging from the algorithmic level to the gate level

The VHDL language can be regarded as an integrated amalgamation of the following languages

Sequential language

Concurrent language

Net-list language

Timing specifications

Waveform generation language VHDL

This language not only defines the syntax but also defines very clear simulation semantics for each language construct Therefore models written in this language can be verified using a VHDL simulator This subset is usually sufficient to model most applications The complete language however has sufficient power to capture the descriptions of the most complex chips to a complete electronic system

HISTORY

The requirements for the language were first generated in 1988 under the VHSIC chips for the department of Defence (DOD) Reprocurement and reuse was also a big issue Thus a need for a standardized hardware description language for the design documentation and verification of the digital systems was generated The IEEE in the December 1987 standardized VHDL language this version of the language is known as the IEEE STD 1076-1987 The official language description appears in the IEEE standard VHDL language Reference manual available from IEEE The language has also been recognized as an American National Standards Institute (ANSI) standard

According to IEEE rules an IEEE standard has to be reballoted every 5 years so that it may remain a standard so that it may remain a standard Consequently the language was upgraded with new features the syntax of many constructs was made more uniform and many ambiguities present in the 1987 version of the language were resolved This new version of the language is known as the IEEE STD 1076-1993

6

22 CAPABILITIES

The following are the major capabilities that the language provides along with the features that the language provides along with the features that differentiate it from other hardware languages

The language can be used as exchange medium between chip vendors and CAD tool users Different chip vendors can provide VHDL descriptions of their components to system designers

The language can be used as a communication medium between different CAD and CAE tools

The language supports hierarchy that is a digital can be modeled as asset of interconnected components each component in turn can be modeled as a set of interconnected subcomponents

The language supports flexible design methodologies top-down bottom-up or mixed It supports both synchronous and asynchronous timing models

Various digital modeling techniques such as finite ndashstate machine descriptions and Boolean equations can be modeled using the language

The language is publicly available human-readable and machine-readable

The language supports three basic different styles Structural Dataflow and behavioral

It supports a wide range of abstraction levels ranging from abstract behavioral descriptions to very precise gate-level descriptions

Arbitrarily large designs can be modeled using the language and there are no limitations imposed by the language on the size of the design

23 HARDWARE ABSTRACTION

VHDL is used to describe a model for a digital hardware device This model specifies the external view of the device and one or more internal views The internal view of the device specifies functionality or structure while the external view specifies the interface of the device through which it communicates with the other modules in the environment In VHDL each device model is treated as a distinct representation of a unique device called an Entity The Entity is thus a hardware abstraction of the actual hardware device Each Entity is described using one model which contains one external view and one or more internal views

7

24 Basic terminology

VHDL is a hardware description language that can be used to model a digital system A hardware abstraction of this digital system is called an entity An entity X when used in another entity Y becomes a component for the entity YTo describe an entity VHDL provides five different types of primary constructs called design units They are

1 Entity declaration 2 Architecture body 3 Configuration declaration 4 Package declaration 5 Package body

1 An entity is modeled using an entity declaration and at least one architecture body the Entity declaration describes the external view of the entity

For example the input and output signal names2 The architecture body contains the internal description of the entity for

example as a set of interconnected components that represents the structure of the entity or a set of concurrent or sequential statements that represents the behavior of the entity

3 A configuration declaration is used to create a configuration for an entity It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity It may also specify the bindings of the architecture components used in the selected architecture body to other entities An entity may have any number of configurations

4 A package declaration encapsulates a set of related declarations such type of declaration s subtype declaration and subprogram declaration which can be shared across two or more design units

5 A package body contains the definition of subprogram declared in a package declaration

Once an entity has been modeled it needs to be validated by a VHDL system A typical VHDL system consists of an analyzer and a simulator The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks

The language is case insensitive that is lowercase and uppercase characters are treated alike the Language is also free format comments are specified in the language by preceding the text with two Consecutive dashes (- -)

Entity Declaration

The entity declaration specifies the name of entity being modeled and lists the set of inter face ports Ports are signals through which entity communicates with other models in its external environment

8

EXAMPLE

Entity declaration for the half adder circuit is

Entity half adder is Port (A B in Bit sum carry out Bit) End half adder

The entity called half adder has two input ports A and B and two out put ports sum and carry Bit is predefined type of the language

Architecture Body

An architecture body using any of the following modeling styles specifies the internal details of an entity

1 As a set of interconnected components (to represent structure)2 As a set of concurrent assignment statements (to represent data flow)3 As a set of sequential assignment statements (to represent behavior)4 As any combination of the above three

25 Structural style of modeling

In this one an entity is described as a set of interconnected components Such a model for the HALF_ADDER entity is described in a n architecture body

Architecture ha of ha isComponent Xor2 Port (X Y in BIT Zout BIT)End componentComponent And2 Port (L M in BIT NoutBIT) End component Begin X1 Xor2portmap (A B SUM) A1 AND2portmap (A B CARRY) End ha

The name of the architecture body is ha the entity declaration for half adder specifies the interface ports for this architecture body The architecture body is composed of two parts the declaration part and the statement part Two component declarations are present in the declarative part of the architecture body

The declared components are instantiated in the statement part of the architecture body using component instantiation The signals in the port map of a component instantiation and the port signals in the component declaration are associated by the position

9

DATAFLOW STYLE OF MODELING

In this modeling style the flow of data through the entity is expressed primarily using concurrent signal assignment statements The data flow model for the half adder is described using two concurrent signal assignment statements In a signal assignment statement the symbol lt=implies an assignment of a value to a signal

BEHAVIORAL STYLE OF MODELING

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specific order These sets of sequential statements which are specified inside a process statement do not explicitly specify the structure of the entity but merely its functionality A process statement is a concurrent statement that can appear with in an architecture body MIXED STYLE OF MODELING

It is possible to mix the three modeling styles in a single architecture body That is within an architecture body we could use component instantiation statements concurrent signal assignment statements and process statements

MODEL ANALYSIS

Once an entity is declared in VHDL it can be validated using analyzer and a simulator that are apart of a VHDL system The first step in the validation process is analysis The analyzer takes a file that contains one or more design units and compile s them into an intermediate form The generated intermediate form is stored in a specific design library that has been designated as the working library

There is a design library with the logic name STD predefined by the VHDL language environment This library contains two packages STANDARD and TEXTIO The STANDARD package contains declarations for all the predefined types of the language The TexTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations There also exists an IEEE standard package called STD_LOGIC_1164and contains its associated sub types overloaded operator functions and other useful utilities This standard is called the IEEE STD 1164 ndash1993

SIMULATION

For a hierarchical entity to be simulated all of its lowest ndashlevel components must be described at the behavioral level A simulation can be performed on either one of the following

1 An entity declaration and an architecture body pair

2 A configuration

10

Preceding the actual simulation are two major steps

1 Elaboration phase IN this phase the hierarchy of the entity is expanded and linked components are bound to entities in a library and the top- level entity is built as a network of behavioral models that is ready to be simulated

2 Initialization phase Driving and effective values for all explicitly declared signals are computed implicit signals are assigned values processes are executed once until they suspend and simulation time is set to 0ns

Simulation commences by advancing time to that of the next event Values that are assigned to signals at this time are assigned If the value of a signal changes and if that signal is present in the sensitivity list of a process the process is executed until it suspends Simulation stops when an assertion occurs depending on the implementation of the VHDL system or when the maximum time as defined by the language is reached

Entity Declaration

An entity declaration describes the external interface of the entity It specifies the name of the entity the names of the interface ports their mode and the type of ports The syntax for entity declaration is

Entity entity _name is [generic (list of ndashgenerics and ndashtheir types)] [port (list of interface-port-names-and their types )]

[entity item declarations] [begin entity statements] end [entity][entity name]

The entity ndashname is the name of the entity and the interface ports are the signals through which entity passes the information to and from its external environment Each interface port can have one of the following modes

1 in The value of an input port can only read with in the entity model 2 out The value of an out put port can only be updated within the entity model3 inout The value of a bi directional port can be read and updated within the entity

model4 buffer The value of a buffer port can be read and updated within the entity

model It cannot have more than one source

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration

ARCHITECTURE BODY

An architecture body describes the internal view of an entity It describes the functionality of the structure of the entity

11

Architecture ltarchitecture namegt oflt entity namegt is Begin Concurrent statements Process statements Block statements Concurrent signal assignment-statement Component ndashinstantiation-statement Generate statement End [architecture] [architecture name]

The concurrent statements describe the internal composition of the entity All

concurrent statements are executed in parallel The internal composition of an entity can be expressed in terms of structure dataflow and sequential behavior

Here we describe an entity by using the behavioral model A process statement which is a concurrent statement is the primary mechanism used to describe the functionality of an entity in this modeling style

26 PROCESS STATEMENT

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms The syntax for the process statement is

[Process-label] process [(sensitivity-list)] [is] begin sequential statements variable-assignment-statement signal assignment-statement wait statement if-statement case-statement loop-statement null-statement exit-statement next-statement assertion-statement report-statement procedure-call-statement return end process [process label]

A set of signals to which the process is sensitive is defined by the sensitivity list In other words each time an event occurs on any of the signals in the sensitivity list the sequential statements with in the process are executed in a sequential order that is in the order in which they appear The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list

12

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 2: Scalable Encryption Algorithm

CONTENTS

CHAPTER 1 Introduction to VLSI 9 11 Introduction 9 12 VLSI Design Style 10

13 VLSI Design Flow 11 14 VLSI Features 11

CHAPTER 2 Introduction to VHDL 1221 Introduction 1222 Capabilities 1323 Abstraction levels of VHDL 13 24 Basic Terminology 1425 Modeling Techniques for VHDL 1726 Process Statements 1827 Conditional Statements 19 28 Active HDL Overview 2129 Macro language 22 210 Compilation 23 211 Simulation 23 212 X Linix 24

CHAPTER 3 Introduction to SEA 26 31 Specifications 2732 Design properties 3033 Overall Structure 3134 Security Analysis 3135 Performance Analysis 35

CHAPTER 4 An Exposition Of SEA 37 41 Overview of SEA 38

CHAPTER 5 SEA Architecture 39 51 Key Generation 4052 Encryption 4253 Decryption 44

Appendix-I Simulation Results 47Appendix-II Synthesis Reports 50 Appendix- III Implementation 79

Appendix-IV Advantages 80Appendix-V Conclusion 81

Appendix-VI Bibliography 82

2

CH1 INTRODUCTION TO VLSI

The first digital circuit was designed by using electronic components like vacuum tubes and transistors Later Integrated Circuits (ICs) were invented where a designer can be able to place digital circuits on a chip consists of less than 10 gates for an IC called SSI (Small Scale Integration) scale With the advent of new fabrication techniques designer can place more than 100 gates on an IC called MSI (Medium Scale Integration) Using design at this level one can create digital sub blocks (adders multiplexes counters registers and etc) on an IC This level is LSI (Large Scale Integration) using this scale of integration people succeeded to make digital subsystems (Microprocessor IO peripheral devices and etc) on a chip

At this point design process started getting very complicated ie manually conversion from schematic level to gate level or gate level to layout level was becoming somewhat lengthy process and verifying the functionality of digital circuits at various levels became critical This created new challenges to digital designers as well as circuit designers Designers felt need to automate these processes In this process Rapid advances in Software Technology and development of new higher level programming languages taken place People could able to develop CADCAE (Computer Aided DesignComputer Aided Engineering) tools for design electronics circuits with assistance of software programs Functional verification and Logic verification of design can be done using CAD simulation tools with greater efficiency It became very easy to a designer to verify functionality of design at various levels

With advent of new technology ie CMOS (Complementary Metal Oxide Semiconductor) process technology One can fabricate a chip contains more than Million of gates At this point design process still became critical because of manual converting the design from one level to other Using latest CAD tools could solve the problem Existence of logic synthesis tools design engineer can easily translate to higher-level design description to lower levels This way of designing (using CAD tools) is certainly a revolution in electronic industry This may be leading to development of sophisticated electronic products for both consumer as well as business Designing Systems using Hardware always gives best results when compared to software (like Speed Reliability performance and etc) Using CMOS VLSI Design methodology designer could design and fabricate ICs without spending much time when compared to traditional way of designing

3

Specifications

Behavioral Description

RTL Description

Behavioral Simulation

Functional Simulation

Behavioral Synthesis

Logic Synthesis

Gate Level Net list

Constraints

Constraints

Library

AutomaticPampR

Layout

Logic simulation

Fabrication

Lay Out Management

12 TYPICAL IC DESIGN FLOW

4

Micron Technology

Micron SM DSM VDSM

13 MICRON TECHNOLOGY

The micron technology can be classified into 4 categories Evolving from micron technology and extending up to VDSM

Micron Technology The technology up to 10-6 m is the micron

Technology

Submicron Technology The technology below 1um is known as the Submicron technology It generally ranges up to 036m

DSM(Deep Sub Micron technology) The technology extending up to 018m is DSM

VDSM(Very Deep Sub Micron technology) The presently used technology is VDSM It ranges up to 009um

14 FEATURES

5

21 INTRODUCTION TO VHDL

VHDL is acronym for VHSIC hardware Description languageVHSIC is acronym for very high speed Integrated Circuits It is a hardware description language that can be used to model a digital system at many levels of abstraction ranging from the algorithmic level to the gate level

The VHDL language can be regarded as an integrated amalgamation of the following languages

Sequential language

Concurrent language

Net-list language

Timing specifications

Waveform generation language VHDL

This language not only defines the syntax but also defines very clear simulation semantics for each language construct Therefore models written in this language can be verified using a VHDL simulator This subset is usually sufficient to model most applications The complete language however has sufficient power to capture the descriptions of the most complex chips to a complete electronic system

HISTORY

The requirements for the language were first generated in 1988 under the VHSIC chips for the department of Defence (DOD) Reprocurement and reuse was also a big issue Thus a need for a standardized hardware description language for the design documentation and verification of the digital systems was generated The IEEE in the December 1987 standardized VHDL language this version of the language is known as the IEEE STD 1076-1987 The official language description appears in the IEEE standard VHDL language Reference manual available from IEEE The language has also been recognized as an American National Standards Institute (ANSI) standard

According to IEEE rules an IEEE standard has to be reballoted every 5 years so that it may remain a standard so that it may remain a standard Consequently the language was upgraded with new features the syntax of many constructs was made more uniform and many ambiguities present in the 1987 version of the language were resolved This new version of the language is known as the IEEE STD 1076-1993

6

22 CAPABILITIES

The following are the major capabilities that the language provides along with the features that the language provides along with the features that differentiate it from other hardware languages

The language can be used as exchange medium between chip vendors and CAD tool users Different chip vendors can provide VHDL descriptions of their components to system designers

The language can be used as a communication medium between different CAD and CAE tools

The language supports hierarchy that is a digital can be modeled as asset of interconnected components each component in turn can be modeled as a set of interconnected subcomponents

The language supports flexible design methodologies top-down bottom-up or mixed It supports both synchronous and asynchronous timing models

Various digital modeling techniques such as finite ndashstate machine descriptions and Boolean equations can be modeled using the language

The language is publicly available human-readable and machine-readable

The language supports three basic different styles Structural Dataflow and behavioral

It supports a wide range of abstraction levels ranging from abstract behavioral descriptions to very precise gate-level descriptions

Arbitrarily large designs can be modeled using the language and there are no limitations imposed by the language on the size of the design

23 HARDWARE ABSTRACTION

VHDL is used to describe a model for a digital hardware device This model specifies the external view of the device and one or more internal views The internal view of the device specifies functionality or structure while the external view specifies the interface of the device through which it communicates with the other modules in the environment In VHDL each device model is treated as a distinct representation of a unique device called an Entity The Entity is thus a hardware abstraction of the actual hardware device Each Entity is described using one model which contains one external view and one or more internal views

7

24 Basic terminology

VHDL is a hardware description language that can be used to model a digital system A hardware abstraction of this digital system is called an entity An entity X when used in another entity Y becomes a component for the entity YTo describe an entity VHDL provides five different types of primary constructs called design units They are

1 Entity declaration 2 Architecture body 3 Configuration declaration 4 Package declaration 5 Package body

1 An entity is modeled using an entity declaration and at least one architecture body the Entity declaration describes the external view of the entity

For example the input and output signal names2 The architecture body contains the internal description of the entity for

example as a set of interconnected components that represents the structure of the entity or a set of concurrent or sequential statements that represents the behavior of the entity

3 A configuration declaration is used to create a configuration for an entity It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity It may also specify the bindings of the architecture components used in the selected architecture body to other entities An entity may have any number of configurations

4 A package declaration encapsulates a set of related declarations such type of declaration s subtype declaration and subprogram declaration which can be shared across two or more design units

5 A package body contains the definition of subprogram declared in a package declaration

Once an entity has been modeled it needs to be validated by a VHDL system A typical VHDL system consists of an analyzer and a simulator The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks

The language is case insensitive that is lowercase and uppercase characters are treated alike the Language is also free format comments are specified in the language by preceding the text with two Consecutive dashes (- -)

Entity Declaration

The entity declaration specifies the name of entity being modeled and lists the set of inter face ports Ports are signals through which entity communicates with other models in its external environment

8

EXAMPLE

Entity declaration for the half adder circuit is

Entity half adder is Port (A B in Bit sum carry out Bit) End half adder

The entity called half adder has two input ports A and B and two out put ports sum and carry Bit is predefined type of the language

Architecture Body

An architecture body using any of the following modeling styles specifies the internal details of an entity

1 As a set of interconnected components (to represent structure)2 As a set of concurrent assignment statements (to represent data flow)3 As a set of sequential assignment statements (to represent behavior)4 As any combination of the above three

25 Structural style of modeling

In this one an entity is described as a set of interconnected components Such a model for the HALF_ADDER entity is described in a n architecture body

Architecture ha of ha isComponent Xor2 Port (X Y in BIT Zout BIT)End componentComponent And2 Port (L M in BIT NoutBIT) End component Begin X1 Xor2portmap (A B SUM) A1 AND2portmap (A B CARRY) End ha

The name of the architecture body is ha the entity declaration for half adder specifies the interface ports for this architecture body The architecture body is composed of two parts the declaration part and the statement part Two component declarations are present in the declarative part of the architecture body

The declared components are instantiated in the statement part of the architecture body using component instantiation The signals in the port map of a component instantiation and the port signals in the component declaration are associated by the position

9

DATAFLOW STYLE OF MODELING

In this modeling style the flow of data through the entity is expressed primarily using concurrent signal assignment statements The data flow model for the half adder is described using two concurrent signal assignment statements In a signal assignment statement the symbol lt=implies an assignment of a value to a signal

BEHAVIORAL STYLE OF MODELING

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specific order These sets of sequential statements which are specified inside a process statement do not explicitly specify the structure of the entity but merely its functionality A process statement is a concurrent statement that can appear with in an architecture body MIXED STYLE OF MODELING

It is possible to mix the three modeling styles in a single architecture body That is within an architecture body we could use component instantiation statements concurrent signal assignment statements and process statements

MODEL ANALYSIS

Once an entity is declared in VHDL it can be validated using analyzer and a simulator that are apart of a VHDL system The first step in the validation process is analysis The analyzer takes a file that contains one or more design units and compile s them into an intermediate form The generated intermediate form is stored in a specific design library that has been designated as the working library

There is a design library with the logic name STD predefined by the VHDL language environment This library contains two packages STANDARD and TEXTIO The STANDARD package contains declarations for all the predefined types of the language The TexTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations There also exists an IEEE standard package called STD_LOGIC_1164and contains its associated sub types overloaded operator functions and other useful utilities This standard is called the IEEE STD 1164 ndash1993

SIMULATION

For a hierarchical entity to be simulated all of its lowest ndashlevel components must be described at the behavioral level A simulation can be performed on either one of the following

1 An entity declaration and an architecture body pair

2 A configuration

10

Preceding the actual simulation are two major steps

1 Elaboration phase IN this phase the hierarchy of the entity is expanded and linked components are bound to entities in a library and the top- level entity is built as a network of behavioral models that is ready to be simulated

2 Initialization phase Driving and effective values for all explicitly declared signals are computed implicit signals are assigned values processes are executed once until they suspend and simulation time is set to 0ns

Simulation commences by advancing time to that of the next event Values that are assigned to signals at this time are assigned If the value of a signal changes and if that signal is present in the sensitivity list of a process the process is executed until it suspends Simulation stops when an assertion occurs depending on the implementation of the VHDL system or when the maximum time as defined by the language is reached

Entity Declaration

An entity declaration describes the external interface of the entity It specifies the name of the entity the names of the interface ports their mode and the type of ports The syntax for entity declaration is

Entity entity _name is [generic (list of ndashgenerics and ndashtheir types)] [port (list of interface-port-names-and their types )]

[entity item declarations] [begin entity statements] end [entity][entity name]

The entity ndashname is the name of the entity and the interface ports are the signals through which entity passes the information to and from its external environment Each interface port can have one of the following modes

1 in The value of an input port can only read with in the entity model 2 out The value of an out put port can only be updated within the entity model3 inout The value of a bi directional port can be read and updated within the entity

model4 buffer The value of a buffer port can be read and updated within the entity

model It cannot have more than one source

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration

ARCHITECTURE BODY

An architecture body describes the internal view of an entity It describes the functionality of the structure of the entity

11

Architecture ltarchitecture namegt oflt entity namegt is Begin Concurrent statements Process statements Block statements Concurrent signal assignment-statement Component ndashinstantiation-statement Generate statement End [architecture] [architecture name]

The concurrent statements describe the internal composition of the entity All

concurrent statements are executed in parallel The internal composition of an entity can be expressed in terms of structure dataflow and sequential behavior

Here we describe an entity by using the behavioral model A process statement which is a concurrent statement is the primary mechanism used to describe the functionality of an entity in this modeling style

26 PROCESS STATEMENT

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms The syntax for the process statement is

[Process-label] process [(sensitivity-list)] [is] begin sequential statements variable-assignment-statement signal assignment-statement wait statement if-statement case-statement loop-statement null-statement exit-statement next-statement assertion-statement report-statement procedure-call-statement return end process [process label]

A set of signals to which the process is sensitive is defined by the sensitivity list In other words each time an event occurs on any of the signals in the sensitivity list the sequential statements with in the process are executed in a sequential order that is in the order in which they appear The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list

12

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 3: Scalable Encryption Algorithm

CH1 INTRODUCTION TO VLSI

The first digital circuit was designed by using electronic components like vacuum tubes and transistors Later Integrated Circuits (ICs) were invented where a designer can be able to place digital circuits on a chip consists of less than 10 gates for an IC called SSI (Small Scale Integration) scale With the advent of new fabrication techniques designer can place more than 100 gates on an IC called MSI (Medium Scale Integration) Using design at this level one can create digital sub blocks (adders multiplexes counters registers and etc) on an IC This level is LSI (Large Scale Integration) using this scale of integration people succeeded to make digital subsystems (Microprocessor IO peripheral devices and etc) on a chip

At this point design process started getting very complicated ie manually conversion from schematic level to gate level or gate level to layout level was becoming somewhat lengthy process and verifying the functionality of digital circuits at various levels became critical This created new challenges to digital designers as well as circuit designers Designers felt need to automate these processes In this process Rapid advances in Software Technology and development of new higher level programming languages taken place People could able to develop CADCAE (Computer Aided DesignComputer Aided Engineering) tools for design electronics circuits with assistance of software programs Functional verification and Logic verification of design can be done using CAD simulation tools with greater efficiency It became very easy to a designer to verify functionality of design at various levels

With advent of new technology ie CMOS (Complementary Metal Oxide Semiconductor) process technology One can fabricate a chip contains more than Million of gates At this point design process still became critical because of manual converting the design from one level to other Using latest CAD tools could solve the problem Existence of logic synthesis tools design engineer can easily translate to higher-level design description to lower levels This way of designing (using CAD tools) is certainly a revolution in electronic industry This may be leading to development of sophisticated electronic products for both consumer as well as business Designing Systems using Hardware always gives best results when compared to software (like Speed Reliability performance and etc) Using CMOS VLSI Design methodology designer could design and fabricate ICs without spending much time when compared to traditional way of designing

3

Specifications

Behavioral Description

RTL Description

Behavioral Simulation

Functional Simulation

Behavioral Synthesis

Logic Synthesis

Gate Level Net list

Constraints

Constraints

Library

AutomaticPampR

Layout

Logic simulation

Fabrication

Lay Out Management

12 TYPICAL IC DESIGN FLOW

4

Micron Technology

Micron SM DSM VDSM

13 MICRON TECHNOLOGY

The micron technology can be classified into 4 categories Evolving from micron technology and extending up to VDSM

Micron Technology The technology up to 10-6 m is the micron

Technology

Submicron Technology The technology below 1um is known as the Submicron technology It generally ranges up to 036m

DSM(Deep Sub Micron technology) The technology extending up to 018m is DSM

VDSM(Very Deep Sub Micron technology) The presently used technology is VDSM It ranges up to 009um

14 FEATURES

5

21 INTRODUCTION TO VHDL

VHDL is acronym for VHSIC hardware Description languageVHSIC is acronym for very high speed Integrated Circuits It is a hardware description language that can be used to model a digital system at many levels of abstraction ranging from the algorithmic level to the gate level

The VHDL language can be regarded as an integrated amalgamation of the following languages

Sequential language

Concurrent language

Net-list language

Timing specifications

Waveform generation language VHDL

This language not only defines the syntax but also defines very clear simulation semantics for each language construct Therefore models written in this language can be verified using a VHDL simulator This subset is usually sufficient to model most applications The complete language however has sufficient power to capture the descriptions of the most complex chips to a complete electronic system

HISTORY

The requirements for the language were first generated in 1988 under the VHSIC chips for the department of Defence (DOD) Reprocurement and reuse was also a big issue Thus a need for a standardized hardware description language for the design documentation and verification of the digital systems was generated The IEEE in the December 1987 standardized VHDL language this version of the language is known as the IEEE STD 1076-1987 The official language description appears in the IEEE standard VHDL language Reference manual available from IEEE The language has also been recognized as an American National Standards Institute (ANSI) standard

According to IEEE rules an IEEE standard has to be reballoted every 5 years so that it may remain a standard so that it may remain a standard Consequently the language was upgraded with new features the syntax of many constructs was made more uniform and many ambiguities present in the 1987 version of the language were resolved This new version of the language is known as the IEEE STD 1076-1993

6

22 CAPABILITIES

The following are the major capabilities that the language provides along with the features that the language provides along with the features that differentiate it from other hardware languages

The language can be used as exchange medium between chip vendors and CAD tool users Different chip vendors can provide VHDL descriptions of their components to system designers

The language can be used as a communication medium between different CAD and CAE tools

The language supports hierarchy that is a digital can be modeled as asset of interconnected components each component in turn can be modeled as a set of interconnected subcomponents

The language supports flexible design methodologies top-down bottom-up or mixed It supports both synchronous and asynchronous timing models

Various digital modeling techniques such as finite ndashstate machine descriptions and Boolean equations can be modeled using the language

The language is publicly available human-readable and machine-readable

The language supports three basic different styles Structural Dataflow and behavioral

It supports a wide range of abstraction levels ranging from abstract behavioral descriptions to very precise gate-level descriptions

Arbitrarily large designs can be modeled using the language and there are no limitations imposed by the language on the size of the design

23 HARDWARE ABSTRACTION

VHDL is used to describe a model for a digital hardware device This model specifies the external view of the device and one or more internal views The internal view of the device specifies functionality or structure while the external view specifies the interface of the device through which it communicates with the other modules in the environment In VHDL each device model is treated as a distinct representation of a unique device called an Entity The Entity is thus a hardware abstraction of the actual hardware device Each Entity is described using one model which contains one external view and one or more internal views

7

24 Basic terminology

VHDL is a hardware description language that can be used to model a digital system A hardware abstraction of this digital system is called an entity An entity X when used in another entity Y becomes a component for the entity YTo describe an entity VHDL provides five different types of primary constructs called design units They are

1 Entity declaration 2 Architecture body 3 Configuration declaration 4 Package declaration 5 Package body

1 An entity is modeled using an entity declaration and at least one architecture body the Entity declaration describes the external view of the entity

For example the input and output signal names2 The architecture body contains the internal description of the entity for

example as a set of interconnected components that represents the structure of the entity or a set of concurrent or sequential statements that represents the behavior of the entity

3 A configuration declaration is used to create a configuration for an entity It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity It may also specify the bindings of the architecture components used in the selected architecture body to other entities An entity may have any number of configurations

4 A package declaration encapsulates a set of related declarations such type of declaration s subtype declaration and subprogram declaration which can be shared across two or more design units

5 A package body contains the definition of subprogram declared in a package declaration

Once an entity has been modeled it needs to be validated by a VHDL system A typical VHDL system consists of an analyzer and a simulator The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks

The language is case insensitive that is lowercase and uppercase characters are treated alike the Language is also free format comments are specified in the language by preceding the text with two Consecutive dashes (- -)

Entity Declaration

The entity declaration specifies the name of entity being modeled and lists the set of inter face ports Ports are signals through which entity communicates with other models in its external environment

8

EXAMPLE

Entity declaration for the half adder circuit is

Entity half adder is Port (A B in Bit sum carry out Bit) End half adder

The entity called half adder has two input ports A and B and two out put ports sum and carry Bit is predefined type of the language

Architecture Body

An architecture body using any of the following modeling styles specifies the internal details of an entity

1 As a set of interconnected components (to represent structure)2 As a set of concurrent assignment statements (to represent data flow)3 As a set of sequential assignment statements (to represent behavior)4 As any combination of the above three

25 Structural style of modeling

In this one an entity is described as a set of interconnected components Such a model for the HALF_ADDER entity is described in a n architecture body

Architecture ha of ha isComponent Xor2 Port (X Y in BIT Zout BIT)End componentComponent And2 Port (L M in BIT NoutBIT) End component Begin X1 Xor2portmap (A B SUM) A1 AND2portmap (A B CARRY) End ha

The name of the architecture body is ha the entity declaration for half adder specifies the interface ports for this architecture body The architecture body is composed of two parts the declaration part and the statement part Two component declarations are present in the declarative part of the architecture body

The declared components are instantiated in the statement part of the architecture body using component instantiation The signals in the port map of a component instantiation and the port signals in the component declaration are associated by the position

9

DATAFLOW STYLE OF MODELING

In this modeling style the flow of data through the entity is expressed primarily using concurrent signal assignment statements The data flow model for the half adder is described using two concurrent signal assignment statements In a signal assignment statement the symbol lt=implies an assignment of a value to a signal

BEHAVIORAL STYLE OF MODELING

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specific order These sets of sequential statements which are specified inside a process statement do not explicitly specify the structure of the entity but merely its functionality A process statement is a concurrent statement that can appear with in an architecture body MIXED STYLE OF MODELING

It is possible to mix the three modeling styles in a single architecture body That is within an architecture body we could use component instantiation statements concurrent signal assignment statements and process statements

MODEL ANALYSIS

Once an entity is declared in VHDL it can be validated using analyzer and a simulator that are apart of a VHDL system The first step in the validation process is analysis The analyzer takes a file that contains one or more design units and compile s them into an intermediate form The generated intermediate form is stored in a specific design library that has been designated as the working library

There is a design library with the logic name STD predefined by the VHDL language environment This library contains two packages STANDARD and TEXTIO The STANDARD package contains declarations for all the predefined types of the language The TexTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations There also exists an IEEE standard package called STD_LOGIC_1164and contains its associated sub types overloaded operator functions and other useful utilities This standard is called the IEEE STD 1164 ndash1993

SIMULATION

For a hierarchical entity to be simulated all of its lowest ndashlevel components must be described at the behavioral level A simulation can be performed on either one of the following

1 An entity declaration and an architecture body pair

2 A configuration

10

Preceding the actual simulation are two major steps

1 Elaboration phase IN this phase the hierarchy of the entity is expanded and linked components are bound to entities in a library and the top- level entity is built as a network of behavioral models that is ready to be simulated

2 Initialization phase Driving and effective values for all explicitly declared signals are computed implicit signals are assigned values processes are executed once until they suspend and simulation time is set to 0ns

Simulation commences by advancing time to that of the next event Values that are assigned to signals at this time are assigned If the value of a signal changes and if that signal is present in the sensitivity list of a process the process is executed until it suspends Simulation stops when an assertion occurs depending on the implementation of the VHDL system or when the maximum time as defined by the language is reached

Entity Declaration

An entity declaration describes the external interface of the entity It specifies the name of the entity the names of the interface ports their mode and the type of ports The syntax for entity declaration is

Entity entity _name is [generic (list of ndashgenerics and ndashtheir types)] [port (list of interface-port-names-and their types )]

[entity item declarations] [begin entity statements] end [entity][entity name]

The entity ndashname is the name of the entity and the interface ports are the signals through which entity passes the information to and from its external environment Each interface port can have one of the following modes

1 in The value of an input port can only read with in the entity model 2 out The value of an out put port can only be updated within the entity model3 inout The value of a bi directional port can be read and updated within the entity

model4 buffer The value of a buffer port can be read and updated within the entity

model It cannot have more than one source

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration

ARCHITECTURE BODY

An architecture body describes the internal view of an entity It describes the functionality of the structure of the entity

11

Architecture ltarchitecture namegt oflt entity namegt is Begin Concurrent statements Process statements Block statements Concurrent signal assignment-statement Component ndashinstantiation-statement Generate statement End [architecture] [architecture name]

The concurrent statements describe the internal composition of the entity All

concurrent statements are executed in parallel The internal composition of an entity can be expressed in terms of structure dataflow and sequential behavior

Here we describe an entity by using the behavioral model A process statement which is a concurrent statement is the primary mechanism used to describe the functionality of an entity in this modeling style

26 PROCESS STATEMENT

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms The syntax for the process statement is

[Process-label] process [(sensitivity-list)] [is] begin sequential statements variable-assignment-statement signal assignment-statement wait statement if-statement case-statement loop-statement null-statement exit-statement next-statement assertion-statement report-statement procedure-call-statement return end process [process label]

A set of signals to which the process is sensitive is defined by the sensitivity list In other words each time an event occurs on any of the signals in the sensitivity list the sequential statements with in the process are executed in a sequential order that is in the order in which they appear The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list

12

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 4: Scalable Encryption Algorithm

Specifications

Behavioral Description

RTL Description

Behavioral Simulation

Functional Simulation

Behavioral Synthesis

Logic Synthesis

Gate Level Net list

Constraints

Constraints

Library

AutomaticPampR

Layout

Logic simulation

Fabrication

Lay Out Management

12 TYPICAL IC DESIGN FLOW

4

Micron Technology

Micron SM DSM VDSM

13 MICRON TECHNOLOGY

The micron technology can be classified into 4 categories Evolving from micron technology and extending up to VDSM

Micron Technology The technology up to 10-6 m is the micron

Technology

Submicron Technology The technology below 1um is known as the Submicron technology It generally ranges up to 036m

DSM(Deep Sub Micron technology) The technology extending up to 018m is DSM

VDSM(Very Deep Sub Micron technology) The presently used technology is VDSM It ranges up to 009um

14 FEATURES

5

21 INTRODUCTION TO VHDL

VHDL is acronym for VHSIC hardware Description languageVHSIC is acronym for very high speed Integrated Circuits It is a hardware description language that can be used to model a digital system at many levels of abstraction ranging from the algorithmic level to the gate level

The VHDL language can be regarded as an integrated amalgamation of the following languages

Sequential language

Concurrent language

Net-list language

Timing specifications

Waveform generation language VHDL

This language not only defines the syntax but also defines very clear simulation semantics for each language construct Therefore models written in this language can be verified using a VHDL simulator This subset is usually sufficient to model most applications The complete language however has sufficient power to capture the descriptions of the most complex chips to a complete electronic system

HISTORY

The requirements for the language were first generated in 1988 under the VHSIC chips for the department of Defence (DOD) Reprocurement and reuse was also a big issue Thus a need for a standardized hardware description language for the design documentation and verification of the digital systems was generated The IEEE in the December 1987 standardized VHDL language this version of the language is known as the IEEE STD 1076-1987 The official language description appears in the IEEE standard VHDL language Reference manual available from IEEE The language has also been recognized as an American National Standards Institute (ANSI) standard

According to IEEE rules an IEEE standard has to be reballoted every 5 years so that it may remain a standard so that it may remain a standard Consequently the language was upgraded with new features the syntax of many constructs was made more uniform and many ambiguities present in the 1987 version of the language were resolved This new version of the language is known as the IEEE STD 1076-1993

6

22 CAPABILITIES

The following are the major capabilities that the language provides along with the features that the language provides along with the features that differentiate it from other hardware languages

The language can be used as exchange medium between chip vendors and CAD tool users Different chip vendors can provide VHDL descriptions of their components to system designers

The language can be used as a communication medium between different CAD and CAE tools

The language supports hierarchy that is a digital can be modeled as asset of interconnected components each component in turn can be modeled as a set of interconnected subcomponents

The language supports flexible design methodologies top-down bottom-up or mixed It supports both synchronous and asynchronous timing models

Various digital modeling techniques such as finite ndashstate machine descriptions and Boolean equations can be modeled using the language

The language is publicly available human-readable and machine-readable

The language supports three basic different styles Structural Dataflow and behavioral

It supports a wide range of abstraction levels ranging from abstract behavioral descriptions to very precise gate-level descriptions

Arbitrarily large designs can be modeled using the language and there are no limitations imposed by the language on the size of the design

23 HARDWARE ABSTRACTION

VHDL is used to describe a model for a digital hardware device This model specifies the external view of the device and one or more internal views The internal view of the device specifies functionality or structure while the external view specifies the interface of the device through which it communicates with the other modules in the environment In VHDL each device model is treated as a distinct representation of a unique device called an Entity The Entity is thus a hardware abstraction of the actual hardware device Each Entity is described using one model which contains one external view and one or more internal views

7

24 Basic terminology

VHDL is a hardware description language that can be used to model a digital system A hardware abstraction of this digital system is called an entity An entity X when used in another entity Y becomes a component for the entity YTo describe an entity VHDL provides five different types of primary constructs called design units They are

1 Entity declaration 2 Architecture body 3 Configuration declaration 4 Package declaration 5 Package body

1 An entity is modeled using an entity declaration and at least one architecture body the Entity declaration describes the external view of the entity

For example the input and output signal names2 The architecture body contains the internal description of the entity for

example as a set of interconnected components that represents the structure of the entity or a set of concurrent or sequential statements that represents the behavior of the entity

3 A configuration declaration is used to create a configuration for an entity It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity It may also specify the bindings of the architecture components used in the selected architecture body to other entities An entity may have any number of configurations

4 A package declaration encapsulates a set of related declarations such type of declaration s subtype declaration and subprogram declaration which can be shared across two or more design units

5 A package body contains the definition of subprogram declared in a package declaration

Once an entity has been modeled it needs to be validated by a VHDL system A typical VHDL system consists of an analyzer and a simulator The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks

The language is case insensitive that is lowercase and uppercase characters are treated alike the Language is also free format comments are specified in the language by preceding the text with two Consecutive dashes (- -)

Entity Declaration

The entity declaration specifies the name of entity being modeled and lists the set of inter face ports Ports are signals through which entity communicates with other models in its external environment

8

EXAMPLE

Entity declaration for the half adder circuit is

Entity half adder is Port (A B in Bit sum carry out Bit) End half adder

The entity called half adder has two input ports A and B and two out put ports sum and carry Bit is predefined type of the language

Architecture Body

An architecture body using any of the following modeling styles specifies the internal details of an entity

1 As a set of interconnected components (to represent structure)2 As a set of concurrent assignment statements (to represent data flow)3 As a set of sequential assignment statements (to represent behavior)4 As any combination of the above three

25 Structural style of modeling

In this one an entity is described as a set of interconnected components Such a model for the HALF_ADDER entity is described in a n architecture body

Architecture ha of ha isComponent Xor2 Port (X Y in BIT Zout BIT)End componentComponent And2 Port (L M in BIT NoutBIT) End component Begin X1 Xor2portmap (A B SUM) A1 AND2portmap (A B CARRY) End ha

The name of the architecture body is ha the entity declaration for half adder specifies the interface ports for this architecture body The architecture body is composed of two parts the declaration part and the statement part Two component declarations are present in the declarative part of the architecture body

The declared components are instantiated in the statement part of the architecture body using component instantiation The signals in the port map of a component instantiation and the port signals in the component declaration are associated by the position

9

DATAFLOW STYLE OF MODELING

In this modeling style the flow of data through the entity is expressed primarily using concurrent signal assignment statements The data flow model for the half adder is described using two concurrent signal assignment statements In a signal assignment statement the symbol lt=implies an assignment of a value to a signal

BEHAVIORAL STYLE OF MODELING

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specific order These sets of sequential statements which are specified inside a process statement do not explicitly specify the structure of the entity but merely its functionality A process statement is a concurrent statement that can appear with in an architecture body MIXED STYLE OF MODELING

It is possible to mix the three modeling styles in a single architecture body That is within an architecture body we could use component instantiation statements concurrent signal assignment statements and process statements

MODEL ANALYSIS

Once an entity is declared in VHDL it can be validated using analyzer and a simulator that are apart of a VHDL system The first step in the validation process is analysis The analyzer takes a file that contains one or more design units and compile s them into an intermediate form The generated intermediate form is stored in a specific design library that has been designated as the working library

There is a design library with the logic name STD predefined by the VHDL language environment This library contains two packages STANDARD and TEXTIO The STANDARD package contains declarations for all the predefined types of the language The TexTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations There also exists an IEEE standard package called STD_LOGIC_1164and contains its associated sub types overloaded operator functions and other useful utilities This standard is called the IEEE STD 1164 ndash1993

SIMULATION

For a hierarchical entity to be simulated all of its lowest ndashlevel components must be described at the behavioral level A simulation can be performed on either one of the following

1 An entity declaration and an architecture body pair

2 A configuration

10

Preceding the actual simulation are two major steps

1 Elaboration phase IN this phase the hierarchy of the entity is expanded and linked components are bound to entities in a library and the top- level entity is built as a network of behavioral models that is ready to be simulated

2 Initialization phase Driving and effective values for all explicitly declared signals are computed implicit signals are assigned values processes are executed once until they suspend and simulation time is set to 0ns

Simulation commences by advancing time to that of the next event Values that are assigned to signals at this time are assigned If the value of a signal changes and if that signal is present in the sensitivity list of a process the process is executed until it suspends Simulation stops when an assertion occurs depending on the implementation of the VHDL system or when the maximum time as defined by the language is reached

Entity Declaration

An entity declaration describes the external interface of the entity It specifies the name of the entity the names of the interface ports their mode and the type of ports The syntax for entity declaration is

Entity entity _name is [generic (list of ndashgenerics and ndashtheir types)] [port (list of interface-port-names-and their types )]

[entity item declarations] [begin entity statements] end [entity][entity name]

The entity ndashname is the name of the entity and the interface ports are the signals through which entity passes the information to and from its external environment Each interface port can have one of the following modes

1 in The value of an input port can only read with in the entity model 2 out The value of an out put port can only be updated within the entity model3 inout The value of a bi directional port can be read and updated within the entity

model4 buffer The value of a buffer port can be read and updated within the entity

model It cannot have more than one source

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration

ARCHITECTURE BODY

An architecture body describes the internal view of an entity It describes the functionality of the structure of the entity

11

Architecture ltarchitecture namegt oflt entity namegt is Begin Concurrent statements Process statements Block statements Concurrent signal assignment-statement Component ndashinstantiation-statement Generate statement End [architecture] [architecture name]

The concurrent statements describe the internal composition of the entity All

concurrent statements are executed in parallel The internal composition of an entity can be expressed in terms of structure dataflow and sequential behavior

Here we describe an entity by using the behavioral model A process statement which is a concurrent statement is the primary mechanism used to describe the functionality of an entity in this modeling style

26 PROCESS STATEMENT

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms The syntax for the process statement is

[Process-label] process [(sensitivity-list)] [is] begin sequential statements variable-assignment-statement signal assignment-statement wait statement if-statement case-statement loop-statement null-statement exit-statement next-statement assertion-statement report-statement procedure-call-statement return end process [process label]

A set of signals to which the process is sensitive is defined by the sensitivity list In other words each time an event occurs on any of the signals in the sensitivity list the sequential statements with in the process are executed in a sequential order that is in the order in which they appear The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list

12

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 5: Scalable Encryption Algorithm

Micron Technology

Micron SM DSM VDSM

13 MICRON TECHNOLOGY

The micron technology can be classified into 4 categories Evolving from micron technology and extending up to VDSM

Micron Technology The technology up to 10-6 m is the micron

Technology

Submicron Technology The technology below 1um is known as the Submicron technology It generally ranges up to 036m

DSM(Deep Sub Micron technology) The technology extending up to 018m is DSM

VDSM(Very Deep Sub Micron technology) The presently used technology is VDSM It ranges up to 009um

14 FEATURES

5

21 INTRODUCTION TO VHDL

VHDL is acronym for VHSIC hardware Description languageVHSIC is acronym for very high speed Integrated Circuits It is a hardware description language that can be used to model a digital system at many levels of abstraction ranging from the algorithmic level to the gate level

The VHDL language can be regarded as an integrated amalgamation of the following languages

Sequential language

Concurrent language

Net-list language

Timing specifications

Waveform generation language VHDL

This language not only defines the syntax but also defines very clear simulation semantics for each language construct Therefore models written in this language can be verified using a VHDL simulator This subset is usually sufficient to model most applications The complete language however has sufficient power to capture the descriptions of the most complex chips to a complete electronic system

HISTORY

The requirements for the language were first generated in 1988 under the VHSIC chips for the department of Defence (DOD) Reprocurement and reuse was also a big issue Thus a need for a standardized hardware description language for the design documentation and verification of the digital systems was generated The IEEE in the December 1987 standardized VHDL language this version of the language is known as the IEEE STD 1076-1987 The official language description appears in the IEEE standard VHDL language Reference manual available from IEEE The language has also been recognized as an American National Standards Institute (ANSI) standard

According to IEEE rules an IEEE standard has to be reballoted every 5 years so that it may remain a standard so that it may remain a standard Consequently the language was upgraded with new features the syntax of many constructs was made more uniform and many ambiguities present in the 1987 version of the language were resolved This new version of the language is known as the IEEE STD 1076-1993

6

22 CAPABILITIES

The following are the major capabilities that the language provides along with the features that the language provides along with the features that differentiate it from other hardware languages

The language can be used as exchange medium between chip vendors and CAD tool users Different chip vendors can provide VHDL descriptions of their components to system designers

The language can be used as a communication medium between different CAD and CAE tools

The language supports hierarchy that is a digital can be modeled as asset of interconnected components each component in turn can be modeled as a set of interconnected subcomponents

The language supports flexible design methodologies top-down bottom-up or mixed It supports both synchronous and asynchronous timing models

Various digital modeling techniques such as finite ndashstate machine descriptions and Boolean equations can be modeled using the language

The language is publicly available human-readable and machine-readable

The language supports three basic different styles Structural Dataflow and behavioral

It supports a wide range of abstraction levels ranging from abstract behavioral descriptions to very precise gate-level descriptions

Arbitrarily large designs can be modeled using the language and there are no limitations imposed by the language on the size of the design

23 HARDWARE ABSTRACTION

VHDL is used to describe a model for a digital hardware device This model specifies the external view of the device and one or more internal views The internal view of the device specifies functionality or structure while the external view specifies the interface of the device through which it communicates with the other modules in the environment In VHDL each device model is treated as a distinct representation of a unique device called an Entity The Entity is thus a hardware abstraction of the actual hardware device Each Entity is described using one model which contains one external view and one or more internal views

7

24 Basic terminology

VHDL is a hardware description language that can be used to model a digital system A hardware abstraction of this digital system is called an entity An entity X when used in another entity Y becomes a component for the entity YTo describe an entity VHDL provides five different types of primary constructs called design units They are

1 Entity declaration 2 Architecture body 3 Configuration declaration 4 Package declaration 5 Package body

1 An entity is modeled using an entity declaration and at least one architecture body the Entity declaration describes the external view of the entity

For example the input and output signal names2 The architecture body contains the internal description of the entity for

example as a set of interconnected components that represents the structure of the entity or a set of concurrent or sequential statements that represents the behavior of the entity

3 A configuration declaration is used to create a configuration for an entity It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity It may also specify the bindings of the architecture components used in the selected architecture body to other entities An entity may have any number of configurations

4 A package declaration encapsulates a set of related declarations such type of declaration s subtype declaration and subprogram declaration which can be shared across two or more design units

5 A package body contains the definition of subprogram declared in a package declaration

Once an entity has been modeled it needs to be validated by a VHDL system A typical VHDL system consists of an analyzer and a simulator The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks

The language is case insensitive that is lowercase and uppercase characters are treated alike the Language is also free format comments are specified in the language by preceding the text with two Consecutive dashes (- -)

Entity Declaration

The entity declaration specifies the name of entity being modeled and lists the set of inter face ports Ports are signals through which entity communicates with other models in its external environment

8

EXAMPLE

Entity declaration for the half adder circuit is

Entity half adder is Port (A B in Bit sum carry out Bit) End half adder

The entity called half adder has two input ports A and B and two out put ports sum and carry Bit is predefined type of the language

Architecture Body

An architecture body using any of the following modeling styles specifies the internal details of an entity

1 As a set of interconnected components (to represent structure)2 As a set of concurrent assignment statements (to represent data flow)3 As a set of sequential assignment statements (to represent behavior)4 As any combination of the above three

25 Structural style of modeling

In this one an entity is described as a set of interconnected components Such a model for the HALF_ADDER entity is described in a n architecture body

Architecture ha of ha isComponent Xor2 Port (X Y in BIT Zout BIT)End componentComponent And2 Port (L M in BIT NoutBIT) End component Begin X1 Xor2portmap (A B SUM) A1 AND2portmap (A B CARRY) End ha

The name of the architecture body is ha the entity declaration for half adder specifies the interface ports for this architecture body The architecture body is composed of two parts the declaration part and the statement part Two component declarations are present in the declarative part of the architecture body

The declared components are instantiated in the statement part of the architecture body using component instantiation The signals in the port map of a component instantiation and the port signals in the component declaration are associated by the position

9

DATAFLOW STYLE OF MODELING

In this modeling style the flow of data through the entity is expressed primarily using concurrent signal assignment statements The data flow model for the half adder is described using two concurrent signal assignment statements In a signal assignment statement the symbol lt=implies an assignment of a value to a signal

BEHAVIORAL STYLE OF MODELING

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specific order These sets of sequential statements which are specified inside a process statement do not explicitly specify the structure of the entity but merely its functionality A process statement is a concurrent statement that can appear with in an architecture body MIXED STYLE OF MODELING

It is possible to mix the three modeling styles in a single architecture body That is within an architecture body we could use component instantiation statements concurrent signal assignment statements and process statements

MODEL ANALYSIS

Once an entity is declared in VHDL it can be validated using analyzer and a simulator that are apart of a VHDL system The first step in the validation process is analysis The analyzer takes a file that contains one or more design units and compile s them into an intermediate form The generated intermediate form is stored in a specific design library that has been designated as the working library

There is a design library with the logic name STD predefined by the VHDL language environment This library contains two packages STANDARD and TEXTIO The STANDARD package contains declarations for all the predefined types of the language The TexTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations There also exists an IEEE standard package called STD_LOGIC_1164and contains its associated sub types overloaded operator functions and other useful utilities This standard is called the IEEE STD 1164 ndash1993

SIMULATION

For a hierarchical entity to be simulated all of its lowest ndashlevel components must be described at the behavioral level A simulation can be performed on either one of the following

1 An entity declaration and an architecture body pair

2 A configuration

10

Preceding the actual simulation are two major steps

1 Elaboration phase IN this phase the hierarchy of the entity is expanded and linked components are bound to entities in a library and the top- level entity is built as a network of behavioral models that is ready to be simulated

2 Initialization phase Driving and effective values for all explicitly declared signals are computed implicit signals are assigned values processes are executed once until they suspend and simulation time is set to 0ns

Simulation commences by advancing time to that of the next event Values that are assigned to signals at this time are assigned If the value of a signal changes and if that signal is present in the sensitivity list of a process the process is executed until it suspends Simulation stops when an assertion occurs depending on the implementation of the VHDL system or when the maximum time as defined by the language is reached

Entity Declaration

An entity declaration describes the external interface of the entity It specifies the name of the entity the names of the interface ports their mode and the type of ports The syntax for entity declaration is

Entity entity _name is [generic (list of ndashgenerics and ndashtheir types)] [port (list of interface-port-names-and their types )]

[entity item declarations] [begin entity statements] end [entity][entity name]

The entity ndashname is the name of the entity and the interface ports are the signals through which entity passes the information to and from its external environment Each interface port can have one of the following modes

1 in The value of an input port can only read with in the entity model 2 out The value of an out put port can only be updated within the entity model3 inout The value of a bi directional port can be read and updated within the entity

model4 buffer The value of a buffer port can be read and updated within the entity

model It cannot have more than one source

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration

ARCHITECTURE BODY

An architecture body describes the internal view of an entity It describes the functionality of the structure of the entity

11

Architecture ltarchitecture namegt oflt entity namegt is Begin Concurrent statements Process statements Block statements Concurrent signal assignment-statement Component ndashinstantiation-statement Generate statement End [architecture] [architecture name]

The concurrent statements describe the internal composition of the entity All

concurrent statements are executed in parallel The internal composition of an entity can be expressed in terms of structure dataflow and sequential behavior

Here we describe an entity by using the behavioral model A process statement which is a concurrent statement is the primary mechanism used to describe the functionality of an entity in this modeling style

26 PROCESS STATEMENT

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms The syntax for the process statement is

[Process-label] process [(sensitivity-list)] [is] begin sequential statements variable-assignment-statement signal assignment-statement wait statement if-statement case-statement loop-statement null-statement exit-statement next-statement assertion-statement report-statement procedure-call-statement return end process [process label]

A set of signals to which the process is sensitive is defined by the sensitivity list In other words each time an event occurs on any of the signals in the sensitivity list the sequential statements with in the process are executed in a sequential order that is in the order in which they appear The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list

12

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 6: Scalable Encryption Algorithm

21 INTRODUCTION TO VHDL

VHDL is acronym for VHSIC hardware Description languageVHSIC is acronym for very high speed Integrated Circuits It is a hardware description language that can be used to model a digital system at many levels of abstraction ranging from the algorithmic level to the gate level

The VHDL language can be regarded as an integrated amalgamation of the following languages

Sequential language

Concurrent language

Net-list language

Timing specifications

Waveform generation language VHDL

This language not only defines the syntax but also defines very clear simulation semantics for each language construct Therefore models written in this language can be verified using a VHDL simulator This subset is usually sufficient to model most applications The complete language however has sufficient power to capture the descriptions of the most complex chips to a complete electronic system

HISTORY

The requirements for the language were first generated in 1988 under the VHSIC chips for the department of Defence (DOD) Reprocurement and reuse was also a big issue Thus a need for a standardized hardware description language for the design documentation and verification of the digital systems was generated The IEEE in the December 1987 standardized VHDL language this version of the language is known as the IEEE STD 1076-1987 The official language description appears in the IEEE standard VHDL language Reference manual available from IEEE The language has also been recognized as an American National Standards Institute (ANSI) standard

According to IEEE rules an IEEE standard has to be reballoted every 5 years so that it may remain a standard so that it may remain a standard Consequently the language was upgraded with new features the syntax of many constructs was made more uniform and many ambiguities present in the 1987 version of the language were resolved This new version of the language is known as the IEEE STD 1076-1993

6

22 CAPABILITIES

The following are the major capabilities that the language provides along with the features that the language provides along with the features that differentiate it from other hardware languages

The language can be used as exchange medium between chip vendors and CAD tool users Different chip vendors can provide VHDL descriptions of their components to system designers

The language can be used as a communication medium between different CAD and CAE tools

The language supports hierarchy that is a digital can be modeled as asset of interconnected components each component in turn can be modeled as a set of interconnected subcomponents

The language supports flexible design methodologies top-down bottom-up or mixed It supports both synchronous and asynchronous timing models

Various digital modeling techniques such as finite ndashstate machine descriptions and Boolean equations can be modeled using the language

The language is publicly available human-readable and machine-readable

The language supports three basic different styles Structural Dataflow and behavioral

It supports a wide range of abstraction levels ranging from abstract behavioral descriptions to very precise gate-level descriptions

Arbitrarily large designs can be modeled using the language and there are no limitations imposed by the language on the size of the design

23 HARDWARE ABSTRACTION

VHDL is used to describe a model for a digital hardware device This model specifies the external view of the device and one or more internal views The internal view of the device specifies functionality or structure while the external view specifies the interface of the device through which it communicates with the other modules in the environment In VHDL each device model is treated as a distinct representation of a unique device called an Entity The Entity is thus a hardware abstraction of the actual hardware device Each Entity is described using one model which contains one external view and one or more internal views

7

24 Basic terminology

VHDL is a hardware description language that can be used to model a digital system A hardware abstraction of this digital system is called an entity An entity X when used in another entity Y becomes a component for the entity YTo describe an entity VHDL provides five different types of primary constructs called design units They are

1 Entity declaration 2 Architecture body 3 Configuration declaration 4 Package declaration 5 Package body

1 An entity is modeled using an entity declaration and at least one architecture body the Entity declaration describes the external view of the entity

For example the input and output signal names2 The architecture body contains the internal description of the entity for

example as a set of interconnected components that represents the structure of the entity or a set of concurrent or sequential statements that represents the behavior of the entity

3 A configuration declaration is used to create a configuration for an entity It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity It may also specify the bindings of the architecture components used in the selected architecture body to other entities An entity may have any number of configurations

4 A package declaration encapsulates a set of related declarations such type of declaration s subtype declaration and subprogram declaration which can be shared across two or more design units

5 A package body contains the definition of subprogram declared in a package declaration

Once an entity has been modeled it needs to be validated by a VHDL system A typical VHDL system consists of an analyzer and a simulator The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks

The language is case insensitive that is lowercase and uppercase characters are treated alike the Language is also free format comments are specified in the language by preceding the text with two Consecutive dashes (- -)

Entity Declaration

The entity declaration specifies the name of entity being modeled and lists the set of inter face ports Ports are signals through which entity communicates with other models in its external environment

8

EXAMPLE

Entity declaration for the half adder circuit is

Entity half adder is Port (A B in Bit sum carry out Bit) End half adder

The entity called half adder has two input ports A and B and two out put ports sum and carry Bit is predefined type of the language

Architecture Body

An architecture body using any of the following modeling styles specifies the internal details of an entity

1 As a set of interconnected components (to represent structure)2 As a set of concurrent assignment statements (to represent data flow)3 As a set of sequential assignment statements (to represent behavior)4 As any combination of the above three

25 Structural style of modeling

In this one an entity is described as a set of interconnected components Such a model for the HALF_ADDER entity is described in a n architecture body

Architecture ha of ha isComponent Xor2 Port (X Y in BIT Zout BIT)End componentComponent And2 Port (L M in BIT NoutBIT) End component Begin X1 Xor2portmap (A B SUM) A1 AND2portmap (A B CARRY) End ha

The name of the architecture body is ha the entity declaration for half adder specifies the interface ports for this architecture body The architecture body is composed of two parts the declaration part and the statement part Two component declarations are present in the declarative part of the architecture body

The declared components are instantiated in the statement part of the architecture body using component instantiation The signals in the port map of a component instantiation and the port signals in the component declaration are associated by the position

9

DATAFLOW STYLE OF MODELING

In this modeling style the flow of data through the entity is expressed primarily using concurrent signal assignment statements The data flow model for the half adder is described using two concurrent signal assignment statements In a signal assignment statement the symbol lt=implies an assignment of a value to a signal

BEHAVIORAL STYLE OF MODELING

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specific order These sets of sequential statements which are specified inside a process statement do not explicitly specify the structure of the entity but merely its functionality A process statement is a concurrent statement that can appear with in an architecture body MIXED STYLE OF MODELING

It is possible to mix the three modeling styles in a single architecture body That is within an architecture body we could use component instantiation statements concurrent signal assignment statements and process statements

MODEL ANALYSIS

Once an entity is declared in VHDL it can be validated using analyzer and a simulator that are apart of a VHDL system The first step in the validation process is analysis The analyzer takes a file that contains one or more design units and compile s them into an intermediate form The generated intermediate form is stored in a specific design library that has been designated as the working library

There is a design library with the logic name STD predefined by the VHDL language environment This library contains two packages STANDARD and TEXTIO The STANDARD package contains declarations for all the predefined types of the language The TexTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations There also exists an IEEE standard package called STD_LOGIC_1164and contains its associated sub types overloaded operator functions and other useful utilities This standard is called the IEEE STD 1164 ndash1993

SIMULATION

For a hierarchical entity to be simulated all of its lowest ndashlevel components must be described at the behavioral level A simulation can be performed on either one of the following

1 An entity declaration and an architecture body pair

2 A configuration

10

Preceding the actual simulation are two major steps

1 Elaboration phase IN this phase the hierarchy of the entity is expanded and linked components are bound to entities in a library and the top- level entity is built as a network of behavioral models that is ready to be simulated

2 Initialization phase Driving and effective values for all explicitly declared signals are computed implicit signals are assigned values processes are executed once until they suspend and simulation time is set to 0ns

Simulation commences by advancing time to that of the next event Values that are assigned to signals at this time are assigned If the value of a signal changes and if that signal is present in the sensitivity list of a process the process is executed until it suspends Simulation stops when an assertion occurs depending on the implementation of the VHDL system or when the maximum time as defined by the language is reached

Entity Declaration

An entity declaration describes the external interface of the entity It specifies the name of the entity the names of the interface ports their mode and the type of ports The syntax for entity declaration is

Entity entity _name is [generic (list of ndashgenerics and ndashtheir types)] [port (list of interface-port-names-and their types )]

[entity item declarations] [begin entity statements] end [entity][entity name]

The entity ndashname is the name of the entity and the interface ports are the signals through which entity passes the information to and from its external environment Each interface port can have one of the following modes

1 in The value of an input port can only read with in the entity model 2 out The value of an out put port can only be updated within the entity model3 inout The value of a bi directional port can be read and updated within the entity

model4 buffer The value of a buffer port can be read and updated within the entity

model It cannot have more than one source

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration

ARCHITECTURE BODY

An architecture body describes the internal view of an entity It describes the functionality of the structure of the entity

11

Architecture ltarchitecture namegt oflt entity namegt is Begin Concurrent statements Process statements Block statements Concurrent signal assignment-statement Component ndashinstantiation-statement Generate statement End [architecture] [architecture name]

The concurrent statements describe the internal composition of the entity All

concurrent statements are executed in parallel The internal composition of an entity can be expressed in terms of structure dataflow and sequential behavior

Here we describe an entity by using the behavioral model A process statement which is a concurrent statement is the primary mechanism used to describe the functionality of an entity in this modeling style

26 PROCESS STATEMENT

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms The syntax for the process statement is

[Process-label] process [(sensitivity-list)] [is] begin sequential statements variable-assignment-statement signal assignment-statement wait statement if-statement case-statement loop-statement null-statement exit-statement next-statement assertion-statement report-statement procedure-call-statement return end process [process label]

A set of signals to which the process is sensitive is defined by the sensitivity list In other words each time an event occurs on any of the signals in the sensitivity list the sequential statements with in the process are executed in a sequential order that is in the order in which they appear The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list

12

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 7: Scalable Encryption Algorithm

22 CAPABILITIES

The following are the major capabilities that the language provides along with the features that the language provides along with the features that differentiate it from other hardware languages

The language can be used as exchange medium between chip vendors and CAD tool users Different chip vendors can provide VHDL descriptions of their components to system designers

The language can be used as a communication medium between different CAD and CAE tools

The language supports hierarchy that is a digital can be modeled as asset of interconnected components each component in turn can be modeled as a set of interconnected subcomponents

The language supports flexible design methodologies top-down bottom-up or mixed It supports both synchronous and asynchronous timing models

Various digital modeling techniques such as finite ndashstate machine descriptions and Boolean equations can be modeled using the language

The language is publicly available human-readable and machine-readable

The language supports three basic different styles Structural Dataflow and behavioral

It supports a wide range of abstraction levels ranging from abstract behavioral descriptions to very precise gate-level descriptions

Arbitrarily large designs can be modeled using the language and there are no limitations imposed by the language on the size of the design

23 HARDWARE ABSTRACTION

VHDL is used to describe a model for a digital hardware device This model specifies the external view of the device and one or more internal views The internal view of the device specifies functionality or structure while the external view specifies the interface of the device through which it communicates with the other modules in the environment In VHDL each device model is treated as a distinct representation of a unique device called an Entity The Entity is thus a hardware abstraction of the actual hardware device Each Entity is described using one model which contains one external view and one or more internal views

7

24 Basic terminology

VHDL is a hardware description language that can be used to model a digital system A hardware abstraction of this digital system is called an entity An entity X when used in another entity Y becomes a component for the entity YTo describe an entity VHDL provides five different types of primary constructs called design units They are

1 Entity declaration 2 Architecture body 3 Configuration declaration 4 Package declaration 5 Package body

1 An entity is modeled using an entity declaration and at least one architecture body the Entity declaration describes the external view of the entity

For example the input and output signal names2 The architecture body contains the internal description of the entity for

example as a set of interconnected components that represents the structure of the entity or a set of concurrent or sequential statements that represents the behavior of the entity

3 A configuration declaration is used to create a configuration for an entity It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity It may also specify the bindings of the architecture components used in the selected architecture body to other entities An entity may have any number of configurations

4 A package declaration encapsulates a set of related declarations such type of declaration s subtype declaration and subprogram declaration which can be shared across two or more design units

5 A package body contains the definition of subprogram declared in a package declaration

Once an entity has been modeled it needs to be validated by a VHDL system A typical VHDL system consists of an analyzer and a simulator The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks

The language is case insensitive that is lowercase and uppercase characters are treated alike the Language is also free format comments are specified in the language by preceding the text with two Consecutive dashes (- -)

Entity Declaration

The entity declaration specifies the name of entity being modeled and lists the set of inter face ports Ports are signals through which entity communicates with other models in its external environment

8

EXAMPLE

Entity declaration for the half adder circuit is

Entity half adder is Port (A B in Bit sum carry out Bit) End half adder

The entity called half adder has two input ports A and B and two out put ports sum and carry Bit is predefined type of the language

Architecture Body

An architecture body using any of the following modeling styles specifies the internal details of an entity

1 As a set of interconnected components (to represent structure)2 As a set of concurrent assignment statements (to represent data flow)3 As a set of sequential assignment statements (to represent behavior)4 As any combination of the above three

25 Structural style of modeling

In this one an entity is described as a set of interconnected components Such a model for the HALF_ADDER entity is described in a n architecture body

Architecture ha of ha isComponent Xor2 Port (X Y in BIT Zout BIT)End componentComponent And2 Port (L M in BIT NoutBIT) End component Begin X1 Xor2portmap (A B SUM) A1 AND2portmap (A B CARRY) End ha

The name of the architecture body is ha the entity declaration for half adder specifies the interface ports for this architecture body The architecture body is composed of two parts the declaration part and the statement part Two component declarations are present in the declarative part of the architecture body

The declared components are instantiated in the statement part of the architecture body using component instantiation The signals in the port map of a component instantiation and the port signals in the component declaration are associated by the position

9

DATAFLOW STYLE OF MODELING

In this modeling style the flow of data through the entity is expressed primarily using concurrent signal assignment statements The data flow model for the half adder is described using two concurrent signal assignment statements In a signal assignment statement the symbol lt=implies an assignment of a value to a signal

BEHAVIORAL STYLE OF MODELING

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specific order These sets of sequential statements which are specified inside a process statement do not explicitly specify the structure of the entity but merely its functionality A process statement is a concurrent statement that can appear with in an architecture body MIXED STYLE OF MODELING

It is possible to mix the three modeling styles in a single architecture body That is within an architecture body we could use component instantiation statements concurrent signal assignment statements and process statements

MODEL ANALYSIS

Once an entity is declared in VHDL it can be validated using analyzer and a simulator that are apart of a VHDL system The first step in the validation process is analysis The analyzer takes a file that contains one or more design units and compile s them into an intermediate form The generated intermediate form is stored in a specific design library that has been designated as the working library

There is a design library with the logic name STD predefined by the VHDL language environment This library contains two packages STANDARD and TEXTIO The STANDARD package contains declarations for all the predefined types of the language The TexTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations There also exists an IEEE standard package called STD_LOGIC_1164and contains its associated sub types overloaded operator functions and other useful utilities This standard is called the IEEE STD 1164 ndash1993

SIMULATION

For a hierarchical entity to be simulated all of its lowest ndashlevel components must be described at the behavioral level A simulation can be performed on either one of the following

1 An entity declaration and an architecture body pair

2 A configuration

10

Preceding the actual simulation are two major steps

1 Elaboration phase IN this phase the hierarchy of the entity is expanded and linked components are bound to entities in a library and the top- level entity is built as a network of behavioral models that is ready to be simulated

2 Initialization phase Driving and effective values for all explicitly declared signals are computed implicit signals are assigned values processes are executed once until they suspend and simulation time is set to 0ns

Simulation commences by advancing time to that of the next event Values that are assigned to signals at this time are assigned If the value of a signal changes and if that signal is present in the sensitivity list of a process the process is executed until it suspends Simulation stops when an assertion occurs depending on the implementation of the VHDL system or when the maximum time as defined by the language is reached

Entity Declaration

An entity declaration describes the external interface of the entity It specifies the name of the entity the names of the interface ports their mode and the type of ports The syntax for entity declaration is

Entity entity _name is [generic (list of ndashgenerics and ndashtheir types)] [port (list of interface-port-names-and their types )]

[entity item declarations] [begin entity statements] end [entity][entity name]

The entity ndashname is the name of the entity and the interface ports are the signals through which entity passes the information to and from its external environment Each interface port can have one of the following modes

1 in The value of an input port can only read with in the entity model 2 out The value of an out put port can only be updated within the entity model3 inout The value of a bi directional port can be read and updated within the entity

model4 buffer The value of a buffer port can be read and updated within the entity

model It cannot have more than one source

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration

ARCHITECTURE BODY

An architecture body describes the internal view of an entity It describes the functionality of the structure of the entity

11

Architecture ltarchitecture namegt oflt entity namegt is Begin Concurrent statements Process statements Block statements Concurrent signal assignment-statement Component ndashinstantiation-statement Generate statement End [architecture] [architecture name]

The concurrent statements describe the internal composition of the entity All

concurrent statements are executed in parallel The internal composition of an entity can be expressed in terms of structure dataflow and sequential behavior

Here we describe an entity by using the behavioral model A process statement which is a concurrent statement is the primary mechanism used to describe the functionality of an entity in this modeling style

26 PROCESS STATEMENT

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms The syntax for the process statement is

[Process-label] process [(sensitivity-list)] [is] begin sequential statements variable-assignment-statement signal assignment-statement wait statement if-statement case-statement loop-statement null-statement exit-statement next-statement assertion-statement report-statement procedure-call-statement return end process [process label]

A set of signals to which the process is sensitive is defined by the sensitivity list In other words each time an event occurs on any of the signals in the sensitivity list the sequential statements with in the process are executed in a sequential order that is in the order in which they appear The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list

12

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 8: Scalable Encryption Algorithm

24 Basic terminology

VHDL is a hardware description language that can be used to model a digital system A hardware abstraction of this digital system is called an entity An entity X when used in another entity Y becomes a component for the entity YTo describe an entity VHDL provides five different types of primary constructs called design units They are

1 Entity declaration 2 Architecture body 3 Configuration declaration 4 Package declaration 5 Package body

1 An entity is modeled using an entity declaration and at least one architecture body the Entity declaration describes the external view of the entity

For example the input and output signal names2 The architecture body contains the internal description of the entity for

example as a set of interconnected components that represents the structure of the entity or a set of concurrent or sequential statements that represents the behavior of the entity

3 A configuration declaration is used to create a configuration for an entity It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity It may also specify the bindings of the architecture components used in the selected architecture body to other entities An entity may have any number of configurations

4 A package declaration encapsulates a set of related declarations such type of declaration s subtype declaration and subprogram declaration which can be shared across two or more design units

5 A package body contains the definition of subprogram declared in a package declaration

Once an entity has been modeled it needs to be validated by a VHDL system A typical VHDL system consists of an analyzer and a simulator The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks

The language is case insensitive that is lowercase and uppercase characters are treated alike the Language is also free format comments are specified in the language by preceding the text with two Consecutive dashes (- -)

Entity Declaration

The entity declaration specifies the name of entity being modeled and lists the set of inter face ports Ports are signals through which entity communicates with other models in its external environment

8

EXAMPLE

Entity declaration for the half adder circuit is

Entity half adder is Port (A B in Bit sum carry out Bit) End half adder

The entity called half adder has two input ports A and B and two out put ports sum and carry Bit is predefined type of the language

Architecture Body

An architecture body using any of the following modeling styles specifies the internal details of an entity

1 As a set of interconnected components (to represent structure)2 As a set of concurrent assignment statements (to represent data flow)3 As a set of sequential assignment statements (to represent behavior)4 As any combination of the above three

25 Structural style of modeling

In this one an entity is described as a set of interconnected components Such a model for the HALF_ADDER entity is described in a n architecture body

Architecture ha of ha isComponent Xor2 Port (X Y in BIT Zout BIT)End componentComponent And2 Port (L M in BIT NoutBIT) End component Begin X1 Xor2portmap (A B SUM) A1 AND2portmap (A B CARRY) End ha

The name of the architecture body is ha the entity declaration for half adder specifies the interface ports for this architecture body The architecture body is composed of two parts the declaration part and the statement part Two component declarations are present in the declarative part of the architecture body

The declared components are instantiated in the statement part of the architecture body using component instantiation The signals in the port map of a component instantiation and the port signals in the component declaration are associated by the position

9

DATAFLOW STYLE OF MODELING

In this modeling style the flow of data through the entity is expressed primarily using concurrent signal assignment statements The data flow model for the half adder is described using two concurrent signal assignment statements In a signal assignment statement the symbol lt=implies an assignment of a value to a signal

BEHAVIORAL STYLE OF MODELING

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specific order These sets of sequential statements which are specified inside a process statement do not explicitly specify the structure of the entity but merely its functionality A process statement is a concurrent statement that can appear with in an architecture body MIXED STYLE OF MODELING

It is possible to mix the three modeling styles in a single architecture body That is within an architecture body we could use component instantiation statements concurrent signal assignment statements and process statements

MODEL ANALYSIS

Once an entity is declared in VHDL it can be validated using analyzer and a simulator that are apart of a VHDL system The first step in the validation process is analysis The analyzer takes a file that contains one or more design units and compile s them into an intermediate form The generated intermediate form is stored in a specific design library that has been designated as the working library

There is a design library with the logic name STD predefined by the VHDL language environment This library contains two packages STANDARD and TEXTIO The STANDARD package contains declarations for all the predefined types of the language The TexTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations There also exists an IEEE standard package called STD_LOGIC_1164and contains its associated sub types overloaded operator functions and other useful utilities This standard is called the IEEE STD 1164 ndash1993

SIMULATION

For a hierarchical entity to be simulated all of its lowest ndashlevel components must be described at the behavioral level A simulation can be performed on either one of the following

1 An entity declaration and an architecture body pair

2 A configuration

10

Preceding the actual simulation are two major steps

1 Elaboration phase IN this phase the hierarchy of the entity is expanded and linked components are bound to entities in a library and the top- level entity is built as a network of behavioral models that is ready to be simulated

2 Initialization phase Driving and effective values for all explicitly declared signals are computed implicit signals are assigned values processes are executed once until they suspend and simulation time is set to 0ns

Simulation commences by advancing time to that of the next event Values that are assigned to signals at this time are assigned If the value of a signal changes and if that signal is present in the sensitivity list of a process the process is executed until it suspends Simulation stops when an assertion occurs depending on the implementation of the VHDL system or when the maximum time as defined by the language is reached

Entity Declaration

An entity declaration describes the external interface of the entity It specifies the name of the entity the names of the interface ports their mode and the type of ports The syntax for entity declaration is

Entity entity _name is [generic (list of ndashgenerics and ndashtheir types)] [port (list of interface-port-names-and their types )]

[entity item declarations] [begin entity statements] end [entity][entity name]

The entity ndashname is the name of the entity and the interface ports are the signals through which entity passes the information to and from its external environment Each interface port can have one of the following modes

1 in The value of an input port can only read with in the entity model 2 out The value of an out put port can only be updated within the entity model3 inout The value of a bi directional port can be read and updated within the entity

model4 buffer The value of a buffer port can be read and updated within the entity

model It cannot have more than one source

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration

ARCHITECTURE BODY

An architecture body describes the internal view of an entity It describes the functionality of the structure of the entity

11

Architecture ltarchitecture namegt oflt entity namegt is Begin Concurrent statements Process statements Block statements Concurrent signal assignment-statement Component ndashinstantiation-statement Generate statement End [architecture] [architecture name]

The concurrent statements describe the internal composition of the entity All

concurrent statements are executed in parallel The internal composition of an entity can be expressed in terms of structure dataflow and sequential behavior

Here we describe an entity by using the behavioral model A process statement which is a concurrent statement is the primary mechanism used to describe the functionality of an entity in this modeling style

26 PROCESS STATEMENT

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms The syntax for the process statement is

[Process-label] process [(sensitivity-list)] [is] begin sequential statements variable-assignment-statement signal assignment-statement wait statement if-statement case-statement loop-statement null-statement exit-statement next-statement assertion-statement report-statement procedure-call-statement return end process [process label]

A set of signals to which the process is sensitive is defined by the sensitivity list In other words each time an event occurs on any of the signals in the sensitivity list the sequential statements with in the process are executed in a sequential order that is in the order in which they appear The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list

12

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 9: Scalable Encryption Algorithm

EXAMPLE

Entity declaration for the half adder circuit is

Entity half adder is Port (A B in Bit sum carry out Bit) End half adder

The entity called half adder has two input ports A and B and two out put ports sum and carry Bit is predefined type of the language

Architecture Body

An architecture body using any of the following modeling styles specifies the internal details of an entity

1 As a set of interconnected components (to represent structure)2 As a set of concurrent assignment statements (to represent data flow)3 As a set of sequential assignment statements (to represent behavior)4 As any combination of the above three

25 Structural style of modeling

In this one an entity is described as a set of interconnected components Such a model for the HALF_ADDER entity is described in a n architecture body

Architecture ha of ha isComponent Xor2 Port (X Y in BIT Zout BIT)End componentComponent And2 Port (L M in BIT NoutBIT) End component Begin X1 Xor2portmap (A B SUM) A1 AND2portmap (A B CARRY) End ha

The name of the architecture body is ha the entity declaration for half adder specifies the interface ports for this architecture body The architecture body is composed of two parts the declaration part and the statement part Two component declarations are present in the declarative part of the architecture body

The declared components are instantiated in the statement part of the architecture body using component instantiation The signals in the port map of a component instantiation and the port signals in the component declaration are associated by the position

9

DATAFLOW STYLE OF MODELING

In this modeling style the flow of data through the entity is expressed primarily using concurrent signal assignment statements The data flow model for the half adder is described using two concurrent signal assignment statements In a signal assignment statement the symbol lt=implies an assignment of a value to a signal

BEHAVIORAL STYLE OF MODELING

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specific order These sets of sequential statements which are specified inside a process statement do not explicitly specify the structure of the entity but merely its functionality A process statement is a concurrent statement that can appear with in an architecture body MIXED STYLE OF MODELING

It is possible to mix the three modeling styles in a single architecture body That is within an architecture body we could use component instantiation statements concurrent signal assignment statements and process statements

MODEL ANALYSIS

Once an entity is declared in VHDL it can be validated using analyzer and a simulator that are apart of a VHDL system The first step in the validation process is analysis The analyzer takes a file that contains one or more design units and compile s them into an intermediate form The generated intermediate form is stored in a specific design library that has been designated as the working library

There is a design library with the logic name STD predefined by the VHDL language environment This library contains two packages STANDARD and TEXTIO The STANDARD package contains declarations for all the predefined types of the language The TexTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations There also exists an IEEE standard package called STD_LOGIC_1164and contains its associated sub types overloaded operator functions and other useful utilities This standard is called the IEEE STD 1164 ndash1993

SIMULATION

For a hierarchical entity to be simulated all of its lowest ndashlevel components must be described at the behavioral level A simulation can be performed on either one of the following

1 An entity declaration and an architecture body pair

2 A configuration

10

Preceding the actual simulation are two major steps

1 Elaboration phase IN this phase the hierarchy of the entity is expanded and linked components are bound to entities in a library and the top- level entity is built as a network of behavioral models that is ready to be simulated

2 Initialization phase Driving and effective values for all explicitly declared signals are computed implicit signals are assigned values processes are executed once until they suspend and simulation time is set to 0ns

Simulation commences by advancing time to that of the next event Values that are assigned to signals at this time are assigned If the value of a signal changes and if that signal is present in the sensitivity list of a process the process is executed until it suspends Simulation stops when an assertion occurs depending on the implementation of the VHDL system or when the maximum time as defined by the language is reached

Entity Declaration

An entity declaration describes the external interface of the entity It specifies the name of the entity the names of the interface ports their mode and the type of ports The syntax for entity declaration is

Entity entity _name is [generic (list of ndashgenerics and ndashtheir types)] [port (list of interface-port-names-and their types )]

[entity item declarations] [begin entity statements] end [entity][entity name]

The entity ndashname is the name of the entity and the interface ports are the signals through which entity passes the information to and from its external environment Each interface port can have one of the following modes

1 in The value of an input port can only read with in the entity model 2 out The value of an out put port can only be updated within the entity model3 inout The value of a bi directional port can be read and updated within the entity

model4 buffer The value of a buffer port can be read and updated within the entity

model It cannot have more than one source

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration

ARCHITECTURE BODY

An architecture body describes the internal view of an entity It describes the functionality of the structure of the entity

11

Architecture ltarchitecture namegt oflt entity namegt is Begin Concurrent statements Process statements Block statements Concurrent signal assignment-statement Component ndashinstantiation-statement Generate statement End [architecture] [architecture name]

The concurrent statements describe the internal composition of the entity All

concurrent statements are executed in parallel The internal composition of an entity can be expressed in terms of structure dataflow and sequential behavior

Here we describe an entity by using the behavioral model A process statement which is a concurrent statement is the primary mechanism used to describe the functionality of an entity in this modeling style

26 PROCESS STATEMENT

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms The syntax for the process statement is

[Process-label] process [(sensitivity-list)] [is] begin sequential statements variable-assignment-statement signal assignment-statement wait statement if-statement case-statement loop-statement null-statement exit-statement next-statement assertion-statement report-statement procedure-call-statement return end process [process label]

A set of signals to which the process is sensitive is defined by the sensitivity list In other words each time an event occurs on any of the signals in the sensitivity list the sequential statements with in the process are executed in a sequential order that is in the order in which they appear The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list

12

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 10: Scalable Encryption Algorithm

DATAFLOW STYLE OF MODELING

In this modeling style the flow of data through the entity is expressed primarily using concurrent signal assignment statements The data flow model for the half adder is described using two concurrent signal assignment statements In a signal assignment statement the symbol lt=implies an assignment of a value to a signal

BEHAVIORAL STYLE OF MODELING

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specific order These sets of sequential statements which are specified inside a process statement do not explicitly specify the structure of the entity but merely its functionality A process statement is a concurrent statement that can appear with in an architecture body MIXED STYLE OF MODELING

It is possible to mix the three modeling styles in a single architecture body That is within an architecture body we could use component instantiation statements concurrent signal assignment statements and process statements

MODEL ANALYSIS

Once an entity is declared in VHDL it can be validated using analyzer and a simulator that are apart of a VHDL system The first step in the validation process is analysis The analyzer takes a file that contains one or more design units and compile s them into an intermediate form The generated intermediate form is stored in a specific design library that has been designated as the working library

There is a design library with the logic name STD predefined by the VHDL language environment This library contains two packages STANDARD and TEXTIO The STANDARD package contains declarations for all the predefined types of the language The TexTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations There also exists an IEEE standard package called STD_LOGIC_1164and contains its associated sub types overloaded operator functions and other useful utilities This standard is called the IEEE STD 1164 ndash1993

SIMULATION

For a hierarchical entity to be simulated all of its lowest ndashlevel components must be described at the behavioral level A simulation can be performed on either one of the following

1 An entity declaration and an architecture body pair

2 A configuration

10

Preceding the actual simulation are two major steps

1 Elaboration phase IN this phase the hierarchy of the entity is expanded and linked components are bound to entities in a library and the top- level entity is built as a network of behavioral models that is ready to be simulated

2 Initialization phase Driving and effective values for all explicitly declared signals are computed implicit signals are assigned values processes are executed once until they suspend and simulation time is set to 0ns

Simulation commences by advancing time to that of the next event Values that are assigned to signals at this time are assigned If the value of a signal changes and if that signal is present in the sensitivity list of a process the process is executed until it suspends Simulation stops when an assertion occurs depending on the implementation of the VHDL system or when the maximum time as defined by the language is reached

Entity Declaration

An entity declaration describes the external interface of the entity It specifies the name of the entity the names of the interface ports their mode and the type of ports The syntax for entity declaration is

Entity entity _name is [generic (list of ndashgenerics and ndashtheir types)] [port (list of interface-port-names-and their types )]

[entity item declarations] [begin entity statements] end [entity][entity name]

The entity ndashname is the name of the entity and the interface ports are the signals through which entity passes the information to and from its external environment Each interface port can have one of the following modes

1 in The value of an input port can only read with in the entity model 2 out The value of an out put port can only be updated within the entity model3 inout The value of a bi directional port can be read and updated within the entity

model4 buffer The value of a buffer port can be read and updated within the entity

model It cannot have more than one source

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration

ARCHITECTURE BODY

An architecture body describes the internal view of an entity It describes the functionality of the structure of the entity

11

Architecture ltarchitecture namegt oflt entity namegt is Begin Concurrent statements Process statements Block statements Concurrent signal assignment-statement Component ndashinstantiation-statement Generate statement End [architecture] [architecture name]

The concurrent statements describe the internal composition of the entity All

concurrent statements are executed in parallel The internal composition of an entity can be expressed in terms of structure dataflow and sequential behavior

Here we describe an entity by using the behavioral model A process statement which is a concurrent statement is the primary mechanism used to describe the functionality of an entity in this modeling style

26 PROCESS STATEMENT

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms The syntax for the process statement is

[Process-label] process [(sensitivity-list)] [is] begin sequential statements variable-assignment-statement signal assignment-statement wait statement if-statement case-statement loop-statement null-statement exit-statement next-statement assertion-statement report-statement procedure-call-statement return end process [process label]

A set of signals to which the process is sensitive is defined by the sensitivity list In other words each time an event occurs on any of the signals in the sensitivity list the sequential statements with in the process are executed in a sequential order that is in the order in which they appear The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list

12

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 11: Scalable Encryption Algorithm

Preceding the actual simulation are two major steps

1 Elaboration phase IN this phase the hierarchy of the entity is expanded and linked components are bound to entities in a library and the top- level entity is built as a network of behavioral models that is ready to be simulated

2 Initialization phase Driving and effective values for all explicitly declared signals are computed implicit signals are assigned values processes are executed once until they suspend and simulation time is set to 0ns

Simulation commences by advancing time to that of the next event Values that are assigned to signals at this time are assigned If the value of a signal changes and if that signal is present in the sensitivity list of a process the process is executed until it suspends Simulation stops when an assertion occurs depending on the implementation of the VHDL system or when the maximum time as defined by the language is reached

Entity Declaration

An entity declaration describes the external interface of the entity It specifies the name of the entity the names of the interface ports their mode and the type of ports The syntax for entity declaration is

Entity entity _name is [generic (list of ndashgenerics and ndashtheir types)] [port (list of interface-port-names-and their types )]

[entity item declarations] [begin entity statements] end [entity][entity name]

The entity ndashname is the name of the entity and the interface ports are the signals through which entity passes the information to and from its external environment Each interface port can have one of the following modes

1 in The value of an input port can only read with in the entity model 2 out The value of an out put port can only be updated within the entity model3 inout The value of a bi directional port can be read and updated within the entity

model4 buffer The value of a buffer port can be read and updated within the entity

model It cannot have more than one source

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration

ARCHITECTURE BODY

An architecture body describes the internal view of an entity It describes the functionality of the structure of the entity

11

Architecture ltarchitecture namegt oflt entity namegt is Begin Concurrent statements Process statements Block statements Concurrent signal assignment-statement Component ndashinstantiation-statement Generate statement End [architecture] [architecture name]

The concurrent statements describe the internal composition of the entity All

concurrent statements are executed in parallel The internal composition of an entity can be expressed in terms of structure dataflow and sequential behavior

Here we describe an entity by using the behavioral model A process statement which is a concurrent statement is the primary mechanism used to describe the functionality of an entity in this modeling style

26 PROCESS STATEMENT

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms The syntax for the process statement is

[Process-label] process [(sensitivity-list)] [is] begin sequential statements variable-assignment-statement signal assignment-statement wait statement if-statement case-statement loop-statement null-statement exit-statement next-statement assertion-statement report-statement procedure-call-statement return end process [process label]

A set of signals to which the process is sensitive is defined by the sensitivity list In other words each time an event occurs on any of the signals in the sensitivity list the sequential statements with in the process are executed in a sequential order that is in the order in which they appear The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list

12

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 12: Scalable Encryption Algorithm

Architecture ltarchitecture namegt oflt entity namegt is Begin Concurrent statements Process statements Block statements Concurrent signal assignment-statement Component ndashinstantiation-statement Generate statement End [architecture] [architecture name]

The concurrent statements describe the internal composition of the entity All

concurrent statements are executed in parallel The internal composition of an entity can be expressed in terms of structure dataflow and sequential behavior

Here we describe an entity by using the behavioral model A process statement which is a concurrent statement is the primary mechanism used to describe the functionality of an entity in this modeling style

26 PROCESS STATEMENT

A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms The syntax for the process statement is

[Process-label] process [(sensitivity-list)] [is] begin sequential statements variable-assignment-statement signal assignment-statement wait statement if-statement case-statement loop-statement null-statement exit-statement next-statement assertion-statement report-statement procedure-call-statement return end process [process label]

A set of signals to which the process is sensitive is defined by the sensitivity list In other words each time an event occurs on any of the signals in the sensitivity list the sequential statements with in the process are executed in a sequential order that is in the order in which they appear The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list

12

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 13: Scalable Encryption Algorithm

VARIABLE ASSIGNMENT STAEMENT

Variables can be declared and used inside a process statement A variable is assigned a value using the variable assignment statement that typically has the form

Variable-object = expression

The expression is evaluated when the statement is executed and the computed value is assigned to the variable object instantaneously that is at the concurrent simulation time

A variable can be declared outside of a process or subprogram Such a variable can be read and updated by more than one process These variables are called shared variables

SIGNAL ASSIGNMENT STATEMENT

Signals are assigned values using a signal assignment statement The simplest form of a signal assignment statement is

Signal-object lt= expression [after a delay value]

A signal assignment statement can appear within a process or outside of a process If it occurs outside of a process it is considered to be a concurrent signal assignment statement

When a signal assignment statement appears with in a process it is considered to be a sequential signal assignment statement and is executed in sequences with respect to the other statements which appear with in the process

27 CONDITIONAL STATEMENTS

IF STATEMENT An if statement selects a sequence of statements for execution of statements for execution based on the value of a condition the condition The condition can be any expression that evaluates to a Boolean value The general form of an if statement is

If Boolean expression thenSequential statementselsif Boolean-expression thenSequential-statements[else sequential statements] end if

The if statement is executed by checking each condition sequentially until the first true condition is found the set of sequential statements associated with this condition is executed An if statement is also a sequential statement

13

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 14: Scalable Encryption Algorithm

CASE STATEMENT

The format of a case statement is Case expression is

When choices =gtsequential statementsWhen choices =gtsequential statements End case

The case statement selects one of the branches for the execution based on the value of the expression The expression value must be of a discrete type or one-dimensional array type Choices may be expressed as single values as a range of values by choosing ldquoothersrdquo The other clause can be used as a choice to cover the ldquocatch-allrdquo values and if present must be the last branch in the case statement

LOOP STATEMENTS

A loop statement is used to iterate through a set of sequential statements the syntax for loop statement is

[Loop-label] iteration-scheme loopSequential-statements End loop [loop label]

14

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 15: Scalable Encryption Algorithm

28 Active HDL Overview

Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries

Standards Supported

VHDL

The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard

Verilog

The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL

EDIF

Active-HDL supports Electronic Design Interchange Format version 2 0 0

VITAL

The simulator provides built-in acceleration for VITAL packages version 30 The VITAL-compliant models can be annotated with timing data from SDF files SDF files must comply with OVI Standard Delay Format Specification Version 21

WAVES

Active-HDL supports automatic generation of test benches compliant with the WAVES standard The basis for this implementation is a draft version of the standard dated to May 1997 (IEEE P10291D10 May 1997) The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs the communication of hardware design and test verification data the maintenance modification and procurement of hardware system

15

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 16: Scalable Encryption Algorithm

29 ACTIVE-HDL Macro Language

All operations in Active-HDL can be performed using Active-HDL macro language The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI)

1 HDL Editor

HDL Editor is a text editor designed for HDL source files It displays specific syntax categories in different colors (keyword coloring) The editor is tightly integrated with the simulator to enable debugging source code The keyword coloring is also available when HDL Editor is used for editing macro files Perl scripts and Tcl scripts

2 Block Diagram Editor

Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

3 State Diagram Editor

State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code

4 Waveform Editor

Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors

5 Design Browser

The Design Browser window displays the contents of the current design that is

Resource files attached to the design The contents of the default-working library of the design The structure of the design unit selected for simulation VHDL Verilog or EDIF objects declared within a selected region of the

current design

16

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 17: Scalable Encryption Algorithm

6 Console window

The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands macros and scripts All Active-HDL tools output their messages to Console

210 Compilation

Compilation is a process of analysis of a source file Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator In Active-HDL a source file can be on of the following

VHDL file (vhd) Verilog file (v) EDIF net list file State diagram file (asf) Block diagram file (bde)

In the case of a block or state diagram file the compiler analyzes the intermediate VHDL Verilog or EDIF file containing HDL code (or net list) generated from the diagram

A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection

Active-HDL provides three compilers respectively for VHDL Verilog and EDIF When you choose a menu command or toolbar button for compilation Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled

211 Simulation

The purpose of simulation is to verify that the circuit works as desired

The Active-HDL simulator provides two simulation engines

Event-Driven Simulation Cycle-Based Simulation

17

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 18: Scalable Encryption Algorithm

The simulator supports hybrid simulation ndash some portions of a design can be simulated in the event-driven kernel while the others in the cycle-based kernel Cycle-based simulation is significantly faster than event-driven

212 XILINX

Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish

ISE enables you to start your design with any of a number of different source types including

HDL (VHDL Verilog HDL ABEL)

Schematic design files

EDIF

NGCNGO

State Machines

IP Cores

From your source files ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities including ModelSim Xilinx Edition and the HDL Bencher test bench generator HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD and finally produce a bit stream for your device configuration

Design Entry

ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports

18

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 19: Scalable Encryption Algorithm

Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create view and edit schematics and symbols for the Design Entry step of the Xilinxreg design flow

CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters transforms FIFOs and memories

Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints

PACE - The Pin out and Area Constraints Editor (PACE) allows you to view and edit IO Global logic and Area Group constraints

State CAD State Machine Editor - State CAD allows you to specify states transitions and actions in a graphical editor The state machine will be created in HDL

Implementation

Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file

Map - The Map program maps a logical design to a Xilinx FPGA

Place and Route (PAR) - The PAR program accepts the mapped design places and routes the FPGA and produces output for the bit stream generator

Floor planner - The Floor planner allows you to view a graphical representation of the FPGA and to view and modify the placed design

FPGA Editor - The FPGA Editor allows you view and modify the physical implementation including routing

Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs With Timing Analyzer analysis can be performed immediately after mapping placing or routing an FPGA design and after fitting and routing a CPLD design

Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file

Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs macro cell details equations and pin assignments

19

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 20: Scalable Encryption Algorithm

Device Download and Program File Formatting

BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration

iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device

XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices

Integration with ChipScope Pro

CH 3 Introduction to SEA

Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances In addition they generally aim to be implemented efficiently on a large variety of platforms In this paper we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements For this purpose we propose low-cost encryption routines (ie with small code size and memory) targeted for processors with a limited instruction set (ie AND OR XOR gates word rotation and modular addition) The proposed design is parametric in the text key and processor size allows efficient combination of encryptiondecryption ldquoon-the-flyrdquo key derivation and its security against a number of recent cryptanalytic techniques is discussed Target applications for such routines include any context requiring low-cost encryption andor authentication

In this paper we consequently consider a general context where we have very limited processing resources (eg a small processor) and throughput requirements It yields design criteria such as low memory requirements small code size limited instruction set In addition we propose the flexibility as another unusual design principle

20

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 21: Scalable Encryption Algorithm

SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo

Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (eg authentication) Finally the simplicity of SEAnb makes its implementation straightforward Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems

For example introducing programmability into the configuration of lights and switches thermostats and air handlers promises to improve the cost of construction flexibility in occupancy and energy efficiency of buildings But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure lightweight implementations of peer-to-peer networks in resource-constrained systems The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20] SEAnb constitutes a suitable solution for low-cost encryptionauthentication within such networks RFIDrsquos or any powerspace-limited applications are similarly targeted

31 Specifications

Parameters and Definitions

SEAnb operates on various text key and word sizes It is based on a Feistelstructure with a variable number of rounds and is defined with respect to thefollowing parametersndash n plaintext size key sizendash b processor (or word) sizendash nb = n2b number of words per Feistel branch--nr number of block cipher rounds

As only constraint it is required that n is a multiple of 6b For example usingan 8-bit processor we can derive 48 96 144 -bit block ciphers respectivelydenoted as SEA488 SEA968 SEA1448 Let x be a n2-bit vector In the following we will consider two representationsndash Bit representation xb = x(n2minus 1) x(n2minus 2) x(2) x(1) x(0)--Word representation xW = xnbminus1 xnbminus2 x2 x1 x0

Basic Operations

21

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 22: Scalable Encryption Algorithm

Due to its simplicity constraints SEAnb is based on a limited number of elementaryoperations (selected for their availability in any processing device) denotedas follows (1) bitwise XOR oplus (2) substitution box S (3) word (left) rotationR and inverse word rotation Rminus1 (4) bit rotation r (5) addition mod 2b _

These operations are formally defined as follows

1 Bitwise XOR

The bitwise XOR is defined on n2-bit vectorsoplus Zn22 ラ Zn22 rarr Zn22 x y rarr z = x oplus y hArr z(i) = x(i) oplus y(i) 0 le i len2 minus 1

2 Substitution Box S

SEAnb uses the following 3-bit substitution tableST = 0 5 6 7 4 3 1 2in C-like notation For efficiency purposes it is applied bitwise to any set of threewords of data using the following recursive definition

S Znb2b rarr Znb

2b x rarr x = S(x) hArrx3i = (x3i+2 and x3i+1) oplus x3ix3i+1 = (x3i+2 and x3i) oplus x3i+1x3i+2 = (x3i or x3i+1) oplus x3i+2 0le i le nb3 minus 1where and and or respectively represent the bitwise AND and OR

Word Rotation R

The word rotation is defined on nb-word vectorsR Znb2b rarr Znb2b x rarr y = R(x) hArr yi+1 = xi 0 le i le nb minus 2y0 = xnbminus1

Bit Rotation r

The bit rotation is defined on nb-word vectorsr Znb2b rarr Znb2b x rarr y = r(x) hArr y3i = x3i≫1y3i+1 = x3i+1y3i+2 = x3i+2 ≪1 0 le i le nb3 minus 1where≫and ≪represent the cyclic right and left shifts inside a word

22

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 23: Scalable Encryption Algorithm

Addition mod2b _

The mod 2b addition is defined on nb-word vectorsr Znb2b ラ Znb2b rarr Znb2b x y rarr z = x _ y hArr zi = xi _ yi 0 le i le nb minus 1

The Round and Key Round

Based on the previous definitions the encrypt round FE decrypt round FDand key round FK are pictured in Figure 1 and defined as the functions F Z2 2n2 ラ Z2n2 rarr Z2 2n2 such that

[Li+1Ri+1] = FE(LiRiKi) Ri+1 = R(Li) oplus r_S(Ri _ Ki)_ Li+1 = Ri

[Li+1Ri+1] = FD(LiRiKi) Ri+1 = Rminus1_Li oplus r_S(Ri _ Ki)__ Li+1 = Ri

[KLi+1KRi+1] = FK(KLiKRi Ci) KRi+1=KLi oplus R_r_S(KRi _ Ci)__ KLi+1 = KRi

23

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 24: Scalable Encryption Algorithm

R

R-1

r S

ki

Li Ri

Li+1 Ri+1

KLi KRi

R r S

Ci

KLi+1

KRi+1

FIG 31 Encryptdecrypt round and key round

The Complete Cipher

The cipher iterates an odd number nr of rounds The following pseudo-C codeencrypts a plaintext P under a key K and produces a ciphertext C PC andK have a parametric bit size n The operations within the cipher are performedconsidering parametric b-bit wordsC=SEAnb(PK) initializationL0ampR0 = PKL0ampKR0 = K

key schedulingfor i in 1 to _nr2_[KLiKRi] = FK(KLiminus1KRiminus1 C(i))switch KL_ nrfor i in nr 2_ KR_ nr2_2 to nr minus 1[KLiKRi] = FK(KLiminus1KRiminus1 C(r minus i))

encryptionfor i in 1 to nr2

[LiRi] = FE(Liminus1Riminus1KRiminus1)for i in nr2 + 1 to nr[LiRi] = FE(Liminus1Riminus1KLiminus1)

24

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 25: Scalable Encryption Algorithm

finalC = RnrampLnr switch KLnrminus1 KRnrminus1wherewhere amp is the concatenation operator KR_ nr2 _is taken before the switch andC(i) is a nb-word vector of which all the words have value 0 excepted the LSWthat equals i Decryption is exactly the same using the decrypt round FD

32 Design Properties of the Components

Substitution Box S

The substitution box was searched exhaustively in order to meet the following security and efficiency criteriandash λ-parameter1 12ndash δ-parameter2 14ndash Maximum nonlinear order namely 2ndash Recursive definitionndash Minimum number of instructionsRemark that if 3-operand instructions are available the recursive definition allows to perform the substitution box in 2 operations per word of data As a comparison the 3 ラ 3 bitwise substitution box used in 3-WAY [15] requires 3 The counterpart of this efficiency is the presence of two fixed points in the table

Bit and Word Rotations r and R

The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48 b = 8 nb = 3 Looking at the figure it can be seen that SEAnb divides its data in 2nb3 blocks of 3 words The substitution box is applied in parallel to these blocks Therefore the diffusion process (starting with one single active bit in the left branch) is divided into two steps3

The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box It requires at most nb rounds to be completed (in our example nb = 3 which yields 3 rounds) Once every word has at least one active bit the combination of r and S yields six more active bits per block in each round Therefore finishing the diffusion of all the blocks requires at most _b2_ rounds Combining these observations the diffusion is complete after nb + _b2_ rounds

Addition mod 2b _

Using a mod 2b key addition in place of a bitwise XOR was motivated by different reasons (1) improvement of the diffusion process (2) improvement of the non-linearity (3) same costspeed as the bitwise XOR in

25

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 26: Scalable Encryption Algorithm

most processors (4) necessity to avoid structural attacks

33 Overall Structure

The overall structure of the cipher follows the Feistel strategy However a few points are specific to SEAnb namely the key schedule and the position of R Rminus1 in the encryptdecrypt roundsThe key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half It allows to obtain a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption Namely we haveK0K1K2 K_ r2 _K_ r2 _minus1 K2K1K0As a consequence of this structure the encryptiondecryption rounds cannotkeep the traditional Feistel structure it would result in having identical encryptionand decryption functions This is the reason of moving the word rotationto the left branch of the Feistel round

34 Security Analysis

Resistance Against Known Attacks

Linear and Differential Cryptanalysis

From the properties of the substitution box we can compute bounds for the best linear and differential characteristics through the cipher We first use the following lemma [29]Lemma 1 Let f be the bijective nonlinear function of a 3-round Feistel cipher Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ then the linear differential parameters of the 3-round cipher ΔΛ are respectively smaller than λ2 δ2 Since our nonlinear function S has parameter δ = 2minus2 and parameter λ = 2minus1it implies that 3 rounds of SEAnb have their linear and differential parameters respectively bounded by Δ lt 2minus4 and Λ lt 2minus2 However for a n-bit block cipher it is respectively required that Δ _ 2minusn and Λ _ 2minusn2 to resist against differential [4] and linear cryptanalysis [28] In order to approach these bounds we require thatδ2nr3 = _2minus2_2nr3lt 2minusn and λ2nr3 = _2minus1_2nr3lt 2minusn2 (1)In both cases the required number of rounds is nr ge 3n4 We note that we used a hybrid approach between the provable security against linear and differential attacks that consists in bounding the parameter of the best differentialhull like in lemma 1 and the usual heuristics to estimate the best lineardifferential characteristic through a cipher (as in the previous estimation for nr) In fact the strategy of Equation (1) is similar to the one of eg the AES Rijndael [17] but we only assume one active s-box per round

Extensions of Linear and Differential Cryptanalysis

Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26] bi-linear cryptanalysis [14] differential-linear

26

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 27: Scalable Encryption Algorithm

cryptanalysis [27] multiple linear cryptanalysis [22 10] boomerang [31] and rectangle [8] attack

However these extensions usually imply only a small improvement compared to the basic attacks As a matter of fact non-linear approximations of outer rounds allow to improve the bias of one or two rounds only Regarding bi-linear cryptanalysis we quote the author of [14] For ciphers similar to DES based on small substitution boxes we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC

It is difficult to evaluate the efficiency of multiple linear cryptanalysis but it seems more promising for big substitution boxes (as mentioned in [22]) Moreover the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAnb a Feistel structure and a poor diffusion) is limited Finally the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks

As an example the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q Therefore although these attacks can perform slightly better in specific cases the expected improvement is never outstandingThe conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security but that a reasonable multiplicative factor should be enough to take them into account

A Dedicated Related-Key Attack Against a Modified Version Forx isin Znb2b we denote by x≪a the left rotation by a bits of each of the nb wordsof x The non-linear and diffusion layers have the following propertiesndash S(x≪a) = S(x)≪andash r(x≪a) = r(x)≪andash R(x≪a) = R(x)≪a

Consider a modified version of our cipher where key addition is performed using rather than modular addition and where all round constants Ci are such that Ci ≪ a = Ci eg all Cirsquos equal 0 As a consequence of the previous observations the modified round F_E and the key round FK satisfyF_E (L≪aR≪aK ≪a) = F_E (LRK)≪aFK(KL≪aKR≪a 0) = FK(KLKR 0)≪a

These properties are iterative in the sense that they also hold for the composition of several block cipher rounds It is immediate to deduce from them a distinguisher on the modified cipher which requires 2 chosen encryption queries under 2 related keys K and

27

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 28: Scalable Encryption Algorithm

K ≪a In the actual SEAnb the key addition is performed word-wise mod 2b As the property (X ≪ a) _ (K ≪ a) = (X _ K) ≪ a is prevented by certain carry propagations it only holds with a probability p which depends on a and the word size b For a = 1 p rapidly converges to 38 as b grows It is smaller for 1 lt a lt bminus1 Of course this probability is averaged for all possible (XK) and certain keys (eg ldquoall zeroesrdquo) yield no carry propagation at all However the design properties of the key schedule prevent SEAnb from having such weak keys

Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb

Square Attacks

We explored square attacks [16] on SEA488 More precisely we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (ie takes all possible input values the same number of times) and the input to the other substitution boxes is constant The other branch is also constant Therefore the number of plaintexts considered goes from 23 (when the input to only one substitution box is active) to 221 (when the input to 7 substitution boxes is active) Our experiments showed that square attacks do not allow to pass through more rounds than the diffusion pattern illustrated in Figure It is expected that it remains the same when different parameters n and b are considered which implies that nb + _b2_ rounds are enough to prevent square attacks Note that although our observations also hold for oplus-SEAnb the use of addition mod 2b provides better resistance against square attacks

Truncated and Impossible Differentials

As for square attacks the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25] Impossible differentials[7] are usually built by concatenating two incompatible truncated differentials As a consequence we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 キ (nb + _b2_)

Interpolation Attacks

The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions In the case of SEAnb there is a priori no such expression and the bitwise diffusion would make the combination of algebraic expressions difficult anyway

Slide Attacks

The sequence of round keys of SEAnb is the same as the one of ICEBERG Therefore the analysis done in [30] is still valid Namely the non periodicity of the sequence should make slide attacks [11 12] irrelevant The particular structure of this sequence also has

28

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 29: Scalable Encryption Algorithm

some similarities with the one of GOST of which the vulnerability against slide attacks is examined in [12] None of the attacks presented in [12] seems to be applicable to our cipher

Related-Key Attacks

The first related-key attack has been described in [5] It is the related-key counterpart of the slide attack Such an attack is applicable when a round key Ki is computed from the previous round key Kiminus1 using a function f which is always the same Ki = f(Kiminus1) However in the case of SEAnb a round constant that changes for each key round is used which prevents this attack Another type of related-key attack is the differential related key attack [23 24] The non-linearity of the SEAnb key schedule should prevent it Moreover note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows to ldquocounterrdquo the effect of the diffusion layer on the differential characteristic a typical example is the attack on 3-WAY [24] As the security of SEAnb against differential cryptanalysis results from its large number of rounds rather than from its diffusion this effect is notrelevant here

Complementation Properties

The DES has the following complementation property if P KrarrC denotes the fact that encryption of P under key K gives ciphertext C then P K minusrarr C lArrrArr P K minusrarr C The non-linear key scheduling and the presence of carry propagations in the actual SEAnb algorithm prevents this property We are not aware of any other similar structural feature in the design

Algebraic Attacks

Algebraic attacks intend to exploit the simple algebraic structure of a block cipher For example certain block ciphers can be written as an overdefined system of quadratic equations Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs Clearly SEAnb has a simple algebraic structure as it is based on a 3-bit substitution box Therefore if such an attack practically applies to a cipher like Serpent [1] it is likely applicable to one of the versions of our routines As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds it is specially true when those values increase However as the criteria for these techniques to be successful are still being discussed [9] we did consider this latter point as a scope for further research We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low cost encryption routines

Suggested Number of Rounds

29

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 30: Scalable Encryption Algorithm

From the previous descriptions the minimum required number of rounds to provide security against known attacks would be 3n4 + 2 キ (nb + _b2_) This roughly corresponds to the number of rounds to resist lineardifferential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks) A more conservative approach (applied in most present block ciphers) would be to take a large security margin eg by doubling this number of rounds4 nr has to be odd we add one if it is even We also assume a minimum word size b ge 8 bits

35 Performance Analysis

SEAnb is targeted for being implemented on low-cost processors with little code size and a small instruction set However SEAnbrsquos simple structure makes it easy to implement on any processor In appendix we propose a pseudo-assembly code of an encryptiondecryption design with ldquoon the flyrdquo key scheduling The implementation objectives were in decreasing order of importance (1) low RAM and registers usage (2) low code size and (3) speed It is based on the following (very) reduced instruction set (assuming 2-operand instructions only)

ndash Arithmetic and logic operators or andoplus_≫≪

ndash Branch instructions goto subroutine call and return

ndash Comparison load RAM in register store register in RAM

According to the code in appendix the performances can be roughly estimated as follows First the combined number of RAM words and registers equals 5nb + 3 Then the code size and implementation time (both in expressed in ops) is evaluated by summing the values given in appendix For the code size it directly yields 31nb+36 ops For the implementation time the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops It yields a total of (nr minus 1) ラ (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 These values are summarized in Table 1 Remark that due to the particular structure of the key scheduling we do not need to keep the master key in memory as at the end of an encryptiondecryption we have Knrminus1 = K0 Remark also that this implementation uses a low number of registers namely nb +3 However if more registers are available they can be traded for RAM words which will result in lower code size and faster implementation

30

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 31: Scalable Encryption Algorithm

For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms

The flexibility of SEAnb also makes it less sensitive to the choice of a processor than fixed-sized algorithms although it is obvious that large buses improve efficiency The drawback of these limited resources is in the number of cycles required for the encryption (ie SEAnb trades space for time which may be relevant due to present processors speeds) Looking at the code size - cycles product the efficiency of SEAnb remains similar to the one of Rijndael (encryption only) that is well known for its efficient smart cards implementations

CH4 AN EXPOSITION OF THE SEA ALGORITHM

31

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 32: Scalable Encryption Algorithm

The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks

The _rst polynomial time algorithm for determining the number of rational points on an elliptic curve de_ned over a _nite _eld is due to Schoof He used calculations with torsion points on the curve to arrive at the number of points At _rst Schoofs algorithm was considered impractical but Elkies suggested the use of good primes (now known as Elkies primes) where isogenies and modular curves can be involved to speed up the calculation Atkin also made a number of important contributions to the algorithm which then became known as the SchoofElkiesAtkin (SEA) algorithm

Further improvements were later proposed by Dewaghe and CouveignesDewagheMorain The SEA algorithm was implemented by Morain Muller and Izu et al Schoofs seminal paper [18] describes the original algorithm He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995 Elkies paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples The implementations of Morain and Muller are described in [15] and [16] The implementation of Izu Kogure Noro and Yokoyama which focuses on speeding up the algorithm as much as possible is described in [13]

Dewaghes improvement is published in [7 The improvement by CouveignesDewagheMorain is published in [5] Atkin never formally published his contributions described in [1] but they are discussed extensively in [9 19] This paper which is not aimed at the experts in the area describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morains The algorithm considered below is probabilistic and for a 200-bit prime p succeeds with a probability of about 3=4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below) The algorithm implemented on a typical personal computer takes several minutes to _nd the number of points on a typical curve over Fp where p has 200 bits

It is known thatE(Fp) = p + 1 1048576 twhere t is an integer which satis_es the Hasse bound10485762pp _ t _ 2pp

The algorithm works by calculating t modulo several small auxiliary primes ` When the product of the auxiliary primes exceeds 4pp the Chinese Remainder Theorem is used to

32

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 33: Scalable Encryption Algorithm

recover the exact value of t and hence that of E(Fp) The algorithm works its way though a _xed list of 40 candidates for auxiliary primes given below For each candidate a calculation has to be carried out to generate a certain polynomial ` that is necessary for further calculations with this ` These polynomials` do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows Then for any elliptic curve E we can quickly decide if our algorithm applies (the probability that the algorithm applies for a speci_c E and ` is 1=2) For those curves where the algorithm applies we can determine t modulo ` When we _nished with all our candidates for the auxiliary primes we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4pp or not In the former case we succeeded in determining t

A typical application for this point counting would be to take a random prime p and a random elliptic curve E over Fp with the intention of _nding an E with E(Fp) = xr where r is a prime and x is small Given such a curve a point P of order r can be located easily and the pair (E P) could be used for a number of cryptographic algorithms such as Di_e-Hellman key exchange El Gamal encryption etc If we use 200-bit primes for p and require x _ 32 then the probability that E = xr is about 25 so we expect to have to run our algorithm on about 55 curves Section 2 describes the algorithm in detail Section 3 presents the mathematical background of the algorithm Section 4 presents ideas by which the algorithm could be improved Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm

The Algorithm

41 Overview

The set A of potential auxiliary primes is the union of the set As of small primes and the set Al of larger primes For each ` 2 A we need to determine a polynomial `(F J) 2 Z[F J] For ` 2 As this is stored in the program For ` 2 Al must be calculated by determining a number of coefficients of a certain q-series f(q) 2 Z[[q]] and carrying out certain algebraic operations on it The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store)We start out with a given prime p and an elliptic curveE y2 = x3 + a4x + a6

CH 5 SEA Architecture Block Diagram

33

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 34: Scalable Encryption Algorithm

Mod

SBox

BR

WR

XOR

Round Reg

KeyReg[950]

KeyReg0[950]

KeyReg1[950]

KeyReg8[950]

KeyReg9[950]K E Y C O M P U T A T I O N A L B L O C K

Mod SBoxxxx BR

WR XOR Round Reg

Encryption Computational Block

Cipher data Register

Key0[950]

Key9[950]

DataI[950]DataLd

ClkRst

Mod SBox BR

IWR XOR Round Reg

Decryption Computational Block

Plain text data Register

Key0[950]

Key9[950]

DataO[950]DataRd

ClkRst

SMCClkRstEnaEDOvr

KeyI[950]KeyLd

ClkRst

FIG 51

51 KEY GENERATION

34

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 35: Scalable Encryption Algorithm

Key generation is the process of generating keys for cryptography A key is used to encrypt and decrypt whatever data is being encrypteddecrypted

Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA) Symmetric-key algorithms use a single shared key keeping data secret requires keeping this key secret Public-key algorithms use a public key and a private key The public key is made available to anyone (often by means of a digital certificate) A sender will encrypt data with the public key only the holder of the private key can decrypt this data

Since public-key algorithms tend to be much slower than symmetric-key algorithms modern systems such as TLS and SSH use a combination of the two one party receives the others public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it) The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption

In computer cryptography keys are integers In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG) the latter being a computer algorithm that produces data which appears random under analysis Of the PRNGs those which use system entropy to seed data generally produce better results since this makes the initial conditions of the PRNG much more difficult for an attacker to guess In other situations the key is created using a passphrase and a key generation algorithm usually involving a cryptographic hash function such as SHA-1

The simplest method to read encrypted data is a brute force attackmdashsimply attempting every number up to the maximum length of the key Therefore it is important to use a sufficiently long key length longer keys take exponentially longer to attack rendering a brute force attack impractical Currently key lengths of 128 bits (for symmetric key algorithms) and 1024 bits (for public-key algorithms) are common

35

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 36: Scalable Encryption Algorithm

36

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 37: Scalable Encryption Algorithm

Cryptography

Cryptography is the art and science of secret writing The term is derived from the Greek language

krytos - secret graphos - writing

52 Encryption

Encryption is the actual process of applying cryptography Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages text words signals and other forms of communication Cryptography has many uses especially in the areas of espionage intelligence and military operations Cryptography deals with all aspects of secure messaging authentication digital signatures electronic money and other applications

Today many security systems and companies use cryptography to transfer information over the Internet or radio for fears of interception Some of this encryption is highly advanced however even simple encryption techniques can help uphold the privacy of any everyday person The term cryptography also meant the breaking of encrypted messages until the early 1920s when the concept of Cryptanalysis began being used and is now practically an art and science all on its own

The two main areas of cryptography are Cipher and Code

Code is one of the two major methods of cryptography This method involves the replacement of complete words or phrases by code words or numbers

Cipher is the other major method of cryptography This works on the principal of replacing individual letters by other numbers or letter

Cryptographic algorithms all perform the same basic function They take two inputs ndash a message and a key -- and transform them into a single output There are two ways to perform this function Encryption as shown in Figure 1 uses the cryptographic key to transform the original message into an encrypted form Decryption as shown in Figure 2 does the reverse it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form

37

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 38: Scalable Encryption Algorithm

FIG 52 ENCRYPTION BLOCK

FIG 53 Encryption Operation

38

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 39: Scalable Encryption Algorithm

53 DECRYPTION

The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password

It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]

Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher

In such cases effective security could be achieved if it is proven that the effort required (ie work factor in Shannons terms) is beyond the ability of any adversary This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher Since no such showing can be made currently as of today the one-time-pad remains the only theoretically unbreakable cipher

There are a wide variety of cryptanalytic attacks and they can be classified in any of several ways A common distinction turns on what an attacker knows and what capabilities are available In a ciphertext-only attack the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks) In a known-plaintext attack the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs) In a chosen-plaintext attack the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times) an example is gardening used by the British during WWII

Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)

Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be against a perfect cipher For example a simple brute force attack against DES requires one known plaintext and 255 decryptions trying approximately half of the possible keys to reach a point at which chances are better than even the key sought will have been found But this may not be enough assurance a linear cryptanalysis attack against DES requires 243 known plaintexts and approximately 243 DES operations[23] This is a considerable improvement on brute force attacks

Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is

39

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 40: Scalable Encryption Algorithm

based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)

For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s

While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis

An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all

40

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 41: Scalable Encryption Algorithm

FIG 54 DECRYPTION BLOCK

41

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 42: Scalable Encryption Algorithm

FIG 55 Decryption Operation

SIMULATION RESULTS

42

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 43: Scalable Encryption Algorithm

Key Generation Results

43

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 44: Scalable Encryption Algorithm

Encryption Results

44

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 45: Scalable Encryption Algorithm

Decryption Results

45

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 46: Scalable Encryption Algorithm

SYNTHESIS REPORTS

KEY INPUT

RTL SCHEMATIC

GATE LEVEL

SYNTHESIS REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p

46

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 47: Scalable Encryption Algorithm

xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd

Writing NGDBUILD log file keyregbld

Release 61i Map G23Xilinx Mapping Report File for Design keyreg

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

MAPPING REPORT

Relekeyreg

Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm

47

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 48: Scalable Encryption Algorithm

area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009

Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

Placing amp Routing Report

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization

48

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 49: Scalable Encryption Algorithm

Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB

KEY REGISTER

Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed

49

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 50: Scalable Encryption Algorithm

Ignore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes

50

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 51: Scalable Encryption Algorithm

Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated

========================================================================= HDL Synthesis =========================================================================

Synthesizing Unit ltkeyreggt

51

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 52: Scalable Encryption Algorithm

Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary

inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized

=========================================================================HDL Synthesis Report

Macro Statistics Registers 1 96-bit register 1

=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28

========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg

52

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 53: Scalable Encryption Algorithm

Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 195

Macro Statistics Registers 1 96-bit register 1

Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================

Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25

WARNINGXst1336 - () More than 100 of Device resources are used

=========================================================================TIMING REPORT

NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

53

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 54: Scalable Encryption Algorithm

GENERATED AFTER PLACE-and-ROUTE

Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+

Timing Summary---------------Speed Grade -6

Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found

Timing Detail--------------All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising

Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------

Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising

54

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 55: Scalable Encryption Algorithm

Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)

=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt

Total memory usage is 54400 kilobytes

SBOX

55

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 56: Scalable Encryption Algorithm

RTL SCHEMATIC

GATE LEVEL

========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters

56

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 57: Scalable Encryption Algorithm

Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO

57

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 58: Scalable Encryption Algorithm

Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO

=========================================================================

WARNINGXst1885 - LSO file is empty default list of libraries is used

========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date

========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated

========================================================================= HDL Synthesis =========================================================================

58

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 59: Scalable Encryption Algorithm

Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized

========================================================================HDL Synthesis Report

Found no macro=========================================================================

========================================================================= Advanced HDL Synthesis =========================================================================

========================================================================= Low Level Synthesis =========================================================================

Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx

Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1

========================================================================= Final Report =========================================================================

59

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 60: Scalable Encryption Algorithm

Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO

Design Statistics IOs 7

Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------

Selected Device 2s15cs144-6

Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7

Total memory usage is 53376 kilobytes

TRANSLATION REPORT

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 37996 kilobytes

60

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 61: Scalable Encryption Algorithm

FLOOR PLANNING

61

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 62: Scalable Encryption Algorithm

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8

Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB

Maping Report

Device utilization summary

Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0

Number of SLICEs 2 out of 192 1

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0

The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512

The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707

62

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 63: Scalable Encryption Algorithm

KEY GENERATION

RTL SCHEMATIC

63

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 64: Scalable Encryption Algorithm

GATE LEVEL

64

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 65: Scalable Encryption Algorithm

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

65

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 66: Scalable Encryption Algorithm

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

TRANSLATION REPORT

Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved

Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd

Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion

Checking timing specifications Checking expanded design

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblockngd

Writing NGDBUILD log file keygenblockbld

66

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 67: Scalable Encryption Algorithm

MAPPING REPORT

Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25

Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB

ENCRYPTION

67

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 68: Scalable Encryption Algorithm

RTL SCHEMATIC

68

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 69: Scalable Encryption Algorithm

GATE LEVEL

SYNTHESIS REPORT

69

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 70: Scalable Encryption Algorithm

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

70

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 71: Scalable Encryption Algorithm

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 42092 kilobytes

71

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 72: Scalable Encryption Algorithm

DECRYPTION

GATE LEVEL

72

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 73: Scalable Encryption Algorithm

SYNTHESIS REPORT

========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory

---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144

---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No

---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto

73

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 74: Scalable Encryption Algorithm

---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5

---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO

Translation Report

NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0

Total memory usage is 45122 kilobytes

74

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 75: Scalable Encryption Algorithm

75

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 76: Scalable Encryption Algorithm

ADVANTAGES

SEA is parametric in text key and processor size

It is a low cost encryption routine targeted for the processors with limited instruction set

It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size

It is also used in applications where the same constrained device has to perform both encryption and decryption

APPLICATIONS

This is a low-cost encryption routine basically designed for processors with a limited instruction set

In wireless communication and mobile computing and networking systems

For the encryption of JPEG2000 images

In scalable video coding

In sensor networks and RFIDrsquos

76

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 77: Scalable Encryption Algorithm

CONCLUSION

SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required

This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations

77

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0
Page 78: Scalable Encryption Algorithm

Bibliography

Reference books

Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian

A VHDL Primer J Bhaskar

Digital Design Morris Mano

Data and Computer Communications William Stalling

Computer Networks Andrew S Tannenbaum

Network Cryptology William Stalling

Reference Websites

IEEE Transactions wwwwikipediacom

wwwwebopediacom

78

  • Appendix-I Simulation Results 47
  • Appendix-II Synthesis Reports 50
  • Appendix- III Implementation 79
  • Appendix-V Conclusion 81
  • CH1 INTRODUCTION TO VLSI
    • 12 TYPICAL IC DESIGN FLOW
      • 13 MICRON TECHNOLOGY
        • 21 INTRODUCTION TO VHDL
        • Standards Supported
          • VHDL
          • Verilog
          • EDIF
          • VITAL
          • 29 ACTIVE-HDL Macro Language
            • 210 Compilation
            • Design Entry
            • Implementation
            • Device Download and Program File Formatting
              • Number of errors 0