Abstract
SEA is a scalable encryption algorithm targeted at small embedded applications. It was initially designed for software implementations in controllers, smart cards, or processors. In this letter, we investigate its performance in recent field-programmable gate array (FPGA) devices. For this purpose, a loop architecture of the block cipher is presented. Beyond its low cost, a significant advantage of the proposed architecture is its full flexibility for any parameter of the scalable encryption algorithm, achieved through generic VHDL coding. The letter also carefully describes the implementation details that allow us to keep area requirements small. Finally, a comparative performance discussion of SEA with the Advanced Encryption Standard Rijndael and ICEBERG (a cipher designed for efficient FPGA implementations) is proposed. It illustrates the interest of platform/context-oriented block cipher design and, as far as SEA is concerned, its low area requirements and reasonable efficiency.
Scalable encryption algorithm (SEA) is a parametric block cipher for resource-constrained systems (e.g., sensor networks, RFIDs) that was introduced in [1]. It was initially designed as a low-cost encryption/authentication routine (i.e., with small code size and memory) targeted at processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation, and modular addition). Additionally, and contrary to most recent block ciphers (e.g., the DES [2] and AES Rijndael [3], [4]), the algorithm takes the plaintext, key, and bus sizes as parameters and can therefore be straightforwardly adapted to various implementation contexts and/or security requirements. Compared to older solutions for low-cost encryption, like the tiny encryption algorithm (TEA) [5] or Yuval's proposal [6], SEA also benefits from a stronger security analysis, derived from recent advances in block cipher design/cryptanalysis.
In practice, SEA has been proven to be an efficient solution for embedded software applications using microcontrollers, but its hardware performance has not yet been investigated. Consequently, and as a first step towards hardware performance analysis, this letter explores the features of a low-cost field-programmable gate array (FPGA) encryption/decryption core for SEA. In addition to the performance evaluation, we show that the algorithm's scalability can be turned into a fully generic VHDL design, so that any text, key, and bus size can be straightforwardly reimplemented without any modification of the hardware description language, using standard synthesis and implementation tools.
CONTENTS
CHAPTER 1 Introduction to VLSI
1.1 Introduction
1.2 VLSI Design Style
1.3 VLSI Design Flow
1.4 VLSI Features
CHAPTER 2 Introduction to VHDL
2.1 Introduction
2.2 Capabilities
2.3 Abstraction levels of VHDL
2.4 Basic Terminology
2.5 Modeling Techniques for VHDL
2.6 Process Statements
2.7 Conditional Statements
2.8 Active HDL Overview
2.9 Macro language
2.10 Compilation
2.11 Simulation
2.12 Xilinx
CHAPTER 3 Introduction to SEA
3.1 Specifications
3.2 Design properties
3.3 Overall Structure
3.4 Security Analysis
3.5 Performance Analysis
CHAPTER 4 An Exposition Of SEA
4.1 Overview of SEA
CHAPTER 5 SEA Architecture
5.1 Key Generation
5.2 Encryption
5.3 Decryption
Appendix-I Simulation Results
Appendix-II Synthesis Reports
Appendix-III Implementation
Appendix-IV Advantages
Appendix-V Conclusion
Appendix-VI Bibliography
CH1 INTRODUCTION TO VLSI
The first digital circuits were designed using electronic components such as vacuum tubes and transistors. Later, integrated circuits (ICs) were invented, allowing a designer to place digital circuits on a chip. An IC with fewer than 10 gates is called SSI (Small Scale Integration). With the advent of new fabrication techniques, designers could place more than 100 gates on an IC, called MSI (Medium Scale Integration). Designing at this level, one can create digital sub-blocks (adders, multiplexers, counters, registers, etc.) on an IC. The next level is LSI (Large Scale Integration); using this scale of integration, designers succeeded in building digital subsystems (microprocessors, I/O peripheral devices, etc.) on a chip.
At this point the design process started getting very complicated: manual conversion from schematic level to gate level, or from gate level to layout level, was becoming a lengthy process, and verifying the functionality of digital circuits at various levels became critical. This created new challenges for digital designers as well as circuit designers, who felt the need to automate these processes. Meanwhile, rapid advances in software technology and the development of new higher-level programming languages took place. It became possible to develop CAD/CAE (Computer Aided Design/Computer Aided Engineering) tools for designing electronic circuits with the assistance of software programs. Functional verification and logic verification of a design can be done using CAD simulation tools with greater efficiency, making it much easier for a designer to verify the functionality of a design at various levels.
With the advent of CMOS (Complementary Metal Oxide Semiconductor) process technology, one can fabricate a chip containing more than a million gates. At this scale the design process again became critical because of the manual conversion of the design from one level to another. The latest CAD tools solve this problem: with logic synthesis tools, a design engineer can easily translate a higher-level design description to lower levels. This way of designing (using CAD tools) is certainly a revolution in the electronics industry, and it leads to the development of sophisticated electronic products for both consumer and business use. Implementing a system in dedicated hardware typically gives better results than software in terms of speed, reliability, and performance. Using a CMOS VLSI design methodology, a designer can design and fabricate ICs in much less time than with the traditional way of designing.
1.2 TYPICAL IC DESIGN FLOW
[Figure: typical IC design flow. Blocks shown: Specifications, Behavioral Description, RTL Description, Behavioral Simulation, Functional Simulation, Behavioral Synthesis, Logic Synthesis, Gate-Level Netlist, Constraints, Library, Automatic P&R, Layout, Logic Simulation, Fabrication, Layout Management]
1.3 MICRON TECHNOLOGY
[Figure: Micron Technology, progressing through Micron, SM, DSM, and VDSM]
Process technology can be classified into four categories, evolving from micron technology and extending down to VDSM:
Micron technology: feature sizes down to 10^-6 m (1 µm).
Submicron technology: feature sizes below 1 µm, generally ranging down to 0.36 µm.
DSM (Deep Sub-Micron) technology: feature sizes extending down to 0.18 µm.
VDSM (Very Deep Sub-Micron) technology: the presently used technology, ranging down to 0.09 µm.
1.4 FEATURES
2.1 INTRODUCTION TO VHDL
VHDL is an acronym for VHSIC Hardware Description Language; VHSIC is an acronym for Very High Speed Integrated Circuits. It is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level.
The VHDL language can be regarded as an integrated amalgamation of the following languages:
Sequential language
Concurrent language
Net-list language
Timing specifications
Waveform generation language
The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator. A subset of the language is usually sufficient to model most applications. The complete language, however, has sufficient power to capture the descriptions of anything from the most complex chips to a complete electronic system.
HISTORY
The requirements for the language were first generated in 1981 under the VHSIC program of the U.S. Department of Defense (DoD). Reprocurement and reuse were also big issues. Thus, a need arose for a standardized hardware description language for the design, documentation, and verification of digital systems. The IEEE standardized the VHDL language in December 1987; this version of the language is known as IEEE Std 1076-1987. The official language description appears in the IEEE Standard VHDL Language Reference Manual, available from the IEEE. The language has also been recognized as an American National Standards Institute (ANSI) standard.
According to IEEE rules, an IEEE standard has to be reballoted every 5 years so that it may remain a standard. Consequently, the language was upgraded with new features, the syntax of many constructs was made more uniform, and many ambiguities present in the 1987 version of the language were resolved. This new version of the language is known as IEEE Std 1076-1993.
2.2 CAPABILITIES
The following are the major capabilities that the language provides, along with the features that differentiate it from other hardware description languages:
The language can be used as an exchange medium between chip vendors and CAD tool users. Different chip vendors can provide VHDL descriptions of their components to system designers.
The language can be used as a communication medium between different CAD and CAE tools.
The language supports hierarchy; that is, a digital system can be modeled as a set of interconnected components, and each component, in turn, can be modeled as a set of interconnected subcomponents.
The language supports flexible design methodologies: top-down, bottom-up, or mixed. It supports both synchronous and asynchronous timing models.
Various digital modeling techniques, such as finite-state machine descriptions and Boolean equations, can be modeled using the language.
The language is publicly available, human-readable, and machine-readable.
The language supports three basic description styles: structural, dataflow, and behavioral.
It supports a wide range of abstraction levels, ranging from abstract behavioral descriptions to very precise gate-level descriptions.
Arbitrarily large designs can be modeled using the language; the language imposes no limitations on the size of a design.
2.3 HARDWARE ABSTRACTION
VHDL is used to describe a model for a digital hardware device. This model specifies the external view of the device and one or more internal views. The internal view of the device specifies its functionality or structure, while the external view specifies the interface of the device through which it communicates with the other modules in its environment. In VHDL, each device model is treated as a distinct representation of a unique device, called an entity. The entity is thus a hardware abstraction of the actual hardware device. Each entity is described using one model, which contains one external view and one or more internal views.
2.4 Basic terminology
VHDL is a hardware description language that can be used to model a digital system. A hardware abstraction of this digital system is called an entity. An entity X, when used in another entity Y, becomes a component for the entity Y. To describe an entity, VHDL provides five different types of primary constructs, called design units. They are:
1. Entity declaration
2. Architecture body
3. Configuration declaration
4. Package declaration
5. Package body
1. An entity is modeled using an entity declaration and at least one architecture body. The entity declaration describes the external view of the entity, for example, the input and output signal names.
2. The architecture body contains the internal description of the entity, for example, as a set of interconnected components that represents the structure of the entity, or as a set of concurrent or sequential statements that represents the behavior of the entity.
3. A configuration declaration is used to create a configuration for an entity. It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity. It may also specify the bindings of the components used in the selected architecture body to other entities. An entity may have any number of configurations.
4. A package declaration encapsulates a set of related declarations, such as type declarations, subtype declarations, and subprogram declarations, which can be shared across two or more design units.
5. A package body contains the definitions of subprograms declared in a package declaration.
Once an entity has been modeled, it needs to be validated by a VHDL system. A typical VHDL system consists of an analyzer and a simulator. The analyzer reads in one or more design units contained in a single file and compiles them into a design library, after validating the syntax and performing some static checks.
The language is case insensitive; that is, lowercase and uppercase characters are treated alike. The language is also free format. Comments are specified in the language by preceding the text with two consecutive dashes (--).
Entity Declaration
The entity declaration specifies the name of the entity being modeled and lists the set of interface ports. Ports are signals through which the entity communicates with other models in its external environment.
EXAMPLE
The entity declaration for the half adder circuit is:
    entity half_adder is
      port (A, B : in bit; SUM, CARRY : out bit);
    end half_adder;
The entity, called half_adder, has two input ports, A and B, and two output ports, SUM and CARRY. BIT is a predefined type of the language.
Architecture Body
An architecture body specifies the internal details of an entity using any of the following modeling styles:
1. As a set of interconnected components (to represent structure)
2. As a set of concurrent assignment statements (to represent dataflow)
3. As a set of sequential assignment statements (to represent behavior)
4. As any combination of the above three
2.5 Structural style of modeling
In this style, an entity is described as a set of interconnected components. Such a model for the half_adder entity is described in an architecture body:
    architecture ha of half_adder is
      component Xor2
        port (X, Y : in bit; Z : out bit);
      end component;
      component And2
        port (L, M : in bit; N : out bit);
      end component;
    begin
      X1: Xor2 port map (A, B, SUM);
      A1: And2 port map (A, B, CARRY);
    end ha;
The name of the architecture body is ha. The entity declaration for the half adder specifies the interface ports for this architecture body. The architecture body is composed of two parts: the declarative part and the statement part. Two component declarations are present in the declarative part of the architecture body.
The declared components are instantiated in the statement part of the architecture body using component instantiation statements. The signals in the port map of a component instantiation and the port signals in the component declaration are associated by position.
DATAFLOW STYLE OF MODELING
In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The dataflow model for the half adder is described using two concurrent signal assignment statements. In a signal assignment statement, the symbol <= implies an assignment of a value to a signal.
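As a minimal sketch of the dataflow style described above, the half adder could be written with two concurrent signal assignments. The names half_adder and ha_dataflow and the port names are assumptions chosen to match the earlier half adder example; the entity is repeated only so that the sketch is self-contained.

```vhdl
-- Entity repeated here so that the example can be analyzed on its own.
entity half_adder is
  port (A, B : in bit; SUM, CARRY : out bit);
end half_adder;

-- Dataflow architecture (name ha_dataflow is an assumption).
architecture ha_dataflow of half_adder is
begin
  -- Both statements are concurrent: each one is re-evaluated
  -- whenever an event occurs on A or B, regardless of textual order.
  SUM   <= A xor B;
  CARRY <= A and B;
end ha_dataflow;
```

Because the two assignments are concurrent, swapping their order does not change the behavior of the model.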
BEHAVIORAL STYLE OF MODELING
The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. This set of sequential statements, which are specified inside a process statement, does not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.
MIXED STYLE OF MODELING
It is possible to mix the three modeling styles in a single architecture body. That is, within an architecture body, we could use component instantiation statements, concurrent signal assignment statements, and process statements.
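The following sketch illustrates one way the three styles might be combined in a single architecture body, using a one-bit full adder. The entities xor2 and full_adder, the architecture name fa_mixed, and all signal names are illustrative assumptions, not taken from the design described in this report.

```vhdl
-- Hypothetical 2-input XOR gate used as the structural component.
entity xor2 is
  port (X, Y : in bit; Z : out bit);
end xor2;

architecture dataflow of xor2 is
begin
  Z <= X xor Y;
end dataflow;

entity full_adder is
  port (A, B, CIN : in bit; SUM, COUT : out bit);
end full_adder;

architecture fa_mixed of full_adder is
  component xor2
    port (X, Y : in bit; Z : out bit);
  end component;
  signal S1 : bit;  -- intermediate value A xor B
begin
  -- Structural style: component instantiation.
  X1: xor2 port map (A, B, S1);

  -- Behavioral style: process with sequential statements.
  process (A, B, CIN)
    variable T1, T2, T3 : bit;
  begin
    T1 := A and B;
    T2 := B and CIN;
    T3 := A and CIN;
    COUT <= T1 or T2 or T3;
  end process;

  -- Dataflow style: concurrent signal assignment.
  SUM <= S1 xor CIN;
end fa_mixed;
```

All three top-level statements in fa_mixed are concurrent with one another; only the statements inside the process execute sequentially.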
MODEL ANALYSIS
Once an entity is declared in VHDL, it can be validated using an analyzer and a simulator that are part of a VHDL system. The first step in the validation process is analysis. The analyzer takes a file that contains one or more design units and compiles them into an intermediate form. The generated intermediate form is stored in a specific design library that has been designated as the working library.
There is a design library with the logical name STD, predefined by the VHDL language environment. This library contains two packages, STANDARD and TEXTIO. The STANDARD package contains declarations for all the predefined types of the language. The TEXTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations. There also exists an IEEE standard package called STD_LOGIC_1164, which contains its associated subtypes, overloaded operator functions, and other useful utilities. This standard is called IEEE Std 1164-1993.
SIMULATION
For a hierarchical entity to be simulated, all of its lowest-level components must be described at the behavioral level. A simulation can be performed on either one of the following:
1. An entity declaration and an architecture body pair
2. A configuration
Preceding the actual simulation are two major steps:
1. Elaboration phase: In this phase, the hierarchy of the entity is expanded and linked, components are bound to entities in a library, and the top-level entity is built as a network of behavioral models that is ready to be simulated.
2. Initialization phase: Driving and effective values for all explicitly declared signals are computed, implicit signals are assigned values, processes are executed once until they suspend, and simulation time is set to 0 ns.
Simulation commences by advancing time to that of the next event. Values that are scheduled to be assigned to signals at this time are assigned. If the value of a signal changes, and if that signal is present in the sensitivity list of a process, the process is executed until it suspends. Simulation stops when an assertion violation occurs, depending on the implementation of the VHDL system, or when the maximum time as defined by the language is reached.
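To make the simulation steps above concrete, here is a hedged sketch of a small test bench for a half adder. The entity half_adder_tb, the architecture names, the 10 ns spacing, and the report messages are all assumptions made for illustration; the half adder is repeated so that the sketch is self-contained.

```vhdl
entity half_adder is
  port (A, B : in bit; SUM, CARRY : out bit);
end half_adder;

architecture ha_dataflow of half_adder is
begin
  SUM   <= A xor B;
  CARRY <= A and B;
end ha_dataflow;

-- Test bench: a top-level entity without ports (hypothetical name).
entity half_adder_tb is
end half_adder_tb;

architecture sim of half_adder_tb is
  signal A, B, SUM, CARRY : bit;  -- all start at '0' during initialization
begin
  -- Direct entity instantiation (VHDL-93), bound during elaboration.
  dut: entity work.half_adder
    port map (A => A, B => B, SUM => SUM, CARRY => CARRY);

  stimulus: process
  begin
    A <= '0'; B <= '0';
    wait for 10 ns;  -- process suspends; simulator advances to the next event
    assert SUM = '0' and CARRY = '0' report "0 + 0 failed" severity error;

    A <= '1'; B <= '0';
    wait for 10 ns;
    assert SUM = '1' and CARRY = '0' report "1 + 0 failed" severity error;

    A <= '1'; B <= '1';
    wait for 10 ns;
    assert SUM = '0' and CARRY = '1' report "1 + 1 failed" severity error;

    wait;  -- suspend forever; no further events are scheduled
  end process;
end sim;
```

Each `wait for` suspends the stimulus process, which lets the scheduled signal values take effect before the following assertion is checked.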
Entity Declaration
An entity declaration describes the external interface of the entity. It specifies the name of the entity, the names of the interface ports, their modes, and the types of the ports. The syntax for an entity declaration is:
    entity entity-name is
      [generic (list-of-generics-and-their-types);]
      [port (list-of-interface-port-names-and-their-types);]
      [entity-item-declarations]
    [begin
      entity-statements]
    end [entity] [entity-name];
The entity-name is the name of the entity, and the interface ports are the signals through which the entity passes information to and from its external environment. Each interface port can have one of the following modes:
1. in: The value of an input port can only be read within the entity model.
2. out: The value of an output port can only be updated within the entity model.
3. inout: The value of a bidirectional port can be read and updated within the entity model.
4. buffer: The value of a buffer port can be read and updated within the entity model. It cannot have more than one source.
Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration.
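As an illustration of the generic clause and the port modes listed above, the following sketch declares a word-wide XOR whose width is a generic. The entity name xor_word, the generic name b, and its default value of 8 are assumptions made for this example only.

```vhdl
-- Hypothetical generic entity: the word width b is fixed at
-- instantiation time, not in the source text.
entity xor_word is
  generic (b : positive := 8);                  -- word size in bits
  port (X, Y : in  bit_vector(b-1 downto 0);    -- mode in: read only
        Z    : out bit_vector(b-1 downto 0));   -- mode out: updated only
end xor_word;

architecture dataflow of xor_word is
begin
  -- The predefined xor on bit_vector operates element by element.
  Z <= X xor Y;
end dataflow;
```

An instance could override the default with, for example, `generic map (b => 16)`, so the same description serves several word sizes without modification.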
ARCHITECTURE BODY
An architecture body describes the internal view of an entity. It describes the functionality or the structure of the entity.
    architecture <architecture-name> of <entity-name> is
    begin
      concurrent-statements, which can be:
        process-statement
        block-statement
        concurrent-signal-assignment-statement
        component-instantiation-statement
        generate-statement
    end [architecture] [architecture-name];
The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow, and sequential behavior.
Here we describe an entity by using the behavioral model. A process statement, which is a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.
2.6 PROCESS STATEMENT
A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is:
    [process-label:] process [(sensitivity-list)] [is]
    begin
      sequential-statements, which can be:
        variable-assignment-statement
        signal-assignment-statement
        wait-statement
        if-statement
        case-statement
        loop-statement
        null-statement
        exit-statement
        next-statement
        assertion-statement
        report-statement
        procedure-call-statement
        return-statement
    end process [process-label];
The set of signals to which the process is sensitive is defined by the sensitivity list. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in sequential order, that is, in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
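A short sketch of a process with a sensitivity list is given below, in the form of a rising-edge D flip-flop. The entity name dff and its port names are assumptions made for illustration.

```vhdl
-- Hypothetical D flip-flop described with a single process.
entity dff is
  port (CLK, D : in bit; Q : out bit);
end dff;

architecture behavior of dff is
begin
  -- The process runs each time an event occurs on CLK,
  -- executes its statements in order, then suspends again.
  process (CLK)
  begin
    if CLK'event and CLK = '1' then  -- rising edge of the clock
      Q <= D;
    end if;
  end process;
end behavior;
```

D is deliberately left out of the sensitivity list: a change on D alone should not trigger the process, since the output is only meant to update on a clock edge.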
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form:
    variable-object := expression;
The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is:
    signal-object <= expression [after delay-value];
A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other statements that appear within the process.
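The difference in timing between the two kinds of assignment can be sketched as follows. The entity var_vs_sig, the object names V and S, and the 1 ns wait are illustrative assumptions; the assertions simply document the expected values at each point.

```vhdl
-- Hypothetical demonstration entity (no ports, simulation only).
entity var_vs_sig is
end var_vs_sig;

architecture demo of var_vs_sig is
  signal S : integer := 0;
begin
  process
    variable V : integer := 0;
  begin
    V := V + 1;  -- variable: the new value is visible immediately
    S <= S + 1;  -- signal: the new value is only scheduled

    assert V = 1 report "variable was not updated at once" severity error;
    assert S = 0 report "signal changed before the process suspended" severity error;

    wait for 1 ns;  -- process suspends; the scheduled value is applied
    assert S = 1 report "signal was not updated after suspension" severity error;

    wait;  -- stop here
  end process;
end demo;
```

Reading S right after the assignment still returns the old value, which is a common source of surprises when signals are used where variables were intended.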
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is:
    if boolean-expression then
      sequential-statements
    {elsif boolean-expression then
      sequential-statements}
    [else
      sequential-statements]
    end if;
The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
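As a hedged example of the if/elsif/else form, the comparator below sets exactly one of three outputs. The entity name compare and the port names GT, EQ, and LT are assumptions made for this sketch.

```vhdl
-- Hypothetical integer comparator.
entity compare is
  port (A, B : in integer; GT, EQ, LT : out bit);
end compare;

architecture behavior of compare is
begin
  process (A, B)
  begin
    -- Default values; a later assignment in the same process overrides them.
    GT <= '0';
    EQ <= '0';
    LT <= '0';

    -- Conditions are checked in order; only the first true branch runs.
    if A > B then
      GT <= '1';
    elsif A = B then
      EQ <= '1';
    else
      LT <= '1';
    end if;
  end process;
end behavior;
```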
CASE STATEMENT
The format of a case statement is:
    case expression is
      when choices => sequential-statements
      when choices => sequential-statements
    end case;
The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or of a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by using "others". The others clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
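A 4-to-1 multiplexer is a typical use of the case statement; the sketch below is one possible form. The entity name mux4 and its port names are assumptions made for illustration.

```vhdl
-- Hypothetical 4-to-1 multiplexer.
entity mux4 is
  port (SEL            : in  bit_vector(1 downto 0);
        D0, D1, D2, D3 : in  bit;
        Y              : out bit);
end mux4;

architecture behavior of mux4 is
begin
  process (SEL, D0, D1, D2, D3)
  begin
    -- SEL is a one-dimensional array, so string literals serve as choices.
    case SEL is
      when "00"   => Y <= D0;
      when "01"   => Y <= D1;
      when "10"   => Y <= D2;
      when others => Y <= D3;  -- catch-all branch, placed last
    end case;
  end process;
end behavior;
```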
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax for a loop statement is:
    [loop-label:] iteration-scheme loop
      sequential-statements
    end loop [loop-label];
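The sketch below uses a for iteration scheme to compute the parity of a word. The entity name parity, the generic b, and the port names are assumptions made for this example.

```vhdl
-- Hypothetical parity generator with a generic word width.
entity parity is
  generic (b : positive := 8);
  port (X   : in  bit_vector(b-1 downto 0);
        ODD : out bit);  -- '1' when X contains an odd number of ones
end parity;

architecture behavior of parity is
begin
  process (X)
    variable P : bit;
  begin
    P := '0';
    -- The loop index I is declared implicitly by the for scheme.
    for I in X'range loop
      P := P xor X(I);
    end loop;
    ODD <= P;
  end process;
end behavior;
```

Using X'range rather than fixed bounds keeps the loop valid for any value of the generic.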
2.8 Active HDL Overview
Active-HDL is an integrated environment designed for development of VHDL, Verilog, EDIF, and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs, and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard.
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0.
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with OVI Standard Delay Format Specification Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification, and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1. HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. The keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts, and Tcl scripts.
2. Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3. State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4. Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5. Design Browser
The Design Browser window displays the contents of the current design, that is:
Resource files attached to the design
The contents of the default working library of the design
The structure of the design unit selected for simulation
VHDL, Verilog, or EDIF objects declared within a selected region of the current design
6. Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros, and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable by the simulator. In Active-HDL, a source file can be one of the following:
VHDL file (.vhd)
Verilog file (.v)
EDIF netlist file
State diagram file (.asf)
Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog, or EDIF file containing HDL code (or netlist) generated from the diagram.
A netlist is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, for VHDL, Verilog, and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
Event-Driven Simulation
Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others are simulated in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bitstream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx® design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs, and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic, and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions, and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input netlists, as well as design constraint information, into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bitstream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing, or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a netlist(s) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations, and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bitstream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e., with small code size and memory) targeted at processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation, and modular addition). The proposed design is parametric in the text, key, and processor size, and allows an efficient combination of encryption/decryption and "on-the-fly" key derivation; its security against a number of recent cryptanalytic techniques is also discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g., a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size, and a limited instruction set. In addition, we propose flexibility as another, unusual design principle.
SEA$_{n,b}$ is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). In contrast, SEA$_{n,b}$ makes it possible to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, additional features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs and any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds, and is defined with respect to the following parameters:
– $n$: plaintext size, key size;
– $b$: processor (or word) size;
– $n_b = \frac{n}{2b}$: number of words per Feistel branch;
– $n_r$: number of block cipher rounds.
As only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted as SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$, ... Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
– Bit representation: $x_b = x(n/2-1)\; x(n/2-2) \ldots x(2)\; x(1)\; x(0)$;
– Word representation: $x_W = x_{n_b-1}\; x_{n_b-2} \ldots x_2\; x_1\; x_0$.
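As a minimal sketch of the parameter constraints above, the following Python helper derives $n_b$ and splits a Feistel branch into words. The function names are hypothetical, and the convention that list index 0 holds the least significant word $x_0$ is an assumption made for illustration.

```python
def sea_params(n: int, b: int) -> dict:
    """Check the SEA(n, b) constraint and derive the number of words per branch."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    nb = n // (2 * b)  # words per Feistel branch
    return {"n": n, "b": b, "nb": nb}


def to_words(x: int, nb: int, b: int) -> list:
    """Word representation of an n/2-bit value; index 0 is assumed to be x_0 (the LSW)."""
    mask = (1 << b) - 1
    return [(x >> (b * i)) & mask for i in range(nb)]
```

For instance, `sea_params(48, 8)` gives three words per branch, while `sea_params(64, 8)` is rejected because 64 is not a multiple of 48.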
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
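1. Bitwise XOR $\oplus$
The bitwise XOR is defined on $n/2$-bit vectors:
$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \;\Leftrightarrow\; z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$
2. Substitution Box S
SEA$_{n,b}$ uses the following 3-bit substitution table, in C-like notation:
$S_T = \{0, 5, 6, 7, 4, 3, 1, 2\}$
For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition (each assignment uses the words already updated by the previous ones):
$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \;\Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i}$
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1}$
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}, \quad 0 \le i \le \frac{n_b}{3} - 1$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.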
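The recursive definition can be checked against the table $S_T$ with a short Python sketch. The helper names are hypothetical, and the check assumes that word $x_{3i}$ carries the least significant bit of each 3-bit S-box input.

```python
ST = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table from the text


def sbox_layer(x):
    """Bitslice S-box: in-place (recursive) updates on each group of three words."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]   # uses the already-updated y[i]
        y[i + 2] ^= y[i] | y[i + 1]   # uses both updated words
    return y


def matches_table(b=8):
    """Compare the bitslice definition with ST for all eight 3-bit inputs."""
    mask = (1 << b) - 1
    for v in range(8):
        words = [((v >> k) & 1) * mask for k in range(3)]  # bit k replicated over word k
        out = sbox_layer(words)
        if sum((out[k] & 1) << k for k in range(3)) != ST[v]:
            return False
    return True
```

Under that bit-ordering assumption the two descriptions agree on all eight inputs, and the table has the two fixed points 0 and 4.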
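3. Word Rotation R
The word rotation is defined on $n_b$-word vectors:
$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \;\Leftrightarrow\; y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \qquad y_0 = x_{n_b-1}$
4. Bit Rotation r
The bit rotation is defined on $n_b$-word vectors:
$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \;\Leftrightarrow\; y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le \frac{n_b}{3} - 1$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.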
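A possible rendering of the two rotations in Python is sketched below. The function names are hypothetical and a branch is assumed to be stored as a list with $x_0$ at index 0.

```python
def word_rot(x):
    """R: y[i+1] = x[i] and y[0] = x[nb-1]."""
    return [x[-1]] + x[:-1]


def inv_word_rot(x):
    """R^-1: undoes word_rot."""
    return x[1:] + [x[0]]


def bit_rot(x, b):
    """r: in every group of three b-bit words, rotate word 0 right and word 2 left by one bit."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & mask
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask
    return y
```

Note that only $R$ moves data between words; $r$ acts inside individual words, which is why the diffusion analysis treats the two separately.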
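5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \;\Leftrightarrow\; z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$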
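The word-wise nature of this addition (carries never cross a word boundary) can be illustrated with a minimal Python sketch; the function name is hypothetical.

```python
def add_mod(x, y, b):
    """Word-wise addition mod 2^b of two nb-word vectors; carries stay inside each word."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]
```

For example, with $b = 8$, adding 1 to the word 0xFF wraps to 0 without affecting the neighbouring words.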
The Round and Key Round
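Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Figure 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^{2} \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^{2}$ such that:
$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$
$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$
$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \;\Leftrightarrow\; KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$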
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext $P$ under a key $K$ and produces a ciphertext $C$. $P$, $C$ and $K$ have a parametric bit size $n$. The operations within the cipher are performed considering parametric $b$-bit words.

    C = SEA_{n,b}(P, K)
      initialization:
        L_0 & R_0 = P
        KL_0 & KR_0 = K
      key scheduling:
        for i in 1 to floor(nr/2)
          [KL_i, KR_i] = FK(KL_{i-1}, KR_{i-1}, C(i))
        switch KL_{floor(nr/2)}, KR_{floor(nr/2)}
        for i in ceil(nr/2) to nr - 1
          [KL_i, KR_i] = FK(KL_{i-1}, KR_{i-1}, C(r - i))
      encryption:
        for i in 1 to ceil(nr/2)
          [L_i, R_i] = FE(L_{i-1}, R_{i-1}, KR_{i-1})
        for i in ceil(nr/2) + 1 to nr
          [L_i, R_i] = FE(L_{i-1}, R_{i-1}, KL_{i-1})
      final:
        C = R_{nr} & L_{nr}
        switch KL_{nr-1}, KR_{nr-1}

where & is the concatenation operator, $KR_{\lfloor n_r/2 \rfloor}$ is taken before the switch, $r$ in $C(r - i)$ stands for the number of rounds $n_r$, and $C(i)$ is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals $i$. Decryption is exactly the same, using the decrypt round $F_D$.
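The round functions and the iteration above can be put together in a short Python sketch. This is an illustrative reading of the specification, not a reference implementation: all function names are hypothetical, branches are assumed to be lists of words with $x_0$ at index 0, the plaintext and key are passed as already-split halves, and no official test vector is checked.

```python
def _rot1(w, b, left):
    """Cyclic one-bit rotation of a b-bit word."""
    mask = (1 << b) - 1
    if left:
        return ((w << 1) | (w >> (b - 1))) & mask
    return ((w >> 1) | (w << (b - 1))) & mask


def f(x, k, b):
    """r(S(x [+] k)): modular key addition, bitslice S-box, then bit rotation."""
    mask = (1 << b) - 1
    y = [(xi + ki) & mask for xi, ki in zip(x, k)]
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
        y[i], y[i + 2] = _rot1(y[i], b, False), _rot1(y[i + 2], b, True)
    return y


def word_rot(x):
    return [x[-1]] + x[:-1]          # R


def inv_word_rot(x):
    return x[1:] + [x[0]]            # R^-1


def xor(x, y):
    return [a ^ c for a, c in zip(x, y)]


def FE(left, right, key, b):
    return right, xor(word_rot(left), f(right, key, b))


def FD(left, right, key, b):
    return right, inv_word_rot(xor(left, f(right, key, b)))


def FK(kl, kr, const, b):
    return kr, xor(kl, word_rot(f(kr, const, b)))


def round_keys(kl, kr, nr, b):
    """Round keys in the order they are consumed (nr is assumed odd)."""
    nb, half = len(kl), nr // 2
    const = lambda i: [i] + [0] * (nb - 1)   # C(i): only the LSW is non-zero
    keys = [kr]                              # KR_0
    for i in range(1, half + 1):
        kl, kr = FK(kl, kr, const(i), b)
        keys.append(kr)                      # KR_i, read before the switch
    kl, kr = kr, kl                          # switch
    for i in range(half + 1, nr):
        kl, kr = FK(kl, kr, const(nr - i), b)
        keys.append(kl)                      # KL_i
    return keys


def crypt(left, right, kl, kr, nr, b, decrypt=False):
    """Encrypt (or decrypt) one block given as two branches; returns (R_nr, L_nr)."""
    rnd = FD if decrypt else FE
    for key in round_keys(kl, kr, nr, b):
        left, right = rnd(left, right, key, b)
    return right, left
```

With this reading, the round-key sequence is a palindrome, so decrypting with `decrypt=True` and the same key recovers the plaintext halves, which is the property the specification relies on.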
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– $\lambda$-parameter: 1/2;
– $\delta$-parameter: 1/4;
– Maximum nonlinear order, namely 2;
– Recursive definition;
– Minimum number of instructions.
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher, when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data into $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor r/2 \rfloor}, K_{\lfloor r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the linear and differential parameters of the 3-round cipher, $\Lambda$ and $\Delta$, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function $S$ has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an $n$-bit block cipher, resisting differential [4] and linear [28] cryptanalysis respectively requires $\Delta$ and $\Lambda$ to be brought down to about $2^{-n}$ and $2^{-n/2}$. In order to approach these bounds, we require that:
$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \qquad (1)$
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
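Spelling out the exponents in Equation (1) shows why both conditions lead to the same threshold; the following is only a worked restatement of the inequalities given in the text:

```latex
\begin{align*}
\left(2^{-2}\right)^{2n_r/3} = 2^{-4n_r/3} < 2^{-n}
  &\iff \frac{4n_r}{3} > n \iff n_r > \frac{3n}{4},\\
\left(2^{-1}\right)^{2n_r/3} = 2^{-2n_r/3} < 2^{-n/2}
  &\iff \frac{2n_r}{3} > \frac{n}{2} \iff n_r > \frac{3n}{4}.
\end{align*}
```

For example, with $n = 48$ both conditions give a threshold of $3n/4 = 36$ rounds.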
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and the boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds allow the bias of only one or two rounds to be improved. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \lll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
– $S(x \lll a) = S(x) \lll a$;
– $r(x \lll a) = r(x) \lll a$;
– $R(x \lll a) = R(x) \lll a$.
Consider a modified version of our cipher, where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \lll a = C_i$ (e.g. all $C_i$'s equal 0). As a consequence of the previous observations, the modified round $F'_E$ and the key round $F_K$ satisfy:
$F'_E(L \lll a, R \lll a, K \lll a) = F'_E(L, R, K) \lll a$
$F_K(KL \lll a, KR \lll a, 0) = F_K(KL, KR, 0) \lll a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \lll a$. In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \lll a) \boxplus (K \lll a) = (X \boxplus K) \lll a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to 3/8 as $b$ grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \lll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from $K$ and those derived from $K \lll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA$_{n,b}$.
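The probability that rotation commutes with modular addition can be estimated exhaustively for a small word size. The sketch below (hypothetical function names, single $b$-bit words) counts the pairs $(X, K)$ for which the property holds.

```python
def rotl(w, a, b):
    """Cyclic left rotation of a b-bit word by a bits (0 < a < b)."""
    mask = (1 << b) - 1
    return ((w << a) | (w >> (b - a))) & mask


def commute_probability(a, b):
    """Fraction of word pairs (x, k) with rotl(x) + rotl(k) == rotl(x + k) mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            if (rotl(x, a, b) + rotl(k, a, b)) & mask == rotl((x + k) & mask, a, b):
                hits += 1
    return hits / (1 << (2 * b))
```

For $b = 8$ and $a = 1$ the exhaustive count gives a value close to 3/8, and the value for $a = 2$ is lower, in line with the statement above.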
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure, where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that this remains the same when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, whose vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference makes it possible to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \Leftrightarrow \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks, plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. As $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
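The suggested round count is easy to tabulate. The following Python sketch uses a hypothetical `suggested_rounds` helper and an assumed `margin` argument to model the optional doubling; neither name comes from the source.

```python
def suggested_rounds(n, b, margin=1):
    """3n/4 + 2*(nb + floor(b/2)), times an optional margin, rounded up to an odd value."""
    if n % (6 * b) != 0 or b < 8:
        raise ValueError("need n to be a multiple of 6b and b >= 8")
    nb = n // (2 * b)
    nr = margin * (3 * n // 4 + 2 * (nb + b // 2))
    return nr if nr % 2 == 1 else nr + 1
```

For SEA$_{48,8}$ the base formula gives 36 + 2·(3 + 4) = 50, hence 51 rounds after the odd adjustment; with `margin=2` the same parameters give 101.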
3.5 Performance Analysis
SEA$_{n,b}$ is targeted at low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and registers usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\gg$, $\ll$;
– Branch instructions: goto, subroutine call and return;
– Comparison, load RAM in register, store register in RAM.
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in number of operations, ops) are evaluated by summing the values given in appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r - 1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
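These cost formulas can be evaluated for any parameter set with a few lines of Python. The `sea_cost` helper and its dictionary keys are hypothetical names; the expressions are copied from the estimates above.

```python
def sea_cost(n, b, nr):
    """Evaluate the software cost estimates (in words and ops) for SEA(n, b) with nr rounds."""
    nb = n // (2 * b)
    round_ops = 12 * nb + 11
    key_round_ops = 10 * nb + 11
    return {
        "ram_and_registers": 5 * nb + 3,
        "code_size_ops": 31 * nb + 36,
        "encryption_ops": (nr - 1) * (round_ops + key_round_ops + 7)
                          + round_ops + 8 * nb + 7,
    }
```

For SEA$_{48,8}$ with 51 rounds, this evaluates to 18 RAM words and registers, a code size of 129 ops and 50 × 95 + 78 = 4828 ops per encryption.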
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael);
– The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers;
– The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. SEA$_{n,b}$ trades space for time, which may be relevant given present processor speeds). Looking at the code size-cycles product, the efficiency of SEA$_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E : y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that
$\#E(\mathbb{F}_p) = p + 1 - t$
where $t$ is an integer which satisfies the Hasse bound
$-2\sqrt{p} \le t \le 2\sqrt{p}$
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial, indexed by $\ell$, that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve $E$ under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining $t$.
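The final recombination step can be illustrated on a toy example in Python. This sketch is not the SEA algorithm itself: the residues $t \bmod \ell$ are simulated from a brute-force point count, whereas the real algorithm obtains them without knowing $t$. All function names are hypothetical.

```python
from math import prod


def count_points(a4, a6, p):
    """Naive #E(F_p) for y^2 = x^3 + a4*x + a6, including the point at infinity."""
    square_roots = {}
    for y in range(p):
        square_roots[y * y % p] = square_roots.get(y * y % p, 0) + 1
    return 1 + sum(square_roots.get((x ** 3 + a4 * x + a6) % p, 0) for x in range(p))


def recover_trace(residues, moduli, p):
    """Rebuild t from t mod l for pairwise coprime l whose product exceeds 4*sqrt(p)."""
    M = prod(moduli)
    if M * M <= 16 * p:
        raise ValueError("product of auxiliary primes must exceed 4*sqrt(p)")
    t = sum(r * (M // m) * pow(M // m, -1, m) for r, m in zip(residues, moduli)) % M
    return t - M if t > M // 2 else t   # lift to the Hasse interval around 0
```

The lift at the end works because the Hasse bound confines $t$ to an interval of width $4\sqrt{p}$, so a modulus larger than that leaves exactly one candidate.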
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$, with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms, such as Diffie-Hellman key exchange, El Gamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 25, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial in $\mathbb{Z}[F, J]$ indexed by $\ell$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration, and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime $p$ and an elliptic curve $E : y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram. Key computational block (Mod, SBox, BR, WR, XOR, Round Reg; key registers KeyReg0[95:0] to KeyReg9[95:0]); encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg; cipher data register); decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg; plain text data register). Signals: KeyI[95:0], KeyLd, DataI[95:0], DataLd, DataO[95:0], DataRd, Key0[95:0] to Key9[95:0], SMC, Clk, Rst, Ena, ED, Ovr.
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key, and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
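Both ways of obtaining a key can be sketched with the Python standard library. The function names, the 96-bit default (chosen here to match a SEA$_{96,8}$ key), the salt handling and the iteration count are illustrative assumptions; the sketch also uses PBKDF2 with SHA-256 rather than the SHA-1 mentioned above.

```python
import hashlib
import secrets


def random_key(bits=96):
    """Draw a key from the operating system's entropy source."""
    return secrets.token_bytes(bits // 8)


def key_from_passphrase(passphrase, salt, bits=96, iterations=100_000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=bits // 8
    )
```

The passphrase route is deterministic for a given salt, so the salt has to be stored alongside the encrypted data, whereas a random key must itself be stored or transported securely.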
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key length; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math-oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the term cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
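The one-time pad conditions listed above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and Python's `secrets` module is a cryptographically strong pseudo-random source, which approximates but does not guarantee the "truly random" key material that Shannon's proof assumes.

```python
# Minimal one-time pad sketch. Assumption: secrets.token_bytes stands in for a
# truly random key source; a real one-time pad needs genuine randomness.
import secrets


def otp_keygen(length: int) -> bytes:
    """Generate a fresh pad at least as long as the message; never reuse it."""
    return secrets.token_bytes(length)


def otp_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with the pad. The same operation encrypts and decrypts."""
    if len(key) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


if __name__ == "__main__":
    message = b"SEA"
    pad = otp_keygen(len(message))
    ciphertext = otp_xor(message, pad)
    assert otp_xor(ciphertext, pad) == message
    # Any same-length plaintext is consistent with the ciphertext under some
    # pad, so the ciphertext alone reveals nothing about the message.
    other_pad = otp_xor(ciphertext, b"XYZ")
    assert otp_xor(ciphertext, other_pad) == b"XYZ"
```

The last two lines show why the scheme resists brute force: trying every pad simply produces every possible message of that length.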
Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases effective security could be achieved if it is proven that the effort required (i.e. the work factor, in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof has been found to date, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
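The DES figures quoted above can be checked with simple arithmetic. The sketch below uses only the numbers stated in the text (a 56-bit key, 2^55 expected brute force decryptions, about 2^43 operations for linear cryptanalysis); the names are chosen for illustration.

```python
# Work-factor comparison for DES using the figures quoted in the text.

DES_KEY_BITS = 56
LINEAR_CRYPTANALYSIS_OPS = 2 ** 43  # approximate DES operations, per the text


def brute_force_expected_trials(key_bits: int) -> int:
    """Half of the key space must be searched for a better-than-even chance."""
    return 2 ** (key_bits - 1)


def speedup(slow_ops: int, fast_ops: int) -> int:
    """How many times fewer operations the faster attack needs."""
    return slow_ops // fast_ops


if __name__ == "__main__":
    brute = brute_force_expected_trials(DES_KEY_BITS)
    print("brute force trials:", brute)
    print("speedup:", speedup(brute, LINEAR_CRYPTANALYSIS_OPS))
```

In operation count the linear attack is 2^12 = 4096 times cheaper, although it needs 2^43 known plaintexts where brute force needs only one.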
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
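The PIN example above can be sketched as follows. This is an illustrative model, not a measurement: instead of timing real hardware, the leaky comparison counts how many bytes it examines before rejecting, which is the quantity a timing attack would infer. The function names are hypothetical; `hmac.compare_digest` is the Python standard library's constant-time comparison.

```python
# Sketch of a timing side channel in a PIN check, and a constant-time fix.
import hmac


def leaky_compare(secret: bytes, guess: bytes):
    """Return (equal, comparisons). Stops at the first mismatching byte, so
    the amount of work depends on how many leading bytes are correct."""
    if len(secret) != len(guess):
        return False, 0
    comparisons = 0
    for s, g in zip(secret, guess):
        comparisons += 1
        if s != g:
            return False, comparisons
    return True, comparisons


def constant_time_compare(secret: bytes, guess: bytes) -> bool:
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(secret, guess)


if __name__ == "__main__":
    pin = b"1234"
    for guess in (b"0000", b"1000", b"1200", b"1230"):
        print(guess, leaky_compare(pin, guess))
```

With the leaky version an attacker can recover the PIN one character at a time, since each correct leading character lengthens the check.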
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0  0%
  Number of Slices containing unrelated logic: 0 out of 0  0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86  225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4  25%
  Number of GCLKIOBs: 1 out of 4  25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0  0%
  Number of Slices containing unrelated logic: 0 out of 0  0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86  225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4  25%
  Number of GCLKIOBs: 1 out of 4  25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0  0%
  Number of Slices containing unrelated logic: 0 out of 0  0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86  225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4  25%
  Number of GCLKIOBs: 1 out of 4  25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
  Found 96-bit register for signal <Dreg>.
  Summary:
    inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 195
Macro Statistics
  Registers : 1
    96-bit register : 1
Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices : 55 out of 192  28%
  Number of Slice Flip Flops : 96 out of 384  25%
  Number of 4 input LUTs : 1 out of 384  0%
  Number of bonded IOBs : 194 out of 90  215% (*)
  Number of GCLKs : 1 out of 4  25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information
-----------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary
---------------
Speed Grade : -6
  Minimum period : No path found
  Minimum input arrival time before clock : 7.962ns
  Maximum output required time after clock : 6.788ns
  Maximum combinational path delay : No path found
Timing Detail
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock Clk
Offset : 7.962ns (Levels of Logic = 1)
  Source : KeyEna (PAD)
  Destination : Dreg_95 (FF)
  Destination Clock : Clk rising
  Data Path : KeyEna to Dreg_95
  Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
  ----------------------------------------------------------------------
  IBUF:I->O      96       0.776        6.300       KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                 0.886                    Dreg_0
  ----------------------------------------------------------------------
  Total : 7.962ns (1.662ns logic, 6.300ns route) (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock Clk
Offset : 6.788ns (Levels of Logic = 1)
  Source : Dreg_95 (FF)
  Destination : KeyO<95> (PAD)
  Source Clock : Clk rising
  Data Path : Dreg_95 to KeyO<95>
  Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
  ----------------------------------------------------------------------
  FDCE:C->Q      1        1.085        1.035       Dreg_95 (Dreg_95)
  OBUF:I->O               4.668                    KeyO_95_OBUF (KeyO<95>)
  ----------------------------------------------------------------------
  Total : 6.788ns (5.753ns logic, 1.035ns route) (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 7
Cell Usage
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices : 2 out of 192  1%
  Number of 4 input LUTs : 3 out of 384  0%
  Number of bonded IOBs : 7 out of 90  7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384  1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192  1%
  Number of Slices containing only related logic: 2 out of 2  100%
  Number of Slices containing unrelated logic: 0 out of 2  0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384  1%
  Number of bonded IOBs: 7 out of 86  8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86  8%
  Number of LOCed External IOBs: 0 out of 7  0%
  Number of SLICEs: 2 out of 192  1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384  109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384  264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192  346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665  97%
  Number of Slices containing unrelated logic: 17 out of 665  2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384  277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86  1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4  25%
  Number of GCLKIOBs: 1 out of 4  25%
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
As a low-cost encryption routine designed for processors with a limited instruction set, SEA can be applied:
In wireless communication, mobile computing and networking systems.
For the encryption of JPEG2000 images.
In scalable video coding.
In sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell, Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
CONTENTS
CHAPTER 1 Introduction to VLSI
1.1 Introduction
1.2 VLSI Design Style
1.3 VLSI Design Flow
1.4 VLSI Features
CHAPTER 2 Introduction to VHDL
2.1 Introduction
2.2 Capabilities
2.3 Abstraction levels of VHDL
2.4 Basic Terminology
2.5 Modeling Techniques for VHDL
2.6 Process Statements
2.7 Conditional Statements
2.8 Active HDL Overview
2.9 Macro language
2.10 Compilation
2.11 Simulation
2.12 Xilinx
CHAPTER 3 Introduction to SEA
3.1 Specifications
3.2 Design properties
3.3 Overall Structure
3.4 Security Analysis
3.5 Performance Analysis
CHAPTER 4 An Exposition Of SEA
4.1 Overview of SEA
CHAPTER 5 SEA Architecture
5.1 Key Generation
5.2 Encryption
5.3 Decryption
Appendix-I Simulation Results
Appendix-II Synthesis Reports
Appendix-III Implementation
Appendix-IV Advantages
Appendix-V Conclusion
Appendix-VI Bibliography
CH1 INTRODUCTION TO VLSI
The first digital circuits were designed using electronic components such as vacuum tubes and transistors. Later, Integrated Circuits (ICs) were invented, allowing a designer to place digital circuits on a chip. An IC with fewer than 10 gates is classed as SSI (Small Scale Integration). With the advent of new fabrication techniques, designers could place more than 100 gates on an IC, called MSI (Medium Scale Integration); at this level one can create digital sub-blocks (adders, multiplexers, counters, registers, etc.) on an IC. The next level is LSI (Large Scale Integration), with which people succeeded in building digital subsystems (microprocessors, I/O peripheral devices, etc.) on a chip.
At this point the design process started getting very complicated: manual conversion from schematic level to gate level, or from gate level to layout level, was becoming a lengthy process, and verifying the functionality of digital circuits at various levels became critical. This created new challenges for digital designers as well as circuit designers, who felt the need to automate these processes. Meanwhile, rapid advances in software technology and the development of new higher-level programming languages took place. People were able to develop CAD/CAE (Computer Aided Design/Computer Aided Engineering) tools for designing electronic circuits with the assistance of software programs. Functional verification and logic verification of a design can be done using CAD simulation tools with greater efficiency, and it became very easy for a designer to verify the functionality of a design at various levels.
With the advent of CMOS (Complementary Metal Oxide Semiconductor) process technology, one can fabricate a chip containing more than a million gates. At this point the design process again became critical because of the manual conversion of the design from one level to another. The latest CAD tools solved this problem: with logic synthesis tools, a design engineer can easily translate a higher-level design description to lower levels. This way of designing (using CAD tools) is certainly a revolution in the electronics industry, and it is leading to the development of sophisticated electronic products for both consumer and business use. Designing systems in hardware generally gives better results than software in terms of speed, reliability and performance. Using CMOS VLSI design methodology, a designer can design and fabricate ICs in much less time than with the traditional way of designing.
[Figure: IC design flow chart. Blocks: Specifications, Behavioral Description, RTL Description, Behavioral Simulation, Functional Simulation, Behavioral Synthesis, Logic Synthesis, Gate Level Netlist, Constraints, Library, Automatic P&R, Layout, Logic Simulation, Fabrication, Layout Management]
1.2 TYPICAL IC DESIGN FLOW
[Figure: Micron Technology, showing the progression Micron, SM, DSM, VDSM]
1.3 MICRON TECHNOLOGY
Micron technology can be classified into 4 categories, evolving from micron technology and extending up to VDSM.
Micron Technology: Technology with feature sizes of 1 um (10^-6 m) and above is micron technology.
Submicron Technology: Technology below 1 um is known as submicron technology. It generally extends down to 0.36 um.
DSM (Deep Sub Micron technology): Technology extending down to 0.18 um is DSM.
VDSM (Very Deep Sub Micron technology): The presently used technology is VDSM. It extends down to 0.09 um.
1.4 FEATURES
2.1 INTRODUCTION TO VHDL
VHDL is an acronym for VHSIC Hardware Description Language, where VHSIC is itself an acronym for Very High Speed Integrated Circuits. It is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level.
The VHDL language can be regarded as an integrated amalgamation of the following languages:
Sequential language
Concurrent language
Net-list language
Timing specifications
Waveform generation language
The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator. A core subset of the language is usually sufficient to model most applications. The complete language, however, has sufficient power to capture descriptions ranging from the most complex chips to a complete electronic system.
HISTORY
The requirements for the language were first generated in 1981 under the VHSIC program of the U.S. Department of Defense (DoD). Reprocurement and reuse was also a big issue. Thus, a need arose for a standardized hardware description language for the design, documentation and verification of digital systems. The IEEE standardized the VHDL language in December 1987; this version of the language is known as IEEE Std 1076-1987. The official language description appears in the IEEE Standard VHDL Language Reference Manual, available from the IEEE. The language has also been recognized as an American National Standards Institute (ANSI) standard.
According to IEEE rules, an IEEE standard has to be reballoted every 5 years so that it may remain a standard. Consequently, the language was upgraded with new features, the syntax of many constructs was made more uniform, and many ambiguities present in the 1987 version of the language were resolved. This new version of the language is known as IEEE Std 1076-1993.
2.2 CAPABILITIES
The following are the major capabilities that the language provides, along with the features that differentiate it from other hardware description languages:
The language can be used as an exchange medium between chip vendors and CAD tool users. Different chip vendors can provide VHDL descriptions of their components to system designers.
The language can be used as a communication medium between different CAD and CAE tools.
The language supports hierarchy, that is, a digital system can be modeled as a set of interconnected components; each component, in turn, can be modeled as a set of interconnected subcomponents.
The language supports flexible design methodologies: top-down, bottom-up or mixed. It supports both synchronous and asynchronous timing models.
Various digital modeling techniques, such as finite-state machine descriptions and Boolean equations, can be modeled using the language.
The language is publicly available, human-readable and machine-readable.
The language supports three basic description styles: structural, dataflow and behavioral.
It supports a wide range of abstraction levels, ranging from abstract behavioral descriptions to very precise gate-level descriptions.
Arbitrarily large designs can be modeled using the language, and there are no limitations imposed by the language on the size of the design.
2.3 HARDWARE ABSTRACTION
VHDL is used to describe a model for a digital hardware device. This model specifies the external view of the device and one or more internal views. The internal view of the device specifies its functionality or structure, while the external view specifies the interface of the device through which it communicates with the other modules in its environment. In VHDL, each device model is treated as a distinct representation of a unique device, called an entity. The entity is thus a hardware abstraction of the actual hardware device. Each entity is described using one model, which contains one external view and one or more internal views.
2.4 Basic terminology
VHDL is a hardware description language that can be used to model a digital system. A hardware abstraction of this digital system is called an entity. An entity X, when used in another entity Y, becomes a component of the entity Y. To describe an entity, VHDL provides five different types of primary constructs, called design units. They are:
1. Entity declaration
2. Architecture body
3. Configuration declaration
4. Package declaration
5. Package body
These design units play the following roles:
1. An entity is modeled using an entity declaration and at least one architecture body. The entity declaration describes the external view of the entity, for example the input and output signal names.
2. The architecture body contains the internal description of the entity, for example as a set of interconnected components that represents the structure of the entity, or as a set of concurrent or sequential statements that represents the behavior of the entity.
3. A configuration declaration is used to create a configuration for an entity. It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity. It may also specify the bindings of the components used in the selected architecture body to other entities. An entity may have any number of configurations.
4. A package declaration encapsulates a set of related declarations, such as type declarations, subtype declarations and subprogram declarations, which can be shared across two or more design units.
5. A package body contains the definitions of the subprograms declared in a package declaration.
Once an entity has been modeled, it needs to be validated by a VHDL system. A typical VHDL system consists of an analyzer and a simulator. The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks.
The language is case insensitive, that is, lowercase and uppercase characters are treated alike. The language is also free format. Comments are specified by preceding the text with two consecutive dashes (--).
Entity Declaration
The entity declaration specifies the name of the entity being modeled and lists the set of interface ports. Ports are signals through which the entity communicates with other models in its external environment.
EXAMPLE
The entity declaration for the half adder circuit is:
    entity HALF_ADDER is
        port (A, B : in BIT; SUM, CARRY : out BIT);
    end HALF_ADDER;
The entity, called HALF_ADDER, has two input ports, A and B, and two output ports, SUM and CARRY. BIT is a predefined type of the language.
Architecture Body
The internal details of an entity are specified by an architecture body using any of the following modeling styles:
1. As a set of interconnected components (to represent structure)
2. As a set of concurrent assignment statements (to represent data flow)
3. As a set of sequential assignment statements (to represent behavior)
4. As any combination of the above three
2.5 Structural style of modeling
In this style, an entity is described as a set of interconnected components. Such a model for the HALF_ADDER entity is described in the following architecture body:
    architecture HA of HALF_ADDER is
        -- declarative part: the two gates used as components
        component XOR2
            port (X, Y : in BIT; Z : out BIT);
        end component;
        component AND2
            port (L, M : in BIT; N : out BIT);
        end component;
    begin
        -- statement part: one instance of each component
        X1 : XOR2 port map (A, B, SUM);
        A1 : AND2 port map (A, B, CARRY);
    end HA;
The name of the architecture body is HA. The entity declaration for HALF_ADDER specifies the interface ports for this architecture body. The architecture body is composed of two parts: the declarative part and the statement part. Two component declarations are present in the declarative part of the architecture body.
The declared components are instantiated in the statement part of the architecture body using component instantiation statements. The signals in the port map of a component instantiation and the port signals in the component declaration are associated by position.
DATAFLOW STYLE OF MODELING
In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The dataflow model for the half adder is described using two concurrent signal assignment statements. In a signal assignment statement, the symbol <= implies an assignment of a value to a signal.
BEHAVIORAL STYLE OF MODELING
The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. This set of sequential statements, which are specified inside a process statement, does not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.
MIXED STYLE OF MODELING
It is possible to mix the three modeling styles in a single architecture body. That is, within an architecture body we could use component instantiation statements, concurrent signal assignment statements and process statements.
MODEL ANALYSIS
Once an entity is described in VHDL, it can be validated using an analyzer and a simulator that are part of a VHDL system. The first step in the validation process is analysis. The analyzer takes a file that contains one or more design units and compiles them into an intermediate form. The generated intermediate form is stored in a specific design library that has been designated as the working library.
There is a design library with the logical name STD predefined by the VHDL language environment. This library contains two packages, STANDARD and TEXTIO. The STANDARD package contains declarations for all the predefined types of the language. The TEXTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations. There also exists an IEEE standard package called STD_LOGIC_1164, which defines a nine-value logic type together with its associated subtypes, overloaded operator functions and other useful utilities. This standard is called IEEE Std 1164-1993.
SIMULATION
For a hierarchical entity to be simulated, all of its lowest-level components must be described at the behavioral level. A simulation can be performed on either one of the following:
1. An entity declaration and an architecture body pair
2. A configuration
Preceding the actual simulation are two major steps:
1. Elaboration phase: in this phase, the hierarchy of the entity is expanded and linked, components are bound to entities in a library, and the top-level entity is built as a network of behavioral models that is ready to be simulated.
2. Initialization phase: driving and effective values for all explicitly declared signals are computed, implicit signals are assigned values, processes are executed once until they suspend, and simulation time is set to 0 ns.
Simulation commences by advancing time to that of the next event. Signals that are scheduled to receive values at this time are assigned those values. If the value of a signal changes, and if that signal is present in the sensitivity list of a process, the process is executed until it suspends. Simulation stops when an assertion violation occurs, depending on the implementation of the VHDL system, or when the maximum time as defined by the language is reached.
Entity Declaration
An entity declaration describes the external interface of the entity. It specifies the name of the entity, the names of the interface ports, their modes and the types of the ports. The syntax for an entity declaration is:
    entity entity-name is
        [generic (list-of-generics-and-their-types);]
        [port (list-of-interface-port-names-and-their-types);]
        [entity-item-declarations]
    [begin
        entity-statements]
    end [entity] [entity-name];
The entity-name is the name of the entity, and the interface ports are the signals through which the entity passes information to and from its external environment. Each interface port can have one of the following modes:
1. in: the value of an input port can only be read within the entity model.
2. out: the value of an output port can only be updated within the entity model.
3. inout: the value of a bidirectional port can be read and updated within the entity model.
4. buffer: the value of a buffer port can be read and updated within the entity model. It cannot have more than one source.
Declarations that are placed in the entity declaration are common to all the design units that are associated with that entity declaration.
ARCHITECTURE BODY
An architecture body describes the internal view of an entity. It describes the functionality or the structure of the entity. The syntax of an architecture body is:
    architecture architecture-name of entity-name is
        [architecture-item-declarations]
    begin
        concurrent-statements; these are:
            process-statement
            block-statement
            concurrent-signal-assignment-statement
            component-instantiation-statement
            generate-statement
    end [architecture] [architecture-name];
The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow and sequential behavior.
Here we describe an entity using the behavioral model. A process statement, which is a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.
2.6 PROCESS STATEMENT
A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is:
    [process-label:] process [(sensitivity-list)] [is]
    begin
        sequential-statements; these are:
            variable-assignment-statement
            signal-assignment-statement
            wait-statement
            if-statement
            case-statement
            loop-statement
            null-statement
            exit-statement
            next-statement
            assertion-statement
            report-statement
            procedure-call-statement
            return-statement
    end process [process-label];
The sensitivity list defines the set of signals to which the process is sensitive. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in sequential order, that is, in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form:
    variable-object := expression;
The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is:
    signal-object <= expression [after delay-value];
A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other statements that appear within the process.
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is:
    if boolean-expression then
        sequential-statements
    {elsif boolean-expression then
        sequential-statements}
    [else
        sequential-statements]
    end if;
The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
CASE STATEMENT
The format of a case statement is:
    case expression is
        when choices => sequential-statements
        when choices => sequential-statements
    end case;
The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or of a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by using the "others" clause. The "others" clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax for a loop statement is:
    [loop-label:] iteration-scheme loop
        sequential-statements
    end loop [loop-label];
2.8 Active-HDL Overview
Active-HDL is an integrated environment designed for the development of VHDL, Verilog, EDIF and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with OVI Standard Delay Format Specification Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1. HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. The keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts and Tcl scripts.
2. Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3. State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4. Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5. Design Browser
The Design Browser window displays the contents of the current design, that is:
Resource files attached to the design
The contents of the default working library of the design
The structure of the design unit selected for simulation
VHDL, Verilog or EDIF objects declared within a selected region of the current design
6. Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable to the simulator. In Active-HDL, a source file can be one of the following:
VHDL file (.vhd)
Verilog file (.v)
EDIF net list file
State diagram file (.asf)
Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog or EDIF file containing the HDL code (or net list) generated from the diagram.
A net list is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, for VHDL, Verilog and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
Event-Driven Simulation
Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others run in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view and edit schematics and symbols for the Design Entry step of the Xilinx design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input net lists, as well as design constraint information, into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA, and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With the Timing Analyzer, analysis can be performed immediately after mapping, placing or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a net list (or net lists) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macro cell details, equations and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e. with small code size and memory) targeted for processors with a limited instruction set (i.e. AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, allows efficient combination of encryption/decryption and "on-the-fly" key derivation, and its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
This general context of very limited processing resources (e.g. a small processor) and throughput requirements yields design criteria such as low memory requirements, small code size and a limited instruction set. In addition, we propose flexibility as another, unusual design principle.
SEA$_{n,b}$ is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). By contrast, SEA$_{n,b}$ makes it possible to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, alternative features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds, and is defined with respect to the following parameters:
– $n$: plaintext size, key size
– $b$: processor (or word) size
– $n_b = \frac{n}{2b}$: number of words per Feistel branch
– $n_r$: number of block cipher rounds
As only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted as SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$, ... Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
– Bit representation: $x_b = x(n/2 - 1)\; x(n/2 - 2)\; \dots\; x(2)\; x(1)\; x(0)$
– Word representation: $x_W = x_{n_b - 1}\; x_{n_b - 2}\; \dots\; x_2\; x_1\; x_0$
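The parameter constraint can be checked mechanically. The following is a minimal sketch, not part of the original specification; the function name `words_per_branch` is a hypothetical helper introduced only for illustration.

```python
def words_per_branch(n: int, b: int) -> int:
    """Return n_b = n / (2b), the number of b-bit words per Feistel branch.

    Hypothetical helper: rejects any (n, b) pair that violates the only
    constraint stated in the text, namely that n is a multiple of 6b.
    """
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    return n // (2 * b)


# The 8-bit examples named in the text: SEA_{48,8}, SEA_{96,8}, SEA_{144,8}.
for n in (48, 96, 144):
    print(n, words_per_branch(n, 8))
```

Because $n$ is a multiple of $6b$, $n_b$ is always a multiple of 3, which is what allows the substitution box to work on groups of three words.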
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1. Bitwise XOR $\oplus$
The bitwise XOR is defined on $n/2$-bit vectors:
$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$
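2. Substitution Box S
SEA$_{n,b}$ uses the following 3-bit substitution table:
ST = {0, 5, 6, 7, 4, 3, 1, 2}
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i}$
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1}$
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}, \quad 0 \le i \le n_b/3 - 1$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR. The definition is recursive in the sense that each assignment uses the values already updated by the previous ones; this is what makes it agree with the table ST.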
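To make the link between the table and the recursive definition concrete, here is a short sketch that applies the three assignments in order and reads the result back bit by bit. The function names and the test packing (inputs 0 to 7 placed in bit positions 0 to 7 of three 8-bit words) are assumptions made for this illustration only.

```python
ST = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table from the text


def sbox(x):
    """Bitslice S-box on a list of n_b words (n_b a multiple of 3)."""
    x = list(x)
    for i in range(0, len(x), 3):
        # Each assignment reuses the words already updated above it.
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x


def column(x0, x1, x2, j):
    """3-bit value formed by bit j of the words (x2, x1, x0)."""
    return (((x2 >> j) & 1) << 2) | (((x1 >> j) & 1) << 1) | ((x0 >> j) & 1)


# Bit j of (x2, x1, x0) encodes the input value j, for j = 0..7.
x0, x1, x2 = 0xAA, 0xCC, 0xF0
y0, y1, y2 = sbox([x0, x1, x2])
outputs = [column(y0, y1, y2, j) for j in range(8)]
print(outputs)
```

With this packing, a single call processes all eight possible inputs at once, which is the point of the bitslice form: one pass over three $b$-bit words performs $b$ table lookups.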
3. Word Rotation R
The word rotation is defined on $n_b$-word vectors:
$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \Leftrightarrow y_{i+1} = x_i, \; 0 \le i \le n_b - 2, \quad y_0 = x_{n_b - 1}$
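As a small illustration, the word rotation and its inverse can be written on Python lists whose index 0 holds the least significant word $x_0$. This list layout and the function names are assumptions of the sketch.

```python
def word_rot(x):
    """R: y[i+1] = x[i] for 0 <= i <= n_b - 2, and y[0] = x[n_b - 1]."""
    return [x[-1]] + list(x[:-1])


def word_rot_inv(y):
    """R^-1: undoes word_rot."""
    return list(y[1:]) + [y[0]]


x = [1, 2, 3]          # x_0 = 1, x_1 = 2, x_2 = 3
print(word_rot(x))     # x_2 moves to position 0
```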
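4. Bit Rotation r
The bit rotation is defined on $n_b$-word vectors:
$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \Leftrightarrow y_{3i} = x_{3i} \ggg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \lll 1, \quad 0 \le i \le n_b/3 - 1$
where $\ggg$ and $\lll$ represent the cyclic right and left shifts inside a word.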
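The bit rotation can be sketched in the same list-of-words style. The helper names are hypothetical; the word size `b` is passed explicitly because Python integers have no fixed width.

```python
def ror1(w, b):
    """Cyclic right shift by one position inside a b-bit word."""
    return ((w >> 1) | (w << (b - 1))) & ((1 << b) - 1)


def rol1(w, b):
    """Cyclic left shift by one position inside a b-bit word."""
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)


def bit_rot(x, b):
    """r: in every group of three words, rotate word 3i right, leave word
    3i+1 unchanged, and rotate word 3i+2 left."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ror1(y[i], b)
        y[i + 2] = rol1(y[i + 2], b)
    return y


print([hex(w) for w in bit_rot([0x01, 0x55, 0x80], 8)])
```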
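5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$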
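Note that the addition acts word by word, so a carry never crosses from one word into the next. A minimal sketch (the function name is an assumption):

```python
def add_mod(x, y, b):
    """Word-wise addition mod 2^b; carries do not propagate between words."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]


# 0xFF + 0x01 wraps to 0x00 without touching the neighbouring word.
print([hex(w) for w in add_mod([0xFF, 0x01, 0x80], [0x01, 0x01, 0x80], 8)])
```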
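The Round and Key Round
Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Fig 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^2$ such that:
$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r(S(R_i \boxplus K_i)), \quad L_{i+1} = R_i$
$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}(L_i \oplus r(S(R_i \boxplus K_i))), \quad L_{i+1} = R_i$
$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R(r(S(KR_i \boxplus C_i))), \quad KL_{i+1} = KR_i$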
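The three round functions translate almost line for line into code. The sketch below is a software model written for illustration, not the hardware description discussed in this work; halves are lists of $n_b$ words with index 0 as the least significant word, and all function names are assumptions. It also checks that a decrypt round undoes an encrypt round once the two halves are exchanged.

```python
def sbox(x):
    x = list(x)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x


def bit_rot(x, b):
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ((y[i] >> 1) | (y[i] << (b - 1))) & mask
        y[i + 2] = ((y[i + 2] << 1) | (y[i + 2] >> (b - 1))) & mask
    return y


def word_rot(x):
    return [x[-1]] + list(x[:-1])


def word_rot_inv(x):
    return list(x[1:]) + [x[0]]


def add_mod(x, y, b):
    mask = (1 << b) - 1
    return [(u + v) & mask for u, v in zip(x, y)]


def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def f(x, k, b):
    """Common part of all rounds: r(S(x [+] k))."""
    return bit_rot(sbox(add_mod(x, k, b)), b)


def enc_round(L, R, K, b):
    """F_E: returns (L_{i+1}, R_{i+1})."""
    return R, xor(word_rot(L), f(R, K, b))


def dec_round(L, R, K, b):
    """F_D: returns (L_{i+1}, R_{i+1})."""
    return R, word_rot_inv(xor(L, f(R, K, b)))


def key_round(KL, KR, C, b):
    """F_K: returns (KL_{i+1}, KR_{i+1})."""
    return KR, xor(KL, word_rot(f(KR, C, b)))


# Arbitrary demonstration values for SEA_{48,8} (n_b = 3, b = 8).
L0, R0, K0 = [0x01, 0x23, 0x45], [0x67, 0x89, 0xAB], [0x10, 0x20, 0x30]
L1, R1 = enc_round(L0, R0, K0, 8)
back = dec_round(R1, L1, K0, 8)   # feed the halves in swapped order
print(back == (R0, L0))
```

The swap of the halves in the last step mirrors the final `C = R_nr & L_nr` of the complete cipher.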
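[Figure: block diagrams of the encrypt/decrypt round (inputs $L_i$, $R_i$, $K_i$; blocks $R$, $R^{-1}$, $r$, $S$; outputs $L_{i+1}$, $R_{i+1}$) and of the key round (inputs $KL_i$, $KR_i$, $C_i$; blocks $R$, $r$, $S$; outputs $KL_{i+1}$, $KR_{i+1}$)]
FIG 3.1 Encrypt/decrypt round and key round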
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.
    C = SEA_{n,b}(P, K)
        // initialization
        L_0 & R_0 = P;
        KL_0 & KR_0 = K;
        // key scheduling
        for i in 1 to floor(nr/2)
            [KL_i, KR_i] = FK(KL_{i-1}, KR_{i-1}, C(i));
        switch KL_{floor(nr/2)}, KR_{floor(nr/2)};
        for i in ceil(nr/2) to nr - 1
            [KL_i, KR_i] = FK(KL_{i-1}, KR_{i-1}, C(nr - i));
        // encryption
        for i in 1 to ceil(nr/2)
            [L_i, R_i] = FE(L_{i-1}, R_{i-1}, KR_{i-1});
        for i in ceil(nr/2) + 1 to nr
            [L_i, R_i] = FE(L_{i-1}, R_{i-1}, KL_{i-1});
        // final
        C = R_nr & L_nr;
        switch KL_{nr-1}, KR_{nr-1};
where & is the concatenation operator, KR_{floor(nr/2)} is taken before the switch, and C(i) is an nb-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round FD.
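The pseudo-code above can be turned into a small executable reference model. What follows is a sketch under stated assumptions, not the implementation studied in this work: halves are Python lists of $n_b$ words with index 0 as the LSW, all function names are invented for the example, `nr = 51` is an arbitrary odd demonstration value, and floor(nr/2) is assumed to fit in one b-bit word. No official test vectors are used; the only property checked is that decryption recovers the plaintext.

```python
def sbox(x):
    x = list(x)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x


def bit_rot(x, b):
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ((y[i] >> 1) | (y[i] << (b - 1))) & mask
        y[i + 2] = ((y[i + 2] << 1) | (y[i + 2] >> (b - 1))) & mask
    return y


def word_rot(x):
    return [x[-1]] + list(x[:-1])


def word_rot_inv(x):
    return list(x[1:]) + [x[0]]


def add_mod(x, y, b):
    mask = (1 << b) - 1
    return [(u + v) & mask for u, v in zip(x, y)]


def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def f(x, k, b):
    return bit_rot(sbox(add_mod(x, k, b)), b)


def enc_round(L, R, K, b):
    return R, xor(word_rot(L), f(R, K, b))


def dec_round(L, R, K, b):
    return R, word_rot_inv(xor(L, f(R, K, b)))


def key_round(KL, KR, C, b):
    return KR, xor(KL, word_rot(f(KR, C, b)))


def round_keys(KL, KR, nr, b):
    """Key scheduling: returns the nr round keys in the order they are used."""
    if nr % 2 == 0:
        raise ValueError("nr must be odd")
    nb, h = len(KL), nr // 2

    def const(i):
        return [i] + [0] * (nb - 1)      # C(i): all words 0 except the LSW

    keys = [KR]                          # KR_0, used in round 1
    for i in range(1, h + 1):
        KL, KR = key_round(KL, KR, const(i), b)
        keys.append(KR)                  # KR_i, used in round i + 1
    KL, KR = KR, KL                      # switch, after KR_h has been taken
    for i in range(h + 1, nr):
        KL, KR = key_round(KL, KR, const(nr - i), b)
        keys.append(KL)                  # KL_i, used in round i + 1
    return keys


def sea(L, R, KL, KR, nr, b, round_fn):
    """Run nr rounds of round_fn and return the halves of R_nr & L_nr."""
    for K in round_keys(KL, KR, nr, b):
        L, R = round_fn(L, R, K, b)
    return R, L


# Demonstration with SEA_{48,8}-sized data and arbitrary values.
P = ([0x01, 0x23, 0x45], [0x67, 0x89, 0xAB])
K = ([0x0F, 0x1E, 0x2D], [0x3C, 0x4B, 0x5A])
C = sea(P[0], P[1], K[0], K[1], 51, 8, enc_round)
D = sea(C[0], C[1], K[0], K[1], 51, 8, dec_round)
print(D == P)
```

The same `sea` driver serves for both directions; only the round function changes, which reflects the statement that decryption is exactly the same using $F_D$.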
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– λ-parameter: 1/2
– δ-parameter: 1/4
– Maximum nonlinear order, namely 2
– Recursive definition
– Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48, b = 8, n_b = 3. Looking at the figure, it can be seen that SEA_{n,b} divides its data in 2n_b/3 blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most n_b rounds to be completed (in our example, n_b = 3, which yields 3 rounds). Once every word has at least one active bit, the combination of r and S yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most ⌊b/2⌋ rounds. Combining these observations, the diffusion is complete after n_b + ⌊b/2⌋ rounds.
Addition mod 2^b (⊞)
Using a mod 2^b key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA_{n,b}, namely the key schedule and the position of R, R^{-1} in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
K_0, K_1, K_2, ..., K_{⌊r/2⌋}, K_{⌊r/2⌋−1}, ..., K_2, K_1, K_0
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ, the linear and differential parameters of the 3-round cipher, Λ and Δ, are respectively smaller than λ^2 and δ^2.
Since our nonlinear function S has parameters δ = 2^{-2} and λ = 2^{-1}, it implies that 3 rounds of SEA_{n,b} have their differential and linear parameters respectively bounded by Δ < 2^{-4} and Λ < 2^{-2}. However, for an n-bit block cipher, Δ has to fall below 2^{-n} and Λ below 2^{-n/2} to resist differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that
$$\delta^{2 n_r/3} = \left(2^{-2}\right)^{2 n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2 n_r/3} = \left(2^{-1}\right)^{2 n_r/3} < 2^{-n/2}. \qquad (1)$$
In both cases, the required number of rounds is n_r ≥ 3n/4. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for n_r). In fact, the strategy of Equation (1) is similar to that of, e.g., the AES Rijndael [17], but we only assume one active s-box per round.
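For readers who want the intermediate step, the two conditions of Equation (1) can be unfolded as follows (a worked restatement using only the values δ = 2^{-2} and λ = 2^{-1} given above; comparing exponents of 2 is the only operation involved):

```latex
\left(2^{-2}\right)^{2 n_r/3} < 2^{-n}
  \;\Longleftrightarrow\; \frac{4 n_r}{3} > n
  \;\Longleftrightarrow\; n_r > \frac{3n}{4},
\qquad
\left(2^{-1}\right)^{2 n_r/3} < 2^{-n/2}
  \;\Longleftrightarrow\; \frac{2 n_r}{3} > \frac{n}{2}
  \;\Longleftrightarrow\; n_r > \frac{3n}{4}.
```

Both conditions therefore lead to the same threshold of 3n/4; for example, n = 48 gives a threshold of 36 rounds.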
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA_{n,b} a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by x ≪ a the left rotation by a bits of each of the n_b words of x. The non-linear and diffusion layers have the following properties:
– S(x ≪ a) = S(x) ≪ a
– r(x ≪ a) = r(x) ≪ a
– R(x ≪ a) = R(x) ≪ a
Consider a modified version of our cipher where key addition is performed using ⊕ rather than modular addition, and where all round constants C_i are such that C_i ≪ a = C_i, e.g. all C_i's equal 0. As a consequence of the previous observations, the modified round F′_E and the key round F_K satisfy:
F′_E(L ≪ a, R ≪ a, K ≪ a) = F′_E(L, R, K) ≪ a
F_K(KL ≪ a, KR ≪ a, 0) = F_K(KL, KR, 0) ≪ a
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys K and K ≪ a. In the actual SEA_{n,b}, the key addition is performed word-wise mod 2^b. As the property (X ≪ a) ⊞ (K ≪ a) = (X ⊞ K) ≪ a is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for 1 < a < b − 1. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA_{n,b} from having such weak keys.
Moreover, the round constants C_i are generally not such that C_i ≪ a = C_i (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from K and those derived from K ≪ a rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA_{n,b}.
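The probability p mentioned above can be measured exhaustively for small word sizes. The sketch below is an illustration only (the function names are hypothetical): it enumerates every pair (X, K) of b-bit words and counts how often rotation commutes with addition mod 2^b.

```python
# Exhaustive estimate of p = Pr[(X <<< a) + (K <<< a) == (X + K) <<< a  mod 2^b].
# Illustrative sketch: practical for small b only, since it visits 2^(2b) pairs.

def rotl(w, a, b):
    """Rotate the b-bit word w left by a bits."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)

def rotation_commutes_prob(a, b):
    m = 1 << b
    hits = sum(
        (rotl(x, a, b) + rotl(k, a, b)) % m == rotl((x + k) % m, a, b)
        for x in range(m)
        for k in range(m)
    )
    return hits / (m * m)
```

For b = 8 and a = 1 the measured value lies close to 3/8, and a middle rotation such as a = 4 gives a smaller value, in line with the statement in the text.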
Square Attacks
We explored square attacks [16] on SEA_{48,8}. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from 2^3 (when the input to only one substitution box is active) to 2^21 (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that this remains the same when different parameters n and b are considered, which implies that n_b + ⌊b/2⌋ rounds are enough to prevent square attacks. Note that although our observations also hold for ⊕-SEA_{n,b}, the use of addition mod 2^b provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 · (n_b + ⌊b/2⌋).
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA_{n,b}, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA_{n,b} is the same as that of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with that of GOST, whose vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key K_i is computed from the previous round key K_{i−1} using a function f which is always the same: K_i = f(K_{i−1}). However, in the case of SEA_{n,b}, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA_{n,b} key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows the attacker to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA_{n,b} against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \Leftrightarrow \overline{P} \xrightarrow{\overline{K}} \overline{C}$. The non-linear key scheduling and the presence of carry propagations in the actual SEA_{n,b} algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA_{n,b} has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be 3n/4 + 2 · (n_b + ⌊b/2⌋). This roughly corresponds to the number of rounds to resist linear/differential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since n_r has to be odd, we add one if it is even. We also assume a minimum word size b ≥ 8 bits.
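The round-count rule above is easy to evaluate for concrete parameters. The following helper is a sketch under two assumptions that are not spelled out in this section: n_b = n/(2b), and the rounding in the diffusion term is a floor (the extracted text is ambiguous on the rounding direction). The function name is hypothetical.

```python
# Suggested minimum number of rounds: 3n/4 + 2*(nb + floor(b/2)), made odd.
# ASSUMPTIONS: nb = n / (2b); the b/2 term is rounded down.

def suggested_rounds(n, b):
    if b < 8:
        raise ValueError("the text assumes a word size b >= 8")
    nb = n // (2 * b)
    nr = 3 * n // 4 + 2 * (nb + b // 2)
    return nr if nr % 2 == 1 else nr + 1   # nr has to be odd
```

For instance, `suggested_rounds(48, 8)` evaluates 36 + 2 · (3 + 4) = 50 and returns 51 after the parity adjustment.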
3.5 Performance Analysis
SEA_{n,b} is targeted for being implemented on low-cost processors, with little code size and a small instruction set. However, SEA_{n,b}'s simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: OR, AND, ⊕, ⊞, ≫, ≪
– Branch instructions: goto, subroutine call and return
– Comparison, load (RAM in register), store (register in RAM)
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals 5n_b + 3. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, it directly yields 31n_b + 36 ops. For the implementation time, the round and key round respectively require 12n_b + 11 ops and 10n_b + 11 ops. It yields a total of (n_r − 1) × (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7 ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, since at the end of an encryption/decryption we have K_{n_r−1} = K_0. Remark also that this implementation uses a low number of registers, namely n_b + 3. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
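The estimates in the preceding paragraph can be evaluated directly. The helper below is a sketch that simply transcribes the stated formulas; the function name and the relation n_b = n/(2b) are assumptions made for illustration.

```python
# Software cost estimates (in words and ops) transcribed from the formulas above.
# ASSUMPTION: nb = n / (2b).

def sea_software_cost(n, b, nr):
    nb = n // (2 * b)
    memory_words = 5 * nb + 3               # RAM words plus registers
    code_size_ops = 31 * nb + 36
    round_ops = 12 * nb + 11
    key_round_ops = 10 * nb + 11
    time_ops = (nr - 1) * (round_ops + key_round_ops + 7) + round_ops + 8 * nb + 7
    return memory_words, code_size_ops, time_ops
```

With n = 48, b = 8 and n_r = 51, the formulas give 18 memory words, 129 ops of code and 50 × 95 + 47 + 31 = 4828 ops per encryption.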
For illustration purposes, we implemented SEA_{n,b} on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA_{n,b} and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– SEA_{n,b} designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of SEA_{n,b} implementations (i.e. 5n_b + 3) is generally lower than for other block ciphers.
– The code size of SEA_{n,b} is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA_{n,b} also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. SEA_{n,b} trades space for time, which may be acceptable given present processor speeds). Looking at the code size × cycles product, the efficiency of SEA_{n,b} remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let p be a large (odd) prime and let E: y^2 = x^3 + a_4 x + a_6 be an elliptic curve, where a_4 and a_6 are given fixed integers. In the case where p does not divide 4a_4^3 + 27a_6^2, E can be reduced to an elliptic curve over F_p. The number of points of E over F_p, denoted by #E(F_p), is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime p, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over F_p where p has 200 bits.
It is known that
$$\#E(\mathbb{F}_p) = p + 1 - t,$$
where t is an integer which satisfies the Hasse bound
$$-2\sqrt{p} \le t \le 2\sqrt{p}.$$
The algorithm works by calculating t modulo several small auxiliary primes ℓ. When the product of the auxiliary primes exceeds 4√p, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of #E(F_p). The algorithm works its way through a fixed list of 40 candidates for auxiliary primes given below. For each candidate ℓ, a calculation has to be carried out to generate a certain polynomial associated with ℓ that is necessary for further calculations with this ℓ. These polynomials do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and ℓ is 1/2). For those curves where the algorithm applies, we can determine t modulo ℓ. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4√p or not. In the former case, we have succeeded in determining t.
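The final recombination step can be illustrated in isolation. In the sketch below, the residues t mod ℓ are obtained from a brute-force point count on a tiny curve, which is only a stand-in for the Elkies/Atkin computations of the real algorithm; the function names and the example curve are hypothetical. The Chinese Remainder Theorem then recovers t as the unique representative inside the Hasse interval.

```python
# Recover the trace t from its residues modulo small auxiliary primes (CRT step).
# ASSUMPTION: the residues come from a brute-force count on a tiny curve; the
# real SEA algorithm computes them without ever counting points directly.
from math import prod

def count_points_naive(a4, a6, p):
    """#E(F_p) for y^2 = x^3 + a4*x + a6, by enumeration (small p only)."""
    sq_count = {}
    for y in range(p):
        s = y * y % p
        sq_count[s] = sq_count.get(s, 0) + 1
    total = 1                                   # the point at infinity
    for x in range(p):
        total += sq_count.get((x * x * x + a4 * x + a6) % p, 0)
    return total

def recover_trace(residues, p):
    """residues maps each auxiliary prime l to t mod l."""
    M = prod(residues)                          # product of the dict keys
    if M * M <= 16 * p:
        raise ValueError("product of auxiliary primes must exceed 4*sqrt(p)")
    t = 0
    for l, t_l in residues.items():
        M_l = M // l
        t += t_l * M_l * pow(M_l, -1, l)        # modular inverse (Python 3.8+)
    t %= M
    return t - M if t > M // 2 else t           # centered representative
```

Because |t| ≤ 2√p and the product of the primes exceeds 4√p, the centered representative is the only candidate compatible with the Hasse bound.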
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over F_p, with the intention of finding an E with #E(F_p) = xr, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms, such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p and require x ≤ 32, then the probability that #E(F_p) = xr is about 25, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set A_s of small primes and the set A_l of larger primes. For each ℓ ∈ A, we need to determine a polynomial in Z[F, J] associated with ℓ. For ℓ ∈ A_s, this is stored in the program. For ℓ ∈ A_l, it must be calculated by determining a number of coefficients of a certain q-series f(q) ∈ Z[[q]] and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime p and an elliptic curve E: y^2 = x^3 + a_4 x + a_6
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram. Blocks: key computational block (Mod, SBox, BR, WR, XOR, Round Reg; key registers KeyReg0[95:0] to KeyReg9[95:0]), encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg; cipher data register), decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg; plain text data register) and SMC. Signals: KeyI[95:0], KeyLd, Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, DataO[95:0], DataRd, Clk, Rst, Ena, ED, Ovr.
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
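Both ways of obtaining a key can be sketched with the Python standard library. This is an illustration, not the key generation used by the design in this report: the default of 12 bytes is chosen here only because the block diagram shows 96-bit key buses, PBKDF2 with SHA-256 is substituted for the SHA-1 example mentioned above, and the function names and iteration count are hypothetical.

```python
# Two ways to obtain a symmetric key: system randomness or a passphrase.
# ASSUMPTIONS: 12-byte (96-bit) keys; PBKDF2-HMAC-SHA256 as the derivation.
import hashlib
import secrets

def random_key(n_bytes=12):
    """Key drawn from the operating system's entropy source."""
    return secrets.token_bytes(n_bytes)

def key_from_passphrase(passphrase, salt, n_bytes=12, iterations=100_000):
    """Key derived deterministically from a passphrase and a salt."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=n_bytes
    )
```

The same passphrase and salt always yield the same key, whereas `random_key` returns fresh bytes on every call.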
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing The term is derived from the Greek language
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also meant the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis began being used; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
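A one-time pad is short enough to sketch in full. The code below is an illustration with hypothetical function names; it draws a fresh random key as long as the message and combines the two with XOR. The unbreakability result only applies while the conditions listed above hold, in particular that the key is never reused.

```python
# Minimal one-time pad sketch: XOR the message with a fresh key of equal length.
import secrets

def otp_encrypt(message):
    key = secrets.token_bytes(len(message))       # truly fresh key per message
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext

def otp_decrypt(key, ciphertext):
    return bytes(c ^ k for c, k in zip(ciphertext, key))
```

Decrypting with the same key restores the message, because XOR with the key is its own inverse.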
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such showing can be made currently, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting too short keys, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keyregngc keyregngd

Reading NGO file cxilinxbinvasukeyregngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file keyregngd
Writing NGDBUILD log file keyregbld

Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
# Registers : 1
  96-bit register : 1
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 195
Macro Statistics :
# Registers : 1
  96-bit register : 1
Cell Usage :
# BELS : 1
  LUT1 : 1
# FlipFlops/Latches : 96
  FDCE : 96
# Clock Buffers : 1
  BUFGP : 1
# IO Buffers : 194
  IBUF : 98
  OBUF : 96
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6
 Number of Slices: 55 out of 192 (28%)
 Number of Slice Flip Flops: 96 out of 384 (25%)
 Number of 4 input LUTs: 1 out of 384 (0%)
 Number of bonded IOBs: 194 out of 90 (215%) (*)
 Number of GCLKs: 1 out of 4 (25%)
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary:
---------------
Speed Grade: -6
   Minimum period: No path found
   Minimum input arrival time before clock: 7.962ns
   Maximum output required time after clock: 6.788ns
   Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising
  Data Path: KeyEna to Dreg_95
                          Gate     Net
    Cell:in->out  fanout  Delay    Delay   Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O     96      0.776    6.300   KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE               0.886            Dreg_0
    ----------------------------------------
    Total                 7.962ns (1.662ns logic, 6.300ns route)
                                  (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising
  Data Path: Dreg_95 to KeyO<95>
                          Gate     Net
    Cell:in->out  fanout  Delay    Delay   Logical Name (Net Name)
    ----------------------------------------  ------------
    FDCE:C->Q     1       1.085    1.035   Dreg_95 (Dreg_95)
    OBUF:I->O             4.668            KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                 6.788ns (5.753ns logic, 1.035ns route)
                                  (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
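For orientation, the following is a minimal sketch of a register that would be consistent with the figures reported above: a 96-bit register named Dreg built from FDCE cells (clock enable plus asynchronous clear), 98 input buffers, 96 output buffers, and 195 I/Os in total. The signal names Clk, KeyEna, KeyO, and Dreg appear in the report; the input name KeyI, the reset name Rst and its active-high polarity, and the WIDTH generic are assumptions made for illustration. It is not the project's actual KeyReg.vhd.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch only: KeyI, Rst and WIDTH are assumed names, not taken from the report.
entity keyreg is
  generic (WIDTH : positive := 96);                      -- 96 matches the inferred register
  port (
    Clk    : in  std_logic;                              -- mapped to BUFGP in the report
    Rst    : in  std_logic;                              -- asynchronous clear (polarity assumed)
    KeyEna : in  std_logic;                              -- clock enable (CE pin of FDCE)
    KeyI   : in  std_logic_vector(WIDTH - 1 downto 0);   -- assumed name for the key input
    KeyO   : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity keyreg;

architecture rtl of keyreg is
  signal Dreg : std_logic_vector(WIDTH - 1 downto 0);
begin
  -- Asynchronous clear with synchronous, enable-gated load:
  -- the pattern synthesis tools map to FDCE flip-flops.
  process (Clk, Rst)
  begin
    if Rst = '1' then
      Dreg <= (others => '0');
    elsif rising_edge(Clk) then
      if KeyEna = '1' then
        Dreg <= KeyI;
      end if;
    end if;
  end process;

  KeyO <= Dreg;
end architecture rtl;
```

With WIDTH = 96 this sketch has 96 + 96 data pins plus Clk, Rst, and KeyEna, i.e. 195 ports, which is far more than the 86 bondable IOBs of the xc2s15-cs144 and explains the OVERMAPPED warning when the register is synthesized as a stand-alone top level.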
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 7
Cell Usage :
# BELS : 3
  LUT4 : 3
# IO Buffers : 7
  IBUF : 4
  OBUF : 3
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6
 Number of Slices: 2 out of 192 (1%)
 Number of 4 input LUTs: 3 out of 384 (0%)
 Number of bonded IOBs: 7 out of 90 (7%)
Total memory usage is 53376 kilobytes
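As an illustration of why this block maps to only a few LUTs, here is a hedged sketch of a 3-bit substitution box written as a complete case statement, the coding style hinted at by the "Mux is complete" message in the analysis log. The table used is the SEA substitution table {0, 5, 6, 7, 4, 3, 1, 2} as given in the cipher specification [1]. The entity name sbox3 and the port names x and y are hypothetical. The report above shows four input buffers, so the project's sbox8x3 evidently has one more input than this sketch, whose role is not stated in the report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 3-bit S-box; entity and port names are assumptions.
entity sbox3 is
  port (
    x : in  std_logic_vector(2 downto 0);
    y : out std_logic_vector(2 downto 0)
  );
end entity sbox3;

architecture rtl of sbox3 is
begin
  -- Purely combinational lookup: one output word per input value.
  process (x)
  begin
    case x is
      when "000"  => y <= "000";  -- 0 -> 0
      when "001"  => y <= "101";  -- 1 -> 5
      when "010"  => y <= "110";  -- 2 -> 6
      when "011"  => y <= "111";  -- 3 -> 7
      when "100"  => y <= "100";  -- 4 -> 4
      when "101"  => y <= "011";  -- 5 -> 3
      when "110"  => y <= "001";  -- 6 -> 1
      when "111"  => y <= "010";  -- 7 -> 2
      when others => y <= "XXX";  -- non-binary input values, simulation only
    end case;
  end process;
end architecture rtl;
```

Each output bit is a Boolean function of at most four inputs, so it fits in a single 4-input LUT, which is consistent with the three LUT4 cells in the cell usage above.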
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 (1%)
Logic Distribution:
  Number of occupied Slices: 2 out of 192 (1%)
  Number of Slices containing only related logic: 2 out of 2 (100%)
  Number of Slices containing unrelated logic: 0 out of 2 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 (1%)
  Number of bonded IOBs: 7 out of 86 (8%)
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 (8%)
  Number of LOCed External IOBs: 0 out of 7 (0%)
  Number of SLICEs: 2 out of 192 (1%)
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc ...
Reading component libraries for design expansion...
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd ...
Writing NGDBUILD log file keygenblock.bld ...
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 (109%) (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 (264%) (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 (346%) (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 (97%)
  Number of Slices containing unrelated logic: 17 out of 665 (2%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 (277%) (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 (1232%) (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. It can be applied:
In wireless communication, mobile computing, and networking systems.
For the encryption of JPEG2000 images.
In scalable video coding.
In sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is on the order of a few milliseconds per encryption/decryption, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher can be expected to be implemented in assembly within a few hours. We note, finally, that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Directions for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
CH1 INTRODUCTION TO VLSI
The first digital circuits were built from discrete electronic components such as vacuum tubes and transistors. Later, integrated circuits (ICs) were invented, allowing a designer to place digital circuits on a single chip. An IC containing fewer than 10 gates is classified as SSI (Small Scale Integration). With the advent of new fabrication techniques, designers could place more than 100 gates on an IC, called MSI (Medium Scale Integration); at this level one can build digital sub-blocks (adders, multiplexers, counters, registers, etc.) on an IC. The next level is LSI (Large Scale Integration), at which designers succeeded in building digital subsystems (microprocessors, I/O peripheral devices, etc.) on a chip.
At this point the design process started to become very complicated: manual conversion from the schematic level to the gate level, or from the gate level to the layout level, was becoming a lengthy process, and verifying the functionality of digital circuits at the various levels became critical. This created new challenges for digital designers as well as circuit designers, who felt the need to automate these processes. Meanwhile, rapid advances in software technology and the development of new higher-level programming languages took place. It became possible to develop CAD/CAE (Computer Aided Design/Computer Aided Engineering) tools for designing electronic circuits with the assistance of software programs. Functional and logic verification of a design can be carried out more efficiently with CAD simulation tools, which made it much easier for a designer to verify the functionality of a design at various levels.
With the advent of CMOS (Complementary Metal Oxide Semiconductor) process technology, one can fabricate a chip containing more than a million gates. At this scale, manually converting the design from one level to another became even more critical, a problem that the latest CAD tools address. With logic synthesis tools, a design engineer can easily translate a higher-level design description into lower levels. This way of designing (using CAD tools) is certainly a revolution in the electronics industry, and it is leading to the development of sophisticated electronic products for both consumers and businesses. Designing systems in hardware generally gives better results than software in terms of speed, reliability, and performance. Using CMOS VLSI design methodology, a designer can design and fabricate ICs in much less time than with the traditional way of designing.
1.2 TYPICAL IC DESIGN FLOW
[Figure: typical IC design flow. Blocks: Specifications, Behavioral Description, RTL Description, Behavioral Simulation, Functional Simulation, Behavioral Synthesis, Logic Synthesis, Gate Level Netlist, Constraints, Library, Automatic P&R, Layout, Logic Simulation, Fabrication, Layout Management]
1.3 MICRON TECHNOLOGY
[Figure: Micron technology: Micron, SM, DSM, VDSM]
Process technology can be classified into four categories, evolving from micron technology and extending down to VDSM:
Micron technology: feature sizes down to 10^-6 m (1 µm).
Submicron (SM) technology: feature sizes below 1 µm, generally ranging down to 0.36 µm.
DSM (Deep Sub-Micron) technology: feature sizes extending down to 0.18 µm.
VDSM (Very Deep Sub-Micron) technology: the technology presently in use, ranging down to 0.09 µm.
1.4 FEATURES
2.1 INTRODUCTION TO VHDL
VHDL is an acronym for VHSIC Hardware Description Language; VHSIC is in turn an acronym for Very High Speed Integrated Circuits. It is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level.
The VHDL language can be regarded as an integrated amalgamation of the following languages:
Sequential language
Concurrent language
Net-list language
Timing specifications
Waveform generation language
The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator. A subset of the language is usually sufficient to model most applications; the complete language, however, has sufficient power to capture descriptions ranging from the most complex chips to a complete electronic system.
HISTORY
The requirements for the language were first generated in 1981 under the VHSIC program of the U.S. Department of Defense (DoD). Reprocurement and reuse were also big issues. Thus, a need arose for a standardized hardware description language for the design, documentation, and verification of digital systems. The IEEE standardized the VHDL language in December 1987; this version of the language is known as IEEE Std 1076-1987. The official language description appears in the IEEE Standard VHDL Language Reference Manual, available from the IEEE. The language has also been recognized as an American National Standards Institute (ANSI) standard.
According to IEEE rules, an IEEE standard has to be reballoted every five years so that it may remain a standard. Consequently, the language was upgraded with new features, the syntax of many constructs was made more uniform, and many ambiguities present in the 1987 version of the language were resolved. This new version of the language is known as IEEE Std 1076-1993.
2.2 CAPABILITIES
The following are the major capabilities that the language provides, along with the features that differentiate it from other hardware description languages:
The language can be used as an exchange medium between chip vendors and CAD tool users. Different chip vendors can provide VHDL descriptions of their components to system designers.
The language can be used as a communication medium between different CAD and CAE tools.
The language supports hierarchy; that is, a digital system can be modeled as a set of interconnected components, and each component, in turn, can be modeled as a set of interconnected subcomponents.
The language supports flexible design methodologies: top-down, bottom-up, or mixed. It supports both synchronous and asynchronous timing models.
Various digital modeling techniques, such as finite-state machine descriptions and Boolean equations, can be modeled using the language.
The language is publicly available, human-readable, and machine-readable.
The language supports three basic description styles: structural, dataflow, and behavioral.
It supports a wide range of abstraction levels, ranging from abstract behavioral descriptions to very precise gate-level descriptions.
Arbitrarily large designs can be modeled using the language; the language imposes no limitations on the size of the design.
2.3 HARDWARE ABSTRACTION
VHDL is used to describe a model for a digital hardware device. This model specifies the external view of the device and one or more internal views. The internal view of the device specifies its functionality or structure, while the external view specifies the interface of the device through which it communicates with the other modules in its environment. In VHDL, each device model is treated as a distinct representation of a unique device, called an entity. The entity is thus a hardware abstraction of the actual hardware device. Each entity is described using one model, which contains one external view and one or more internal views.
2.4 Basic terminology
VHDL is a hardware description language that can be used to model a digital system. A hardware abstraction of this digital system is called an entity. An entity X, when used in another entity Y, becomes a component of the entity Y. To describe an entity, VHDL provides five different types of primary constructs, called design units. They are:
1 Entity declaration
2 Architecture body
3 Configuration declaration
4 Package declaration
5 Package body
1 An entity is modeled using an entity declaration and at least one architecture body. The entity declaration describes the external view of the entity, for example, the input and output signal names.
2 The architecture body contains the internal description of the entity, for example, as a set of interconnected components that represents the structure of the entity, or as a set of concurrent or sequential statements that represents the behavior of the entity.
3 A configuration declaration is used to create a configuration for an entity. It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity. It may also specify the bindings of the components used in the selected architecture body to other entities. An entity may have any number of configurations.
4 A package declaration encapsulates a set of related declarations, such as type declarations, subtype declarations, and subprogram declarations, which can be shared across two or more design units.
5 A package body contains the definitions of the subprograms declared in a package declaration.
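The five design units listed above can be seen together in the following minimal sketch. All names (gates_pkg, xor_bit, parity2, parity2_cfg) are hypothetical and chosen only for illustration; the example is deliberately trivial so that each unit stays a few lines long.

```vhdl
-- 4) Package declaration: declarations shared across design units.
package gates_pkg is
  function xor_bit (a, b : bit) return bit;
end package gates_pkg;

-- 5) Package body: definition of the subprogram declared above.
package body gates_pkg is
  function xor_bit (a, b : bit) return bit is
  begin
    return a xor b;
  end function xor_bit;
end package body gates_pkg;

use work.gates_pkg.all;

-- 1) Entity declaration: the external view (interface ports).
entity parity2 is
  port (a, b : in bit; p : out bit);
end entity parity2;

-- 2) Architecture body: one internal view of the entity.
architecture dataflow of parity2 is
begin
  p <= xor_bit(a, b);
end architecture dataflow;

-- 3) Configuration declaration: selects which architecture body
--    is bound to the entity.
configuration parity2_cfg of parity2 is
  for dataflow
  end for;
end configuration parity2_cfg;
```

If parity2 had several architecture bodies, a second configuration naming a different architecture in its `for` clause could coexist with this one, which is what allows an entity to have any number of configurations.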
Once an entity has been modeled, it needs to be validated by a VHDL system. A typical VHDL system consists of an analyzer and a simulator. The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks.
The language is case insensitive; that is, lowercase and uppercase characters are treated alike. The language is also free format. Comments are specified by preceding the text with two consecutive dashes (--).
Entity Declaration
The entity declaration specifies the name of the entity being modeled and lists the set of interface ports. Ports are signals through which the entity communicates with other models in its external environment.
EXAMPLE
The entity declaration for the half adder circuit is:
entity half_adder is
  port (A, B : in BIT; SUM, CARRY : out BIT);
end half_adder;
The entity, called half_adder, has two input ports, A and B, and two output ports, SUM and CARRY. BIT is a predefined type of the language.
Architecture Body
An architecture body specifies the internal details of an entity using any of the following modeling styles:
1 As a set of interconnected components (to represent structure)
2 As a set of concurrent assignment statements (to represent data flow)
3 As a set of sequential assignment statements (to represent behavior)
4 As any combination of the above three
2.5 Structural style of modeling
In this style, an entity is described as a set of interconnected components. Such a model for the HALF_ADDER entity is described in an architecture body:
architecture ha of half_adder is
  component Xor2
    port (X, Y : in bit; Z : out bit);
  end component;
  component And2
    port (L, M : in bit; N : out bit);
  end component;
begin
  X1 : Xor2 port map (A, B, sum);
  A1 : And2 port map (A, B, carry);
end ha;
The name of the architecture body is ha. The entity declaration for the half adder specifies the interface ports for this architecture body. The architecture body is composed of two parts: the declarative part and the statement part. Two component declarations are present in the declarative part of the architecture body.
The declared components are instantiated in the statement part of the architecture body using component instantiation statements. The signals in the port map of a component instantiation and the port signals in the component declaration are associated by position.
DATAFLOW STYLE OF MODELING
In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The dataflow model for the half adder is described using two concurrent signal assignment statements. In a signal assignment statement, the symbol <= implies an assignment of a value to a signal.
BEHAVIORAL STYLE OF MODELING
The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. These sequential statements, which are specified inside a process statement, do not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.
MIXED STYLE OF MODELING
It is possible to mix the three modeling styles in a single architecture body. That is, within an architecture body we could use component instantiation statements, concurrent signal assignment statements, and process statements.
MODEL ANALYSIS
Once an entity is described in VHDL, it can be validated using an analyzer and a simulator that are part of a VHDL system. The first step in the validation process is analysis. The analyzer takes a file that contains one or more design units and compiles them into an intermediate form. The generated intermediate form is stored in a specific design library that has been designated as the working library.
There is a design library with the logical name STD, predefined by the VHDL language environment. This library contains two packages, STANDARD and TEXTIO. The STANDARD package contains declarations for all the predefined types of the language. The TEXTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations. There also exists an IEEE standard package called STD_LOGIC_1164, which defines a nine-value logic type together with its associated subtypes, overloaded operator functions, and other useful utilities. This standard is called IEEE Std 1164-1993.
SIMULATION
For a hierarchical entity to be simulated, all of its lowest-level components must be described at the behavioral level. A simulation can be performed on either one of the following:
1. An entity declaration and an architecture body pair
2. A configuration
Preceding the actual simulation are two major steps:
1. Elaboration phase: In this phase the hierarchy of the entity is expanded and linked, components are bound to entities in a library, and the top-level entity is built as a network of behavioral models that is ready to be simulated.
2. Initialization phase: Driving and effective values for all explicitly declared signals are computed, implicit signals are assigned values, processes are executed once until they suspend, and simulation time is set to 0 ns.
Simulation commences by advancing time to that of the next event. Values that are scheduled to be assigned to signals at this time are assigned. If the value of a signal changes, and if that signal is present in the sensitivity list of a process, the process is executed until it suspends. Simulation stops when an assertion violation occurs, depending on the implementation of the VHDL system, or when the maximum time as defined by the language is reached.
Entity Declaration
An entity declaration describes the external interface of the entity. It specifies the name of the entity, the names of the interface ports, their mode, and the type of the ports. The syntax for an entity declaration is:
entity entity_name is
  [generic (list-of-generics-and-their-types);]
  [port (list-of-interface-port-names-and-their-types);]
  [entity-item-declarations]
[begin
  entity-statements]
end [entity] [entity_name];
The entity_name is the name of the entity, and the interface ports are the signals through which the entity passes information to and from its external environment. Each interface port can have one of the following modes:
1. in: The value of an input port can only be read within the entity model.
2. out: The value of an output port can only be updated within the entity model.
3. inout: The value of a bidirectional port can be read and updated within the entity model.
4. buffer: The value of a buffer port can be read and updated within the entity model. It cannot have more than one source.
Declarations that are placed in the entity declaration are common to all the design units that are associated with that entity declaration.
ARCHITECTURE BODY
An architecture body describes the internal view of an entity. It describes the functionality or the structure of the entity.
architecture <architecture name> of <entity name> is
begin
  concurrent statements:
    process statement
    block statement
    concurrent signal assignment statement
    component instantiation statement
    generate statement
end [architecture] [architecture name];
The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow, and sequential behavior.
Here we describe an entity by using the behavioral model. A process statement, which is a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.
2.6 PROCESS STATEMENT
A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is:
[process-label:] process [(sensitivity-list)] [is]
begin
  sequential statements:
    variable-assignment-statement
    signal-assignment-statement
    wait-statement
    if-statement
    case-statement
    loop-statement
    null-statement
    exit-statement
    next-statement
    assertion-statement
    report-statement
    procedure-call-statement
    return-statement
end process [process-label];
The set of signals to which the process is sensitive is defined by the sensitivity list. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in sequential order, that is, in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form:
variable-object := expression;
The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is:
signal-object <= expression [after delay-value];
A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other statements that appear within the process.
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is:
if boolean-expression then
  sequential-statements
[elsif boolean-expression then
  sequential-statements]
[else
  sequential-statements]
end if;
The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
CASE STATEMENT
The format of a case statement is:
case expression is
  when choices => sequential-statements
  when choices => sequential-statements
end case;
The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or of a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by using "others". The others clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax for a loop statement is:
[loop-label:] iteration-scheme loop
  sequential-statements
end loop [loop-label];
2.8 Active HDL Overview
Active-HDL is an integrated environment designed for development of VHDL, Verilog, EDIF, and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs, and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard.
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with the OVI Standard Delay Format Specification Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification, and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1 HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. Keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts, and Tcl scripts.
2 Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3 State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4 Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5 Design Browser
The Design Browser window displays the contents of the current design, that is:
- Resource files attached to the design
- The contents of the default working library of the design
- The structure of the design unit selected for simulation
- VHDL, Verilog, or EDIF objects declared within a selected region of the current design
6 Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros, and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is a process of analysis of a source file. Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator. In Active-HDL, a source file can be one of the following:
- VHDL file (.vhd)
- Verilog file (.v)
- EDIF netlist file
- State diagram file (.asf)
- Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog, or EDIF file containing HDL code (or netlist) generated from the diagram.
A netlist is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, for VHDL, Verilog, and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
- Event-Driven Simulation
- Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others run in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
- HDL (VHDL, Verilog HDL, ABEL)
- Schematic design files
- EDIF
- NGC/NGO
- State Machines
- IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bitstream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx® design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs, and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic, and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions, and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input netlists, as well as design constraint information, into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bitstream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing, or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps one or more netlists into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations, and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e. with small code size and memory) targeted for processors with a limited instruction set (i.e. AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, allows efficient combination of encryption/decryption and "on-the-fly" key derivation, and its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g. a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size, and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
$\mathrm{SEA}_{n,b}$ is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). In contrast, $\mathrm{SEA}_{n,b}$ allows one to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, additional features were wanted, including the efficient combination of encryption and decryption or the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of $\mathrm{SEA}_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. $\mathrm{SEA}_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
$\mathrm{SEA}_{n,b}$ operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds, and is defined with respect to the following parameters:
– $n$: plaintext size, key size
– $b$: processor (or word) size
– $n_b = n/(2b)$: number of words per Feistel branch
– $n_r$: number of block cipher rounds
As only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted as $\mathrm{SEA}_{48,8}$, $\mathrm{SEA}_{96,8}$, $\mathrm{SEA}_{144,8}$, ... Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
– Bit representation: $x_b = x(n/2-1)\; x(n/2-2)\; \ldots\; x(2)\; x(1)\; x(0)$
– Word representation: $x_W = x_{n_b-1}\; x_{n_b-2}\; \ldots\; x_2\; x_1\; x_0$
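The parameter constraint above can be checked mechanically. The following is a minimal Python sketch; the function name `sea_parameters` and the dictionary layout are illustrative assumptions, not part of the SEA specification.

```python
def sea_parameters(n: int, b: int) -> dict:
    """Validate (n, b) and derive n_b, the number of words per Feistel branch.

    Hypothetical helper: only the rule "n is a multiple of 6b" and the
    formula n_b = n / (2b) come from the text.
    """
    if n <= 0 or b <= 0:
        raise ValueError("n and b must be positive")
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    return {"n": n, "b": b, "nb": n // (2 * b)}


if __name__ == "__main__":
    for n in (48, 96, 144):
        print(sea_parameters(n, 8))
```

Because n is a multiple of 6b, the derived n_b is always a multiple of 3, which is what lets the substitution box operate on groups of three words.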
Basic Operations
Due to its simplicity constraints, $\mathrm{SEA}_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows:
1 Bitwise XOR
The bitwise XOR is defined on $n/2$-bit vectors:
$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$
2 Substitution Box S
$\mathrm{SEA}_{n,b}$ uses the following 3-bit substitution table:
ST = {0, 5, 6, 7, 4, 3, 1, 2}
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i}$
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1}$
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}, \quad 0 \le i \le n_b/3 - 1$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
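As a sanity check, the recursive (in-place) definition can be compared against the table ST. The Python sketch below is illustrative only: it assumes that each word is updated in the order listed and that later lines use the already-updated words, and that word $x_{3i+k}$ carries bit $k$ of the 3-bit table index. The function names are hypothetical.

```python
ST = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table from the text


def sbox(words):
    """Apply S to a list of words whose length is a multiple of 3.

    The three updates are sequential: each line reuses the words
    already modified by the previous lines ("recursive definition").
    """
    x = list(words)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]      # uses the updated x[i]
        x[i + 2] ^= x[i] | x[i + 1]      # uses both updated words
    return x


def table_from_bitslice():
    """Rebuild the 3-bit table by feeding one bit per word (b = 1)."""
    table = []
    for v in range(8):
        bits = [(v >> k) & 1 for k in range(3)]   # bit k of v goes to word k
        y = sbox(bits)
        table.append(y[0] | (y[1] << 1) | (y[2] << 2))
    return table


if __name__ == "__main__":
    print(table_from_bitslice() == ST)
```

Under these assumptions the bitslice form reproduces ST exactly, and the two fixed points of the table (inputs 0 and 4) are visible directly in the list.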
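3 Word Rotation R
The word rotation is defined on $n_b$-word vectors:
$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \Leftrightarrow y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \qquad y_0 = x_{n_b-1}$
4 Bit Rotation r
The bit rotation is defined on $n_b$-word vectors:
$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \Leftrightarrow y_{3i} = x_{3i} \ggg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \lll 1, \quad 0 \le i \le n_b/3 - 1$
where $\ggg$ and $\lll$ represent the cyclic right and left shifts inside a word.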
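The two rotations can be sketched in a few lines of Python. This is an illustration under one assumption: a vector is stored as a list whose index equals the word index, so `x[0]` is the least significant word $x_0$. The function names are hypothetical.

```python
def rol1(w, b):
    """Cyclic left shift of a b-bit word by one position."""
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)


def ror1(w, b):
    """Cyclic right shift of a b-bit word by one position."""
    return ((w >> 1) | (w << (b - 1))) & ((1 << b) - 1)


def word_rotate(x):
    """R: y[i+1] = x[i] and y[0] = x[nb-1]."""
    return [x[-1]] + x[:-1]


def word_rotate_inv(x):
    """Inverse of R."""
    return x[1:] + [x[0]]


def bit_rotate(x, b):
    """r: rotate word 3i right, leave word 3i+1, rotate word 3i+2 left."""
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ror1(x[i], b)
        y[i + 2] = rol1(x[i + 2], b)
    return y


if __name__ == "__main__":
    print(word_rotate([1, 2, 3]))
    print([hex(w) for w in bit_rotate([0x01, 0x55, 0x80], 8)])
```

Note that R only moves whole words, while r only moves bits inside a word; neither mixes the two levels on its own.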
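5 Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$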
The Round and Key Round
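Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Figure 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^2$ such that:
$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$
$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$
$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$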
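The three round functions translate almost line by line into code. The sketch below is a hedged Python illustration, not a reference implementation: branches are lists of b-bit words with index 0 as the least significant word (an assumption), and all function names are hypothetical.

```python
def _rot1(w, b, left):
    """Rotate one b-bit word by one position, left or right."""
    mask = (1 << b) - 1
    if left:
        return ((w << 1) | (w >> (b - 1))) & mask
    return ((w >> 1) | (w << (b - 1))) & mask


def S(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def r(x, b):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = _rot1(x[i], b, left=False)
        y[i + 2] = _rot1(x[i + 2], b, left=True)
    return y


def R(x):
    return [x[-1]] + x[:-1]


def R_inv(x):
    return x[1:] + [x[0]]


def add(x, k, b):
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, k)]


def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def FE(L, Rt, K, b):
    """Encrypt round: returns (L_next, R_next)."""
    return Rt, xor(R(L), r(S(add(Rt, K, b)), b))


def FD(L, Rt, K, b):
    """Decrypt round: returns (L_next, R_next)."""
    return Rt, R_inv(xor(L, r(S(add(Rt, K, b)), b)))


def FK(KL, KR, C, b):
    """Key round: returns (KL_next, KR_next)."""
    return KR, xor(KL, R(r(S(add(KR, C, b)), b)))


if __name__ == "__main__":
    L0, R0, K0 = [1, 2, 3], [0x10, 0xFF, 0x7A], [5, 6, 7]
    L1, R1 = FE(L0, R0, K0, 8)
    print(FD(R1, L1, K0, 8) == (R0, L0))
```

The usage lines show the relation between the two rounds: applying the decrypt round to the swapped output of an encrypt round, with the same round key, returns the swapped input.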
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.
C = SEA_{n,b}(P, K)
{
  // initialization
  L_0 & R_0 = P;
  KL_0 & KR_0 = K;
  // key scheduling
  for i in 1 to ⌊nr/2⌋
    [KL_i, KR_i] = FK(KL_{i−1}, KR_{i−1}, C(i));
  switch KL_⌊nr/2⌋, KR_⌊nr/2⌋;
  for i in ⌈nr/2⌉ to nr − 1
    [KL_i, KR_i] = FK(KL_{i−1}, KR_{i−1}, C(nr − i));
  // encryption
  for i in 1 to ⌈nr/2⌉
    [L_i, R_i] = FE(L_{i−1}, R_{i−1}, KR_{i−1});
  for i in ⌈nr/2⌉ + 1 to nr
    [L_i, R_i] = FE(L_{i−1}, R_{i−1}, KL_{i−1});
  // final
  C = R_nr & L_nr;
  switch KL_{nr−1}, KR_{nr−1};
}
where & is the concatenation operator, KR_⌊nr/2⌋ is taken before the switch, and C(i) is an $n_b$-word vector of which all the words have value 0 except the least significant word (LSW), which equals i. Decryption is exactly the same, using the decrypt round FD.
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– λ-parameter: 1/2
– δ-parameter: 1/4
– Maximum nonlinear order, namely 2
– Recursive definition
– Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3-bit substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher, when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that $\mathrm{SEA}_{n,b}$ divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of r and S yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to $\mathrm{SEA}_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. It yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
34 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the linear and differential parameters $\Lambda$, $\Delta$ of the 3-round cipher are respectively smaller than $\lambda^2$, $\delta^2$.
Since our nonlinear function S has parameters $\delta = 2^{-2}$ and $\lambda = 2^{-1}$, it implies that 3 rounds of $\mathrm{SEA}_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, it is respectively required that $\Delta \approx 2^{-n}$ and $\Lambda \approx 2^{-n/2}$ to resist differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that
$\delta^{2n_r/3} = (2^{-2})^{2n_r/3} < 2^{-n}$ and $\lambda^{2n_r/3} = (2^{-1})^{2n_r/3} < 2^{-n/2}$ (1)
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
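The arithmetic behind Equation (1) is easy to reproduce with exact fractions. The Python sketch below is illustrative; the function names are hypothetical and it only evaluates the exponents of 2 in the two bounds.

```python
from fractions import Fraction


def log2_bounds(nr):
    """Base-2 logarithms of the differential and linear bounds after nr rounds."""
    diff = Fraction(-2) * Fraction(2 * nr, 3)   # log2 of (2^-2)^(2nr/3)
    lin = Fraction(-1) * Fraction(2 * nr, 3)    # log2 of (2^-1)^(2nr/3)
    return diff, lin


def rounds_threshold(n):
    """The value 3n/4 quoted in the text for an n-bit block."""
    return Fraction(3 * n, 4)


if __name__ == "__main__":
    n = 48
    nr = int(rounds_threshold(n))
    print(nr, log2_bounds(nr), (-n, Fraction(-n, 2)))
```

For n = 48 the threshold is 36 rounds, where both exponents reach exactly −n and −n/2; the inequalities of Equation (1) become strict from the next round onwards.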
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with $\mathrm{SEA}_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \lll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
– $S(x \lll a) = S(x) \lll a$
– $r(x \lll a) = r(x) \lll a$
– $R(x \lll a) = R(x) \lll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \lll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F_E^{\oplus}$ and the key round $F_K$ satisfy:
$F_E^{\oplus}(L \lll a, R \lll a, K \lll a) = F_E^{\oplus}(L, R, K) \lll a$
$F_K(KL \lll a, KR \lll a, 0) = F_K(KL, KR, 0) \lll a$
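The three commutation properties can be checked experimentally on random inputs. The Python sketch below is an illustration only; it assumes vectors stored as lists of b-bit words (index 0 least significant), uses a fixed seed for reproducibility, and all names are hypothetical.

```python
import random


def rotl_words(x, a, b):
    """Rotate every b-bit word of x left by a bits (0 < a < b)."""
    mask = (1 << b) - 1
    return [((w << a) | (w >> (b - a))) & mask for w in x]


def S(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def r(x, b):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = rotl_words([x[i]], b - 1, b)[0]       # right rotation by 1
        y[i + 2] = rotl_words([x[i + 2]], 1, b)[0]   # left rotation by 1
    return y


def R(x):
    return [x[-1]] + x[:-1]


def layers_commute_with_rotation(nb=6, b=8, trials=200, seed=1):
    """Check S, r and R against per-word rotation for every 0 < a < b."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randrange(1 << b) for _ in range(nb)]
        for a in range(1, b):
            xa = rotl_words(x, a, b)
            if S(xa) != rotl_words(S(x), a, b):
                return False
            if r(xa, b) != rotl_words(r(x, b), a, b):
                return False
            if R(xa) != rotl_words(R(x), a, b):
                return False
    return True


if __name__ == "__main__":
    print(layers_commute_with_rotation())
```

All three layers act either bitwise, by rotating inside a word, or by permuting whole words, so rotating every word first or last gives the same result.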
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \lll a$. In the actual $\mathrm{SEA}_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \lll a) \boxplus (K \lll a) = (X \boxplus K) \lll a$ is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent $\mathrm{SEA}_{n,b}$ from having such weak keys.
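For small word sizes, the probability p can be computed by exhaustive enumeration. The Python sketch below is illustrative and measures the property for a single pair of b-bit words (an assumption about how p is defined); the function names are hypothetical.

```python
from fractions import Fraction


def rotl(w, a, b):
    """Rotate a b-bit word left by a bits (0 < a < b)."""
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)


def commute_probability(a, b):
    """Fraction of word pairs (X, K) with rotl(X) + rotl(K) == rotl(X + K) mod 2^b."""
    mask = (1 << b) - 1
    hits = sum(
        1
        for x in range(1 << b)
        for k in range(1 << b)
        if (rotl(x, a, b) + rotl(k, a, b)) & mask == rotl((x + k) & mask, a, b)
    )
    return Fraction(hits, 1 << (2 * b))


if __name__ == "__main__":
    for b in (2, 4, 8):
        p = commute_probability(1, b)
        print(b, p, float(p))
```

For a = 1 the exact values are 9/16, 27/64 and 387/1024 for b = 2, 4 and 8, which approach 3/8 = 0.375 as b grows.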
Moreover the round constants Ci are generally not such that Ci ≪a = Ci (because they are generated from a counter) Combined with the diffusion in the key schedule it implies that the similarity between the round keys derived from K and those derived from K ≪a rapidly vanishes These properties avoid this structural distinguisher to be propagated through more than a few rounds of SEAnb
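The carry-propagation argument above can be checked numerically for small word sizes. The following is a minimal sketch, not part of the original analysis: it exhaustively counts the pairs $(X, K)$ of single $b$-bit words for which rotation commutes with the key addition. The function names `rotl` and `commute_probability` are illustrative assumptions.

```python
def rotl(x: int, a: int, b: int) -> int:
    """Rotate the b-bit word x left by a bits."""
    a %= b
    mask = (1 << b) - 1
    return ((x << a) | (x >> (b - a))) & mask


def commute_probability(a: int, b: int, op: str = "add") -> float:
    """Fraction of word pairs (X, K) for which rotation by a commutes
    with key addition: addition mod 2^b ("add") or XOR ("xor")."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            if op == "add":
                lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
                rhs = rotl((x + k) & mask, a, b)
            else:
                lhs = rotl(x, a, b) ^ rotl(k, a, b)
                rhs = rotl(x ^ k, a, b)
            hits += lhs == rhs
    return hits / (1 << (2 * b))


if __name__ == "__main__":
    for b in (4, 6, 8):
        print(b, commute_probability(1, b), commute_probability(1, b, "xor"))
```

With XOR the property always holds, which is what makes the modified cipher distinguishable; with modular addition the fraction for $a = 1$ approaches $3/8$ as $b$ grows.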
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in the figure. It is expected that this remains the same when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in the figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, whose vulnerability to slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack was described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows the attacker to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \iff \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we leave this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks, plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. As $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
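The round-count rule above is simple arithmetic, so a small helper makes it easy to evaluate for a given parameter set. This is a sketch under two assumptions that are not stated in this section: that $n_b = n/(2b)$ is the number of words per Feistel branch, and that $3n/4$ is rounded up when it is not an integer. The function name `suggested_rounds` is hypothetical.

```python
def suggested_rounds(n: int, b: int, conservative: bool = False) -> int:
    """Evaluate 3n/4 + 2*(n_b + floor(b/2)), optionally doubled,
    then make the result odd by adding one if it is even."""
    if b < 8:
        raise ValueError("the text assumes a minimum word size b >= 8")
    if n % (2 * b) != 0:
        raise ValueError("assumption: n must be a multiple of 2b")
    n_b = n // (2 * b)                      # assumed words per Feistel branch
    rounds = -(-3 * n // 4) + 2 * (n_b + b // 2)  # ceil(3n/4) + diffusion term
    if conservative:
        rounds *= 2                         # doubled security margin
    if rounds % 2 == 0:
        rounds += 1                         # n_r has to be odd
    return rounds


if __name__ == "__main__":
    for n, b in ((48, 8), (96, 8), (192, 32)):
        print(n, b, suggested_rounds(n, b), suggested_rounds(n, b, True))
```

For example, $n = 96$ and $b = 8$ give $72 + 2 \cdot (6 + 4) = 92$, which is made odd to obtain 93 rounds.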
3.5 Performance Analysis
SEA$_{n,b}$ is targeted at low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and registers usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
- Arithmetic and logic operators: or, and, $\oplus$, $\boxplus$, $\gg$, $\ll$
- Branch instructions: goto, subroutine call and return
- Comparison; load RAM in register; store register in RAM
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, this directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. This yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
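The estimates above can be evaluated directly for a chosen parameter set. The sketch below only transcribes the formulas of this paragraph; it assumes $n_b = n/(2b)$ (not restated in this section), and the function name `sea_cost` and the dictionary keys are illustrative.

```python
def sea_cost(n: int, b: int, n_r: int) -> dict:
    """Rough software cost estimates for SEA(n, b) with n_r rounds,
    transcribed from the formulas in the text (all counts in ops,
    except the memory figure, which is in words)."""
    n_b = n // (2 * b)              # assumed words per Feistel branch
    round_ops = 12 * n_b + 11       # one encryption round
    key_round_ops = 10 * n_b + 11   # one key round
    total_ops = ((n_r - 1) * (round_ops + key_round_ops + 7)
                 + round_ops + 8 * n_b + 7)
    return {
        "ram_words_and_registers": 5 * n_b + 3,
        "registers": n_b + 3,
        "code_size_ops": 31 * n_b + 36,
        "round_ops": round_ops,
        "key_round_ops": key_round_ops,
        "total_ops": total_ops,
    }


if __name__ == "__main__":
    for name, value in sea_cost(96, 8, 93).items():
        print(f"{name}: {value}")
```

The number of rounds is passed in explicitly so that the helper does not depend on any particular rule for choosing $n_r$.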
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
- SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
- The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
- The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. SEA$_{n,b}$ trades space for time, which may be acceptable given present processor speeds). Looking at the code size $\times$ cycles product, the efficiency of SEA$_{n,b}$ remains similar to the one of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Müller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Müller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic, and for a 200-bit prime $p$ succeeds with a probability of about $3/4$ (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that
$\#E(\mathbb{F}_p) = p + 1 - t$
where $t$ is an integer which satisfies the Hasse bound
$-2\sqrt{p} \le t \le 2\sqrt{p}$
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$ and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial $\Phi_\ell$ that is necessary for further calculations with this $\ell$. These polynomials $\Phi_\ell$ do not depend on the curve $E$ under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is $1/2$). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining $t$.
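The final recombination step described above can be illustrated on a toy example. The sketch below is not the SEA algorithm itself: it obtains the residues $t \bmod \ell$ by brute-force point counting on a small curve (which only works for tiny $p$), and then shows how the Chinese Remainder Theorem together with the Hasse bound pins down $t$. All function names are illustrative assumptions.

```python
def count_points_naive(p: int, a4: int, a6: int) -> int:
    """#E(F_p) for y^2 = x^3 + a4*x + a6 by brute force,
    including the point at infinity (small p only)."""
    roots = {}
    for y in range(p):
        roots[y * y % p] = roots.get(y * y % p, 0) + 1
    total = 1  # point at infinity
    for x in range(p):
        total += roots.get((x ** 3 + a4 * x + a6) % p, 0)
    return total


def crt(residues: dict) -> tuple:
    """Combine {modulus: residue} for pairwise coprime moduli.
    Returns (value, product of moduli) with 0 <= value < product."""
    value, modulus = 0, 1
    for m, r in residues.items():
        k = ((r - value) * pow(modulus, -1, m)) % m
        value += modulus * k
        modulus *= m
    return value, modulus


def recover_trace(p: int, residues: dict) -> int:
    """Recover t from t mod l, given that the product of the l exceeds 4*sqrt(p)."""
    t0, product = crt(residues)
    if product * product <= 16 * p:
        raise ValueError("product of auxiliary primes must exceed 4*sqrt(p)")
    # The Hasse bound gives |t| <= 2*sqrt(p) < product/2: take the centered lift.
    return t0 - product if t0 > product // 2 else t0


if __name__ == "__main__":
    p, a4, a6 = 101, 1, 1
    n_points = count_points_naive(p, a4, a6)
    t = p + 1 - n_points
    residues = {l: t % l for l in (3, 5, 7)}  # stand-in for the real SEA step
    print(n_points, t, recover_trace(p, residues))
```

In the real algorithm the residues come from computations with the polynomials attached to each auxiliary prime; only the recombination shown in `recover_trace` carries over unchanged.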
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$, with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, El Gamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial $\Phi_\ell(F, J) \in \mathbb{Z}[F, J]$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, $\Phi_\ell$ must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store).
We start out with a given prime $p$ and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$
CH 5 SEA Architecture Block Diagram
[Figure: SEA architecture block diagram. Key Computational Block (Mod, SBox, BR, WR, XOR, Round Reg) with key registers KeyReg[95:0], KeyReg0[95:0], KeyReg1[95:0], ..., KeyReg8[95:0], KeyReg9[95:0] and signals KeyI[95:0], KeyLd, Clk, Rst. Encryption Computational Block (Mod, SBox, BR, WR, XOR, Round Reg) with cipher data register and signals Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst. Decryption Computational Block (Mod, SBox, BR, IWR, XOR, Round Reg) with plain text data register and signals Key0[95:0] to Key9[95:0], DataO[95:0], DataRd, Clk, Rst. SMC with signals Clk, Rst, Ena, ED, Ovr.]
FIG 5.1
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt data.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
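The two ways of obtaining a key mentioned above can be sketched with the Python standard library. This is an illustration only, not the key generation used by the hardware design: the 96-bit default is an assumption taken from the 96-bit key registers of the block diagram, the function names are hypothetical, and PBKDF2 with SHA-256 is swapped in for the SHA-1-based schemes the text alludes to.

```python
import hashlib
import secrets


def random_key(bits: int = 96) -> bytes:
    """Draw a key from the operating system's entropy source."""
    if bits % 8 != 0:
        raise ValueError("key length must be a whole number of bytes")
    return secrets.token_bytes(bits // 8)


def key_from_passphrase(passphrase: str, salt: bytes,
                        bits: int = 96, iterations: int = 100_000) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256."""
    if bits % 8 != 0:
        raise ValueError("key length must be a whole number of bytes")
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=bits // 8)


if __name__ == "__main__":
    salt = secrets.token_bytes(16)
    print(random_key().hex())
    print(key_from_passphrase("correct horse battery staple", salt).hex())
```

The same passphrase and salt always yield the same key, whereas `random_key` returns a fresh value on every call.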
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key length; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis began being used; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
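The one-time pad conditions listed above translate into very little code. The following is a minimal sketch for illustration, with hypothetical function names: the pad is drawn from the operating system's random source, has exactly the length of the message, and must never be reused for another message.

```python
import secrets


def otp_encrypt(message: bytes) -> tuple[bytes, bytes]:
    """Return (pad, ciphertext); the pad is as long as the message."""
    pad = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, pad))
    return pad, ciphertext


def otp_decrypt(pad: bytes, ciphertext: bytes) -> bytes:
    """XOR the ciphertext with the same pad to recover the message."""
    if len(pad) != len(ciphertext):
        raise ValueError("pad and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, pad))


if __name__ == "__main__":
    pad, ct = otp_encrypt(b"attack at dawn")
    print(ct.hex())
    print(otp_decrypt(pad, ct))
```

Decryption is the same XOR operation as encryption, which is why the pad has to stay secret and be at least as long as the message.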
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such showing can be made currently, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
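The DES figures quoted above follow from simple powers of two, as the short sketch below shows. It assumes the standard 56-bit DES key length, which the text does not state explicitly; the names are illustrative.

```python
DES_KEY_BITS = 56  # assumed effective DES key length


def brute_force_trials(key_bits: int) -> int:
    """Trial decryptions needed to cover half of the key space."""
    return 2 ** (key_bits - 1)


def speedup(key_bits: int, attack_log2_ops: int) -> int:
    """Ratio between brute force work and an attack costing 2^attack_log2_ops."""
    return brute_force_trials(key_bits) // 2 ** attack_log2_ops


if __name__ == "__main__":
    print(brute_force_trials(DES_KEY_BITS))  # equals 2**55
    print(speedup(DES_KEY_BITS, 43))         # brute force vs. linear cryptanalysis
```

Under these assumptions, linear cryptanalysis needs about $2^{12}$ times fewer DES operations than brute force, at the price of collecting $2^{43}$ known plaintexts.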
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting too short keys, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld

Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:  0 out of 0    0%
  Number of Slices containing unrelated logic:     0 out of 0    0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:   194 out of 86   225% (OVERMAPPED)
    IOB Flip Flops:        96
  Number of GCLKs:         1 out of 4   25%
  Number of GCLKIOBs:      1 out of 4   25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line   : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device  : x2s15
Target Package : cs144
Target Speed   : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date    : Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:  0 out of 0    0%
  Number of Slices containing unrelated logic:     0 out of 0    0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:   194 out of 86   225% (OVERMAPPED)
    IOB Flip Flops:        96
  Number of GCLKs:         1 out of 4   25%
  Number of GCLKIOBs:      1 out of 4   25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report

Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:  0 out of 0    0%
  Number of Slices containing unrelated logic:     0 out of 0    0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:   194 out of 86   225% (OVERMAPPED)
    IOB Flip Flops:        96
  Number of GCLKs:         1 out of 4   25%
  Number of GCLKIOBs:      1 out of 4   25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
*                      Synthesis Options Summary                        *
=========================================================================
---- Source Parameters
Input File Name                    : keyreg.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : keyreg
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : keyreg
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : keyreg.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO

=========================================================================

WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
*                          HDL Compilation                              *
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
*                            HDL Analysis                               *
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
*                           HDL Synthesis                               *
=========================================================================

Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
 Registers                         : 1
  96-bit register                  : 1

=========================================================================

=========================================================================
*                       Advanced HDL Synthesis                          *
=========================================================================

=========================================================================
*                         Low Level Synthesis                           *
=========================================================================

Optimizing unit <keyreg> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
*                            Final Report                               *
=========================================================================
Final Results
RTL Top Level Output File Name     : keyreg.ngr
Top Level Output File Name         : keyreg
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
 IOs                               : 195

Macro Statistics :
 Registers                         : 1
  96-bit register                  : 1

Cell Usage :
 BELS                              : 1
  LUT1                             : 1
 FlipFlops/Latches                 : 96
  FDCE                             : 96
 Clock Buffers                     : 1
  BUFGP                            : 1
 IO Buffers                        : 194
  IBUF                             : 98
  OBUF                             : 96
=========================================================================

Device utilization summary:
---------------------------

Selected Device : 2s15cs144-6

 Number of Slices:             55 out of 192    28%
 Number of Slice Flip Flops:   96 out of 384    25%
 Number of 4 input LUTs:        1 out of 384     0%
 Number of bonded IOBs:       194 out of  90   215% (*)
 Number of GCLKs:               1 out of   4    25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
      FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
      GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -6

   Minimum period: No path found
   Minimum input arrival time before clock: 7.962ns
   Maximum output required time after clock: 6.788ns
   Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset:              7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     IBUF:I->O            96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
     FDCE:CE                   0.886          Dreg_0
    ----------------------------------------
    Total                      7.962ns (1.662ns logic, 6.300ns route)
                                       (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset:              6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising

  Data Path: Dreg_95 to KeyO<95>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     FDCE:C->Q             1   1.085   1.035  Dreg_95 (Dreg_95)
     OBUF:I->O                 4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                      6.788ns (5.753ns logic, 1.035ns route)
                                       (84.8% logic, 15.2% route)

=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->

Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
                      Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
                           HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
                            HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
                            HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
                       Advanced HDL Synthesis
=========================================================================

=========================================================================
                         Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
                            Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 7

Cell Usage :
BELS : 3
  LUT4 : 3
IO Buffers : 7
  IBUF : 4
  OBUF : 3
=========================================================================

Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6

 Number of Slices:          2 out of 192    1%
 Number of 4 input LUTs:    3 out of 384    0%
 Number of bonded IOBs:     7 out of  90    7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs:       3 out of 384    1%
Logic Distribution:
  Number of occupied Slices:    2 out of 192    1%
  Number of Slices containing only related logic:  2 out of 2  100%
  Number of Slices containing unrelated logic:     0 out of 2    0%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs:   3 out of 384    1%
  Number of bonded IOBs:        7 out of  86    8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB

Mapping Report

Device utilization summary:
  Number of External IOBs          7 out of 86    8%
    Number of LOCed External IOBs  0 out of 7     0%
  Number of SLICEs                 2 out of 192   1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
                      Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd ...
Writing NGDBUILD log file keygenblock.bld...
MAPPING REPORT
Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers:   419 out of 384   109% (OVERMAPPED)
    Number used as Flip Flops:    415
    Number used as Latches:         4
  Number of 4 input LUTs:        1016 out of 384   264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices:      665 out of 192   346% (OVERMAPPED)
  Number of Slices containing only related logic:  648 out of 665   97%
  Number of Slices containing unrelated logic:      17 out of 665    2%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:       1066 out of 384   277% (OVERMAPPED)
  Number used as logic:          1016
  Number used as a route-thru:     50
  Number of bonded IOBs:         1060 out of 86   1232% (OVERMAPPED)
    IOB Flip Flops:               960
  Number of GCLKs:                  1 out of 4      25%
  Number of GCLKIOBs:               1 out of 4      25%

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
                      Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
                      Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
Low-cost encryption on processors with a limited instruction set, the setting for which the routine was originally designed.
Wireless communication, mobile computing, and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEA_{n,b} is a scalable encryption algorithm targeted at small embedded applications. The plaintext size n, the key size, and the processor (or word) size b are parameters of the design. The structure of SEA_{n,b} allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC processor) is an encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, an assembly implementation of the cipher can be expected to take only a few hours. We note finally that the design criteria of SEA_{n,b} do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
1.2 TYPICAL IC DESIGN FLOW
[Figure: flowchart of the typical IC design flow. Stages shown: Specifications, Behavioral Description, RTL Description, Behavioral Simulation, Functional Simulation, Behavioral Synthesis, Logic Synthesis, Gate Level Netlist, Automatic P&R, Layout, Logic Simulation, Fabrication, Layout Management. Constraints and the Library are shown as inputs to the flow.]
1.3 MICRON TECHNOLOGY
[Figure: classification of micron technology into Micron, SM, DSM, and VDSM.]
Micron technology can be classified into four categories, evolving from micron technology down to VDSM:
Micron Technology: technology with feature sizes down to 10^-6 m (1 µm).
Submicron Technology: technology below 1 µm; it generally ranges down to 0.36 µm.
DSM (Deep Sub-Micron technology): technology extending down to 0.18 µm.
VDSM (Very Deep Sub-Micron technology): the presently used technology; it ranges down to 0.09 µm.
1.4 FEATURES
2.1 INTRODUCTION TO VHDL
VHDL is an acronym for VHSIC Hardware Description Language; VHSIC is in turn an acronym for Very High Speed Integrated Circuits. It is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level.
The VHDL language can be regarded as an integrated amalgamation of the following languages:
Sequential language
Concurrent language
Net-list language
Timing specifications
Waveform generation language
The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator. A core subset of the language is usually sufficient to model most applications. The complete language, however, has sufficient power to capture descriptions ranging from the most complex chips to a complete electronic system.
HISTORY
The requirements for the language were first generated in 1981 under the VHSIC program of the U.S. Department of Defense (DoD). Reprocurement and reuse were also a big issue. Thus, a need arose for a standardized hardware description language for the design, documentation, and verification of digital systems. The IEEE standardized the VHDL language in December 1987; this version of the language is known as IEEE Std 1076-1987. The official language description appears in the IEEE Standard VHDL Language Reference Manual, available from the IEEE. The language has also been recognized as an American National Standards Institute (ANSI) standard.
According to IEEE rules, an IEEE standard has to be reballoted every 5 years so that it may remain a standard. Consequently, the language was upgraded with new features, the syntax of many constructs was made more uniform, and many ambiguities present in the 1987 version of the language were resolved. This new version of the language is known as IEEE Std 1076-1993.
2.2 CAPABILITIES
The following are the major capabilities that the language provides, along with the features that differentiate it from other hardware description languages:
The language can be used as an exchange medium between chip vendors and CAD tool users. Different chip vendors can provide VHDL descriptions of their components to system designers.
The language can be used as a communication medium between different CAD and CAE tools.
The language supports hierarchy; that is, a digital system can be modeled as a set of interconnected components, and each component in turn can be modeled as a set of interconnected subcomponents.
The language supports flexible design methodologies: top-down, bottom-up, or mixed. It supports both synchronous and asynchronous timing models.
Various digital modeling techniques, such as finite-state machine descriptions and Boolean equations, can be modeled using the language.
The language is publicly available, human-readable, and machine-readable.
The language supports three basic description styles: structural, dataflow, and behavioral.
It supports a wide range of abstraction levels, ranging from abstract behavioral descriptions to very precise gate-level descriptions.
Arbitrarily large designs can be modeled using the language; the language imposes no limitations on the size of the design.
2.3 HARDWARE ABSTRACTION
VHDL is used to describe a model for a digital hardware device. This model specifies the external view of the device and one or more internal views. The internal view of the device specifies its functionality or structure, while the external view specifies the interface through which the device communicates with the other modules in its environment. In VHDL, each device model is treated as a distinct representation of a unique device, called an entity. The entity is thus a hardware abstraction of the actual hardware device. Each entity is described using one model, which contains one external view and one or more internal views.
2.4 Basic terminology
VHDL is a hardware description language that can be used to model a digital system. A hardware abstraction of this digital system is called an entity. An entity X, when used in another entity Y, becomes a component for the entity Y. To describe an entity, VHDL provides five different types of primary constructs, called design units. They are:
1. Entity declaration
2. Architecture body
3. Configuration declaration
4. Package declaration
5. Package body
1. An entity is modeled using an entity declaration and at least one architecture body. The entity declaration describes the external view of the entity, for example the input and output signal names.
2. The architecture body contains the internal description of the entity, for example as a set of interconnected components that represents the structure of the entity, or as a set of concurrent or sequential statements that represents the behavior of the entity.
3. A configuration declaration is used to create a configuration for an entity. It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity. It may also specify the bindings of the components used in the selected architecture body to other entities. An entity may have any number of configurations.
4. A package declaration encapsulates a set of related declarations, such as type declarations, subtype declarations, and subprogram declarations, which can be shared across two or more design units.
5. A package body contains the definitions of the subprograms declared in a package declaration.
Once an entity has been modeled, it needs to be validated by a VHDL system. A typical VHDL system consists of an analyzer and a simulator. The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks.
The language is case insensitive; that is, lowercase and uppercase characters are treated alike. The language is also free format. Comments are specified by preceding the text with two consecutive dashes (--).
Entity Declaration
The entity declaration specifies the name of the entity being modeled and lists the set of interface ports. Ports are signals through which the entity communicates with other models in its external environment.
EXAMPLE
The entity declaration for the half adder circuit is
  entity half_adder is
    port (A, B : in bit; sum, carry : out bit);
  end half_adder;
The entity, called half_adder, has two input ports, A and B, and two output ports, sum and carry. Bit is a predefined type of the language.
Architecture Body
An architecture body specifies the internal details of an entity using any of the following modeling styles:
1. As a set of interconnected components (to represent structure)
2. As a set of concurrent assignment statements (to represent data flow)
3. As a set of sequential assignment statements (to represent behavior)
4. As any combination of the above three
2.5 Structural style of modeling
In this style, an entity is described as a set of interconnected components. Such a model for the half_adder entity is described in an architecture body:
  architecture ha of half_adder is
    component Xor2
      port (X, Y : in bit; Z : out bit);
    end component;
    component And2
      port (L, M : in bit; N : out bit);
    end component;
  begin
    X1 : Xor2 port map (A, B, sum);
    A1 : And2 port map (A, B, carry);
  end ha;
The name of the architecture body is ha. The entity declaration for half_adder specifies the interface ports for this architecture body. The architecture body is composed of two parts: the declarative part and the statement part. Two component declarations are present in the declarative part of the architecture body.
The declared components are instantiated in the statement part of the architecture body using component instantiation statements. The signals in the port map of a component instantiation and the port signals in the component declaration are associated by position.
DATAFLOW STYLE OF MODELING
In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The dataflow model for the half adder is described using two concurrent signal assignment statements. In a signal assignment statement, the symbol <= implies an assignment of a value to a signal.
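A minimal sketch of such a dataflow model for the half adder is given here. The architecture name ha_dataflow is an illustrative choice, and the entity declaration is repeated so that the fragment can be analyzed on its own.

```vhdl
-- Half adder entity, repeated here so the example is self-contained.
entity half_adder is
  port (A, B       : in  bit;
        sum, carry : out bit);
end half_adder;

-- Dataflow architecture (the name ha_dataflow is an assumption).
architecture ha_dataflow of half_adder is
begin
  -- Both statements are concurrent: each one is re-evaluated
  -- whenever A or B changes, regardless of textual order.
  sum   <= A xor B;
  carry <= A and B;
end ha_dataflow;
```

Swapping the order of the two assignments would not change the behavior, which is the main difference from sequential statements inside a process.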
BEHAVIORAL STYLE OF MODELING
The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. These sequential statements, which are specified inside a process statement, do not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.
MIXED STYLE OF MODELING
It is possible to mix the three modeling styles in a single architecture body. That is, within an architecture body we could use component instantiation statements, concurrent signal assignment statements, and process statements.
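As an illustration of mixing the three styles, the following sketch models a one-bit full adder. The entity names xor2_gate and full_adder, and all signal and label names, are hypothetical and are not taken from the project sources.

```vhdl
-- Hypothetical two-input XOR gate, used as the structural component.
entity xor2_gate is
  port (X, Y : in bit; Z : out bit);
end xor2_gate;

architecture dataflow of xor2_gate is
begin
  Z <= X xor Y;
end dataflow;

-- Hypothetical one-bit full adder described in mixed style.
entity full_adder is
  port (A, B, Cin : in bit; sum, Cout : out bit);
end full_adder;

architecture fa_mixed of full_adder is
  signal S1 : bit;  -- intermediate result A xor B
begin
  -- Structural part: direct entity instantiation (VHDL-93 syntax).
  X1 : entity work.xor2_gate port map (X => A, Y => B, Z => S1);

  -- Behavioral part: a process computing the carry sequentially.
  carry_proc : process (A, B, Cin)
    variable T1, T2, T3 : bit;
  begin
    T1 := A and B;
    T2 := B and Cin;
    T3 := A and Cin;
    Cout <= T1 or T2 or T3;
  end process carry_proc;

  -- Dataflow part: a concurrent signal assignment for the sum.
  sum <= S1 xor Cin;
end fa_mixed;
```

The instantiation, the process, and the concurrent assignment all execute concurrently with respect to each other; only the statements inside the process are sequential.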
MODEL ANALYSIS
Once an entity is described in VHDL, it can be validated using an analyzer and a simulator that are part of a VHDL system. The first step in the validation process is analysis. The analyzer takes a file that contains one or more design units and compiles them into an intermediate form. The generated intermediate form is stored in a specific design library that has been designated as the working library.
There is a design library with the logical name STD, predefined by the VHDL language environment. This library contains two packages, STANDARD and TEXTIO. The STANDARD package contains declarations for all the predefined types of the language. The TEXTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations. There also exists an IEEE standard package called STD_LOGIC_1164, which contains the definition of a nine-value logic type together with its associated subtypes, overloaded operator functions, and other useful utilities. This standard is called IEEE Std 1164-1993.
SIMULATION
For a hierarchical entity to be simulated, all of its lowest-level components must be described at the behavioral level. A simulation can be performed on either one of the following:
1. An entity declaration and an architecture body pair
2. A configuration
Preceding the actual simulation are two major steps:
1. Elaboration phase: In this phase, the hierarchy of the entity is expanded and linked, components are bound to entities in a library, and the top-level entity is built as a network of behavioral models that is ready to be simulated.
2. Initialization phase: Driving and effective values for all explicitly declared signals are computed, implicit signals are assigned values, processes are executed once until they suspend, and simulation time is set to 0 ns.
Simulation commences by advancing time to that of the next event. Values that are scheduled to be assigned to signals at this time are assigned. If the value of a signal changes, and if that signal is present in the sensitivity list of a process, the process is executed until it suspends. Simulation stops when an assertion violation occurs (depending on the implementation of the VHDL system) or when the maximum time as defined by the language is reached.
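To make the simulation cycle concrete, here is a small self-checking test bench sketch for the half adder. The test bench entity half_adder_tb, the 10 ns step, and the message strings are illustrative assumptions; the half adder is repeated so the file can be analyzed and simulated on its own.

```vhdl
-- Device under test: dataflow half adder (repeated for completeness).
entity half_adder is
  port (A, B : in bit; sum, carry : out bit);
end half_adder;

architecture ha_dataflow of half_adder is
begin
  sum   <= A xor B;
  carry <= A and B;
end ha_dataflow;

-- Hypothetical test bench: an entity without ports.
entity half_adder_tb is
end half_adder_tb;

architecture sim of half_adder_tb is
  signal A, B, sum, carry : bit;
begin
  dut : entity work.half_adder
    port map (A => A, B => B, sum => sum, carry => carry);

  stimulus : process
  begin
    -- Each "wait for" suspends the process; simulation time then
    -- advances to the next event, as described above.
    A <= '0'; B <= '0'; wait for 10 ns;
    assert (sum = '0') and (carry = '0') report "0+0 failed" severity error;
    A <= '1'; B <= '0'; wait for 10 ns;
    assert (sum = '1') and (carry = '0') report "1+0 failed" severity error;
    A <= '1'; B <= '1'; wait for 10 ns;
    assert (sum = '0') and (carry = '1') report "1+1 failed" severity error;
    wait;  -- suspend forever; no further events are scheduled
  end process stimulus;
end sim;
```

Because the final wait has no condition, the process never resumes, so no further events remain once the last stimulus has been applied.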
Entity Declaration
An entity declaration describes the external interface of the entity. It specifies the name of the entity, the names of the interface ports, their modes, and the types of the ports. The syntax for an entity declaration is
  entity entity-name is
    [generic (list-of-generics-and-their-types);]
    [port (list-of-interface-port-names-and-their-types);]
    [entity-item-declarations]
  [begin
    entity-statements]
  end [entity] [entity-name];
The entity-name is the name of the entity, and the interface ports are the signals through which the entity passes information to and from its external environment. Each interface port can have one of the following modes:
1. in: The value of an input port can only be read within the entity model.
2. out: The value of an output port can only be updated within the entity model.
3. inout: The value of a bidirectional port can be read and updated within the entity model.
4. buffer: The value of a buffer port can be read and updated within the entity model. It cannot have more than one source.
Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration.
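The generic clause in the syntax above is what makes a parametric description possible. The sketch below shows a hypothetical enable-controlled register whose width is a generic; the names data_reg, WIDTH, clk, ena, d, and q are assumptions for illustration and do not reproduce the project's own register source.

```vhdl
-- Hypothetical parametric register; WIDTH is set at instantiation time.
entity data_reg is
  generic (WIDTH : positive := 8);  -- default width is an assumption
  port (clk : in  bit;
        ena : in  bit;
        d   : in  bit_vector(WIDTH - 1 downto 0);
        q   : out bit_vector(WIDTH - 1 downto 0));
end entity data_reg;

architecture rtl of data_reg is
begin
  process (clk)
  begin
    -- Load the register on the rising clock edge when enabled.
    if clk'event and clk = '1' then
      if ena = '1' then
        q <= d;
      end if;
    end if;
  end process;
end architecture rtl;
```

An instance with a different width would only need a generic map, for example generic map (WIDTH => 96), with no change to the description itself.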
ARCHITECTURE BODY
An architecture body describes the internal view of an entity. It describes the functionality or the structure of the entity. Its syntax is
  architecture architecture-name of entity-name is
  begin
    concurrent-statements; these are
      process-statement
      block-statement
      concurrent-signal-assignment-statement
      component-instantiation-statement
      generate-statement
  end [architecture] [architecture-name];
The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow, and sequential behavior.
Here we describe an entity using the behavioral model. A process statement, which is a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.
2.6 PROCESS STATEMENT
A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is
  [process-label:] process [(sensitivity-list)] [is]
  begin
    sequential-statements; these are
      variable-assignment-statement
      signal-assignment-statement
      wait-statement
      if-statement
      case-statement
      loop-statement
      null-statement
      exit-statement
      next-statement
      assertion-statement
      report-statement
      procedure-call-statement
      return-statement
  end process [process-label];
The sensitivity list defines the set of signals to which the process is sensitive. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in sequential order, that is, in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
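The suspend-and-resume behavior described above can be sketched with two processes that are intended to behave identically: one uses a sensitivity list, the other an explicit wait statement as its last statement. The entity name and the port names are hypothetical.

```vhdl
-- Hypothetical demo entity with two outputs driven by equivalent processes.
entity and_gate_demo is
  port (a, b : in bit; y1, y2 : out bit);
end and_gate_demo;

architecture behav of and_gate_demo is
begin
  -- Form 1: the sensitivity list resumes the process on any event on a or b.
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: the same behavior written with an explicit wait statement.
  -- A process may use a sensitivity list or wait statements, not both.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;  -- suspend until an event occurs on a or b
  end process with_wait;
end behav;
```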
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form
  variable-object := expression;
The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is
  signal-object <= expression [after delay-value];
A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other sequential statements that appear within the process.
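The difference between the two kinds of assignment can be sketched as follows. The entity assign_demo and its port names are hypothetical; integer ports are used only to keep the simulation example short.

```vhdl
-- Hypothetical demo contrasting variable and signal assignment.
entity assign_demo is
  port (clk : in bit; v_out, s_out : out integer);
end assign_demo;

architecture behav of assign_demo is
  signal s : integer := 0;
begin
  process (clk)
    variable v : integer := 0;
  begin
    if clk'event and clk = '1' then
      v := v + 1;   -- variable: takes its new value immediately
      v_out <= v;   -- reads the already-incremented v
      s <= s + 1;   -- signal: new value is only scheduled
      s_out <= s;   -- still reads the value s had before this edge
    end if;
  end process;
end behav;
```

On each rising edge, v_out therefore receives the incremented count, while s_out lags one clock edge behind because the scheduled update of s is not yet visible inside the same process execution.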
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is
  if boolean-expression then
    sequential-statements
  [elsif boolean-expression then
    sequential-statements]
  [else
    sequential-statements]
  end if;
The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
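A short sketch of an if statement with elsif and else branches is given here, in the form of a three-way comparator. The entity name compare and its ports are illustrative assumptions.

```vhdl
-- Hypothetical three-way comparator using if / elsif / else.
entity compare is
  port (a, b : in integer; gt, eq, lt : out bit);
end compare;

architecture behav of compare is
begin
  process (a, b)
  begin
    -- Default values; a later assignment in the same process overrides them.
    gt <= '0'; eq <= '0'; lt <= '0';
    if a > b then
      gt <= '1';
    elsif a = b then
      eq <= '1';
    else
      lt <= '1';
    end if;
  end process;
end behav;
```

Conditions are tested in order, so only the branch of the first true condition is executed.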
CASE STATEMENT
The format of a case statement is
  case expression is
    when choices => sequential-statements
    when choices => sequential-statements
  end case;
The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or of a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by using the "others" clause. The "others" clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
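The kinds of choices mentioned above can be sketched in one small example. The entity name classify, the port names sel and grp, and the chosen value ranges are all hypothetical.

```vhdl
-- Hypothetical classifier illustrating the forms a choice can take.
entity classify is
  port (sel : in  integer range 0 to 15;
        grp : out integer range 0 to 3);
end classify;

architecture behav of classify is
begin
  process (sel)
  begin
    case sel is
      when 0      => grp <= 0;  -- single value
      when 1 to 7 => grp <= 1;  -- range of values
      when 8 | 9  => grp <= 2;  -- list of values
      when others => grp <= 3;  -- catch-all, must be the last branch
    end case;
  end process;
end behav;
```

Every possible value of the expression must be covered exactly once, which is why the others branch is commonly used to close the statement.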
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax of a loop statement is
[loop-label:] iteration-scheme loop
    sequential-statements
end loop [loop-label];
2.8 Active-HDL Overview
Active-HDL is an integrated environment designed for development of VHDL, Verilog, EDIF and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard.
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0.
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with OVI Standard Delay Format Specification Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1 HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. The keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts and Tcl scripts.
2 Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3 State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4 Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5 Design Browser
The Design Browser window displays the contents of the current design, that is:
Resource files attached to the design
The contents of the default working library of the design
The structure of the design unit selected for simulation
VHDL, Verilog or EDIF objects declared within a selected region of the current design
6 Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is a process of analysis of a source file. Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator. In Active-HDL, a source file can be one of the following:
VHDL file (.vhd)
Verilog file (.v)
EDIF netlist file
State diagram file (.asf)
Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog or EDIF file containing HDL code (or netlist) generated from the diagram.
A netlist is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, respectively for VHDL, Verilog and EDIF. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
Event-Driven Simulation
Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others are simulated in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view and edit schematics and symbols for the Design Entry step of the Xilinx® design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input netlists as well as design constraint information into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a netlist(s) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e. with small code size and memory) targeted for processors with a limited instruction set (i.e. AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, allows efficient combination of encryption/decryption and "on-the-fly" key derivation, and its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g. a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEA_{n,b} is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). In contrast, SEA_{n,b} allows one to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, additional features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEA_{n,b} makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA_{n,b} constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA_{n,b} operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds, and is defined with respect to the following parameters:
– $n$: plaintext size, key size
– $b$: processor (or word) size
– $n_b = n/(2b)$: number of words per Feistel branch
– $n_r$: number of block cipher rounds
As the only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted as SEA_{48,8}, SEA_{96,8}, SEA_{144,8}, ... Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
– Bit representation: $x_b = x(n/2-1)\; x(n/2-2)\; \ldots\; x(2)\; x(1)\; x(0)$
– Word representation: $x_W = x_{n_b-1}\; x_{n_b-2}\; \ldots\; x_2\; x_1\; x_0$
Basic Operations
Due to its simplicity constraints, SEA_{n,b} is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1 Bitwise XOR
The bitwise XOR is defined on $n/2$-bit vectors:
$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$
2 Substitution Box S
SEA_{n,b} uses the following 3-bit substitution table:
ST = {0, 5, 6, 7, 4, 3, 1, 2}
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i}$
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1}$
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}, \quad 0 \le i \le n_b/3 - 1$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
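The link between the substitution table and the recursive (in-place) definition can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and it assumes that bit j of a table index is carried by word $x_{3i+j}$ (so $x_{3i}$ holds the least significant bit) and that each line of the definition uses the already-updated words, which is how "recursive" is read here.

```python
ST = [0, 5, 6, 7, 4, 3, 1, 2]  # substitution table quoted in the text


def sbox_words(x0, x1, x2):
    """Apply S bitwise to three words, updating them in place one after another."""
    x0 ^= x2 & x1      # x_{3i}   = (x_{3i+2} AND x_{3i+1}) XOR x_{3i}
    x1 ^= x2 & x0      # x_{3i+1} = (x_{3i+2} AND x_{3i})   XOR x_{3i+1}  (uses new x0)
    x2 ^= x0 | x1      # x_{3i+2} = (x_{3i} OR x_{3i+1})    XOR x_{3i+2}  (uses new x0, x1)
    return x0, x1, x2


def table_from_words():
    """Rebuild the 3-bit table by feeding single-bit 'words' to sbox_words."""
    table = []
    for v in range(8):
        bits = [(v >> j) & 1 for j in range(3)]   # assumed: bit j -> word x_{3i+j}
        y0, y1, y2 = sbox_words(*bits)
        table.append(y0 | (y1 << 1) | (y2 << 2))
    return table
```

Under these assumptions the rebuilt table coincides with ST, and the two fixed points mentioned later in the text are the inputs 0 and 4.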
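3 Word Rotation R
The word rotation is defined on $n_b$-word vectors:
$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \Leftrightarrow y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \qquad y_0 = x_{n_b-1}$
4 Bit Rotation r
The bit rotation is defined on $n_b$-word vectors:
$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le n_b/3 - 1$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.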
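5 Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$
The Round and Key Round
Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Figure 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^{2} \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^{2}$ such that:
$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$
$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$
$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$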
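As a consistency check on these definitions, the short derivation below shows that the decrypt round undoes the encrypt round once the two branches are exchanged. The shorthand $f_K(x) = r(S(x \boxplus K))$ is introduced here for readability only and is not part of the original notation.

```latex
\begin{align*}
\text{Encrypt round:}\quad & L_{i+1} = R_i, \qquad R_{i+1} = R(L_i) \oplus f_{K_i}(R_i) \\
\text{Apply } F_D(R_{i+1}, L_{i+1}, K_i):\quad
& \text{left output} = L_{i+1} = R_i \\
& \text{right output} = R^{-1}\!\big(R_{i+1} \oplus f_{K_i}(L_{i+1})\big) \\
& \phantom{\text{right output}} = R^{-1}\!\big(R(L_i) \oplus f_{K_i}(R_i) \oplus f_{K_i}(R_i)\big) = L_i
\end{align*}
```

So $F_D(R_{i+1}, L_{i+1}, K_i) = [R_i, L_i]$: feeding the swapped output of one encrypt round to the decrypt round returns the swapped input, which is why a final exchange of the branches suffices for the whole cipher.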
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.
C = SEA_{n,b}(P, K)
  /* initialization */
  L_0 & R_0 = P
  KL_0 & KR_0 = K
  /* key scheduling */
  for i in 1 to ⌊n_r/2⌋
    [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i))
  switch KL_{⌊n_r/2⌋}, KR_{⌊n_r/2⌋}
  for i in ⌈n_r/2⌉ to n_r - 1
    [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(n_r - i))
  /* encryption */
  for i in 1 to ⌈n_r/2⌉
    [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1})
  for i in ⌈n_r/2⌉ + 1 to n_r
    [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1})
  /* final */
  C = R_{n_r} & L_{n_r}
  switch KL_{n_r-1}, KR_{n_r-1}
where & is the concatenation operator, KR_{⌊n_r/2⌋} is taken before the switch, and C(i) is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round $F_D$.
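The pseudo-code can be turned into a small executable model. The sketch below is illustrative only and has not been checked against official test vectors. It assumes SEA_{48,8} with 51 rounds, stores each branch as a list of words with index 0 as the least significant word, reads the floor and ceiling brackets as in the listing, and assumes the second key-schedule loop uses the constant C(n_r - i). All function names are hypothetical.

```python
B, NB, NR = 8, 3, 51            # assumed parameters: b = 8, n_b = 3 (n = 48), n_r = 51
MASK = (1 << B) - 1


def rotl(w):
    return ((w << 1) | (w >> (B - 1))) & MASK


def rotr(w):
    return ((w >> 1) | (w << (B - 1))) & MASK


def sbox(x):
    x = list(x)
    for i in range(0, NB, 3):   # recursive definition, applied in place
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x


def bit_rot(x):                 # r: first word right, third word left, per block of 3
    y = list(x)
    for i in range(0, NB, 3):
        y[i], y[i + 2] = rotr(x[i]), rotl(x[i + 2])
    return y


def word_rot(x):                # R: y[i+1] = x[i], y[0] = x[n_b - 1]
    return [x[-1]] + x[:-1]


def inv_word_rot(x):
    return x[1:] + [x[0]]


def add(x, y):                  # word-wise addition mod 2^b
    return [(u + v) & MASK for u, v in zip(x, y)]


def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def const(i):                   # C(i): all words zero except the least significant one
    return [i & MASK] + [0] * (NB - 1)


def f_e(l, r, k):
    return r, xor(word_rot(l), bit_rot(sbox(add(r, k))))


def f_d(l, r, k):
    return r, inv_word_rot(xor(l, bit_rot(sbox(add(r, k)))))


def f_k(kl, kr, c):
    return kr, xor(kl, word_rot(bit_rot(sbox(add(kr, c)))))


def round_keys(kl, kr):
    kl, kr = list(kl), list(kr)
    half = NR // 2
    keys = [kr]                               # KR_0
    for i in range(1, half + 1):
        kl, kr = f_k(kl, kr, const(i))
        keys.append(kr)                       # KR_i, taken before the switch
    kl, kr = kr, kl                           # switch
    for i in range(half + 1, NR):
        kl, kr = f_k(kl, kr, const(NR - i))   # assumed constant C(n_r - i)
        keys.append(kl)                       # KL_i
    return keys


def crypt(l, r, keys, round_fn):
    l, r = list(l), list(r)
    for k in keys:
        l, r = round_fn(l, r, k)
    return r, l                               # output is R_{n_r} & L_{n_r}


def encrypt(l, r, kl, kr):
    return crypt(l, r, round_keys(kl, kr), f_e)


def decrypt(l, r, kl, kr):
    return crypt(l, r, round_keys(kl, kr), f_d)
```

With these conventions, decrypting the output of `encrypt` under the same key returns the original branches, and the list returned by `round_keys` reads the same forwards and backwards, so one key expansion serves both directions.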
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– λ-parameter: 1/2
– δ-parameter: 1/4
– Maximum nonlinear order, namely 2
– Recursive definition
– Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher, when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48, b = 8, $n_b$ = 3. Looking at the figure, it can be seen that SEA_{n,b} divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b$ = 3, which yields 3 rounds). Once every word has at least one active bit, the combination of r and S yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA_{n,b}, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
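The symmetric key sequence follows directly from the key round and the switch in the middle of the key schedule. The derivation below is a sketch: it writes $h = \lfloor n_r/2 \rfloor$ and $g_i(x) = R(r(S(x \boxplus C(i))))$ as shorthands introduced here, and it assumes that the key round applied right after the switch reuses the constant $C(h)$.

```latex
\begin{align*}
\text{Key round } i \le h:\quad & KL_i = KR_{i-1}, \qquad KR_i = KL_{i-1} \oplus g_i(KR_{i-1}) \\
\text{After the switch:}\quad & (KL, KR) = (KR_h, KL_h) \\
\text{Next key round with } C(h):\quad
& KL_{h+1} = KL_h = KR_{h-1} \\
& KR_{h+1} = KR_h \oplus g_h(KL_h)
  = KL_{h-1} \oplus g_h(KR_{h-1}) \oplus g_h(KR_{h-1}) = KL_{h-1}
\end{align*}
```

The state after round $h+1$ is therefore the swapped state of round $h-1$. Repeating the argument with $C(h-1), \ldots, C(1)$ gives $KL_{h+j} = KR_{h-j}$ for every $j$, which is the mirrored sequence of round keys, and the last state is the swapped master key.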
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ, then the differential and linear parameters Δ, Λ of the 3-round cipher are respectively smaller than $\delta^2$, $\lambda^2$.
Since our nonlinear function S has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEA_{n,b} have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, it is respectively required that Δ be bounded by about $2^{-n}$ and Λ by about $2^{-n/2}$ to resist against differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that:
$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n}$ and $\lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2}$ (1)
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
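The step from the two conditions of Equation (1) to the bound on $n_r$ is a matter of comparing exponents, as the following short worked computation shows.

```latex
\begin{align*}
\left(2^{-2}\right)^{2n_r/3} < 2^{-n}
  &\iff \frac{4 n_r}{3} > n \iff n_r > \frac{3n}{4} \\
\left(2^{-1}\right)^{2n_r/3} < 2^{-n/2}
  &\iff \frac{2 n_r}{3} > \frac{n}{2} \iff n_r > \frac{3n}{4}
\end{align*}
```

For instance, with n = 96 both conditions ask for roughly 72 rounds, before any rounds are added for diffusion.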
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and the boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA_{n,b} a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \ll a$ the left rotation by a bits of each of the $n_b$ words of x. The non-linear and diffusion layers have the following properties:
– $S(x \ll a) = S(x) \ll a$
– $r(x \ll a) = r(x) \ll a$
– $R(x \ll a) = R(x) \ll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \ll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F_E^{\oplus}$ and the key round $F_K$ satisfy:
$F_E^{\oplus}(L \ll a, R \ll a, K \ll a) = F_E^{\oplus}(L, R, K) \ll a$
$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \ll a$.
In the actual SEA_{n,b}, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for 1 < a < b − 1. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA_{n,b} from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from $K$ and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA_{n,b}.
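The probability p that word rotation commutes with addition mod $2^b$ can be measured exhaustively for a small word size. The following Python sketch does so for single b-bit words; the function names are illustrative and the enumeration over all (X, K) pairs is only practical for small b such as 8.

```python
from fractions import Fraction


def rotl(w, a, b):
    """Cyclic left rotation of the b-bit word w by a bits."""
    a %= b
    mask = (1 << b) - 1
    return ((w << a) | (w >> (b - a))) & mask


def commute_probability(a, b):
    """Fraction of pairs (X, K) with (X<<<a) + (K<<<a) == (X + K)<<<a mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return Fraction(hits, 1 << (2 * b))
```

For b = 8 and a = 1 the measured value is already close to 3/8, and a = 2 gives a smaller value, in line with the statement above.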
Square Attacks
We explored square attacks [16] on SEA_{48,8}. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern described earlier for the rotations r and R. It is expected that it remains the same when different parameters n and b are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA_{n,b}, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis given earlier provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA_{n,b}, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA_{n,b} is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, whose vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function f which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA_{n,b}, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA_{n,b} key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows one to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA_{n,b} against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \Leftrightarrow \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEA_{n,b} algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA_{n,b} has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks, plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size b ≥ 8 bits.
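The round-count rule can be written as a small helper. This is a sketch under two assumptions: the bracket around b/2, which is lost in the text, is read as a floor, and the parameter check simply enforces the constraint that n is a multiple of 6b. The function name is hypothetical.

```python
def suggested_rounds(n, b):
    """Minimum odd round count 3n/4 + 2*(n_b + floor(b/2)) for SEA_{n,b}."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    nb = n // (2 * b)                        # words per Feistel branch
    nr = 3 * n // 4 + 2 * (nb + b // 2)      # assumed floor on b/2
    return nr if nr % 2 == 1 else nr + 1     # n_r has to be odd
```

For example, `suggested_rounds(48, 8)` evaluates 36 + 2·(3 + 4) = 50 and returns 51, and `suggested_rounds(96, 8)` evaluates 72 + 2·(6 + 4) = 92 and returns 93.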
3.5 Performance Analysis
SEA_{n,b} is targeted for being implemented on low-cost processors, with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\gg$, $\ll$
– Branch instructions: goto, subroutine call and return
– Comparison, load RAM in register, store register in RAM
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
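These estimates are easy to evaluate for a given parameter set. The helper below is a minimal sketch that only transcribes the formulas quoted above; the function name and the dictionary keys are illustrative, and the figures it returns are formula evaluations, not measurements.

```python
def sea_cost(nb, nr):
    """Evaluate the rough cost formulas for n_b words per branch and n_r rounds."""
    round_ops = 12 * nb + 11          # one encrypt/decrypt round
    key_round_ops = 10 * nb + 11      # one key round
    return {
        "memory_words": 5 * nb + 3,   # RAM words plus registers
        "code_size_ops": 31 * nb + 36,
        "time_ops": (nr - 1) * (round_ops + key_round_ops + 7)
                    + round_ops + 8 * nb + 7,
    }
```

For instance, with $n_b$ = 6 and $n_r$ = 93 (parameters chosen here as an example for an 8-bit word and n = 96), the formulas give 33 memory words, a code size of 222 ops and 14950 ops per encryption.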
For illustration purposes, we implemented SEA_{n,b} on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA_{n,b} and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– SEA_{n,b} designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of SEA_{n,b} implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of SEA_{n,b} is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA_{n,b} also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is in the number of cycles required for the encryption (i.e. SEA_{n,b} trades space for time, which may be relevant due to present processor speeds). Looking at the code size × cycles product, the efficiency of SEA_{n,b} remains similar to the one of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH 4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be used to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro, and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions, described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$ where $p$ has 200 bits.
It is known that $\#E(\mathbb{F}_p) = p + 1 - t$, where $t$ is an integer which satisfies the Hasse bound $-2\sqrt{p} \le t \le 2\sqrt{p}$.
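As a small sanity check of this relation, the following Python sketch counts points by exhaustive search. This is feasible only for tiny primes and is not the SEA algorithm itself; the function names `count_points`, `trace_of_frobenius`, and `satisfies_hasse` are illustrative choices, not taken from the text.

```python
def count_points(p, a4, a6):
    """Count points on y^2 = x^3 + a4*x + a6 over F_p, including the point at infinity."""
    # sqrt_count[v] = number of y in F_p with y^2 = v
    sqrt_count = [0] * p
    for y in range(p):
        sqrt_count[(y * y) % p] += 1
    total = 1  # the point at infinity
    for x in range(p):
        rhs = (x * x * x + a4 * x + a6) % p
        total += sqrt_count[rhs]
    return total


def trace_of_frobenius(p, a4, a6):
    """Return t such that #E(F_p) = p + 1 - t."""
    return p + 1 - count_points(p, a4, a6)


def satisfies_hasse(p, t):
    """Check |t| <= 2*sqrt(p) using integer arithmetic only."""
    return t * t <= 4 * p


# Tiny example: y^2 = x^3 + x + 1 over F_5
print(count_points(5, 1, 1))        # 9
print(trace_of_frobenius(5, 1, 1))  # -3
```

The exhaustive count costs on the order of p operations, which is exactly why a polynomial-time method is needed for 200-bit primes.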
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$ and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial $\Phi_\ell$ that is necessary for further calculations with this $\ell$. These polynomials $\Phi_\ell$ do not depend on the curve $E$ under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case we have succeeded in determining $t$.
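The final recombination step can be sketched as follows. This is a minimal illustration of the Chinese Remainder Theorem step only, assuming the residues of $t$ modulo each auxiliary prime are already known; the names `crt` and `recover_trace` are hypothetical, and the hard part of SEA (computing each residue) is not shown.

```python
from math import prod


def crt(residues, moduli):
    """Return (x, M) with x = r_i (mod m_i) for pairwise coprime moduli, M = prod(m_i)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m (Python 3.8+)
        x += r * Mi * pow(Mi, -1, m)
    return x % M, M


def recover_trace(residues, primes, p):
    """Recover t from t mod l for each auxiliary prime l, using |t| <= 2*sqrt(p)."""
    t, M = crt(residues, primes)
    # The product must exceed 4*sqrt(p); compare squares to stay in integers.
    if M * M <= 16 * p:
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")
    # Pick the representative of t in the interval (-M/2, M/2].
    if t > M // 2:
        t -= M
    return t


# Example: p = 101 and residues of t modulo 3, 5, 7
print(recover_trace([2, 2, 1], [3, 5, 7], 101))  # -13
```

Because the Hasse interval has length $4\sqrt{p}$, a modulus larger than that leaves exactly one candidate for $t$.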
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$, with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$ we need to determine a polynomial $\Phi_\ell(F, J) \in \mathbb{Z}[F, J]$. For $\ell \in A_s$ this is stored in the program. For $\ell \in A_l$, $\Phi_\ell$ must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime $p$ and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
[FIG 5.1: SEA architecture block diagram. Blocks shown: Key Computational Block (inputs KeyI[95:0], KeyLd, Clk, Rst; key registers KeyReg0[95:0] to KeyReg9[95:0]; Mod, SBox, BR, WR, XOR, Round Reg); Encryption Computational Block (inputs DataI[95:0], DataLd, Key0[95:0] to Key9[95:0], Clk, Rst; Mod, SBox, BR, WR, XOR, Round Reg; cipher data register); Decryption Computational Block (inputs Key0[95:0] to Key9[95:0], Clk, Rst; outputs DataO[95:0], DataRd; Mod, SBox, BR, IWR, XOR, Round Reg; plain text data register); SMC (Clk, Rst, Ena, ED, Ovr).]
FIG 5.1
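To make the SBox block in the figure more concrete, here is a minimal Python sketch of a 3-bit substitution box in table form and in bitslice form. The table values and the three Boolean equations are an assumption recalled from the published SEA specification [1], which this report does not reproduce; they should be checked against that reference before being relied on. The names `SBOX`, `sbox_bitslice`, and `sbox_value` are illustrative.

```python
# Assumed 3-bit substitution table (to be verified against the SEA specification).
SBOX = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox_bitslice(x0, x1, x2):
    """Apply the S-box to three words in bitslice form.

    Bit i of (x2, x1, x0) forms one 3-bit S-box input, so a single call
    processes as many S-boxes in parallel as the words have bits.
    The three updates are applied in sequence.
    """
    x0 ^= x2 & x1
    x1 ^= x2 & x0
    x2 ^= x0 | x1
    return x0, x1, x2


def sbox_value(v):
    """Apply the bitslice S-box to a single 3-bit value v (0..7)."""
    y0, y1, y2 = sbox_bitslice(v & 1, (v >> 1) & 1, (v >> 2) & 1)
    return y0 | (y1 << 1) | (y2 << 2)


# The bitslice form and the table form agree on all eight inputs.
print([sbox_value(v) for v in range(8)] == SBOX)  # True
```

In hardware, each output bit of such a box is a function of three input bits, which fits the small LUT counts reported for the S-box later in this report.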
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted or decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
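The two approaches described above can be sketched in Python as follows. This is an illustration only, under stated assumptions: the 96-bit key size is taken from the 96-bit key registers of this design, PBKDF2 with SHA-256 is swapped in for the plain hash mentioned in the text, and the names `random_key` and `key_from_passphrase` as well as the iteration count are hypothetical choices.

```python
import hashlib
import secrets

KEY_BITS = 96  # assumption: matches the 96-bit key registers in this design


def random_key(bits=KEY_BITS):
    """Draw a key as an integer from the operating system's entropy source."""
    return secrets.randbits(bits)


def key_from_passphrase(passphrase, salt, bits=KEY_BITS, iterations=100_000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256 (illustrative choice)."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=bits // 8
    )
    return int.from_bytes(raw, "big")


salt = secrets.token_bytes(16)  # stored alongside the ciphertext, not secret
print(hex(random_key()))
print(hex(key_from_passphrase("correct horse", salt)))
```

The same passphrase and salt always give the same key, while `random_key` gives a fresh value on every call.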
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. It is therefore important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
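The exponential growth can be made concrete with a short calculation. The sketch below assumes an attacker who tests a fixed number of keys per second and finds the key, on average, after searching half the key space; the rate of 10^9 keys per second and the function names are illustrative assumptions.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_trials(key_bits):
    """Average number of keys tried before success: half of the 2**key_bits key space."""
    return 2 ** (key_bits - 1)


def years_to_search(key_bits, keys_per_second):
    """Expected brute force time in years at a constant trial rate."""
    return expected_trials(key_bits) / keys_per_second / SECONDS_PER_YEAR


rate = 1e9  # hypothetical attacker speed, keys per second
for bits in (56, 96, 128):
    print(bits, "bits:", years_to_search(bits, rate), "years")
```

Each additional key bit doubles the expected search time, so moving from 56 to 96 bits multiplies the effort by 2^40.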
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek words kryptos (secret) and graphos (writing).
5.2 ENCRYPTION
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also meant the breaking of encrypted messages until the early 1920s, when the term cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
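A minimal Python sketch of the one-time pad follows, to show how the conditions listed above appear in practice. The function names are illustrative, and the operating system's entropy source (via `secrets`) stands in for "truly random" key material, which is an assumption rather than a guarantee.

```python
import secrets


def otp_keygen(length):
    """Generate a fresh pad of the given length in bytes; it must never be reused."""
    return secrets.token_bytes(length)


def otp_apply(data, pad):
    """XOR data with the pad. The same operation encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, pad))


message = b"attack at dawn"
pad = otp_keygen(len(message))
ciphertext = otp_apply(message, pad)
print(otp_apply(ciphertext, pad) == message)  # True
```

The length check enforces the "equal or greater length" condition; the requirement never to reuse a pad has to be enforced by the surrounding system.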
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such showing can currently be made, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
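The PIN example can be illustrated with a short Python sketch contrasting a comparison that stops at the first wrong character with one whose running time does not depend on where the mismatch occurs. The function names are illustrative; `hmac.compare_digest` is the standard library's constant-time comparison.

```python
import hmac


def leaky_equal(secret, guess):
    """Early-exit comparison: run time depends on how many leading bytes match."""
    if len(secret) != len(guess):
        return False
    for s, g in zip(secret, guess):
        if s != g:
            return False  # returns sooner when the mismatch is earlier
    return True


def constant_time_equal(secret, guess):
    """Comparison whose timing does not reveal the position of the first mismatch."""
    return hmac.compare_digest(secret, guess)


pin = b"4921"
print(leaky_equal(pin, b"4920"), constant_time_equal(pin, b"4920"))  # False False
print(leaky_equal(pin, b"4921"), constant_time_equal(pin, b"4921"))  # True True
```

Both functions return the same answers; only the timing behaviour differs, which is what a timing attack exploits.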
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
  Found 96-bit register for signal <Dreg>.
  Summary:
    inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 195
Macro Statistics
  Registers : 1
    96-bit register : 1
Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices            :  55 out of 192   28%
  Number of Slice Flip Flops  :  96 out of 384   25%
  Number of 4 input LUTs      :   1 out of 384    0%
  Number of bonded IOBs       : 194 out of  90  215% (*)
  Number of GCLKs             :   1 out of   4   25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information
-----------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary
---------------
Speed Grade : -6
  Minimum period : No path found
  Minimum input arrival time before clock : 7.962ns
  Maximum output required time after clock : 6.788ns
  Maximum combinational path delay : No path found
Timing Detail
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock Clk
Offset : 7.962ns (Levels of Logic = 1)
  Source : KeyEna (PAD)
  Destination : Dreg_95 (FF)
  Destination Clock : Clk rising
  Data Path : KeyEna to Dreg_95
                              Gate     Net
    Cell:in->out     fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O            96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                   0.886          Dreg_0
    ----------------------------------------
    Total                     7.962ns (1.662ns logic, 6.300ns route)
                                      (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock Clk
Offset : 6.788ns (Levels of Logic = 1)
  Source : Dreg_95 (FF)
  Destination : KeyO<95> (PAD)
  Source Clock : Clk rising
  Data Path : Dreg_95 to KeyO<95>
                              Gate     Net
    Cell:in->out     fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FDCE:C->Q             1   1.085   1.035  Dreg_95 (Dreg_95)
    OBUF:I->O                 4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                     6.788ns (5.753ns logic, 1.035ns route)
                                      (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 7
Cell Usage
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices        : 2 out of 192   1%
  Number of 4 input LUTs  : 3 out of 384   0%
  Number of bonded IOBs   : 7 out of  90   7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 (1%)
Logic Distribution:
  Number of occupied Slices: 2 out of 192 (1%)
  Number of Slices containing only related logic: 2 out of 2 (100%)
  Number of Slices containing unrelated logic: 0 out of 2 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 (1%)
  Number of bonded IOBs: 7 out of 86 (8%)
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 (8%)
  Number of LOCed External IOBs: 0 out of 7 (0%)
  Number of SLICEs: 2 out of 192 (1%)
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 (109%) (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 (264%) (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 (346%) (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 (97%)
  Number of Slices containing unrelated logic: 17 out of 665 (2%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 (277%) (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 (1232%) (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It yields a small encryption routine for any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
Low-cost encryption on processors with a limited instruction set
Wireless communication, mobile computing, and networking systems
Encryption of JPEG2000 images
Scalable video coding
Sensor networks and RFIDs
CONCLUSION
SEA$_{n,b}$ is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA$_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, the cipher can be expected to be implemented in assembly within a few hours. We note finally that the design criteria of SEA$_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations for RFIDs as well as further cryptanalysis efforts and security evaluations.
[Figure: evolution of fabrication technology: Micron, SM, DSM, VDSM]
1.3 MICRON TECHNOLOGY
Fabrication technology can be classified into four categories, evolving from micron technology down to VDSM:
Micron technology: technology with feature sizes down to 10^-6 m (1 µm).
Submicron technology: technology below 1 µm, generally ranging down to 0.36 µm.
DSM (deep submicron) technology: technology extending down to 0.18 µm.
VDSM (very deep submicron) technology: the technology presently in use, ranging down to 0.09 µm.
1.4 FEATURES
2.1 INTRODUCTION TO VHDL
VHDL is an acronym for VHSIC Hardware Description Language; VHSIC is an acronym for Very High Speed Integrated Circuits. It is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level.
The VHDL language can be regarded as an integrated amalgamation of the following languages:
Sequential language
Concurrent language
Net-list language
Timing specifications
Waveform generation language
The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator. A core subset of the language is usually sufficient to model most applications. The complete language, however, has sufficient power to capture the descriptions of anything from the most complex chips to a complete electronic system.
HISTORY
The requirements for the language were first generated in 1981 under the VHSIC program of the Department of Defense (DoD). Reprocurement and reuse were also big issues. Thus, a need arose for a standardized hardware description language for the design, documentation, and verification of digital systems. The IEEE standardized the language in December 1987; this version is known as IEEE Std 1076-1987. The official language description appears in the IEEE Standard VHDL Language Reference Manual, available from the IEEE. The language has also been recognized as an American National Standards Institute (ANSI) standard.
According to IEEE rules, an IEEE standard has to be reballoted every 5 years so that it may remain a standard. Consequently, the language was upgraded with new features, the syntax of many constructs was made more uniform, and many ambiguities present in the 1987 version of the language were resolved. This new version of the language is known as IEEE Std 1076-1993.
2.2 CAPABILITIES
The following are the major capabilities that the language provides, along with the features that differentiate it from other hardware description languages:
The language can be used as an exchange medium between chip vendors and CAD tool users. Different chip vendors can provide VHDL descriptions of their components to system designers.
The language can be used as a communication medium between different CAD and CAE tools.
The language supports hierarchy; that is, a digital system can be modeled as a set of interconnected components, and each component in turn can be modeled as a set of interconnected subcomponents.
The language supports flexible design methodologies: top-down, bottom-up, or mixed. It supports both synchronous and asynchronous timing models.
Various digital modeling techniques, such as finite-state machine descriptions and Boolean equations, can be modeled using the language.
The language is publicly available, human-readable, and machine-readable.
The language supports three basic description styles: structural, dataflow, and behavioral.
It supports a wide range of abstraction levels, ranging from abstract behavioral descriptions to very precise gate-level descriptions.
Arbitrarily large designs can be modeled using the language, and the language imposes no limitations on the size of a design.
2.3 HARDWARE ABSTRACTION
VHDL is used to describe a model for a digital hardware device. This model specifies the external view of the device and one or more internal views. The internal view of the device specifies its functionality or structure, while the external view specifies the interface of the device through which it communicates with the other modules in its environment. In VHDL, each device model is treated as a distinct representation of a unique device, called an entity. The entity is thus a hardware abstraction of the actual hardware device. Each entity is described using one model, which contains one external view and one or more internal views.
2.4 Basic terminology
VHDL is a hardware description language that can be used to model a digital system. A hardware abstraction of this digital system is called an entity. An entity X, when used in another entity Y, becomes a component of entity Y. To describe an entity, VHDL provides five different types of primary constructs, called design units. They are:
1. Entity declaration
2. Architecture body
3. Configuration declaration
4. Package declaration
5. Package body
1. An entity is modeled using an entity declaration and at least one architecture body. The entity declaration describes the external view of the entity, for example, the input and output signal names.
2. The architecture body contains the internal description of the entity, for example, as a set of interconnected components that represents the structure of the entity, or as a set of concurrent or sequential statements that represents the behavior of the entity.
3. A configuration declaration is used to create a configuration for an entity. It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity. It may also specify the bindings of the components used in the selected architecture body to other entities. An entity may have any number of configurations.
4. A package declaration encapsulates a set of related declarations, such as type declarations, subtype declarations, and subprogram declarations, which can be shared across two or more design units.
5. A package body contains the definitions of subprograms declared in a package declaration.
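As a minimal sketch of the last two design units, the following package declaration and package body share one subtype and one function with a small entity. All names (LOGIC_UTILS, NIBBLE, MAJORITY, VOTER) are hypothetical and chosen only for illustration.

```vhdl
-- Package declaration: the shareable interface (hypothetical names).
package LOGIC_UTILS is
  subtype NIBBLE is BIT_VECTOR(3 downto 0);       -- subtype declaration
  function MAJORITY (A, B, C : BIT) return BIT;   -- subprogram declaration
end LOGIC_UTILS;

-- Package body: the definition of the subprogram declared above.
package body LOGIC_UTILS is
  function MAJORITY (A, B, C : BIT) return BIT is
  begin
    return (A and B) or (B and C) or (A and C);
  end MAJORITY;
end LOGIC_UTILS;

-- A design unit that uses the package through a use clause.
use WORK.LOGIC_UTILS.all;

entity VOTER is
  port (X, Y, Z : in BIT; V : out BIT);
end VOTER;

architecture DATAFLOW of VOTER is
begin
  V <= MAJORITY(X, Y, Z);
end DATAFLOW;
```

The sketch assumes all units are analyzed into the working library, which is why the use clause refers to WORK.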
Once an entity has been modeled, it needs to be validated by a VHDL system. A typical VHDL system consists of an analyzer and a simulator. The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks.
The language is case insensitive; that is, lowercase and uppercase characters are treated alike. The language is also free format. Comments are specified by preceding the text with two consecutive dashes (--).
Entity Declaration
The entity declaration specifies the name of the entity being modeled and lists the set of interface ports. Ports are signals through which the entity communicates with other models in its external environment.
EXAMPLE
The entity declaration for the half adder circuit is:
    entity HALF_ADDER is
      port (A, B : in BIT; SUM, CARRY : out BIT);
    end HALF_ADDER;
The entity, called HALF_ADDER, has two input ports, A and B, and two output ports, SUM and CARRY. BIT is a predefined type of the language.
Architecture Body
The internal details of an entity are specified by an architecture body using any of the following modeling styles:
1. As a set of interconnected components (to represent structure)
2. As a set of concurrent assignment statements (to represent data flow)
3. As a set of sequential assignment statements (to represent behavior)
4. As any combination of the above three
2.5 Structural style of modeling
In this style, an entity is described as a set of interconnected components. Such a model for the HALF_ADDER entity is described in an architecture body:
    architecture HA of HALF_ADDER is
      component XOR2
        port (X, Y : in BIT; Z : out BIT);
      end component;
      component AND2
        port (L, M : in BIT; N : out BIT);
      end component;
    begin
      X1: XOR2 port map (A, B, SUM);
      A1: AND2 port map (A, B, CARRY);
    end HA;
The name of the architecture body is HA. The entity declaration for HALF_ADDER specifies the interface ports for this architecture body. The architecture body is composed of two parts: the declarative part and the statement part. Two component declarations are present in the declarative part of the architecture body.
The declared components are instantiated in the statement part of the architecture body using component instantiation statements. The signals in the port map of a component instantiation and the port signals in the component declaration are associated by position.
DATAFLOW STYLE OF MODELING
In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The dataflow model for the half adder is described using two concurrent signal assignment statements. In a signal assignment statement, the symbol <= implies an assignment of a value to a signal.
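The two concurrent signal assignment statements mentioned above can be sketched as follows. The entity declaration is repeated so that the example is self-contained, and the architecture name HA_DATAFLOW is an assumption made for illustration.

```vhdl
entity HALF_ADDER is
  port (A, B : in BIT; SUM, CARRY : out BIT);
end HALF_ADDER;

architecture HA_DATAFLOW of HALF_ADDER is
begin
  -- Both assignments are concurrent: each one is re-evaluated
  -- whenever an event occurs on A or B, regardless of textual order.
  SUM   <= A xor B;
  CARRY <= A and B;
end HA_DATAFLOW;
```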
BEHAVIORAL STYLE OF MODELING
The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. These sequential statements, which are specified inside a process statement, do not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.
MIXED STYLE OF MODELING
It is possible to mix the three modeling styles in a single architecture body. That is, within an architecture body we could use component instantiation statements, concurrent signal assignment statements, and process statements.
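A one-bit full adder is a common way to illustrate the mixed style. The sketch below is illustrative only: the names XOR2, FULL_ADDER, and FA_MIXED and the port names are assumptions, and the XOR2 entity is included so that the default component binding can resolve in the working library.

```vhdl
-- Leaf component used by the structural part.
entity XOR2 is
  port (P1, P2 : in BIT; PZ : out BIT);
end XOR2;

architecture XOR2_DF of XOR2 is
begin
  PZ <= P1 xor P2;
end XOR2_DF;

entity FULL_ADDER is
  port (A, B, CIN : in BIT; SUM, COUT : out BIT);
end FULL_ADDER;

architecture FA_MIXED of FULL_ADDER is
  component XOR2
    port (P1, P2 : in BIT; PZ : out BIT);
  end component;
  signal S1 : BIT;
begin
  -- Structural: component instantiation statement.
  X1: XOR2 port map (A, B, S1);

  -- Behavioral: process statement computing the carry.
  CARRY_LOGIC: process (A, B, CIN)
    variable T1, T2, T3 : BIT;
  begin
    T1 := A and B;
    T2 := B and CIN;
    T3 := A and CIN;
    COUT <= T1 or T2 or T3;
  end process CARRY_LOGIC;

  -- Dataflow: concurrent signal assignment statement.
  SUM <= S1 xor CIN;
end FA_MIXED;
```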
MODEL ANALYSIS
Once an entity is described in VHDL, it can be validated using an analyzer and a simulator that are part of a VHDL system. The first step in the validation process is analysis. The analyzer takes a file that contains one or more design units and compiles them into an intermediate form. The generated intermediate form is stored in a specific design library that has been designated as the working library.
There is a design library with the logical name STD predefined by the VHDL language environment. This library contains two packages, STANDARD and TEXTIO. The STANDARD package contains declarations for all the predefined types of the language. The TEXTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations. There also exists an IEEE standard package called STD_LOGIC_1164, which defines a nine-value logic type together with its associated subtypes, overloaded operator functions, and other useful utilities. This standard is called IEEE Std 1164-1993.
SIMULATION
For a hierarchical entity to be simulated, all of its lowest-level components must be described at the behavioral level. A simulation can be performed on either one of the following:
1. An entity declaration and an architecture body pair
2. A configuration
Preceding the actual simulation are two major steps:
1. Elaboration phase: In this phase, the hierarchy of the entity is expanded and linked, components are bound to entities in a library, and the top-level entity is built as a network of behavioral models that is ready to be simulated.
2. Initialization phase: Driving and effective values for all explicitly declared signals are computed, implicit signals are assigned values, processes are executed once until they suspend, and simulation time is set to 0 ns.
Simulation commences by advancing time to that of the next event. Values that are scheduled to be assigned to signals at this time are assigned. If the value of a signal changes, and if that signal is present in the sensitivity list of a process, the process is executed until it suspends. Simulation stops when an assertion violation occurs, depending on the implementation of the VHDL system, or when the maximum time as defined by the language is reached.
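The simulation cycle described above can be observed with a small self-checking test bench. This is a sketch under stated assumptions: the entity and architecture names are hypothetical, the 10 ns step is arbitrary, and whether a failed assertion actually halts the run depends on the simulator's severity settings.

```vhdl
entity HALF_ADDER is
  port (A, B : in BIT; SUM, CARRY : out BIT);
end HALF_ADDER;

architecture HA_DATAFLOW of HALF_ADDER is
begin
  SUM   <= A xor B;
  CARRY <= A and B;
end HA_DATAFLOW;

-- A test bench has no ports: it is the top-level entity.
entity HALF_ADDER_TB is
end HALF_ADDER_TB;

architecture TEST of HALF_ADDER_TB is
  component HALF_ADDER
    port (A, B : in BIT; SUM, CARRY : out BIT);
  end component;
  signal A, B, SUM, CARRY : BIT;
begin
  DUT: HALF_ADDER port map (A, B, SUM, CARRY);

  STIMULUS: process
  begin
    -- Each "wait for" suspends the process; time then advances
    -- to the next event and the scheduled values are assigned.
    A <= '0'; B <= '0'; wait for 10 ns;
    assert SUM = '0' and CARRY = '0' report "0+0 failed" severity ERROR;
    A <= '1'; B <= '0'; wait for 10 ns;
    assert SUM = '1' and CARRY = '0' report "1+0 failed" severity ERROR;
    A <= '1'; B <= '1'; wait for 10 ns;
    assert SUM = '0' and CARRY = '1' report "1+1 failed" severity ERROR;
    wait;  -- suspend forever; no further events are scheduled
  end process STIMULUS;
end TEST;
```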
Entity Declaration
An entity declaration describes the external interface of the entity. It specifies the name of the entity, the names of the interface ports, their modes, and the types of the ports. The syntax for an entity declaration is:
    entity entity-name is
      [generic (list-of-generics-and-their-types);]
      [port (list-of-interface-port-names-and-their-types);]
      [entity-item-declarations]
    [begin
      entity-statements]
    end [entity] [entity-name];
The entity-name is the name of the entity, and the interface ports are the signals through which the entity passes information to and from its external environment. Each interface port can have one of the following modes:
1. in: The value of an input port can only be read within the entity model.
2. out: The value of an output port can only be updated within the entity model.
3. inout: The value of a bidirectional port can be read and updated within the entity model.
4. buffer: The value of a buffer port can be read and updated within the entity model. It cannot have more than one source.
Declarations that are placed in the entity declaration are common to all the design units that are associated with that entity declaration.
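To make the generic clause and the port modes concrete, here is a small sketch of a wrap-around counter. The names UP_COUNTER, WIDTH, COUNT, and WRAP are hypothetical; the point is only that a generic can size a port and that a buffer port may be read inside the model.

```vhdl
entity UP_COUNTER is
  generic (WIDTH : POSITIVE := 4);                    -- set per instance
  port (
    CLK   : in     BIT;                               -- read only
    RESET : in     BIT;                               -- read only
    COUNT : buffer INTEGER range 0 to 2**WIDTH - 1;   -- read and updated
    WRAP  : out    BIT                                -- updated only
  );
end UP_COUNTER;

architecture BEHAV of UP_COUNTER is
begin
  process (CLK, RESET)
  begin
    if RESET = '1' then
      COUNT <= 0;
    elsif CLK'event and CLK = '1' then
      if COUNT = 2**WIDTH - 1 then
        COUNT <= 0;
      else
        COUNT <= COUNT + 1;   -- legal because COUNT has mode buffer
      end if;
    end if;
  end process;

  -- WRAP flags the last count value before the counter rolls over.
  WRAP <= '1' when COUNT = 2**WIDTH - 1 else '0';
end BEHAV;
```

Had COUNT been declared with mode out, the read in `COUNT + 1` would be rejected by a VHDL-93 analyzer.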
ARCHITECTURE BODY
An architecture body describes the internal view of an entity. It describes the functionality or the structure of the entity.
    architecture <architecture-name> of <entity-name> is
    begin
      concurrent-statements:
        process-statement
        block-statement
        concurrent-signal-assignment-statement
        component-instantiation-statement
        generate-statement
    end [architecture] [architecture-name];
The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow, and sequential behavior.
Here we describe an entity by using the behavioral model. A process statement, which is a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.
2.6 PROCESS STATEMENT
A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is:
    [process-label:] process [(sensitivity-list)] [is]
    begin
      sequential-statements:
        variable-assignment-statement
        signal-assignment-statement
        wait-statement
        if-statement
        case-statement
        loop-statement
        null-statement
        exit-statement
        next-statement
        assertion-statement
        report-statement
        procedure-call-statement
        return-statement
    end process [process-label];
A set of signals to which the process is sensitive is defined by the sensitivity list. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in sequential order, that is, in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
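As an illustration of a sensitivity list, the sketch below models a rising-edge D flip-flop. The names DFF and REG are assumptions; the process wakes on every event on CLK and suspends again after its last statement.

```vhdl
entity DFF is
  port (CLK, D : in BIT; Q : out BIT);
end DFF;

architecture BEHAV of DFF is
begin
  REG: process (CLK)       -- executed on every event on CLK only
  begin
    -- For type BIT, an event on CLK that leaves it at '1'
    -- is a rising edge.
    if CLK = '1' then
      Q <= D;
    end if;
  end process REG;         -- process suspends here until the next event
end BEHAV;
```

Because D is not in the sensitivity list, changes on D alone never trigger the process, which is what gives the model its edge-triggered behavior.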
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form:
    variable-object := expression;
The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is:
    signal-object <= expression [after delay-value];
A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other sequential statements that appear within the process.
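The difference between the two kinds of assignment is easiest to see side by side. In this sketch (entity name ASSIGN_DEMO and all ranges are assumptions chosen to keep the arithmetic in bounds), the variable takes its new value immediately, while the signal keeps its old value until the process suspends.

```vhdl
entity ASSIGN_DEMO is
  port (
    A     : in  INTEGER range 0 to 15;
    V_OUT : out INTEGER range 0 to 63;
    S_OUT : out INTEGER range 0 to 63
  );
end ASSIGN_DEMO;

architecture BEHAV of ASSIGN_DEMO is
  signal S : INTEGER range 0 to 31 := 0;
begin
  process (A)
    variable V : INTEGER range 0 to 31 := 0;
  begin
    V := A + 1;       -- variable: new value available on the next line
    S <= A + 1;       -- signal: new value only scheduled, not yet visible
    V_OUT <= V * 2;   -- uses the updated V, i.e. (A + 1) * 2
    S_OUT <= S * 2;   -- uses the value S had before this execution
  end process;
end BEHAV;
```

Since S is not in the sensitivity list, S_OUT lags one change of A behind V_OUT in this sketch.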
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is:
    if boolean-expression then
      sequential-statements
    [elsif boolean-expression then
      sequential-statements]
    [else
      sequential-statements]
    end if;
The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
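A short sketch of the if/elsif/else form, using a hypothetical three-output comparator (the name COMPARE and the 0 to 255 range are assumptions):

```vhdl
entity COMPARE is
  port (A, B : in INTEGER range 0 to 255; GT, EQ, LT : out BIT);
end COMPARE;

architecture BEHAV of COMPARE is
begin
  process (A, B)
  begin
    -- Default values; the last assignment made in the process wins.
    GT <= '0'; EQ <= '0'; LT <= '0';
    if A > B then          -- conditions are checked in order
      GT <= '1';
    elsif A = B then       -- evaluated only if the first one is false
      EQ <= '1';
    else
      LT <= '1';
    end if;
  end process;
end BEHAV;
```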
CASE STATEMENT
The format of a case statement is:
    case expression is
      when choices => sequential-statements
      when choices => sequential-statements
    end case;
The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or of a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by using "others". The others clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
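The kinds of choices can be sketched in one case statement. The entity CLASSIFY and its encoding are hypothetical; the example also uses the vertical bar, which VHDL allows for listing several values in one choice.

```vhdl
entity CLASSIFY is
  port (
    DIGIT : in  INTEGER range 0 to 9;
    CODE  : out BIT_VECTOR(1 downto 0)
  );
end CLASSIFY;

architecture BEHAV of CLASSIFY is
begin
  process (DIGIT)
  begin
    case DIGIT is
      when 0      => CODE <= "00";   -- single value
      when 1 to 4 => CODE <= "01";   -- range of values
      when 5 | 7  => CODE <= "10";   -- list of values
      when others => CODE <= "11";   -- catch-all, must be last
    end case;
  end process;
end BEHAV;
```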
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax for a loop statement is:
    [loop-label:] iteration-scheme loop
      sequential-statements
    end loop [loop-label];
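As a sketch of a labeled loop with a "for" iteration scheme, the following hypothetical entity counts the '1' bits of an 8-bit input (the names ONES_COUNT and SCAN are assumptions):

```vhdl
entity ONES_COUNT is
  port (
    DATA : in  BIT_VECTOR(7 downto 0);
    ONES : out INTEGER range 0 to 8
  );
end ONES_COUNT;

architecture BEHAV of ONES_COUNT is
begin
  process (DATA)
    variable N : INTEGER range 0 to 8;
  begin
    N := 0;
    SCAN: for I in DATA'range loop   -- I takes the values 7 down to 0
      if DATA(I) = '1' then
        N := N + 1;
      end if;
    end loop SCAN;
    ONES <= N;
  end process;
end BEHAV;
```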
2.8 Active-HDL Overview
Active-HDL is an integrated environment designed for the development of VHDL, Verilog, EDIF, and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs, and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard.
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0.
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with OVI Standard Delay Format Specification Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs; the communication of hardware design and test verification data; and the maintenance, modification, and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1. HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. Keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts, and Tcl scripts.
2. Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3. State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4. Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5. Design Browser
The Design Browser window displays the contents of the current design, that is:
Resource files attached to the design
The contents of the default working library of the design
The structure of the design unit selected for simulation
VHDL, Verilog, or EDIF objects declared within a selected region of the current design
6. Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros, and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is a process of analysis of a source file. Analyzed design units contained within the file are placed into the working library in a format understandable for the simulator. In Active-HDL, a source file can be one of the following:
VHDL file (.vhd)
Verilog file (.v)
EDIF net list file
State diagram file (.asf)
Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog, or EDIF file containing HDL code (or net list) generated from the diagram.
A net list is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, respectively for VHDL, Verilog, and EDIF. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
Event-Driven Simulation
Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others are simulated in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators, such as adders, to system-level building blocks, such as filters, transforms, FIFOs, and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, global logic, and area group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions, and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input net lists, as well as design constraint information, into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing, or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps one or more net lists into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations, and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take the opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e., with small code size and memory) targeted at processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation, and modular addition). The proposed design is parametric in the text, key, and processor size; it allows an efficient combination of encryption/decryption and "on-the-fly" key derivation; and its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g., a small processor) and throughput requirements. This yields design criteria such as low memory requirements, small code size, and a limited instruction set. In addition, we propose flexibility as another, unusual design principle.
SEA$_{n,b}$ is parametric in the text, key, and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g., 8-bit or 32-bit processors). By contrast, SEA$_{n,b}$ yields a small encryption routine targeted at any given processor, with the security of the cipher adapted as a function of its key size. Beyond these general guidelines, additional features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g., authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats, and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA_{n,b} operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds, and is defined with respect to the following parameters:
– $n$: plaintext size, key size
– $b$: processor (or word) size
– $n_b = n/(2b)$: number of words per Feistel branch
– $n_r$: number of block cipher rounds
As only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-bit, ... block ciphers, respectively denoted as SEA_{48,8}, SEA_{96,8}, SEA_{144,8}, ... Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
– Bit representation: $x_b = x(n/2-1)\ x(n/2-2)\ \ldots\ x(2)\ x(1)\ x(0)$
– Word representation: $x_W = x_{n_b-1}\ x_{n_b-2}\ \ldots\ x_2\ x_1\ x_0$
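The parameter constraints above can be checked mechanically. The following is a minimal sketch, not part of the original specification: the helper names `sea_parameters` and `to_words` are hypothetical, and it assumes that word $x_0$ holds the least significant $b$ bits of the vector.

```python
def sea_parameters(n, b):
    """Return n_b = n / (2b), the number of words per Feistel branch."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    return n // (2 * b)

def to_words(x, n_b, b):
    """Split an (n/2)-bit integer into n_b words of b bits.
    Assumption: index 0 is the least significant word (x_0)."""
    mask = (1 << b) - 1
    return [(x >> (b * i)) & mask for i in range(n_b)]

print(sea_parameters(48, 8))      # 3 words per branch for SEA_{48,8}
print(to_words(0x010203, 3, 8))   # [3, 2, 1]
```

Keeping each branch as a list of words mirrors the word representation and makes the later word-wise operations direct list manipulations.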
Basic Operations
Due to its simplicity constraints, SEA_{n,b} is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
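1. Bitwise XOR $\oplus$
The bitwise XOR is defined on $n/2$-bit vectors:
$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \rightarrow \mathbb{Z}_2^{n/2} : x, y \rightarrow z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$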
2. Substitution Box $S$
SEA_{n,b} uses the following 3-bit substitution table:
S_T = {0, 5, 6, 7, 4, 3, 1, 2}
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data, using the following recursive definition:
$S : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow x = S(x) \Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i}$
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1}$
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}, \quad 0 \le i \le n_b/3 - 1$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
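To make the link between the table and the recursive definition concrete, here is a small sketch that implements both and lets them be compared. It assumes that word $x_{3i}$ carries the least significant bit of each 3-bit S-box input and that "recursive" means each line reuses the values already updated by the previous lines. The function names are illustrative only.

```python
ST = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table from the text

def sbox(x):
    """Bitslice S-box on a list of words whose length is a multiple of 3.
    Updates are sequential: each line sees the words already updated."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = (y[i + 2] & y[i + 1]) ^ y[i]
        y[i + 1] = (y[i + 2] & y[i]) ^ y[i + 1]
        y[i + 2] = (y[i] | y[i + 1]) ^ y[i + 2]
    return y

def sbox_via_table(x, b=8):
    """Reference version: look up every bit column of each word triple in ST."""
    y = [0] * len(x)
    for i in range(0, len(x), 3):
        for bit in range(b):
            v = (((x[i] >> bit) & 1)
                 | (((x[i + 1] >> bit) & 1) << 1)
                 | (((x[i + 2] >> bit) & 1) << 2))
            s = ST[v]
            y[i] |= (s & 1) << bit
            y[i + 1] |= ((s >> 1) & 1) << bit
            y[i + 2] |= ((s >> 2) & 1) << bit
    return y

words = [0x0F, 0x33, 0x55]
print(sbox(words) == sbox_via_table(words))  # True
```

With this bit ordering, the three in-place word operations reproduce the table for all eight inputs, which is why the S-box costs only a handful of logic instructions per word triple.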
3. Word Rotation $R$
The word rotation is defined on $n_b$-word vectors:
$R : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow y = R(x) \Leftrightarrow y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \qquad y_0 = x_{n_b-1}$
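4. Bit Rotation $r$
The bit rotation is defined on $n_b$-word vectors:
$r : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le n_b/3 - 1$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.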
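The two rotations can be sketched on lists of words as follows. This is an illustrative sketch assuming list index $i$ holds word $x_i$; the function names are not taken from the original text.

```python
def word_rot(x):
    """R: y[i+1] = x[i] for 0 <= i <= n_b - 2, and y[0] = x[n_b - 1]."""
    return [x[-1]] + x[:-1]

def inv_word_rot(x):
    """Inverse word rotation R^-1."""
    return x[1:] + [x[0]]

def bit_rot(x, b=8):
    """r: in every triple, rotate word 3i right by one bit, leave word 3i+1
    unchanged, and rotate word 3i+2 left by one bit (cyclic, within b bits)."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ((y[i] >> 1) | (y[i] << (b - 1))) & mask
        y[i + 2] = ((y[i + 2] << 1) | (y[i + 2] >> (b - 1))) & mask
    return y

print(word_rot([1, 2, 3]))            # [3, 1, 2]
print(bit_rot([0x01, 0x01, 0x80]))    # [128, 1, 1]
```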
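5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x, y \rightarrow z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$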
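As a brief illustration, the word-wise addition discards any carry out of a word, so carries never cross word boundaries. The name `add_mod` is illustrative.

```python
def add_mod(x, y, b=8):
    """Word-wise addition mod 2^b: carries out of a word are dropped."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]

print(add_mod([0xFF, 0x01, 0x80], [0x01, 0x01, 0x80]))  # [0, 2, 0]
```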
The Round and Key Round
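Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Figure 1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^{2} \times \mathbb{Z}_{2^{n/2}} \rightarrow \mathbb{Z}_{2^{n/2}}^{2}$ such that:
$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$
$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$
$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$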
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.
C = SEA_{n,b}(P, K)
  initialization:
    L_0 & R_0 = P
    KL_0 & KR_0 = K
  key scheduling:
    for i in 1 to floor(n_r/2)
      [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i))
    switch KL_{floor(n_r/2)}, KR_{floor(n_r/2)}
    for i in ceil(n_r/2) to n_r - 1
      [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(n_r - i))
  encryption:
    for i in 1 to ceil(n_r/2)
      [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1})
    for i in ceil(n_r/2) + 1 to n_r
      [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1})
  final:
    C = R_{n_r} & L_{n_r}
    switch KL_{n_r-1}, KR_{n_r-1}
where & is the concatenation operator, KR_{floor(n_r/2)} is taken before the switch, and C(i) is an n_b-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round F_D.
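The whole procedure can be exercised end to end with a compact sketch. It rests on several assumptions that the text does not fix: branches are passed as lists of words with index 0 as the least significant word, the second key-schedule loop uses the constant $C(n_r - i)$, and $n_r = 51$ is used for SEA_{48,8}. No official test vectors are used here, so the sketch only checks internal consistency (round-key symmetry and the decrypt/encrypt round trip).

```python
def add_mod(x, y, b):
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, y)]

def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]

def sbox(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y

def bit_rot(x, b):
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ((y[i] >> 1) | (y[i] << (b - 1))) & mask
        y[i + 2] = ((y[i + 2] << 1) | (y[i + 2] >> (b - 1))) & mask
    return y

def word_rot(x):
    return [x[-1]] + x[:-1]

def inv_word_rot(x):
    return x[1:] + [x[0]]

def f_e(L, R, K, b):
    return R, xor(word_rot(L), bit_rot(sbox(add_mod(R, K, b)), b))

def f_d(L, R, K, b):
    return R, inv_word_rot(xor(L, bit_rot(sbox(add_mod(R, K, b)), b)))

def f_k(KL, KR, C, b):
    return KR, xor(KL, word_rot(bit_rot(sbox(add_mod(KR, C, b)), b)))

def round_keys(KL, KR, n_r, b):
    """Round keys in the order they are used by rounds 1..n_r."""
    n_b, half = len(KL), n_r // 2
    const = lambda i: [i] + [0] * (n_b - 1)   # only the LSW is non-zero
    keys = [KR]                               # round 1 uses KR_0
    for i in range(1, half + 1):
        KL, KR = f_k(KL, KR, const(i), b)
        keys.append(KR)                       # KR_i, taken before the switch
    KL, KR = KR, KL                           # switch
    for i in range(half + 1, n_r):
        KL, KR = f_k(KL, KR, const(n_r - i), b)
        keys.append(KL)                       # later rounds use KL_i
    return keys

def sea(L, R, KL, KR, n_r=51, b=8, decrypt=False):
    rnd = f_d if decrypt else f_e
    for K in round_keys(KL, KR, n_r, b):
        L, R = rnd(L, R, K, b)
    return R, L                               # final swap: C = R_nr & L_nr

L, R, KL, KR = [1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]
ct = sea(L, R, KL, KR)
print(sea(ct[0], ct[1], KL, KR, decrypt=True) == (L, R))  # True
```

Because the round-key list reads the same forwards and backwards, decryption can walk the same key schedule in the same direction as encryption.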
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– λ-parameter: 1/2
– δ-parameter: 1/4
– Maximum nonlinear order, namely 2
– Recursive definition
– Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. This is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48, b = 8, n_b = 3. Looking at the figure, it can be seen that SEA_{n,b} divides its data into 2n_b/3 blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most n_b rounds to be completed (in our example n_b = 3, which yields 3 rounds). Once every word has at least one active bit, the combination of r and S yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA_{n,b}, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ, then the linear and differential parameters of the 3-round cipher, Λ and Δ, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function S has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEA_{n,b} have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, Δ and Λ are respectively required to be on the order of $2^{-n}$ and $2^{-n/2}$ to resist differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that
$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n}$ and $\lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2}$ (1)
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to that of, e.g., the AES Rijndael [17], but we only assume one active s-box per round.
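For completeness, the step from Equation (1) to the round bound can be written out explicitly. This only expands the algebra already implied by the text:

```latex
\begin{align*}
\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} = 2^{-4n_r/3} < 2^{-n}
  &\iff \frac{4n_r}{3} > n \iff n_r > \frac{3n}{4}, \\
\lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} = 2^{-2n_r/3} < 2^{-n/2}
  &\iff \frac{2n_r}{3} > \frac{n}{2} \iff n_r > \frac{3n}{4}.
\end{align*}
```

Both conditions give the same threshold; for n = 48, for instance, it corresponds to 36 rounds.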
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and the boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA_{n,b} a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \ll a$ the left rotation by a bits of each of the $n_b$ words of x. The non-linear and diffusion layers have the following properties:
– $S(x \ll a) = S(x) \ll a$
– $r(x \ll a) = r(x) \ll a$
– $R(x \ll a) = R(x) \ll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \ll a = C_i$, e.g., all $C_i$'s equal 0. As a consequence of the previous observations, the modified encrypt round (written here $F'_E$) and the key round $F_K$ satisfy:
$F'_E(L \ll a, R \ll a, K \ll a) = F'_E(L, R, K) \ll a$
$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys K and $K \ll a$. In the actual SEA_{n,b}, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g., "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA_{n,b} from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from K and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA_{n,b}.
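The probability p quoted above for a = 1 can be checked by exhaustive enumeration over a single word. The sketch below is illustrative (the name `commute_count` is hypothetical); it counts the pairs (X, K) for which rotation and modular addition commute.

```python
def rotl(w, a, b):
    """Cyclic left rotation of a b-bit word by a bits (0 < a < b)."""
    mask = (1 << b) - 1
    return ((w << a) | (w >> (b - a))) & mask

def commute_count(a, b):
    """Number of pairs (x, k) of b-bit words such that
    rotl(x) + rotl(k) == rotl(x + k), all additions taken mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            if (rotl(x, a, b) + rotl(k, a, b)) & mask == rotl((x + k) & mask, a, b):
                hits += 1
    return hits

for b in (4, 6, 8):
    print(b, commute_count(1, b) / 4 ** b)
# For b = 8 the count is 24768 out of 65536 pairs, i.e. about 0.378.
```

For a = 1 the count works out to $\frac{3}{8}(1 + 2^{-(b-1)})$ of all pairs, which approaches 3/8 as b grows, in line with the statement in the text.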
Square Attacks
We explored square attacks [16] on SEA_{48,8}. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure, where the input to some of the substitution boxes is active (i.e., takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that this remains the same when different parameters n and b are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA_{n,b}, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA_{n,b}, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA_{n,b} is the same as that of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with that of GOST, whose vulnerability to slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function f which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA_{n,b}, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA_{n,b} key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference makes it possible to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA_{n,b} against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \Leftrightarrow \bar{P} \xrightarrow{\bar{K}} \bar{C}$. The non-linear key scheduling and the presence of carry propagations in the actual SEA_{n,b} algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA_{n,b} has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility of building very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g., by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
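The suggested round count is easy to tabulate. The helper below is a sketch under the assumption that the bracket around b/2 is a floor; `suggested_rounds` is a hypothetical name.

```python
def suggested_rounds(n, b):
    """Minimum suggested n_r = 3n/4 + 2*(n_b + floor(b/2)), made odd."""
    if n % (6 * b) != 0 or b < 8:
        raise ValueError("need n a multiple of 6b and b >= 8")
    n_b = n // (2 * b)
    n_r = 3 * n // 4 + 2 * (n_b + b // 2)
    return n_r if n_r % 2 == 1 else n_r + 1

for n, b in [(48, 8), (96, 8), (144, 8)]:
    print(n, b, suggested_rounds(n, b))
# 48 8 51
# 96 8 93
# 144 8 135
```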
3.5 Performance Analysis
SEA_{n,b} is targeted at low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\gg$, $\ll$
– Branch instructions: goto, subroutine call and return
– Comparison, load RAM in register, store register in RAM
According to the code in appendix, the performance can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, since at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
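These cost formulas can be evaluated for a given parameter set with a few lines of code. The function name `sea_cost` is illustrative, and the round count passed in is whatever value of $n_r$ is chosen for the instance (51 is used below for SEA_{48,8}).

```python
def sea_cost(n, b, n_r):
    """Return (RAM words + registers, code size in ops, total ops)
    from the estimates given in the text."""
    n_b = n // (2 * b)
    memory = 5 * n_b + 3
    code_size = 31 * n_b + 36
    round_ops = 12 * n_b + 11
    key_round_ops = 10 * n_b + 11
    total_ops = ((n_r - 1) * (round_ops + key_round_ops + 7)
                 + round_ops + 8 * n_b + 7)
    return memory, code_size, total_ops

print(sea_cost(48, 8, 51))  # (18, 129, 4828)
```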
For illustration purposes, we implemented SEA_{n,b} on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA_{n,b} and the AES Rijndael. While direct comparisons are made difficult by their high dependence on the target devices, the following general comments can be made:
– SEA_{n,b} designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g., for the AES Rijndael).
– The combined number of RAM words and registers of SEA_{n,b} implementations (i.e., $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of SEA_{n,b} is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA_{n,b} also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e., SEA_{n,b} trades space for time, which may be acceptable given present processor speeds). Looking at the code size-cycles product, the efficiency of SEA_{n,b} remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH 4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let p be a large (odd) prime and let $E : y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where p does not divide $4a_4^3 + 27a_6^2$, E can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of E over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime p, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where p has 200 bits.
It is known that
$\#E(\mathbb{F}_p) = p + 1 - t$
where t is an integer which satisfies the Hasse bound
$-2\sqrt{p} \le t \le 2\sqrt{p}$
The algorithm works by calculating t modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial associated with $\ell$ that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve E under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine t modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining t.
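The final recombination step can be illustrated on a toy curve. The sketch below is not the SEA algorithm itself: it obtains the residues t mod ℓ from a brute-force point count, purely for illustration, whereas the real algorithm derives them from torsion and isogeny computations. The curve, the prime p = 101 and the auxiliary primes 3, 5, 7 are arbitrary choices, and `pow(M, -1, l)` requires Python 3.8 or later.

```python
def count_points(p, a4, a6):
    """Naive O(p) count of points on y^2 = x^3 + a4*x + a6 over F_p,
    including the point at infinity."""
    roots = {}
    for y in range(p):
        roots[y * y % p] = roots.get(y * y % p, 0) + 1
    return 1 + sum(roots.get((x ** 3 + a4 * x + a6) % p, 0) for x in range(p))

def crt(residues):
    """Combine {l: t mod l} for pairwise coprime l into (t mod M, M)."""
    t, M = 0, 1
    for l, r in residues.items():
        k = ((r - t) * pow(M, -1, l)) % l   # so that t + M*k = r (mod l)
        t, M = t + M * k, M * l
    return t, M

p, a4, a6 = 101, 1, 1                       # toy parameters (assumption)
t_true = p + 1 - count_points(p, a4, a6)    # trace via brute force
residues = {l: t_true % l for l in (3, 5, 7)}
t, M = crt(residues)
if t > M // 2:                              # lift to the symmetric interval
    t -= M
print(t == t_true, M * M > 16 * p)          # True True
```

The lift to the symmetric interval is valid because the product M = 105 exceeds $4\sqrt{p}$, so only one value congruent to t mod M lies within the Hasse bound.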
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over $\mathbb{F}_p$ with the intention of finding an E with $\#E(\mathbb{F}_p) = xr$, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial in $\mathbb{Z}[F, J]$ associated with $\ell$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime p and an elliptic curve $E : y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram. Blocks shown: key computational block (Mod, SBox, BR, WR, XOR, round register) with key registers KeyReg0[95:0] to KeyReg9[95:0]; encryption computational block (Mod, SBox, BR, WR, XOR, round register) with cipher data register; decryption computational block (Mod, SBox, BR, IWR, XOR, round register) with plain text data register; SMC block. Signals shown: KeyI[95:0], KeyLd, Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, DataO[95:0], DataRd, Clk, Rst, Ena, ED, Ovr.
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
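Both ways of obtaining a key can be sketched with the Python standard library. This is an illustration only: the 96-bit length is chosen to match a SEA_{96,8}-sized key, the function names are hypothetical, and PBKDF2 with SHA-256 is swapped in for the plain SHA-1 hashing mentioned above.

```python
import hashlib
import secrets

def random_key(n_bits=96):
    """Draw a key from the operating system's entropy source."""
    return secrets.token_bytes(n_bits // 8)

def key_from_passphrase(passphrase, salt, n_bits=96, iterations=100_000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256.
    The salt should be random and stored alongside the ciphertext."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=n_bits // 8)

salt = secrets.token_bytes(16)
print(len(random_key()), len(key_from_passphrase("correct horse", salt)))  # 12 12
```

The same passphrase and salt always yield the same key, which is what allows the receiving side to re-derive it.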
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key length; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the term cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
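A one-time pad is short enough to sketch in full. The example below is illustrative only (the function names are hypothetical); it draws a fresh random key as long as the message and combines the two with XOR, which is the usual binary form of the cipher.

```python
import secrets

def otp_encrypt(message):
    """Encrypt bytes with a fresh random key of the same length.
    The key must never be reused for another message."""
    key = secrets.token_bytes(len(message))
    cipher = bytes(m ^ k for m, k in zip(message, key))
    return key, cipher

def otp_decrypt(key, cipher):
    """XOR with the same key recovers the message."""
    if len(key) != len(cipher):
        raise ValueError("key and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(cipher, key))

key, cipher = otp_encrypt(b"attack at dawn")
print(otp_decrypt(key, cipher))  # b'attack at dawn'
```

The conditions listed above are what make this secure in theory; the practical cost is that every message needs as much secret key material as it has data.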
Most ciphers apart from the one-time pad can be broken with enough computational effort by brute force attack but the amount of effort needed may be exponentially dependent on the key size as compared to the effort needed to use the cipher
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the "work factor" in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is "gardening", used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
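The arithmetic behind these figures can be checked directly. The sketch below assumes the standard 56-bit effective DES key length, which the text implies but does not state, and compares only operation counts; it ignores the very different data requirements of the two attacks.

```python
DES_KEY_BITS = 56  # assumption: effective DES key length

keyspace = 2 ** DES_KEY_BITS
brute_force_trials = keyspace // 2   # about half the keys, i.e. 2**55
linear_cryptanalysis_ops = 2 ** 43   # figure quoted in the text

assert brute_force_trials == 2 ** 55

# Ratio of the two work factors: 2**55 / 2**43 = 2**12
speedup = brute_force_trials // linear_cryptanalysis_ops
print(speedup)  # 4096
```

In terms of operations alone, the linear attack is therefore about 4096 times cheaper than exhaustive key search, at the cost of needing 2^43 known plaintexts instead of one.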
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
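The PIN example can be sketched as follows. The function `naive_equal` and the PIN value are hypothetical; the count of compared bytes stands in for the running time an attacker would measure. `hmac.compare_digest` from the Python standard library is shown as one commonly used comparison designed to reduce this kind of leak.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> tuple:
    """Early-exit comparison; returns (equal, bytes_compared)."""
    if len(a) != len(b):
        return False, 0
    compared = 0
    for x, y in zip(a, b):
        compared += 1
        if x != y:
            return False, compared  # stops at the first mismatch
    return True, compared


secret_pin = b"4821"  # illustrative value

# The amount of work reveals how many leading characters of the guess are correct.
print(naive_equal(secret_pin, b"0000"))  # (False, 1)
print(naive_equal(secret_pin, b"4800"))  # (False, 3)
print(naive_equal(secret_pin, b"4821"))  # (True, 4)

# A comparison whose duration does not depend on where the first mismatch occurs.
print(hmac.compare_digest(secret_pin, b"4800"))  # False
```

With the early-exit version, an attacker can recover the PIN one character at a time instead of searching the whole space of PINs.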
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
# Registers : 1
  96-bit register : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 195

Macro Statistics
# Registers : 1
  96-bit register : 1

Cell Usage
# BELS : 1
  LUT1 : 1
# FlipFlops/Latches : 96
  FDCE : 96
# Clock Buffers : 1
  BUFGP : 1
# IO Buffers : 194
  IBUF : 98
  OBUF : 96
=========================================================================

Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

Number of Slices: 55 out of 192 28%
Number of Slice Flip Flops: 96 out of 384 25%
Number of 4 input LUTs: 1 out of 384 0%
Number of bonded IOBs: 194 out of 90 215% (*)
Number of GCLKs: 1 out of 4 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade: -6

Minimum period: No path found
Minimum input arrival time before clock: 7.962ns
Maximum output required time after clock: 6.788ns
Maximum combinational path delay: No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                             Gate     Net
  Cell:in->out     fanout   Delay   Delay  Logical Name (Net Name)
  ----------------------------------------  ------------
  IBUF:I->O            96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                   0.886          Dreg_0
  ----------------------------------------
  Total                     7.962ns (1.662ns logic, 6.300ns route)
                                    (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising

  Data Path: Dreg_95 to KeyO<95>
                             Gate     Net
  Cell:in->out     fanout   Delay   Delay  Logical Name (Net Name)
  ----------------------------------------  ------------
  FDCE:C->Q             1   1.085   1.035  Dreg_95 (Dreg_95)
  OBUF:I->O                 4.668          KeyO_95_OBUF (KeyO<95>)
  ----------------------------------------
  Total                     6.788ns (5.753ns logic, 1.035ns route)
                                    (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 7

Cell Usage
# BELS : 3
  LUT4 : 3
# IO Buffers : 7
  IBUF : 4
  OBUF : 3
=========================================================================

Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

Number of Slices: 2 out of 192 1%
Number of 4 input LUTs: 3 out of 384 0%
Number of bonded IOBs: 7 out of 90 7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report

Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
  Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
    Number used as logic: 1016
    Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, the security of the cipher being adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed for processors with a limited instruction set. It can be applied:
In wireless communication, mobile computing, and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDs
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, M. Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
2.1 INTRODUCTION TO VHDL
VHDL is an acronym for VHSIC Hardware Description Language; VHSIC is an acronym for Very High Speed Integrated Circuits. It is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level.
The VHDL language can be regarded as an integrated amalgamation of the following languages:
Sequential language
Concurrent language
Net-list language
Timing specifications
Waveform generation language
The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a VHDL simulator. A subset of the language is usually sufficient to model most applications. The complete language, however, has sufficient power to capture descriptions ranging from the most complex chips to a complete electronic system.
HISTORY
The requirements for the language were first generated in 1981 under the VHSIC program of the Department of Defense (DoD). Reprocurement and reuse were also big issues. Thus, a need arose for a standardized hardware description language for the design, documentation, and verification of digital systems. The IEEE standardized the VHDL language in December 1987; this version of the language is known as IEEE Std 1076-1987. The official language description appears in the IEEE Standard VHDL Language Reference Manual, available from the IEEE. The language has also been recognized as an American National Standards Institute (ANSI) standard.
According to IEEE rules, an IEEE standard has to be reballoted every 5 years so that it may remain a standard. Consequently, the language was upgraded with new features, the syntax of many constructs was made more uniform, and many ambiguities present in the 1987 version of the language were resolved. This new version of the language is known as IEEE Std 1076-1993.
2.2 CAPABILITIES
The following are the major capabilities that the language provides, along with the features that differentiate it from other hardware description languages:
The language can be used as an exchange medium between chip vendors and CAD tool users. Different chip vendors can provide VHDL descriptions of their components to system designers.
The language can be used as a communication medium between different CAD and CAE tools.
The language supports hierarchy; that is, a digital system can be modeled as a set of interconnected components, and each component, in turn, can be modeled as a set of interconnected subcomponents.
The language supports flexible design methodologies: top-down, bottom-up, or mixed. It supports both synchronous and asynchronous timing models.
Various digital modeling techniques, such as finite-state machine descriptions and Boolean equations, can be modeled using the language.
The language is publicly available, human-readable, and machine-readable.
The language supports three basic description styles: structural, dataflow, and behavioral.
It supports a wide range of abstraction levels, ranging from abstract behavioral descriptions to very precise gate-level descriptions.
Arbitrarily large designs can be modeled using the language, and there are no limitations imposed by the language on the size of the design.
2.3 HARDWARE ABSTRACTION
VHDL is used to describe a model for a digital hardware device. This model specifies the external view of the device and one or more internal views. The internal view of the device specifies its functionality or structure, while the external view specifies the interface of the device through which it communicates with the other modules in its environment. In VHDL, each device model is treated as a distinct representation of a unique device, called an entity. The entity is thus a hardware abstraction of the actual hardware device. Each entity is described using one model, which contains one external view and one or more internal views.
2.4 Basic terminology
VHDL is a hardware description language that can be used to model a digital system. A hardware abstraction of this digital system is called an entity. An entity X, when used in another entity Y, becomes a component of the entity Y. To describe an entity, VHDL provides five different types of primary constructs, called design units. They are:
1 Entity declaration
2 Architecture body
3 Configuration declaration
4 Package declaration
5 Package body
1 An entity is modeled using an entity declaration and at least one architecture body. The entity declaration describes the external view of the entity, for example, the input and output signal names.
2 The architecture body contains the internal description of the entity, for example, as a set of interconnected components that represents the structure of the entity, or as a set of concurrent or sequential statements that represents the behavior of the entity.
3 A configuration declaration is used to create a configuration for an entity It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity It may also specify the bindings of the architecture components used in the selected architecture body to other entities An entity may have any number of configurations
4 A package declaration encapsulates a set of related declarations such type of declaration s subtype declaration and subprogram declaration which can be shared across two or more design units
5 A package body contains the definition of subprogram declared in a package declaration
Once an entity has been modeled it needs to be validated by a VHDL system A typical VHDL system consists of an analyzer and a simulator The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks
The language is case insensitive that is lowercase and uppercase characters are treated alike the Language is also free format comments are specified in the language by preceding the text with two Consecutive dashes (- -)
Entity Declaration
The entity declaration specifies the name of the entity being modeled and lists the set of interface ports. Ports are signals through which the entity communicates with other models in its external environment.
EXAMPLE
The entity declaration for the half adder circuit is:
    entity HALF_ADDER is
        port (A, B : in BIT; SUM, CARRY : out BIT);
    end HALF_ADDER;
The entity, called HALF_ADDER, has two input ports, A and B, and two output ports, SUM and CARRY. BIT is a predefined type of the language.
Architecture Body
An architecture body specifies the internal details of an entity using any of the following modeling styles:
1. As a set of interconnected components (to represent structure)
2. As a set of concurrent assignment statements (to represent dataflow)
3. As a set of sequential assignment statements (to represent behavior)
4. As any combination of the above three
2.5 Structural style of modeling
In this style, an entity is described as a set of interconnected components. Such a model for the HALF_ADDER entity is described in an architecture body:
    architecture HA of HALF_ADDER is
        component XOR2
            port (X, Y : in BIT; Z : out BIT);
        end component;
        component AND2
            port (L, M : in BIT; N : out BIT);
        end component;
    begin
        X1: XOR2 port map (A, B, SUM);
        A1: AND2 port map (A, B, CARRY);
    end HA;
The name of the architecture body is HA. The entity declaration for HALF_ADDER specifies the interface ports for this architecture body. The architecture body is composed of two parts: the declarative part and the statement part. Two component declarations are present in the declarative part of the architecture body.
The declared components are instantiated in the statement part of the architecture body using component instantiation statements. The signals in the port map of a component instantiation and the port signals in the component declaration are associated by position.
DATAFLOW STYLE OF MODELING
In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The dataflow model for the half adder is described using two concurrent signal assignment statements, one driving SUM with A xor B and one driving CARRY with A and B. In a signal assignment statement, the symbol <= implies an assignment of a value to a signal.
BEHAVIORAL STYLE OF MODELING
The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. These sequential statements, which are specified inside a process statement, do not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.
MIXED STYLE OF MODELING
It is possible to mix the three modeling styles in a single architecture body. That is, within an architecture body, we could use component instantiation statements, concurrent signal assignment statements, and process statements.
MODEL ANALYSIS
Once an entity is described in VHDL, it can be validated using an analyzer and a simulator that are part of a VHDL system. The first step in the validation process is analysis. The analyzer takes a file that contains one or more design units and compiles them into an intermediate form. The generated intermediate form is stored in a specific design library that has been designated as the working library.
There is a design library with the logical name STD, predefined by the VHDL language environment. This library contains two packages: STANDARD and TEXTIO. The STANDARD package contains declarations for all the predefined types of the language. The TEXTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations. There also exists an IEEE standard package called STD_LOGIC_1164, which contains its associated subtypes, overloaded operator functions, and other useful utilities. This standard is called IEEE Std 1164-1993.
SIMULATION
For a hierarchical entity to be simulated, all of its lowest-level components must be described at the behavioral level. A simulation can be performed on either one of the following:
1. An entity declaration and an architecture body pair
2. A configuration
Preceding the actual simulation are two major steps:
1. Elaboration phase: In this phase, the hierarchy of the entity is expanded and linked, components are bound to entities in a library, and the top-level entity is built as a network of behavioral models that is ready to be simulated.
2. Initialization phase: Driving and effective values for all explicitly declared signals are computed, implicit signals are assigned values, processes are executed once until they suspend, and simulation time is set to 0 ns.
Simulation commences by advancing time to that of the next event. Values that are scheduled to be assigned to signals at this time are assigned. If the value of a signal changes, and if that signal is present in the sensitivity list of a process, the process is executed until it suspends. Simulation stops when an assertion occurs (depending on the implementation of the VHDL system) or when the maximum time as defined by the language is reached.
Entity Declaration
An entity declaration describes the external interface of the entity. It specifies the name of the entity, the names of the interface ports, their mode, and the type of the ports. The syntax for an entity declaration is:
    entity entity-name is
        [generic (list-of-generics-and-their-types);]
        [port (list-of-interface-port-names-and-their-types);]
        [entity-item-declarations]
    [begin
        entity-statements]
    end [entity] [entity-name];
The entity-name is the name of the entity, and the interface ports are the signals through which the entity passes information to and from its external environment. Each interface port can have one of the following modes:
1. in: The value of an input port can only be read within the entity model.
2. out: The value of an output port can only be updated within the entity model.
3. inout: The value of a bidirectional port can be read and updated within the entity model.
4. buffer: The value of a buffer port can be read and updated within the entity model. It cannot have more than one source.
Declarations that are placed in the entity declaration are common to all the design units that are associated with that entity declaration.
ARCHITECTURE BODY
An architecture body describes the internal view of an entity. It describes the functionality or the structure of the entity. Its syntax is:
    architecture <architecture-name> of <entity-name> is
    begin
        concurrent-statements; these are:
            process-statement
            block-statement
            concurrent-signal-assignment-statement
            component-instantiation-statement
            generate-statement
    end [architecture] [architecture-name];
The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow, and sequential behavior.
Here we describe an entity by using the behavioral model. A process statement, which is a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.
2.6 PROCESS STATEMENT
A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is:
    [process-label:] process [(sensitivity-list)] [is]
    begin
        sequential-statements; these are:
            variable-assignment-statement
            signal-assignment-statement
            wait-statement
            if-statement
            case-statement
            loop-statement
            null-statement
            exit-statement
            next-statement
            assertion-statement
            report-statement
            procedure-call-statement
            return-statement
    end process [process-label];
The set of signals to which the process is sensitive is defined by the sensitivity list. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in sequential order, that is, in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form:
    variable-object := expression;
The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is:
    signal-object <= expression [after delay-value];
A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other statements that appear within the process.
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is:
    if boolean-expression then
        sequential-statements
    [elsif boolean-expression then
        sequential-statements]
    [else
        sequential-statements]
    end if;
The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
CASE STATEMENT
The format of a case statement is:
    case expression is
        when choices => sequential-statements
        when choices => sequential-statements
    end case;
The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or of a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by using the "others" clause. The others clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax for a loop statement is:
    [loop-label:] iteration-scheme loop
        sequential-statements
    end loop [loop-label];
2.8 Active-HDL Overview
Active-HDL is an integrated environment designed for the development of VHDL, Verilog, EDIF, and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs, and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard.
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0.
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with the OVI Standard Delay Format Specification, Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification, and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1. HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. The keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts, and Tcl scripts.
2. Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3. State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4. Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5. Design Browser
The Design Browser window displays the contents of the current design, that is:
- Resource files attached to the design
- The contents of the default working library of the design
- The structure of the design unit selected for simulation
- VHDL, Verilog, or EDIF objects declared within a selected region of the current design
6. Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros, and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable to the simulator. In Active-HDL, a source file can be one of the following:
- VHDL file (.vhd)
- Verilog file (.v)
- EDIF netlist file
- State diagram file (.asf)
- Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog, or EDIF file containing the HDL code (or netlist) generated from the diagram.
A netlist is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, for VHDL, Verilog, and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
- Event-Driven Simulation
- Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others are simulated in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bitstream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx® design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs, and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic, and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions, and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input netlists as well as design constraint information into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bitstream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing, or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a netlist (or netlists) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations, and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bitstream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take the opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e. with small code size and memory) targeted at processors with a limited instruction set (i.e. AND, OR, XOR gates, word rotation, and modular addition). The proposed design is parametric in the text, key, and processor size, and allows efficient combination of encryption/decryption and "on-the-fly" key derivation; its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g. a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size, and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEA$_{n,b}$ is parametric in the text, key, and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). In contrast, SEA$_{n,b}$ makes it possible to obtain a small encryption routine targeted at any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, additional features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats, and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key, and word sizes. It is based on a Feistel structure with a variable number of rounds and is defined with respect to the following parameters:
- $n$: plaintext size, key size;
- $b$: processor (or word) size;
- $n_b = \frac{n}{2b}$: number of words per Feistel branch;
- $n_r$: number of block cipher rounds.
As the only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted as SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$, ...
Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
- Bit representation: $x_b = x(n/2-1)\; x(n/2-2) \ldots x(2)\; x(1)\; x(0)$;
- Word representation: $x_W = x_{n_b-1}\; x_{n_b-2} \ldots x_2\; x_1\; x_0$.
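As a quick illustration of the parameter constraint, the following minimal sketch computes $n_b$ for the 8-bit examples quoted above. The function name `sea_words_per_branch` is a hypothetical helper introduced here, not part of the SEA specification.

```python
def sea_words_per_branch(n: int, b: int) -> int:
    """Return nb = n / (2b), the number of b-bit words per Feistel branch.

    Raises ValueError when n is not a multiple of 6b, the only
    constraint the specification places on (n, b).
    """
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    return n // (2 * b)


# The three 8-bit examples from the text.
for n in (48, 96, 144):
    print(f"SEA_{{{n},8}}: nb = {sea_words_per_branch(n, 8)}")
```

Because $n$ is a multiple of $6b$, $n_b$ is always a multiple of 3, which is what allows each branch to be split into blocks of three words.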
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1. Bitwise XOR $\oplus$
The bitwise XOR is defined on $n/2$-bit vectors:
\[ \oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \rightarrow \mathbb{Z}_2^{n/2} : x, y \rightarrow z = x \oplus y \;\Leftrightarrow\; z(i) = x(i) \oplus y(i), \quad 0 \le i \le \tfrac{n}{2} - 1 \]
2. Substitution Box $S$
SEA$_{n,b}$ uses the following 3-bit substitution table, in C-like notation:
    S_T := {0, 5, 6, 7, 4, 3, 1, 2}
For efficiency purposes, it is applied bitwise to any set of three words of data, using the following recursive definition:
\[ S : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow x = S(x) \;\Leftrightarrow\;
\begin{cases}
x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i} \\
x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1} \\
x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}
\end{cases}
\quad 0 \le i \le \tfrac{n_b}{3} - 1 \]
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
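The recursive definition can be checked against the 3-bit table with a few lines of Python. This is a minimal sketch under two assumptions that the text does not spell out: the three assignments are executed in order, each one seeing the words already updated by the previous one, and bit $k$ of the table index corresponds to word $x_{3i+k}$. The function names are illustrative.

```python
S_T = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table from the text


def sbox(x):
    """Apply S to a list of nb words (nb a multiple of 3), bitslice style."""
    x = list(x)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]   # uses the already updated x[i]
        x[i + 2] ^= x[i] | x[i + 1]   # uses both updated words
    return x


def table_from_bitslice():
    """Rebuild the 3-bit table by feeding single-bit 'words' to sbox()."""
    table = []
    for v in range(8):
        bits = [(v >> k) & 1 for k in range(3)]   # bits[k] plays the role of x_{3i+k}
        y = sbox(bits)
        table.append(y[0] | (y[1] << 1) | (y[2] << 2))
    return table


print(table_from_bitslice() == S_T)
```

With these conventions the rebuilt table equals `S_T`, and the two fixed points of the table (inputs 0 and 4) are visible directly.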
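3. Word Rotation $R$
The word rotation is defined on $n_b$-word vectors:
\[ R : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow y = R(x) \;\Leftrightarrow\; y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \qquad y_0 = x_{n_b-1} \]
4. Bit Rotation $r$
The bit rotation is defined on $n_b$-word vectors:
\[ r : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow y = r(x) \;\Leftrightarrow\;
\begin{cases}
y_{3i} = x_{3i} \ggg 1 \\
y_{3i+1} = x_{3i+1} \\
y_{3i+2} = x_{3i+2} \lll 1
\end{cases}
\quad 0 \le i \le \tfrac{n_b}{3} - 1 \]
where $\ggg$ and $\lll$ represent the cyclic right and left shifts inside a word.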
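The two rotations can be sketched as follows. Vectors are held as Python lists with index $i$ holding word $x_i$; that storage convention, the default word size of 8 bits, and the function names are assumptions made for the example.

```python
def rol(w, a, b=8):
    """Cyclic left shift of a b-bit word by a positions."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)


def ror(w, a, b=8):
    """Cyclic right shift of a b-bit word by a positions."""
    return rol(w, b - (a % b), b)


def word_rot(x):
    """R: y[i+1] = x[i] for 0 <= i <= nb-2, and y[0] = x[nb-1]."""
    return [x[-1]] + list(x[:-1])


def word_rot_inv(x):
    """Inverse word rotation R^-1."""
    return list(x[1:]) + [x[0]]


def bit_rot(x, b=8):
    """r: in each block of three words, rotate word 0 right and word 2 left by one bit."""
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ror(x[i], 1, b)
        y[i + 2] = rol(x[i + 2], 1, b)
    return y


x = [0x01, 0x02, 0x80]
print([hex(w) for w in word_rot(x)])  # word positions move, word contents do not
print([hex(w) for w in bit_rot(x)])   # bits move inside words 0 and 2 only
```

Only $R$ moves data between words; $r$ moves bits inside a word, which is why the two are combined to obtain diffusion.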
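5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
\[ \boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x, y \rightarrow z = x \boxplus y \;\Leftrightarrow\; z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1 \]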
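A minimal sketch of the word-wise addition, assuming vectors are stored as Python lists of $b$-bit integers (the name `add_mod` is illustrative):

```python
def add_mod(x, y, b=8):
    """Word-wise addition mod 2^b: z[i] = (x[i] + y[i]) mod 2^b."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]


# Carries are dropped at each word boundary and never reach the next word.
print([hex(w) for w in add_mod([0xFF, 0x01, 0x80], [0x01, 0x01, 0x80])])
```

The example shows the point of the definition: the operation is $n_b$ independent $b$-bit additions, not one wide $n/2$-bit addition.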
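The Round and Key Round
Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$, and key round $F_K$ are pictured in Figure 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \rightarrow \mathbb{Z}_{2^{n/2}}^2$ such that:
\[ [L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i \]
\[ [L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R^{-1}\Big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\Big), \quad L_{i+1} = R_i \]
\[ [KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \;\Leftrightarrow\; KR_{i+1} = KL_i \oplus R\Big(r\big(S(KR_i \boxplus C_i)\big)\Big), \quad KL_{i+1} = KR_i \]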
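[Figure: block diagrams of the encrypt/decrypt round (inputs $L_i$, $R_i$ and round key $K_i$; blocks $R$, $R^{-1}$, $r$, $S$; outputs $L_{i+1}$, $R_{i+1}$) and of the key round (inputs $KL_i$, $KR_i$ and constant $C_i$; blocks $R$, $r$, $S$; outputs $KL_{i+1}$, $KR_{i+1}$)]
FIG 3.1 Encrypt/decrypt round and key round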
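The three round functions translate almost line for line into code. The sketch below fixes $b = 8$ and stores each branch as a list of words with index $i$ holding $x_i$; both choices, the helper names, and the sample values are assumptions for illustration only, not test vectors. It also shows that $F_D$ applied to the swapped output of $F_E$ returns the swapped input, which is what lets decryption reuse the encryption structure.

```python
B = 8
MASK = (1 << B) - 1


def rol(w, a):
    a %= B
    return ((w << a) | (w >> (B - a))) & MASK


def S(x):
    x = list(x)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x


def r(x):
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = rol(x[i], B - 1)      # cyclic right shift by one bit
        y[i + 2] = rol(x[i + 2], 1)  # cyclic left shift by one bit
    return y


def R(x):
    return [x[-1]] + list(x[:-1])


def R_inv(x):
    return list(x[1:]) + [x[0]]


def add(x, y):
    return [(p + q) & MASK for p, q in zip(x, y)]


def xor(x, y):
    return [p ^ q for p, q in zip(x, y)]


def F_E(L, Rt, K):
    """Encrypt round: returns (L_next, R_next)."""
    return Rt, xor(R(L), r(S(add(Rt, K))))


def F_D(L, Rt, K):
    """Decrypt round: returns (L_next, R_next)."""
    return Rt, R_inv(xor(L, r(S(add(Rt, K)))))


def F_K(KL, KR, C):
    """Key round: returns (KL_next, KR_next)."""
    return KR, xor(KL, R(r(S(add(KR, C)))))


L0, R0, K0 = [0x01, 0x23, 0x45], [0x67, 0x89, 0xAB], [0x0F, 0x1E, 0x2D]
L1, R1 = F_E(L0, R0, K0)
# Feeding the swapped output to F_D with the same key gives back the swapped input.
print(F_D(R1, L1, K0) == (R0, L0))  # prints True
```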
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C, and K have a parametric bit size $n$. The operations within the cipher are performed considering parametric $b$-bit words.
    C = SEA_{n,b}(P, K)
    {
        % initialization:
        L_0 & R_0 = P;
        KL_0 & KR_0 = K;
        % key scheduling:
        for i in 1 to ⌊n_r/2⌋
            [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i));
        switch KL_{⌊n_r/2⌋}, KR_{⌊n_r/2⌋};
        for i in ⌈n_r/2⌉ to n_r - 1
            [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(r - i));
        % encryption:
        for i in 1 to ⌈n_r/2⌉
            [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1});
        for i in ⌈n_r/2⌉ + 1 to n_r
            [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1});
        % final:
        C = R_{n_r} & L_{n_r};
        switch KL_{n_r-1}, KR_{n_r-1};
    }
where & is the concatenation operator, KR_{⌊n_r/2⌋} is taken before the switch, r in C(r - i) denotes the number of rounds $n_r$, and C(i) is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round $F_D$.
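The pseudo-code above can be turned into a small runnable model. The following is a sketch under stated assumptions, not a reference implementation: branches are Python lists with index 0 holding the least significant word $x_0$, the plaintext and key are passed as already split halves, the floor/ceiling loop bounds are those that make the key count equal $n_r$, and $n_r = 51$ is just an illustrative odd value. No official test vectors are reproduced; the checks are structural (palindromic round keys, decrypt undoes encrypt).

```python
def rol(w, a, b):
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)

def S(x):
    x = list(x)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x

def r(x, b):
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = rol(x[i], b - 1, b)      # cyclic right shift by one bit
        y[i + 2] = rol(x[i + 2], 1, b)  # cyclic left shift by one bit
    return y

def R(x): return [x[-1]] + list(x[:-1])
def R_inv(x): return list(x[1:]) + [x[0]]
def add(x, y, b): return [(p + q) & ((1 << b) - 1) for p, q in zip(x, y)]
def xor(x, y): return [p ^ q for p, q in zip(x, y)]
def f(x, k, b): return r(S(add(x, k, b)), b)

def F_E(L, Rt, K, b): return Rt, xor(R(L), f(Rt, K, b))
def F_D(L, Rt, K, b): return Rt, R_inv(xor(L, f(Rt, K, b)))
def F_K(KL, KR, C, b): return KR, xor(KL, R(f(KR, C, b)))

def const(i, nb):
    """C(i): all words zero except the least significant word, which equals i."""
    return [i] + [0] * (nb - 1)

def round_keys(KL, KR, nr, b):
    """Return the nr round keys K_0 .. K_{nr-1} in the order the rounds use them."""
    nb, h = len(KL), nr // 2
    keys = [KR]                       # round 1 uses KR_0
    for i in range(1, h + 1):
        KL, KR = F_K(KL, KR, const(i, nb), b)
        keys.append(KR)               # first half of the rounds uses KR_i
    KL, KR = KR, KL                   # the "switch" in the middle
    for i in range(h + 1, nr):
        KL, KR = F_K(KL, KR, const(nr - i, nb), b)
        keys.append(KL)               # second half of the rounds uses KL_i
    return keys

def crypt(L, Rt, keys, b, F):
    """Run all rounds with round function F, then output R_nr & L_nr."""
    for K in keys:
        L, Rt = F(L, Rt, K, b)
    return Rt, L

keys = round_keys([1, 2, 3], [4, 5, 6], nr=51, b=8)
pt = ([0x10, 0x20, 0x30], [0x40, 0x50, 0x60])
ct = crypt(pt[0], pt[1], keys, 8, F_E)
print(keys == keys[::-1], crypt(ct[0], ct[1], keys, 8, F_D) == pt)
```

The first printed value reflects the symmetric round-key sequence that the key schedule is designed to produce; the second shows that running the same loop with $F_D$ on the ciphertext recovers the plaintext.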
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
- $\lambda$-parameter: $1/2$;
- $\delta$-parameter: $1/4$;
- Maximum nonlinear order, namely 2;
- Recursive definition;
- Minimum number of instructions.
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the $3 \times 3$ bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data into $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
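As a worked instance of this bound, take the example parameters used above ($n = 48$, $b = 8$):

```latex
\[
n_b = \frac{n}{2b} = \frac{48}{16} = 3,
\qquad
n_b + \left\lfloor \frac{b}{2} \right\rfloor = 3 + 4 = 7 .
\]
```

So for SEA$_{48,8}$ the argument gives complete diffusion after at most 7 rounds: 3 rounds for every word to become active, then 4 rounds for the bits inside each 3-word block.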
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
\[ K_0, K_1, K_2, \ldots, K_{\lfloor r/2 \rfloor}, K_{\lfloor r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0 \]
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the linear and differential parameters of the 3-round cipher, $\Lambda$ and $\Delta$, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function $S$ has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an $n$-bit block cipher, it is respectively required that $\Delta \simeq 2^{-n}$ and $\Lambda \simeq 2^{-n/2}$ to resist against differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that:
\[ \delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \qquad (1) \]
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
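The step from Equation (1) to the round count can be spelled out by comparing exponents of 2 on both sides:

```latex
\begin{align*}
\left(2^{-2}\right)^{2n_r/3} = 2^{-4n_r/3} < 2^{-n}
  &\iff \frac{4n_r}{3} > n \iff n_r > \frac{3n}{4}, \\
\left(2^{-1}\right)^{2n_r/3} = 2^{-2n_r/3} < 2^{-n/2}
  &\iff \frac{2n_r}{3} > \frac{n}{2} \iff n_r > \frac{3n}{4}.
\end{align*}
```

Both conditions lead to the same threshold. For instance, with $n = 48$ the threshold is $3n/4 = 36$ rounds, and with $n = 96$ it is 72 rounds.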
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and the boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \lll a$ the left rotation by a bits of each of the $n_b$ words of x. The non-linear and diffusion layers have the following properties:
– $S(x \lll a) = S(x) \lll a$
– $r(x \lll a) = r(x) \lll a$
– $R(x \lll a) = R(x) \lll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \lll a = C_i$ (e.g. all $C_i$'s equal 0). As a consequence of the previous observations, the modified round $F_E$ and the key round $F_K$ satisfy
$$F_E(L \lll a,\, R \lll a,\, K \lll a) = F_E(L, R, K) \lll a,$$
$$F_K(K_L \lll a,\, K_R \lll a,\, 0) = F_K(K_L, K_R, 0) \lll a.$$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \lll a$. In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \lll a) \boxplus (K \lll a) = (X \boxplus K) \lll a$ is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for $1 < a < b-1$. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
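The probability p discussed above can be estimated by exhaustive enumeration for a small word size. The following is only a minimal sketch, not part of the original analysis: the helper names `rotl` and `rotation_commutes_probability` are hypothetical, and b = 8 is chosen because it is the smallest word size assumed for the cipher and keeps the enumeration cheap.

```python
def rotl(x: int, a: int, b: int) -> int:
    """Rotate the b-bit word x left by a positions."""
    a %= b
    mask = (1 << b) - 1
    return ((x << a) | (x >> (b - a))) & mask


def rotation_commutes_probability(a: int, b: int) -> float:
    """Fraction of pairs (X, K) of b-bit words for which
    (X <<< a) + (K <<< a) == (X + K) <<< a, with addition mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return hits / (1 << (2 * b))


if __name__ == "__main__":
    b = 8  # illustrative word size
    for a in range(1, b):
        print(a, rotation_commutes_probability(a, b))
```

For a = 1 the property holds exactly when there is no carry into the most significant bit and the two most significant bits are not both set, which is where the limit 3/4 × 1/2 = 3/8 comes from.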
Moreover, the round constants $C_i$ are generally not such that $C_i \lll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, this implies that the similarity between the round keys derived from $K$ and those derived from $K \lll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA$_{n,b}$.
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure. It is expected that this remains the case when different parameters n and b are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, whose vulnerability to slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function f which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows the attacker to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \iff \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum number of rounds required to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
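The round-count rule above can be written as a short function. This is a minimal sketch under stated assumptions: the function name `suggested_rounds` and the `security_factor` parameter are hypothetical, and the relation n_b = n/(2b) (number of b-bit words per Feistel branch) is taken from the cipher's definition rather than from this section.

```python
def suggested_rounds(n: int, b: int, security_factor: int = 1) -> int:
    """Number of rounds 3n/4 + 2*(nb + floor(b/2)), made odd.

    n: block size in bits, b: word size in bits.
    security_factor=2 models the more conservative 'doubling' option.
    """
    if b < 8:
        raise ValueError("the text assumes a minimum word size of 8 bits")
    if n % (2 * b) != 0:
        raise ValueError("n must be a multiple of 2*b")
    nb = n // (2 * b)                      # assumption: words per branch
    linear_differential = (3 * n + 3) // 4  # ceil(3n/4)
    diffusion = nb + b // 2                 # rounds for complete diffusion
    nr = security_factor * (linear_differential + 2 * diffusion)
    if nr % 2 == 0:                         # nr has to be odd
        nr += 1
    return nr


if __name__ == "__main__":
    for n, b in [(48, 8), (96, 8)]:
        print(f"n={n}, b={b}: nr={suggested_rounds(n, b)}")
```

For n = 48 and b = 8 the formula gives 36 + 2·(3 + 4) = 50, which is even and is therefore raised to 51.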
3.5 Performance Analysis
SEA$_{n,b}$ is targeted at low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size, and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: OR, AND, $\oplus$, $\boxplus$, ≫, ≪
– Branch instructions: goto, subroutine call, and return
– Comparison, load RAM in register, store register in RAM
According to the code in the appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, this directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. This yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
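The estimates above are simple closed-form expressions and can be tabulated for any parameter set. The sketch below only evaluates the formulas quoted in the text; the function name `sea_software_cost` is hypothetical, n_b = n/(2b) is assumed from the cipher's definition, and the round count n_r is passed in by the caller.

```python
def sea_software_cost(n: int, b: int, nr: int) -> dict:
    """Evaluate the rough cost estimates of the pseudo-assembly design.

    All sizes and times are expressed in 'ops', as in the text.
    """
    nb = n // (2 * b)                 # assumption: words per Feistel branch
    round_ops = 12 * nb + 11          # one encryption round
    key_round_ops = 10 * nb + 11      # one key round
    time_ops = ((nr - 1) * (round_ops + key_round_ops + 7)
                + round_ops + 8 * nb + 7)
    return {
        "ram_words_plus_registers": 5 * nb + 3,
        "registers": nb + 3,
        "code_size_ops": 31 * nb + 36,
        "time_ops": time_ops,
    }


if __name__ == "__main__":
    # Illustrative parameters: n = 48, b = 8 with nr = 51 rounds.
    for name, value in sea_software_cost(48, 8, 51).items():
        print(f"{name}: {value}")
```

With n = 48, b = 8 and n_r = 51, the time estimate evaluates to 50 × 95 + 47 + 24 + 7 = 4828 ops.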
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. SEA$_{n,b}$ trades space for time, which may be relevant given present processor speeds). Looking at the code size - cycles product, the efficiency of SEA$_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let p be a large (odd) prime and let $E: y^2 = x^3 + a_4x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where p does not divide $4a_4^3 + 27a_6^2$, E can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of E over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Müller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Müller are described in [15] and [16]. The implementation of Izu, Kogure, Noro, and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime p, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$ where p has 200 bits.
It is known that
$$\#E(\mathbb{F}_p) = p + 1 - t,$$
where t is an integer which satisfies the Hasse bound
$$-2\sqrt{p} \le t \le 2\sqrt{p}.$$
The algorithm works by calculating t modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial, depending on $\ell$, that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine t modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we succeeded in determining t.
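The final recombination step can be sketched in a few lines. This is only an illustration under stated assumptions: the hard part of the algorithm (computing t modulo each auxiliary prime) is not shown and is replaced here by a brute-force point count on a tiny curve, and the names `count_points_naive`, `crt` and `recover_trace` as well as the curve parameters are hypothetical.

```python
def count_points_naive(a4: int, a6: int, p: int) -> int:
    """#E(F_p) for y^2 = x^3 + a4*x + a6 by brute force (tiny p only)."""
    square_roots = {}
    for y in range(p):
        square_roots[y * y % p] = square_roots.get(y * y % p, 0) + 1
    total = 1  # the point at infinity
    for x in range(p):
        total += square_roots.get((x**3 + a4 * x + a6) % p, 0)
    return total


def crt(residues: dict) -> tuple:
    """Combine {prime: residue} into (t mod M, M), M = product of primes."""
    t, m = 0, 1
    for ell, r in residues.items():
        # choose k so that t + m*k is congruent to r modulo ell
        k = ((r - t) * pow(m, -1, ell)) % ell
        t += m * k
        m *= ell
    return t, m


def recover_trace(residues: dict, p: int) -> int:
    """Recover t from its residues, using the Hasse bound |t| <= 2*sqrt(p)."""
    t, m = crt(residues)
    if m * m <= 16 * p:  # equivalent to m <= 4*sqrt(p)
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")
    return t - m if t > m // 2 else t  # centred representative


if __name__ == "__main__":
    p, a4, a6 = 101, 2, 3                       # toy curve, illustration only
    t_true = p + 1 - count_points_naive(a4, a6, p)
    residues = {ell: t_true % ell for ell in (2, 3, 5, 7)}
    print(recover_trace(residues, p) == t_true)
```

Because the Hasse interval has width $4\sqrt{p}$, a modulus larger than $4\sqrt{p}$ leaves exactly one candidate for t in that interval, which is why the centred representative is the right one.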
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over $\mathbb{F}_p$, with the intention of finding an E with $\#E(\mathbb{F}_p) = xr$, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 25 so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine the associated polynomial in $\mathbb{Z}[F, J]$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store).
We start out with a given prime p and an elliptic curve $E: y^2 = x^3 + a_4x + a_6$
CH 5 SEA Architecture Block Diagram
[FIG 5.1: SEA architecture block diagram. Blocks: key computational block (Mod, SBox, BR, WR, XOR, Round Reg; key registers KeyReg0 to KeyReg9, each [95:0]); encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg) with cipher data register; decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg) with plain text data register; SMC. Signals: KeyI[95:0], KeyLd, DataI[95:0], DataLd, DataO[95:0], DataRd, Key0[95:0] to Key9[95:0], Clk, Rst, Ena, ED, Ovr.]
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
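The two key sources described above (a random generator seeded from system entropy, and a passphrase run through a hash-based derivation) can be sketched as follows. This is a minimal illustration, not the key generation used by the hardware design: the function names are hypothetical, the 96-bit length is assumed from the 96-bit key register of this design, and PBKDF2 with HMAC-SHA-256 is substituted here for the SHA-1 example mentioned in the text.

```python
import hashlib
import secrets

KEY_BYTES = 12  # assumption: 96-bit key, matching the 96-bit key register


def random_key(n_bytes: int = KEY_BYTES) -> bytes:
    """Draw a key from the operating system's entropy source."""
    return secrets.token_bytes(n_bytes)


def key_from_passphrase(passphrase: str, salt: bytes,
                        n_bytes: int = KEY_BYTES,
                        iterations: int = 100_000) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA-256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=n_bytes)


if __name__ == "__main__":
    salt = secrets.token_bytes(16)  # stored alongside the ciphertext
    print(random_key().hex())
    print(key_from_passphrase("correct horse battery staple", salt).hex())
```

The salt is not secret; it only ensures that the same passphrase does not always map to the same key.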
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key length; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret; graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also meant the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis began being used; cryptanalysis is now practically an art and science all on its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
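The one-time pad mentioned above is simple enough to sketch directly. This is a minimal illustration with hypothetical function names; it assumes the pad comes from the operating system's random source, is at least as long as the message, and is never reused, which are exactly the conditions the text lists.

```python
import secrets


def otp_keygen(length: int) -> bytes:
    """Generate a fresh pad of the given length in bytes."""
    return secrets.token_bytes(length)


def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with the pad; the same call encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, pad))


if __name__ == "__main__":
    message = b"attack at dawn"
    pad = otp_keygen(len(message))
    ciphertext = otp_xor(message, pad)
    print(otp_xor(ciphertext, pad) == message)
```

Encryption and decryption are the same operation because XOR-ing twice with the same pad restores the original bytes.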
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such showing can currently be made, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting too short keys, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file cxilinxbinvasukeyregngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0, 0%
  Number of Slices containing unrelated logic: 0 out of 0, 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86, 225% (OVERMAPPED)
  IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4, 25%
  Number of GCLKIOBs: 1 out of 4, 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision 116 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0, 0%
  Number of Slices containing unrelated logic: 0 out of 0, 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86, 225% (OVERMAPPED)
  IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4, 25%
  Number of GCLKIOBs: 1 out of 4, 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0, 0%
  Number of Slices containing unrelated logic: 0 out of 0, 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86, 225% (OVERMAPPED)
  IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4, 25%
  Number of GCLKIOBs: 1 out of 4, 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj
TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed
Ignore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes
Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltkeyreggt
Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary
inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized
=========================================================================HDL Synthesis Report
Macro Statistics Registers 1 96-bit register 1
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 195

Macro Statistics
Registers : 1
  96-bit register : 1

Cell Usage
BELS : 1
  LUT1 : 1
FlipFlops/Latches : 96
  FDCE : 96
Clock Buffers : 1
  BUFGP : 1
IO Buffers : 194
  IBUF : 98
  OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

Number of Slices : 55 out of 192 (28%)
Number of Slice Flip Flops : 96 out of 384 (25%)
Number of 4 input LUTs : 1 out of 384 (0%)
Number of bonded IOBs : 194 out of 90 (215%) (*)
Number of GCLKs : 1 out of 4 (25%)

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
      FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
      GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade : -6

Minimum period : No path found
Minimum input arrival time before clock : 7.962ns
Maximum output required time after clock : 6.788ns
Maximum combinational path delay : No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock Clk
Offset : 7.962ns (Levels of Logic = 1)
  Source : KeyEna (PAD)
  Destination : Dreg_95 (FF)
  Destination Clock : Clk rising

  Data Path : KeyEna to Dreg_95
                               Gate     Net
    Cell:in->out     fanout   Delay   Delay   Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O            96   0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                   0.886           Dreg_0
    ----------------------------------------
    Total                     7.962ns (1.662ns logic, 6.300ns route)
                                      (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock Clk
Offset : 6.788ns (Levels of Logic = 1)
  Source : Dreg_95 (FF)
  Destination : KeyO<95> (PAD)
  Source Clock : Clk rising

  Data Path : Dreg_95 to KeyO<95>
                               Gate     Net
    Cell:in->out     fanout   Delay   Delay   Logical Name (Net Name)
    ----------------------------------------  ------------
    FDCE:C->Q             1   1.085   1.035   Dreg_95 (Dreg_95)
    OBUF:I->O                 4.668           KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                     6.788ns (5.753ns logic, 1.035ns route)
                                      (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s

Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 7

Cell Usage
BELS : 3
  LUT4 : 3
IO Buffers : 7
  IBUF : 4
  OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

Number of Slices : 2 out of 192 (1%)
Number of 4 input LUTs : 3 out of 384 (0%)
Number of bonded IOBs : 7 out of 90 (7%)
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
  Number of 4 input LUTs : 3 out of 384 (1%)
Logic Distribution
  Number of occupied Slices : 2 out of 192 (1%)
  Number of Slices containing only related logic : 2 out of 2 (100%)
  Number of Slices containing unrelated logic : 0 out of 2 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs : 3 out of 384 (1%)
Number of bonded IOBs : 7 out of 86 (8%)

Total equivalent gate count for design : 18
Additional JTAG gate count for IOBs : 336
Peak Memory Usage : 56 MB

Mapping Report
Device utilization summary
  Number of External IOBs : 7 out of 86 (8%)
  Number of LOCed External IOBs : 0 out of 7 (0%)
  Number of SLICEs : 2 out of 192 (1%)

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is : 0
The AVERAGE CONNECTION DELAY for this design is : 0.871
The MAXIMUM PIN DELAY IS : 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is : 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
  Total Number Slice Registers : 419 out of 384 (109%) (OVERMAPPED)
    Number used as Flip Flops : 415
    Number used as Latches : 4
  Number of 4 input LUTs : 1016 out of 384 (264%) (OVERMAPPED)
Logic Distribution
  Number of occupied Slices : 665 out of 192 (346%) (OVERMAPPED)
  Number of Slices containing only related logic : 648 out of 665 (97%)
  Number of Slices containing unrelated logic : 17 out of 665 (2%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs : 1066 out of 384 (277%) (OVERMAPPED)
  Number used as logic : 1016
  Number used as a route-thru : 50
Number of bonded IOBs : 1060 out of 86 (1232%) (OVERMAPPED)
  IOB Flip Flops : 960
Number of GCLKs : 1 out of 4 (25%)
Number of GCLKIOBs : 1 out of 4 (25%)

Total equivalent gate count for design : 17572
Additional JTAG gate count for IOBs : 50928
Peak Memory Usage : 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor (word) sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications in which the same constrained device has to perform both encryption and decryption.
APPLICATIONS
Low-cost encryption on processors with a limited instruction set, the purpose for which the routine was originally designed.
Wireless communication, mobile computing, and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption plus decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC processor) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in the paper, the cipher is expected to be implementable in assembly within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This report presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
2.2 CAPABILITIES
The following are the major capabilities that the language provides, along with the features that differentiate it from other hardware description languages.
The language can be used as an exchange medium between chip vendors and CAD tool users. Different chip vendors can provide VHDL descriptions of their components to system designers.
The language can be used as a communication medium between different CAD and CAE tools.
The language supports hierarchy; that is, a digital system can be modeled as a set of interconnected components, and each component in turn can be modeled as a set of interconnected subcomponents.
The language supports flexible design methodologies: top-down, bottom-up, or mixed. It supports both synchronous and asynchronous timing models.
Various digital modeling techniques, such as finite-state machine descriptions and Boolean equations, can be modeled using the language.
The language is publicly available, human-readable, and machine-readable.
The language supports three basic description styles: structural, dataflow, and behavioral.
It supports a wide range of abstraction levels, ranging from abstract behavioral descriptions to very precise gate-level descriptions.
Arbitrarily large designs can be modeled using the language; the language imposes no limitations on the size of a design.
2.3 HARDWARE ABSTRACTION
VHDL is used to describe a model for a digital hardware device. This model specifies the external view of the device and one or more internal views. The internal view of the device specifies its functionality or structure, while the external view specifies the interface of the device through which it communicates with the other modules in its environment. In VHDL, each device model is treated as a distinct representation of a unique device, called an entity. The entity is thus a hardware abstraction of the actual hardware device. Each entity is described using one model, which contains one external view and one or more internal views.
2.4 Basic terminology
VHDL is a hardware description language that can be used to model a digital system. A hardware abstraction of this digital system is called an entity. An entity X, when used in another entity Y, becomes a component for the entity Y. To describe an entity, VHDL provides five different types of primary constructs, called design units. They are:
1. Entity declaration
2. Architecture body
3. Configuration declaration
4. Package declaration
5. Package body
1. An entity is modeled using an entity declaration and at least one architecture body. The entity declaration describes the external view of the entity, for example the input and output signal names.
2. The architecture body contains the internal description of the entity, for example as a set of interconnected components that represents the structure of the entity, or as a set of concurrent or sequential statements that represents the behavior of the entity.
3. A configuration declaration is used to create a configuration for an entity. It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity. It may also specify the bindings of the components used in the selected architecture body to other entities. An entity may have any number of configurations.
4. A package declaration encapsulates a set of related declarations, such as type declarations, subtype declarations, and subprogram declarations, which can be shared across two or more design units.
5. A package body contains the definitions of the subprograms declared in a package declaration.
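As an illustration of items 4 and 5, the following is a minimal sketch of a package declaration and its package body. The package name sea_util_pkg, the constant WORD_SIZE, the subtype word_t, and the function rotl are hypothetical names chosen for this example; they are not taken from the design described in this report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: shared constant, subtype, and subprogram declaration.
-- All names here are illustrative assumptions.
package sea_util_pkg is
    constant WORD_SIZE : positive := 8;  -- assumed word size
    subtype word_t is std_logic_vector(WORD_SIZE - 1 downto 0);

    -- Rotate a word left by one bit position.
    function rotl (w : word_t) return word_t;
end package sea_util_pkg;

-- Package body: definition of the subprogram declared above.
package body sea_util_pkg is
    function rotl (w : word_t) return word_t is
    begin
        -- The most significant bit wraps around to the least significant position.
        return w(WORD_SIZE - 2 downto 0) & w(WORD_SIZE - 1);
    end function rotl;
end package body sea_util_pkg;
```

Any design unit can then share these declarations through the clause "use work.sea_util_pkg.all;", assuming the package has been compiled into the working library.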
Once an entity has been modeled, it needs to be validated by a VHDL system. A typical VHDL system consists of an analyzer and a simulator. The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks.
The language is case insensitive; that is, lowercase and uppercase characters are treated alike. The language is also free format. Comments are specified by preceding the text with two consecutive dashes (--).
Entity Declaration
The entity declaration specifies the name of the entity being modeled and lists the set of interface ports. Ports are signals through which the entity communicates with other models in its external environment.
EXAMPLE
The entity declaration for the half adder circuit is:
    entity HALF_ADDER is
        port (A, B : in BIT; SUM, CARRY : out BIT);
    end HALF_ADDER;
The entity, called HALF_ADDER, has two input ports, A and B, and two output ports, SUM and CARRY. BIT is a predefined type of the language.
Architecture Body
The internal details of an entity are specified by an architecture body using any of the following modeling styles:
1. As a set of interconnected components (to represent structure)
2. As a set of concurrent assignment statements (to represent dataflow)
3. As a set of sequential assignment statements (to represent behavior)
4. As any combination of the above three
2.5 Structural style of modeling
In this style, an entity is described as a set of interconnected components. Such a model for the HALF_ADDER entity is described in an architecture body:
    architecture ha of HALF_ADDER is
        component XOR2
            port (X, Y : in BIT; Z : out BIT);
        end component;
        component AND2
            port (L, M : in BIT; N : out BIT);
        end component;
    begin
        X1: XOR2 port map (A, B, SUM);
        A1: AND2 port map (A, B, CARRY);
    end ha;
The name of the architecture body is ha. The entity declaration for HALF_ADDER specifies the interface ports for this architecture body. The architecture body is composed of two parts: the declarative part and the statement part. Two component declarations are present in the declarative part of the architecture body.
The declared components are instantiated in the statement part of the architecture body using component instantiation statements. The signals in the port map of a component instantiation and the port signals in the component declaration are associated by position.
DATAFLOW STYLE OF MODELING
In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The dataflow model for the half adder is described using two concurrent signal assignment statements. In a signal assignment statement, the symbol <= implies an assignment of a value to a signal.
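The dataflow model of the half adder mentioned above can be sketched as follows. The architecture name HA_DATAFLOW is an assumed name, and the entity declaration is repeated only so that the example is self-contained.

```vhdl
-- Entity repeated here so that the example can be analyzed on its own.
entity HALF_ADDER is
    port (A, B : in BIT; SUM, CARRY : out BIT);
end HALF_ADDER;

-- Dataflow description: two concurrent signal assignment statements.
-- The architecture name HA_DATAFLOW is an illustrative assumption.
architecture HA_DATAFLOW of HALF_ADDER is
begin
    SUM   <= A xor B;  -- re-evaluated whenever A or B changes
    CARRY <= A and B;  -- re-evaluated whenever A or B changes
end HA_DATAFLOW;
```

The textual order of the two statements does not matter, because concurrent statements execute whenever one of the signals they read has an event.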
BEHAVIORAL STYLE OF MODELING
The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. This set of sequential statements, which is specified inside a process statement, does not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.
MIXED STYLE OF MODELING
It is possible to mix the three modeling styles in a single architecture body. That is, within an architecture body we could use component instantiation statements, concurrent signal assignment statements, and process statements.
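A minimal sketch of a mixed-style description is a one-bit full adder that combines one component instantiation, one process, and one concurrent signal assignment. The entity FULL_ADDER, the architecture FA_MIXED, and the helper entity XOR2 with its architecture DF are assumptions made for this example.

```vhdl
-- Helper gate, defined here so that the component below can be bound by default.
entity XOR2 is
    port (X, Y : in BIT; Z : out BIT);
end XOR2;

architecture DF of XOR2 is
begin
    Z <= X xor Y;
end DF;

entity FULL_ADDER is
    port (A, B, CIN : in BIT; SUM, COUT : out BIT);
end FULL_ADDER;

architecture FA_MIXED of FULL_ADDER is
    component XOR2
        port (X, Y : in BIT; Z : out BIT);
    end component;
    signal S1 : BIT;
begin
    -- Structural part: component instantiation.
    X1: XOR2 port map (A, B, S1);

    -- Behavioral part: process computing the carry output.
    process (A, B, CIN)
        variable T1, T2, T3 : BIT;
    begin
        T1 := A and B;
        T2 := B and CIN;
        T3 := A and CIN;
        COUT <= T1 or T2 or T3;
    end process;

    -- Dataflow part: concurrent signal assignment.
    SUM <= S1 xor CIN;
end FA_MIXED;
```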
MODEL ANALYSIS
Once an entity is described in VHDL, it can be validated using an analyzer and a simulator that are part of a VHDL system. The first step in the validation process is analysis. The analyzer takes a file that contains one or more design units and compiles them into an intermediate form. The generated intermediate form is stored in a specific design library that has been designated as the working library.
There is a design library with the logical name STD, predefined by the VHDL language environment. This library contains two packages, STANDARD and TEXTIO. The STANDARD package contains declarations for all the predefined types of the language. The TEXTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations. There is also an IEEE standard package called STD_LOGIC_1164, which contains a standard logic type together with its associated subtypes, overloaded operator functions, and other useful utilities. This standard is called IEEE Std 1164-1993.
SIMULATION
For a hierarchical entity to be simulated, all of its lowest-level components must be described at the behavioral level. A simulation can be performed on either one of the following:
1. An entity declaration and an architecture body pair
2. A configuration
Preceding the actual simulation are two major steps:
1. Elaboration phase: In this phase, the hierarchy of the entity is expanded and linked, components are bound to entities in a library, and the top-level entity is built as a network of behavioral models that is ready to be simulated.
2. Initialization phase: Driving and effective values for all explicitly declared signals are computed, implicit signals are assigned values, processes are executed once until they suspend, and simulation time is set to 0 ns.
Simulation commences by advancing time to that of the next event. Signals that are scheduled to receive values at this time are updated. If the value of a signal changes, and if that signal is present in the sensitivity list of a process, the process is executed until it suspends. Simulation stops when an assertion violation occurs, depending on the implementation of the VHDL system, or when the maximum time as defined by the language is reached.
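The simulation cycle described above can be exercised with a small self-checking test bench. The following is a sketch only: the test bench entity HALF_ADDER_TB, its architecture TEST, the dataflow architecture HA_DATAFLOW, and the 10 ns spacing between stimuli are all assumptions made for illustration.

```vhdl
-- Unit under test, repeated so that the example is self-contained.
entity HALF_ADDER is
    port (A, B : in BIT; SUM, CARRY : out BIT);
end HALF_ADDER;

architecture HA_DATAFLOW of HALF_ADDER is
begin
    SUM   <= A xor B;
    CARRY <= A and B;
end HA_DATAFLOW;

-- A test bench has no ports; it is the top-level entity of the simulation.
entity HALF_ADDER_TB is
end HALF_ADDER_TB;

architecture TEST of HALF_ADDER_TB is
    signal A, B, SUM, CARRY : BIT;
begin
    -- Simulating an entity declaration and architecture body pair.
    DUT: entity work.HALF_ADDER(HA_DATAFLOW)
        port map (A => A, B => B, SUM => SUM, CARRY => CARRY);

    STIMULUS: process
    begin
        A <= '0'; B <= '0'; wait for 10 ns;
        assert SUM = '0' and CARRY = '0' report "0 + 0 failed" severity error;
        A <= '1'; B <= '0'; wait for 10 ns;
        assert SUM = '1' and CARRY = '0' report "1 + 0 failed" severity error;
        A <= '1'; B <= '1'; wait for 10 ns;
        assert SUM = '0' and CARRY = '1' report "1 + 1 failed" severity error;
        wait;  -- suspend forever; no further events are scheduled
    end process STIMULUS;
end TEST;
```

Each "wait for" suspends the process, which lets the simulator advance time, update the driven signals, and re-evaluate the concurrent assignments inside the unit under test before the next assertion is checked.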
Entity Declaration
An entity declaration describes the external interface of the entity. It specifies the name of the entity, the names of the interface ports, their modes, and the types of the ports. The syntax for an entity declaration is:
    entity entity-name is
        [generic (list-of-generics-and-their-types);]
        [port (list-of-interface-port-names-and-their-types);]
        [entity-item-declarations]
    [begin
        entity-statements]
    end [entity] [entity-name];
The entity-name is the name of the entity, and the interface ports are the signals through which the entity passes information to and from its external environment. Each interface port can have one of the following modes:
1. in: The value of an input port can only be read within the entity model.
2. out: The value of an output port can only be updated within the entity model.
3. inout: The value of a bidirectional port can be read and updated within the entity model.
4. buffer: The value of a buffer port can be read and updated within the entity model. It cannot have more than one source.
Declarations that are placed in the entity declaration are common to all the design units that are associated with that entity declaration.
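The optional generic clause is what allows one description to serve several parameter sets. The following is a minimal sketch of a generic-width register with clock enable and asynchronous clear. The names Clk, KeyEna, KeyO, and Dreg echo signal names that appear in the key register synthesis report earlier in this document; the entity name key_reg, the generic WIDTH, and the ports Clr and KeyI are assumptions, and this is not the source code of the synthesized design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity key_reg is
    generic (WIDTH : positive := 96);  -- assumed generic; 96 matches the reported register size
    port (
        Clk    : in  std_logic;
        Clr    : in  std_logic;  -- asynchronous clear (assumed port name)
        KeyEna : in  std_logic;  -- clock enable
        KeyI   : in  std_logic_vector(WIDTH - 1 downto 0);  -- assumed port name
        KeyO   : out std_logic_vector(WIDTH - 1 downto 0)
    );
end entity key_reg;

architecture rtl of key_reg is
    signal Dreg : std_logic_vector(WIDTH - 1 downto 0);
begin
    process (Clk, Clr)
    begin
        if Clr = '1' then
            Dreg <= (others => '0');      -- clear takes priority over the clock
        elsif rising_edge(Clk) then
            if KeyEna = '1' then
                Dreg <= KeyI;             -- load only when enabled
            end if;
        end if;
    end process;

    KeyO <= Dreg;
end architecture rtl;
```

Instantiating the entity with "generic map (WIDTH => 48)", for example, would yield a 48-bit register from the same description without editing it.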
ARCHITECTURE BODY
An architecture body describes the internal view of an entity. It describes the functionality or the structure of the entity.
    architecture architecture-name of entity-name is
    begin
        concurrent-statements, which are:
            process-statement
            block-statement
            concurrent-signal-assignment-statement
            component-instantiation-statement
            generate-statement
    end [architecture] [architecture-name];
The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow, and sequential behavior.
Here we describe an entity by using the behavioral model. A process statement, which is a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.
2.6 PROCESS STATEMENT
A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is:
    [process-label:] process [(sensitivity-list)] [is]
    begin
        sequential-statements, which are:
            variable-assignment-statement
            signal-assignment-statement
            wait-statement
            if-statement
            case-statement
            loop-statement
            null-statement
            exit-statement
            next-statement
            assertion-statement
            report-statement
            procedure-call-statement
            return-statement
    end process [process-label];
The set of signals to which the process is sensitive is defined by the sensitivity list. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in sequential order, that is, in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
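A minimal sketch of a process with a sensitivity list is a behavioral 2-to-4 decoder with active-low outputs. The entity name DECODER2x4, the architecture name DEC_SEQUENTIAL, and the port names are illustrative assumptions.

```vhdl
entity DECODER2x4 is
    port (A, B, ENABLE : in BIT; Z : out BIT_VECTOR(0 to 3));
end DECODER2x4;

architecture DEC_SEQUENTIAL of DECODER2x4 is
begin
    -- The process is executed each time A, B, or ENABLE has an event.
    process (A, B, ENABLE)
        variable ABAR, BBAR : BIT;
    begin
        -- Statements execute in the order written.
        ABAR := not A;
        BBAR := not B;
        if ENABLE = '1' then
            Z(3) <= not (A and B);
            Z(0) <= not (ABAR and BBAR);
            Z(2) <= not (A and BBAR);
            Z(1) <= not (ABAR and B);
        else
            Z <= "1111";  -- all outputs inactive when disabled
        end if;
        -- The process suspends here until the next event on A, B, or ENABLE.
    end process;
end DEC_SEQUENTIAL;
```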
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form:
    variable-object := expression;
The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is:
    signal-object <= expression [after delay-value];
A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other sequential statements that appear within the process.
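The difference in timing between the two kinds of assignment can be checked with a small simulation-only sketch. The entity ASSIGN_DEMO, its architecture SIM, and the object names are hypothetical.

```vhdl
entity ASSIGN_DEMO is
end ASSIGN_DEMO;

architecture SIM of ASSIGN_DEMO is
    signal S : INTEGER := 0;
begin
    process
        variable V : INTEGER := 0;
    begin
        V := V + 1;  -- variable: new value is visible immediately
        S <= S + 1;  -- signal: new value is only scheduled at this point

        assert V = 1 report "variable was not updated" severity error;
        assert S = 0 report "signal was updated too early" severity error;

        wait for 1 ns;  -- suspending the process lets the signal update occur
        assert S = 1 report "signal was not updated" severity error;

        wait;  -- end of the demonstration
    end process;
end SIM;
```

None of the assertions should fire: the variable changes at once, while the signal keeps its old value until the process suspends.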
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is:
    if boolean-expression then
        sequential-statements
    [elsif boolean-expression then
        sequential-statements]
    [else
        sequential-statements]
    end if;
The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
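Because the conditions are checked in order, an if statement naturally describes priority logic. The following sketch of a 4-input priority encoder uses hypothetical names (PRIORITY_ENC, REQ, CODE, VALID).

```vhdl
entity PRIORITY_ENC is
    port (
        REQ   : in  BIT_VECTOR(3 downto 0);
        CODE  : out BIT_VECTOR(1 downto 0);
        VALID : out BIT
    );
end PRIORITY_ENC;

architecture BEHAVIOR of PRIORITY_ENC is
begin
    process (REQ)
    begin
        VALID <= '1';  -- default; overridden in the else branch
        -- Conditions are tested top to bottom; the first true one wins.
        if REQ(3) = '1' then
            CODE <= "11";
        elsif REQ(2) = '1' then
            CODE <= "10";
        elsif REQ(1) = '1' then
            CODE <= "01";
        elsif REQ(0) = '1' then
            CODE <= "00";
        else
            CODE  <= "00";
            VALID <= '0';  -- no request is active
        end if;
    end process;
end BEHAVIOR;
```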
CASE STATEMENT
The format of a case statement is:
    case expression is
        when choices => sequential-statements
        when choices => sequential-statements
    end case;
The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or of a one-dimensional array type. Choices may be expressed as single values, as a range of values, or with the "others" clause. The others clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
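A lookup table such as a small substitution box is a typical use of the case statement. The sketch below assumes the 3-bit table {0, 5, 6, 7, 4, 3, 1, 2}, recalled from the SEA specification in [1]; the values should be checked against that reference before use. The entity name sbox3 and the port names are hypothetical, and this is not the sbox8x3 source synthesized in this report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sbox3 is
    port (
        x : in  std_logic_vector(2 downto 0);
        y : out std_logic_vector(2 downto 0)
    );
end entity sbox3;

architecture behavior of sbox3 is
begin
    process (x)
    begin
        -- Table values are an assumption; verify against the cipher specification.
        case x is
            when "000"  => y <= "000";  -- 0 -> 0
            when "001"  => y <= "101";  -- 1 -> 5
            when "010"  => y <= "110";  -- 2 -> 6
            when "011"  => y <= "111";  -- 3 -> 7
            when "100"  => y <= "100";  -- 4 -> 4
            when "101"  => y <= "011";  -- 5 -> 3
            when "110"  => y <= "001";  -- 6 -> 1
            when "111"  => y <= "010";  -- 7 -> 2
            when others => y <= "XXX";  -- non-binary std_logic values
        end case;
    end process;
end architecture behavior;
```

The others branch is required here because std_logic has nine values per bit, so the eight binary choices alone do not cover every possible value of x.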
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax for a loop statement is:
    [loop-label:] iteration-scheme loop
        sequential-statements
    end loop [loop-label];
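With a "for" iteration scheme, the loop body is executed once for each value in a range. The following sketch computes the parity of an 8-bit vector; the entity name PARITY8 and its port names are assumptions made for illustration.

```vhdl
entity PARITY8 is
    port (D : in BIT_VECTOR(7 downto 0); ODD : out BIT);
end PARITY8;

architecture BEHAVIOR of PARITY8 is
begin
    process (D)
        variable P : BIT;
    begin
        P := '0';
        -- Iteration scheme: I takes each index of D in turn.
        for I in D'range loop
            P := P xor D(I);
        end loop;
        ODD <= P;  -- '1' when D contains an odd number of ones
    end process;
end BEHAVIOR;
```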
28 Active HDL Overview
Active-HDL is an integrated environment designed for development of VHDL Verilog EDIF and mixed VHDL-Verilog-EDIF designs It comprises three different design entry tools VHDL93 compiler Verilog compiler single simulation kernel several debugging tools graphical and textual simulation output viewers and auxiliary utilities designed for easy management of resource files designs and libraries
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard.
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0.
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with OVI Standard Delay Format Specification Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1. HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. The keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts and Tcl scripts.
2. Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3. State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4. Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5. Design Browser
The Design Browser window displays the contents of the current design, that is:
- Resource files attached to the design
- The contents of the default working library of the design
- The structure of the design unit selected for simulation
- VHDL, Verilog or EDIF objects declared within a selected region of the current design
6. Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable to the simulator. In Active-HDL, a source file can be one of the following:
- VHDL file (.vhd)
- Verilog file (.v)
- EDIF netlist file
- State diagram file (.asf)
- Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog or EDIF file containing the HDL code (or netlist) generated from the diagram.
A netlist is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, for VHDL, Verilog and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
- Event-Driven Simulation
- Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others are simulated in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
- HDL (VHDL, Verilog HDL, ABEL)
- Schematic design files
- EDIF
- NGC/NGO
- State Machines
- IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view and edit schematics and symbols for the Design Entry step of the Xilinx® design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input netlists as well as design constraint information into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a netlist (or netlists) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e. with small code size and memory) targeted for processors with a limited instruction set (i.e. AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, allows efficient combination of encryption/decryption and "on-the-fly" key derivation, and its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g. a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEA$_{n,b}$ is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). In contrast, SEA$_{n,b}$ makes it possible to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, alternative features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds, and is defined with respect to the following parameters:
- $n$: plaintext size, key size
- $b$: processor (or word) size
- $n_b = n/2b$: number of words per Feistel branch
- $n_r$: number of block cipher rounds
As the only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-bit (and larger) block ciphers, respectively denoted as SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$. Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
- Bit representation: $x_b = x(n/2-1)\, x(n/2-2) \ldots x(2)\, x(1)\, x(0)$
- Word representation: $x_W = x_{n_b-1}\, x_{n_b-2} \ldots x_2\, x_1\, x_0$
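The parameter constraint and the word representation can be illustrated with a minimal Python sketch. The function names (`sea_params`, `to_words`, `from_words`) are hypothetical, and the sketch assumes that $x_0$ is the least significant word, which matches the bit representation above but is not stated explicitly in the text.

```python
def sea_params(n, b):
    """Return nb = n / (2b), the number of b-bit words per Feistel branch."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    return n // (2 * b)


def to_words(x, nb, b):
    """Word representation [x_0, ..., x_{nb-1}] of an (n/2)-bit integer x.

    Assumption: x_0 is the least significant word.
    """
    mask = (1 << b) - 1
    return [(x >> (b * i)) & mask for i in range(nb)]


def from_words(words, b):
    """Inverse of to_words: rebuild the integer from its b-bit words."""
    return sum(w << (b * i) for i, w in enumerate(words))


nb = sea_params(48, 8)          # SEA_{48,8}: three 8-bit words per branch
branch = to_words(0x010203, nb, 8)
```

Representing a branch as a list of words keeps the later operations (S, R, r, modular addition) close to their word-wise definitions.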
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1. Bitwise XOR $\oplus$
The bitwise XOR is defined on $n/2$-bit vectors:
$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \rightarrow \mathbb{Z}_2^{n/2} : x, y \rightarrow z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$
2. Substitution Box S
SEA$_{n,b}$ uses the following 3-bit substitution table: $S_T = \{0, 5, 6, 7, 4, 3, 1, 2\}$, in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
$S : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow x = S(x) \Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i}$
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1}$
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}, \quad 0 \le i \le n_b/3 - 1$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
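The recursive (in-place) definition can be checked against the table with a short Python sketch. The function names are hypothetical; the sketch assumes that each assignment uses the already updated words, and that bit $j$ of a 3-bit table index is carried by word $x_j$.

```python
SBOX_TABLE = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox_words(x):
    """Bitslice S-box on a list of words whose length is a multiple of 3.

    The three assignments are sequential: each one sees the words
    already updated by the previous assignments.
    """
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def sbox_from_bitslice(v):
    """Apply the bitslice form to a single 3-bit value v."""
    x = [(v >> j) & 1 for j in range(3)]   # one-bit "words" x_0, x_1, x_2
    y = sbox_words(x)
    return y[0] | (y[1] << 1) | (y[2] << 2)


table_from_bitslice = [sbox_from_bitslice(v) for v in range(8)]
```

Under these assumptions `table_from_bitslice` reproduces $S_T$, so with $b$-bit words the same three instructions evaluate $b$ S-boxes in parallel.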
3. Word Rotation R
The word rotation is defined on $n_b$-word vectors:
$R : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow y = R(x) \Leftrightarrow y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \quad y_0 = x_{n_b-1}$
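As a minimal sketch, the word rotation and its inverse can be written on Python lists, where index $i$ holds word $x_i$. The names `word_rot` and `inv_word_rot` are hypothetical.

```python
def word_rot(x):
    """R: y_{i+1} = x_i for 0 <= i <= nb-2, and y_0 = x_{nb-1}."""
    return [x[-1]] + x[:-1]


def inv_word_rot(y):
    """R^{-1}: undo word_rot."""
    return y[1:] + [y[0]]


rotated = word_rot([10, 20, 30])
```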
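4. Bit Rotation r
The bit rotation is defined on $n_b$-word vectors:
$r : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le n_b/3 - 1$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.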
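The bit rotation can be sketched in Python as follows; `rotl1`, `rotr1` and `bit_rot` are hypothetical helper names, and words are assumed to be integers in the range $[0, 2^b)$.

```python
def rotl1(w, b):
    """Cyclic left shift by one bit inside a b-bit word."""
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)


def rotr1(w, b):
    """Cyclic right shift by one bit inside a b-bit word."""
    return ((w >> 1) | (w << (b - 1))) & ((1 << b) - 1)


def bit_rot(x, b):
    """r: in each block of three words, rotate word 3i right and word 3i+2 left."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = rotr1(y[i], b)
        y[i + 2] = rotl1(y[i + 2], b)
    return y


example = bit_rot([0x01, 0x55, 0x80], 8)
```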
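5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x, y \rightarrow z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$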
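A minimal Python sketch of the word-wise modular addition (the name `add_words` is hypothetical). Carries never propagate from one word to the next:

```python
def add_words(x, y, b):
    """Word-wise addition mod 2^b of two nb-word vectors."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]


total = add_words([200, 1, 255], [100, 2, 1], 8)
```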
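The Round and Key Round
Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Figure 3.1 and defined as the functions $F : \mathbb{Z}^2_{2^{n/2}} \times \mathbb{Z}_{2^{n/2}} \rightarrow \mathbb{Z}^2_{2^{n/2}}$ such that:
$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r(S(R_i \boxplus K_i)), \quad L_{i+1} = R_i$
$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}(L_i \oplus r(S(R_i \boxplus K_i))), \quad L_{i+1} = R_i$
$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R(r(S(KR_i \boxplus C_i))), \quad KL_{i+1} = KR_i$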
FIG 3.1 Encrypt/decrypt round and key round
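The three round functions can be sketched in Python from the definitions above. All function names are hypothetical, branches are lists of $b$-bit words with index $i$ holding word $x_i$, and the S-box assignments are assumed to be sequential. The sketch is meant to show how $F_D$ undoes $F_E$, not to serve as a reference implementation.

```python
def rotl1(w, b):
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)

def rotr1(w, b):
    return ((w >> 1) | (w << (b - 1))) & ((1 << b) - 1)

def sbox(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y

def word_rot(x):
    return [x[-1]] + x[:-1]

def inv_word_rot(x):
    return x[1:] + [x[0]]

def bit_rot(x, b):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i], y[i + 2] = rotr1(y[i], b), rotl1(y[i + 2], b)
    return y

def add(x, k, b):
    return [(xi + ki) & ((1 << b) - 1) for xi, ki in zip(x, k)]

def xor(x, y):
    return [xi ^ yi for xi, yi in zip(x, y)]

def enc_round(left, right, key, b):
    """F_E: R_{i+1} = R(L_i) xor r(S(R_i + K_i)), L_{i+1} = R_i."""
    return right, xor(word_rot(left), bit_rot(sbox(add(right, key, b)), b))

def dec_round(left, right, key, b):
    """F_D: R_{i+1} = R^{-1}(L_i xor r(S(R_i + K_i))), L_{i+1} = R_i."""
    return right, inv_word_rot(xor(left, bit_rot(sbox(add(right, key, b)), b)))

def key_round(kl, kr, const, b):
    """F_K: KR_{i+1} = KL_i xor R(r(S(KR_i + C_i))), KL_{i+1} = KR_i."""
    return kr, xor(kl, word_rot(bit_rot(sbox(add(kr, const, b)), b)))

# One encrypt round, then one decrypt round on the swapped halves.
L0, R0, K0 = [1, 2, 3], [4, 5, 6], [7, 8, 9]
L1, R1 = enc_round(L0, R0, K0, 8)
back = dec_round(R1, L1, K0, 8)   # yields (R0, L0), the swapped input
```

Applying the decrypt round to the swapped output of the encrypt round returns the swapped input, which is why the final swap in the complete cipher lets decryption reuse the same structure.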
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.
C = SEA_{n,b}(P, K)
{
    % initialization
    L_0 & R_0 = P;
    KL_0 & KR_0 = K;
    % key scheduling
    for i in 1 to ⌊n_r/2⌋
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i));
    switch KL_{⌊n_r/2⌋}, KR_{⌊n_r/2⌋};
    for i in ⌈n_r/2⌉ to n_r - 1
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(r - i));
    % encryption
    for i in 1 to ⌈n_r/2⌉
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1});
    for i in ⌈n_r/2⌉ + 1 to n_r
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1});
    % final
    C = R_{n_r} & L_{n_r};
    switch KL_{n_r-1}, KR_{n_r-1};
}
where & is the concatenation operator, $KR_{\lfloor n_r/2 \rfloor}$ is taken before the switch, and C(i) is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round $F_D$.
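The pseudo-C description can be turned into a compact Python sketch. It is an illustration under stated assumptions, not a validated implementation: no official test vectors are checked, all function names are hypothetical, plaintext and key are passed as pairs of word lists (left half, right half), $C(r - i)$ is read as $C(n_r - i)$, and $C(i)$ places $i$ in word $x_0$.

```python
def rotl1(w, b):
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)

def rotr1(w, b):
    return ((w >> 1) | (w << (b - 1))) & ((1 << b) - 1)

def sbox(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y

def word_rot(x):
    return [x[-1]] + x[:-1]

def inv_word_rot(x):
    return x[1:] + [x[0]]

def bit_rot(x, b):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i], y[i + 2] = rotr1(y[i], b), rotl1(y[i + 2], b)
    return y

def f(x, k, b):
    """Shared core r(S(x + k)) with word-wise addition mod 2^b."""
    return bit_rot(sbox([(xi + ki) & ((1 << b) - 1) for xi, ki in zip(x, k)]), b)

def xor(x, y):
    return [xi ^ yi for xi, yi in zip(x, y)]

def enc_round(left, right, key, b):
    return right, xor(word_rot(left), f(right, key, b))

def dec_round(left, right, key, b):
    return right, inv_word_rot(xor(left, f(right, key, b)))

def key_round(kl, kr, const, b):
    return kr, xor(kl, word_rot(f(kr, const, b)))

def round_keys(kl, kr, nr, b):
    """Round keys in order of use: KR_0..KR_h, then KL_{h+1}..KL_{nr-1}."""
    nb, h = len(kl), nr // 2
    const = lambda i: [i] + [0] * (nb - 1)       # C(i): only the LSW is non-zero
    keys = [kr]
    for i in range(1, h + 1):
        kl, kr = key_round(kl, kr, const(i), b)
        keys.append(kr)                          # KR_h is taken before the switch
    kl, kr = kr, kl                              # switch
    for i in range(h + 1, nr):
        kl, kr = key_round(kl, kr, const(nr - i), b)
        keys.append(kl)
    return keys

def sea(left, right, key, nr, b, decrypt=False):
    """Encrypt (or decrypt) one block given as two nb-word halves."""
    rnd = dec_round if decrypt else enc_round
    for k in round_keys(key[0], key[1], nr, b):
        left, right = rnd(left, right, k, b)
    return right, left                           # C = R_nr & L_nr

key = ([1, 2, 3], [4, 5, 6])
ct = sea([0, 0, 0], [0, 0, 1], key, nr=51, b=8)
pt = sea(*ct, key, nr=51, b=8, decrypt=True)     # recovers the plaintext halves
```

Because the round-key sequence produced this way reads the same forwards and backwards, decryption can reuse `round_keys` unchanged and only swaps the round function.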
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
- λ-parameter: 1/2
- δ-parameter: 1/4
- Maximum nonlinear order, namely 2
- Recursive definition
- Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher, when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48, b = 8, $n_b$ = 3. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b$ = 3, which yields 3 rounds). Once every word has at least one active bit, the combination of r and S yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds. The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ, then the linear and differential parameters of the 3-round cipher (Λ, Δ) are respectively smaller than $\lambda^2$, $\delta^2$.
Since our nonlinear function S has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, it is respectively required that $\Delta \simeq 2^{-n}$ and $\Lambda \simeq 2^{-n/2}$ to resist against differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that:
$\delta^{2n_r/3} = (2^{-2})^{2n_r/3} < 2^{-n}$ and $\lambda^{2n_r/3} = (2^{-1})^{2n_r/3} < 2^{-n/2}$ (1)
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of, e.g., the AES Rijndael [17], but we only assume one active s-box per round.
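The step from Equation (1) to the round bound can be made explicit. The following derivation only uses the values $\delta = 2^{-2}$ and $\lambda = 2^{-1}$ given above and compares exponents of 2:

```latex
\begin{align*}
\left(2^{-2}\right)^{2n_r/3} < 2^{-n}
  &\iff \frac{4 n_r}{3} > n
  \iff n_r > \frac{3n}{4}, \\
\left(2^{-1}\right)^{2n_r/3} < 2^{-n/2}
  &\iff \frac{2 n_r}{3} > \frac{n}{2}
  \iff n_r > \frac{3n}{4}.
\end{align*}
```

Both conditions therefore give the same threshold, about $3n/4$ rounds; for example, $n = 48$ gives 36 rounds for this part of the estimate.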
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and the boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \ll a$ the left rotation by a bits of each of the $n_b$ words of x. The non-linear and diffusion layers have the following properties:
- $S(x \ll a) = S(x) \ll a$
- $r(x \ll a) = r(x) \ll a$
- $R(x \ll a) = R(x) \ll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \ll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F^{\oplus}_E$ and the key round $F_K$ satisfy:
$F^{\oplus}_E(L \ll a, R \ll a, K \ll a) = F^{\oplus}_E(L, R, K) \ll a$
$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys K and $K \ll a$. In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for 1 < a < b − 1. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from K and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from being propagated through more than a few rounds of SEA$_{n,b}$.
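The probability p that rotation commutes with modular addition can be measured exhaustively for a single small word. The following Python sketch (the function names are hypothetical) counts, over all pairs (X, K) of b-bit words, how often $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ holds.

```python
from fractions import Fraction


def rotl(w, a, b):
    """Cyclic left rotation of a b-bit word w by a bits."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)


def rotation_commutes_probability(a, b):
    """Exact fraction of pairs (X, K) for which rotation and addition mod 2^b commute."""
    size = 1 << b
    hits = sum(
        1
        for x in range(size)
        for k in range(size)
        if (rotl(x, a, b) + rotl(k, a, b)) % size == rotl((x + k) % size, a, b)
    )
    return Fraction(hits, size * size)


p_8 = rotation_commutes_probability(1, 8)   # exhaustive over 65536 pairs
```

For a = 1 and b = 8 the measured value is already within about 0.003 of 3/8, which is consistent with the convergence claimed in the text.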
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not allow passing through more rounds than the diffusion pattern illustrated in Figure. It is expected that it remains the same when different parameters n and b are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, whose vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function f which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows one to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \Leftrightarrow \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
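The round-count rule can be evaluated with a small Python helper. The name `suggested_rounds` is hypothetical, and the sketch assumes the bracket in the formula is a floor and that the "add one if even" adjustment is applied to the minimum count.

```python
def suggested_rounds(n, b):
    """Minimum odd number of rounds: 3n/4 + 2*(nb + floor(b/2)), plus one if even."""
    if n % (6 * b) != 0 or b < 8:
        raise ValueError("expected n to be a multiple of 6b and b >= 8")
    nb = n // (2 * b)
    nr = 3 * n // 4 + 2 * (nb + b // 2)
    return nr + 1 if nr % 2 == 0 else nr


rounds_48_8 = suggested_rounds(48, 8)   # 36 + 2*(3 + 4) = 50, made odd
rounds_96_8 = suggested_rounds(96, 8)   # 72 + 2*(6 + 4) = 92, made odd
```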
3.5 Performance Analysis
SEA$_{n,b}$ is targeted for being implemented on low-cost processors, with little code size and a small instruction set. However, SEA$_{n,b}$'s simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
- Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\gg$, $\ll$.
- Branch instructions: goto, subroutine call and return.
- Comparison, load RAM in register, store register in RAM.
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
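The cost estimates above are simple functions of $n_b$ and $n_r$, so they can be tabulated with a short Python sketch. The function name and the returned tuple layout are hypothetical; the formulas are copied from the paragraph above and are not checked against Table 1.

```python
def sea_software_cost(n, b, nr):
    """Return (RAM words + registers, code size in ops, encryption time in ops)."""
    nb = n // (2 * b)
    memory_words = 5 * nb + 3
    code_ops = 31 * nb + 36
    round_ops = 12 * nb + 11
    key_round_ops = 10 * nb + 11
    time_ops = (nr - 1) * (round_ops + key_round_ops + 7) + round_ops + 8 * nb + 7
    return memory_words, code_ops, time_ops


# SEA_{48,8} with 51 rounds: nb = 3
cost_48_8 = sea_software_cost(48, 8, 51)
```

Since every term is linear in $n_b$, halving the number of words per branch (for example by moving from an 8-bit to a 16-bit word size at fixed n) roughly halves the per-round cost.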
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
- SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
- The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
- The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is in the number of cycles required for the encryption (i.e. SEA$_{n,b}$ trades space for time, which may be relevant due to present processor speeds). Looking at the code size-cycles product, the efficiency of SEA$_{n,b}$ remains similar to the one of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
31
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be used to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro, and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions, described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$ where $p$ has 200 bits.
It is known that $\#E(\mathbb{F}_p) = p + 1 - t$, where $t$ is an integer which satisfies the Hasse bound $-2\sqrt{p} \le t \le 2\sqrt{p}$.
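To make the relation between the point count, the trace t, and the Hasse bound concrete, the sketch below counts points by brute force on a toy curve. This is not the SEA algorithm; it is only a naive check that is feasible for very small p, and the function name and the example parameters (p = 101, a_4 = 2, a_6 = 3) are chosen purely for illustration.

```python
def count_points(a4: int, a6: int, p: int) -> int:
    """Naive count of points on y^2 = x^3 + a4*x + a6 over F_p (p an odd prime),
    including the point at infinity. Only practical for tiny p."""
    # square_roots[v] = number of y in F_p with y^2 = v
    square_roots = [0] * p
    for y in range(p):
        square_roots[y * y % p] += 1
    total = 1  # the point at infinity
    for x in range(p):
        total += square_roots[(x**3 + a4 * x + a6) % p]
    return total


p, a4, a6 = 101, 2, 3
assert (4 * a4**3 + 27 * a6**2) % p != 0  # the curve is nonsingular mod p
n_points = count_points(a4, a6, p)
t = p + 1 - n_points
assert t * t <= 4 * p  # Hasse bound: |t| <= 2*sqrt(p)
print(n_points, t)
```

The brute-force loop takes time proportional to p, which is why it is useless for 200-bit primes and a polynomial-time method is needed.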
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$ and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial $\Phi_\ell$ that is necessary for further calculations with this $\ell$. These polynomials $\Phi_\ell$ do not depend on the curve $E$ under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case we have succeeded in determining $t$.
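The final recombination step can be sketched as follows: given residues of t modulo pairwise coprime auxiliary primes whose product exceeds 4√p, the Chinese Remainder Theorem yields t modulo the product, and the Hasse bound then pins down t itself. The function names and the toy residues are hypothetical; they do not come from an actual SEA run.

```python
from math import prod


def crt(residues, moduli):
    """Combine t = r_i (mod m_i), for pairwise coprime m_i, into t mod prod(m_i)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m) is the inverse of Mi modulo m
    return x % M, M


def recover_trace(residues, moduli, p):
    """Recover t with |t| <= 2*sqrt(p), provided prod(moduli) > 4*sqrt(p)."""
    t, M = crt(residues, moduli)
    if M * M <= 16 * p:  # equivalent to M <= 4*sqrt(p), without floating point
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")
    # Choose the representative of t in the symmetric interval around zero.
    return t - M if t > M // 2 else t


# Toy example: p = 101, and suppose t = 2 (mod 3), 3 (mod 5), 0 (mod 7).
print(recover_trace([2, 3, 0], [3, 5, 7], 101))
```

Because the interval allowed by the Hasse bound is shorter than the product of the moduli, exactly one residue-class representative falls inside it.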
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$ with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 25 so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$ we need to determine a polynomial $\Phi_\ell(F, J) \in \mathbb{Z}[F, J]$. For $\ell \in A_s$ this is stored in the program. For $\ell \in A_l$ it must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime $p$ and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
[Figure: SEA architecture block diagram. Key computational block (Mod, SBox, BR, WR, XOR, Round Reg) with key registers KeyReg0[95:0] to KeyReg9[95:0], loaded from KeyI[95:0] under KeyLd, Clk, Rst. Encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg) with cipher data register, inputs Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst. Decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg) with plaintext data register, inputs Key0[95:0] to Key9[95:0], outputs DataO[95:0], DataRd, Clk, Rst. SMC control block with Clk, Rst, Ena, ED, Ovr.]
FIG 5.1
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt data.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. PRNGs that use system entropy as a seed generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
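The two key-generation routes described above can be sketched with the Python standard library. This is only an illustration, not the key generation block of the design: the function names, the salt, and the iteration count are assumptions, and PBKDF2 with SHA-256 is substituted for the SHA-1 mentioned in the text. The 12-byte length matches a 96-bit key.

```python
import hashlib
import secrets


def random_key(num_bytes: int = 12) -> bytes:
    """Draw a key from the operating system's entropy source."""
    return secrets.token_bytes(num_bytes)


def key_from_passphrase(passphrase: str, salt: bytes, num_bytes: int = 12) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256 (assumed parameters)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, 100_000, dklen=num_bytes
    )


key_a = random_key()
key_b = key_from_passphrase("correct horse battery staple", salt=b"example-salt")
print(len(key_a) * 8, len(key_b) * 8)  # key sizes in bits
```

A passphrase-derived key is reproducible from the passphrase and salt, whereas a randomly drawn key must be stored or exchanged.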
The simplest method to read encrypted data is a brute force attack: simply trying every possible key up to the maximum key length. Therefore it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
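The exponential growth of brute force effort with key length can be illustrated with a short calculation. The search rate of 10^12 keys per second is an arbitrary assumption for the sake of the example, and the function name is hypothetical.

```python
def brute_force_years(key_bits: int, keys_per_second: float) -> float:
    """Expected years for exhaustive key search (half the key space on average)."""
    seconds = (2 ** (key_bits - 1)) / keys_per_second
    return seconds / (365.25 * 24 * 3600)


# Assumed search rate: 1e12 keys per second.
for bits in (56, 96, 128):
    print(bits, f"{brute_force_years(bits, 1e12):.3e}")
```

Each additional key bit doubles the expected search time, so going from 56 to 128 bits multiplies the effort by 2^72.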
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret; graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is mathematical and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
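The message-plus-key model and the letter-replacement idea of a cipher can be illustrated with a classical shift cipher. This toy example is not SEA and offers no real security; the function name is hypothetical.

```python
import string


def shift_cipher(text: str, key: int) -> str:
    """Replace each letter by the letter `key` positions later in the alphabet."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    k = key % 26
    table = str.maketrans(
        lower + upper, lower[k:] + lower[:k] + upper[k:] + upper[:k]
    )
    return text.translate(table)  # non-letters are left unchanged


ciphertext = shift_cipher("attack at dawn", 3)   # encryption with key 3
plaintext = shift_cipher(ciphertext, -3)         # decryption reverses the shift
assert plaintext == "attack at dawn"
```

Encryption and decryption are the same routine applied with opposite keys, which mirrors the two directions described above.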
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
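A one-time pad is simple enough to sketch directly: the key is random, as long as the message, and combined with it by XOR. The function names are hypothetical, and the sketch ignores the hard part in practice, namely distributing the key and never reusing it.

```python
import secrets
from typing import Tuple


def otp_encrypt(message: bytes) -> Tuple[bytes, bytes]:
    """Encrypt with a fresh random key as long as the message; returns (key, ciphertext)."""
    key = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext


def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR the ciphertext with the same key to recover the message."""
    if len(key) != len(ciphertext):
        raise ValueError("key and ciphertext must have equal length")
    return bytes(c ^ k for c, k in zip(ciphertext, key))


key, ciphertext = otp_encrypt(b"attack at dawn")
assert otp_decrypt(key, ciphertext) == b"attack at dawn"
```

The key must be at least as long as the message, which is what makes the scheme impractical for most applications despite its theoretical strength.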
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the "work factor" in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
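The password or PIN case mentioned above can be illustrated with a toy model of an early-exit comparison. The number of byte comparisons performed stands in for running time; the function names are hypothetical, and `hmac.compare_digest` from the Python standard library is shown as one way to avoid the data-dependent exit.

```python
import hmac


def comparisons_before_exit(secret: bytes, guess: bytes) -> int:
    """Count byte comparisons made by an early-exit equality check.
    The count grows with the number of leading bytes guessed correctly,
    which is the information a timing attack exploits."""
    count = 0
    for s, g in zip(secret, guess):
        count += 1
        if s != g:
            break
    return count


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(a, b)


secret = b"1234"
for guess in (b"9999", b"1999", b"1299", b"1239"):
    print(guess, comparisons_before_exit(secret, guess))
```

An attacker who can measure this count (or the time it reflects) can recover the secret one byte at a time instead of searching the whole space.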
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file "c:/xilinx/bin/vasu/keyreg.ngc" ...
Reading component libraries for design expansion...
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file "keyreg.ngd" ...
Writing NGDBUILD log file "keyreg.bld"...
Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  (See NOTES below for an explanation of the effects of unrelated logic)
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  (See NOTES below for an explanation of the effects of unrelated logic)
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  (See NOTES below for an explanation of the effects of unrelated logic)
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed
Ignore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes
Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltkeyreggt
Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary
inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized
=========================================================================HDL Synthesis Report
Macro Statistics Registers 1 96-bit register 1
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28
========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg
Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 195
Macro Statistics Registers 1 96-bit register 1
Cell Usage:
  BELS: 1
    LUT1: 1
  FlipFlops/Latches: 96
    FDCE: 96
  Clock Buffers: 1
    BUFGP: 1
  IO Buffers: 194
    IBUF: 98
    OBUF: 96
=========================================================================
Device utilization summary:
---------------------------
Selected Device: 2s15cs144-6
  Number of Slices: 55 out of 192 28%
  Number of Slice Flip Flops: 96 out of 384 25%
  Number of 4 input LUTs: 1 out of 384 0%
  Number of bonded IOBs: 194 out of 90 215% (*)
  Number of GCLKs: 1 out of 4 25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary:
---------------
Speed Grade: -6
  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock 'Clk'
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising
  Data Path: KeyEna to Dreg_95
    Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
    -----------------------------------------------------------------------
    IBUF:I->O          96        0.776       6.300   KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                      0.886               Dreg_0
    -----------------------------------------------------------------------
    Total: 7.962ns (1.662ns logic, 6.300ns route) (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock 'Clk'
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising
  Data Path: Dreg_95 to KeyO<95>
    Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
    -----------------------------------------------------------------------
    FDCE:C->Q           1        1.085       1.035   Dreg_95 (Dreg_95)
    OBUF:I->O                    4.668               KeyO_95_OBUF (KeyO<95>)
    -----------------------------------------------------------------------
    Total: 6.788ns (5.753ns logic, 1.035ns route) (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized
========================================================================HDL Synthesis Report
Found no macro=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1
========================================================================= Final Report =========================================================================
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 7
Cell Usage:
  BELS: 3
    LUT4: 3
  IO Buffers: 7
    IBUF: 4
    OBUF: 3
===============================================================
Device utilization summary:
---------------------------
Selected Device: 2s15cs144-6
  Number of Slices: 2 out of 192 1%
  Number of 4 input LUTs: 3 out of 384 0%
  Number of bonded IOBs: 7 out of 90 7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  (See NOTES below for an explanation of the effects of unrelated logic)
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
  Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1,016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  (See NOTES below for an explanation of the effects of unrelated logic)
Total Number 4 input LUTs: 1,066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1,016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1,060 out of 86 1232% (OVERMAPPED)
  IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 17,572
Additional JTAG gate count for IOBs: 50,928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
                       Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : decryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : decryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : decryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : decryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in text, key and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It yields a small encryption routine for any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
Low-cost encryption on processors with a limited instruction set
Wireless communication, mobile computing and networking systems
Encryption of JPEG2000 images
Scalable video coding
Sensor networks and RFIDs
CONCLUSION
SEA_{n,b} is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEA_{n,b} allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudo-code provided in this paper, the cipher is expected to be implementable in assembly within a few hours. We note finally that the design criteria of SEA_{n,b} do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
2.4 Basic terminology
VHDL is a hardware description language that can be used to model a digital system. A hardware abstraction of this digital system is called an entity. An entity X, when used in another entity Y, becomes a component for the entity Y. To describe an entity, VHDL provides five different types of primary constructs, called design units. They are:
1. Entity declaration
2. Architecture body
3. Configuration declaration
4. Package declaration
5. Package body
An entity is modeled using an entity declaration and at least one architecture body. The design units are described as follows:
1. The entity declaration describes the external view of the entity, for example the input and output signal names.
2. The architecture body contains the internal description of the entity, for example as a set of interconnected components that represents the structure of the entity, or as a set of concurrent or sequential statements that represents the behavior of the entity.
3. A configuration declaration is used to create a configuration for an entity. It specifies the binding of one architecture body from the many architecture bodies that may be associated with the entity. It may also specify the bindings of the components used in the selected architecture body to other entities. An entity may have any number of configurations.
4. A package declaration encapsulates a set of related declarations, such as type declarations, subtype declarations and subprogram declarations, which can be shared across two or more design units.
5. A package body contains the definitions of the subprograms declared in a package declaration.
Once an entity has been modeled, it needs to be validated by a VHDL system. A typical VHDL system consists of an analyzer and a simulator. The analyzer reads in one or more design units contained in a single file and compiles them into a design library after validating the syntax and performing some static checks.
The language is case insensitive; that is, lowercase and uppercase characters are treated alike. The language is also free format. Comments are specified by preceding the text with two consecutive dashes (--).
Entity Declaration
The entity declaration specifies the name of the entity being modeled and lists the set of interface ports. Ports are signals through which the entity communicates with other models in its external environment.
EXAMPLE
The entity declaration for the half adder circuit is:

    entity HALF_ADDER is
      port (A, B : in BIT; SUM, CARRY : out BIT);
    end HALF_ADDER;

The entity, called HALF_ADDER, has two input ports, A and B, and two output ports, SUM and CARRY. BIT is a predefined type of the language.
Architecture Body
An architecture body specifies the internal details of an entity using any of the following modeling styles:
1. As a set of interconnected components (to represent structure)
2. As a set of concurrent assignment statements (to represent data flow)
3. As a set of sequential assignment statements (to represent behavior)
4. As any combination of the above three
2.5 Structural style of modeling
In this style, an entity is described as a set of interconnected components. Such a model for the HALF_ADDER entity is described in an architecture body:

    architecture ha of HALF_ADDER is
      component XOR2
        port (X, Y : in BIT; Z : out BIT);
      end component;
      component AND2
        port (L, M : in BIT; N : out BIT);
      end component;
    begin
      X1: XOR2 port map (A, B, SUM);
      A1: AND2 port map (A, B, CARRY);
    end ha;

The name of the architecture body is ha. The entity declaration for HALF_ADDER specifies the interface ports for this architecture body. The architecture body is composed of two parts: the declarative part and the statement part. Two component declarations are present in the declarative part of the architecture body.
The declared components are instantiated in the statement part of the architecture body using component instantiation statements. The signals in the port map of a component instantiation and the port signals in the component declaration are associated by position.
DATAFLOW STYLE OF MODELING
In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The data flow model for the half adder is described using two concurrent signal assignment statements. In a signal assignment statement, the symbol <= implies an assignment of a value to a signal.
BEHAVIORAL STYLE OF MODELING
The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. This set of sequential statements, which is specified inside a process statement, does not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.
MIXED STYLE OF MODELING
It is possible to mix the three modeling styles in a single architecture body. That is, within an architecture body we could use component instantiation statements, concurrent signal assignment statements and process statements.
MODEL ANALYSIS
Once an entity is described in VHDL, it can be validated using an analyzer and a simulator that are part of a VHDL system. The first step in the validation process is analysis. The analyzer takes a file that contains one or more design units and compiles them into an intermediate form. The generated intermediate form is stored in a specific design library that has been designated as the working library.
There is a design library with the logical name STD, predefined by the VHDL language environment. This library contains two packages, STANDARD and TEXTIO. The STANDARD package contains declarations for all the predefined types of the language. The TEXTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations. There also exists an IEEE standard package called STD_LOGIC_1164, which contains its associated subtypes, overloaded operator functions and other useful utilities. This standard is called IEEE Std 1164-1993.
SIMULATION
For a hierarchical entity to be simulated, all of its lowest-level components must be described at the behavioral level. A simulation can be performed on either one of the following:
1 An entity declaration and an architecture body pair
2 A configuration
Preceding the actual simulation are two major steps:
1. Elaboration phase: In this phase, the hierarchy of the entity is expanded and linked, components are bound to entities in a library, and the top-level entity is built as a network of behavioral models that is ready to be simulated.
2. Initialization phase: Driving and effective values for all explicitly declared signals are computed, implicit signals are assigned values, processes are executed once until they suspend, and simulation time is set to 0 ns.
Simulation commences by advancing time to that of the next event. Values that are scheduled to be assigned to signals at this time are assigned. If the value of a signal changes, and if that signal is present in the sensitivity list of a process, the process is executed until it suspends. Simulation stops when an assertion violation occurs, depending on the implementation of the VHDL system, or when the maximum time as defined by the language is reached.
Entity Declaration
An entity declaration describes the external interface of the entity. It specifies the name of the entity, the names of the interface ports, their modes and the types of the ports. The syntax for an entity declaration is:

    entity entity-name is
      [generic (list-of-generics-and-their-types);]
      [port (list-of-interface-port-names-and-their-types);]
      [entity-item-declarations]
    [begin
      entity-statements]
    end [entity] [entity-name];

The entity-name is the name of the entity, and the interface ports are the signals through which the entity passes information to and from its external environment. Each interface port can have one of the following modes:
1. in: The value of an input port can only be read within the entity model.
2. out: The value of an output port can only be updated within the entity model.
3. inout: The value of a bidirectional port can be read and updated within the entity model.
4. buffer: The value of a buffer port can be read and updated within the entity model. It cannot have more than one source.
Declarations that are placed in the entity declaration are common to all the design units that are associated with that entity declaration.
ARCHITECTURE BODY
An architecture body describes the internal view of an entity. It describes the functionality or the structure of the entity.
    architecture <architecture-name> of <entity-name> is
    begin
      concurrent-statements:
        process-statement
        block-statement
        concurrent-signal-assignment-statement
        component-instantiation-statement
        generate-statement
    end [architecture] [architecture-name];

The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow and sequential behavior.
Here we describe an entity by using the behavioral model. A process statement, which is a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.
2.6 PROCESS STATEMENT
A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is:

    [process-label:] process [(sensitivity-list)] [is]
    begin
      sequential-statements:
        variable-assignment-statement
        signal-assignment-statement
        wait-statement
        if-statement
        case-statement
        loop-statement
        null-statement
        exit-statement
        next-statement
        assertion-statement
        report-statement
        procedure-call-statement
        return-statement
    end process [process-label];

The set of signals to which the process is sensitive is defined by the sensitivity list. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in sequential order, that is, in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form:

    variable-object := expression;

The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is:

    signal-object <= expression [after delay-value];

A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other sequential statements that appear within the process.
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is:

    if boolean-expression then
      sequential-statements
    {elsif boolean-expression then
      sequential-statements}
    [else
      sequential-statements]
    end if;

The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
CASE STATEMENT
The format of a case statement is:

    case expression is
      when choices => sequential-statements
      when choices => sequential-statements
    end case;

The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or of a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by using the "others" clause. The "others" clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax of a loop statement is:

    [loop-label:] iteration-scheme loop
      sequential-statements
    end loop [loop-label];
2.8 Active HDL Overview
Active-HDL is an integrated environment designed for development of VHDL, Verilog, EDIF and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with OVI Standard Delay Format Specification Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1. HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. The keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts and Tcl scripts.
2. Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3. State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4. Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5. Design Browser
The Design Browser window displays the contents of the current design, that is:
- Resource files attached to the design
- The contents of the default working library of the design
- The structure of the design unit selected for simulation
- VHDL, Verilog or EDIF objects declared within a selected region of the current design
6. Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable to the simulator. In Active-HDL, a source file can be one of the following:
- VHDL file (.vhd)
- Verilog file (.v)
- EDIF net list file
- State diagram file (.asf)
- Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog or EDIF file containing the HDL code (or net list) generated from the diagram.
A net list is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, for VHDL, Verilog and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired
The Active-HDL simulator provides two simulation engines:
- Event-Driven Simulation
- Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others are simulated in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
- HDL (VHDL, Verilog HDL, ABEL)
- Schematic design files
- EDIF
- NGC/NGO
- State Machines
- IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view and edit schematics and symbols for the Design Entry step of the Xilinx design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input net lists, as well as design constraint information, into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a net list (or net lists) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macro cell details, equations and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e. with small code size and memory) targeted for processors with a limited instruction set (i.e. AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, allows efficient combination of encryption/decryption and "on-the-fly" key derivation, and its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g. a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEA_{n,b} is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). In contrast, SEA_{n,b} yields a small encryption routine targeted to any given processor, with the security of the cipher adapted as a function of its key size. Beyond these general guidelines, additional features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEA_{n,b} makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA_{n,b} constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA_{n,b} operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds and is defined with respect to the following parameters:
- $n$: plaintext size, key size
- $b$: processor (or word) size
- $n_b = \frac{n}{2b}$: number of words per Feistel branch
- $n_r$: number of block cipher rounds
As the only constraint, $n$ must be a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96- or 144-bit block ciphers, respectively denoted SEA_{48,8}, SEA_{96,8} and SEA_{144,8}. Let $x$ be an $n/2$-bit vector. In the following, we consider two representations:
- Bit representation: $x_b = x(n/2-1)\; x(n/2-2) \,\dots\, x(2)\; x(1)\; x(0)$
- Word representation: $x_W = x_{n_b-1}\; x_{n_b-2} \,\dots\, x_2\; x_1\; x_0$
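As a quick check of the parameter rules above, the following Python sketch derives $n_b$ and splits an $n/2$-bit value into its word representation. The function names are illustrative only, and taking $x_0$ as the word that holds the $b$ least significant bits is an assumption (it matches the bit representation above, where $x(0)$ is written last).

```python
def words_per_branch(n, b):
    """Return n_b = n / (2b), after checking that n is a multiple of 6b."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    return n // (2 * b)


def to_words(x, n, b):
    """Split an n/2-bit integer into n_b words, listed from x_0 upwards.

    Assumption: x_0 holds the b least significant bits of x.
    """
    nb = words_per_branch(n, b)
    mask = (1 << b) - 1
    return [(x >> (b * i)) & mask for i in range(nb)]


if __name__ == "__main__":
    for n in (48, 96, 144):
        print(f"SEA_{{{n},8}}: n_b = {words_per_branch(n, 8)}")
    # 24-bit branch of SEA_{48,8}: x_2 = 0x0A, x_1 = 0x0B, x_0 = 0x0C
    print(to_words(0x0A0B0C, 48, 8))
```

Representing a branch as a Python list indexed by word number keeps the later word-level definitions ($x_{3i}$, $x_{3i+1}$, ...) directly readable as list indices.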
Basic Operations
Due to its simplicity constraints, SEA_{n,b} is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
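1. Bitwise XOR $\oplus$
The bitwise XOR is defined on $n/2$-bit vectors:
\[ \oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \mapsto z = x \oplus y \;\Leftrightarrow\; z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1 \]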
2. Substitution Box S
SEA_{n,b} uses the following 3-bit substitution table:
\[ S_T = \{0, 5, 6, 7, 4, 3, 1, 2\} \]
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
\[ S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \mapsto x = S(x) \;\Leftrightarrow\; \begin{cases} x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i} \\ x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1} \\ x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2} \end{cases} \qquad 0 \le i \le \frac{n_b}{3} - 1 \]
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
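The recursive definition can be checked against the table $S_T$ with a short Python sketch. It assumes that "recursive" means each assignment reuses the words already updated by the previous ones, and that $x_{3i}$ carries the least significant bit of each 3-bit table index; the function names are illustrative.

```python
SBOX_TABLE = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox(words, b):
    """Apply S in bitslice form to a list of n_b words (n_b a multiple of 3)."""
    mask = (1 << b) - 1
    x = list(words)
    for i in range(0, len(x), 3):
        # Each line reuses the words already updated by the previous lines.
        x[i] = ((x[i + 2] & x[i + 1]) ^ x[i]) & mask
        x[i + 1] = ((x[i + 2] & x[i]) ^ x[i + 1]) & mask
        x[i + 2] = ((x[i] | x[i + 1]) ^ x[i + 2]) & mask
    return x


def sbox_via_table(words, b):
    """Reference version: look up every 3-bit column in SBOX_TABLE."""
    out = [0] * len(words)
    for i in range(0, len(words), 3):
        for j in range(b):
            v = sum(((words[i + k] >> j) & 1) << k for k in range(3))
            s = SBOX_TABLE[v]
            for k in range(3):
                out[i + k] |= ((s >> k) & 1) << j
    return out


if __name__ == "__main__":
    # Bit position j of these three words encodes the table index j.
    data = [0xAA, 0xCC, 0xF0]
    print(sbox(data, 8) == sbox_via_table(data, 8))
```

With these conventions the three logical operations per block reproduce all eight table entries at once, which is the efficiency argument behind the bitslice form.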
3. Word Rotation R
The word rotation is defined on $n_b$-word vectors:
\[ R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \mapsto y = R(x) \;\Leftrightarrow\; y_{i+1} = x_i, \;\; 0 \le i \le n_b - 2, \qquad y_0 = x_{n_b-1} \]
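A minimal Python sketch of $R$ and its inverse, assuming a branch is stored as a list whose index $i$ holds word $x_i$ (function names are illustrative):

```python
def word_rot(x):
    """R: y_0 = x_{n_b-1} and y_{i+1} = x_i for 0 <= i <= n_b - 2."""
    return [x[-1]] + x[:-1]


def word_rot_inv(y):
    """R^{-1}: undo word_rot."""
    return y[1:] + [y[0]]


if __name__ == "__main__":
    x = [0x11, 0x22, 0x33]          # x_0, x_1, x_2
    print(word_rot(x))              # x_2 moves to position 0
    print(word_rot_inv(word_rot(x)) == x)
```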
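4. Bit Rotation r
The bit rotation is defined on $n_b$-word vectors:
\[ r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \mapsto y = r(x) \;\Leftrightarrow\; y_{3i} = x_{3i} \ggg 1, \;\; y_{3i+1} = x_{3i+1}, \;\; y_{3i+2} = x_{3i+2} \lll 1, \qquad 0 \le i \le \frac{n_b}{3} - 1 \]
where $\ggg$ and $\lll$ represent the cyclic right and left shifts inside a word.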
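The bit rotation can be sketched in Python as follows, again assuming a list of $b$-bit words indexed from $x_0$; the helper names are illustrative.

```python
def rotr1(w, b):
    """Cyclic right shift by one bit inside a b-bit word."""
    return ((w >> 1) | (w << (b - 1))) & ((1 << b) - 1)


def rotl1(w, b):
    """Cyclic left shift by one bit inside a b-bit word."""
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)


def bit_rot(x, b):
    """r: rotate x_{3i} right, leave x_{3i+1} unchanged, rotate x_{3i+2} left."""
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = rotr1(x[i], b)
        y[i + 2] = rotl1(x[i + 2], b)
    return y


if __name__ == "__main__":
    print([hex(w) for w in bit_rot([0x01, 0x55, 0x80], 8)])
```

In the usage line, the single set bit of the first word wraps around to the top of the word and the single set bit of the third word wraps around to the bottom, which shows that the shifts are cyclic rather than logical.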
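5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
\[ \boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \mapsto z = x \boxplus y \;\Leftrightarrow\; z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1 \]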
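Since the addition is defined word by word, no carry propagates from one word to the next. A minimal Python sketch, with an illustrative function name:

```python
def add_mod(x, y, b):
    """Word-wise addition mod 2^b; carries never cross word boundaries."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]


if __name__ == "__main__":
    # 0xFF + 0x01 wraps to 0x00 and does not disturb the next word.
    print([hex(w) for w in add_mod([0xFF, 0x01, 0x02], [0x01, 0x01, 0x01], 8)])
```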
The Round and Key Round
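Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Fig. 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^{2} \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^{2}$ such that:
\[ [L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i \]
\[ [L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i \]
\[ [KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \;\Leftrightarrow\; KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i \]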
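The three round functions can be sketched in Python by composing the basic operations. The sketch assumes branches are lists of $b$-bit words indexed from the least significant word and that $S$ reuses already-updated words; all function names are illustrative. The usage example checks that $F_D$ undoes $F_E$ once the two output halves are exchanged.

```python
def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def add(x, y, b):
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, y)]


def sbox(x, b):
    x, mask = list(x), (1 << b) - 1
    for i in range(0, len(x), 3):
        x[i] = ((x[i + 2] & x[i + 1]) ^ x[i]) & mask
        x[i + 1] = ((x[i + 2] & x[i]) ^ x[i + 1]) & mask
        x[i + 2] = ((x[i] | x[i + 1]) ^ x[i + 2]) & mask
    return x


def bit_rot(x, b):
    y, mask = list(x), (1 << b) - 1
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & mask
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask
    return y


def word_rot(x):
    return [x[-1]] + x[:-1]


def word_rot_inv(x):
    return x[1:] + [x[0]]


def f(right, key, b):
    """Common core r(S(right [+] key))."""
    return bit_rot(sbox(add(right, key, b), b), b)


def round_enc(L, R, K, b):
    """F_E: returns (L_{i+1}, R_{i+1})."""
    return R, xor(word_rot(L), f(R, K, b))


def round_dec(L, R, K, b):
    """F_D: returns (L_{i+1}, R_{i+1})."""
    return R, word_rot_inv(xor(L, f(R, K, b)))


def round_key(KL, KR, C, b):
    """F_K: returns (KL_{i+1}, KR_{i+1})."""
    return KR, xor(KL, word_rot(f(KR, C, b)))


if __name__ == "__main__":
    L0, R0, K = [1, 2, 3], [4, 5, 6], [7, 8, 9]
    L1, R1 = round_enc(L0, R0, K, 8)
    print(round_dec(R1, L1, K, 8) == (R0, L0))
```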
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.

    C = SEA_{n,b}(P, K)
    {
      // initialization
      L_0 & R_0 = P;
      KL_0 & KR_0 = K;
      // key scheduling
      for i in 1 to floor(n_r/2)
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i));
      switch KL_{floor(n_r/2)}, KR_{floor(n_r/2)};
      for i in ceil(n_r/2) to n_r - 1
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(n_r - i));
      // encryption
      for i in 1 to ceil(n_r/2)
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1});
      for i in ceil(n_r/2) + 1 to n_r
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1});
      // final
      C = R_{n_r} & L_{n_r};
      switch KL_{n_r-1}, KR_{n_r-1};
    }

where & is the concatenation operator, KR_{floor(n_r/2)} is taken before the switch, and C(i) is an $n_b$-word vector of which all the words have value 0 except the least significant word (LSW), which equals i. Decryption is exactly the same, using the decrypt round $F_D$.
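The pseudo-code can be turned into a runnable Python sketch as shown below. Several points are assumptions rather than facts taken from the text: branches are lists of $b$-bit words with index 0 as the LSW, $L_0$ and $KL_0$ are taken as the first halves of P and K, the round count 51 in the usage example is only an illustrative odd value, and the output has not been checked against any reference test vectors. What the sketch does verify is the internal consistency of the construction: decryption recovers the plaintext.

```python
def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]

def add(x, y, b):
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, y)]

def sbox(x, b):
    x, mask = list(x), (1 << b) - 1
    for i in range(0, len(x), 3):
        x[i] = ((x[i + 2] & x[i + 1]) ^ x[i]) & mask
        x[i + 1] = ((x[i + 2] & x[i]) ^ x[i + 1]) & mask
        x[i + 2] = ((x[i] | x[i + 1]) ^ x[i + 2]) & mask
    return x

def bit_rot(x, b):
    y, mask = list(x), (1 << b) - 1
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & mask
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask
    return y

def word_rot(x):
    return [x[-1]] + x[:-1]

def word_rot_inv(x):
    return x[1:] + [x[0]]

def f(right, key, b):
    return bit_rot(sbox(add(right, key, b), b), b)

def round_enc(L, R, K, b):
    return R, xor(word_rot(L), f(R, K, b))

def round_dec(L, R, K, b):
    return R, word_rot_inv(xor(L, f(R, K, b)))

def round_key(KL, KR, C, b):
    return KR, xor(KL, word_rot(f(KR, C, b)))

def round_keys(KL, KR, nr, b):
    """Return the nr round keys in order of use and the final (KL, KR)."""
    half = nr // 2
    const = lambda i: [i] + [0] * (len(KL) - 1)   # C(i): LSW = i, rest 0
    keys = [KR]                                   # round 1 uses KR_0
    for i in range(1, half + 1):
        KL, KR = round_key(KL, KR, const(i), b)
        keys.append(KR)                           # KR_i, taken before the switch
    KL, KR = KR, KL                               # switch
    for i in range(half + 1, nr):
        KL, KR = round_key(KL, KR, const(nr - i), b)
        keys.append(KL)                           # second half uses KL_i
    KL, KR = KR, KL                               # final switch
    return keys, (KL, KR)

def sea(block, key, nr, b, decrypt=False):
    """Encrypt (or decrypt) block = (L0, R0) under key = (KL0, KR0)."""
    if nr % 2 == 0:
        raise ValueError("the number of rounds must be odd")
    L, R = block
    keys, _ = round_keys(key[0], key[1], nr, b)
    step = round_dec if decrypt else round_enc
    for K in keys:
        L, R = step(L, R, K, b)
    return R, L                                   # C = R_nr & L_nr

if __name__ == "__main__":
    P = ([1, 2, 3], [4, 5, 6])
    K = ([7, 8, 9], [10, 11, 12])
    C = sea(P, K, 51, 8)
    print(C)
    print(sea(C, K, 51, 8, decrypt=True) == P)
```

Because the same `round_keys` routine serves both directions, a hardware or assembly implementation only needs to select between $F_E$ and $F_D$, which is the property the text relies on for combined encryption/decryption.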
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
- λ-parameter: 1/2
- δ-parameter: 1/4
- Maximum nonlinear order, namely 2
- Recursive definition
- Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
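The λ and δ values and the two fixed points can be recomputed from the table with a short Python sketch. The definitions used here are assumptions, since the original footnotes are not part of this text: δ is taken as the largest differential probability over non-zero input differences, and λ as the largest value of $|2p - 1|$ over linear approximations with a non-zero output mask.

```python
SBOX_TABLE = [0, 5, 6, 7, 4, 3, 1, 2]


def parity(v):
    """Parity of the bits of v (XOR of all bits)."""
    return bin(v).count("1") & 1


def delta_parameter(s):
    """Largest differential probability over non-zero input differences."""
    size, best = len(s), 0
    for a in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[s[x ^ a] ^ s[x]] += 1
        best = max(best, max(counts))
    return best / size


def lambda_parameter(s):
    """Largest |2p - 1| over linear approximations with non-zero output mask."""
    size, best = len(s), 0
    for in_mask in range(size):
        for out_mask in range(1, size):
            corr = sum(
                1 if parity(in_mask & x) == parity(out_mask & s[x]) else -1
                for x in range(size)
            )
            best = max(best, abs(corr))
    return best / size


if __name__ == "__main__":
    print("delta  =", delta_parameter(SBOX_TABLE))
    print("lambda =", lambda_parameter(SBOX_TABLE))
    print("fixed points:", [x for x in range(8) if SBOX_TABLE[x] == x])
```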
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. This is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48, b = 8, $n_b = 3$. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of r and S yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ ($\boxplus$)
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of R, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ, then the linear and differential parameters of the 3-round cipher, Λ and Δ, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function S has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, it is respectively required that $\Delta \leq 2^{-n}$ and $\Lambda \leq 2^{-n/2}$ to resist against differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that:
$\delta^{2n_r/3} = (2^{-2})^{2n_r/3} < 2^{-n}$ and $\lambda^{2n_r/3} = (2^{-1})^{2n_r/3} < 2^{-n/2}$ (1)
In both cases, the required number of rounds is $n_r \geq 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
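For completeness, the step from Equation (1) to the bound on the number of rounds can be written out as follows. This is only a worked restatement of the two inequalities, using the values $\delta = 2^{-2}$ and $\lambda = 2^{-1}$ given in the text.

```latex
\begin{align*}
(2^{-2})^{2n_r/3} < 2^{-n}
  &\iff \tfrac{4 n_r}{3} > n
  \iff n_r > \tfrac{3n}{4}, \\
(2^{-1})^{2n_r/3} < 2^{-n/2}
  &\iff \tfrac{2 n_r}{3} > \tfrac{n}{2}
  \iff n_r > \tfrac{3n}{4}.
\end{align*}
```

For instance, n = 48 gives $n_r > 36$ from either inequality.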
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \lll a$ the left rotation by a bits of each of the $n_b$ words of x. The non-linear and diffusion layers have the following properties:
- $S(x \lll a) = S(x) \lll a$
- $r(x \lll a) = r(x) \lll a$
- $R(x \lll a) = R(x) \lll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \lll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F'_E$ and the key round $F_K$ satisfy:
$F'_E(L \lll a, R \lll a, K \lll a) = F'_E(L, R, K) \lll a$
$F_K(KL \lll a, KR \lll a, 0) = F_K(KL, KR, 0) \lll a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys K and $K \lll a$.
In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \lll a) \boxplus (K \lll a) = (X \boxplus K) \lll a$ is prevented by certain carry propagations, it only holds with a probability p, which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for 1 < a < b - 1. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \lll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from K and those derived from $K \lll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA$_{n,b}$.
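The probability p mentioned above can be estimated exhaustively for a single small word. The following sketch counts, over all pairs (X, K) of b-bit words, how often rotation commutes with addition mod $2^b$; the function names are hypothetical and the single-word setting is a simplifying assumption.

```python
def rotl(w, a, b):
    """Rotate the b-bit word w left by a bits."""
    mask = (1 << b) - 1
    return ((w << a) | (w >> (b - a))) & mask

def rotation_commutes_probability(a, b):
    """Fraction of pairs (x, k) with (x <<< a) + (k <<< a) == (x + k) <<< a mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            if lhs == rhs:
                hits += 1
    return hits / (1 << (2 * b))

p1 = rotation_commutes_probability(1, 8)   # close to 3/8 for b = 8
p2 = rotation_commutes_probability(2, 8)   # smaller than p1
```

For b = 8 the loop visits 65536 pairs per call, so it runs quickly; larger word sizes would call for sampling rather than exhaustive enumeration.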
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that this remains the same when different parameters n and b are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, whose vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function f which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference makes it possible to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \Leftrightarrow \overline{P} \xrightarrow{\overline{K}} \overline{C}$. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \geq 8$ bits.
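The round count suggested above is easy to evaluate for a given parameter set. The helper below is a minimal sketch of that arithmetic; the function name is hypothetical, and it assumes $n_b = n/(2b)$ and that 3n/4 is an integer for the parameters used.

```python
def suggested_rounds(n, b):
    """Minimum suggested number of rounds 3n/4 + 2*(nb + floor(b/2)), made odd."""
    nb = n // (2 * b)                      # words per Feistel branch (assumed n / 2b)
    nr = 3 * n // 4 + 2 * (nb + b // 2)
    return nr if nr % 2 == 1 else nr + 1   # nr has to be odd

rounds_48_8 = suggested_rounds(48, 8)      # 36 + 2 * (3 + 4) = 50, made odd: 51
rounds_96_8 = suggested_rounds(96, 8)      # 72 + 2 * (6 + 4) = 92, made odd: 93
```

Doubling the result, as in the more conservative approach mentioned in the text, would again require adding one to keep the count odd.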
3.5 Performance Analysis
SEA$_{n,b}$ is targeted at low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
- Arithmetic and logic operators: OR, AND, $\oplus$, $\boxplus$, ≫, ≪
- Branch instructions: goto, subroutine call and return
- Comparison, load RAM in register, store register in RAM
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
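The estimates above can be evaluated for a concrete parameter set. The following sketch simply transcribes the formulas from the text; the function name and the returned tuple layout are hypothetical.

```python
def sea_cost(n, b, nr):
    """Return (RAM words + registers, code size in ops, encryption time in ops)."""
    nb = n // (2 * b)                  # words per branch (assumed n / 2b)
    memory = 5 * nb + 3
    code_size = 31 * nb + 36
    round_ops = 12 * nb + 11           # one encrypt/decrypt round
    key_round_ops = 10 * nb + 11       # one key round
    total_ops = ((nr - 1) * (round_ops + key_round_ops + 7)
                 + round_ops + 8 * nb + 7)
    return memory, code_size, total_ops

memory, code_size, total_ops = sea_cost(48, 8, 51)
```

For n = 48, b = 8 and 51 rounds this gives 18 memory words, 129 ops of code and 4828 ops per encryption, which are formula values, not measurements.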
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
- SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
- The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
- The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is in the number of cycles required for the encryption (i.e. SEA$_{n,b}$ trades space for time, which may be relevant given present processor speeds). Looking at the code size × cycles product, the efficiency of SEA$_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH 4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let p be a large (odd) prime and let $E : y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where p does not divide $4a_4^3 + 27a_6^2$, E can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of E over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Müller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Müller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic, and for a 200-bit prime p succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm implemented on a typical personal computer takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where p has 200 bits.
It is known that
$\#E(\mathbb{F}_p) = p + 1 - t$
where t is an integer which satisfies the Hasse bound
$-2\sqrt{p} \leq t \leq 2\sqrt{p}$
The algorithm works by calculating t modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial $\Phi_\ell$ that is necessary for further calculations with this $\ell$. These polynomials $\Phi_\ell$ do not depend on the curve E under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine t modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining t.
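The final Chinese Remainder step can be illustrated on a toy curve. The sketch below is not the SEA algorithm: it counts points naively (feasible only for small p) and then takes the residues t mod ℓ directly from the known trace, purely to show how t is recovered once the product of the auxiliary primes exceeds $4\sqrt{p}$. The function names and the toy parameters are hypothetical.

```python
def count_points(p, a4, a6):
    """Naive count of points on y^2 = x^3 + a4*x + a6 over F_p (p an odd prime)."""
    total = 1                                   # the point at infinity
    for x in range(p):
        v = (x * x * x + a4 * x + a6) % p
        if v == 0:
            total += 1                          # single point with y = 0
        elif pow(v, (p - 1) // 2, p) == 1:
            total += 2                          # v is a square: two values of y
    return total

def recover_trace(residues):
    """Combine {l: t mod l} by the CRT and return the representative in (-m/2, m/2]."""
    t, m = 0, 1
    for l, r in residues.items():
        k = ((r - t) * pow(m, -1, l)) % l       # modular inverse needs Python 3.8+
        t += m * k
        m *= l
    return t - m if 2 * t > m else t

p, a4, a6 = 1009, 3, 7                          # toy parameters, 4*a4^3 + 27*a6^2 != 0 mod p
t = p + 1 - count_points(p, a4, a6)
aux_primes = [2, 3, 5, 7]                       # product 210 exceeds 4*sqrt(1009) ~ 127.1
t_recovered = recover_trace({l: t % l for l in aux_primes})
```

In the real algorithm the residues come from computations with the polynomials described above rather than from a known t; only the recombination step is the same.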
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over $\mathbb{F}_p$, with the intention of finding an E with $\#E(\mathbb{F}_p) = xr$, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms, such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p and require $x \leq 32$, then the probability that $\#E = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial $\Phi_\ell(F, J) \in \mathbb{Z}[F, J]$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, $\Phi_\ell$ must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration, and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime p and an elliptic curve $E : y^2 = x^3 + a_4 x + a_6$
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram. Recoverable labels: a key computational block (Mod, SBox, BR, WR, XOR, Round Reg) with inputs KeyI[95:0], KeyLd, Clk, Rst and registers KeyReg[95:0], KeyReg0[95:0] to KeyReg9[95:0]; an encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg, cipher data register) with Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst; a decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg, plain text data register) with Key0[95:0] to Key9[95:0], DataO[95:0], DataRd, Clk, Rst; and an SMC block with Clk, Rst, Ena, ED, Ovr.
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
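Both ways of obtaining a key can be sketched with the Python standard library. This is an illustration under stated assumptions: the 96-bit key length is inferred from the 96-bit key bus of the architecture in Chapter 5, PBKDF2 with SHA-256 is swapped in here for the generic "hash function such as SHA-1" of the text, and the function names and iteration count are hypothetical.

```python
import hashlib
import secrets

KEY_BYTES = 12  # 96 bits, assumed from the 96-bit key bus of the design

def random_key():
    """Key drawn from the operating system's entropy source."""
    return secrets.token_bytes(KEY_BYTES)

def key_from_passphrase(passphrase, salt, iterations=100_000):
    """Key derived from a passphrase with PBKDF2-HMAC-SHA256 (an assumed choice)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=KEY_BYTES
    )

session_key = random_key()
derived_key = key_from_passphrase("correct horse", salt=b"example-salt")
```

The derived key is reproducible from the same passphrase and salt, whereas the random key must be stored or exchanged by other means.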
The simplest method to read encrypted data is a brute force attack: simply trying every possible key up to the maximum key length. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
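The one-time pad conditions listed above translate directly into code. The following minimal sketch (hypothetical function names) draws a fresh random key of the same length as the message and combines the two with XOR; it is an illustration of the principle, not a recommendation for practical use.

```python
import secrets

def otp_encrypt(message: bytes):
    """Return (key, ciphertext); the key is random and as long as the message."""
    key = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext

def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same key recovers the message."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))

key, ciphertext = otp_encrypt(b"attack at dawn")
recovered = otp_decrypt(key, ciphertext)
```

The key must never be reused: encrypting two messages with the same pad reveals their XOR to an eavesdropper.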
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting too short keys, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
  IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
  Found 96-bit register for signal <Dreg>.
  Summary:
    inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 195
Macro Statistics
  Registers : 1
    96-bit register : 1
Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
Number of Slices: 55 out of 192 28%
Number of Slice Flip Flops: 96 out of 384 25%
Number of 4 input LUTs: 1 out of 384 0%
Number of bonded IOBs: 194 out of 90 215% (*)
Number of GCLKs: 1 out of 4 25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information
-----------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary
---------------
Speed Grade: -6
Minimum period: No path found
Minimum input arrival time before clock: 7.962ns
Maximum output required time after clock: 6.788ns
Maximum combinational path delay: No path found
Timing Detail
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising
  Data Path: KeyEna to Dreg_95
                              Gate    Net
    Cell:in->out     fanout   Delay   Delay   Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O        96       0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                   0.886           Dreg_0
    ----------------------------------------
    Total                     7.962ns (1.662ns logic, 6.300ns route)
                                      (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising
  Data Path: Dreg_95 to KeyO<95>
                              Gate    Net
    Cell:in->out     fanout   Delay   Delay   Logical Name (Net Name)
    ----------------------------------------  ------------
    FDCE:C->Q        1        1.085   1.035   Dreg_95 (Dreg_95)
    OBUF:I->O                 4.668           KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                     6.788ns (5.753ns logic, 1.035ns route)
                                      (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 7
Cell Usage
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
Number of Slices: 2 out of 192 1%
Number of 4 input LUTs: 3 out of 384 0%
Number of bonded IOBs: 7 out of 90 7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
Number of bonded IOBs: 7 out of 86 8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
  Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
  IOB Flip Flops: 960
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. Its applications include:
Wireless communication, mobile computing, and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
EXAMPLE
The entity declaration for the half adder circuit is:
entity HALF_ADDER is
  port (A, B : in BIT; SUM, CARRY : out BIT);
end HALF_ADDER;
The entity, called HALF_ADDER, has two input ports, A and B, and two output ports, SUM and CARRY. BIT is a predefined type of the language.
Architecture Body
An architecture body specifies the internal details of an entity using any of the following modeling styles:
1. As a set of interconnected components (to represent structure)
2. As a set of concurrent assignment statements (to represent data flow)
3. As a set of sequential assignment statements (to represent behavior)
4. As any combination of the above three
2.5 Structural Style of Modeling
In this style, an entity is described as a set of interconnected components. Such a model for the HALF_ADDER entity is described in an architecture body:
architecture ha of HALF_ADDER is
  component XOR2
    port (X, Y : in BIT; Z : out BIT);
  end component;
  component AND2
    port (L, M : in BIT; N : out BIT);
  end component;
begin
  X1 : XOR2 port map (A, B, SUM);
  A1 : AND2 port map (A, B, CARRY);
end ha;
The name of the architecture body is ha. The entity declaration for the half adder specifies the interface ports for this architecture body. The architecture body is composed of two parts: the declarative part and the statement part. Two component declarations are present in the declarative part of the architecture body.
The declared components are instantiated in the statement part of the architecture body using component instantiation statements. The signals in the port map of a component instantiation and the port signals in the component declaration are associated by position.
DATAFLOW STYLE OF MODELING
In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The dataflow model for the half adder is described using two concurrent signal assignment statements. In a signal assignment statement, the symbol <= implies an assignment of a value to a signal.
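The two concurrent signal assignments mentioned above could look like the following minimal sketch. The architecture name HA_DATAFLOW is an assumption chosen for illustration; the entity is repeated so that the block can be analyzed on its own.

```vhdl
-- Dataflow model of the half adder: the architecture body consists
-- of two concurrent signal assignment statements and nothing else.
entity HALF_ADDER is
  port (A, B       : in  BIT;
        SUM, CARRY : out BIT);
end HALF_ADDER;

-- HA_DATAFLOW is a hypothetical architecture name.
architecture HA_DATAFLOW of HALF_ADDER is
begin
  SUM   <= A xor B;  -- re-evaluated whenever A or B changes
  CARRY <= A and B;  -- executes concurrently with the line above
end HA_DATAFLOW;
```

The order of the two statements does not matter, since each one is triggered by events on the signals on its right-hand side.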
BEHAVIORAL STYLE OF MODELING
The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. This set of sequential statements, which is specified inside a process statement, does not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.
MIXED STYLE OF MODELING
It is possible to mix the three modeling styles in a single architecture body. That is, within an architecture body, we could use component instantiation statements, concurrent signal assignment statements, and process statements.
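As an illustration of mixing the three styles, the sketch below models a one-bit full adder. It is not part of the SEA design: the entity FULL_ADDER, the helper entity XOR2 with its architecture, and all signal names are assumptions introduced only for this example.

```vhdl
-- Hypothetical helper entity used by the structural part below.
entity XOR2 is
  port (X, Y : in BIT; Z : out BIT);
end XOR2;

architecture XOR2_DF of XOR2 is
begin
  Z <= X xor Y;
end XOR2_DF;

-- Full adder described with all three modeling styles at once.
entity FULL_ADDER is
  port (A, B, CIN : in BIT; SUM, COUT : out BIT);
end FULL_ADDER;

architecture FA_MIXED of FULL_ADDER is
  component XOR2
    port (X, Y : in BIT; Z : out BIT);
  end component;
  signal S1 : BIT;
begin
  -- Structural style: a component instantiation statement.
  X1 : XOR2 port map (A, B, S1);

  -- Behavioral style: a process with sequential statements.
  process (A, B, CIN)
    variable T1, T2, T3 : BIT;
  begin
    T1 := A and B;
    T2 := B and CIN;
    T3 := A and CIN;
    COUT <= T1 or T2 or T3;
  end process;

  -- Dataflow style: a concurrent signal assignment statement.
  SUM <= S1 xor CIN;
end FA_MIXED;
```

The instantiation, the process, and the concurrent assignment are all concurrent statements, so their textual order inside the architecture body is irrelevant.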
MODEL ANALYSIS
Once an entity is described in VHDL, it can be validated using an analyzer and a simulator that are part of a VHDL system. The first step in the validation process is analysis. The analyzer takes a file that contains one or more design units and compiles them into an intermediate form. The generated intermediate form is stored in a specific design library that has been designated as the working library.
There is a design library with the logical name STD predefined by the VHDL language environment. This library contains two packages, STANDARD and TEXTIO. The STANDARD package contains declarations for all the predefined types of the language. The TEXTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations. There also exists an IEEE standard package called STD_LOGIC_1164, which defines a nine-value logic type together with its associated subtypes, overloaded operator functions, and other useful utilities. This standard is called IEEE Std 1164-1993.
SIMULATION
For a hierarchical entity to be simulated, all of its lowest-level components must be described at the behavioral level. A simulation can be performed on either one of the following:
1. An entity declaration and an architecture body pair
2. A configuration
Preceding the actual simulation are two major steps:
1. Elaboration phase: In this phase, the hierarchy of the entity is expanded and linked, components are bound to entities in a library, and the top-level entity is built as a network of behavioral models that is ready to be simulated.
2. Initialization phase: Driving and effective values for all explicitly declared signals are computed, implicit signals are assigned values, processes are executed once until they suspend, and simulation time is set to 0 ns.
Simulation commences by advancing time to that of the next event. Values that are scheduled to be assigned to signals at this time are assigned. If the value of a signal changes, and if that signal is present in the sensitivity list of a process, the process is executed until it suspends. Simulation stops when an assertion violation occurs, depending on the implementation of the VHDL system, or when the maximum time as defined by the language is reached.
Entity Declaration
An entity declaration describes the external interface of the entity. It specifies the name of the entity, the names of the interface ports, their mode, and the type of the ports. The syntax for an entity declaration is:
entity entity-name is
  [generic (list-of-generics-and-their-types);]
  [port (list-of-interface-port-names-and-their-types);]
  [entity-item-declarations]
[begin
  entity-statements]
end [entity] [entity-name];
The entity-name is the name of the entity, and the interface ports are the signals through which the entity passes information to and from its external environment. Each interface port can have one of the following modes:
1. in: The value of an input port can only be read within the entity model.
2. out: The value of an output port can only be updated within the entity model.
3. inout: The value of a bidirectional port can be read and updated within the entity model.
4. buffer: The value of a buffer port can be read and updated within the entity model. It cannot have more than one source.
Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration.
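To make the syntax concrete, here is a hedged sketch of an entity declaration with one generic and several ports. The entity name KEY_REGISTER, the generic WIDTH, and the port names are hypothetical; they are loosely modeled on a 96-bit register with clock, reset, and enable inputs, not copied from the project sources.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Hypothetical entity: a register whose width is a generic, so the
-- same description can be reused for any key size.
entity KEY_REGISTER is
  generic (WIDTH : positive := 96);  -- assumed default width
  port (CLK   : in  STD_LOGIC;       -- mode in: read only
        RESET : in  STD_LOGIC;
        ENA   : in  STD_LOGIC;
        D     : in  STD_LOGIC_VECTOR (WIDTH - 1 downto 0);
        Q     : out STD_LOGIC_VECTOR (WIDTH - 1 downto 0)); -- mode out: updated only
end KEY_REGISTER;
```

Because the width is a generic, an instantiating design can override it with a generic map instead of editing the declaration.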
ARCHITECTURE BODY
An architecture body describes the internal view of an entity. It describes the functionality or the structure of the entity. Its syntax is:
architecture architecture-name of entity-name is
begin
  concurrent-statements, which are:
    process-statement
    block-statement
    concurrent-signal-assignment-statement
    component-instantiation-statement
    generate-statement
end [architecture] [architecture-name];
The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow, and sequential behavior.
Here we describe an entity by using the behavioral model. A process statement, which is a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.
2.6 PROCESS STATEMENT
A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is:
[process-label:] process [(sensitivity-list)] [is]
begin
  sequential-statements, which are:
    variable-assignment-statement
    signal-assignment-statement
    wait-statement
    if-statement
    case-statement
    loop-statement
    null-statement
    exit-statement
    next-statement
    assertion-statement
    report-statement
    procedure-call-statement
    return-statement
end process [process-label];
The set of signals to which the process is sensitive is defined by the sensitivity list. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in sequential order, that is, in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
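The role of the sensitivity list can be seen in a small clocked register. This is a minimal sketch under assumed names (ENABLE_REG, REG, and the port names are not taken from the project sources): the process wakes up only on events on CLK or RESET, runs its statements in order, and then suspends again.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

-- Hypothetical 8-bit register with asynchronous reset and enable.
entity ENABLE_REG is
  port (CLK, RESET, ENA : in  STD_LOGIC;
        D               : in  STD_LOGIC_VECTOR (7 downto 0);
        Q               : out STD_LOGIC_VECTOR (7 downto 0));
end ENABLE_REG;

architecture BEHAVIOR of ENABLE_REG is
begin
  -- The process is sensitive to CLK and RESET only; a change on D or
  -- ENA alone does not trigger execution.
  REG : process (CLK, RESET)
  begin
    if RESET = '1' then
      Q <= (others => '0');            -- asynchronous clear
    elsif CLK'event and CLK = '1' then -- rising clock edge
      if ENA = '1' then
        Q <= D;                        -- load only when enabled
      end if;
    end if;
  end process REG;
end BEHAVIOR;
```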
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form:
variable-object := expression;
The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is:
signal-object <= expression [after delay-value];
A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other statements that appear within the process.
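The difference between the two kinds of assignment can be checked in simulation with a small self-checking sketch. The entity ASSIGN_DEMO and the objects V and S are hypothetical names: the variable takes its new value immediately, while the signal is only updated after the process suspends.

```vhdl
-- Hypothetical test entity with no ports (simulation only).
entity ASSIGN_DEMO is
end ASSIGN_DEMO;

architecture SIM of ASSIGN_DEMO is
  signal S : INTEGER := 0;
begin
  process
    variable V : INTEGER := 0;
  begin
    V := V + 1;  -- variable assignment: V is 1 right away
    S <= S + 1;  -- signal assignment: new value is only scheduled
    assert V = 1 report "variable was not updated immediately" severity FAILURE;
    assert S = 0 report "signal changed before the process suspended" severity FAILURE;
    wait for 1 ns;  -- suspending lets the scheduled value take effect
    assert S = 1 report "signal was not updated after suspension" severity FAILURE;
    wait;           -- suspend forever, ending the test
  end process;
end SIM;
```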
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is:
if boolean-expression then
  sequential-statements
[elsif boolean-expression then
  sequential-statements]
[else
  sequential-statements]
end if;
The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
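A short sketch of an if statement with elsif and else branches follows. The entity COMPARE and its ports are assumptions made for illustration; the conditions are checked in order and only the first true branch runs.

```vhdl
-- Hypothetical 4-bit magnitude comparator.
entity COMPARE is
  port (A, B       : in  INTEGER range 0 to 15;
        GT, EQ, LT : out BIT);
end COMPARE;

architecture BEHAVIOR of COMPARE is
begin
  process (A, B)
  begin
    -- Default values; the last assignment to a signal in a process
    -- run is the one that takes effect.
    GT <= '0';
    EQ <= '0';
    LT <= '0';
    if A > B then
      GT <= '1';
    elsif A = B then
      EQ <= '1';
    else
      LT <= '1';
    end if;
  end process;
end BEHAVIOR;
```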
CASE STATEMENT
The format of a case statement is:
case expression is
  when choices => sequential-statements
  when choices => sequential-statements
end case;
The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or of a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by choosing "others". The others clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
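The forms of choice described above can be sketched as follows. The entity CLASSIFY and its ports are hypothetical; the example also uses the vertical bar, which VHDL allows for combining several choices in one branch.

```vhdl
-- Hypothetical classifier mapping a 4-bit value onto four classes.
entity CLASSIFY is
  port (N     : in  INTEGER range 0 to 15;
        CLASS : out INTEGER range 0 to 3);
end CLASSIFY;

architecture BEHAVIOR of CLASSIFY is
begin
  process (N)
  begin
    case N is
      when 0      => CLASS <= 0;  -- a single value
      when 1 | 2  => CLASS <= 1;  -- two values combined with '|'
      when 3 to 7 => CLASS <= 2;  -- a range of values
      when others => CLASS <= 3;  -- catch-all, must be the last branch
    end case;
  end process;
end BEHAVIOR;
```

Every possible value of the expression must be covered exactly once, which is why the others branch is convenient for the remaining values 8 to 15.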
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax of a loop statement is:
[loop-label:] iteration-scheme loop
    sequential-statements
end loop [loop-label];
2.8 Active HDL Overview
Active-HDL is an integrated environment designed for the development of VHDL, Verilog, EDIF and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard.
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0.
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with the OVI Standard Delay Format Specification, Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1 HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. Keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts and Tcl scripts.
2 Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3 State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4 Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5 Design Browser
The Design Browser window displays the contents of the current design, that is:
- Resource files attached to the design.
- The contents of the default working library of the design.
- The structure of the design unit selected for simulation.
- VHDL, Verilog or EDIF objects declared within a selected region of the current design.
6 Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable to the simulator. In Active-HDL, a source file can be one of the following:
- VHDL file (.vhd)
- Verilog file (.v)
- EDIF netlist file
- State diagram file (.asf)
- Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog or EDIF file containing the HDL code (or netlist) generated from the diagram.
A netlist is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, for VHDL, Verilog and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
- Event-Driven Simulation
- Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others are simulated in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
- HDL (VHDL, Verilog HDL, ABEL)
- Schematic design files
- EDIF
- NGC/NGO
- State Machines
- IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view and edit schematics and symbols for the Design Entry step of the Xilinx® design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input netlists, as well as design constraint information, into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA, and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a netlist (or netlists) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e. with small code size and memory) targeted for processors with a limited instruction set (i.e. AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, and allows efficient combination of encryption/decryption and "on-the-fly" key derivation; its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g. a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEA$_{n,b}$ is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). In contrast, SEA$_{n,b}$ makes it possible to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, additional features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform both encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds, and is defined with respect to the following parameters:
- $n$: plaintext size, key size;
- $b$: processor (or word) size;
- $n_b = \frac{n}{2b}$: number of words per Feistel branch;
- $n_r$: number of block cipher rounds.
As the only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted as SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$, ... Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
- Bit representation: $x_b = x(n/2-1)\; x(n/2-2) \ldots x(2)\; x(1)\; x(0)$.
- Word representation: $x_W = x_{n_b-1}\; x_{n_b-2} \ldots x_2\; x_1\; x_0$.
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1 Bitwise XOR
The bitwise XOR is defined on $n/2$-bit vectors:
$$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \;\Leftrightarrow\; z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$$
2 Substitution Box S
SEA$_{n,b}$ uses the following 3-bit substitution table:
$S_T$ = {0, 5, 6, 7, 4, 3, 1, 2},
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
$$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \;\Leftrightarrow\; \begin{cases} x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i} \\ x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1} \\ x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2} \end{cases} \quad 0 \le i \le \frac{n_b}{3} - 1$$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
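The recursive definition above can be checked against the table $S_T$ with a short Python sketch. The function names `sbox_words` and `table_from_bitslice` are illustrative, and the choice to store word $x_i$ at list index `i`, with $x_{3i+2}$ as the most significant bit of each 3-bit input, is an assumption that happens to reproduce $S_T$.

```python
# Table from the text, in C-like notation.
S_T = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox_words(x):
    """Apply S to a list of n_b words (n_b a multiple of 3).

    The three updates are applied in order, each one reusing the words
    already updated before it (this is the "recursive" definition).
    """
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def table_from_bitslice():
    """Rebuild the 3-bit table by running S on 1-bit 'words'."""
    table = []
    for v in range(8):
        bits = [v & 1, (v >> 1) & 1, (v >> 2) & 1]  # x_{3i}, x_{3i+1}, x_{3i+2}
        out = sbox_words(bits)
        table.append(out[0] | (out[1] << 1) | (out[2] << 2))
    return table


if __name__ == "__main__":
    print(table_from_bitslice() == S_T)
```

Because only AND, OR and XOR are used, the same three lines process all $b$ bit positions of a word triple in parallel, which is what keeps the instruction count per word low.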
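3 Word Rotation R
The word rotation is defined on $n_b$-word vectors:
$$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \;\Leftrightarrow\; y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \qquad y_0 = x_{n_b-1}$$
4 Bit Rotation r
The bit rotation is defined on $n_b$-word vectors:
$$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \;\Leftrightarrow\; y_{3i} = x_{3i} \ggg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \lll 1, \quad 0 \le i \le \frac{n_b}{3} - 1$$
where $\ggg$ and $\lll$ represent the cyclic right and left shifts inside a word.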
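5 Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \;\Leftrightarrow\; z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$$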
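The word rotation, bit rotation and word-wise modular addition can be sketched in a few lines of Python. The function names are illustrative, and word $x_i$ is assumed to sit at list index `i` (so index 0 is the least significant word).

```python
def word_rotate(x):
    """R: y_{i+1} = x_i for 0 <= i <= n_b - 2, and y_0 = x_{n_b - 1}."""
    return [x[-1]] + list(x[:-1])


def word_rotate_inv(y):
    """Inverse of R."""
    return list(y[1:]) + [y[0]]


def bit_rotate(x, b=8):
    """r: in each block of three words, rotate word 3i right by one bit,
    leave word 3i+1 unchanged, and rotate word 3i+2 left by one bit."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = (x[i] >> 1) | ((x[i] & 1) << (b - 1))
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask
    return y


def add_mod(x, y, b=8):
    """Word-wise addition mod 2^b; carries never cross word boundaries."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]


if __name__ == "__main__":
    print(word_rotate([1, 2, 3]))             # [3, 1, 2]
    print(bit_rotate([0x01, 0x55, 0x80]))     # [128, 85, 1]
    print(add_mod([0xFF, 1, 2], [1, 1, 1]))   # [0, 2, 3]
```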
The Round and Key Round
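Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Figure 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^{2} \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^{2}$ such that:
$$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$$
$$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$$
$$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \;\Leftrightarrow\; KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$$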
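FIG 3.1 Encrypt/decrypt round and key round (figure not reproduced; it shows the inputs $L_i$, $R_i$, $K_i$ and outputs $L_{i+1}$, $R_{i+1}$ of the encrypt/decrypt round, the inputs $KL_i$, $KR_i$, $C_i$ and outputs $KL_{i+1}$, $KR_{i+1}$ of the key round, and the blocks $R$, $R^{-1}$, $r$ and $S$)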
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size $n$. The operations within the cipher are performed considering parametric $b$-bit words.
C = SEA_{n,b}(P, K)
{
    % initialization:
    L_0 & R_0 = P;
    KL_0 & KR_0 = K;
    % key scheduling:
    for i in 1 to floor(n_r/2)
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i));
    switch KL_{floor(n_r/2)}, KR_{floor(n_r/2)};
    for i in ceil(n_r/2) to n_r - 1
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(r - i));
    % encryption:
    for i in 1 to ceil(n_r/2)
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1});
    for i in ceil(n_r/2) + 1 to n_r
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1});
    % final:
    C = R_{n_r} & L_{n_r};
    switch KL_{n_r - 1}, KR_{n_r - 1};
}
where & is the concatenation operator, $KR_{\lfloor n_r/2 \rfloor}$ is taken before the switch, and C(i) is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round $F_D$.
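The pseudo-code above can be turned into a small executable Python sketch. It has not been checked against reference test vectors, and several conventions are assumptions made for illustration: blocks and keys are lists of $2n_b$ words with the left branch first, word $x_i$ sits at list index `i`, the constant index written `r - i` in the pseudo-code is read as $n_r - i$, and round keys are precomputed rather than derived on the fly. The names `round_keys` and `sea` are hypothetical.

```python
def S(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y

def r(x, b):
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = (x[i] >> 1) | ((x[i] & 1) << (b - 1))               # rotate right
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask  # rotate left
    return y

def R(x):
    return [x[-1]] + list(x[:-1])

def R_inv(x):
    return list(x[1:]) + [x[0]]

def add(x, k, b):
    mask = (1 << b) - 1
    return [(u + v) & mask for u, v in zip(x, k)]

def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]

def FE(L, Rt, K, b):
    return Rt, xor(R(L), r(S(add(Rt, K, b)), b))

def FD(L, Rt, K, b):
    return Rt, R_inv(xor(L, r(S(add(Rt, K, b)), b)))

def FK(KL, KR, C, b):
    return KR, xor(KL, R(r(S(add(KR, C, b)), b)))

def round_keys(key, nr, b=8):
    """Return the n_r round keys in the order the rounds consume them."""
    nb = len(key) // 2
    const = lambda i: [i] + [0] * (nb - 1)    # only the LSW is non-zero
    KL, KR = list(key[:nb]), list(key[nb:])
    half = nr // 2
    keys = [KR]                               # round 1 uses KR_0
    for i in range(1, half + 1):
        KL, KR = FK(KL, KR, const(i), b)
        keys.append(KR)                       # round i+1 uses KR_i
    KL, KR = KR, KL                           # switch
    for i in range(half + 1, nr):
        KL, KR = FK(KL, KR, const(nr - i), b)
        keys.append(KL)                       # round i+1 uses KL_i
    return keys

def sea(block, key, nr, b=8, decrypt=False):
    """Encrypt (or decrypt) one block given as a list of 2*n_b words."""
    nb = len(block) // 2
    assert nr % 2 == 1 and nb % 3 == 0 and len(key) == len(block)
    L, Rt = list(block[:nb]), list(block[nb:])
    F = FD if decrypt else FE
    for K in round_keys(key, nr, b):
        L, Rt = F(L, Rt, K, b)
    return Rt + L                             # final swap

if __name__ == "__main__":
    P = [0x01, 0x23, 0x45, 0x67, 0x89, 0xAB]
    K = [0x0F, 0x1E, 0x2D, 0x3C, 0x4B, 0x5A]
    C = sea(P, K, 51)
    print(sea(C, K, 51, decrypt=True) == P)
```

In this sketch the round-key list reads the same forwards and backwards, which is why decryption can reuse the same key schedule and differs from encryption only in the round function.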
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
- λ-parameter: 1/2;
- δ-parameter: 1/4;
- maximum nonlinear order, namely 2;
- recursive definition;
- minimum number of instructions.
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds. The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$$K_0, K_1, K_2, \ldots, K_{\lfloor r/2 \rfloor}, K_{\lfloor r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the linear and differential parameters of the 3-round cipher, $\Lambda$ and $\Delta$, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function $S$ has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an $n$-bit block cipher, resisting differential [4] and linear [28] cryptanalysis requires $\Delta$ and $\Lambda$ to be no larger than roughly $2^{-n}$ and $2^{-n/2}$ respectively. In order to approach these bounds, we require that:
$$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \qquad (1)$$
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
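The step from Equation (1) to the round bound can be spelled out as follows. This is only a worked restatement using the values $\delta = 2^{-2}$ and $\lambda = 2^{-1}$ given in the text; both conditions reduce to the same inequality.

```latex
\begin{align*}
\delta^{2n_r/3} &= \left(2^{-2}\right)^{2n_r/3} = 2^{-4n_r/3} < 2^{-n}
  &&\Longleftrightarrow\quad \tfrac{4n_r}{3} > n
  &&\Longleftrightarrow\quad n_r > \tfrac{3n}{4},\\
\lambda^{2n_r/3} &= \left(2^{-1}\right)^{2n_r/3} = 2^{-2n_r/3} < 2^{-n/2}
  &&\Longleftrightarrow\quad \tfrac{2n_r}{3} > \tfrac{n}{2}
  &&\Longleftrightarrow\quad n_r > \tfrac{3n}{4}.
\end{align*}
```

For instance, $n = 48$ gives $3n/4 = 36$ rounds for this part of the estimate.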
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and the boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \lll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
- $S(x \lll a) = S(x) \lll a$;
- $r(x \lll a) = r(x) \lll a$;
- $R(x \lll a) = R(x) \lll a$.
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \lll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F_E^{\oplus}$ and the key round $F_K$ satisfy:
$$F_E^{\oplus}(L \lll a, R \lll a, K \lll a) = F_E^{\oplus}(L, R, K) \lll a$$
$$F_K(KL \lll a, KR \lll a, 0) = F_K(KL, KR, 0) \lll a$$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \lll a$. In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \lll a) \boxplus (K \lll a) = (X \boxplus K) \lll a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to 3/8 as $b$ grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \lll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from $K$ and those derived from $K \lll a$ rapidly vanishes. These properties prevent this structural distinguisher from being propagated through more than a few rounds of SEA$_{n,b}$.
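The probability $p$ for a single word can be measured by exhaustive enumeration. The sketch below counts, for one $b$-bit word, the pairs $(X, K)$ for which rotation commutes with addition mod $2^b$; the function names are illustrative.

```python
def rotl(w, a, b):
    """Rotate the b-bit word w left by a bits (0 < a < b)."""
    mask = (1 << b) - 1
    return ((w << a) | (w >> (b - a))) & mask


def commuting_pairs(a, b):
    """Count pairs (X, K) with (X<<<a) + (K<<<a) == (X + K)<<<a mod 2^b."""
    mask = (1 << b) - 1
    count = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            count += lhs == rhs
    return count


if __name__ == "__main__":
    b = 8
    p = commuting_pairs(1, b) / 4 ** b
    print(p)  # close to 3/8 for a = 1
```

For $a = 1$ the count has the closed form $3 \cdot 2^{b-2}(2^{b-1} + 1)$, so $p = \frac{3}{8}\,(1 + 2^{-(b-1)})$, which tends to 3/8 as $b$ grows.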
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure, where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that it remains the same when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, whose vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference makes it possible to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \Leftrightarrow \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point a topic for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
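As a worked example of this formula, the following sketch computes the round count for a few parameter sets. The function name `suggested_rounds` is illustrative, and rounding $3n/4$ up when it is not an integer is an assumption (it is an integer for all the 8-bit examples given in the text).

```python
def suggested_rounds(n, b):
    """Minimum round count 3n/4 + 2*(n_b + floor(b/2)), made odd."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    nb = n // (2 * b)                 # words per Feistel branch
    statistical = -(-3 * n // 4)      # ceil(3n/4): linear/differential part
    diffusion = nb + b // 2           # rounds for complete diffusion
    nr = statistical + 2 * diffusion
    return nr if nr % 2 == 1 else nr + 1


if __name__ == "__main__":
    for n in (48, 96, 144):
        print(n, 8, suggested_rounds(n, 8))
```

For SEA$_{48,8}$ this gives $36 + 2 \cdot (3 + 4) = 50$, which is rounded up to the odd value 51.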
3.5 Performance Analysis
SEA$_{n,b}$ is targeted for being implemented on low-cost processors, with little code size and a small instruction set. However, the simple structure of SEA$_{n,b}$ makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
- Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\ggg$, $\lll$.
- Branch instructions: goto, subroutine call and return.
- Comparison; load RAM in register; store register in RAM.
According to the code in the appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r - 1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and a faster implementation.
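The cost estimates quoted above are simple closed forms in $n_b$ and $n_r$, so they can be tabulated directly. The helper below is a hedged sketch that only evaluates those formulas; the name `sea_cost` and the dictionary keys are illustrative and are not taken from the appendix code.

```python
def sea_cost(n, b, nr):
    """Evaluate the RAM, code-size and operation-count formulas from the text."""
    nb = n // (2 * b)
    round_ops = 12 * nb + 11          # one encrypt/decrypt round
    key_round_ops = 10 * nb + 11      # one key round
    return {
        "ram_words_and_registers": 5 * nb + 3,
        "registers": nb + 3,
        "code_size_ops": 31 * nb + 36,
        "time_ops": (nr - 1) * (round_ops + key_round_ops + 7)
                    + round_ops + 8 * nb + 7,
    }


if __name__ == "__main__":
    print(sea_cost(48, 8, 51))
```

With $n = 48$, $b = 8$ and $n_r = 51$, the formulas give 18 RAM words and registers, a code size of 129 ops, and $50 \times 95 + 47 + 31 = 4828$ ops per encryption.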
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
- SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
- The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
- The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. SEA$_{n,b}$ trades space for time, which may be acceptable given present processor speeds). Looking at the code size-cycles product, the efficiency of SEA$_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be used to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about $3/4$ (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$ where $p$ has 200 bits.
It is known that
$$\#E(\mathbb{F}_p) = p + 1 - t,$$
where $t$ is an integer which satisfies the Hasse bound
$$-2\sqrt{p} \le t \le 2\sqrt{p}.$$
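For very small primes the relation between the point count and $t$ can be checked by brute force. The following is a minimal sketch, not the SEA algorithm itself: it enumerates every $x$ and uses Euler's criterion to decide how many $y$ satisfy the curve equation, which is only feasible for tiny $p$. The function names are illustrative.

```python
def count_points(a4: int, a6: int, p: int) -> int:
    """Naively count the points of y^2 = x^3 + a4*x + a6 over F_p (p an odd prime)."""
    if (4 * a4**3 + 27 * a6**2) % p == 0:
        raise ValueError("curve is singular modulo p")
    total = 1  # the point at infinity
    for x in range(p):
        rhs = (x**3 + a4 * x + a6) % p
        if rhs == 0:
            total += 1  # single point with y = 0
        elif pow(rhs, (p - 1) // 2, p) == 1:
            total += 2  # rhs is a nonzero square: two values of y
    return total


def trace_of_frobenius(a4: int, a6: int, p: int) -> int:
    """Return t such that #E(F_p) = p + 1 - t."""
    return p + 1 - count_points(a4, a6, p)


if __name__ == "__main__":
    p = 5
    n_points = count_points(1, 1, p)
    t = trace_of_frobenius(1, 1, p)
    print(n_points, t, t * t <= 4 * p)  # Hasse bound: t^2 <= 4p
```

The cost of this enumeration grows linearly with $p$, which is why it is useless for 200-bit primes and why the approach through small auxiliary primes is needed.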
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial attached to $\ell$ that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve $E$ under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is $1/2$). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case we have succeeded in determining $t$.
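The final recombination step can be sketched as follows. The code assumes the residues $t \bmod \ell$ are already known (computing them is the hard part of the algorithm and is not shown); it combines them with the Chinese Remainder Theorem and then picks the unique representative inside the Hasse interval. The function names and the toy numbers are illustrative only.

```python
from math import prod


def crt(residues, moduli):
    """Combine x = r_i (mod l_i) for pairwise coprime l_i; return (x, M) with 0 <= x < M."""
    x, m = 0, 1
    for r, l in zip(residues, moduli):
        # Choose k so that x + m*k = r (mod l); pow(m, -1, l) needs Python 3.8+.
        k = ((r - x) * pow(m, -1, l)) % l
        x += m * k
        m *= l
    return x, m


def recover_trace(residues, primes, p):
    """Recover t from t mod l for each auxiliary prime l, using |t| <= 2*sqrt(p)."""
    m = prod(primes)
    if m * m <= 16 * p:  # integer form of the condition M > 4*sqrt(p)
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")
    x, m = crt(residues, primes)
    # Centered lift: the Hasse interval has length 4*sqrt(p) < M, so t is unique.
    return x - m if x > m // 2 else x


if __name__ == "__main__":
    p, primes = 101, [3, 5, 7]        # toy values; 3*5*7 = 105 > 4*sqrt(101)
    t_true = -7                       # hypothetical trace used only for the demo
    residues = [t_true % l for l in primes]
    t = recover_trace(residues, primes, p)
    print(t, p + 1 - t)               # recovered t and the implied point count
```

The centered lift is what turns the residue modulo the product into a signed value, which matters because $t$ can be negative.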
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$ with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$ we need to determine a polynomial in $\mathbb{Z}[F, J]$ attached to $\ell$. For $\ell \in A_s$ this polynomial is stored in the program. For $\ell \in A_l$ it must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).

We start out with a given prime $p$ and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
[Figure: SEA architecture block diagram.
Key computational block: Mod, SBox, BR, WR, XOR and Round Reg, with inputs KeyI[95:0], KeyLd, Clk, Rst and key registers KeyReg0[95:0] to KeyReg9[95:0].
Encryption computational block: Mod, SBox, BR, WR, XOR and Round Reg, with a cipher data register, inputs DataI[95:0], DataLd, Clk, Rst and round keys Key0[95:0] to Key9[95:0].
Decryption computational block: Mod, SBox, BR, IWR, XOR and Round Reg, with a plain text data register, outputs DataO[95:0], DataRd, inputs Clk, Rst and round keys Key0[95:0] to Key9[95:0].
Control: SMC with Clk, Rst, Ena, ED, Ovr.]
FIG 5.1
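The SBox block in the diagram can be modelled in software. The sketch below assumes the 3-bit substitution table of the SEA specification, {0, 5, 6, 7, 4, 3, 1, 2}, and its usual bit-sliced form using AND, OR and XOR on three words; the contents of the hardware S-box module are not listed in this report, so both the table and the function names here are assumptions for illustration.

```python
# Assumed 3-bit substitution table from the SEA specification.
SEA_SBOX = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox_lookup(value: int) -> int:
    """Table form: substitute one 3-bit value."""
    return SEA_SBOX[value & 0b111]


def sbox_bitslice(x0: int, x1: int, x2: int, mask: int = 0xFF):
    """Bit-sliced form on three words; bit i of (x2, x1, x0) is one 3-bit input.

    The three updates are sequential: each line uses the already updated words.
    """
    x0 = ((x2 & x1) ^ x0) & mask
    x1 = ((x2 & x0) ^ x1) & mask
    x2 = ((x0 | x1) ^ x2) & mask
    return x0, x1, x2


if __name__ == "__main__":
    for v in range(8):
        b0, b1, b2 = sbox_bitslice(v & 1, (v >> 1) & 1, (v >> 2) & 1, mask=1)
        print(v, sbox_lookup(v), b0 | (b1 << 1) | (b2 << 2))
```

The bit-sliced form is the one that suits a small-instruction-set processor, since it processes one S-box per bit position of the word in parallel.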
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted or decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
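Both ways of producing a key can be sketched with the Python standard library. The example assumes a 96-bit key, matching the 96-bit key registers of this design, and swaps in PBKDF2 with SHA-256 for the passphrase case instead of a bare SHA-1 hash; the function names, salt and iteration count are illustrative choices, not part of the design described here.

```python
import hashlib
import secrets

KEY_BYTES = 12  # 96 bits, assumed to match the 96-bit key registers


def random_key(n_bytes: int = KEY_BYTES) -> bytes:
    """Draw a key from the operating system's entropy source."""
    return secrets.token_bytes(n_bytes)


def key_from_passphrase(passphrase: str, salt: bytes,
                        n_bytes: int = KEY_BYTES,
                        iterations: int = 100_000) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=n_bytes)


if __name__ == "__main__":
    print(random_key().hex())
    salt = secrets.token_bytes(16)  # store the salt alongside the ciphertext
    print(key_from_passphrase("correct horse battery staple", salt).hex())
```

The same passphrase and salt always give the same key, while a random key is different on every call, which is the practical difference between the two approaches.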
The simplest method to read encrypted data is a brute force attack: simply trying every possible key up to the maximum key length. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language: kryptos (secret) and graphos (writing).
5.2 ENCRYPTION
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the term cryptanalysis came into use; cryptanalysis is now practically an art and science all on its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
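The "two inputs, one output" shape of encryption and decryption can be illustrated with a deliberately simple repeating-key XOR. This is a toy for illustration only: it is not SEA and offers no real security; the function names are hypothetical.

```python
from itertools import cycle


def xor_encrypt(message: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR each message byte with the repeating key."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(m ^ k for m, k in zip(message, cycle(key)))


def xor_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so decryption reuses the same transformation."""
    return xor_encrypt(ciphertext, key)


if __name__ == "__main__":
    key = b"\x13\x37\xc0\xde"
    ciphertext = xor_encrypt(b"attack at dawn", key)
    print(ciphertext.hex())
    print(xor_decrypt(ciphertext, key))
```

Decrypting with a different key yields different bytes, which is the property a real cipher must provide in a much stronger form.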
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. the "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
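The numbers in this comparison follow from simple arithmetic: trying half of a 56-bit key space costs $2^{55}$ trials, while the quoted linear cryptanalysis attack costs about $2^{43}$ operations. A short sketch (function and constant names are illustrative):

```python
def average_brute_force_trials(key_bits: int) -> int:
    """Expected trials to find a key: half of the 2**key_bits possibilities."""
    return 2 ** (key_bits - 1)


DES_KEY_BITS = 56                    # effective DES key length
LINEAR_CRYPTANALYSIS_OPS = 2 ** 43   # figure quoted in the text

if __name__ == "__main__":
    brute = average_brute_force_trials(DES_KEY_BITS)
    print(f"brute force: 2^{brute.bit_length() - 1} trials")
    print(f"linear cryptanalysis is about {brute // LINEAR_CRYPTANALYSIS_OPS}x cheaper")
    # Each extra key bit doubles the brute force cost.
    print(average_brute_force_trials(128) // brute == 2 ** 72)
```

The last line shows the exponential growth mentioned earlier: moving from a 56-bit to a 128-bit key multiplies the brute force effort by $2^{72}$.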
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
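The password or PIN case can be made concrete. In the sketch below, the first comparison returns as soon as it finds a mismatching byte, so its running time depends on how many leading bytes are correct; the second uses `hmac.compare_digest` from the Python standard library, which is designed to avoid that dependence. The function names are illustrative.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: timing reveals the position of the first mismatch."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner when the mismatch is near the start
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(a, b)


if __name__ == "__main__":
    secret_pin = b"4921"
    for guess in (b"0000", b"4000", b"4921"):
        print(guess, naive_equal(secret_pin, guess), constant_time_equal(secret_pin, guess))
```

Both functions return the same answers; only their timing behaviour differs, which is exactly what a timing attack exploits.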
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file "c:/xilinx/bin/vasu/keyreg.ngc" ...
Reading component libraries for design expansion...
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file "keyreg.ngd" ...
Writing NGDBUILD log file "keyreg.bld"...
Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  (See NOTES below for an explanation of the effects of unrelated logic)
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'

Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  (See NOTES below for an explanation of the effects of unrelated logic)
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  (See NOTES below for an explanation of the effects of unrelated logic)
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================

Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report

Macro Statistics
# Registers : 1
  96-bit register : 1

=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================

Optimizing unit <keyreg> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 195

Macro Statistics :
# Registers : 1
#   96-bit register : 1

Cell Usage :
# BELS : 1
#   LUT1 : 1
# FlipFlops/Latches : 96
#   FDCE : 96
# Clock Buffers : 1
#   BUFGP : 1
# IO Buffers : 194
#   IBUF : 98
#   OBUF : 96
=========================================================================
Device utilization summary:
---------------------------

Selected Device : 2s15cs144-6

 Number of Slices: 55 out of 192 28%
 Number of Slice Flip Flops: 96 out of 384 25%
 Number of 4 input LUTs: 1 out of 384 0%
 Number of bonded IOBs: 194 out of 90 215% (*)
 Number of GCLKs: 1 out of 4 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -6

   Minimum period: No path found
   Minimum input arrival time before clock: 7.962ns
   Maximum output required time after clock: 6.788ns
   Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock 'Clk'
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                           Gate    Net
    Cell:in->out   fanout  Delay   Delay  Logical Name (Net Name)
    ------------------------------------------------------------
    IBUF:I->O      96      0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                0.886          Dreg_0
    ------------------------------------------------------------
    Total                  7.962ns (1.662ns logic, 6.300ns route)
                                   (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock 'Clk'
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising

  Data Path: Dreg_95 to KeyO<95>
                           Gate    Net
    Cell:in->out   fanout  Delay   Delay  Logical Name (Net Name)
    ------------------------------------------------------------
    FDCE:C->Q      1       1.085   1.035  Dreg_95 (Dreg_95)
    OBUF:I->O              4.668          KeyO_95_OBUF (KeyO<95>)
    ------------------------------------------------------------
    Total                  6.788ns (5.753ns logic, 1.035ns route)
                                   (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
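The totals in the timing detail for the key register are plain sums of the per-cell gate and net delays (the reported offsets are 7.962 ns and 6.788 ns). A minimal sketch that reproduces the totals and the logic/route split from the individual stage delays; the helper name and the tuple layout are illustrative, not part of any Xilinx tool.

```python
def path_delay(stages):
    """Sum (gate_delay_ns, net_delay_ns) pairs; return total and logic/route percentages."""
    logic = sum(gate for gate, _ in stages)
    route = sum(net for _, net in stages)
    total = logic + route
    return (round(total, 3),
            round(100 * logic / total, 1),
            round(100 * route / total, 1))


if __name__ == "__main__":
    # Input path KeyEna -> Dreg: IBUF (0.776 gate, 6.300 net), then FDCE clock enable (0.886).
    print(path_delay([(0.776, 6.300), (0.886, 0.0)]))
    # Output path Dreg -> KeyO: FDCE clock-to-Q (1.085 gate, 1.035 net), then OBUF (4.668).
    print(path_delay([(1.085, 1.035), (4.668, 0.0)]))
```

The input path is dominated by routing because the single enable net fans out to all 96 flip-flops, whereas the output path is dominated by the output buffer delay.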
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================

Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================

Optimizing unit <sbox8x3> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 7

Cell Usage :
# BELS : 3
#   LUT4 : 3
# IO Buffers : 7
#   IBUF : 4
#   OBUF : 3
=========================================================================

Device utilization summary:
---------------------------

Selected Device : 2s15cs144-6

 Number of Slices: 2 out of 192 1%
 Number of 4 input LUTs: 3 out of 384 0%
 Number of bonded IOBs: 7 out of 90 7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  (See NOTES below for an explanation of the effects of unrelated logic)
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Placing & Routing Report

Device utilization summary:

  Number of External IOBs: 7 out of 86 8%
    Number of LOCed External IOBs: 0 out of 7 0%

  Number of SLICEs: 2 out of 192 1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0

The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file "c:/xilinx/bin/vasu/keygenblock.ngc" ...
Reading component libraries for design expansion...
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file "keygenblock.ngd" ...
Writing NGDBUILD log file "keygenblock.bld"...
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  (See NOTES below for an explanation of the effects of unrelated logic)
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
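The utilization percentages and OVERMAPPED flags in the key generation mapping report can be reproduced from the used and available counts. The sketch below assumes the percentage is truncated to an integer, which matches the figures in this report; the tools may round very small values differently. The helper name is illustrative.

```python
def utilization(used: int, available: int):
    """Return (integer percentage, overmapped flag) for a device resource."""
    if available <= 0:
        raise ValueError("available must be positive")
    return 100 * used // available, used > available


if __name__ == "__main__":
    # Key generation block on the xc2s15 target, counts taken from the report.
    resources = {
        "Slice registers": (419, 384),
        "4 input LUTs (logic)": (1016, 384),
        "Occupied slices": (665, 192),
        "4 input LUTs (total)": (1066, 384),
        "Bonded IOBs": (1060, 86),
    }
    for name, (used, available) in resources.items():
        pct, over = utilization(used, available)
        print(f"{name}: {used} out of {available} {pct}%" + (" (OVERMAPPED)" if over else ""))
```

Every resource except the clock buffers exceeds 100%, so the key generation block does not fit the xc2s15 device as synthesized.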
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name: decryption.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:
---- Target Parameters
Output File Name: decryption
Output Format: NGC
Target Device: xc2s15-6-cs144
---- Source Options
Top Module Name: decryption
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No
---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto
---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5
---- Other Options
lso: decryption.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It yields a small encryption routine for any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed for processors with a limited instruction set. Typical application areas are:
Wireless communication, mobile computing, and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEA$_{n,b}$ is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA$_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo-code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA$_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered as an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations purposed for RFIDs, as well as further cryptanalysis efforts and security evaluations.
DATAFLOW STYLE OF MODELING
In this modeling style, the flow of data through the entity is expressed primarily using concurrent signal assignment statements. The dataflow model for the half adder is described using two concurrent signal assignment statements. In a signal assignment statement, the symbol <= implies an assignment of a value to a signal.
BEHAVIORAL STYLE OF MODELING
The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order. This set of sequential statements, which is specified inside a process statement, does not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body.
MIXED STYLE OF MODELING
It is possible to mix the three modeling styles in a single architecture body. That is, within an architecture body, we could use component instantiation statements, concurrent signal assignment statements, and process statements.
MODEL ANALYSIS
Once an entity is described in VHDL, it can be validated using an analyzer and a simulator that are a part of a VHDL system. The first step in the validation process is analysis. The analyzer takes a file that contains one or more design units and compiles them into an intermediate form. The generated intermediate form is stored in a specific design library that has been designated as the working library.
There is a design library with the logical name STD, predefined by the VHDL language environment. This library contains two packages, STANDARD and TEXTIO. The STANDARD package contains declarations for all the predefined types of the language. The TEXTIO package contains procedures and functions that are necessary for supporting formatted text read and write operations. There also exists an IEEE standard package called STD_LOGIC_1164, which contains the associated subtypes, overloaded operator functions, and other useful utilities. This standard is called IEEE Std 1164-1993.
SIMULATION
For a hierarchical entity to be simulated, all of its lowest-level components must be described at the behavioral level. A simulation can be performed on either one of the following:
1. An entity declaration and an architecture body pair.
2. A configuration.
Preceding the actual simulation are two major steps:
1. Elaboration phase: In this phase, the hierarchy of the entity is expanded and linked, components are bound to entities in a library, and the top-level entity is built as a network of behavioral models that is ready to be simulated.
2. Initialization phase: Driving and effective values for all explicitly declared signals are computed, implicit signals are assigned values, processes are executed once until they suspend, and simulation time is set to 0 ns.
Simulation commences by advancing time to that of the next event. Values that are scheduled to be assigned to signals at this time are assigned. If the value of a signal changes, and if that signal is present in the sensitivity list of a process, the process is executed until it suspends. Simulation stops when an assertion violation occurs, depending on the implementation of the VHDL system, or when the maximum time as defined by the language is reached.
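The initialization phase and the event loop described above can be mimicked with a short Python sketch. It is only an illustration of the scheduling idea, not a model of a real VHDL simulator: the `Simulator` class, its method names, and the half-adder process are all hypothetical, and delay models, resolution functions, and assertions are left out.

```python
import heapq


class Simulator:
    """Toy event-driven kernel: signals change only when a scheduled event fires."""

    def __init__(self):
        self.time = 0
        self.signals = {}
        self.queue = []        # entries: (time, sequence number, signal name, value)
        self.processes = []    # entries: (sensitivity set, callback)
        self._seq = 0

    def schedule(self, name, value, delay=0):
        heapq.heappush(self.queue, (self.time + delay, self._seq, name, value))
        self._seq += 1

    def add_process(self, sensitivity, fn):
        self.processes.append((set(sensitivity), fn))

    def run(self, max_time):
        # Initialization phase: every process is executed once at time 0.
        for _, fn in self.processes:
            fn(self)
        # Simulation: advance time to the next event and apply its assignments.
        while self.queue and self.queue[0][0] <= max_time:
            self.time = self.queue[0][0]
            changed = set()
            while self.queue and self.queue[0][0] == self.time:
                _, _, name, value = heapq.heappop(self.queue)
                if self.signals.get(name) != value:
                    self.signals[name] = value
                    changed.add(name)
            # Run each process whose sensitivity list contains a changed signal.
            for sensitivity, fn in self.processes:
                if sensitivity & changed:
                    fn(self)


def half_adder(sim):
    # Sum and carry are scheduled one time unit after the inputs change.
    sim.schedule("s", sim.signals["a"] ^ sim.signals["b"], delay=1)
    sim.schedule("c", sim.signals["a"] & sim.signals["b"], delay=1)


sim = Simulator()
sim.signals.update(a=0, b=0, s=0, c=0)
sim.add_process(["a", "b"], half_adder)
sim.schedule("a", 1, delay=10)   # stimulus
sim.schedule("b", 1, delay=20)
sim.run(max_time=100)
print(sim.time, sim.signals)
```

The process is triggered only by events on `a` or `b`, which plays the role of the sensitivity list; events scheduled with zero delay would be handled in a further pass at the same time value.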
Entity Declaration
An entity declaration describes the external interface of the entity. It specifies the name of the entity, the names of the interface ports, their mode, and the type of the ports. The syntax for an entity declaration is:
entity entity-name is
  [generic (list-of-generics-and-their-types);]
  [port (list-of-interface-port-names-and-their-types);]
  [entity-item-declarations]
[begin
  entity-statements]
end [entity] [entity-name];
The entity-name is the name of the entity, and the interface ports are the signals through which the entity passes information to and from its external environment. Each interface port can have one of the following modes:
1. in: The value of an input port can only be read within the entity model.
2. out: The value of an output port can only be updated within the entity model.
3. inout: The value of a bidirectional port can be read and updated within the entity model.
4. buffer: The value of a buffer port can be read and updated within the entity model. It cannot have more than one source.
Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration.
ARCHITECTURE BODY
An architecture body describes the internal view of an entity. It describes the functionality or the structure of the entity.
architecture <architecture name> of <entity name> is
begin
  concurrent statements:
    process statements
    block statements
    concurrent signal assignment statements
    component instantiation statements
    generate statements
end [architecture] [architecture name];
The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow, and sequential behavior.
Here we describe an entity by using the behavioral model. A process statement, which is a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.
2.6 PROCESS STATEMENT
A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is:
[process-label:] process [(sensitivity-list)] [is]
begin
  sequential statements:
    variable-assignment-statement
    signal-assignment-statement
    wait-statement
    if-statement
    case-statement
    loop-statement
    null-statement
    exit-statement
    next-statement
    assertion-statement
    report-statement
    procedure-call-statement
    return-statement
end process [process-label];
The set of signals to which the process is sensitive is defined by the sensitivity list. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in sequential order, that is, in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form:
variable-object := expression;
The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is:
signal-object <= expression [after delay-value];
A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other statements that appear within the process.
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is:
if boolean-expression then
  sequential-statements
elsif boolean-expression then
  sequential-statements
[else
  sequential-statements]
end if;
The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
CASE STATEMENT
The format of a case statement is:
case expression is
  when choices => sequential-statements
  when choices => sequential-statements
end case;
The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or of a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by using the "others" clause. The others clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax for a loop statement is:
[loop-label:] iteration-scheme loop
  sequential-statements
end loop [loop-label];
2.8 Active-HDL Overview
Active-HDL is an integrated environment designed for development of VHDL, Verilog, EDIF, and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs, and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard.
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0.
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with OVI Standard Delay Format Specification Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification, and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1 HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. Keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts, and Tcl scripts.
2 Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3 State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4 Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5 Design Browser
The Design Browser window displays the contents of the current design, that is:
Resource files attached to the design
The contents of the default working library of the design
The structure of the design unit selected for simulation
VHDL, Verilog, or EDIF objects declared within a selected region of the current design
6 Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros, and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable by the simulator. In Active-HDL, a source file can be one of the following:
VHDL file (.vhd)
Verilog file (.v)
EDIF netlist file
State diagram file (.asf)
Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog, or EDIF file containing the HDL code (or netlist) generated from the diagram.
A netlist is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, respectively for VHDL, Verilog, and EDIF. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
Event-Driven Simulation
Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others are simulated in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bitstream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs, and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic, and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions, and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input netlists as well as design constraint information into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bitstream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing, or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a netlist(s) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations, and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bitstream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e. with small code size and memory) targeted for processors with a limited instruction set (i.e. AND, OR, XOR gates, word rotation, and modular addition). The proposed design is parametric in the text, key, and processor size, allows efficient combination of encryption/decryption and "on-the-fly" key derivation, and its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g. a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size, and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEA$_{n,b}$ is parametric in the text, key, and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). By contrast, SEA$_{n,b}$ makes it possible to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, additional features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats, and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key, and word sizes. It is based on a Feistel structure with a variable number of rounds, and is defined with respect to the following parameters:
– $n$: plaintext size, key size
– $b$: processor (or word) size
– $n_b = \frac{n}{2b}$: number of words per Feistel branch
– $n_r$: number of block cipher rounds
As the only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted as SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$, ...
Let $x$ be an $\frac{n}{2}$-bit vector. In the following, we will consider two representations:
– Bit representation: $x_b = x(\frac{n}{2}-1)\; x(\frac{n}{2}-2) \ldots x(2)\; x(1)\; x(0)$
– Word representation: $x_W = x_{n_b-1}\; x_{n_b-2} \ldots x_2\; x_1\; x_0$
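The parameter relations above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the mapping between the two representations (word $x_0$ holding the $b$ least significant bits of the vector) is an assumption made for illustration, since the text does not fix it explicitly.

```python
def sea_parameters(n: int, b: int) -> dict:
    """Check the constraint on (n, b) and derive nb, the words per Feistel branch."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6*b")
    return {"n": n, "b": b, "nb": n // (2 * b)}


def to_words(x: int, nb: int, b: int) -> list:
    """Split an n/2-bit value into nb words of b bits.

    Assumption: index 0 is the least significant word (x_0 in the text).
    """
    mask = (1 << b) - 1
    return [(x >> (b * i)) & mask for i in range(nb)]


def from_words(words: list, b: int) -> int:
    """Inverse of to_words: rebuild the n/2-bit value from its words."""
    x = 0
    for i, w in enumerate(words):
        x |= w << (b * i)
    return x


params = sea_parameters(48, 8)          # SEA with n = 48, b = 8
branch = to_words(0x010203, params["nb"], params["b"])
print(params, branch)
```

For $n = 48$ and $b = 8$ each Feistel branch holds three 8-bit words, which is the configuration used in the examples of the text.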
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1 Bitwise XOR
The bitwise XOR is defined on $\frac{n}{2}$-bit vectors:
$$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \rightarrow \mathbb{Z}_2^{n/2} : x, y \rightarrow z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le \frac{n}{2} - 1$$
2 Substitution Box S
SEA$_{n,b}$ uses the following 3-bit substitution table:
$$S_T = \{0, 5, 6, 7, 4, 3, 1, 2\}$$
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data, using the following recursive definition:
$$S : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow x = S(x) \Leftrightarrow \begin{cases} x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i} \\ x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1} \\ x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2} \end{cases} \quad 0 \le i \le \frac{n_b}{3} - 1$$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
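The recursive definition can be compared against the table with a short Python sketch. The function names are hypothetical; the sketch assumes that the three assignments are applied in sequence (each one sees the words already updated) and that $x_{3i}$ carries the least significant bit of the 3-bit table index.

```python
ST = [0, 5, 6, 7, 4, 3, 1, 2]   # 3-bit substitution table from the text


def sbox_words(x: list) -> list:
    """Apply S to a list of words whose length is a multiple of 3."""
    y = list(x)
    for i in range(0, len(y), 3):
        # Sequential updates: later lines use the already-updated words.
        y[i] = (y[i + 2] & y[i + 1]) ^ y[i]
        y[i + 1] = (y[i + 2] & y[i]) ^ y[i + 1]
        y[i + 2] = (y[i] | y[i + 1]) ^ y[i + 2]
    return y


def matches_table() -> bool:
    """Check the bitslice form against ST on all eight 3-bit inputs."""
    for v in range(8):
        bits = [(v >> k) & 1 for k in range(3)]      # one-bit "words"
        out = sbox_words(bits)
        if out[0] | (out[1] << 1) | (out[2] << 2) != ST[v]:
            return False
    return True


print(matches_table())
print(sbox_words([0xFF, 0x00, 0x0F]))
```

Because only AND, OR, and XOR are used, the same three lines process all $b$ bit positions of a word triple in parallel.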
3 Word Rotation R
The word rotation is defined on $n_b$-word vectors:
$$R : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow y = R(x) \Leftrightarrow y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \quad y_0 = x_{n_b-1}$$
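4 Bit Rotation r
The bit rotation is defined on $n_b$-word vectors:
$$r : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le \frac{n_b}{3} - 1$$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.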
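The word rotation and the bit rotation can be sketched on Python lists as follows. The function names are hypothetical, and list index $i$ is assumed to hold word $x_i$.

```python
def word_rotate(x: list) -> list:
    """R: y_0 = x_{nb-1} and y_{i+1} = x_i."""
    return [x[-1]] + x[:-1]


def word_rotate_inv(y: list) -> list:
    """Inverse word rotation: undoes word_rotate."""
    return y[1:] + [y[0]]


def bit_rotate(x: list, b: int) -> list:
    """r: in each group of three words, rotate word 3i right and word 3i+2 left by one bit."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ((y[i] >> 1) | (y[i] << (b - 1))) & mask            # cyclic right shift
        y[i + 2] = ((y[i + 2] << 1) | (y[i + 2] >> (b - 1))) & mask  # cyclic left shift
    return y


print(word_rotate([1, 2, 3]))
print(bit_rotate([0x01, 0x55, 0x80], 8))
```

The middle word of each triple is left untouched by the bit rotation, as in the definition.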
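5 Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x, y \rightarrow z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$$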
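Since the addition acts word by word, no carry propagates from one word to the next. A minimal Python sketch (hypothetical function name) makes this visible:

```python
def add_words(x: list, y: list, b: int) -> list:
    """Word-wise addition modulo 2**b; carries are dropped at each word boundary."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]


# 0xFF + 0x01 wraps to 0x00 and does not affect the neighbouring word.
print(add_words([0xFF, 0x01, 0x80], [0x01, 0x01, 0x80], 8))
```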
The Round and Key Round
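Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$, and key round $F_K$ are pictured in Fig. 3.1 and defined as the functions $F : \mathbb{Z}^2_{2^{n/2}} \times \mathbb{Z}_{2^{n/2}} \rightarrow \mathbb{Z}^2_{2^{n/2}}$ such that:
$$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$$
$$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$$
$$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$$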
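The three round functions can be sketched in Python on lists of $b$-bit words. All names are hypothetical, and the sketch assumes the sequential reading of the substitution box and the convention that list index $i$ holds word $i$. It also checks one property that follows from the definitions: applying the decrypt round to the swapped output of the encrypt round, under the same round key, returns the swapped input.

```python
def sbox(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def bit_rotate(x, b):
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ((y[i] >> 1) | (y[i] << (b - 1))) & mask
        y[i + 2] = ((y[i + 2] << 1) | (y[i + 2] >> (b - 1))) & mask
    return y


def word_rotate(x):
    return [x[-1]] + x[:-1]


def word_rotate_inv(x):
    return x[1:] + [x[0]]


def add_words(x, y, b):
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, y)]


def xor_words(x, y):
    return [u ^ v for u, v in zip(x, y)]


def f_enc(L, Rt, K, b):
    """Encrypt round: returns (L_{i+1}, R_{i+1})."""
    t = bit_rotate(sbox(add_words(Rt, K, b)), b)
    return Rt, xor_words(word_rotate(L), t)


def f_dec(L, Rt, K, b):
    """Decrypt round: returns (L_{i+1}, R_{i+1})."""
    t = bit_rotate(sbox(add_words(Rt, K, b)), b)
    return Rt, word_rotate_inv(xor_words(L, t))


def f_key(KL, KR, C, b):
    """Key round: returns (KL_{i+1}, KR_{i+1})."""
    t = word_rotate(bit_rotate(sbox(add_words(KR, C, b)), b))
    return KR, xor_words(KL, t)


L0, R0, K = [0x12, 0x34, 0x56], [0x78, 0x9A, 0xBC], [1, 2, 3]
L1, R1 = f_enc(L0, R0, K, 8)
print(f_dec(R1, L1, K, 8) == (R0, L0))
```

The values of `L0`, `R0`, and `K` are arbitrary test data, not vectors from the specification.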
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C, and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.
C = SEAn,b(P, K)
  initialization:
    L_0 & R_0 = P
    KL_0 & KR_0 = K
  key scheduling:
    for i in 1 to ⌊nr/2⌋
      [KL_i, KR_i] = FK(KL_{i-1}, KR_{i-1}, C(i))
    switch KL_⌊nr/2⌋, KR_⌊nr/2⌋
    for i in ⌈nr/2⌉ to nr - 1
      [KL_i, KR_i] = FK(KL_{i-1}, KR_{i-1}, C(r - i))
  encryption:
    for i in 1 to ⌈nr/2⌉
      [L_i, R_i] = FE(L_{i-1}, R_{i-1}, KR_{i-1})
    for i in ⌈nr/2⌉ + 1 to nr
      [L_i, R_i] = FE(L_{i-1}, R_{i-1}, KL_{i-1})
  final:
    C = R_nr & L_nr
    switch KL_{nr-1}, KR_{nr-1}
where & is the concatenation operator, KR_⌊nr/2⌋ is taken before the switch, and C(i) is an nb-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round FD.
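The pseudo-code can be turned into a compact Python sketch operating on lists of words. Several points are assumptions rather than facts from the text: the rounding of the loop bounds (floor for the first key loop, ceiling for the second), reading the constant $C(r - i)$ as $C(n_r - i)$, placing the LSW at list index 0, and all function names. No official test vectors are used; the sketch only checks internal consistency, namely that the round-key sequence is symmetric and that decryption recovers the plaintext.

```python
def S(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def r(x, b):
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ((y[i] >> 1) | (y[i] << (b - 1))) & mask
        y[i + 2] = ((y[i + 2] << 1) | (y[i + 2] >> (b - 1))) & mask
    return y


def R(x):
    return [x[-1]] + x[:-1]


def R_inv(x):
    return x[1:] + [x[0]]


def g(x, k, b):
    """Common part of all rounds: r(S(x + k mod 2^b))."""
    return r(S([(u + v) & ((1 << b) - 1) for u, v in zip(x, k)]), b)


def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def F_E(L, Rt, K, b):
    return Rt, xor(R(L), g(Rt, K, b))


def F_D(L, Rt, K, b):
    return Rt, R_inv(xor(L, g(Rt, K, b)))


def F_K(KL, KR, C, b):
    return KR, xor(KL, R(g(KR, C, b)))


def round_keys(KL, KR, nr, b):
    """Return the nr round keys in the order they are used."""
    if nr % 2 == 0:
        raise ValueError("nr must be odd")
    nb, half = len(KL), nr // 2
    const = lambda i: [i] + [0] * (nb - 1)      # assumption: LSW is index 0
    keys = [KR]                                 # KR_0
    for i in range(1, half + 1):
        KL, KR = F_K(KL, KR, const(i), b)
        keys.append(KR)                         # KR_i, taken before the switch
    KL, KR = KR, KL                             # switch
    for i in range(half + 1, nr):
        KL, KR = F_K(KL, KR, const(nr - i), b)
        keys.append(KL)                         # KL_i
    return keys


def crypt(left, right, KL, KR, nr, b, decrypt=False):
    """Run nr rounds and return the output halves in swapped order (R_nr, L_nr)."""
    F = F_D if decrypt else F_E
    for k in round_keys(KL, KR, nr, b):
        left, right = F(left, right, k, b)
    return right, left


PL, PR = [1, 2, 3], [4, 5, 6]             # arbitrary 48-bit plaintext, b = 8
KL0, KR0 = [7, 8, 9], [10, 11, 12]        # arbitrary key
NR = 37                                   # odd round count, chosen only for illustration
ct = crypt(PL, PR, KL0, KR0, NR, 8)
print(crypt(ct[0], ct[1], KL0, KR0, NR, 8, decrypt=True) == (PL, PR))
```

The same key schedule serves both directions, which is the property the construction is designed to provide.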
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– λ-parameter: 1/2
– δ-parameter: 1/4
– Maximum nonlinear order, namely 2
– Recursive definition
– Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data into $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
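As a worked instance of this bound for the 8-bit examples of the text (the rounding of $b/2$ is read here as a floor, which is an assumption; for even $b$ it makes no difference):

$$n = 48,\ b = 8:\quad n_b + \lfloor b/2 \rfloor = 3 + 4 = 7 \text{ rounds}, \qquad n = 96,\ b = 8:\quad n_b + \lfloor b/2 \rfloor = 6 + 4 = 10 \text{ rounds}$$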
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$$K_0, K_1, K_2, \ldots, K_{\lfloor r/2 \rfloor}, K_{\lfloor r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, the linear and differential parameters of the 3-round cipher, $\Lambda$ and $\Delta$, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function $S$ has parameters $\delta = 2^{-2}$ and $\lambda = 2^{-1}$, it implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an $n$-bit block cipher, it is respectively required that $\Delta \lesssim 2^{-n}$ and $\Lambda \lesssim 2^{-n/2}$ to resist differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that:
$$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \qquad (1)$$
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
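The step from Equation (1) to the round bound can be spelled out as follows. The exponent $2n_r/3$ comes from chaining $n_r/3$ three-round blocks, each contributing a factor $\delta^2$ (respectively $\lambda^2$) by Lemma 1. Comparing exponents then gives the same threshold in both cases:

$$\left(2^{-2}\right)^{2n_r/3} = 2^{-4n_r/3} < 2^{-n} \iff n_r > \frac{3n}{4}, \qquad \left(2^{-1}\right)^{2n_r/3} = 2^{-2n_r/3} < 2^{-n/2} \iff n_r > \frac{3n}{4}$$

At $n_r = 3n/4$ both sides are equal, which is the sense in which the bounds are approached. For the block sizes used as examples in the text, the threshold $3n/4$ evaluates to 36 rounds for $n = 48$, 72 rounds for $n = 96$, and 108 rounds for $n = 144$.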
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis, and of the boomerang attack and its variants, is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x ≪ a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
– $S(x ≪ a) = S(x) ≪ a$
– $r(x ≪ a) = r(x) ≪ a$
– $R(x ≪ a) = R(x) ≪ a$
Consider a modified version of our cipher where key addition is performed using bitwise XOR ($\oplus$) rather than modular addition, and where all round constants $C_i$ are such that $C_i ≪ a = C_i$, e.g., all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F'_E$ and the key round $F_K$ satisfy:
$F'_E(L ≪ a, R ≪ a, K ≪ a) = F'_E(L, R, K) ≪ a$
$F_K(KL ≪ a, KR ≪ a, 0) = F_K(KL, KR, 0) ≪ a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K ≪ a$.
In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X ≪ a) \boxplus (K ≪ a) = (X \boxplus K) ≪ a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to $3/8$ as $b$ grows. It is smaller for $1 < a < b-1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g., "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
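The probability that rotation distributes over addition mod $2^b$ can be checked exhaustively for small word sizes. The sketch below is an illustration for a single $b$-bit word, not part of the cipher; the helper names `rotl` and `rotation_commutes_probability` are introduced here and are not from the original text.

```python
def rotl(x, a, b):
    """Rotate the b-bit word x left by a positions."""
    a %= b
    mask = (1 << b) - 1
    return ((x << a) | (x >> (b - a))) & mask


def rotation_commutes_probability(a, b):
    """Fraction of pairs (X, K) of b-bit words for which
    (X <<< a) + (K <<< a) == (X + K) <<< a, with addition mod 2**b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return hits / (1 << (2 * b))


if __name__ == "__main__":
    for word_size in (4, 6, 8):
        print(word_size, rotation_commutes_probability(1, word_size))
```

The exhaustive loop visits $2^{2b}$ pairs, so it is only practical for small $b$; for $b = 8$ it already gives a value close to the limit $3/8$ quoted above for $a = 1$.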
Moreover, the round constants $C_i$ are generally not such that $C_i ≪ a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, this implies that the similarity between the round keys derived from $K$ and those derived from $K ≪ a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA$_{n,b}$.
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e., takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in the corresponding figure. This is expected to remain the same when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in the corresponding figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with that of GOST, whose vulnerability to slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack was described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows the attacker to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that the encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \iff \overline{P} \xrightarrow{\overline{K}} \overline{C}$, where the overline denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g., by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \geq 8$ bits.
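The round-count formula can be evaluated with a few lines of code. This is a minimal sketch of the non-doubled estimate only; it assumes that $n_b = n/(2b)$ is the number of $b$-bit words in each Feistel branch (a definition not restated in this section), and the function name `suggested_rounds` is hypothetical.

```python
def suggested_rounds(n, b):
    """Evaluate n_r = 3n/4 + 2 * (n_b + floor(b/2)), made odd.

    Assumption (not restated in this section): n_b = n / (2b).
    """
    if b < 8:
        raise ValueError("the text assumes a word size of at least 8 bits")
    if n % (2 * b) != 0:
        raise ValueError("n is assumed to be a multiple of 2b")
    n_b = n // (2 * b)
    statistical = -(-3 * n // 4)      # ceil(3n/4): linear/differential part
    diffusion = n_b + b // 2          # rounds for complete diffusion
    n_r = statistical + 2 * diffusion
    if n_r % 2 == 0:                  # n_r has to be odd
        n_r += 1
    return n_r


if __name__ == "__main__":
    for n, b in ((48, 8), (96, 8), (144, 8)):
        print(n, b, suggested_rounds(n, b))
```

Keeping the statistical and diffusion terms separate makes it easy to apply a different security margin (for example the doubling mentioned above) without touching the rest of the computation.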
3.5 Performance Analysis
SEA$_{n,b}$ is targeted at implementation on low-cost processors, with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size, and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: OR, AND, $\oplus$, $\boxplus$, ≫, ≪.
– Branch instructions: goto, subroutine call and return.
– Comparison, load RAM in register, store register in RAM.
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, this directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. This yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Note that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, since at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Note also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
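The estimates above are simple enough to tabulate programmatically. The following sketch evaluates them for given parameters; it assumes $n_b = n/(2b)$ (not restated in this section), and the function and dictionary key names are illustrative choices, not taken from the appendix code.

```python
def sea_software_estimates(n, b, n_r):
    """Evaluate the rough software cost formulas quoted in the text.

    Assumption: n_b = n / (2b) words per Feistel branch.
    All operation counts are in 'ops' as used in the text.
    """
    n_b = n // (2 * b)
    round_ops = 12 * n_b + 11        # one cipher round
    key_round_ops = 10 * n_b + 11    # one key round
    total_ops = ((n_r - 1) * (round_ops + key_round_ops + 7)
                 + round_ops + 8 * n_b + 7)
    return {
        "ram_words_plus_registers": 5 * n_b + 3,
        "registers": n_b + 3,
        "code_size_ops": 31 * n_b + 36,
        "encryption_ops": total_ops,
    }


if __name__ == "__main__":
    print(sea_software_estimates(n=96, b=8, n_r=93))
```

Because every term is linear in $n_b$ and $n_r$, the dominant cost is the product of the number of rounds and the per-round work, which makes the space-for-time trade-off of the design easy to see.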
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g., for the AES Rijndael).
– The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e., $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e., SEA$_{n,b}$ trades space for time, which may be relevant given present processor speeds). Looking at the code size × cycles product, the efficiency of SEA$_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions, described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about $3/4$ (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that
$$\#E(\mathbb{F}_p) = p + 1 - t,$$
where $t$ is an integer which satisfies the Hasse bound
$$-2\sqrt{p} \leq t \leq 2\sqrt{p}.$$
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial, associated with $\ell$, that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve $E$ under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is $1/2$). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining $t$.
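The final Chinese Remainder step can be illustrated on a toy curve. In this sketch the residues $t \bmod \ell$ are obtained by brute-force point counting over a tiny prime field, which stands in for the actual SEA machinery (torsion points, isogenies); the function names, the curve and the list of primes are all choices made here for illustration.

```python
def count_points_naive(p, a4, a6):
    """Brute-force #E(F_p) for y^2 = x^3 + a4*x + a6 (tiny p only)."""
    roots = [0] * p                   # roots[v] = number of y with y*y = v mod p
    for y in range(p):
        roots[y * y % p] += 1
    affine = sum(roots[(x ** 3 + a4 * x + a6) % p] for x in range(p))
    return affine + 1                 # +1 for the point at infinity


def recover_trace(residues, primes, p):
    """Combine t mod l for distinct primes l into the exact trace t."""
    t, modulus = 0, 1
    for r, ell in zip(residues, primes):
        # Incremental CRT: adjust t by a multiple of the current modulus.
        k = ((r - t) * pow(modulus, -1, ell)) % ell
        t += modulus * k
        modulus *= ell
    if modulus * modulus <= 16 * p:   # need product > 4*sqrt(p)
        raise ValueError("product of auxiliary primes must exceed 4*sqrt(p)")
    if t > modulus // 2:              # lift to the symmetric Hasse range
        t -= modulus
    return t


if __name__ == "__main__":
    p, a4, a6 = 101, 1, 1
    t_true = p + 1 - count_points_naive(p, a4, a6)
    primes = [2, 3, 5, 7]
    print(recover_trace([t_true % l for l in primes], primes, p) == t_true)
```

The symmetric lift works because the Hasse interval has length $4\sqrt{p}$, so once the product of the primes exceeds that length there is exactly one candidate for $t$ in the interval.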
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$, with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms, such as Diffie-Hellman key exchange, El Gamal encryption, etc. If we use 200-bit primes for $p$ and require $x \leq 32$, then the probability that $\#E = xr$ is about 25 so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial in $\mathbb{Z}[F, J]$ associated with $\ell$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime $p$ and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
[Figure: SEA architecture block diagram. Key computational block: Mod, SBox, BR, WR, XOR, Round Reg, with key registers KeyReg0[95:0] to KeyReg9[95:0]; signals KeyI[95:0], KeyLd, Clk, Rst. Encryption computational block: Mod, SBox, BR, WR, XOR, Round Reg, cipher data register; signals Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst. Decryption computational block: Mod, SBox, BR, IWR, XOR, Round Reg, plain text data register; signals Key0[95:0] to Key9[95:0], DataO[95:0], DataRd, Clk, Rst. Controller: SMC with Clk, Rst, Ena, ED, Ovr.]
FIG 5.1
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt the data being protected.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
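Both ways of obtaining a key can be sketched with the Python standard library. This is an illustrative example, not the key generation used in the SEA design: it draws a random key from the operating system's entropy source and derives a key from a passphrase with PBKDF2-HMAC-SHA256 (swapped in here in place of the SHA-1 mentioned above). The function names, key length and iteration count are assumptions made for the example.

```python
import hashlib
import secrets


def random_key(length=16):
    """Draw a key of `length` bytes from the OS entropy source."""
    return secrets.token_bytes(length)


def derive_key(passphrase, salt, length=16, iterations=200_000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256.

    The iteration count is an illustrative value, not a recommendation.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=length
    )


if __name__ == "__main__":
    salt = secrets.token_bytes(16)    # stored alongside the ciphertext
    print(random_key().hex())
    print(derive_key("correct horse battery staple", salt).hex())
```

The salt ensures that the same passphrase yields different keys in different contexts, and the iteration count slows down guessing attacks on weak passphrases.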
The simplest method to read encrypted data is a brute force attack: simply trying every possible key up to the maximum key length. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
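The exponential growth of the brute force effort is easy to quantify. The sketch below computes the expected number of trials (half the key space) and converts it to years for an assumed search rate; the rate used in the example is an arbitrary illustrative figure, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_brute_force_trials(key_bits):
    """On average, half of the 2**key_bits keys are tried before success."""
    return 2 ** (key_bits - 1)


def years_to_search(key_bits, keys_per_second):
    """Expected search time in years at a given (assumed) trial rate."""
    seconds = expected_brute_force_trials(key_bits) / keys_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e12   # assumed: 10**12 keys per second, for illustration only
    for bits in (56, 80, 128):
        print(bits, years_to_search(bits, rate))
```

Each additional key bit doubles the expected effort, so moving from a 56-bit to a 128-bit key multiplies the work by $2^{72}$.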
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
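The one-time pad itself is only a few lines of code. The sketch below is a minimal illustration using XOR over bytes; it draws the pad from the operating system's random source via `secrets`, which is a cryptographically strong generator but not the "truly random" source the proof assumes, and the function names are hypothetical.

```python
import secrets


def otp_encrypt(message):
    """Encrypt with a fresh pad as long as the message; returns (pad, ciphertext)."""
    pad = secrets.token_bytes(len(message))   # must never be reused
    ciphertext = bytes(m ^ k for m, k in zip(message, pad))
    return pad, ciphertext


def otp_decrypt(pad, ciphertext):
    """XOR with the same pad recovers the message."""
    if len(pad) != len(ciphertext):
        raise ValueError("pad and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, pad))


if __name__ == "__main__":
    pad, ct = otp_encrypt(b"attack at dawn")
    print(otp_decrypt(pad, ct))
```

The conditions listed above are visible in the code: the pad is as long as the message and freshly generated for each encryption, which is also what makes the scheme impractical for most applications.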
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such showing can currently be made, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting too short keys, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p
xc2s15-cs144-6 keyregngc keyregngd
Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
Writing NGD file keyregngd
Writing NGDBUILD log file keyregbld
Release 61i Map G23Xilinx Mapping Report File for Design keyreg
Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
MAPPING REPORT
Relekeyreg
Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm
area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009
Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
Placing amp Routing Report
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization
Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
KEY REGISTER
Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj
TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed
Ignore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes
Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
  Registers                        : 1
    96-bit register                : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : keyreg.ngr
Top Level Output File Name         : keyreg
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
  IOs                              : 195

Macro Statistics
  Registers                        : 1
    96-bit register                : 1

Cell Usage
  BELS                             : 1
    LUT1                           : 1
  FlipFlops/Latches                : 96
    FDCE                           : 96
  Clock Buffers                    : 1
    BUFGP                          : 1
  IO Buffers                       : 194
    IBUF                           : 98
    OBUF                           : 96
=========================================================================
Device utilization summary
---------------------------

Selected Device : 2s15cs144-6

 Number of Slices:              55 out of 192    28%
 Number of Slice Flip Flops:    96 out of 384    25%
 Number of 4 input LUTs:         1 out of 384     0%
 Number of bonded IOBs:        194 out of  90   215% (*)
 Number of GCLKs:                1 out of   4    25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade: -6

   Minimum period: No path found
   Minimum input arrival time before clock: 7.962ns
   Maximum output required time after clock: 6.788ns
   Maximum combinational path delay: No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset:              7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     IBUF:I->O            96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
     FDCE:CE                   0.886          Dreg_0
    ----------------------------------------
    Total                      7.962ns (1.662ns logic, 6.300ns route)
                                       (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset:              6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising

  Data Path: Dreg_95 to KeyO<95>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     FDCE:C->Q             1   1.085   1.035  Dreg_95 (Dreg_95)
     OBUF:I->O                 4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                      6.788ns (5.753ns logic, 1.035ns route)
                                       (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : sbox8x3.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : sbox8x3
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : sbox8x3
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : sbox8x3.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : sbox8x3.ngr
Top Level Output File Name         : sbox8x3
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
  IOs                              : 7

Cell Usage
  BELS                             : 3
    LUT4                           : 3
  IO Buffers                       : 7
    IBUF                           : 4
    OBUF                           : 3
=========================================================================
Device utilization summary
---------------------------

Selected Device : 2s15cs144-6

 Number of Slices:               2 out of 192     1%
 Number of 4 input LUTs:         3 out of 384     0%
 Number of bonded IOBs:          7 out of  90     7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Number of 4 input LUTs:                           3 out of 384     1%
Logic Distribution:
  Number of occupied Slices:                        2 out of 192     1%
  Number of Slices containing only related logic:   2 out of   2   100%
  Number of Slices containing unrelated logic:      0 out of   2     0%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs:                       3 out of 384     1%
  Number of bonded IOBs:                            7 out of  86     8%

Total equivalent gate count for design:  18
Additional JTAG gate count for IOBs:     336
Peak Memory Usage:  56 MB

Mapping Report

Device utilization summary:
  Number of External IOBs:          7 out of  86     8%
  Number of LOCed External IOBs:    0 out of   7     0%
  Number of SLICEs:                 2 out of 192     1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0

The AVERAGE CONNECTION DELAY for this design is:        0.871
The MAXIMUM PIN DELAY IS:                               1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is:   0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keygenblock.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : keygenblock
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : keygenblock
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : keygenblock.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd ...

Writing NGDBUILD log file keygenblock.bld ...
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Total Number Slice Registers:     419 out of 384   109% (OVERMAPPED)
    Number used as Flip Flops:      415
    Number used as Latches:           4
  Number of 4 input LUTs:          1016 out of 384   264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices:                        665 out of 192   346% (OVERMAPPED)
  Number of Slices containing only related logic:   648 out of 665    97%
  Number of Slices containing unrelated logic:       17 out of 665     2%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:         1066 out of 384   277% (OVERMAPPED)
  Number used as logic:            1016
  Number used as a route-thru:       50
  Number of bonded IOBs:           1060 out of  86  1232% (OVERMAPPED)
    IOB Flip Flops:                 960
  Number of GCLKs:                    1 out of   4    25%
  Number of GCLKIOBs:                 1 out of   4    25%

Total equivalent gate count for design:  17572
Additional JTAG gate count for IOBs:     50928
Peak Memory Usage:  72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : encryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : encryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : encryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : encryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : decryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : decryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : decryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : decryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0

Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It yields a small encryption routine for any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed for processors with a limited instruction set. Typical application areas are:
Wireless communication, mobile computing, and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performances (encryption + decryption) for present key sizes and processors (e.g., 128-bit key, 1 MHz 8-bit RISC) are in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered as an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations purposed for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Preceding the actual simulation are two major steps:
1. Elaboration phase: In this phase, the hierarchy of the entity is expanded and linked, components are bound to entities in a library, and the top-level entity is built as a network of behavioral models that is ready to be simulated.
2. Initialization phase: Driving and effective values for all explicitly declared signals are computed, implicit signals are assigned values, processes are executed once until they suspend, and simulation time is set to 0 ns.
Simulation commences by advancing time to that of the next event. Values that are scheduled for signals at this time are assigned. If the value of a signal changes, and if that signal is present in the sensitivity list of a process, the process is executed until it suspends. Simulation stops when an assertion violation occurs, depending on the implementation of the VHDL system, or when the maximum time as defined by the language is reached.
Entity Declaration
An entity declaration describes the external interface of the entity. It specifies the name of the entity, the names of the interface ports, their mode, and the type of the ports. The syntax for an entity declaration is:

entity entity-name is
    [generic (list-of-generics-and-their-types);]
    [port (list-of-interface-port-names-and-their-types);]
    [entity-item-declarations]
[begin
    entity-statements]
end [entity] [entity-name];

The entity-name is the name of the entity, and the interface ports are the signals through which the entity passes information to and from its external environment. Each interface port can have one of the following modes:

1. in: The value of an input port can only be read within the entity model.
2. out: The value of an output port can only be updated within the entity model.
3. inout: The value of a bidirectional port can be read and updated within the entity model.
4. buffer: The value of a buffer port can be read and updated within the entity model. It cannot have more than one source.

Declarations that are placed in the entity are common to all the design units that are associated with that entity declaration.
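As an illustration of the syntax above, the following is a minimal sketch of an entity declaration for a key register with a generic width. The entity name, generic, and port names are hypothetical and are not taken from the project sources; only the 96-bit default width mirrors the register reported in the synthesis log.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical interface of a key register; all names are illustrative.
entity key_reg is
  generic (
    N : positive := 96                                -- register width in bits
  );
  port (
    clk     : in  std_logic;                          -- clock
    clr     : in  std_logic;                          -- asynchronous clear
    key_ena : in  std_logic;                          -- load enable
    key_in  : in  std_logic_vector(N - 1 downto 0);   -- value to store
    key_out : out std_logic_vector(N - 1 downto 0)    -- stored value
  );
end entity key_reg;
```

Because the width is a generic, the same declaration can be reused for another key size by overriding N at instantiation, without editing the source.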
ARCHITECTURE BODY
An architecture body describes the internal view of an entity. It describes the functionality or the structure of the entity. The syntax is:

architecture architecture-name of entity-name is
begin
    concurrent-statements, which are:
        process-statement
        block-statement
        concurrent-signal-assignment-statement
        component-instantiation-statement
        generate-statement
end [architecture] [architecture-name];

The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow, and sequential behavior.
Here we describe an entity by using the behavioral model. A process statement, which is a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.
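To make the idea of concurrent statements concrete, here is a small dataflow sketch of a 3-bit substitution box. The bit-level equations are assumed to match the S-box of the published SEA specification and should be checked against it; the entity and signal names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sbox3_dataflow is
  port (
    x : in  std_logic_vector(2 downto 0);
    y : out std_logic_vector(2 downto 0)
  );
end entity sbox3_dataflow;

architecture dataflow of sbox3_dataflow is
  signal y0, y1 : std_logic;   -- intermediate results reused by later equations
begin
  -- Concurrent signal assignments: their textual order is irrelevant,
  -- each one is re-evaluated whenever a signal on its right-hand side changes.
  y(2) <= (y0 or y1) xor x(2);
  y1   <= (x(2) and y0) xor x(1);
  y0   <= (x(2) and x(1)) xor x(0);
  y(1) <= y1;
  y(0) <= y0;
end architecture dataflow;
```

The assignments are deliberately written out of dependency order to show that a dataflow description is a network of equations, not a program executed line by line.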
2.6 PROCESS STATEMENT
A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is:

[process-label:] process [(sensitivity-list)] [is]
begin
    sequential-statements, which are:
        variable-assignment-statement
        signal-assignment-statement
        wait-statement
        if-statement
        case-statement
        loop-statement
        null-statement
        exit-statement
        next-statement
        assertion-statement
        report-statement
        procedure-call-statement
        return-statement
end process [process-label];

The sensitivity list defines the set of signals to which the process is sensitive. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
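The following sketch shows a process with a sensitivity list describing a register with asynchronous clear and clock enable, the kind of structure that synthesis maps onto FDCE flip-flops. The entity and port names are hypothetical and only illustrate the construct.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity ena_reg is
  generic (N : positive := 96);
  port (
    clk, clr, ena : in  std_logic;
    d             : in  std_logic_vector(N - 1 downto 0);
    q             : out std_logic_vector(N - 1 downto 0)
  );
end entity ena_reg;

architecture behavioral of ena_reg is
begin
  -- The process wakes up only on events on clk or clr.
  reg_proc : process (clk, clr)
  begin
    if clr = '1' then
      q <= (others => '0');          -- asynchronous clear
    elsif rising_edge(clk) then
      if ena = '1' then
        q <= d;                      -- load only when enabled
      end if;
    end if;
  end process reg_proc;
end architecture behavioral;
```

The data input d is intentionally absent from the sensitivity list: the register only reacts to the clock edge or to the clear signal.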
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form:

variable-object := expression;

The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is:

signal-object <= expression [after delay-value];

A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other statements that appear within the process.
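The difference between the two kinds of assignment is easiest to see side by side. In this sketch (all names hypothetical), the variable takes its new value immediately, whereas the signal keeps its old value until the process suspends, so the two outputs differ by one clock cycle.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity assign_demo is
  port (
    clk   : in  std_logic;
    a, b  : in  std_logic;
    y_var : out std_logic;
    y_sig : out std_logic
  );
end entity assign_demo;

architecture behavioral of assign_demo is
  signal s : std_logic := '0';
begin
  process (clk)
    variable v : std_logic;
  begin
    if rising_edge(clk) then
      v := a xor b;     -- variable: updated immediately
      y_var <= v;       -- sees the new value of v
      s <= a xor b;     -- signal: update is scheduled, not yet visible
      y_sig <= s;       -- still sees the previous value of s
    end if;
  end process;
end architecture behavioral;
```

In simulation, y_sig therefore follows y_var with a delay of one clock period.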
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is:

if boolean-expression then
    sequential-statements
[elsif boolean-expression then
    sequential-statements]
[else
    sequential-statements]
end if;

The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
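Because conditions are checked in order, an if/elsif chain naturally expresses priority. The sketch below is a generic 4-input priority encoder written for illustration; it is not part of the SEA design, and all names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity prio_enc4 is
  port (
    req   : in  std_logic_vector(3 downto 0);
    code  : out std_logic_vector(1 downto 0);
    valid : out std_logic
  );
end entity prio_enc4;

architecture behavioral of prio_enc4 is
begin
  process (req)
  begin
    valid <= '1';                  -- default, overridden in the else branch
    if req(3) = '1' then           -- highest priority is tested first
      code <= "11";
    elsif req(2) = '1' then
      code <= "10";
    elsif req(1) = '1' then
      code <= "01";
    elsif req(0) = '1' then
      code <= "00";
    else                           -- no request pending
      code  <= "00";
      valid <= '0';
    end if;
  end process;
end architecture behavioral;
```

Every branch assigns code, and valid has a default value, so no unintended latch is inferred for this combinational process.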
CASE STATEMENT
The format of a case statement is:

case expression is
    when choices => sequential-statements
    when choices => sequential-statements
end case;

The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or of a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by using the "others" clause. The others clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
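A lookup table is a typical use of the case statement. The sketch below writes a 3-bit substitution box as a table; the table values {0, 5, 6, 7, 4, 3, 1, 2} are assumed to be those of the published SEA specification and should be verified against it, and the entity name is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sbox3_table is
  port (
    x : in  std_logic_vector(2 downto 0);
    y : out std_logic_vector(2 downto 0)
  );
end entity sbox3_table;

architecture behavioral of sbox3_table is
begin
  process (x)
  begin
    case x is
      when "000"  => y <= "000";   -- 0 -> 0
      when "001"  => y <= "101";   -- 1 -> 5
      when "010"  => y <= "110";   -- 2 -> 6
      when "011"  => y <= "111";   -- 3 -> 7
      when "100"  => y <= "100";   -- 4 -> 4
      when "101"  => y <= "011";   -- 5 -> 3
      when "110"  => y <= "001";   -- 6 -> 1
      when "111"  => y <= "010";   -- 7 -> 2
      when others => y <= "XXX";   -- metavalues ('U', 'X', ...) in simulation
    end case;
  end process;
end architecture behavioral;
```

The others branch is required because std_logic has nine values per bit; for synthesis, the eight binary choices already cover every input that can occur in hardware.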
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax for the loop statement is:

[loop-label:] iteration-scheme loop
    sequential-statements
end loop [loop-label];
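As a small example of the for iteration scheme, the sketch below computes the XOR parity of a vector. It is a generic illustration with hypothetical names, not a block of the SEA design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity parity_gen is
  generic (N : positive := 8);
  port (
    d      : in  std_logic_vector(N - 1 downto 0);
    parity : out std_logic
  );
end entity parity_gen;

architecture behavioral of parity_gen is
begin
  process (d)
    variable p : std_logic;
  begin
    p := '0';
    for i in d'range loop      -- iterate over every bit of d
      p := p xor d(i);         -- accumulate the parity in a variable
    end loop;
    parity <= p;               -- '1' when d holds an odd number of ones
  end process;
end architecture behavioral;
```

The loop bounds come from d'range, so the description adapts to any value of the generic N.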
2.8 Active-HDL Overview
Active-HDL is an integrated environment designed for the development of VHDL, Verilog, EDIF, and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs, and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0.
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with the OVI Standard Delay Format Specification, Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification, and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1 HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. Keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts, and Tcl scripts.
2 Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code
3 State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code
4 Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors
5 Design Browser
The Design Browser window displays the contents of the current design, that is:
Resource files attached to the design.
The contents of the default working library of the design.
The structure of the design unit selected for simulation.
VHDL, Verilog, or EDIF objects declared within a selected region of the current design.
6 Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros, and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable to the simulator. In Active-HDL, a source file can be one of the following:
VHDL file (.vhd)
Verilog file (.v)
EDIF net list file
State diagram file (.asf)
Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog, or EDIF file containing the HDL code (or net list) generated from the diagram.
A net list is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, for VHDL, Verilog, and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
Event-Driven Simulation
Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others are simulated in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs, and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic, and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions, and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input net lists, as well as design constraint information, into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing, or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a net list (or net lists) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macro cell details, equations, and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e. with small code size and memory) targeted for processors with a limited instruction set (i.e. AND, OR, XOR gates, word rotation, and modular addition). The proposed design is parametric in the text, key, and processor size, allows efficient combination of encryption/decryption and "on-the-fly" key derivation, and its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g. a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size, and a limited instruction set. In addition, we propose flexibility as another, unusual design principle.
SEA$_{n,b}$ is parametric in the text, key, and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). In contrast, SEA$_{n,b}$ makes it possible to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, additional features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats, and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key, and word sizes. It is based on a Feistel structure with a variable number of rounds and is defined with respect to the following parameters:
- $n$: plaintext size, key size
- $b$: processor (or word) size
- $n_b = \frac{n}{2b}$: number of words per Feistel branch
- $n_r$: number of block cipher rounds
As the only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted as SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$, ... Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
- Bit representation: $x_b = x(n/2 - 1)\; x(n/2 - 2) \ldots x(2)\; x(1)\; x(0)$
- Word representation: $x_W = x_{n_b - 1}\; x_{n_b - 2} \ldots x_2\; x_1\; x_0$
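The parameter relations above can be checked with a few lines of Python. This is only an illustrative sketch: the class name `SeaParams` is hypothetical and not part of the SEA specification; it encodes the definition $n_b = n/(2b)$ and the single stated constraint that $n$ is a multiple of $6b$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeaParams:
    """Hypothetical container for the SEA parameters (name is illustrative only)."""
    n: int  # plaintext size and key size, in bits
    b: int  # processor (word) size, in bits

    def __post_init__(self):
        # The only constraint stated in the text: n must be a multiple of 6b.
        if self.n <= 0 or self.b <= 0 or self.n % (6 * self.b) != 0:
            raise ValueError("n must be a positive multiple of 6*b")

    @property
    def nb(self) -> int:
        # Number of b-bit words per Feistel branch: n / (2b).
        return self.n // (2 * self.b)


for n in (48, 96, 144):
    params = SeaParams(n, 8)
    print(f"SEA({params.n},{params.b}): nb = {params.nb}")
```

Because $n$ is a multiple of $6b$, $n_b$ is always a multiple of 3, which is what lets the substitution box work on blocks of three words.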
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows
1 Bitwise XOR $\oplus$
The bitwise XOR is defined on $n/2$-bit vectors:
$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$
2 Substitution Box $S$
SEA$_{n,b}$ uses the following 3-bit substitution table, in C-like notation:
$S_T := \{0, 5, 6, 7, 4, 3, 1, 2\}$
For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i}$
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1}$
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}, \quad 0 \le i \le \frac{n_b}{3} - 1$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
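The link between the recursive word-level definition and the 3-bit table can be verified directly. The sketch below assumes that "recursive" means each line uses the values already updated by the previous lines, and that $x_{3i}$ carries the least significant bit of the 3-bit input; under these assumptions the bitsliced definition reproduces the table $S_T$. The function names are illustrative.

```python
ST = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table from the text


def sbox_words(x):
    """Apply S to a list of words (length a multiple of 3), updating sequentially."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]       # x_3i   = (x_3i+2 AND x_3i+1) XOR x_3i
        y[i + 1] ^= y[i + 2] & y[i]       # x_3i+1 = (x_3i+2 AND x_3i)   XOR x_3i+1
        y[i + 2] ^= y[i] | y[i + 1]       # x_3i+2 = (x_3i   OR  x_3i+1) XOR x_3i+2
    return y


def derived_table():
    """Recover the 3-bit table by feeding all 8 inputs through one 3-word block.

    Bit position v of the three 8-bit words holds the 3-bit input v
    (word 0 = least significant input bit).
    """
    x = [sum(((v >> j) & 1) << v for v in range(8)) for j in range(3)]
    y = sbox_words(x)
    return [sum(((y[j] >> v) & 1) << j for j in range(3)) for v in range(8)]


print(derived_table())
print([v for v in range(8) if ST[v] == v])  # fixed points of the table
```

The second line printed lists the fixed points of the table, which are the inputs 0 and 4.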
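3 Word Rotation $R$
The word rotation is defined on $n_b$-word vectors:
$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \Leftrightarrow y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \quad y_0 = x_{n_b - 1}$
4 Bit Rotation $r$
The bit rotation is defined on $n_b$-word vectors:
$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le \frac{n_b}{3} - 1$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.
5 Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$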
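The word rotation, bit rotation, and word-wise modular addition translate almost line for line into code. The following is a minimal sketch, assuming a vector is stored as a Python list whose index $i$ holds word $x_i$; the function names are illustrative and not taken from the specification.

```python
def rotl1(w, b):
    """Cyclic left shift by one bit inside a b-bit word."""
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)


def rotr1(w, b):
    """Cyclic right shift by one bit inside a b-bit word."""
    return ((w >> 1) | ((w & 1) << (b - 1))) & ((1 << b) - 1)


def word_rot(x):
    """R: y[i+1] = x[i] for 0 <= i <= nb-2, and y[0] = x[nb-1]."""
    return [x[-1]] + x[:-1]


def inv_word_rot(x):
    """Inverse word rotation R^-1."""
    return x[1:] + [x[0]]


def bit_rot(x, b):
    """r: per 3-word block, rotate word 3i right, keep 3i+1, rotate 3i+2 left."""
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = rotr1(x[i], b)
        y[i + 2] = rotl1(x[i + 2], b)
    return y


def add_mod(x, y, b):
    """Word-wise addition mod 2^b (no carry between words)."""
    mask = (1 << b) - 1
    return [(u + v) & mask for u, v in zip(x, y)]


print(word_rot([1, 2, 3]))
print([hex(w) for w in bit_rot([0x01, 0x55, 0x80], 8)])
print(add_mod([0xFF, 1, 2], [1, 1, 1], 8))
```

Note that the addition is carried out independently in each word, so no carry propagates from one word to the next.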
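The Round and Key Round
Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$, and key round $F_K$ are pictured in FIG 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^2$ such that
$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$
$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$
$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$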
FIG 3.1 Encrypt/decrypt round and key round
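The three round functions can be sketched as follows. This is an illustrative transcription of the definitions of $F_E$, $F_D$, and $F_K$, not a reference implementation: the helper names, the list-of-words representation (index $i$ holds word $x_i$), and the sample values are assumptions. The final check shows the property that makes decryption work: applying $F_D$ with the same round key to the swapped output of $F_E$ returns the swapped input.

```python
def rotl1(w, b):
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)

def rotr1(w, b):
    return ((w >> 1) | ((w & 1) << (b - 1))) & ((1 << b) - 1)

def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]

def add_mod(x, y, b):
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, y)]

def word_rot(x):        # R
    return [x[-1]] + x[:-1]

def inv_word_rot(x):    # R^-1
    return x[1:] + [x[0]]

def bit_rot(x, b):      # r
    y = list(x)
    for i in range(0, len(x), 3):
        y[i], y[i + 2] = rotr1(x[i], b), rotl1(x[i + 2], b)
    return y

def sbox(x):            # S, sequential update per 3-word block
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y

def f_e(l, r, k, b):
    """Encrypt round: returns (L_{i+1}, R_{i+1})."""
    return r, xor(word_rot(l), bit_rot(sbox(add_mod(r, k, b)), b))

def f_d(l, r, k, b):
    """Decrypt round: returns (L_{i+1}, R_{i+1})."""
    return r, inv_word_rot(xor(l, bit_rot(sbox(add_mod(r, k, b)), b)))

def f_k(kl, kr, c, b):
    """Key round: returns (KL_{i+1}, KR_{i+1})."""
    return kr, xor(kl, word_rot(bit_rot(sbox(add_mod(kr, c, b)), b)))

# Arbitrary sample values for n = 48, b = 8 (nb = 3 words per branch).
B = 8
L0, R0, K0 = [0x01, 0x23, 0x45], [0x67, 0x89, 0xAB], [0x10, 0x32, 0x54]
L1, R1 = f_e(L0, R0, K0, B)
print(f_d(R1, L1, K0, B) == (R0, L0))
```

Moving the word rotation to the left branch is what makes $F_E$ and $F_D$ differ; without $R$ and $R^{-1}$ the two functions would coincide.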
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext $P$ under a key $K$ and produces a ciphertext $C$. $P$, $C$, and $K$ have a parametric bit size $n$. The operations within the cipher are performed considering parametric $b$-bit words.

    C = SEA_{n,b}(P, K)
      % initialization:
      L_0 & R_0 = P;
      KL_0 & KR_0 = K;
      % key scheduling:
      for i in 1 to ⌊n_r/2⌋
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i));
      switch KL_{⌊n_r/2⌋}, KR_{⌊n_r/2⌋};
      for i in ⌈n_r/2⌉ to n_r - 1
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(r - i));
      % encryption:
      for i in 1 to ⌈n_r/2⌉
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1});
      for i in ⌈n_r/2⌉ + 1 to n_r
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1});
      % final:
      C = R_{n_r} & L_{n_r};
      switch KL_{n_r-1}, KR_{n_r-1};

where & is the concatenation operator, $KR_{\lfloor n_r/2 \rfloor}$ is taken before the switch, and $C(i)$ is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals $i$. Decryption is exactly the same, using the decrypt round $F_D$.
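The complete cipher can be sketched end to end as below. Several points are assumptions made for illustration and should be checked against the original specification: index 0 of each list is taken as the least significant word, the constant written $C(r - i)$ in the pseudo-code is read as $C(n_r - i)$, the two branches are passed as separate word lists rather than one concatenated block, and $n_r = 51$ is used for $n = 48$, $b = 8$. No official test vectors are reproduced here; the sketch only checks that the round key sequence is symmetric and that decryption inverts encryption.

```python
def rotl1(w, b):
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)

def rotr1(w, b):
    return ((w >> 1) | ((w & 1) << (b - 1))) & ((1 << b) - 1)

def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]

def add_mod(x, y, b):
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, y)]

def word_rot(x):
    return [x[-1]] + x[:-1]

def inv_word_rot(x):
    return x[1:] + [x[0]]

def bit_rot(x, b):
    y = list(x)
    for i in range(0, len(x), 3):
        y[i], y[i + 2] = rotr1(x[i], b), rotl1(x[i + 2], b)
    return y

def sbox(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y

def f_e(l, r, k, b):
    return r, xor(word_rot(l), bit_rot(sbox(add_mod(r, k, b)), b))

def f_d(l, r, k, b):
    return r, inv_word_rot(xor(l, bit_rot(sbox(add_mod(r, k, b)), b)))

def f_k(kl, kr, c, b):
    return kr, xor(kl, word_rot(bit_rot(sbox(add_mod(kr, c, b)), b)))

def const(i, nb):
    # C(i): all words zero except the least significant word, which equals i.
    return [i] + [0] * (nb - 1)

def round_keys(kl, kr, nr, b):
    """Return the nr round keys in the order the encryption rounds use them."""
    nb, half = len(kl), nr // 2
    keys = [kr]                            # round 1 uses KR_0
    for i in range(1, half + 1):
        kl, kr = f_k(kl, kr, const(i, nb), b)
        keys.append(kr)                    # round i+1 uses KR_i
    kl, kr = kr, kl                        # switch after the middle key is taken
    for i in range(half + 1, nr):
        kl, kr = f_k(kl, kr, const(nr - i, nb), b)  # assumption: C(n_r - i)
        keys.append(kl)                    # round i+1 uses KL_i
    return keys

def crypt(l, r, keys, b, round_fn):
    for k in keys:
        l, r = round_fn(l, r, k, b)
    return r, l                            # final: C = R_nr & L_nr

B, NR = 8, 51
pl, pr = [0x01, 0x23, 0x45], [0x67, 0x89, 0xAB]
keys = round_keys([0x10, 0x32, 0x54], [0x76, 0x98, 0xBA], NR, B)
cl, cr = crypt(pl, pr, keys, B, f_e)
dl, dr = crypt(cl, cr, keys, B, f_d)
print(keys == keys[::-1], (dl, dr) == (pl, pr))
```

Because the round key sequence reads the same forwards and backwards, the same key schedule serves both directions, and only the round function changes between encryption and decryption.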
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
- $\lambda$-parameter: 1/2
- $\delta$-parameter: 1/4
- Maximum nonlinear order, namely 2
- Recursive definition
- Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher, when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds. The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have
$K_0, K_1, K_2, \ldots, K_{\lfloor r/2 \rfloor}, K_{\lfloor r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the linear and differential parameters of the 3-round cipher, $\Lambda$ and $\Delta$, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function $S$ has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an $n$-bit block cipher, it is respectively required that $\Delta \approx 2^{-n}$ and $\Lambda \approx 2^{-n/2}$ to resist against differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that
$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n}$ and $\lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2}$ (1)
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
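The step from Equation (1) to the bound on the number of rounds can be spelled out as follows. This is a worked check that uses only the values $\delta = 2^{-2}$ and $\lambda = 2^{-1}$ given above; both conditions reduce to the same threshold.

```latex
\begin{align*}
\delta^{2n_r/3} &= \left(2^{-2}\right)^{2n_r/3} = 2^{-4n_r/3} < 2^{-n}
  &&\Longleftrightarrow\quad \tfrac{4n_r}{3} > n
  &&\Longleftrightarrow\quad n_r > \tfrac{3n}{4},\\
\lambda^{2n_r/3} &= \left(2^{-1}\right)^{2n_r/3} = 2^{-2n_r/3} < 2^{-n/2}
  &&\Longleftrightarrow\quad \tfrac{2n_r}{3} > \tfrac{n}{2}
  &&\Longleftrightarrow\quad n_r > \tfrac{3n}{4}.
\end{align*}
```

For example, with $n = 48$ the threshold is $3n/4 = 36$ rounds.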
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31], and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \ll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
- $S(x \ll a) = S(x) \ll a$
- $r(x \ll a) = r(x) \ll a$
- $R(x \ll a) = R(x) \ll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \ll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F'_E$ and the key round $F_K$ satisfy
$F'_E(L \ll a, R \ll a, K \ll a) = F'_E(L, R, K) \ll a$
$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \ll a$. In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to 3/8 as $b$ grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
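The probability $p$ for a single word can be checked exhaustively for small word sizes. The sketch below counts the pairs $(X, K)$ of $b$-bit words for which rotation commutes with addition mod $2^b$; the function names are illustrative. For $a = 1$, a short carry analysis gives $p = \frac{3}{8}\left(1 + 2^{-(b-1)}\right)$, which the exhaustive count reproduces and which tends to 3/8 as $b$ grows.

```python
def rotl(w, a, b):
    """Cyclic left rotation of a b-bit word by a bits."""
    a %= b
    mask = (1 << b) - 1
    return ((w << a) | (w >> (b - a))) & mask


def commute_count(a, b):
    """Number of pairs (x, k) with rotl(x) + rotl(k) == rotl(x + k) mod 2^b."""
    mask = (1 << b) - 1
    return sum(
        1
        for x in range(1 << b)
        for k in range(1 << b)
        if ((rotl(x, a, b) + rotl(k, a, b)) & mask) == rotl((x + k) & mask, a, b)
    )


for b in (4, 6, 8):
    total = 1 << (2 * b)
    count = commute_count(1, b)
    print(b, count, count / total)
```

For $b = 8$ the count is 24768 out of 65536 pairs, i.e. $p \approx 0.378$. This is the probability for one word; a vector of $n_b$ words needs the property to hold in every word.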
Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from $K$ and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA$_{n,b}$.
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in the figure. It is expected that it remains the same when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in the figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, of which the vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference makes it possible to "counter" the effect of the diffusion layer on the differential characteristic: a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \Leftrightarrow \overline{P} \xrightarrow{\overline{K}} \overline{C}$, where the overline denotes the bitwise complement. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks, plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
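The suggested minimum number of rounds is easy to tabulate. The sketch below evaluates $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$ and adds one when the result is even. Rounding $3n/4$ up when it is not an integer is an assumption of this sketch, since the text does not say how that case is handled; the function name is illustrative.

```python
def suggested_rounds(n, b):
    """Minimum suggested number of rounds, forced to be odd."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6*b")
    nb = n // (2 * b)
    linear_differential = -(-3 * n // 4)   # ceil(3n/4); rounding up is an assumption
    diffusion = nb + b // 2                # nb + floor(b/2)
    nr = linear_differential + 2 * diffusion
    return nr if nr % 2 == 1 else nr + 1


for n in (48, 96, 144):
    print(n, 8, suggested_rounds(n, 8))
```

For $b = 8$ this gives 51, 93, and 135 rounds for $n = 48$, 96, and 144 respectively.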
3.5 Performance Analysis
SEA$_{n,b}$ is targeted for being implemented on low-cost processors, with little code size and a small instruction set. However, the simple structure of SEA$_{n,b}$ makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and registers usage, (2) low code size, and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
- Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\gg$, $\ll$
- Branch instructions: goto, subroutine call, and return
- Comparison; load RAM in register; store register in RAM
According to the code in the appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$. These values are summarized in Table 1. Remark that due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r - 1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
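The cost estimates above can be evaluated for concrete parameters. The following sketch simply transcribes the three expressions from the text; the function names and the choice $n_b = 3$, $n_r = 51$ (that is, $n = 48$, $b = 8$ with the suggested number of rounds) are illustrative assumptions, and the printed figures are arithmetic consequences of the formulas rather than measured results.

```python
def ram_words_and_registers(nb):
    return 5 * nb + 3


def code_size_ops(nb):
    return 31 * nb + 36


def encryption_time_ops(nb, nr):
    round_ops = 12 * nb + 11        # one encrypt/decrypt round
    key_round_ops = 10 * nb + 11    # one key round
    return (nr - 1) * (round_ops + key_round_ops + 7) + round_ops + 8 * nb + 7


nb, nr = 3, 51
print("RAM words + registers:", ram_words_and_registers(nb))
print("code size (ops):", code_size_ops(nb))
print("time (ops):", encryption_time_ops(nb, nr))
```

Both the code size and the memory footprint grow linearly with $n_b$, while the time also scales with $n_r$, which is the space-for-time trade-off discussed in this section.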
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
- SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
- The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
- The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is in the number of cycles required for the encryption (i.e. SEA$_{n,b}$ trades space for time, which may be relevant due to present processor speeds). Looking at the code size × cycles product, the efficiency of SEA$_{n,b}$ remains similar to the one of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given. Let $p$ be a large (odd) prime and let $E : y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Müller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Müller are described in [15] and [16]. The implementation of Izu, Kogure, Noro, and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic, and for a 200-bit prime $p$ succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that $\#E(\mathbb{F}_p) = p + 1 - t$, where $t$ is an integer which satisfies the Hasse bound $-2\sqrt{p} \le t \le 2\sqrt{p}$.
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate, a calculation has to be carried out to generate a certain polynomial, associated with $\ell$, that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve $E$ under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we succeeded in determining $t$.
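The final recombination step can be illustrated on a toy example. The sketch below is not the SEA algorithm itself: it counts points naively on a small curve to obtain the true $t$, derives the residues $t \bmod \ell$ from that known value (in the real algorithm these residues are computed from the curve without knowing $t$), and then shows that the Chinese Remainder Theorem together with the Hasse bound recovers $t$ once the product of the primes exceeds $4\sqrt{p}$. The curve, the prime $p = 1009$, and the function names are arbitrary illustrative choices.

```python
def count_points(p, a4, a6):
    """Naive count of points on y^2 = x^3 + a4*x + a6 over F_p, including infinity."""
    total = 1  # the point at infinity
    for x in range(p):
        v = (x * x * x + a4 * x + a6) % p
        if v == 0:
            total += 1                       # one point with y = 0
        elif pow(v, (p - 1) // 2, p) == 1:
            total += 2                       # v is a square: two values of y
    return total


def recover_trace(residues, p):
    """Combine {l: t mod l} by CRT and pick the representative in the Hasse interval."""
    t, m = 0, 1
    for l, r in residues.items():
        # Lift t (mod m) to a value that is also congruent to r (mod l).
        t += m * (((r - t) * pow(m, -1, l)) % l)
        m *= l
    if m * m <= 16 * p:
        raise ValueError("product of auxiliary primes must exceed 4*sqrt(p)")
    return t if t <= m // 2 else t - m       # centered representative


p, a4, a6 = 1009, 3, 7                        # 4*a4^3 + 27*a6^2 is nonzero mod p
true_t = p + 1 - count_points(p, a4, a6)
residues = {l: true_t % l for l in (2, 3, 5, 7)}   # product 210 > 4*sqrt(1009)
print(true_t, recover_trace(residues, p))
```

The centered representative is the right one because $|t| \le 2\sqrt{p}$ is strictly less than half the product of the auxiliary primes. The modular inverse `pow(m, -1, l)` requires Python 3.8 or later.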
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$ with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms, such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 25 so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial in $\mathbb{Z}[F, J]$ associated with $\ell$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration, and therefore may be pre-calculated and stored if there is enough space for them (they require just under a half megabyte to store). We start out with a given prime $p$ and an elliptic curve $E : y^2 = x^3 + a_4 x + a_6$
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram. Recoverable labels: key computational block (Mod, SBox, BR, WR, XOR, Round Reg; registers KeyReg[95:0] and KeyReg0[95:0], KeyReg1[95:0], ..., KeyReg8[95:0], KeyReg9[95:0]; inputs KeyI[95:0], KeyLd, Clk, Rst); encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg; cipher data register; Key0[95:0] to Key9[95:0]; inputs DataI[95:0], DataLd, Clk, Rst); decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg; plain text data register; Key0[95:0] to Key9[95:0]; outputs DataO[95:0], DataRd; Clk, Rst); SMC (Clk, Rst, Ena, ED, Ovr).
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key length; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
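Both ways of producing a key mentioned above can be sketched with the Python standard library. The sketch is an illustration only: the 96-bit key length is an assumption taken from the 96-bit key buses shown in the block diagram, the function names are hypothetical, and PBKDF2 with SHA-256 is used here as the passphrase-based derivation in place of the SHA-1 example named in the text.

```python
import hashlib
import secrets

KEY_BITS = 96  # assumption: matches the 96-bit key registers of the block diagram


def random_key(bits=KEY_BITS):
    """Key drawn from the operating system's entropy source."""
    return secrets.token_bytes(bits // 8)


def key_from_passphrase(passphrase, salt, bits=KEY_BITS, iterations=100_000):
    """Key derived from a passphrase with PBKDF2-HMAC-SHA256 (swapped in for SHA-1)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=bits // 8
    )


key = random_key()
derived = key_from_passphrase("correct horse battery staple", b"example-salt")
print(len(key) * 8, len(derived) * 8)
print("brute-force key space:", 2 ** KEY_BITS)
```

Each additional key bit doubles the brute-force search space, which is why the attack cost grows exponentially with the key length.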
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is mathematical and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. It deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the term cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are ciphers and codes.
A code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
A cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
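In hardware the one-time pad reduces to a bitwise XOR of the message with the pad, and the same circuit performs decryption. The following is a minimal sketch only; the entity name otp_xor, the generic WIDTH and the port names are illustrative assumptions and are not part of this project's design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- One-time pad as a purely combinational XOR (illustrative names only).
entity otp_xor is
  generic (WIDTH : positive := 8);
  port (
    msg : in  std_logic_vector(WIDTH - 1 downto 0);  -- plaintext (or ciphertext)
    pad : in  std_logic_vector(WIDTH - 1 downto 0);  -- random pad, same length as msg
    ct  : out std_logic_vector(WIDTH - 1 downto 0)   -- ciphertext (or recovered plaintext)
  );
end entity otp_xor;

architecture dataflow of otp_xor is
begin
  -- XOR is its own inverse: (msg xor pad) xor pad = msg
  ct <= msg xor pad;
end architecture dataflow;
```

The security conditions listed above (truly random pad, never reused, kept secret) concern how the pad is produced and handled; the circuit itself cannot enforce them.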
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases effective security could be achieved if it were proven that the effort required (i.e. the work factor, in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
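The size of this improvement can be made explicit. DES uses a 56-bit key, so an exhaustive search covers 2^56 keys and expects success after about half of them, while the linear attack needs about 2^43 cipher operations:

\[
\frac{2^{56}/2}{2^{43}} = \frac{2^{55}}{2^{43}} = 2^{12} = 4096 .
\]

Under these figures the linear attack needs roughly 4096 times fewer cipher operations than brute force, at the price of collecting 2^43 known plaintexts instead of one.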
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file keyreg.ngd ...
Writing NGDBUILD log file keyreg.bld ...
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:   0 out of 0      0%
  Number of Slices containing unrelated logic:      0 out of 0      0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:                          194 out of 86   225% (OVERMAPPED)
    IOB Flip Flops:                                96
  Number of GCLKs:                                  1 out of 4     25%
  Number of GCLKIOBs:                               1 out of 4     25%

Total equivalent gate count for design:  768
Additional JTAG gate count for IOBs:     9360
Peak Memory Usage:                       57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line   : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device  : x2s15
Target Package : cs144
Target Speed   : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date    : Mon Mar 30 12:42:43 2009
Placing & Routing Report

Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:   0 out of 0      0%
  Number of Slices containing unrelated logic:      0 out of 0      0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:                          194 out of 86   225% (OVERMAPPED)
    IOB Flip Flops:                                96
  Number of GCLKs:                                  1 out of 4     25%
  Number of GCLKIOBs:                               1 out of 4     25%

Total equivalent gate count for design:  768
Additional JTAG gate count for IOBs:     9360
Peak Memory Usage:                       57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keyreg.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : keyreg
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : keyreg
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : keyreg.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
# Registers                        : 1
  96-bit register                  : 1

=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : keyreg.ngr
Top Level Output File Name         : keyreg
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
# IOs                              : 195

Macro Statistics :
# Registers                        : 1
#      96-bit register             : 1

Cell Usage :
# BELS                             : 1
#      LUT1                        : 1
# FlipFlops/Latches                : 96
#      FDCE                        : 96
# Clock Buffers                    : 1
#      BUFGP                       : 1
# IO Buffers                       : 194
#      IBUF                        : 98
#      OBUF                        : 96
=========================================================================

Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6

 Number of Slices:              55 out of 192    28%
 Number of Slice Flip Flops:    96 out of 384    25%
 Number of 4 input LUTs:         1 out of 384     0%
 Number of bonded IOBs:        194 out of 90    215% (*)
 Number of GCLKs:                1 out of 4      25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -6

   Minimum period: No path found
   Minimum input arrival time before clock: 7.962ns
   Maximum output required time after clock: 6.788ns
   Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset:              7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     IBUF:I->O            96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
     FDCE:CE                   0.886          Dreg_0
    ----------------------------------------
    Total                      7.962ns (1.662ns logic, 6.300ns route)
                                       (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset:              6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising

  Data Path: Dreg_95 to KeyO<95>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     FDCE:C->Q             1   1.085   1.035  Dreg_95 (Dreg_95)
     OBUF:I->O                 4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                      6.788ns (5.753ns logic, 1.035ns route)
                                       (84.8% logic, 15.2% route)

=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
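The report above shows what XST inferred for the key register: a 96-bit register on signal Dreg, built from 96 FDCE flip-flops (flip-flops with clock enable and asynchronous clear), with KeyEna driving the clock enable and KeyO as the output bus. A minimal sketch of a description that would plausibly synthesize to such a structure is given below. The original KeyReg.vhd source is not reproduced in this report, so the entity name keyreg_sketch, the generic N and the ports Rst and KeyI are assumptions; only Clk, KeyEna, KeyO and Dreg are names taken from the report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity keyreg_sketch is
  generic (N : positive := 96);                      -- register width (96 in the report)
  port (
    Clk    : in  std_logic;
    Rst    : in  std_logic;                          -- assumed asynchronous clear
    KeyEna : in  std_logic;                          -- clock enable
    KeyI   : in  std_logic_vector(N - 1 downto 0);   -- assumed name of the key input bus
    KeyO   : out std_logic_vector(N - 1 downto 0)
  );
end entity keyreg_sketch;

architecture rtl of keyreg_sketch is
  signal Dreg : std_logic_vector(N - 1 downto 0);
begin
  process (Clk, Rst)
  begin
    if Rst = '1' then
      Dreg <= (others => '0');        -- asynchronous clear
    elsif rising_edge(Clk) then
      if KeyEna = '1' then
        Dreg <= KeyI;                 -- load only when enabled, otherwise hold
      end if;
    end if;
  end process;

  KeyO <= Dreg;
end architecture rtl;
```

With N = 96 this gives 96 data inputs plus Rst and KeyEna, which is consistent with the 98 input buffers and 96 output buffers listed in the cell usage. It also illustrates why the design is reported as overmapped: the 194 pads exceed the bonded IOBs available on the selected package, so such a register would normally be instantiated inside a larger design rather than implemented as a top level.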
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : sbox8x3.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : sbox8x3
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : sbox8x3
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : sbox8x3.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : sbox8x3.ngr
Top Level Output File Name         : sbox8x3
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
# IOs                              : 7

Cell Usage :
# BELS                             : 3
#      LUT4                        : 3
# IO Buffers                       : 7
#      IBUF                        : 4
#      OBUF                        : 3
=========================================================================

Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6

 Number of Slices:               2 out of 192     1%
 Number of 4 input LUTs:         3 out of 384     0%
 Number of bonded IOBs:          7 out of 90      7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs:                           3 out of 384    1%
Logic Distribution:
  Number of occupied Slices:                        2 out of 192    1%
  Number of Slices containing only related logic:   2 out of 2    100%
  Number of Slices containing unrelated logic:      0 out of 2      0%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs:                       3 out of 384    1%
  Number of bonded IOBs:                            7 out of 86     8%

Total equivalent gate count for design:  18
Additional JTAG gate count for IOBs:     336
Peak Memory Usage:                       56 MB

Mapping Report

Device utilization summary:
   Number of External IOBs             7 out of 86      8%
      Number of LOCed External IOBs    0 out of 7       0%
   Number of SLICEs                    2 out of 192     1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keygenblock.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : keygenblock
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : keygenblock
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : keygenblock.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd ...
Writing NGDBUILD log file keygenblock.bld ...
MAPPING REPORT
Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers:                   419 out of 384   109% (OVERMAPPED)
    Number used as Flip Flops:                    415
    Number used as Latches:                         4
  Number of 4 input LUTs:                        1016 out of 384   264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices:                      665 out of 192   346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665    97%
  Number of Slices containing unrelated logic:     17 out of 665     2%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:                       1066 out of 384   277% (OVERMAPPED)
  Number used as logic:                          1016
  Number used as a route-thru:                     50
  Number of bonded IOBs:                         1060 out of 86   1232% (OVERMAPPED)
    IOB Flip Flops:                               960
  Number of GCLKs:                                  1 out of 4      25%
  Number of GCLKIOBs:                               1 out of 4      25%

Total equivalent gate count for design:  17572
Additional JTAG gate count for IOBs:     50928
Peak Memory Usage:                       72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : encryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : encryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : encryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : encryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : decryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : decryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : decryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : decryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in text, key and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, the security of the cipher being adapted as a function of its key size.
It can also be used in applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
This is a low-cost encryption routine, basically designed for processors with a limited instruction set.
In wireless communication, mobile computing and networking systems.
For the encryption of JPEG2000 images.
In scalable video coding.
In sensor networks and RFIDs.
CONCLUSION
SEA(n,b) is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEA(n,b) allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA(n,b) do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
architecture <architecture-name> of <entity-name> is
begin
    concurrent statements:
        process statement
        block statement
        concurrent signal assignment statement
        component instantiation statement
        generate statement
end [architecture] [architecture-name];
The concurrent statements describe the internal composition of the entity. All concurrent statements are executed in parallel. The internal composition of an entity can be expressed in terms of structure, dataflow and sequential behavior.
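As a small illustration of concurrent statements, the sketch below describes a half adder in the dataflow style. The entity half_adder and its port names are hypothetical and chosen only for this example; the two assignments can be written in either order because both are evaluated in parallel.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity half_adder is
  port (
    a, b  : in  std_logic;
    sum   : out std_logic;
    carry : out std_logic
  );
end entity half_adder;

architecture dataflow of half_adder is
begin
  -- Two concurrent signal assignments; their textual order does not matter.
  sum   <= a xor b;
  carry <= a and b;
end architecture dataflow;
```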
Here we describe an entity by using the behavioral model. A process statement, which is itself a concurrent statement, is the primary mechanism used to describe the functionality of an entity in this modeling style.
2.6 PROCESS STATEMENT
A process statement contains sequential statements that describe the functionality of a portion of an entity in sequential terms. The syntax for the process statement is:
[process-label:] process [(sensitivity-list)] [is]
begin
    sequential statements:
        variable-assignment-statement
        signal-assignment-statement
        wait-statement
        if-statement
        case-statement
        loop-statement
        null-statement
        exit-statement
        next-statement
        assertion-statement
        report-statement
        procedure-call-statement
        return-statement
end process [process-label];
The sensitivity list defines the set of signals to which the process is sensitive. In other words, each time an event occurs on any of the signals in the sensitivity list, the sequential statements within the process are executed in sequential order, that is, in the order in which they appear. The process then suspends after executing the last sequential statement and waits for another event to occur on a signal in the sensitivity list.
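The behaviour of the sensitivity list can be seen in a minimal sketch such as the following (the entity and_or and its ports are hypothetical names used only for illustration). The process runs once whenever a, b or c changes, executes its single statement, and then suspends until the next event.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_or is
  port (
    a, b, c : in  std_logic;
    y       : out std_logic
  );
end entity and_or;

architecture behavioral of and_or is
begin
  -- The process is re-executed on every event on a, b or c.
  compute : process (a, b, c)
  begin
    y <= (a and b) or c;
  end process compute;
end architecture behavioral;
```

If a signal read by the process were left out of the sensitivity list, simulation would not re-evaluate y when that signal changed, which is a common source of mismatches between simulation and synthesized hardware.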
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form:
variable-object := expression;
The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
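The immediate update of a variable is what makes the following sketch work (the entity parity8 and its ports are hypothetical, chosen for illustration). The variable acc accumulates the XOR of all input bits inside one execution of the process; each `:=` takes effect at once, so the next loop iteration sees the new value.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity parity8 is
  port (
    d : in  std_logic_vector(7 downto 0);
    p : out std_logic                      -- '1' when d holds an odd number of ones
  );
end entity parity8;

architecture behavioral of parity8 is
begin
  process (d)
    variable acc : std_logic;              -- local to this process
  begin
    acc := '0';
    for i in d'range loop
      acc := acc xor d(i);                 -- variable is updated immediately
    end loop;
    p <= acc;                              -- final value is handed to a signal
  end process;
end architecture behavioral;
```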
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is:
signal-object <= expression [after delay-value];
A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other statements that appear within the process.
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is:
if Boolean-expression then
    sequential-statements
elsif Boolean-expression then
    sequential-statements
[else
    sequential-statements]
end if;
The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
CASE STATEMENT
The format of a case statement is:
case expression is
    when choices => sequential-statements
    when choices => sequential-statements
end case;
The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by using "others". The others clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax for a loop statement is:
[loop-label:] iteration-scheme loop
    sequential-statements
end loop [loop-label];
2.8 Active-HDL Overview
Active-HDL is an integrated environment designed for development of VHDL, Verilog, EDIF, and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs, and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard.
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0.
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with OVI Standard Delay Format Specification Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification, and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1. HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. The keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts, and Tcl scripts.
2. Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3. State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4. Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5. Design Browser
The Design Browser window displays the contents of the current design, that is:
- Resource files attached to the design
- The contents of the default working library of the design
- The structure of the design unit selected for simulation
- VHDL, Verilog, or EDIF objects declared within a selected region of the current design
6. Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros, and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable to the simulator. In Active-HDL, a source file can be one of the following:
- VHDL file (.vhd)
- Verilog file (.v)
- EDIF net list file
- State diagram file (.asf)
- Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog, or EDIF file containing HDL code (or net list) generated from the diagram.
A net list is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, respectively for VHDL, Verilog, and EDIF. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
- Event-Driven Simulation
- Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others run in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
- HDL (VHDL, Verilog HDL, ABEL)
- Schematic design files
- EDIF
- NGC/NGO
- State Machines
- IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx® design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs, and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic, and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions, and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing, or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macro cell details, equations, and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performances. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e., with small code size and memory) targeted for processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation, and modular addition). The proposed design is parametric in the text, key, and processor size, allows efficient combination of encryption/decryption and "on-the-fly" key derivation, and its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g., a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size, and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEA$_{n,b}$ is parametric in the text, key, and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g., 8-bit or 32-bit processors). By contrast, SEA$_{n,b}$ makes it possible to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, additional features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g., authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats, and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key, and word sizes. It is based on a Feistel structure with a variable number of rounds and is defined with respect to the following parameters:
- $n$: plaintext size, key size
- $b$: processor (or word) size
- $n_b = \frac{n}{2b}$: number of words per Feistel branch
- $n_r$: number of block cipher rounds
As the only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ...-bit block ciphers, respectively denoted as SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$, ... Let $x$ be an $\frac{n}{2}$-bit vector. In the following, we will consider two representations:
- Bit representation: $x_b = x(\frac{n}{2}-1)\; x(\frac{n}{2}-2) \ldots x(2)\; x(1)\; x(0)$
- Word representation: $x_W = x_{n_b-1}\; x_{n_b-2} \ldots x_2\; x_1\; x_0$
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
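1. Bitwise XOR $\oplus$
The bitwise XOR is defined on $\frac{n}{2}$-bit vectors:
$$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \;\Leftrightarrow\; z(i) = x(i) \oplus y(i), \quad 0 \le i \le \tfrac{n}{2} - 1$$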
2. Substitution Box $S$
SEA$_{n,b}$ uses the following 3-bit substitution table:
$S_T = \{0, 5, 6, 7, 4, 3, 1, 2\}$
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
$$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \;\Leftrightarrow\; \begin{cases} x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i} \\ x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1} \\ x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2} \end{cases} \quad 0 \le i \le \tfrac{n_b}{3} - 1$$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
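The recursive definition can be checked against the table $S_T$ with a short sketch. This is a minimal illustration in Python, not the authors' code; the function names `sbox` and `sbox_on_value` are hypothetical, and the list index is assumed to be the word index (index 0 is $x_0$). The three assignments are applied one after the other, each using the values already updated by the previous one.

```python
S_TABLE = [0, 5, 6, 7, 4, 3, 1, 2]  # S_T as given in the text


def sbox(words):
    """Bitsliced S on a list of words whose length is a multiple of 3."""
    x = list(words)
    for i in range(0, len(x), 3):
        # Sequential ("recursive") updates: each line sees the new values.
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x


def sbox_on_value(v):
    """Evaluate S on one 3-bit value v = (x2 x1 x0) using 1-bit 'words'."""
    bits = sbox([v & 1, (v >> 1) & 1, (v >> 2) & 1])
    return bits[0] | (bits[1] << 1) | (bits[2] << 2)


if __name__ == "__main__":
    print([sbox_on_value(v) for v in range(8)])
```

Because every operation is bitwise, the same `sbox` call processes all $b$ bit positions of a three-word block in parallel.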
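3. Word Rotation $R$
The word rotation is defined on $n_b$-word vectors:
$$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \;\Leftrightarrow\; y_{i+1} = x_i, \; 0 \le i \le n_b - 2, \quad y_0 = x_{n_b-1}$$
4. Bit Rotation $r$
The bit rotation is defined on $n_b$-word vectors:
$$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \;\Leftrightarrow\; y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le \tfrac{n_b}{3} - 1$$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.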
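The two rotations can be sketched as follows. This is an illustrative Python version under the assumption that a vector is a list of $b$-bit integers with index 0 holding $x_0$; the names `word_rot`, `word_rot_inv`, and `bit_rot` are hypothetical.

```python
def word_rot(x):
    """R: y[i+1] = x[i] for 0 <= i <= nb-2, and y[0] = x[nb-1]."""
    return [x[-1]] + x[:-1]


def word_rot_inv(x):
    """R^-1: undoes word_rot."""
    return x[1:] + [x[0]]


def bit_rot(x, b=8):
    """r: in each block of 3 words, rotate word 3i right by one bit,
    leave word 3i+1 unchanged, rotate word 3i+2 left by one bit."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & mask
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask
    return y


if __name__ == "__main__":
    print(word_rot([1, 2, 3]))
    print([hex(w) for w in bit_rot([0x01, 0x55, 0x80])])
```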
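5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \;\Leftrightarrow\; z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$$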
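The Round and Key Round
Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$, and key round $F_K$ are pictured in Figure 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^2$ such that:
$$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$$
$$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$$
$$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \;\Leftrightarrow\; KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$$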
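The three round functions translate almost literally into code. The sketch below is a hedged Python illustration (lists of $b$-bit integers, index 0 is the least significant word); all function names are hypothetical. It also shows the usual Feistel inversion property: applying $F_D$ to the swapped output of $F_E$ under the same round key returns the swapped input.

```python
def _S(x):
    x = list(x)
    for i in range(0, len(x), 3):      # bitsliced S, sequential updates
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x


def _r(x, b):
    m = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):      # bit rotation r
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & m
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & m
    return y


def _R(x):
    return [x[-1]] + x[:-1]            # word rotation R


def _R_inv(x):
    return x[1:] + [x[0]]              # inverse word rotation


def _add(x, y, b):
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, y)]


def _xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def f_e(L, Rt, K, b=8):
    """Encrypt round: returns (L_next, R_next)."""
    return Rt, _xor(_R(L), _r(_S(_add(Rt, K, b)), b))


def f_d(L, Rt, K, b=8):
    """Decrypt round: returns (L_next, R_next)."""
    return Rt, _R_inv(_xor(L, _r(_S(_add(Rt, K, b)), b)))


def f_k(KL, KR, C, b=8):
    """Key round: returns (KL_next, KR_next)."""
    return KR, _xor(KL, _R(_r(_S(_add(KR, C, b)), b)))
```

A usage note: with `L1, R1 = f_e(L, Rt, K)`, the call `f_d(R1, L1, K)` yields `(Rt, L)`, which is why the complete cipher swaps the two branches at its output.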
Fig. 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext $P$ under a key $K$ and produces a ciphertext $C$. $P$, $C$, and $K$ have a parametric bit size $n$. The operations within the cipher are performed considering parametric $b$-bit words.
C = SEA_{n,b}(P, K)
    // initialization
    L_0 & R_0 = P;
    KL_0 & KR_0 = K;
    // key scheduling
    for i in 1 to floor(n_r/2)
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i));
    switch KL_{floor(n_r/2)}, KR_{floor(n_r/2)};
    for i in ceil(n_r/2) to n_r - 1
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(n_r - i));
    // encryption
    for i in 1 to ceil(n_r/2)
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1});
    for i in ceil(n_r/2) + 1 to n_r
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1});
    // final
    C = R_{n_r} & L_{n_r};
    switch KL_{n_r-1}, KR_{n_r-1};
where & is the concatenation operator, KR_{floor(n_r/2)} is taken before the switch, and C(i) is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round $F_D$.
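The pseudo-code can be turned into a small runnable sketch. The Python version below is an illustration under several assumptions that are not fixed by the text: each Feistel branch is a list of $b$-bit integers with index 0 as the least significant word, the key schedule is interleaved with the rounds ("on the fly"), the loop bounds are read as floor/ceiling of $n_r/2$, and the function name `sea` is hypothetical. It is meant to show the structure, not to serve as a reference implementation or to reproduce test vectors.

```python
def _g(x, k, b):
    """Compute r(S(x [+] k)) on lists of b-bit words (length multiple of 3)."""
    m = (1 << b) - 1
    x = [(u + v) & m for u, v in zip(x, k)]      # addition mod 2^b
    for i in range(0, len(x), 3):                # bitsliced S
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    for i in range(0, len(x), 3):                # bit rotation r
        x[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & m
        x[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & m
    return x


def _xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def _rot(x):
    return [x[-1]] + x[:-1]                      # word rotation R


def _rot_inv(x):
    return x[1:] + [x[0]]                        # inverse word rotation


def sea(left, right, kl, kr, nr, b=8, decrypt=False):
    """Run nr rounds and return the two output halves (R_nr, L_nr)."""
    if nr % 2 == 0:
        raise ValueError("nr must be odd")
    nb = len(left)
    half = nr // 2                               # floor(nr/2); ceil = half + 1
    L, Rt = list(left), list(right)
    KL, KR = list(kl), list(kr)
    for i in range(1, nr + 1):
        # Round key: KR_{i-1} for the first ceil(nr/2) rounds, KL_{i-1} after.
        key = KR if i <= half + 1 else KL
        t = _g(Rt, key, b)
        if decrypt:
            L, Rt = Rt, _rot_inv(_xor(L, t))     # F_D
        else:
            L, Rt = Rt, _xor(_rot(L), t)         # F_E
        if i < nr:                               # key round K_{i-1} -> K_i
            if i == half + 1:
                KL, KR = KR, KL                  # the mid-schedule switch
            c = i if i <= half else nr - i
            const = [c] + [0] * (nb - 1)         # C(c): only the LSW is set
            KL, KR = KR, _xor(KL, _rot(_g(KR, const, b)))
    return Rt, L


if __name__ == "__main__":
    P = ([1, 2, 3], [4, 5, 6])
    K = ([7, 8, 9], [10, 11, 12])
    ct = sea(*P, *K, nr=51)
    print(ct, sea(*ct, *K, nr=51, decrypt=True))
```

Because the round-key sequence is symmetric, feeding the ciphertext halves back through the same routine with `decrypt=True` recovers the plaintext halves in their original order.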
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
- λ-parameter: 1/2
- δ-parameter: 1/4
- Maximum nonlinear order, namely 2
- Recursive definition
- Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data in $\frac{2n_b}{3}$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. It yields a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure, since it would result in identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the linear and differential parameters of the 3-round cipher, $\Lambda$ and $\Delta$, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function $S$ has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an $n$-bit block cipher, $\Delta$ and $\Lambda$ are respectively required to be no larger than about $2^{-n}$ and $2^{-n/2}$ to resist differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that:
$$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \qquad (1)$$
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to that of, e.g., the AES Rijndael [17], but we only assume one active s-box per round.
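The round bound can be read off from Equation (1) by comparing exponents. The short derivation below is a worked restatement that uses only $\delta = 2^{-2}$ and $\lambda = 2^{-1}$ from the text:

```latex
\begin{align*}
\delta^{2n_r/3} = 2^{-4n_r/3} < 2^{-n}
  &\iff \frac{4n_r}{3} > n \iff n_r > \frac{3n}{4}, \\
\lambda^{2n_r/3} = 2^{-2n_r/3} < 2^{-n/2}
  &\iff \frac{2n_r}{3} > \frac{n}{2} \iff n_r > \frac{3n}{4}.
\end{align*}
```

Both conditions give the same threshold; for instance, for $n = 48$ it corresponds to $3n/4 = 36$ rounds.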
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31], and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \ll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
- $S(x \ll a) = S(x) \ll a$
- $r(x \ll a) = r(x) \ll a$
- $R(x \ll a) = R(x) \ll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \ll a = C_i$, e.g., all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F_E^{\oplus}$ and the key round $F_K$ satisfy:
$$F_E^{\oplus}(L \ll a, R \ll a, K \ll a) = F_E^{\oplus}(L, R, K) \ll a$$
$$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$$
These properties are iterative in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \ll a$. In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to $3/8$ as $b$ grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g., "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from $K$ and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA$_{n,b}$.
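The probability quoted for $a = 1$ can be checked exhaustively for a small word size. The Python sketch below (function names are illustrative) counts, over all pairs of $b$-bit words, how often rotating before the modular addition gives the same result as rotating after it.

```python
def rotl(v, a, b):
    """Cyclic left rotation of the b-bit word v by a bits (0 < a < b)."""
    m = (1 << b) - 1
    return ((v << a) | (v >> (b - a))) & m


def commuting_pairs(a, b):
    """Number of (X, K) with (X<<<a) + (K<<<a) == (X + K)<<<a, all mod 2^b."""
    m = (1 << b) - 1
    return sum(
        1
        for x in range(1 << b)
        for k in range(1 << b)
        if (rotl(x, a, b) + rotl(k, a, b)) & m == rotl((x + k) & m, a, b)
    )


if __name__ == "__main__":
    b = 8
    count = commuting_pairs(1, b)
    print(count, count / 4 ** b)   # fraction of pairs for which the property holds
```

For $b = 8$ the fraction is already close to $3/8$; the remaining gap shrinks as $b$ grows, in line with the convergence stated in the text.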
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e., takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that this remains the case when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as that of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with that of GOST, whose vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows the attacker to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \Leftrightarrow \bar{P} \xrightarrow{\bar{K}} \bar{C}$. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we leave this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g., by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
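As a small worked illustration of this formula, the following Python sketch computes the minimum odd round count for given $n$ and $b$. The function name `min_rounds` is hypothetical, and the sketch covers only the minimum bound, not the more conservative doubled variant.

```python
def min_rounds(n, b):
    """Minimum number of rounds 3n/4 + 2*(nb + floor(b/2)), made odd."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    nb = n // (2 * b)                  # words per Feistel branch
    nr = (3 * n + 3) // 4 + 2 * (nb + b // 2)   # ceil(3n/4) + diffusion term
    return nr if nr % 2 == 1 else nr + 1


if __name__ == "__main__":
    for n, b in [(48, 8), (96, 8), (144, 8)]:
        print(n, b, min_rounds(n, b))
```

For SEA$_{48,8}$ this gives $36 + 2 \cdot (3 + 4) = 50$, rounded up to 51 rounds.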
3.5 Performance Analysis
SEA$_{n,b}$ is targeted for being implemented on low-cost processors with little code size and a small instruction set. However, SEA$_{n,b}$'s simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size, and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
- Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\gg$, $\ll$
- Branch instructions: goto, subroutine call, and return
- Comparison, load (RAM in register), store (register in RAM)
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and a faster implementation.
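The estimates above are simple closed-form expressions, so they can be evaluated for any parameter set. The Python sketch below is only a calculator for the formulas quoted in the text (the function name `sea_cost` and the returned tuple layout are assumptions made for this illustration); it does not model any particular processor.

```python
def sea_cost(n, b, nr):
    """Return (RAM words + registers, code size in ops, time in ops)."""
    nb = n // (2 * b)                       # words per Feistel branch
    memory = 5 * nb + 3
    code_size = 31 * nb + 36
    round_ops = 12 * nb + 11                # one encrypt/decrypt round
    key_round_ops = 10 * nb + 11            # one key round
    time_ops = ((nr - 1) * (round_ops + key_round_ops + 7)
                + round_ops + 8 * nb + 7)
    return memory, code_size, time_ops


if __name__ == "__main__":
    # Example: SEA with n = 48, b = 8 and 51 rounds.
    print(sea_cost(48, 8, 51))
```

With $n_b = 3$ and $n_r = 51$, the formulas give 18 memory words, 129 ops of code, and $50 \times 95 + 47 + 31 = 4828$ ops of execution time.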
For illustration purposes we implemented SEAn b on Atmel AVR ATtiny[3] And ARM [2] microprocessors The Atmel ATtiny represents a typical target for such a low-cost encryption routine We chose the ARM platform in order to provide rough comparisons between SEAnb and the AES Rijndael While direct comparisons are made difficult by their high dependencies on the target devices the following general comments can be madendash SEAnb designs combine encryption and decryption more efficiently than most other encryption algorithms In particular key agility in decryption is usually not possible (eg for the AES Rijndael)ndash The combined number of RAM words and registers of SEAnb implementations (ie 5nb + 3) is generally lower than for other block ciphersndash The code size of SEAnb is generally lower than for other block ciphers implemented on similar platforms
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of processor than fixed-size algorithms, although large buses obviously improve efficiency. The drawback of these limited resources is the number of cycles required for encryption (i.e., SEA$_{n,b}$ trades space for time, which may be a relevant trade-off given present processor speeds). Looking at the code size × cycles product, the efficiency of SEA$_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof–Elkies–Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be used to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof–Elkies–Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes–Dewaghe–Morain. The SEA algorithm was implemented by Morain, Müller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Müller are described in [15] and [16]. The implementation of Izu, Kogure, Noro, and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes–Dewaghe–Morain is published in [5]. Atkin never formally published his contributions, described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$ where $p$ has 200 bits.
It is known that
$$\#E(\mathbb{F}_p) = p + 1 - t,$$
where $t$ is an integer which satisfies the Hasse bound
$$-2\sqrt{p} \le t \le 2\sqrt{p}.$$
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial, indexed by $\ell$, that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve $E$ under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining $t$.
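The final recombination step can be sketched in a few lines of Python. This is only an illustration of the Chinese Remainder Theorem step under stated assumptions: the residues of $t$ are obtained here from a brute-force point count on a tiny curve (which the real algorithm exists to avoid), and the function names, the prime $p = 101$, and the curve coefficients are hypothetical choices made for the example.

```python
def trace_by_naive_count(p: int, a4: int, a6: int) -> int:
    """Return t = p + 1 - #E(F_p) by brute force; feasible only for tiny p."""
    t = 0
    for x in range(p):
        v = (x * x * x + a4 * x + a6) % p
        if v == 0:
            continue                    # single point (x, 0): contributes 0
        chi = pow(v, (p - 1) // 2, p)   # Euler's criterion: 1 if v is a square
        t -= 1 if chi == 1 else -1
    return t


def recover_trace(residues: dict, p: int):
    """Combine {ell: t mod ell} by CRT; return t, or None if too few primes."""
    modulus, t = 1, 0
    for ell, r in residues.items():
        # Lift t (known mod modulus) so that it also equals r mod ell.
        inv = pow(modulus, -1, ell)
        t += modulus * (((r - t) * inv) % ell)
        modulus *= ell
    if modulus * modulus <= 16 * p:     # need product > 4*sqrt(p)
        return None
    if t > modulus // 2:                # pick the representative near zero
        t -= modulus
    return t


if __name__ == "__main__":
    p, a4, a6 = 101, 1, 1               # toy parameters, illustration only
    t = trace_by_naive_count(p, a4, a6)
    residues = {ell: t % ell for ell in (3, 5, 7)}
    print(recover_trace(residues, p) == t, p + 1 - t)
```

The Hasse bound is what makes the recovery unique: once the product of the primes exceeds $4\sqrt{p}$, only one integer in the interval $[-2\sqrt{p}, 2\sqrt{p}]$ has the observed residues.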
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$ with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms, such as Diffie–Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$ we need to determine a polynomial in $\mathbb{Z}[F, J]$ associated with $\ell$. For $\ell \in A_s$ this polynomial is stored in the program. For $\ell \in A_l$ it must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be precalculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime $p$ and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
[FIG 5.1: SEA architecture block diagram. Recoverable labels: key computational block (Mod, SBox, BR, WR, XOR, Round Reg) with key registers KeyReg0[95:0] to KeyReg9[95:0]; encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg) with cipher data register, inputs Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst; decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg) with plain text data register, inputs Key0[95:0] to Key9[95:0], output DataO[95:0], DataRd, Clk, Rst; control block SMC (Clk, Rst, Ena, ED, Ovr); key input KeyI[95:0], KeyLd, Clk, Rst.]
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt the data being protected.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. PRNGs that use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
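Both ways of obtaining a key can be sketched with the Python standard library. This is a minimal illustration rather than the key generation used by the design in this report: the 96-bit key length is assumed only because the synthesis logs report 96-bit registers, PBKDF2 with SHA-256 is swapped in for the plain SHA-1 hashing mentioned above, and the function names and iteration count are hypothetical.

```python
import hashlib
import secrets

KEY_BYTES = 12  # 96 bits; an assumption made for this example


def random_key(n_bytes: int = KEY_BYTES) -> bytes:
    """Draw a key from the operating system's entropy source."""
    return secrets.token_bytes(n_bytes)


def key_from_passphrase(passphrase: str, salt: bytes,
                        n_bytes: int = KEY_BYTES,
                        iterations: int = 100_000) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=n_bytes)


if __name__ == "__main__":
    salt = secrets.token_bytes(16)      # stored alongside the ciphertext
    print(random_key().hex())
    print(key_from_passphrase("correct horse", salt).hex())
```

The salt and the iteration count are there to slow down guessing of weak passphrases; the same passphrase and salt always yield the same key.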
The simplest method to read encrypted data is a brute force attack: simply trying every possible key up to the maximum key length. It is therefore important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret; graphos - writing
5.2 ENCRYPTION
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the term cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
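The "message plus key in, single output out" interface can be illustrated with a deliberately trivial cipher. The sketch below uses a repeating-key XOR, which is not SEA and is not secure; it is chosen only because encryption and decryption are visibly inverse operations under the same key. All names are hypothetical.

```python
from itertools import cycle


def xor_transform(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Toy encryption: message and key in, ciphertext out."""
    return xor_transform(plaintext, key)


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """Toy decryption: XOR with the same key undoes the encryption."""
    return xor_transform(ciphertext, key)


if __name__ == "__main__":
    key = bytes([0x5A, 0xC3, 0x1F])
    ciphertext = encrypt(b"attack at dawn", key)
    print(ciphertext.hex())
    print(decrypt(ciphertext, key))
```

A real block cipher replaces the XOR with several rounds of substitution and mixing, but the calling convention stays the same.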
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such showing can currently be made, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is "gardening", used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
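The size of that improvement is easy to check. The short Python sketch below reads the two DES figures in the preceding paragraph as $2^{55}$ and $2^{43}$ (DES has a 56-bit key, so half the key space is $2^{55}$); the function name is an assumption made for illustration.

```python
def expected_brute_force_trials(key_bits: int) -> int:
    """On average, half of the 2**key_bits keys are tried before success."""
    return 2 ** (key_bits - 1)


DES_KEY_BITS = 56
brute_force = expected_brute_force_trials(DES_KEY_BITS)   # 2**55
linear_cryptanalysis = 2 ** 43                            # DES operations

if __name__ == "__main__":
    # Work-factor ratio between the two attacks.
    print(brute_force // linear_cryptanalysis)            # 4096
    # Each extra key bit doubles the brute-force effort.
    print(expected_brute_force_trials(57) // brute_force) # 2
```

The ratio of 4096 ignores the cost of collecting the $2^{43}$ known plaintexts, which is what makes the linear attack hard to mount in practice.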
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
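The password or PIN case is the easiest timing leak to picture. In the Python sketch below, the first function stops at the first wrong byte, so its running time hints at how many leading bytes were correct; the second uses `hmac.compare_digest` from the standard library, which is designed to take time independent of where the inputs differ. The function names are illustrative assumptions.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: time depends on the first mismatch position."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False    # returns sooner when an early byte is wrong
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose duration does not reveal the mismatch position."""
    return hmac.compare_digest(a, b)


if __name__ == "__main__":
    secret_pin = b"4921"
    for guess in (b"0000", b"4000", b"4921"):
        print(guess, naive_equal(secret_pin, guess),
              constant_time_equal(secret_pin, guess))
```

Both functions return the same answers; only the second avoids giving an attacker a byte-by-byte oracle.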
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision 116 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
      inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
# Registers : 1
  96-bit register : 1
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 195
Macro Statistics :
# Registers : 1
#   96-bit register : 1
Cell Usage :
# BELS : 1
#   LUT1 : 1
# FlipFlops/Latches : 96
#   FDCE : 96
# Clock Buffers : 1
#   BUFGP : 1
# IO Buffers : 194
#   IBUF : 98
#   OBUF : 96
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6
 Number of Slices: 55 out of 192 28%
 Number of Slice Flip Flops: 96 out of 384 25%
 Number of 4 input LUTs: 1 out of 384 0%
 Number of bonded IOBs: 194 out of 90 215% (*)
 Number of GCLKs: 1 out of 4 25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
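The percentages in this summary can be reproduced from the "used out of available" counts. The Python sketch below assumes the tool truncates to a whole percent, which is an inference from the logged values rather than documented behaviour, and the helper names are hypothetical.

```python
def utilization_percent(used: int, available: int) -> int:
    """Whole-number utilization, truncated (assumed to match the log)."""
    return used * 100 // available


def is_overmapped(used: int, available: int) -> bool:
    """A resource is overmapped when the design needs more than the device has."""
    return used > available


if __name__ == "__main__":
    # Values copied from the keyreg device utilization summary above.
    resources = {
        "Slices": (55, 192),
        "Slice Flip Flops": (96, 384),
        "4 input LUTs": (1, 384),
        "bonded IOBs": (194, 90),
        "GCLKs": (1, 4),
    }
    for name, (used, available) in resources.items():
        flag = " (overmapped)" if is_overmapped(used, available) else ""
        print(f"{name}: {utilization_percent(used, available)}%{flag}")
```

Only the bonded IOBs exceed the device: the stand-alone key register exposes all 96 key bits on pins, which would not be the case once it is embedded in a larger design.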
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary:
---------------
Speed Grade: -6
   Minimum period: No path found
   Minimum input arrival time before clock: 7.962ns
   Maximum output required time after clock: 6.788ns
   Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising
  Data Path: KeyEna to Dreg_95
    Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
    IBUF:I->O      96       0.776        6.300       KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                 0.886                    Dreg_0
    Total                   7.962ns (1.662ns logic, 6.300ns route) (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising
  Data Path: Dreg_95 to KeyO<95>
    Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
    FDCE:C->Q      1        1.085        1.035       Dreg_95 (Dreg_95)
    OBUF:I->O               4.668                    KeyO_95_OBUF (KeyO<95>)
    Total                   6.788ns (5.753ns logic, 1.035ns route) (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 7
Cell Usage :
# BELS : 3
#   LUT4 : 3
# IO Buffers : 7
#   IBUF : 4
#   OBUF : 3
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6
 Number of Slices: 2 out of 192 1%
 Number of 4 input LUTs: 3 out of 384 0%
 Number of bonded IOBs: 7 out of 90 7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
    Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It yields a small encryption routine for any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
Low-cost encryption on processors with a limited instruction set, the context for which the routine was designed.
Wireless communication, mobile computing and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEA_{n,b} is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEA_{n,b} allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. a 128-bit key and a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo-code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA_{n,b} do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered as an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
VARIABLE ASSIGNMENT STATEMENT
Variables can be declared and used inside a process statement. A variable is assigned a value using the variable assignment statement, which typically has the form
variable-object := expression;
The expression is evaluated when the statement is executed, and the computed value is assigned to the variable object instantaneously, that is, at the current simulation time.
A variable can also be declared outside of a process or subprogram. Such a variable can be read and updated by more than one process. These variables are called shared variables.
SIGNAL ASSIGNMENT STATEMENT
Signals are assigned values using a signal assignment statement. The simplest form of a signal assignment statement is
signal-object <= expression [after delay-value];
A signal assignment statement can appear within a process or outside of a process. If it occurs outside of a process, it is considered to be a concurrent signal assignment statement.
When a signal assignment statement appears within a process, it is considered to be a sequential signal assignment statement and is executed in sequence with respect to the other statements that appear within the process.
2.7 CONDITIONAL STATEMENTS
IF STATEMENT
An if statement selects a sequence of statements for execution based on the value of a condition. The condition can be any expression that evaluates to a Boolean value. The general form of an if statement is
if boolean-expression then
    sequential-statements
elsif boolean-expression then
    sequential-statements
[else
    sequential-statements]
end if;
The if statement is executed by checking each condition sequentially until the first true condition is found; the set of sequential statements associated with this condition is then executed. An if statement is also a sequential statement.
CASE STATEMENT
The format of a case statement is
case expression is
    when choices => sequential-statements
    when choices => sequential-statements
end case;
The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or of a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by using "others". The others clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax of a loop statement is
[loop-label:] iteration-scheme loop
    sequential-statements
end loop [loop-label];
2.8 Active-HDL Overview
Active-HDL is an integrated environment designed for development of VHDL, Verilog, EDIF and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with OVI Standard Delay Format Specification Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1 HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable source-code debugging. Keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts and Tcl scripts.
2 Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code
3 State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams The editor automatically translates graphically designed diagrams into VHDL or Verilog code
4 Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms It allows you to graphically edit waveforms so as to create desired test vectors
5 Design Browser
The Design Browser window displays the contents of the current design, that is:
Resource files attached to the design
The contents of the default working library of the design
The structure of the design unit selected for simulation
VHDL, Verilog or EDIF objects declared within a selected region of the current design
6 Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable to the simulator. In Active-HDL, a source file can be one of the following:
VHDL file (.vhd)
Verilog file (.v)
EDIF net list file
State diagram file (.asf)
Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog or EDIF file containing HDL code (or net list) generated from the diagram.
A net list is a set of statements that specifies the elements of a circuit (for example transistors or gates) and their interconnection
Active-HDL provides three compilers, for VHDL, Verilog and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired
The Active-HDL simulator provides two simulation engines:
Event-Driven Simulation
Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others are simulated in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite This overview explains the general progression of a design through ISE from start to finish
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view and edit schematics and symbols for the Design Entry step of the Xilinx design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input net lists, as well as design constraint information, into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a net list(s) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macro cell details, equations and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e. with small code size and memory) targeted for processors with a limited instruction set (i.e. AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, and allows efficient combination of encryption/decryption and "on-the-fly" key derivation; its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g. a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEA_{n,b} is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). By contrast, SEA_{n,b} yields a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, additional features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEA_{n,b} makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA_{n,b} constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA_{n,b} operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds and is defined with respect to the following parameters:
– $n$: plaintext size, key size
– $b$: processor (or word) size
– $n_b = \frac{n}{2b}$: number of words per Feistel branch
– $n_r$: number of block cipher rounds
The only constraint is that $n$ must be a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted as SEA_{48,8}, SEA_{96,8}, SEA_{144,8}, ... Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
– Bit representation: $x_b = x(n/2-1)\; x(n/2-2)\; \ldots\; x(2)\; x(1)\; x(0)$
– Word representation: $x_W = x_{n_b-1}\; x_{n_b-2}\; \ldots\; x_2\; x_1\; x_0$
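The parameter rules above can be sketched in a few lines of Python. This is only an illustration: the function names `sea_params` and `to_words` are hypothetical, and storing a branch as a Python list with the least significant word at index 0 is an assumption of this sketch, not something the specification prescribes.

```python
def sea_params(n: int, b: int) -> int:
    """Return n_b = n / (2b), the number of b-bit words per Feistel branch."""
    if n <= 0 or b <= 0:
        raise ValueError("n and b must be positive")
    if n % (6 * b) != 0:
        raise ValueError(f"n={n} is not a multiple of 6b={6 * b}")
    return n // (2 * b)


def to_words(x: int, n_b: int, b: int) -> list:
    """Split an n/2-bit integer into n_b words, least significant word first."""
    mask = (1 << b) - 1
    return [(x >> (b * i)) & mask for i in range(n_b)]


# The three 8-bit examples named in the text.
for n in (48, 96, 144):
    print(f"SEA_{{{n},8}}: n_b = {sea_params(n, 8)}")
```

With an 8-bit word the three ciphers have 3, 6 and 9 words per branch, so each branch always splits into whole groups of three words, which is what the bitslice substitution box requires.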
Basic Operations
Due to its simplicity constraints, SEA_{n,b} is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1 Bitwise XOR
The bitwise XOR is defined on $n/2$-bit vectors:
$$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \rightarrow \mathbb{Z}_2^{n/2} : x, y \rightarrow z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$$
2 Substitution Box S
SEA_{n,b} uses the following 3-bit substitution table:
$$S_T = \{0, 5, 6, 7, 4, 3, 1, 2\}$$
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
$$S : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow x = S(x) \Leftrightarrow \begin{cases} x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i} \\ x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1} \\ x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2} \end{cases} \quad 0 \le i \le \frac{n_b}{3} - 1$$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
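The recursive definition can be checked against the table with a short Python sketch. It assumes that "recursive" means each assignment sees the words already updated by the previous ones, and that word $x_{3i}$ carries the least significant bit of each 3-bit input; the helper names are illustrative only.

```python
S_T = [0, 5, 6, 7, 4, 3, 1, 2]  # substitution table from the text


def sbox(x):
    """Bitslice S-box on a list of words whose length is a multiple of 3."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]      # uses the updated y[i]
        y[i + 2] ^= y[i] | y[i + 1]      # uses both updated words
    return y


def sbox_from_bitslice(v):
    """Apply the bitslice S-box to one 3-bit value, one bit per word."""
    x0, x1, x2 = sbox([v & 1, (v >> 1) & 1, (v >> 2) & 1])
    return x0 | (x1 << 1) | (x2 << 2)


print([sbox_from_bitslice(v) for v in range(8)])  # [0, 5, 6, 7, 4, 3, 1, 2]
```

Under these two assumptions the three assignments reproduce the table exactly, so on b-bit words the same three instructions evaluate b copies of the 3-bit S-box in parallel.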
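3 Word Rotation R
The word rotation is defined on $n_b$-word vectors:
$$R : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow y = R(x) \Leftrightarrow y_{i+1} = x_i, \; 0 \le i \le n_b - 2, \quad y_0 = x_{n_b-1}$$
4 Bit Rotation r
The bit rotation is defined on $n_b$-word vectors:
$$r : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le \frac{n_b}{3} - 1$$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.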
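A minimal Python sketch of the two rotations follows. The function names are hypothetical, and vectors are assumed to be lists with the least significant word $x_0$ at index 0.

```python
def rotl(w, a, b):
    """Cyclic left rotation of a b-bit word w by a bits."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)


def rotr(w, a, b):
    """Cyclic right rotation of a b-bit word w by a bits."""
    return rotl(w, b - a, b)


def word_rot(x):
    """R: y[i+1] = x[i] and y[0] = x[n_b-1]."""
    return [x[-1]] + x[:-1]


def word_rot_inv(x):
    """Inverse word rotation R^-1."""
    return x[1:] + [x[0]]


def bit_rot(x, b):
    """r: rotate word 3i right by one bit, word 3i+2 left by one bit."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = rotr(y[i], 1, b)
        y[i + 2] = rotl(y[i + 2], 1, b)
    return y


print(word_rot([1, 2, 3]))                 # [3, 1, 2]
print(bit_rot([0x01, 0x02, 0x80], 8))      # [128, 2, 1]
```

The word rotation only moves whole words, while the bit rotation only moves bits inside a word, which is why the text later treats them as two separate diffusion steps.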
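5 Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x, y \rightarrow z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$$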
The Round and Key Round
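Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Figure 1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \rightarrow \mathbb{Z}_{2^{n/2}}^2$ such that:
$$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$$
$$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$$
$$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$$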
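The following Python sketch illustrates how the encrypt and decrypt rounds relate. It is an illustration under stated assumptions, not a reference implementation: the word size `B = 8`, the list-of-words representation (least significant word first) and all function names are choices made here.

```python
B = 8                      # word size b (assumed value for this sketch)
MASK = (1 << B) - 1


def rotl(w, a=1):
    """Cyclic left rotation of a B-bit word by a bits (0 < a < B)."""
    return ((w << a) | (w >> (B - a))) & MASK


def S(x):
    """Bitslice substitution box on groups of three words."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def r(x):
    """Bit rotation: word 3i right by one bit, word 3i+2 left by one bit."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = rotl(y[i], B - 1)
        y[i + 2] = rotl(y[i + 2], 1)
    return y


def R(x):
    return [x[-1]] + x[:-1]


def R_inv(x):
    return x[1:] + [x[0]]


def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def add(x, y):
    return [(u + v) & MASK for u, v in zip(x, y)]


def F_E(left, right, key):
    return right, xor(R(left), r(S(add(right, key))))


def F_D(left, right, key):
    return right, R_inv(xor(left, r(S(add(right, key)))))


L0, R0, K0 = [0x01, 0x23, 0x45], [0x67, 0x89, 0xAB], [0x10, 0x32, 0x54]
L1, R1 = F_E(L0, R0, K0)
# Feeding the swapped halves to F_D undoes the round, again up to a swap.
assert F_D(R1, L1, K0) == (R0, L0)
```

The swap in the final assertion is the usual Feistel behaviour: the decrypt round consumes the two halves in the opposite order, which the complete cipher accounts for by outputting the right half before the left half.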
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.
C = SEA_{n,b}(P, K)
initialization:
    L_0 & R_0 = P
    KL_0 & KR_0 = K
key scheduling:
    for i in 1 to ⌊n_r/2⌋
        [KL_i, KR_i] = F_K(KL_{i−1}, KR_{i−1}, C(i))
    switch KL_{⌊n_r/2⌋}, KR_{⌊n_r/2⌋}
    for i in ⌈n_r/2⌉ to n_r − 1
        [KL_i, KR_i] = F_K(KL_{i−1}, KR_{i−1}, C(n_r − i))
encryption:
    for i in 1 to ⌈n_r/2⌉
        [L_i, R_i] = F_E(L_{i−1}, R_{i−1}, KR_{i−1})
    for i in ⌈n_r/2⌉ + 1 to n_r
        [L_i, R_i] = F_E(L_{i−1}, R_{i−1}, KL_{i−1})
final:
    C = R_{n_r} & L_{n_r}
    switch KL_{n_r−1}, KR_{n_r−1}
where & is the concatenation operator, KR_{⌊n_r/2⌋} is taken before the switch, and C(i) is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round $F_D$.
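As a sanity check of the pseudo-code, the sketch below implements it in Python and verifies that decryption inverts encryption. Several points are assumptions of this sketch rather than facts from the text: the word size `B = 8`, the list-of-words representation with the least significant word first, the function names, and the choice `N_R = 51` (any odd value works for the round-trip check). No official test vectors are used, so this does not establish interoperability with any other implementation.

```python
B = 8                                   # word size b (assumed)
MASK = (1 << B) - 1


def rotl(w, a):
    return ((w << a) | (w >> (B - a))) & MASK


def S(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def r(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i], y[i + 2] = rotl(y[i], B - 1), rotl(y[i + 2], 1)
    return y


def R(x):
    return [x[-1]] + x[:-1]


def R_inv(x):
    return x[1:] + [x[0]]


def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def add(x, y):
    return [(u + v) & MASK for u, v in zip(x, y)]


def F_E(left, right, key):
    return right, xor(R(left), r(S(add(right, key))))


def F_D(left, right, key):
    return right, R_inv(xor(left, r(S(add(right, key)))))


def F_K(kl, kr, const):
    return kr, xor(kl, R(r(S(add(kr, const)))))


def round_keys(kl, kr, n_r):
    """Round keys K_0 .. K_{n_r-1} in the order the rounds consume them."""
    if n_r % 2 == 0:
        raise ValueError("n_r must be odd")
    n_b = len(kl)
    half = n_r // 2
    keys = [kr]
    for i in range(1, half + 1):
        kl, kr = F_K(kl, kr, [i & MASK] + [0] * (n_b - 1))    # C(i)
        keys.append(kr)                  # KR_half is taken before the switch
    kl, kr = kr, kl                      # switch
    for i in range(half + 1, n_r):
        kl, kr = F_K(kl, kr, [(n_r - i) & MASK] + [0] * (n_b - 1))
        keys.append(kl)
    return keys


def sea(left, right, kl, kr, n_r, decrypt=False):
    round_fn = F_D if decrypt else F_E
    for key in round_keys(kl, kr, n_r):
        left, right = round_fn(left, right, key)
    return right, left                   # output is R_{n_r} & L_{n_r}


P = ([0x01, 0x23, 0x45], [0x67, 0x89, 0xAB])   # (L_0, R_0), n = 48
K = ([0x00, 0x11, 0x22], [0x33, 0x44, 0x55])   # (KL_0, KR_0)
N_R = 51                                       # assumed odd round count
C = sea(*P, *K, N_R)
assert sea(*C, *K, N_R, decrypt=True) == P
```

Decryption reuses `round_keys` unchanged: because the key sequence reads the same forwards and backwards, only the round function has to be swapped.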
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– λ-parameter: 1/2
– δ-parameter: 1/4
– Maximum nonlinear order, namely 2
– Recursive definition
– Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that SEA_{n,b} divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA_{n,b}, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds. The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This gives the sequence of round keys a particular structure, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
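The palindromic key sequence is a consequence of the Feistel shape of the key round alone, which a small Python sketch can illustrate. To make that point, the non-linear part of the key round is replaced here by an arbitrary stand-in `toy_g` on 24-bit integers; it is not SEA's key round, and all names are hypothetical.

```python
def toy_g(kr, c):
    """Arbitrary stand-in for R(r(S(KR + C))); NOT the real SEA function."""
    return ((kr * 0x9E3779B1) ^ (kr >> 7) ^ c) & 0xFFFFFF


def key_sequence(kl, kr, n_r, g):
    """Follow the key-scheduling steps and return (round keys, final state)."""
    half = n_r // 2
    keys = [kr]
    for i in range(1, half + 1):             # master key is "encrypted"
        kl, kr = kr, kl ^ g(kr, i)
        keys.append(kr)
    kl, kr = kr, kl                          # switch
    for i in range(half + 1, n_r):           # and then "decrypted"
        kl, kr = kr, kl ^ g(kr, n_r - i)
        keys.append(kl)
    kl, kr = kr, kl                          # final switch
    return keys, (kl, kr)


keys, restored = key_sequence(0x123456, 0xABCDEF, 7, toy_g)
print([hex(k) for k in keys])
assert keys == keys[::-1]                    # K_0, K_1, ..., K_1, K_0
assert restored == (0x123456, 0xABCDEF)      # key registers hold the master key again
```

The second assertion shows a side effect of the final switch: after one run the key registers are back at the master key, so the same schedule can be replayed for the next block or for decryption.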
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the linear and differential parameters $\Lambda$, $\Delta$ of the 3-round cipher are respectively smaller than $\lambda^2$, $\delta^2$.
Since our nonlinear function $S$ has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEA_{n,b} have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, it is respectively required that $\Delta$ be of the order of $2^{-n}$ and $\Lambda$ of the order of $2^{-n/2}$ to resist differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that:
$$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \qquad (1)$$
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
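The step from Equation (1) to the bound on $n_r$ can be written out as follows. This is a worked restatement of the argument above, assuming (as the exponent $2n_r/3$ suggests) that the $n_r$ rounds are counted as $n_r/3$ consecutive 3-round blocks, each bounded by $\delta^2$ or $\lambda^2$:

```latex
\begin{align*}
\delta^{2n_r/3} = 2^{-4n_r/3} < 2^{-n}
  &\iff \frac{4n_r}{3} > n \iff n_r > \frac{3n}{4},\\
\lambda^{2n_r/3} = 2^{-2n_r/3} < 2^{-n/2}
  &\iff \frac{2n_r}{3} > \frac{n}{2} \iff n_r > \frac{3n}{4}.
\end{align*}
```

For example, with $n = 96$ both conditions place the threshold at $3n/4 = 72$ rounds, and with $n = 48$ at 36 rounds.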
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and the boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA_{n,b} a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \ll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
– $S(x \ll a) = S(x) \ll a$
– $r(x \ll a) = r(x) \ll a$
– $R(x \ll a) = R(x) \ll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \ll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F'_E$ and the key round $F_K$ satisfy:
$$F'_E(L \ll a, R \ll a, K \ll a) = F'_E(L, R, K) \ll a$$
$$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$$
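These rotation properties are easy to test numerically. The Python sketch below checks them on random inputs for the three layers and for the XOR-keyed round; the word size `B = 8`, the list-of-words representation and every function name are assumptions of this sketch.

```python
import random

B = 8                      # word size b (assumed value for this sketch)
MASK = (1 << B) - 1


def rotl(w, a):
    """Cyclic left rotation of a B-bit word by a bits (0 < a < B)."""
    return ((w << a) | (w >> (B - a))) & MASK


def rot_words(x, a):
    """Rotate every word of x left by a bits (x << a in the text)."""
    return [rotl(w, a) for w in x]


def S(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def r(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i], y[i + 2] = rotl(y[i], B - 1), rotl(y[i + 2], 1)
    return y


def R(x):
    return [x[-1]] + x[:-1]


def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def F_E_xor(left, right, key):
    """Modified encrypt round: key addition by XOR instead of mod 2^b."""
    return right, xor(R(left), r(S(xor(right, key))))


def check_rotation_properties(trials=200, n_b=6, seed=0):
    rng = random.Random(seed)

    def vec():
        return [rng.randrange(1 << B) for _ in range(n_b)]

    for _ in range(trials):
        a = rng.randrange(1, B)
        x, left, right, key = vec(), vec(), vec(), vec()
        for layer in (S, r, R):
            if layer(rot_words(x, a)) != rot_words(layer(x), a):
                return False
        out = F_E_xor(left, right, key)
        rotated = F_E_xor(rot_words(left, a), rot_words(right, a),
                          rot_words(key, a))
        if rotated != tuple(rot_words(h, a) for h in out):
            return False
    return True


print(check_rotation_properties())
```

All three layers act either bit-by-bit in parallel or by moving whole words or rotating inside words, so a common rotation of every word passes straight through them; only the modular key addition of the actual cipher breaks this.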
These properties are iterative in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \ll a$. In the actual SEA_{n,b}, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to 3/8 as $b$ grows. It is smaller for $1 < a < b-1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA_{n,b} from having such weak keys.
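For a single word, the probability $p$ can be computed exhaustively. The Python sketch below does so for small word sizes; the function names are hypothetical and the computation covers one b-bit word, averaged over all pairs $(X, K)$.

```python
from fractions import Fraction


def rotl(w, a, b):
    """Cyclic left rotation of a b-bit word by a bits (0 < a < b)."""
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)


def commute_probability(a, b):
    """Fraction of word pairs (X, K) for which rotation by a bits
    commutes with addition mod 2^b."""
    mask = (1 << b) - 1
    hits = sum(
        ((rotl(x, a, b) + rotl(k, a, b)) & mask) == rotl((x + k) & mask, a, b)
        for x in range(1 << b)
        for k in range(1 << b)
    )
    return Fraction(hits, 1 << (2 * b))


for b in (4, 6, 8):
    p = commute_probability(1, b)
    print(b, p, float(p))
```

For $b = 8$ and $a = 1$ the exhaustive count gives 387/1024, about 0.378, already close to the 3/8 limit quoted above, while $a = 2$ gives the smaller value 325/1024.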
Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, this implies that the similarity between the round keys derived from $K$ and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA_{n,b}.
Square Attacks
We explored square attacks [16] on SEA_{48,8}. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that this remains the same when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA_{n,b}, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 · (n_b + ⌊b/2⌋).
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA_{n,b}, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA_{n,b} is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, whose vulnerability to slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack was described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key K_i is computed from the previous round key K_{i−1} using a function f which is always the same: K_i = f(K_{i−1}). However, in the case of SEA_{n,b}, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA_{n,b} key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows one to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA_{n,b} against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if P →_K C denotes the fact that encryption of P under key K gives ciphertext C, then P →_K C ⟺ ¬P →_{¬K} ¬C, where ¬ denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEA_{n,b} algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA_{n,b} has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be 3n/4 + 2 · (n_b + ⌊b/2⌋). This roughly corresponds to the number of rounds to resist linear/differential attacks, plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since n_r has to be odd, we add one if it is even. We also assume a minimum word size b ≥ 8 bits.
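The round-count formula above can be evaluated with a short sketch. It assumes that n_b = n / (2b) is the number of b-bit words per Feistel branch, which this section does not restate; the function names are illustrative.

```python
def min_rounds(n: int, b: int) -> int:
    """Evaluate 3n/4 + 2 * (n_b + floor(b/2)) for SEA_{n,b}.

    Assumption: n_b = n / (2b), the number of b-bit words per branch.
    """
    if n % (2 * b) != 0:
        raise ValueError("n is assumed to be a multiple of 2b")
    n_b = n // (2 * b)
    return 3 * n // 4 + 2 * (n_b + b // 2)


def make_odd(rounds: int) -> int:
    """n_r has to be odd: add one if the value is even."""
    return rounds if rounds % 2 else rounds + 1


def suggested_rounds(n: int, b: int, conservative: bool = False) -> int:
    """Minimum round count, optionally doubled as a security margin."""
    rounds = min_rounds(n, b)
    if conservative:
        rounds *= 2
    return make_odd(rounds)


if __name__ == "__main__":
    for n, b in [(48, 8), (96, 8)]:
        print(n, b, suggested_rounds(n, b), suggested_rounds(n, b, True))
```

For n = 96 and b = 8 the formula gives 72 + 2 · (6 + 4) = 92, which becomes 93 once the parity rule is applied.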
3.5 Performance Analysis
SEA_{n,b} is targeted for implementation on low-cost processors with little code size and a small instruction set. However, SEA_{n,b}'s simple structure makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size, and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: or, and, ⊕, ⊞, ≫, ≪.
– Branch instructions: goto, subroutine call, and return.
– Comparison, load RAM in register, store register in RAM.
According to the code in the appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals 5n_b + 3. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, it directly yields 31n_b + 36 ops. For the implementation time, the round and key round respectively require 12n_b + 11 ops and 10n_b + 11 ops. It yields a total of (n_r − 1) × (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7 ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have K_{n_r−1} = K_0. Remark also that this implementation uses a low number of registers, namely n_b + 3. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
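As a worked instance of these estimates, the sketch below evaluates the expressions for given parameters. It assumes n_b = n / (2b) and takes the number of rounds n_r as an input; the function name and the dictionary keys are illustrative.

```python
def sea_cost_estimate(n: int, b: int, n_r: int) -> dict:
    """Evaluate the memory, code size and time estimates quoted in the text.

    Assumption: n_b = n / (2b) is the number of words per Feistel branch.
    """
    n_b = n // (2 * b)
    round_ops = 12 * n_b + 11       # one cipher round
    key_round_ops = 10 * n_b + 11   # one key round
    time_ops = ((n_r - 1) * (round_ops + key_round_ops + 7)
                + round_ops + 8 * n_b + 7)
    return {
        "memory_words": 5 * n_b + 3,    # RAM words plus registers
        "registers": n_b + 3,
        "code_size_ops": 31 * n_b + 36,
        "time_ops": time_ops,
    }


if __name__ == "__main__":
    print(sea_cost_estimate(n=96, b=8, n_r=93))
```

For n = 96, b = 8 and n_r = 93 this gives n_b = 6, a code size of 222 ops, 33 memory words and an estimated 92 × 161 + 83 + 55 = 14950 ops per encryption.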
For illustration purposes, we implemented SEA_{n,b} on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA_{n,b} and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– SEA_{n,b} designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of SEA_{n,b} implementations (i.e. 5n_b + 3) is generally lower than for other block ciphers.
– The code size of SEA_{n,b} is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA_{n,b} also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is in the number of cycles required for the encryption (i.e. SEA_{n,b} trades space for time, which may be relevant given present processor speeds). Looking at the code size × cycles product, the efficiency of SEA_{n,b} remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let p be a large (odd) prime and let E: y^2 = x^3 + a_4 x + a_6 be an elliptic curve, where a_4 and a_6 are given fixed integers. In the case where p does not divide 4a_4^3 + 27a_6^2, E can be reduced to an elliptic curve over F_p. The number of points of E over F_p, denoted by #E(F_p), is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes, Dewaghe, and Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro, and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes, Dewaghe, and Morain is published in [5]. Atkin never formally published his contributions, described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic, and for a 200-bit prime p it succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over F_p, where p has 200 bits.
It is known that #E(F_p) = p + 1 − t, where t is an integer which satisfies the Hasse bound −2√p ≤ t ≤ 2√p.
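For very small primes, the relation between #E(F_p), t and the Hasse bound can be verified directly. The sketch below is a naive exhaustive count based on Euler's criterion; it is not the SEA algorithm and is only practical for tiny p. The function names are illustrative.

```python
def count_points(p: int, a4: int, a6: int) -> int:
    """Count the points of y^2 = x^3 + a4*x + a6 over F_p (p an odd prime),
    including the point at infinity, by testing every x."""
    if (4 * a4 ** 3 + 27 * a6 ** 2) % p == 0:
        raise ValueError("singular curve: p divides 4*a4^3 + 27*a6^2")
    total = 1  # the point at infinity
    for x in range(p):
        rhs = (x ** 3 + a4 * x + a6) % p
        if rhs == 0:
            total += 1                        # single point with y = 0
        elif pow(rhs, (p - 1) // 2, p) == 1:  # Euler's criterion: rhs is a square
            total += 2                        # two points (x, y) and (x, -y)
    return total


def trace(p: int, a4: int, a6: int) -> int:
    """Return t such that #E(F_p) = p + 1 - t."""
    return p + 1 - count_points(p, a4, a6)


if __name__ == "__main__":
    p, a4, a6 = 17, 2, 2
    t = trace(p, a4, a6)
    print(count_points(p, a4, a6), t, t * t <= 4 * p)
```

The check t² ≤ 4p is the Hasse bound written without square roots. The cost of this exhaustive count grows linearly in p, which is why it is hopeless for the 200-bit primes considered in the text.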
The algorithm works by calculating t modulo several small auxiliary primes ℓ. When the product of the auxiliary primes exceeds 4√p, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of #E(F_p). The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate ℓ, a calculation has to be carried out to generate a certain polynomial that is necessary for further calculations with this ℓ. These polynomials do not depend on the curve E under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and ℓ is 1/2). For those curves where the algorithm applies, we can determine t modulo ℓ. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds 4√p or not. In the former case, we have succeeded in determining t.
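The final recombination step can be sketched as follows. The code assumes that the residues t mod ℓ have already been obtained by other means (computing them is the hard part of the algorithm and is not shown), and the names crt_pair and recover_trace are illustrative.

```python
def crt_pair(r1: int, m1: int, r2: int, m2: int):
    """Combine x = r1 (mod m1) and x = r2 (mod m2) for coprime m1, m2."""
    inv = pow(m1, -1, m2)            # modular inverse (Python 3.8+)
    k = ((r2 - r1) * inv) % m2
    return r1 + m1 * k, m1 * m2


def recover_trace(residues: dict, p: int) -> int:
    """Recover t from {ell: t mod ell}, given that |t| <= 2*sqrt(p).

    Raises ValueError if the product of the primes does not exceed 4*sqrt(p).
    """
    r, m = 0, 1
    for ell, t_mod in sorted(residues.items()):
        r, m = crt_pair(r, m, t_mod % ell, ell)
    if m * m <= 16 * p:              # m > 4*sqrt(p), tested with integers only
        raise ValueError("product of auxiliary primes is too small")
    # The Hasse interval has length 4*sqrt(p) < m, so exactly one
    # representative of r modulo m lies in it: the centred one.
    return r - m if r > m // 2 else r


if __name__ == "__main__":
    # Toy case: y^2 = x^3 + 2x + 2 over F_17 has 19 points, so t = -1.
    t = recover_trace({3: 2, 7: 6}, p=17)
    print(t, 17 + 1 - t)
```

Taking the centred representative is what turns a residue modulo the product of the primes into the signed integer t.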
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over F_p with the intention of finding an E with #E(F_p) = xr, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p and require x ≤ 32, then the probability that #E(F_p) = xr is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set A_s of small primes and the set A_l of larger primes. For each ℓ ∈ A, we need to determine a polynomial in Z[F, J] that depends on ℓ. For ℓ ∈ A_s, this polynomial is stored in the program. For ℓ ∈ A_l, it must be calculated by determining a number of coefficients of a certain q-series f(q) ∈ Z[[q]] and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime p and an elliptic curve E: y^2 = x^3 + a_4 x + a_6.
CH 5 SEA Architecture Block Diagram
[Figure: block diagram of the SEA architecture. Key Computational Block: Mod, SBox, BR, WR, XOR, Round Reg; key registers KeyReg0[95:0] to KeyReg9[95:0]; signals KeyI[95:0], KeyLd, Clk, Rst. Encryption Computational Block: Mod, SBox, BR, WR, XOR, Round Reg, cipher data register; signals Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst. Decryption Computational Block: Mod, SBox, BR, IWR, XOR, Round Reg, plain text data register; signals Key0[95:0] to Key9[95:0], DataO[95:0], DataRd, Clk, Rst. SMC: Clk, Rst, Ena, ED, Ovr.]
FIG 5.1
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. PRNGs that use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
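Both ways of obtaining a key can be sketched with the Python standard library. The 96-bit key length matches the 96-bit key registers of this design. The passphrase variant uses PBKDF2 with HMAC-SHA-256, which is a substitute chosen for this illustration rather than the SHA-1 based scheme mentioned in the text; the function names, the salt handling and the iteration count are assumptions.

```python
import hashlib
import secrets

KEY_BYTES = 12  # 96-bit key, matching the 96-bit key registers


def random_key() -> bytes:
    """Draw a key from the operating system's entropy source."""
    return secrets.token_bytes(KEY_BYTES)


def key_from_passphrase(passphrase: str, salt: bytes,
                        iterations: int = 200_000) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256.

    The iteration count is an illustrative value, not a recommendation
    taken from the text.
    """
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=KEY_BYTES)


if __name__ == "__main__":
    salt = secrets.token_bytes(16)
    print(random_key().hex())
    print(key_from_passphrase("correct horse battery staple", salt).hex())
```

The salt has to be stored alongside the encrypted data so that the same key can be derived again for decryption.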
The simplest method to read encrypted data is a brute force attack: simply attempting every possible key, up to the maximum length of the key. Therefore, it is important to use a sufficiently long key length; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
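The exponential growth can be made concrete with a small calculation. The sketch assumes that, on average, half of the key space has to be searched, and the attacker speed of 10^12 keys per second is an arbitrary illustrative figure.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_trials(key_bits: int) -> int:
    """Average number of keys tried before the right one is found."""
    return 2 ** (key_bits - 1)


def expected_years(key_bits: int, keys_per_second: float) -> float:
    """Average duration of an exhaustive key search, in years."""
    return expected_trials(key_bits) / (keys_per_second * SECONDS_PER_YEAR)


if __name__ == "__main__":
    rate = 1e12  # hypothetical attacker speed, keys per second
    for bits in (56, 96, 128):
        print(f"{bits:3d}-bit key: {expected_years(bits, rate):.3e} years")
```

Each additional key bit doubles the expected effort, so going from 56 to 128 bits multiplies it by 2^72.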
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphein - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
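The "message and key in, single output out" interface, and the fact that decryption reverses encryption, can be illustrated with a toy Feistel network. This sketch is not SEA: the round function, the key schedule and the number of rounds are placeholders chosen for brevity, and all names are hypothetical. It only shows that one routine serves both directions when the round keys are applied in reverse order.

```python
import hashlib


def _round_keys(key: bytes, rounds: int):
    """Toy key schedule: one hash-derived round key per round (rounds < 256)."""
    return [hashlib.sha256(key + bytes([i])).digest() for i in range(rounds)]


def _f(half: bytes, round_key: bytes) -> bytes:
    """Toy round function; it does not need to be invertible."""
    return hashlib.sha256(round_key + half).digest()[:len(half)]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _feistel(block: bytes, keys) -> bytes:
    if len(block) % 2 or len(block) > 64:
        raise ValueError("block must have an even length of at most 64 bytes")
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for rk in keys:
        left, right = right, _xor(left, _f(right, rk))
    return right + left  # final swap makes the structure self-inverse


def encrypt(block: bytes, key: bytes, rounds: int = 9) -> bytes:
    return _feistel(block, _round_keys(key, rounds))


def decrypt(block: bytes, key: bytes, rounds: int = 9) -> bytes:
    return _feistel(block, list(reversed(_round_keys(key, rounds))))


if __name__ == "__main__":
    key = bytes(range(12))       # 96-bit key
    message = b"hello world!"    # 96-bit block
    ciphertext = encrypt(message, key)
    print(ciphertext.hex(), decrypt(ciphertext, key))
```

Because the structure is self-inverse, a hardware design can share most of its datapath between the encryption and decryption blocks.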
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof can currently be made, the one-time pad remains the only theoretically unbreakable cipher.
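A one-time pad is simple enough to sketch in a few lines. The example assumes that the pad is generated from the operating system's entropy source and is used for one message only; the function names are illustrative.

```python
import secrets


def otp_encrypt(message: bytes, pad: bytes) -> bytes:
    """XOR the message with a pad that is at least as long as the message."""
    if len(pad) < len(message):
        raise ValueError("the pad must be at least as long as the message")
    return bytes(m ^ k for m, k in zip(message, pad))


# XOR is its own inverse, so decryption is the same operation.
otp_decrypt = otp_encrypt


if __name__ == "__main__":
    message = b"attack at dawn"
    pad = secrets.token_bytes(len(message))  # must never be reused
    ciphertext = otp_encrypt(message, pad)
    print(ciphertext.hex(), otp_decrypt(ciphertext, pad))
```

The conditions listed above are what make the scheme impractical in most settings: the pad is as long as the data and has to be shared securely in advance.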
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is "gardening", used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
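The PIN example can be illustrated by contrasting two comparison routines. The first one returns as soon as a character differs, so its running time depends on how many leading characters are correct; the second uses hmac.compare_digest from the Python standard library, which is designed to avoid that dependence. The function names are illustrative.

```python
import hmac


def leaky_equal(candidate: bytes, secret: bytes) -> bool:
    """Early-exit comparison: the time taken reveals the position of the
    first mismatching byte, which a timing attack can exploit."""
    if len(candidate) != len(secret):
        return False
    for x, y in zip(candidate, secret):
        if x != y:
            return False
    return True


def constant_time_equal(candidate: bytes, secret: bytes) -> bool:
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(candidate, secret)


if __name__ == "__main__":
    pin = b"4929"
    for guess in (b"0000", b"4900", b"4929"):
        print(guess, leaky_equal(guess, pin), constant_time_equal(guess, pin))
```

Both routines return the same results; only their timing behaviour differs, which is exactly the kind of information a side-channel attack relies on.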
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting too short keys, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================

Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
  Registers : 1
    96-bit register : 1

=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================

Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 195

Macro Statistics
  Registers : 1
    96-bit register : 1

Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================

Device utilization summary
---------------------------

Selected Device : 2s15cs144-6

  Number of Slices : 55 out of 192 28%
  Number of Slice Flip Flops : 96 out of 384 25%
  Number of 4 input LUTs : 1 out of 384 0%
  Number of bonded IOBs : 194 out of 90 215% (*)
  Number of GCLKs : 1 out of 4 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade : -6

  Minimum period : No path found
  Minimum input arrival time before clock : 7.962ns
  Maximum output required time after clock : 6.788ns
  Maximum combinational path delay : No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock Clk
Offset : 7.962ns (Levels of Logic = 1)
  Source : KeyEna (PAD)
  Destination : Dreg_95 (FF)
  Destination Clock : Clk rising

  Data Path : KeyEna to Dreg_95
                             Gate     Net
    Cell:in->out   fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O          96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                 0.886          Dreg_0
    ----------------------------------------
    Total                   7.962ns (1.662ns logic, 6.300ns route)
                                    (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock Clk
Offset : 6.788ns (Levels of Logic = 1)
  Source : Dreg_95 (FF)
  Destination : KeyO<95> (PAD)
  Source Clock : Clk rising

  Data Path : Dreg_95 to KeyO<95>
                             Gate     Net
    Cell:in->out   fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FDCE:C->Q           1   1.085   1.035  Dreg_95 (Dreg_95)
    OBUF:I->O               4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                   6.788ns (5.753ns logic, 1.035ns route)
                                    (84.8% logic, 15.2% route)

=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->

Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

============================================================
HDL Compilation
============================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

============================================================
HDL Analysis
============================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete: default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

============================================================
HDL Synthesis
============================================================
Synthesizing Unit <sbox8x3>.
Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

HDL Synthesis Report
Found no macro

============================================================
Advanced HDL Synthesis
============================================================

============================================================
Low Level Synthesis
============================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

============================================================
Final Report
============================================================
Final Results
RTL Top Level Output File Name: sbox8x3.ngr
Top Level Output File Name: sbox8x3
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO

Design Statistics
IOs: 7

Cell Usage
BELS: 3
  LUT4: 3
IO Buffers: 7
  IBUF: 4
  OBUF: 3

Device utilization summary
---------------------------
Selected Device: 2s15cs144-6
Number of Slices: 2 out of 192 1%
Number of 4 input LUTs: 3 out of 384 0%
Number of bonded IOBs: 7 out of 90 7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
Number of bonded IOBs: 7 out of 86 8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB

Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
  Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
============================================================
Synthesis Options Summary
============================================================
---- Source Parameters
Input File Name: keygenblock.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:

---- Target Parameters
Output File Name: keygenblock
Output Format: NGC
Target Device: xc2s15-6-cs144

---- Source Options
Top Module Name: keygenblock
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No

---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto

---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5

---- Other Options
lso: keygenblock.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc ...
Reading component libraries for design expansion ...
Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd ...
Writing NGDBUILD log file keygenblock.bld ...
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
  IOB Flip Flops: 960
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
============================================================
Synthesis Options Summary
============================================================
---- Source Parameters
Input File Name: encryption.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:

---- Target Parameters
Output File Name: encryption
Output Format: NGC
Target Device: xc2s15-6-cs144

---- Source Options
Top Module Name: encryption
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No

---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto

---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5

---- Other Options
lso: encryption.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
============================================================
Synthesis Options Summary
============================================================
---- Source Parameters
Input File Name: decryption.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:

---- Target Parameters
Output File Name: decryption
Output Format: NGC
Target Device: xc2s15-6-cs144

---- Source Options
Top Module Name: decryption
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No

---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto

---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5

---- Other Options
lso: decryption.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It yields a small encryption routine for any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed for processors with a limited instruction set. Its application areas include:
Wireless communication, mobile computing, and networking systems
Encryption of JPEG2000 images
Scalable video coding
Sensor networks and RFIDs
CONCLUSION
SEA$_{n,b}$ is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA$_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo-code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA$_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations purposed for RFIDs, as well as further cryptanalysis efforts and security evaluations.
CASE STATEMENT
The format of a case statement is:
case expression is
    when choices => sequential-statements
    when choices => sequential-statements
end case;
The case statement selects one of the branches for execution based on the value of the expression. The expression value must be of a discrete type or a one-dimensional array type. Choices may be expressed as single values, as a range of values, or by using "others". The "others" clause can be used as a choice to cover the "catch-all" values and, if present, must be the last branch in the case statement.
LOOP STATEMENTS
A loop statement is used to iterate through a set of sequential statements. The syntax of a loop statement is:
[loop-label:] iteration-scheme loop
    sequential-statements
end loop [loop-label];
2.8 Active-HDL Overview
Active-HDL is an integrated environment designed for the development of VHDL, Verilog, EDIF, and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs, and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard.
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0.
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. The VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with OVI Standard Delay Format Specification Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification, and procurement of hardware systems.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1. HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. Keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts, and Tcl scripts.
2. Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3. State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4. Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5. Design Browser
The Design Browser window displays the contents of the current design, that is:
Resource files attached to the design
The contents of the default working library of the design
The structure of the design unit selected for simulation
VHDL, Verilog, or EDIF objects declared within a selected region of the current design
6. Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros, and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable by the simulator. In Active-HDL, a source file can be one of the following:
VHDL file (.vhd)
Verilog file (.v)
EDIF net list file
State diagram file (.asf)
Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog, or EDIF file containing the HDL code (or net list) generated from the diagram.
A net list is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, for VHDL, Verilog, and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
Event-Driven Simulation
Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others are simulated in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs, and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic, and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions, and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input net lists as well as design constraint information into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing, or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a net list (or net lists) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macro cell details, equations, and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take the opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e., with small code size and memory) targeted for processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation, and modular addition). The proposed design is parametric in the text, key, and processor size, and allows efficient combination of encryption/decryption and "on-the-fly" key derivation; its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
We consequently consider a general context where we have very limited processing resources (e.g., a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size, and a limited instruction set. In addition, we propose flexibility as another, unusual design principle.
SEA$_{n,b}$ is parametric in the text, key, and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g., 8-bit or 32-bit processors). In contrast, SEA$_{n,b}$ allows one to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, alternative features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g., authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats, and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key, and word sizes. It is based on a Feistel structure with a variable number of rounds and is defined with respect to the following parameters:
- $n$: plaintext size, key size;
- $b$: processor (or word) size;
- $n_b = \frac{n}{2b}$: number of words per Feistel branch;
- $n_r$: number of block cipher rounds.
As the only constraint, $n$ is required to be a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$, ... Let $x$ be an $\frac{n}{2}$-bit vector. In the following, we consider two representations:
- Bit representation: $x_b = x(\frac{n}{2}-1)\; x(\frac{n}{2}-2)\; \ldots\; x(2)\; x(1)\; x(0)$;
- Word representation: $x_W = x_{n_b-1}\; x_{n_b-2}\; \ldots\; x_2\; x_1\; x_0$.
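The parameter constraint and the two representations can be illustrated with a small sketch. The function names are hypothetical, and the mapping between the representations (word $x_i$ holding bits $x(bi)$ to $x(bi+b-1)$) is an assumption made for the example; the text above does not spell it out.

```python
from typing import List


def words_per_branch(n: int, b: int) -> int:
    """Return n_b = n / (2b), enforcing the constraint that n is a multiple of 6b."""
    if n <= 0 or b <= 0:
        raise ValueError("n and b must be positive")
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    return n // (2 * b)


def to_words(x: int, n_b: int, b: int) -> List[int]:
    """Split an n/2-bit integer into n_b words of b bits; index 0 holds x_0.

    Assumption: word x_i carries bits x(b*i) .. x(b*i + b - 1).
    """
    mask = (1 << b) - 1
    return [(x >> (b * i)) & mask for i in range(n_b)]


def from_words(words: List[int], b: int) -> int:
    """Inverse of to_words: reassemble the n/2-bit integer."""
    return sum(w << (b * i) for i, w in enumerate(words))
```

With these helpers, SEA$_{48,8}$, SEA$_{96,8}$, and SEA$_{144,8}$ give 3, 6, and 9 words per branch, while a 64-bit block with 8-bit words is rejected because 64 is not a multiple of 48.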
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1 Bitwise XOR $\oplus$
The bitwise XOR is defined on $\frac{n}{2}$-bit vectors:
$$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \;\Leftrightarrow\; z(i) = x(i) \oplus y(i), \quad 0 \le i \le \frac{n}{2} - 1$$
2 Substitution Box $S$
SEA$_{n,b}$ uses the following 3-bit substitution table, in C-like notation:
$$S_T = \{0, 5, 6, 7, 4, 3, 1, 2\}$$
For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
$$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \;\Leftrightarrow\; \begin{cases} x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i} \\ x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1} \\ x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2} \end{cases} \quad 0 \le i \le \frac{n_b}{3} - 1$$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
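To make the recursive definition concrete, the following sketch applies the three update lines in order, each one reusing the word updated just before it, and checks the result against the table $S_T$. The function names are hypothetical. The check assumes that the bit taken from word $x_{3i}$ is the least significant bit of the 3-bit S-box input, which is the ordering under which the recursion reproduces the table.

```python
from typing import List

S_T = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table from the text


def sbox_bitslice(x: List[int]) -> List[int]:
    """Apply S to every group of three words (x_3i, x_3i+1, x_3i+2).

    Each line uses the word already updated by the previous line, which is
    how the recursive definition is read here.
    """
    if len(x) % 3 != 0:
        raise ValueError("the number of words must be a multiple of 3")
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def sbox_single(v: int) -> int:
    """Evaluate S on one 3-bit value by spreading its bits over three 1-bit words."""
    y = sbox_bitslice([v & 1, (v >> 1) & 1, (v >> 2) & 1])
    return y[0] | (y[1] << 1) | (y[2] << 2)
```

Because the operations are bitwise, the same three instructions process all $b$ bit positions of a word triple in parallel, which is the efficiency argument made in the text.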
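3 Word Rotation $R$
The word rotation is defined on $n_b$-word vectors:
$$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \;\Leftrightarrow\; y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \qquad y_0 = x_{n_b-1}$$
4 Bit Rotation $r$
The bit rotation is defined on $n_b$-word vectors:
$$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \;\Leftrightarrow\; y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le \frac{n_b}{3} - 1$$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.
5 Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \;\Leftrightarrow\; z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$$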
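The three word-level operations above translate almost directly into list manipulations. The sketch below is illustrative only: the function names are hypothetical, and a branch is assumed to be stored as a Python list whose index 0 holds the least significant word $x_0$.

```python
from typing import List


def word_rotate(x: List[int]) -> List[int]:
    """R: y_{i+1} = x_i and y_0 = x_{n_b - 1}."""
    return [x[-1]] + x[:-1]


def word_rotate_inv(x: List[int]) -> List[int]:
    """R^{-1}: undo word_rotate."""
    return x[1:] + [x[0]]


def bit_rotate(x: List[int], b: int) -> List[int]:
    """r: in each word triple, rotate the first word right and the third left by one bit."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = (y[i] >> 1) | ((y[i] & 1) << (b - 1))
        y[i + 2] = ((y[i + 2] << 1) & mask) | (y[i + 2] >> (b - 1))
    return y


def add_mod(x: List[int], y: List[int], b: int) -> List[int]:
    """Word-wise addition mod 2^b; carries do not propagate between words."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]
```

For example, with $b = 8$ the bit rotation maps the triple (0x01, 0x55, 0x80) to (0x80, 0x55, 0x01), and adding 1 to the word 255 wraps to 0 without affecting the neighbouring words.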
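The Round and Key Round
Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$, and key round $F_K$ are pictured in Fig. 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^2$ such that:
$$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$$
$$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R^{-1}\Big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\Big), \quad L_{i+1} = R_i$$
$$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \;\Leftrightarrow\; KR_{i+1} = KL_i \oplus R\Big(r\big(S(KR_i \boxplus C_i)\big)\Big), \quad KL_{i+1} = KR_i$$
[Figure: encrypt/decrypt round (blocks $R$, $R^{-1}$, $r$, $S$; inputs $L_i$, $R_i$, $K_i$; outputs $L_{i+1}$, $R_{i+1}$) and key round (blocks $R$, $r$, $S$; inputs $KL_i$, $KR_i$, $C_i$; outputs $KL_{i+1}$, $KR_{i+1}$)]
FIG 3.1 Encrypt/decrypt round and key round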
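As a consistency check on the two round definitions, the short derivation below shows that a decrypt round applied to the swapped output of an encrypt round returns the swapped input. It is only a sketch: the branch names $A$, $B$ and the shorthand $f_K(x) = r(S(x \boxplus K))$ are introduced here to avoid clashing with the word rotation $R$ and are not notation from the source.

```latex
\begin{align*}
(A', B') &= F_E(A, B, K) = \big(B,\; R(A) \oplus f_K(B)\big) \\
F_D(B', A', K) &= \big(A',\; R^{-1}(B' \oplus f_K(A'))\big) \\
               &= \big(B,\; R^{-1}(R(A) \oplus f_K(B) \oplus f_K(B))\big) \\
               &= (B, A)
\end{align*}
```

The two $f_K(B)$ terms cancel under XOR, so the word rotation is undone by $R^{-1}$ regardless of the S-box or bit rotation used.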
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C, and K have a parametric bit size $n$. The operations within the cipher are performed considering parametric $b$-bit words.
C = SEA_{n,b}(P, K)
  initialization:
    L_0 & R_0 = P
    KL_0 & KR_0 = K
  key scheduling:
    for i in 1 to ⌊n_r/2⌋
      [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i))
    switch KL_⌊n_r/2⌋, KR_⌊n_r/2⌋
    for i in ⌈n_r/2⌉ to n_r - 1
      [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(r - i))
  encryption:
    for i in 1 to ⌈n_r/2⌉
      [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1})
    for i in ⌈n_r/2⌉ + 1 to n_r
      [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1})
  final:
    C = R_{n_r} & L_{n_r}
    switch KL_{n_r-1}, KR_{n_r-1}
where & is the concatenation operator, KR_⌊n_r/2⌋ is taken before the switch, and C(i) is an $n_b$-word vector in which all the words have value 0 except the least significant word (LSW), which equals i. Decryption is exactly the same, using the decrypt round $F_D$.
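The pseudo-code can be turned into a compact executable sketch. It is not a validated reference implementation. The function names are hypothetical, branches are assumed to be lists of $b$-bit words with index 0 as the least significant word, the constant written C(r - i) is read as C($n_r$ - i), and the round count in the usage note is an arbitrary odd number rather than a recommended value. No official test vectors are checked.

```python
from typing import List, Tuple

Branch = List[int]  # n_b words of b bits; index 0 is the least significant word


def _f(x: Branch, k: Branch, b: int) -> Branch:
    """r(S(x [+] k)): addition mod 2^b, bitslice S-box, then bit rotation."""
    mask = (1 << b) - 1
    y = [(xi + ki) & mask for xi, ki in zip(x, k)]
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]              # S, recursive definition
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
        y[i] = (y[i] >> 1) | ((y[i] & 1) << (b - 1))                  # r: right
        y[i + 2] = ((y[i + 2] << 1) & mask) | (y[i + 2] >> (b - 1))   # r: left
    return y


def _xor(x: Branch, y: Branch) -> Branch:
    return [a ^ c for a, c in zip(x, y)]


def _rot(x: Branch) -> Branch:          # word rotation R
    return [x[-1]] + x[:-1]


def _rot_inv(x: Branch) -> Branch:      # inverse word rotation
    return x[1:] + [x[0]]


def round_enc(L: Branch, R: Branch, K: Branch, b: int) -> Tuple[Branch, Branch]:
    return R, _xor(_rot(L), _f(R, K, b))


def round_dec(L: Branch, R: Branch, K: Branch, b: int) -> Tuple[Branch, Branch]:
    return R, _rot_inv(_xor(L, _f(R, K, b)))


def round_key(KL: Branch, KR: Branch, C: Branch, b: int) -> Tuple[Branch, Branch]:
    return KR, _xor(KL, _rot(_f(KR, C, b)))


def key_schedule(KL: Branch, KR: Branch, nr: int, b: int) -> List[Branch]:
    """Return the nr round keys in the order the encryption loop consumes them."""
    if nr % 2 == 0:
        raise ValueError("n_r must be odd")
    half = nr // 2

    def const(i: int) -> Branch:        # C(i): only the LSW is non-zero
        return [i] + [0] * (len(KL) - 1)

    keys = [KR]                         # KR_0
    for i in range(1, half + 1):
        KL, KR = round_key(KL, KR, const(i), b)
        keys.append(KR)                 # KR_i, taken before the switch
    KL, KR = KR, KL                     # switch
    for i in range(half + 1, nr):
        KL, KR = round_key(KL, KR, const(nr - i), b)   # assumption: r - i means n_r - i
        keys.append(KL)                 # KL_i
    return keys


def crypt(L: Branch, R: Branch, keys: List[Branch], b: int,
          decrypt: bool = False) -> Tuple[Branch, Branch]:
    """Run all rounds and apply the final swap (C = R_nr & L_nr)."""
    step = round_dec if decrypt else round_enc
    for K in keys:
        L, R = step(L, R, K, b)
    return R, L
```

A possible usage, with illustrative values only: `keys = key_schedule([1, 2, 3], [4, 5, 6], 51, 8)`, then `crypt(PL, PR, keys, 8)` to encrypt and `crypt(CL, CR, keys, 8, decrypt=True)` to decrypt. Under the stated reading of the constants, the list of round keys reads the same forwards and backwards, which is why the same schedule serves both directions.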
3.2 Design Properties of the Components
Substitution Box $S$
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
- $\lambda$-parameter: $1/2$;
- $\delta$-parameter: $1/4$;
- maximum nonlinear order, namely 2;
- recursive definition;
- minimum number of instructions.
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the $3 \times 3$ bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
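The stated parameters can be recomputed from the table itself. The sketch below assumes the usual definitions, which the text does not spell out: $\delta$ is the largest probability of a non-trivial input/output difference pair, and $\lambda$ is the largest absolute correlation $|2p - 1|$ of a linear approximation with non-zero output mask. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List

S_T = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table from the text


def parity(v: int) -> int:
    """Parity of the set bits of v (XOR of the selected bits)."""
    return bin(v).count("1") & 1


def differential_parameter(sbox: List[int]) -> Fraction:
    """max over a != 0 and b of Pr[S(x ^ a) ^ S(x) = b]."""
    size = len(sbox)
    best = 0
    for a in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[sbox[x ^ a] ^ sbox[x]] += 1
        best = max(best, max(counts))
    return Fraction(best, size)


def linear_parameter(sbox: List[int]) -> Fraction:
    """max over input masks a and output masks m != 0 of |2 Pr[a.x = m.S(x)] - 1|."""
    size = len(sbox)
    best = 0
    for a in range(size):
        for m in range(1, size):
            corr = sum(1 if parity(a & x) == parity(m & sbox[x]) else -1
                       for x in range(size))
            best = max(best, abs(corr))
    return Fraction(best, size)


def fixed_points(sbox: List[int]) -> List[int]:
    """Inputs that the S-box maps to themselves."""
    return [x for x in range(len(sbox)) if sbox[x] == x]
```

Under these definitions the table gives $\delta = 1/4$ and $\lambda = 1/2$, and its two fixed points are 0 and 4.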
Bit and Word Rotations $r$ and $R$
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data into $\frac{2 n_b}{3}$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$$K_0, K_1, K_2, \ldots, K_{\lfloor r/2 \rfloor}, K_{\lfloor r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
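The mirrored key sequence can be checked by a short induction on the key round. The derivation below is a sketch under two assumptions not stated explicitly in the source: the constant used in the second half of the key schedule is $C(n_r - i)$, and $h = \lfloor n_r/2 \rfloor$ with $n_r = 2h + 1$. The shorthand $g_c(x) = R(r(S(x \boxplus C(c))))$ is introduced only for this illustration.

```latex
\begin{align*}
\text{first half } (1 \le i \le h):\quad
  & KL_i = KR_{i-1}, \qquad KR_i = KL_{i-1} \oplus g_i(KR_{i-1}) \\
\text{after the switch:}\quad
  & (KL_h, KR_h) \leftarrow (KR_h, KL_h) \\
\text{step } i = h+1,\ C(n_r - i) = C(h):\quad
  & KL_{h+1} = KL_h^{\text{old}} = KR_{h-1} \\
  & KR_{h+1} = KR_h^{\text{old}} \oplus g_h(KR_{h-1})
             = KL_{h-1} \oplus g_h(KR_{h-1}) \oplus g_h(KR_{h-1}) = KL_{h-1} \\
\text{by induction on } j:\quad
  & (KL_{h+j}, KR_{h+j}) = (KR_{h-j}, KL_{h-j}), \qquad 1 \le j \le h
\end{align*}
```

Since the second half of the encryption uses $KL_{h+j} = KR_{h-j}$ as round key, the keys consumed are $KR_0, \ldots, KR_h, KR_{h-1}, \ldots, KR_0$, which matches the palindromic sequence given above.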
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the linear and differential parameters of the 3-round cipher, $\Lambda$ and $\Delta$, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function $S$ has parameters $\delta = 2^{-2}$ and $\lambda = 2^{-1}$, it follows that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an $n$-bit block cipher, it is respectively required that $\Delta \approx 2^{-n}$ and $\Lambda \approx 2^{-n/2}$ to resist differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that
$$\delta^{2 n_r / 3} = \left(2^{-2}\right)^{2 n_r / 3} < 2^{-n} \quad \text{and} \quad \lambda^{2 n_r / 3} = \left(2^{-1}\right)^{2 n_r / 3} < 2^{-n/2} \qquad (1)$$
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to that of, e.g., the AES Rijndael [17], but we only assume one active S-box per round.
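The arithmetic behind Equation (1) can be checked with exact rational exponents. The sketch below uses hypothetical function names and compares base-2 logarithms of both sides. Read literally, the strict inequalities are first met one round above $3n/4$, while $n_r = 3n/4$ meets both bounds with equality.

```python
from fractions import Fraction
from typing import Tuple


def bound_exponents(n_r: int) -> Tuple[Fraction, Fraction]:
    """log2 of delta^(2 n_r / 3) and lambda^(2 n_r / 3) for delta = 2^-2, lambda = 2^-1."""
    e = Fraction(2 * n_r, 3)
    return -2 * e, -1 * e


def satisfies_equation_1(n: int, n_r: int) -> bool:
    """True when both strict inequalities of Equation (1) hold."""
    log_delta, log_lambda = bound_exponents(n_r)
    return log_delta < -n and log_lambda < Fraction(-n, 2)


def min_rounds(n: int) -> int:
    """Smallest n_r meeting Equation (1) with strict inequalities."""
    n_r = 1
    while not satisfies_equation_1(n, n_r):
        n_r += 1
    return n_r
```

For $n = 48$, both exponents reach $-48$ and $-24$ exactly at $n_r = 36 = 3n/4$, so the strict form is first met at 37 rounds; for $n = 96$ the corresponding values are 72 and 73.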
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA_{n,b} a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q, since the boomerang property then holds with a probability of about (pq)^2. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For x ∈ Z_{2^b}^{n_b}, we denote by x ≪ a the left rotation by a bits of each of the n_b words of x. The non-linear and diffusion layers have the following properties:
– S(x ≪ a) = S(x) ≪ a
– r(x ≪ a) = r(x) ≪ a
– R(x ≪ a) = R(x) ≪ a
Consider a modified version of our cipher where key addition is performed using XOR (⊕) rather than modular addition, and where all round constants C_i are such that C_i ≪ a = C_i, e.g. all C_i's equal 0. As a consequence of the previous observations, the modified round F^⊕_E and the key round F_K satisfy:
F^⊕_E(L ≪ a, R ≪ a, K ≪ a) = F^⊕_E(L, R, K) ≪ a
F_K(KL ≪ a, KR ≪ a, 0) = F_K(KL, KR, 0) ≪ a
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys K and K ≪ a.
In the actual SEA_{n,b}, the key addition is performed word-wise mod 2^b. As the property (X ≪ a) ⊞ (K ≪ a) = (X ⊞ K) ≪ a is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for 1 < a < b−1. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA_{n,b} from having such weak keys.
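The probability p quoted above can be checked exhaustively for small word sizes. The sketch below is only an illustration under the assumption that p is measured on a single b-bit word with X and K uniformly distributed; the function names are hypothetical.

```python
from fractions import Fraction


def rotl(x, a, b):
    """Rotate the b-bit word x left by a bits (0 <= a < b)."""
    mask = (1 << b) - 1
    return ((x << a) | (x >> (b - a))) & mask


def rotation_commutes_probability(a, b):
    """Exact probability that (X <<< a) + (K <<< a) == (X + K) <<< a mod 2^b,
    taken over all pairs of b-bit words (X, K)."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return Fraction(hits, 1 << (2 * b))


if __name__ == "__main__":
    for b in (4, 6, 8):
        p = rotation_commutes_probability(1, b)
        print(b, p, float(p))
```

For a = 1 and b = 8 the exhaustive count gives 387/1024 ≈ 0.378, already close to 3/8 = 0.375.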
Moreover, the round constants C_i are generally not such that C_i ≪ a = C_i (because they are generated from a counter). Combined with the diffusion in the key schedule, this implies that the similarity between the round keys derived from K and those derived from K ≪ a rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA_{n,b}.
Square Attacks
We explored square attacks [16] on SEA_{48,8}. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from 2^3 (when the input to only one substitution box is active) to 2^21 (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure. It is expected that this remains the same when different parameters n and b are considered, which implies that n_b + ⌊b/2⌋ rounds are enough to prevent square attacks. Note that although our observations also hold for ⊕-SEA_{n,b}, the use of addition mod 2^b provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure provides an estimate of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 · (n_b + ⌊b/2⌋).
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA_{n,b}, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA_{n,b} is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, whose vulnerability to slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack was described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key K_i is computed from the previous round key K_{i−1} using a function f which is always the same: K_i = f(K_{i−1}). However, in the case of SEA_{n,b}, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA_{n,b} key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows one to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA_{n,b} against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if P →_K C denotes the fact that encryption of P under key K gives ciphertext C, then P →_K C ⟺ P̄ →_K̄ C̄, where the bar denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEA_{n,b} algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA_{n,b} has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum number of rounds required to provide security against known attacks would be 3n/4 + 2 · (n_b + ⌊b/2⌋). This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Note that n_r has to be odd: we add one if it is even. We also assume a minimum word size b ≥ 8 bits.
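The rule above can be turned into a short calculation. This is a sketch under one assumption that is not restated in this section: n_b is taken to be the number of b-bit words per Feistel branch, n_b = n/(2b), as in the SEA specification. The function name `suggested_rounds` is hypothetical.

```python
import math
from fractions import Fraction


def suggested_rounds(n, b):
    """Minimum suggested number of rounds for an n-bit block and b-bit words.

    Assumes n_b = n / (2b) words per Feistel branch (labelled assumption).
    Implements 3n/4 + 2*(n_b + floor(b/2)), rounded up to an odd value.
    """
    if b < 8:
        raise ValueError("a minimum word size b >= 8 bits is assumed")
    if n % (2 * b) != 0:
        raise ValueError("n must be a multiple of 2b")
    n_b = n // (2 * b)
    n_r = math.ceil(Fraction(3 * n, 4)) + 2 * (n_b + b // 2)
    # n_r has to be odd: add one if it is even.
    return n_r + 1 if n_r % 2 == 0 else n_r


if __name__ == "__main__":
    print(suggested_rounds(96, 8))  # 93
    print(suggested_rounds(48, 8))  # 51
```

For n = 96 and b = 8 the formula gives 72 + 2 · (6 + 4) = 92, which is even, so the suggested value becomes 93.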
3.5 Performance Analysis
SEA_{n,b} is targeted at low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: OR, AND, ⊕, ⊞, ≫, ≪
– Branch instructions: goto, subroutine call and return
– Comparison, load RAM in register, store register in RAM
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals 5 n_b + 3. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, this directly yields 31 n_b + 36 ops. For the implementation time, the round and key round respectively require 12 n_b + 11 ops and 10 n_b + 11 ops. This yields a total of (n_r − 1) × (12 n_b + 11 + 10 n_b + 11 + 7) + (12 n_b + 11) + 8 n_b + 7 ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have K_{n_r−1} = K_0. Remark also that this implementation uses a low number of registers, namely n_b + 3. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and a faster implementation.
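The estimates above are simple enough to evaluate directly. The sketch below just transcribes the formulas from the text; the function name and dictionary keys are hypothetical, and the example inputs n_b = 6, n_r = 93 are illustrative values only.

```python
def sea_software_estimates(n_b, n_r):
    """Evaluate the rough software cost formulas given in the text."""
    round_ops = 12 * n_b + 11       # one cipher round
    key_round_ops = 10 * n_b + 11   # one key round
    total_ops = ((n_r - 1) * (round_ops + key_round_ops + 7)
                 + round_ops + 8 * n_b + 7)
    return {
        "ram_words_plus_registers": 5 * n_b + 3,
        "registers": n_b + 3,
        "code_size_ops": 31 * n_b + 36,
        "encryption_ops": total_ops,
    }


if __name__ == "__main__":
    for name, value in sea_software_estimates(6, 93).items():
        print(f"{name}: {value}")
```

With n_b = 6 and n_r = 93 this evaluates to 33 RAM words plus registers, 9 registers, a code size of 222 ops and 14950 ops per encryption.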
For illustration purposes, we implemented SEA_{n,b} on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA_{n,b} and the AES Rijndael. While direct comparisons are made difficult by their high dependency on the target devices, the following general comments can be made:
– SEA_{n,b} designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of SEA_{n,b} implementations (i.e. 5 n_b + 3) is generally lower than for other block ciphers.
– The code size of SEA_{n,b} is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA_{n,b} also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for encryption (i.e. SEA_{n,b} trades space for time, which may be relevant given present processor speeds). Looking at the code size-cycles product, the efficiency of SEA_{n,b} remains similar to the one of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let p be a large (odd) prime and let E: y^2 = x^3 + a_4 x + a_6 be an elliptic curve, where a_4 and a_6 are given fixed integers. In the case where p does not divide 4 a_4^3 + 27 a_6^2, E can be reduced to an elliptic curve over F_p. The number of points of E over F_p, denoted by #E(F_p), is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime p, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over F_p, where p has 200 bits.
It is known that
#E(F_p) = p + 1 − t
where t is an integer which satisfies the Hasse bound
−2√p ≤ t ≤ 2√p
The algorithm works by calculating t modulo several small auxiliary primes ℓ. When the product of the auxiliary primes exceeds 4√p, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of #E(F_p). The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate ℓ, a calculation has to be carried out to generate a certain polynomial Φ_ℓ that is necessary for further calculations with this ℓ. These polynomials Φ_ℓ do not depend on the curve E under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and ℓ is 1/2). For those curves where the algorithm applies, we can determine t modulo ℓ. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds 4√p or not. In the former case, we have succeeded in determining t.
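The final recombination step can be sketched as follows. This is only an illustration of the Chinese Remainder Theorem step under stated assumptions: the residues t mod ℓ are obtained here from a naive point count (`count_points`, a hypothetical stand-in that is only usable for tiny p), whereas the actual algorithm derives them from the polynomials associated with each ℓ.

```python
def count_points(p, a4, a6):
    """Naive #E(F_p) for y^2 = x^3 + a4*x + a6 (stand-in, tiny p only)."""
    square_roots = {}
    for y in range(p):
        square_roots[y * y % p] = square_roots.get(y * y % p, 0) + 1
    affine = sum(square_roots.get((x * x * x + a4 * x + a6) % p, 0)
                 for x in range(p))
    return affine + 1  # add the point at infinity


def crt(residues, moduli):
    """Combine residues modulo pairwise coprime moduli into (value, product)."""
    value, product = 0, 1
    for r, m in zip(residues, moduli):
        # Find k such that value + product*k is congruent to r modulo m.
        k = ((r - value) * pow(product, -1, m)) % m
        value += product * k
        product *= m
    return value, product


def recover_trace(residues, primes, p):
    """Recover t from t mod l, given that |t| <= 2*sqrt(p) (Hasse bound)."""
    value, product = crt(residues, primes)
    if product * product <= 16 * p:
        raise ValueError("product of auxiliary primes must exceed 4*sqrt(p)")
    # Pick the representative in the centered interval around zero.
    return value - product if value > product // 2 else value


if __name__ == "__main__":
    p, a4, a6 = 101, 1, 1
    t = p + 1 - count_points(p, a4, a6)
    primes = [2, 3, 5, 7]
    print(recover_trace([t % l for l in primes], primes, p) == t)  # True
```

The centered representative is unique because the Hasse interval has length 4√p, which is smaller than the product of the auxiliary primes.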
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over F_p with the intention of finding an E with #E(F_p) = x·r, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p and require x ≤ 32, then the probability that #E(F_p) = x·r is about 2.5%, so we expect to have to run our algorithm on about 55 curves.
Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set A_s of small primes and the set A_l of larger primes. For each ℓ ∈ A, we need to determine a polynomial Φ_ℓ(F, J) ∈ Z[F, J]. For ℓ ∈ A_s, this is stored in the program. For ℓ ∈ A_l, Φ_ℓ must be calculated by determining a number of coefficients of a certain q-series f(q) ∈ Z[[q]] and carrying out certain algebraic operations on it. The polynomials Φ_ℓ do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime p and an elliptic curve E: y^2 = x^3 + a_4 x + a_6.
CH 5 SEA Architecture Block Diagram
[FIG 5.1: SEA architecture block diagram. Key computational block: Mod, SBox, BR, WR, XOR, Round Reg, with key registers KeyReg0[95:0] to KeyReg9[95:0]; inputs KeyI[95:0], KeyLd, Clk, Rst. Encryption computational block: Mod, SBox, BR, WR, XOR, Round Reg, with cipher data register; inputs Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst. Decryption computational block: Mod, SBox, BR, IWR, XOR, Round Reg, with plain text data register; inputs Key0[95:0] to Key9[95:0]; signals DataO[95:0], DataRd, Clk, Rst. SMC: Clk, Rst, Ena, ED, Ovr.]
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
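Both ways of obtaining a key can be sketched with the Python standard library. This is an illustrative sketch, not the key generation used by the design: the 96-bit length is chosen to match the 96-bit key register reported in the synthesis log, PBKDF2 is an assumed choice of passphrase-based key generation algorithm (the text only says that a hash function such as SHA-1 is usually involved), and the function names are hypothetical.

```python
import hashlib
import secrets

KEY_BYTES = 12  # 96 bits (assumption: matches the 96-bit key register)


def random_key():
    """Draw a key from the operating system's entropy-seeded generator."""
    return secrets.token_bytes(KEY_BYTES)


def key_from_passphrase(passphrase, salt, iterations=100_000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA1 (assumed choice)."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode("utf-8"),
                               salt, iterations, dklen=KEY_BYTES)


if __name__ == "__main__":
    salt = secrets.token_bytes(16)
    print(random_key().hex())
    print(key_from_passphrase("correct horse battery staple", salt).hex())
```

The same passphrase and salt always produce the same key, while a fresh salt yields an unrelated key.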
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key length; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 ENCRYPTION
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis came into use; it is now practically an art and science of its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
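The one-time pad conditions listed above can be made concrete with a minimal sketch. It assumes the operating system generator behind `secrets` is an acceptable source of key material for illustration purposes; the function names are hypothetical.

```python
import secrets


def otp_encrypt(message):
    """Encrypt bytes with a fresh random key of the same length.

    Returns (key, ciphertext). The key must never be reused.
    """
    key = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext


def otp_decrypt(key, ciphertext):
    """XOR the ciphertext with the same key to recover the message."""
    if len(key) != len(ciphertext):
        raise ValueError("key and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, key))


if __name__ == "__main__":
    key, ciphertext = otp_encrypt(b"attack at dawn")
    print(ciphertext.hex())
    print(otp_decrypt(key, ciphertext))  # b'attack at dawn'
```

The key is as long as the message and is drawn afresh for every call, which is exactly what makes the scheme impractical for most applications.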
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is "gardening", used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, they may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
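The PIN example above can be illustrated with a small sketch. The helper `leaky_equal` is hypothetical and returns its own step count so that the leak is visible without measuring wall-clock time; `hmac.compare_digest` from the Python standard library is shown as a comparison designed to reduce this kind of leak.

```python
import hmac


def leaky_equal(secret, guess):
    """Early-exit comparison: the number of steps reveals how many
    leading bytes of the guess are correct. Returns (equal, steps)."""
    if len(secret) != len(guess):
        return False, 0
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return True, steps


def constant_time_equal(secret, guess):
    """Comparison intended not to leak the position of the first mismatch."""
    return hmac.compare_digest(secret, guess)


if __name__ == "__main__":
    pin = b"1234"
    for guess in (b"0000", b"1000", b"1200", b"1230", b"1234"):
        print(guess, leaky_equal(pin, guess), constant_time_equal(pin, guess))
```

An attacker who can observe the step count (or the time it stands for) can recover the PIN one character at a time instead of trying all combinations.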
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting too short keys, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:  0 out of 0    0%
  Number of Slices containing unrelated logic:     0 out of 0    0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:  194 out of 86  225% (OVERMAPPED)
    IOB Flip Flops:       96
  Number of GCLKs:        1 out of 4   25%
  Number of GCLKIOBs:     1 out of 4   25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line   : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device  : x2s15
Target Package : cs144
Target Speed   : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date    : Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:  0 out of 0    0%
  Number of Slices containing unrelated logic:     0 out of 0    0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:  194 out of 86  225% (OVERMAPPED)
    IOB Flip Flops:       96
  Number of GCLKs:        1 out of 4   25%
  Number of GCLKIOBs:     1 out of 4   25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:  0 out of 0    0%
  Number of Slices containing unrelated logic:     0 out of 0    0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:  194 out of 86  225% (OVERMAPPED)
    IOB Flip Flops:       96
  Number of GCLKs:        1 out of 4   25%
  Number of GCLKIOBs:     1 out of 4   25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
*                      Synthesis Options Summary                        *
=========================================================================
---- Source Parameters
Input File Name                    : keyreg.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :
---- Target Parameters
Output File Name                   : keyreg
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144
---- Source Options
Top Module Name                    : keyreg
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No
---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto
---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5
---- Other Options
lso                                : keyreg.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltkeyreggt
Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary
inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized
=========================================================================HDL Synthesis Report
Macro Statistics Registers 1 96-bit register 1
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28
========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg
Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 195
Macro Statistics Registers 1 96-bit register 1
Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================
Device utilization summary---------------------------
Selected Device 2s15cs144-6
 Number of Slices:              55 out of 192    28%
 Number of Slice Flip Flops:    96 out of 384    25%
 Number of 4 input LUTs:         1 out of 384     0%
 Number of bonded IOBs:        194 out of  90   215% (*)
 Number of GCLKs:                1 out of   4    25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary:
---------------
Speed Grade: -6

   Minimum period: No path found
   Minimum input arrival time before clock: 7.962ns
   Maximum output required time after clock: 6.788ns
   Maximum combinational path delay: No path found
Timing Detail--------------All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset:              7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     IBUF:I->O            96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
     FDCE:CE                   0.886          Dreg_0
    ----------------------------------------
    Total                      7.962ns (1.662ns logic, 6.300ns route)
                                       (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset:              6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising

  Data Path: Dreg_95 to KeyO<95>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     FDCE:C->Q             1   1.085   1.035  Dreg_95 (Dreg_95)
     OBUF:I->O                 4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                      6.788ns (5.753ns logic, 1.035ns route)
                                       (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized
========================================================================HDL Synthesis Report
Found no macro=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1
========================================================================= Final Report =========================================================================
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 7
Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------
Selected Device 2s15cs144-6
 Number of Slices:           2 out of 192    1%
 Number of 4 input LUTs:     3 out of 384    0%
 Number of bonded IOBs:      7 out of  90    7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Number of 4 input LUTs:                           3 out of 384     1%
Logic Distribution:
  Number of occupied Slices:                        2 out of 192     1%
  Number of Slices containing only related logic:   2 out of   2   100%
  Number of Slices containing unrelated logic:      0 out of   2     0%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs:                       3 out of 384     1%
  Number of bonded IOBs:                            7 out of  86     8%

Total equivalent gate count for design:  18
Additional JTAG gate count for IOBs:     336
Peak Memory Usage:  56 MB
Mapping Report
Device utilization summary
Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0
Number of SLICEs 2 out of 192 1
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Total Number Slice Registers:                   419 out of 384    109% (OVERMAPPED)
    Number used as Flip Flops:                    415
    Number used as Latches:                         4
  Number of 4 input LUTs:                        1016 out of 384    264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices:                      665 out of 192    346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665     97%
  Number of Slices containing unrelated logic:     17 out of 665      2%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:                       1066 out of 384    277% (OVERMAPPED)
  Number used as logic:                          1016
  Number used as a route-thru:                     50
  Number of bonded IOBs:                         1060 out of  86   1232% (OVERMAPPED)
    IOB Flip Flops:                               960
  Number of GCLKs:                                  1 out of   4     25%
  Number of GCLKIOBs:                               1 out of   4     25%

Total equivalent gate count for design:  17572
Additional JTAG gate count for IOBs:     50928
Peak Memory Usage:  72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It yields a small encryption routine for any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. It can be applied:
In wireless communication, mobile computing, and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDs
CONCLUSION
SEA$_{n,b}$ is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA$_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo-code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA$_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
2.8 Active-HDL Overview
Active-HDL is an integrated environment designed for the development of VHDL, Verilog, EDIF, and mixed VHDL-Verilog-EDIF designs. It comprises three different design entry tools, a VHDL'93 compiler, a Verilog compiler, a single simulation kernel, several debugging tools, graphical and textual simulation output viewers, and auxiliary utilities designed for easy management of resource files, designs, and libraries.
Standards Supported
VHDL
The VHDL simulator implemented in Active-HDL supports the IEEE Std 1076-1993 standard.
Verilog
The Verilog simulator implemented in Active-HDL supports the IEEE Std 1364-1995 standard. Both PLI (Programming Language Interface) and VCD (Value Change Dump) are also supported in Active-HDL.
EDIF
Active-HDL supports Electronic Design Interchange Format version 2 0 0.
VITAL
The simulator provides built-in acceleration for VITAL packages version 3.0. VITAL-compliant models can be annotated with timing data from SDF files. SDF files must comply with the OVI Standard Delay Format Specification, Version 2.1.
WAVES
Active-HDL supports automatic generation of test benches compliant with the WAVES standard. The basis for this implementation is a draft version of the standard dated May 1997 (IEEE P1029.1/D1.0, May 1997). The WAVES standard (Waveform and Vector Exchange to Support Design and Test Verification) defines a formal notation that supports the verification and testing of hardware designs, the communication of hardware design and test verification data, and the maintenance, modification, and procurement of hardware systems.
2.9 Active-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1. HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. Keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts, and Tcl scripts.
2. Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3. State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4. Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5. Design Browser
The Design Browser window displays the contents of the current design, that is:
- Resource files attached to the design
- The contents of the default working library of the design
- The structure of the design unit selected for simulation
- VHDL, Verilog, or EDIF objects declared within a selected region of the current design
6. Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros, and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable to the simulator. In Active-HDL, a source file can be one of the following:
- VHDL file (.vhd)
- Verilog file (.v)
- EDIF netlist file
- State diagram file (.asf)
- Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog, or EDIF file containing the HDL code (or netlist) generated from the diagram.
A netlist is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, for VHDL, Verilog, and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
- Event-Driven Simulation
- Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others run in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bitstream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs, and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic, and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions, and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input netlists as well as design constraint information into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bitstream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing, or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a netlist (or netlists) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations, and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bitstream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e., with small code size and memory) targeted for processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation, and modular addition). The proposed design is parametric in the text, key, and processor size, and allows efficient combination of encryption/decryption and "on-the-fly" key derivation; its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g., a small processor) and throughput requirements. This yields design criteria such as low memory requirements, small code size, and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEA$_{n,b}$ is parametric in the text, key, and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g., 8-bit or 32-bit processors). In contrast, SEA$_{n,b}$ makes it possible to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, alternative features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g., authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats, and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key, and word sizes. It is based on a Feistel structure with a variable number of rounds and is defined with respect to the following parameters:
- $n$: plaintext size, key size
- $b$: processor (or word) size
- $n_b = \frac{n}{2b}$: number of words per Feistel branch
- $n_r$: number of block cipher rounds
As the only constraint, $n$ is required to be a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted as SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$, ... Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
- Bit representation: $x_b = x(n/2-1)\; x(n/2-2) \ldots x(2)\; x(1)\; x(0)$
- Word representation: $x_W = x_{n_b-1}\; x_{n_b-2} \ldots x_2\; x_1\; x_0$
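The parameter relations above can be checked mechanically. The following is a minimal sketch; the function name `sea_params` is hypothetical and not part of the original design files.

```python
def sea_params(n: int, b: int) -> int:
    """Return n_b = n / (2b), the number of b-bit words per Feistel branch.

    Raises ValueError when n is not a multiple of 6b, which is the only
    constraint stated in the text.
    """
    if n <= 0 or b <= 0:
        raise ValueError("n and b must be positive")
    if n % (6 * b) != 0:
        raise ValueError(f"n={n} is not a multiple of 6b={6 * b}")
    return n // (2 * b)
```

For the 8-bit examples named in the text, this gives 3, 6, and 9 words per branch for SEA$_{48,8}$, SEA$_{96,8}$, and SEA$_{144,8}$ respectively.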
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1. Bitwise XOR
The bitwise XOR is defined on $n/2$-bit vectors:
$$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \;\Leftrightarrow\; z(i) = x(i) \oplus y(i), \quad 0 \le i \le \tfrac{n}{2} - 1$$
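2. Substitution Box S
SEA$_{n,b}$ uses the following 3-bit substitution table:
$$S_T = \{0, 5, 6, 7, 4, 3, 1, 2\}$$
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
$$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \;\Leftrightarrow\; \begin{cases} x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i} \\ x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1} \\ x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2} \end{cases} \quad 0 \le i \le \tfrac{n_b}{3} - 1$$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR. The three assignments are evaluated in order, so each line uses the words already updated by the lines above it.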
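The recursive definition can be cross-checked against the table $S_T$. The sketch below assumes that, within each block of three words, word $x_{3i}$ carries the least significant bit of the 3-bit table index and $x_{3i+2}$ the most significant one; the function names are hypothetical.

```python
S_TABLE = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table from the text


def sbox(words):
    """Bitslice S-box on a list of words whose length is a multiple of 3."""
    x = list(words)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]   # uses the original x[3i+2] and x[3i+1]
        x[i + 1] ^= x[i + 2] & x[i]   # uses the already updated x[3i]
        x[i + 2] ^= x[i] | x[i + 1]   # uses both updated words
    return x


def sbox_by_table(words, b):
    """Reference version: look up every bit position of each block in S_TABLE."""
    out = [0] * len(words)
    for i in range(0, len(words), 3):
        for j in range(b):
            v = ((words[i] >> j) & 1) \
                | (((words[i + 1] >> j) & 1) << 1) \
                | (((words[i + 2] >> j) & 1) << 2)
            s = S_TABLE[v]
            out[i] |= (s & 1) << j
            out[i + 1] |= ((s >> 1) & 1) << j
            out[i + 2] |= ((s >> 2) & 1) << j
    return out
```

Under that bit-ordering assumption, both functions agree on every input, which is what makes the table usable as a specification while the three logic operations serve as the implementation.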
3. Word Rotation R
The word rotation is defined on $n_b$-word vectors:
$$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \;\Leftrightarrow\; y_{i+1} = x_i, \; 0 \le i \le n_b - 2, \quad y_0 = x_{n_b-1}$$
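As an illustration, the word rotation and its inverse can be sketched on Python lists, assuming list index $i$ holds word $x_i$ (index 0 is the least significant word). The function names are hypothetical.

```python
def word_rotate(x):
    """R: y[i+1] = x[i] for 0 <= i <= n_b - 2, and y[0] = x[n_b - 1]."""
    return [x[-1]] + list(x[:-1])


def word_rotate_inv(y):
    """R^-1: undo word_rotate by moving every word back one position."""
    return list(y[1:]) + [y[0]]
```

Only whole words move; the bits inside each word are left untouched.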
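4. Bit Rotation r
The bit rotation is defined on $n_b$-word vectors:
$$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \;\Leftrightarrow\; y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le \tfrac{n_b}{3} - 1$$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.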
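A small sketch of the bit rotation follows, again with list index $i$ holding word $x_i$ and with hypothetical helper names. Each word is assumed to fit in $b$ bits.

```python
def ror1(w, b):
    """Cyclic right shift of a b-bit word by one position."""
    return ((w >> 1) | (w << (b - 1))) & ((1 << b) - 1)


def rol1(w, b):
    """Cyclic left shift of a b-bit word by one position."""
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)


def bit_rotate(x, b):
    """r: rotate word 3i right, keep word 3i+1, rotate word 3i+2 left."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ror1(y[i], b)
        y[i + 2] = rol1(y[i + 2], b)
    return y
```

For example, with $b = 8$ the block `[0x01, 0x55, 0x80]` becomes `[0x80, 0x55, 0x01]`: the lowest bit of the first word wraps to the top, and the highest bit of the third word wraps to the bottom.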
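5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \;\Leftrightarrow\; z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$$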
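The point of this definition is that the addition is word-wise: carries never propagate from one word into the next. A minimal sketch, with a hypothetical function name:

```python
def add_mod(x, y, b):
    """Word-wise addition mod 2^b of two equally long word vectors."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]
```

With $b = 8$, adding `[0xFF, 0x01, 0x80]` and `[0x01, 0x01, 0x80]` gives `[0x00, 0x02, 0x00]`; the overflow of the first and third words is simply discarded.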
The Round and Key Round
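Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$, and key round $F_K$ are pictured in Figure 1 and defined as the functions $F : \mathbb{Z}^2_{2^{n/2}} \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}^2_{2^{n/2}}$ such that:
$$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$$
$$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$$
$$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \;\Leftrightarrow\; KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$$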
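The three round functions can be sketched directly from these equations. The code below is an illustration under stated assumptions, not the VHDL architecture of this report: branches are Python lists with index $i$ holding word $x_i$, and all function names are hypothetical. It also shows why $F_D$ undoes $F_E$: feeding the swapped output of $F_E$ into $F_D$ with the same key returns the swapped input.

```python
def _sub(x):
    """Bitslice S-box (recursive definition, evaluated in order)."""
    x = list(x)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x


def _bitrot(x, b):
    """Bit rotation r: word 3i right by one, word 3i+2 left by one."""
    m = (1 << b) - 1
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ((y[i] >> 1) | (y[i] << (b - 1))) & m
        y[i + 2] = ((y[i + 2] << 1) | (y[i + 2] >> (b - 1))) & m
    return y


def _f(x, k, b):
    """r(S(x [+] k)), with word-wise addition mod 2^b."""
    m = (1 << b) - 1
    return _bitrot(_sub([(xi + ki) & m for xi, ki in zip(x, k)]), b)


def _xor(x, y):
    return [xi ^ yi for xi, yi in zip(x, y)]


def f_e(L, R, K, b):
    """Encrypt round: R' = R(L) xor r(S(R [+] K)), L' = R."""
    return list(R), _xor([L[-1]] + list(L[:-1]), _f(R, K, b))


def f_d(L, R, K, b):
    """Decrypt round: R' = R^-1(L xor r(S(R [+] K))), L' = R."""
    t = _xor(L, _f(R, K, b))
    return list(R), t[1:] + [t[0]]


def f_k(KL, KR, C, b):
    """Key round: KR' = KL xor R(r(S(KR [+] C))), KL' = KR."""
    t = _f(KR, C, b)
    return list(KR), _xor(KL, [t[-1]] + t[:-1])
```

Because the keyed term $r(S(R_i \boxplus K_i))$ is only ever XORed into the other branch, none of $S$, $r$, or $\boxplus$ needs to be inverted for decryption; only the word rotation $R$ needs its inverse.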
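Fig. 3.1: Encrypt/decrypt round and key round. [Block diagram not reproduced. The encrypt/decrypt round maps $L_i$, $R_i$ and the round key $K_i$ to $L_{i+1}$, $R_{i+1}$ through $R$ (or $R^{-1}$), $r$, and $S$; the key round maps $KL_i$, $KR_i$ and the constant $C_i$ to $KL_{i+1}$, $KR_{i+1}$ through $R$, $r$, and $S$.]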
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.

C = SEA_{n,b}(P, K)
{
  // initialization
  L_0 & R_0 = P;
  KL_0 & KR_0 = K;
  // key scheduling
  for i in 1 to floor(nr/2)
    [KL_i, KR_i] = FK(KL_{i-1}, KR_{i-1}, C(i));
  switch KL_{floor(nr/2)}, KR_{floor(nr/2)};
  for i in ceil(nr/2) to nr - 1
    [KL_i, KR_i] = FK(KL_{i-1}, KR_{i-1}, C(nr - i));
  // encryption
  for i in 1 to ceil(nr/2)
    [L_i, R_i] = FE(L_{i-1}, R_{i-1}, KR_{i-1});
  for i in ceil(nr/2) + 1 to nr
    [L_i, R_i] = FE(L_{i-1}, R_{i-1}, KL_{i-1});
  // final
  C = R_nr & L_nr;
  switch KL_{nr-1}, KR_{nr-1};
}

where & is the concatenation operator, KR_{floor(nr/2)} is taken before the switch and C(i) is an $n_b$-word vector of which all the words have value 0, except the least significant word (LSW), which equals i. Decryption is exactly the same, using the decrypt round $F_D$.
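The pseudo-code above can be exercised with a small Python model. The sketch below is not a reference implementation: it assumes that each half of the state is a list of $n_b$ words with word 0 as the LSW, that the S-box equations are applied in place in the order written, and all function names are hypothetical. No official test vectors are used; the model is only checked for self-consistency.

```python
# Minimal sketch under stated assumptions: halves are lists of nb words,
# word 0 is the LSW (so C(i) puts i in word 0), names are hypothetical.

def sbox(x):
    x = list(x)
    for i in range(0, len(x), 3):
        x[i] = (x[i + 2] & x[i + 1]) ^ x[i]
        x[i + 1] = (x[i + 2] & x[i]) ^ x[i + 1]
        x[i + 2] = (x[i] | x[i + 1]) ^ x[i + 2]
    return x

def word_rot(x):                      # R: y[i+1] = x[i], y[0] = x[nb-1]
    return [x[-1]] + list(x[:-1])

def word_rot_inv(x):                  # inverse of R
    return list(x[1:]) + [x[0]]

def bit_rot(x, b):                    # r: rotate words 3i right, 3i+2 left
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & mask
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask
    return y

def add(x, y, b):                     # word-wise addition mod 2^b
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, y)]

def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]

def f_e(l, r, k, b):                  # encrypt round
    return r, xor(word_rot(l), bit_rot(sbox(add(r, k, b)), b))

def f_d(l, r, k, b):                  # decrypt round
    return r, word_rot_inv(xor(l, bit_rot(sbox(add(r, k, b)), b)))

def f_k(kl, kr, c, b):                # key round
    return kr, xor(kl, word_rot(bit_rot(sbox(add(kr, c, b)), b)))

def round_keys(kl, kr, nr, b):
    """Return the nr round keys and the key state after the final switch."""
    nb, half = len(kl), nr // 2
    const = lambda i: [i] + [0] * (nb - 1)
    keys = [kr]                                   # KR_0
    for i in range(1, half + 1):
        kl, kr = f_k(kl, kr, const(i), b)
        keys.append(kr)                           # KR_i (before the switch)
    kl, kr = kr, kl                               # switch
    for i in range(half + 1, nr):
        kl, kr = f_k(kl, kr, const(nr - i), b)
        keys.append(kl)                           # KL_i
    return keys, (kr, kl)

def crypt(l, r, keys, b, round_fn):
    """Iterate round_fn over all round keys, then output R & L."""
    for k in keys:
        l, r = round_fn(l, r, k, b)
    return r, l
```

With these assumptions, `crypt(..., f_e)` followed by `crypt(..., f_d)` under the same key list returns the plaintext, and the key state returned by `round_keys` equals the master key.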
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– λ-parameter: 1/2
– δ-parameter: 1/4
– Maximum nonlinear order, namely 2
– Recursive definition
– Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3-bit bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher, when combined with the bitslice substitution box. This is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48, b = 8, $n_b$ = 3. Looking at the figure, it can be seen that $\mathrm{SEA}_{n,b}$ divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b$ = 3, which yields 3 rounds). Once every word has at least one active bit, the combination of r and S yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to $\mathrm{SEA}_{n,b}$, namely the key schedule and the position of R, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ, the linear and differential parameters of the 3-round cipher, Λ and Δ, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function S has parameters $\delta = 2^{-2}$ and $\lambda = 2^{-1}$, it implies that 3 rounds of $\mathrm{SEA}_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, Δ and Λ are respectively required to be on the order of $2^{-n}$ and $2^{-n/2}$ to resist differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that
$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \qquad (1)$
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to that of, e.g., the AES Rijndael [17], but we only assume one active s-box per round.
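The step from Equation (1) to the bound on $n_r$ can be written out explicitly. The following only restates the argument above, with $n_r/3$ blocks of 3 rounds each contributing a factor $\delta^2$ (respectively $\lambda^2$); it is a worked derivation, not an additional result.

```latex
\begin{align*}
\left(2^{-2}\right)^{2 n_r/3} = 2^{-4 n_r/3} < 2^{-n}
  &\iff \frac{4 n_r}{3} > n \iff n_r > \frac{3n}{4},\\
\left(2^{-1}\right)^{2 n_r/3} = 2^{-2 n_r/3} < 2^{-n/2}
  &\iff \frac{2 n_r}{3} > \frac{n}{2} \iff n_r > \frac{3n}{4}.
\end{align*}
```

For example, n = 96 leads to roughly 72 rounds for this part of the estimate.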
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with $\mathrm{SEA}_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \lll a$ the left rotation by a bits of each of the $n_b$ words of x. The non-linear and diffusion layers have the following properties:
– $S(x \lll a) = S(x) \lll a$
– $r(x \lll a) = r(x) \lll a$
– $R(x \lll a) = R(x) \lll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \lll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F'_E$ and the key round $F_K$ satisfy:
$F'_E(L \lll a, R \lll a, K \lll a) = F'_E(L, R, K) \lll a$
$F_K(KL \lll a, KR \lll a, 0) = F_K(KL, KR, 0) \lll a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys K and $K \lll a$.
In the actual $\mathrm{SEA}_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \lll a) \boxplus (K \lll a) = (X \boxplus K) \lll a$ is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for 1 < a < b − 1. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent $\mathrm{SEA}_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \lll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from K and those derived from $K \lll a$ rapidly vanishes. These properties prevent this structural distinguisher from being propagated through more than a few rounds of $\mathrm{SEA}_{n,b}$.
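The probability p mentioned above can be estimated by exhaustive enumeration for small word sizes. The following Python sketch uses hypothetical function names and simply counts, over all pairs of b-bit words, how often rotation and modular addition commute.

```python
# Illustrative sketch: exhaustive count over all (X, K) pairs of b-bit words.
# Function names are hypothetical.

def rotl(x, a, b):
    """Cyclic left rotation of a b-bit word x by a bits (0 < a < b)."""
    mask = (1 << b) - 1
    return ((x << a) | (x >> (b - a))) & mask


def commute_probability(a, b):
    """Fraction of pairs with (X <<< a) + (K <<< a) = (X + K) <<< a mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return hits / float(1 << (2 * b))
```

For instance, `commute_probability(1, 8)` enumerates the 65536 pairs of bytes and returns a value close to 3/8.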
Square Attacks
We explored square attacks [16] on $\mathrm{SEA}_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that this remains the same when different parameters n and b are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-$\mathrm{SEA}_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of $\mathrm{SEA}_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of $\mathrm{SEA}_{n,b}$ is the same as that of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with that of GOST, whose vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack was described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function f which is always the same: $K_i = f(K_{i-1})$. However, in the case of $\mathrm{SEA}_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the $\mathrm{SEA}_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows the attacker to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of $\mathrm{SEA}_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \Leftrightarrow \overline{P} \xrightarrow{\overline{K}} \overline{C}$. The non-linear key scheduling and the presence of carry propagations in the actual $\mathrm{SEA}_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, $\mathrm{SEA}_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size b ≥ 8 bits.
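As a quick illustration, the minimum round count above can be computed for given parameters. The helper below is a hypothetical sketch; it assumes $n_b = n/(2b)$ and rounds 3n/4 up when it is not an integer, which is an assumption not stated in the text.

```python
# Hypothetical helper: minimum suggested number of rounds for SEA_{n,b}.
# Assumes nb = n / (2b); 3n/4 is rounded up if not an integer (assumption).

def suggested_rounds(n, b):
    nb = n // (2 * b)
    three_quarters_n = -(-3 * n // 4)            # ceiling of 3n/4
    nr = three_quarters_n + 2 * (nb + b // 2)
    return nr if nr % 2 == 1 else nr + 1         # nr has to be odd
```

For n = 48 and b = 8, the formula gives 36 + 2 · (3 + 4) = 50, hence 51 rounds after the parity adjustment.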
3.5 Performance Analysis
$\mathrm{SEA}_{n,b}$ is targeted for being implemented on low-cost processors, with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and registers usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\ggg$, $\lll$
– Branch instructions: goto, subroutine call and return
– Comparison, load RAM in register, store register in RAM
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r - 1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
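The estimates above are simple enough to tabulate for any parameter choice. The sketch below only re-evaluates the formulas as they appear in the text; the function name and the dictionary keys are hypothetical, and the coefficients are taken from the text as given.

```python
# Hypothetical helper re-evaluating the cost formulas quoted in the text.

def sea_cost_estimate(n, b, nr):
    nb = n // (2 * b)                      # words per Feistel branch
    round_ops = 12 * nb + 11               # one encrypt/decrypt round
    key_round_ops = 10 * nb + 11           # one key round
    time_ops = ((nr - 1) * (round_ops + key_round_ops + 7)
                + round_ops + 8 * nb + 7)
    return {
        "memory_words": 5 * nb + 3,        # RAM words + registers
        "code_size_ops": 31 * nb + 36,
        "time_ops": time_ops,
    }
```

For n = 96, b = 8 and $n_r$ = 93, the time formula evaluates to 92 × 161 + 83 + 48 + 7 = 14950 ops.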
For illustration purposes, we implemented $\mathrm{SEA}_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between $\mathrm{SEA}_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– $\mathrm{SEA}_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of $\mathrm{SEA}_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of $\mathrm{SEA}_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of $\mathrm{SEA}_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. $\mathrm{SEA}_{n,b}$ trades space for time, which may be relevant given present processor speeds). Looking at the code size × cycles product, the efficiency of $\mathrm{SEA}_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH 4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let p be a large (odd) prime and let $E : y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where p does not divide $4a_4^3 + 27a_6^2$, E can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of E over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime p, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where p has 200 bits.
It is known that
$\#E(\mathbb{F}_p) = p + 1 - t,$
where t is an integer which satisfies the Hasse bound
$-2\sqrt{p} \le t \le 2\sqrt{p}.$
The algorithm works by calculating t modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial $\Phi_\ell$ that is necessary for further calculations with this $\ell$. These polynomials $\Phi_\ell$ do not depend on the curve E under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine t modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining t.
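The final recombination step can be illustrated on a toy example. The Python sketch below is not the SEA algorithm itself: it counts points naively (feasible only for tiny p) to obtain t, and then shows how t is recovered from its residues modulo small primes once their product exceeds $4\sqrt{p}$. The function names and the toy parameters are assumptions chosen for illustration, and `pow(m, -1, l)` requires Python 3.8 or later.

```python
# Toy illustration of the Hasse bound and the CRT recombination step.
# Not the SEA algorithm: the residues are derived from a naive point count.

def count_points(a4, a6, p):
    """Count points on y^2 = x^3 + a4*x + a6 over F_p, including infinity."""
    roots = {}                                   # value -> number of y with y^2 = value
    for y in range(p):
        roots[y * y % p] = roots.get(y * y % p, 0) + 1
    total = 1                                    # point at infinity
    for x in range(p):
        total += roots.get((x ** 3 + a4 * x + a6) % p, 0)
    return total


def crt(residues, moduli):
    """Combine residues modulo pairwise coprime moduli; return (value, product)."""
    t, m = 0, 1
    for r, l in zip(residues, moduli):
        k = ((r - t) * pow(m, -1, l)) % l        # solve t + m*k = r (mod l)
        t += m * k
        m *= l
    return t, m


def recover_trace(residues, moduli, p):
    """Recover t from its residues, using |t| <= 2*sqrt(p)."""
    t, m = crt(residues, moduli)
    if m * m <= 16 * p:                          # product must exceed 4*sqrt(p)
        raise ValueError("product of auxiliary primes too small")
    return t - m if t > m // 2 else t
```

For a toy prime such as p = 101, the primes 2, 3, 5 and 7 already suffice, since 210 > 4√101. In the real algorithm, the residues t mod $\ell$ come from the computations with $\ell$ described above rather than from a direct count.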
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over $\mathbb{F}_p$, with the intention of finding an E with $\#E(\mathbb{F}_p) = xr$, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p and require $x \le 32$, then the probability that $\#E = xr$ is about 25 so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial $\Phi_\ell(F, J) \in \mathbb{Z}[F, J]$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, $\Phi_\ell$ must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration, and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime p and an elliptic curve $E : y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram. Key computational block (Mod, SBox, BR, WR, XOR, Round Reg) with key registers KeyReg0[95:0] to KeyReg9[95:0], inputs KeyI[95:0], KeyLd, Clk, Rst. Encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg) with cipher data register, inputs Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst. Decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg) with plain text data register, inputs Key0[95:0] to Key9[95:0], output DataO[95:0], DataRd, Clk, Rst. SMC block with Clk, Rst, Ena, ED, Ovr.
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also meant the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis began being used; it is now practically an art and science all on its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such showing can currently be made, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting too short keys, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
44
Decryption Results
45
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
  Found 96-bit register for signal <Dreg>.
  Summary:
    inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report
Macro Statistics
Registers : 1
  96-bit register : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 195

Macro Statistics
Registers : 1
  96-bit register : 1

Cell Usage
BELS : 1
  LUT1 : 1
FlipFlops/Latches : 96
  FDCE : 96
Clock Buffers : 1
  BUFGP : 1
IO Buffers : 194
  IBUF : 98
  OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
Number of Slices : 55 out of 192 28%
Number of Slice Flip Flops : 96 out of 384 25%
Number of 4 input LUTs : 1 out of 384 0%
Number of bonded IOBs : 194 out of 90 215% (*)
Number of GCLKs : 1 out of 4 25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -6
Minimum period: No path found
Minimum input arrival time before clock: 7.962ns
Maximum output required time after clock: 6.788ns
Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
Source: KeyEna (PAD)
Destination: Dreg_95 (FF)
Destination Clock: Clk rising
Data Path: KeyEna to Dreg_95
                      Gate    Net
Cell:in->out  fanout  Delay   Delay   Logical Name (Net Name)
IBUF:I->O     96      0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
FDCE:CE               0.886           Dreg_0
Total 7.962ns (1.662ns logic, 6.300ns route) (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
Source: Dreg_95 (FF)
Destination: KeyO<95> (PAD)
Source Clock: Clk rising
Data Path: Dreg_95 to KeyO<95>
                      Gate    Net
Cell:in->out  fanout  Delay   Delay   Logical Name (Net Name)
FDCE:C->Q     1       1.085   1.035   Dreg_95 (Dreg_95)
OBUF:I->O             4.668           KeyO_95_OBUF (KeyO<95>)
Total 6.788ns (5.753ns logic, 1.035ns route) (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 7

Cell Usage
BELS : 3
  LUT4 : 3
IO Buffers : 7
  IBUF : 4
  OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
Number of Slices : 2 out of 192 1%
Number of 4 input LUTs : 3 out of 384 0%
Number of bonded IOBs : 7 out of 90 7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
    Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
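The percentages and OVERMAPPED flags in the mapping summary follow directly from the "used out of available" counts. The Python sketch below recomputes them for the keygenblock figures reported above. The helper name is illustrative and not part of any Xilinx tool, and truncating (rather than rounding) the percentage is an assumption inferred from the reported values.

```python
def utilization(used: int, available: int):
    """Return (truncated percentage, overmapped flag) for one resource class."""
    percent = used * 100 // available
    return percent, used > available


# (used, available) pairs taken from the keygenblock mapping summary
keygenblock = {
    "Slice registers": (419, 384),
    "4 input LUTs (logic)": (1016, 384),
    "Occupied slices": (665, 192),
    "Total 4 input LUTs": (1066, 384),
    "Bonded IOBs": (1060, 86),
}

if __name__ == "__main__":
    for name, (used, available) in keygenblock.items():
        percent, over = utilization(used, available)
        flag = " (OVERMAPPED)" if over else ""
        print(f"{name}: {used} out of {available} {percent}%{flag}")
```

Every resource class listed exceeds what the xc2s15 offers, which is what the OVERMAPPED flag signals for this block.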
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key and processor size.
It is a low-cost encryption routine targeted for processors with a limited instruction set.
It yields a small encryption routine for any given processor, the security of the cipher being adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed for processors with a limited instruction set. Its applications include:
Wireless communication, mobile computing and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEA$_{n,b}$ is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEA$_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA$_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered as an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations purposed for RFIDs, as well as further cryptanalysis efforts and security evaluations.
2.9 ACTIVE-HDL Macro Language
All operations in Active-HDL can be performed using the Active-HDL macro language. The language has been designed to enable the user to work with Active-HDL without using the graphical user interface (GUI).
1 HDL Editor
HDL Editor is a text editor designed for HDL source files. It displays specific syntax categories in different colors (keyword coloring). The editor is tightly integrated with the simulator to enable debugging of source code. The keyword coloring is also available when HDL Editor is used for editing macro files, Perl scripts, and Tcl scripts.
2 Block Diagram Editor
Block Diagram Editor is a graphical tool designed to create block diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
3 State Diagram Editor
State Diagram Editor is a graphical tool designed to edit state machine diagrams. The editor automatically translates graphically designed diagrams into VHDL or Verilog code.
4 Waveform Editor
Waveform Editor displays the results of a simulation run as signal waveforms. It allows you to graphically edit waveforms so as to create desired test vectors.
5 Design Browser
The Design Browser window displays the contents of the current design, that is:
Resource files attached to the design.
The contents of the default working library of the design.
The structure of the design unit selected for simulation.
VHDL, Verilog, or EDIF objects declared within a selected region of the current design.
6 Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros, and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable by the simulator. In Active-HDL, a source file can be one of the following:
VHDL file (.vhd)
Verilog file (.v)
EDIF netlist file
State diagram file (.asf)
Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog, or EDIF file containing HDL code (or netlist) generated from the diagram.
A netlist is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, for VHDL, Verilog, and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
Event-Driven Simulation
Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others are simulated in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx® design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs, and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic, and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions, and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input netlists as well as design constraint information into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA, and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing, or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a netlist(s) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations, and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats, and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e., with small code size and memory) targeted for processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, and allows efficient combination of encryption/decryption and "on-the-fly" key derivation; its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g., a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size, and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEA$_{n,b}$ is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g., 8-bit or 32-bit processors). In contrast, SEA$_{n,b}$ makes it possible to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, alternative features were wanted, including the efficient combination of encryption and decryption or the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g., authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds and is defined with respect to the following parameters:
- $n$: plaintext size, key size
- $b$: processor (or word) size
- $n_b = n/2b$: number of words per Feistel branch
- $n_r$: number of block cipher rounds
As the only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... -bit block ciphers, respectively denoted as SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$, ... Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
- Bit representation: $x_b = x(n/2 - 1)\ x(n/2 - 2)\ \ldots\ x(2)\ x(1)\ x(0)$
- Word representation: $x_W = x_{n_b-1}\ x_{n_b-2}\ \ldots\ x_2\ x_1\ x_0$
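The parameter constraint above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper name `words_per_branch` (not part of the original specification), that validates the "multiple of 6b" rule and returns the number of words per Feistel branch.

```python
def words_per_branch(n: int, b: int) -> int:
    """Return n_b = n / (2b) after checking that n is a multiple of 6b.

    The function name is hypothetical; only the constraint itself
    (n multiple of 6b) and the formula n_b = n/2b come from the text.
    """
    if n <= 0 or b <= 0:
        raise ValueError("n and b must be positive")
    if n % (6 * b) != 0:
        raise ValueError(f"n = {n} is not a multiple of 6b = {6 * b}")
    return n // (2 * b)


if __name__ == "__main__":
    # The three 8-bit instances named in the text.
    for n in (48, 96, 144):
        print(f"n = {n}, b = 8: n_b = {words_per_branch(n, 8)}")
```

Because each branch holds a multiple of three words, the 3-word bitslice substitution box defined below always fits an integer number of times.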
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1. Bitwise XOR $\oplus$
The bitwise XOR is defined on $n/2$-bit vectors:
$$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$$
2. Substitution Box $S$
SEA$_{n,b}$ uses the following 3-bit substitution table, in C-like notation:
$$S_T := \{0, 5, 6, 7, 4, 3, 1, 2\}$$
For efficiency purposes, it is applied bitwise to any set of three words of data, using the following recursive definition:
$$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \Leftrightarrow \begin{cases} x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i} \\ x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1} \\ x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2} \end{cases} \quad 0 \le i \le n_b/3 - 1$$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
3. Word Rotation $R$
The word rotation is defined on $n_b$-word vectors:
$$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \Leftrightarrow y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \qquad y_0 = x_{n_b-1}$$
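4. Bit Rotation $r$
The bit rotation is defined on $n_b$-word vectors:
$$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le n_b/3 - 1$$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.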
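5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$$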
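The five elementary operations can be sketched in a few lines of Python. This is an illustrative model only: word vectors are represented as Python lists with index 0 holding $x_0$, and all function names are assumptions introduced here. The helper `table_from_bitslice` checks that the recursive (sequential) definition of $S$, applied to three 1-bit words with $x_{3i}$ as the least significant bit, reproduces the table $S_T$.

```python
S_T = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table from the text


def sbox(x):
    """Bitslice S-box: updates are sequential, as in the recursive definition."""
    x = list(x)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]      # uses the already updated x[i]
        x[i + 2] ^= x[i] | x[i + 1]      # uses the updated x[i] and x[i+1]
    return x


def word_rot(x):
    """R: y[i+1] = x[i], y[0] = x[nb-1]."""
    return [x[-1]] + list(x[:-1])


def inv_word_rot(y):
    """Inverse of word_rot."""
    return list(y[1:]) + [y[0]]


def bit_rot(x, b):
    """r: rotate word 3i right by one bit, word 3i+2 left by one bit."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & mask
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask
    return y


def add_mod(x, y, b):
    """Word-wise addition mod 2^b."""
    mask = (1 << b) - 1
    return [(u + v) & mask for u, v in zip(x, y)]


def table_from_bitslice():
    """Rebuild the 3-bit table by feeding each value through the bitslice S-box."""
    table = []
    for v in range(8):
        w = sbox([v & 1, (v >> 1) & 1, (v >> 2) & 1])
        table.append(w[0] | (w[1] << 1) | (w[2] << 2))
    return table


if __name__ == "__main__":
    print(table_from_bitslice() == S_T)
```

The bit ordering used in `table_from_bitslice` (word $x_{3i}$ as the least significant bit of the table index) is an assumption; it is the ordering under which the recursive definition and the table agree.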
The Round and Key Round
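Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Figure 1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^2$ such that:
$$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$$
$$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$$
$$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$$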
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.

C = SEA_{n,b}(P, K)
  initialization:
    L_0 & R_0 = P
    KL_0 & KR_0 = K
  key scheduling:
    for i in 1 to ⌊n_r/2⌋
      [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i))
    switch KL_{⌊n_r/2⌋}, KR_{⌊n_r/2⌋}
    for i in ⌊n_r/2⌋ + 1 to n_r - 1
      [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(n_r - i))
  encryption:
    for i in 1 to ⌈n_r/2⌉
      [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1})
    for i in ⌈n_r/2⌉ + 1 to n_r
      [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1})
  final:
    C = R_{n_r} & L_{n_r}
    switch KL_{n_r-1}, KR_{n_r-1}

where & is the concatenation operator, KR_{⌊n_r/2⌋} is taken before the switch, and C(i) is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round $F_D$.
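To make the interplay between the key schedule and the two round functions concrete, the pseudo-code can be modelled in Python as below. This is a sketch under stated assumptions, not a reference implementation: branches are lists of b-bit words with index 0 as the least significant word, the second key-schedule loop uses the constant C(n_r - i), the helper name `make_sea` is hypothetical, and the output has not been checked against any official test vector. Its purpose is only to show that decryption with $F_D$ inverts encryption and that the round-key sequence reads the same in both directions.

```python
def make_sea(n, b, nr):
    """Build toy encrypt/decrypt functions for SEA_{n,b} with nr rounds (illustrative)."""
    assert n % (6 * b) == 0 and nr % 2 == 1
    nb = n // (2 * b)
    mask = (1 << b) - 1

    def sbox(x):
        x = list(x)
        for i in range(0, nb, 3):
            x[i] ^= x[i + 2] & x[i + 1]
            x[i + 1] ^= x[i + 2] & x[i]
            x[i + 2] ^= x[i] | x[i + 1]
        return x

    def word_rot(x):
        return [x[-1]] + x[:-1]

    def inv_word_rot(x):
        return x[1:] + [x[0]]

    def bit_rot(x):
        y = list(x)
        for i in range(0, nb, 3):
            y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & mask
            y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask
        return y

    def add(x, y):
        return [(u + v) & mask for u, v in zip(x, y)]

    def xor(x, y):
        return [u ^ v for u, v in zip(x, y)]

    def const(i):
        # C(i): all words zero except the least significant word (assumed index 0).
        return [i & mask] + [0] * (nb - 1)

    def f_e(l, r, k):
        return r, xor(word_rot(l), bit_rot(sbox(add(r, k))))

    def f_d(l, r, k):
        return r, inv_word_rot(xor(l, bit_rot(sbox(add(r, k)))))

    def f_k(kl, kr, c):
        return kr, xor(kl, word_rot(bit_rot(sbox(add(kr, c)))))

    def round_keys(kl, kr):
        h = nr // 2
        keys = [kr]                        # KR_0
        for i in range(1, h + 1):
            kl, kr = f_k(kl, kr, const(i))
            keys.append(kr)                # KR_i, with KR_h taken before the switch
        kl, kr = kr, kl                    # switch KL_h, KR_h
        for i in range(h + 1, nr):
            kl, kr = f_k(kl, kr, const(nr - i))
            keys.append(kl)                # KL_i for the second half of the rounds
        return keys

    def crypt(block, key, rnd):
        l, r = block
        for k in round_keys(*key):
            l, r = rnd(l, r, k)
        return r, l                        # final swap: output is R & L

    def encrypt(p, k):
        return crypt(p, k, f_e)

    def decrypt(c, k):
        return crypt(c, k, f_d)

    return encrypt, decrypt, round_keys


if __name__ == "__main__":
    enc, dec, rks = make_sea(48, 8, 51)
    P = ([1, 2, 3], [4, 5, 6])
    K = ([7, 8, 9], [10, 11, 12])
    print(dec(enc(P, K), K) == P)
```

In this model the same `round_keys` list is consumed front to back by both `encrypt` and `decrypt`, which is the "key expansion is exactly the same in encryption and decryption" property discussed in the overall structure section.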
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
- $\lambda$-parameter: 1/2
- $\delta$-parameter: 1/4
- Maximum nonlinear order, namely 2
- Recursive definition
- Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the $3 \times 3$ bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. This is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data into $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by several reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$ and $R^{-1}$ in the encrypt/decrypt rounds. The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the linear and differential parameters of the 3-round cipher, $\Lambda$ and $\Delta$, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function $S$ has parameters $\delta = 2^{-2}$ and $\lambda = 2^{-1}$, this implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, it is respectively required that $\Delta \approx 2^{-n}$ and $\Lambda \approx 2^{-n/2}$ to resist differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that
$$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \qquad (1)$$
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of, e.g., the AES Rijndael [17], but we only assume one active s-box per round.
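As a short worked check of the round bound, taking base-2 logarithms of both conditions in Equation (1) gives the same threshold. The instance n = 48 in the last line is only an illustration.

```latex
\begin{align*}
\left(2^{-2}\right)^{2n_r/3} < 2^{-n}
  &\iff \frac{4 n_r}{3} > n
  \iff n_r > \frac{3n}{4}, \\
\left(2^{-1}\right)^{2n_r/3} < 2^{-n/2}
  &\iff \frac{2 n_r}{3} > \frac{n}{2}
  \iff n_r > \frac{3n}{4}, \\
n = 48 &\implies n_r > 36 .
\end{align*}
```

The differential and linear conditions therefore coincide, which is why a single term $3n/4$ appears in the suggested number of rounds.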
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \ll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
- $S(x \ll a) = S(x) \ll a$
- $r(x \ll a) = r(x) \ll a$
- $R(x \ll a) = R(x) \ll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \ll a = C_i$, e.g., all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F'_E$ and the key round $F_K$ satisfy:
$$F'_E(L \ll a, R \ll a, K \ll a) = F'_E(L, R, K) \ll a$$
$$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$$
These properties are iterative in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \ll a$. In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to 3/8 as $b$ grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g., "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
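The probability $p$ for a single word can be measured exhaustively for small word sizes. The sketch below (function names are assumptions introduced here) counts, over all pairs of b-bit words, how often rotating before the modular addition gives the same result as rotating after it.

```python
def rotl(x, a, b):
    """Cyclic left rotation of a b-bit word by a bits."""
    a %= b
    return ((x << a) | (x >> (b - a))) & ((1 << b) - 1)


def rotation_commutes_probability(a, b):
    """Fraction of (X, K) pairs with (X<<<a) + (K<<<a) == (X + K)<<<a mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return hits / (1 << (2 * b))


if __name__ == "__main__":
    for b in (4, 6, 8):
        print(b, rotation_commutes_probability(1, b))
```

For $a = 1$ the equality holds exactly when the two top bits are not both set and the lower $b - 1$ bits add without carry, which gives $p = 3(2^{b-1} + 1)/2^{b+2}$; for $b = 8$ this is $387/1024 \approx 0.378$, already close to the limit 3/8.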
Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, this implies that the similarity between the round keys derived from $K$ and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from being propagated through more than a few rounds of SEA$_{n,b}$.
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e., takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that this remains the same when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, whose vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows one to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \Leftrightarrow \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we considered this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks, plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g., by doubling this number of rounds. As $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
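The round-count rule can be written as a small function. This is a sketch: the name `suggested_rounds` and the `conservative` flag are assumptions introduced here, and applying the "add one if even" step after the optional doubling is one possible reading of the text.

```python
def suggested_rounds(n: int, b: int, conservative: bool = False) -> int:
    """Minimum round count 3n/4 + 2*(n_b + floor(b/2)), made odd.

    With conservative=True the count is doubled first (large security margin).
    """
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    nb = n // (2 * b)
    nr = (3 * n) // 4 + 2 * (nb + b // 2)
    if conservative:
        nr *= 2
    if nr % 2 == 0:   # n_r has to be odd
        nr += 1
    return nr


if __name__ == "__main__":
    for n, b in ((48, 8), (96, 8), (144, 8)):
        print(f"n = {n}, b = {b}: n_r = {suggested_rounds(n, b)}")
```

For n = 48 and b = 8 the formula gives 36 + 2·(3 + 4) = 50, which is even, so the sketch returns 51.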
3.5 Performance Analysis
SEA$_{n,b}$ is targeted at low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
- Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\gg$, $\ll$
- Branch instructions: goto, subroutine call and return
- Comparison, load RAM in register, store register in RAM
According to the code in the appendix, the performance can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, this directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. This yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
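The cost formulas above are easy to evaluate for a given instance. The following sketch simply transcribes them; the function name `sea_software_cost` and the returned dictionary keys are assumptions introduced here, and the printed numbers are evaluations of the formulas, not measured results.

```python
def sea_software_cost(nb: int, nr: int) -> dict:
    """Evaluate the rough cost estimates given in the text, in words and ops."""
    round_ops = 12 * nb + 11        # one encrypt/decrypt round
    key_round_ops = 10 * nb + 11    # one key round
    time_ops = (nr - 1) * (round_ops + key_round_ops + 7) + round_ops + 8 * nb + 7
    return {
        "ram_words_and_registers": 5 * nb + 3,
        "registers": nb + 3,
        "code_size_ops": 31 * nb + 36,
        "time_ops": time_ops,
    }


if __name__ == "__main__":
    # Illustrative instance: n_b = 3 (n = 48, b = 8) with n_r = 51 rounds.
    for name, value in sea_software_cost(3, 51).items():
        print(f"{name}: {value}")
```

Note that the time estimate grows with the product of $n_r$ and $n_b$, while memory and code size grow only linearly in $n_b$, which is the space-for-time trade discussed below.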
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
- SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g., for the AES Rijndael).
- The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e., $5n_b + 3$) is generally lower than for other block ciphers.
- The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e., SEA$_{n,b}$ trades space for time, which may be acceptable given present processor speeds). Looking at the code size-cycles product, the efficiency of SEA$_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH 4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given. Let $p$ be a large (odd) prime and let $E : y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that
$$\#E(\mathbb{F}_p) = p + 1 - t$$
where $t$ is an integer which satisfies the Hasse bound
$$-2\sqrt{p} \le t \le 2\sqrt{p}$$
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial attached to $\ell$ that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve $E$ under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining $t$.
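The final reconstruction step can be illustrated on a toy curve. The sketch below is not the SEA algorithm itself: it obtains the residues $t \bmod \ell$ by naive point counting (feasible only for tiny $p$), whereas SEA computes them from the polynomials described above. It only shows how the Hasse bound and the Chinese Remainder Theorem pin down $t$ once the product of the auxiliary primes exceeds $4\sqrt{p}$. All function names are assumptions introduced here.

```python
def count_points(a4: int, a6: int, p: int) -> int:
    """Naive O(p) count of points on y^2 = x^3 + a4*x + a6 over F_p,
    including the point at infinity."""
    roots = {}
    for y in range(p):
        s = y * y % p
        roots[s] = roots.get(s, 0) + 1
    return 1 + sum(roots.get((x ** 3 + a4 * x + a6) % p, 0) for x in range(p))


def crt(residues, moduli):
    """Combine residues modulo pairwise coprime moduli; returns (x, product)."""
    x, m = 0, 1
    for r, l in zip(residues, moduli):
        k = ((r - x) * pow(m, -1, l)) % l   # solve x + m*k = r (mod l)
        x += m * k
        m *= l
    return x, m


def recover_trace(residues, moduli, p: int) -> int:
    """Recover t from t mod l, using |t| <= 2*sqrt(p)."""
    t, m = crt(residues, moduli)
    if m * m <= 16 * p:
        raise ValueError("product of auxiliary primes must exceed 4*sqrt(p)")
    if 2 * t > m:          # pick the representative inside the Hasse interval
        t -= m
    return t


if __name__ == "__main__":
    p, a4, a6 = 101, 1, 1
    t = p + 1 - count_points(a4, a6, p)
    primes = [2, 3, 5, 7]                    # product 210 > 4*sqrt(101)
    print(recover_trace([t % l for l in primes], primes, p) == t)
```

The centering step matters: the CRT result lies in $[0, \prod \ell)$, while $t$ may be negative, so the representative closest to zero is the one consistent with the Hasse bound.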
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$ with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms, such as Diffie-Hellman key exchange, El Gamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial in $\mathbb{Z}[F, J]$ attached to $\ell$. For $\ell \in A_s$, this polynomial is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store). We start out with a given prime $p$ and an elliptic curve $E : y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram. Blocks: key computational block (Mod, SBox, BR, WR, XOR, Round Reg; key registers KeyReg0[95:0] to KeyReg9[95:0]), encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg; cipher data register), decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg; plain text data register) and SMC. Signals: KeyI[95:0], KeyLd, Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, DataO[95:0], DataRd, Clk, Rst, Ena, ED, Ovr.
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt the data being protected.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
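Both ways of producing a key can be sketched with the Python standard library. This is a minimal illustration, not a recommendation for a particular system: the function names and default parameters are assumptions, and PBKDF2 with SHA-256 is swapped in here as one common passphrase-based construction (the text above mentions SHA-1 only as an example of a hash function).

```python
import hashlib
import secrets


def random_key(bits: int = 128) -> bytes:
    """Draw a key from the operating system's entropy-seeded generator."""
    return secrets.token_bytes(bits // 8)


def key_from_passphrase(passphrase: str, salt: bytes,
                        bits: int = 128, iterations: int = 100_000) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=bits // 8
    )


if __name__ == "__main__":
    salt = secrets.token_bytes(16)   # stored alongside the ciphertext, not secret
    print(random_key().hex())
    print(key_from_passphrase("correct horse battery staple", salt).hex())
```

The salt ensures that the same passphrase yields different keys in different contexts, and the iteration count slows down guessing attacks on weak passphrases.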
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek words kryptos (secret) and graphos (writing).
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis began being used; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
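The one-time pad itself is only a few lines of code; its difficulty lies entirely in the key-handling conditions listed above. The following sketch (function names are assumptions) draws a fresh pad as long as the message and combines the two with XOR. Note that `secrets.token_bytes` is a cryptographically strong generator, which is a practical stand-in for the "truly random" key material the theorem requires.

```python
import secrets


def otp_encrypt(message: bytes) -> tuple[bytes, bytes]:
    """Return (pad, ciphertext); the pad is as long as the message and used once."""
    pad = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, pad))
    return pad, ciphertext


def otp_decrypt(pad: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same pad recovers the message."""
    if len(pad) != len(ciphertext):
        raise ValueError("pad and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, pad))


if __name__ == "__main__":
    pad, ct = otp_encrypt(b"attack at dawn")
    print(otp_decrypt(pad, ct))
```

Reusing a pad for two messages would let an attacker XOR the two ciphertexts and cancel the key, which is why the "never reused" condition is essential.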
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such showing can currently be made, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally in a chosen-ciphertext attack the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts[10] Also important often overwhelmingly so are mistakes (generally in the design or use of one of the protocols involved see Cryptanalysis of the Enigma for some historical examples of this)
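To make the known-plaintext setting concrete, the sketch below attacks a deliberately weak toy cipher (a short key XORed repeatedly over the message). The cipher, the key and the messages are hypothetical and chosen only for illustration; they have nothing to do with SEA.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR the data with the key repeated as often as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def recover_key(plaintext: bytes, ciphertext: bytes, key_len: int) -> bytes:
    """Known-plaintext attack: plaintext XOR ciphertext exposes the key stream."""
    return bytes(p ^ c for p, c in zip(plaintext, ciphertext))[:key_len]


secret_key = b"K3Y!"                      # unknown to the attacker
known_plain = b"HELLO WORLD"              # one plaintext/ciphertext pair leaks
known_cipher = xor_cipher(known_plain, secret_key)

guessed_key = recover_key(known_plain, known_cipher, len(secret_key))

# The recovered key now decrypts any other message sent under the same key.
other_cipher = xor_cipher(b"SECOND MESSAGE", secret_key)
decoded = xor_cipher(other_cipher, guessed_key)
```

A single known pair is enough here because the toy cipher is linear in the key; a well-designed block cipher is built so that such pairs reveal nothing useful.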
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
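The arithmetic behind this comparison is short enough to check directly. The sketch below assumes the standard 56-bit DES key length; the variable names are illustrative.

```python
DES_KEY_BITS = 56

# Brute force: on average half of the key space is searched before success.
keyspace = 2 ** DES_KEY_BITS
expected_brute_force = keyspace // 2          # 2**55 trial decryptions

# Linear cryptanalysis figure quoted in the text.
linear_cryptanalysis_ops = 2 ** 43

# Ratio between the two work factors.
speedup = expected_brute_force // linear_cryptanalysis_ops   # 2**12
```

Under these figures the linear attack needs about 4096 times fewer cipher operations than exhaustive search, at the price of collecting 2^43 known plaintexts.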
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
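The PIN example can be sketched in Python. The early-exit comparison below is a hypothetical illustration of the leak (it counts the bytes it examines instead of measuring real time), and `hmac.compare_digest` from the Python standard library is shown as one commonly used constant-time alternative.

```python
import hmac


def leaky_equal(a: bytes, b: bytes):
    """Early-exit comparison; returns (result, number of bytes examined)."""
    steps = 0
    if len(a) != len(b):
        return False, steps
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            # Stops at the first mismatch: running time depends on the secret.
            return False, steps
    return True, steps


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(a, b)


pin = b"4921"
_, steps_wrong_first = leaky_equal(b"0000", pin)   # mismatch at byte 1
_, steps_wrong_last = leaky_equal(b"4920", pin)    # mismatch at byte 4
```

An attacker who can observe the difference between `steps_wrong_first` and `steps_wrong_last` as a timing difference can guess the PIN one character at a time.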
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting too-short keys, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file "c:/xilinx/bin/vasu/keyreg.ngc" ...
Reading component libraries for design expansion...
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file "keyreg.ngd" ...
Writing NGDBUILD log file "keyreg.bld"...
Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  *See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009
Placing & Routing Report
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
  Found 96-bit register for signal <Dreg>.
  Summary:
    inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
# Registers : 1
  96-bit register : 1
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 195
Macro Statistics :
# Registers : 1
#   96-bit register : 1
Cell Usage :
# BELS : 1
#   LUT1 : 1
# FlipFlops/Latches : 96
#   FDCE : 96
# Clock Buffers : 1
#   BUFGP : 1
# IO Buffers : 194
#   IBUF : 98
#   OBUF : 96
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6
Number of Slices: 55 out of 192 28%
Number of Slice Flip Flops: 96 out of 384 25%
Number of 4 input LUTs: 1 out of 384 0%
Number of bonded IOBs: 194 out of 90 215% (*)
Number of GCLKs: 1 out of 4 25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
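The percentages in the utilization summary can be cross-checked with a few lines of Python. This is only a sketch: the resource counts are copied from the key register report above, and the truncating integer division is an assumption that happens to reproduce the reported figures.

```python
# (used, available) pairs copied from the key register utilization summary.
resources = {
    "slices": (55, 192),
    "slice flip flops": (96, 384),
    "4 input LUTs": (1, 384),
    "bonded IOBs": (194, 90),
    "GCLKs": (1, 4),
}


def utilization_percent(used: int, available: int) -> int:
    """Utilization in percent, truncated to an integer (assumed rounding rule)."""
    return 100 * used // available


percentages = {name: utilization_percent(u, a) for name, (u, a) in resources.items()}

# Any resource above 100 % cannot fit in the selected device.
overused = [name for name, pct in percentages.items() if pct > 100]
```

Only the bonded IOBs exceed the device capacity, which matches the single over-utilization warning in the report: the 96-bit register exposes all of its inputs and outputs as pins when synthesized on its own.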
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary:
---------------
Speed Grade: -6
Minimum period: No path found
Minimum input arrival time before clock: 7.962ns
Maximum output required time after clock: 6.788ns
Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock 'Clk'
Offset: 7.962ns (Levels of Logic = 1)
Source: KeyEna (PAD)
Destination: Dreg_95 (FF)
Destination Clock: Clk rising
Data Path: KeyEna to Dreg_95
  Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
  IBUF:I->O      96       0.776        6.300       KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                 0.886                    Dreg_0
  Total                   7.962ns (1.662ns logic, 6.300ns route) (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock 'Clk'
Offset: 6.788ns (Levels of Logic = 1)
Source: Dreg_95 (FF)
Destination: KeyO<95> (PAD)
Source Clock: Clk rising
Data Path: Dreg_95 to KeyO<95>
  Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
  FDCE:C->Q      1        1.085        1.035       Dreg_95 (Dreg_95)
  OBUF:I->O               4.668                    KeyO_95_OBUF (KeyO<95>)
  Total                   6.788ns (5.753ns logic, 1.035ns route) (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
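The two data paths in the timing report can be re-added as a consistency check. The sketch below assumes the delays are nanoseconds with three decimal places (for example 7.962 ns) and stores them as integer picoseconds to avoid rounding noise; the helper name is illustrative.

```python
def path_summary(logic_ps, route_ps):
    """Total delay and logic/route split for one data path (delays in ps)."""
    logic = sum(logic_ps)
    route = sum(route_ps)
    total = logic + route
    return {
        "total_ps": total,
        "logic_pct": round(100 * logic / total, 1),
        "route_pct": round(100 * route / total, 1),
    }


# KeyEna pad -> Dreg: IBUF and FDCE gate delays, plus the high-fanout net.
input_path = path_summary(logic_ps=[776, 886], route_ps=[6300])

# Dreg_95 -> KeyO<95> pad: FDCE and OBUF gate delays, plus one net.
output_path = path_summary(logic_ps=[1085, 4668], route_ps=[1035])
```

The input offset is dominated by routing (the enable net fans out to all 96 flip-flops), whereas the output offset is dominated by the output buffer delay.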
SBOX
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized
========================================================================HDL Synthesis Report
Found no macro=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1
========================================================================= Final Report =========================================================================
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 7
Cell Usage :
# BELS : 3
#   LUT4 : 3
# IO Buffers : 7
#   IBUF : 4
#   OBUF : 3
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6
Number of Slices: 2 out of 192 1%
Number of 4 input LUTs: 3 out of 384 0%
Number of bonded IOBs: 7 out of 90 7%
Total memory usage is 53376 kilobytes
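The report above does not list the contents of the substitution box, so the following Python sketch rests on an assumption: it uses the 3-bit table {0, 5, 6, 7, 4, 3, 1, 2} and the three bitwise recurrences given in the published SEA specification, which may differ from the table actually coded in this project's sbox8x3 entity. It checks that the table and the AND/OR/XOR form agree, which is why such a box fits in three small LUTs.

```python
# Assumed 3-bit substitution table (from the SEA specification, not this report).
SBOX = (0, 5, 6, 7, 4, 3, 1, 2)


def sbox_bits(x: int) -> int:
    """Same substitution computed with three sequential bitwise operations."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    x0 ^= x2 & x1          # x0 = (x2 AND x1) XOR x0
    x1 ^= x2 & x0          # x1 = (x2 AND x0) XOR x1, using the updated x0
    x2 ^= x0 | x1          # x2 = (x0 OR x1) XOR x2, using the updated x0, x1
    return (x2 << 2) | (x1 << 1) | x0


table_from_bits = tuple(sbox_bits(x) for x in range(8))
```

Each output bit depends on at most three input bits, so one 4-input LUT per output bit is sufficient, in line with the three LUT4 cells reported.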
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
Number of bonded IOBs: 7 out of 86 8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
  Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
  IOB Flip Flops: 960
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in text, key and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It yields a small encryption routine for any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. Its application areas include:
Wireless communication, mobile computing and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
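Because the loop architecture executes one round per clock cycle, its raw throughput follows directly from the block size, the round count and the clock frequency. The Python sketch below shows that relation; the numbers in the example call are hypothetical placeholders, not measured results from this work, and the cycles spent loading and unloading data are ignored.

```python
def throughput_bits_per_second(n_bits: int, n_rounds: int, clock_hz: float) -> float:
    """Throughput of a one-round-per-cycle loop architecture.

    One n_bits block leaves the core every n_rounds clock cycles.
    """
    if n_rounds <= 0:
        raise ValueError("the number of rounds must be positive")
    return n_bits * clock_hz / n_rounds


# Hypothetical example: 96-bit blocks, 93 rounds, 93 MHz clock.
example = throughput_bits_per_second(n_bits=96, n_rounds=93, clock_hz=93_000_000)
```

The formula makes the area/throughput trade-off explicit: the small iterative core reuses one round of hardware, so its throughput shrinks in proportion to the number of rounds.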
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell, Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
6 Console window
The Console window is an interactive input-output text device providing entry for Active-HDL macro language commands, macros and scripts. All Active-HDL tools output their messages to the Console.
2.10 Compilation
Compilation is the process of analyzing a source file. Analyzed design units contained within the file are placed into the working library in a format understandable to the simulator. In Active-HDL, a source file can be one of the following:
VHDL file (.vhd)
Verilog file (.v)
EDIF netlist file
State diagram file (.asf)
Block diagram file (.bde)
In the case of a block or state diagram file, the compiler analyzes the intermediate VHDL, Verilog or EDIF file containing the HDL code (or netlist) generated from the diagram.
A netlist is a set of statements that specifies the elements of a circuit (for example, transistors or gates) and their interconnection.
Active-HDL provides three compilers, for VHDL, Verilog and EDIF respectively. When you choose a menu command or toolbar button for compilation, Active-HDL automatically employs the compiler appropriate for the type of the source file being compiled.
2.11 Simulation
The purpose of simulation is to verify that the circuit works as desired.
The Active-HDL simulator provides two simulation engines:
Event-Driven Simulation
Cycle-Based Simulation
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others run in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.12 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State Machines
IP Cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bit stream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view and edit schematics and symbols for the Design Entry step of the Xilinx® design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input netlists as well as design constraint information into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bit stream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps a netlist(s) into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take the opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e. with small code size and memory) targeted for processors with a limited instruction set (i.e. AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, and allows efficient combination of encryption/decryption and "on-the-fly" key derivation. Its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g. a small processor) and throughput requirements. It yields design criteria such as low memory requirements, small code size and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEA$_{n,b}$ is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g. 8-bit or 32-bit processors). In contrast, SEA$_{n,b}$ makes it possible to obtain a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, alternative features were wanted, including the efficient combination of encryption and decryption or the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds and is defined with respect to the following parameters:
- $n$: plaintext size, key size
- $b$: processor (or word) size
- $n_b = n/2b$: number of words per Feistel branch
- $n_r$: number of block cipher rounds
As the only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted as SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$, ... Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
- Bit representation: $x_b = x(n/2 - 1)\ x(n/2 - 2)\ \ldots\ x(2)\ x(1)\ x(0)$
- Word representation: $x_W = x_{n_b-1}\ x_{n_b-2}\ \ldots\ x_2\ x_1\ x_0$
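The parameter constraint above can be checked mechanically. The following is a minimal sketch; the function name `sea_words_per_branch` is a hypothetical helper introduced here for illustration and is not part of the SEA specification.

```python
def sea_words_per_branch(n: int, b: int) -> int:
    """Return nb = n / (2b), the number of b-bit words per Feistel branch.

    Raises ValueError when n is not a multiple of 6b, which is the only
    constraint the specification places on the parameters.
    """
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    return n // (2 * b)


if __name__ == "__main__":
    for n in (48, 96, 144):
        print(f"SEA{n},8 uses nb = {sea_words_per_branch(n, 8)} words per branch")
```

With an 8-bit word, the three block sizes named in the text give 3, 6 and 9 words per branch, so each branch always splits into whole groups of three words.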
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1. Bitwise XOR $\oplus$
The bitwise XOR is defined on $n/2$-bit vectors:
$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$
2. Substitution Box $S$
SEA$_{n,b}$ uses the following 3-bit substitution table:
ST = {0, 5, 6, 7, 4, 3, 1, 2}
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i}$
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1}$
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}, \quad 0 \le i \le n_b/3 - 1$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
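The recursive definition can be cross-checked against the table. The sketch below assumes that the three equations are evaluated in order, each using the words already updated by the previous one, and that $x_0$ holds the least significant bit of the 3-bit table index. Under those assumptions the bitslice form reproduces the table exactly. The function names are illustrative only.

```python
S_TABLE = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox_bitslice(words):
    """Apply S to a list of words whose length is a multiple of 3.

    Each update is done in place, so the second and third equations
    see the values written by the earlier ones (recursive definition).
    """
    x = list(words)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x


def sbox_via_bitslice(v):
    """Evaluate S on one 3-bit value v = (x2 x1 x0) through the bitslice form."""
    x0, x1, x2 = sbox_bitslice([v & 1, (v >> 1) & 1, (v >> 2) & 1])
    return (x2 << 2) | (x1 << 1) | x0


if __name__ == "__main__":
    print([sbox_via_bitslice(v) for v in range(8)])
```

Because only AND, OR and XOR are used, one pass over three b-bit words evaluates b copies of the 3-bit table in parallel.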
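3. Word Rotation $R$
The word rotation is defined on $n_b$-word vectors:
$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \Leftrightarrow y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \quad y_0 = x_{n_b-1}$
4. Bit Rotation $r$
The bit rotation is defined on $n_b$-word vectors:
$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le n_b/3 - 1$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.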
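The two rotations can be sketched as follows. This is a minimal illustration that assumes a branch is stored as a Python list with index 0 holding $x_0$. The names `word_rot`, `inv_word_rot` and `bit_rot` are chosen here and do not come from the specification.

```python
def word_rot(x):
    """R: y[i+1] = x[i] for 0 <= i <= nb-2, and y[0] = x[nb-1]."""
    return [x[-1]] + x[:-1]


def inv_word_rot(x):
    """Inverse word rotation, undoing word_rot."""
    return x[1:] + [x[0]]


def bit_rot(x, b=8):
    """r: rotate word 3i right by one bit, keep word 3i+1, rotate word 3i+2 left."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & mask
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask
    return y


if __name__ == "__main__":
    print(word_rot([1, 2, 3]))
    print([hex(w) for w in bit_rot([0x01, 0x55, 0x80])])
```

The word rotation moves whole words between positions, while the bit rotation only moves bits inside a word, which is why the text later treats them as providing inter-word and intra-word diffusion respectively.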
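5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$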
The Round and Key Round
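Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in FIG 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^2$ such that
$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r(S(R_i \boxplus K_i)), \quad L_{i+1} = R_i$
$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}(L_i \oplus r(S(R_i \boxplus K_i))), \quad L_{i+1} = R_i$
$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R(r(S(KR_i \boxplus C_i))), \quad KL_{i+1} = KR_i$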
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.

C = SEAn,b(P, K)
{
  % initialization
  L0 & R0 = P;
  KL0 & KR0 = K;
  % key scheduling
  for i in 1 to floor(nr/2)
    [KLi, KRi] = FK(KLi-1, KRi-1, C(i));
  switch KLfloor(nr/2), KRfloor(nr/2);
  for i in ceil(nr/2) to nr - 1
    [KLi, KRi] = FK(KLi-1, KRi-1, C(nr - i));
  % encryption
  for i in 1 to ceil(nr/2)
    [Li, Ri] = FE(Li-1, Ri-1, KRi-1);
  for i in ceil(nr/2) + 1 to nr
    [Li, Ri] = FE(Li-1, Ri-1, KLi-1);
  % final
  C = Rnr & Lnr;
  switch KLnr-1, KRnr-1;
}

where & is the concatenation operator, KRfloor(nr/2) is taken before the switch, and C(i) is an nb-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round FD.
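The complete cipher can be prototyped in a few dozen lines. The sketch below is an illustration under stated assumptions, not a reference implementation: each branch is a list of words with index 0 as the least significant word, the first key-schedule loop runs to floor(nr/2) and the second starts at ceil(nr/2), and the second-half constants are C(nr - i). It has not been checked against official test vectors. What it does demonstrate is that the round-key sequence is symmetric and that decryption with FD inverts encryption.

```python
def S(x):
    """Bitslice substitution box on groups of three words."""
    x = list(x)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x


def r(x, b):
    """Bit rotation: word 3i right by one bit, word 3i+2 left by one bit."""
    m = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & m
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & m
    return y


def R(x):
    return [x[-1]] + x[:-1]


def R_inv(x):
    return x[1:] + [x[0]]


def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]


def f(half, key, b):
    """r(S(half + key)) with word-wise addition mod 2^b."""
    m = (1 << b) - 1
    return r(S([(u + v) & m for u, v in zip(half, key)]), b)


def FE(L, Rt, K, b):
    return Rt, xor(R(L), f(Rt, K, b))


def FD(L, Rt, K, b):
    return Rt, R_inv(xor(L, f(Rt, K, b)))


def FK(KL, KR, C, b):
    return KR, xor(KL, R(f(KR, C, b)))


def round_keys(KL, KR, nr, b):
    """Return the nr round keys in the order the rounds consume them."""
    nb, h = len(KL), nr // 2
    const = lambda i: [i] + [0] * (nb - 1)   # only the LSW is non-zero
    keys = [KR]                              # KR_0
    for i in range(1, h + 1):
        KL, KR = FK(KL, KR, const(i), b)
        keys.append(KR)                      # KR_h is taken before the switch
    KL, KR = KR, KL                          # switch
    for i in range(h + 1, nr):
        KL, KR = FK(KL, KR, const(nr - i), b)
        keys.append(KL)
    return keys


def sea(L, Rt, KL, KR, nr, b, decrypt=False):
    """Run nr rounds and return the swapped halves (R_nr, L_nr)."""
    rnd = FD if decrypt else FE
    for K in round_keys(KL, KR, nr, b):
        L, Rt = rnd(L, Rt, K, b)
    return Rt, L


if __name__ == "__main__":
    P = ([0x01, 0x23, 0x45], [0x67, 0x89, 0xAB])   # arbitrary example data
    K = ([0x10, 0x32, 0x54], [0x76, 0x98, 0xBA])
    C = sea(*P, *K, nr=51, b=8)
    print("round trip ok:", sea(*C, *K, nr=51, b=8, decrypt=True) == P)
```

The round trip works because the key sequence reads the same forwards and backwards, so the decrypting device can run the key schedule in exactly the same direction as the encrypting one.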
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
- $\lambda$-parameter: 1/2
- $\delta$-parameter: 1/4
- Maximum nonlinear order, namely 2
- Recursive definition
- Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48, b = 8, $n_b$ = 3. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example $n_b$ = 3, which yields 3 rounds). Once every word has at least one active bit, the combination of r and S yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds. The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the linear and differential parameters of the 3-round cipher, $\Lambda$ and $\Delta$, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function S has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, it is respectively required that $\Delta \approx 2^{-n}$ and $\Lambda \approx 2^{-n/2}$ to resist against differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that
$\delta^{2n_r/3} = (2^{-2})^{2n_r/3} < 2^{-n}$ and $\lambda^{2n_r/3} = (2^{-1})^{2n_r/3} < 2^{-n/2}$ (1)
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
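For readers who want the intermediate step, the round bound follows from taking base-2 logarithms of the two conditions in Equation (1). The worked form below only restates that equation; it adds no new assumption.

```latex
\begin{align*}
(2^{-2})^{2n_r/3} < 2^{-n}
  &\iff -\tfrac{4n_r}{3} < -n
  \iff n_r > \tfrac{3n}{4}, \\
(2^{-1})^{2n_r/3} < 2^{-n/2}
  &\iff -\tfrac{2n_r}{3} < -\tfrac{n}{2}
  \iff n_r > \tfrac{3n}{4}.
\end{align*}
```

For example, with n = 48 both conditions ask for more than 36 rounds.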
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \ll a$ the left rotation by a bits of each of the $n_b$ words of x. The non-linear and diffusion layers have the following properties:
- $S(x \ll a) = S(x) \ll a$
- $r(x \ll a) = r(x) \ll a$
- $R(x \ll a) = R(x) \ll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \ll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F_E^{\oplus}$ and the key round $F_K$ satisfy
$F_E^{\oplus}(L \ll a, R \ll a, K \ll a) = F_E^{\oplus}(L, R, K) \ll a$
$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys K and $K \ll a$. In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from K and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from being propagated through more than a few rounds of SEA$_{n,b}$.
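The probability quoted for a = 1 can be reproduced by exhaustive enumeration over one word. The sketch below counts how often rotation commutes with addition mod $2^b$ over all pairs (X, K) of b-bit words. The function names are illustrative, and the enumeration is only practical for small b such as 8.

```python
def rotl(w, a, b):
    """Cyclic left rotation of a b-bit word w by a positions."""
    mask = (1 << b) - 1
    return ((w << a) | (w >> (b - a))) & mask


def commute_probability(a, b):
    """Fraction of (X, K) pairs with rotl(X) + rotl(K) == rotl(X + K) mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            if lhs == rhs:
                hits += 1
    return hits / float(1 << (2 * b))


if __name__ == "__main__":
    for b in (4, 6, 8):
        print(b, commute_probability(1, b))
```

For b = 8 the enumeration gives a value slightly above 3/8, and the gap shrinks as b grows, in line with the convergence stated in the text.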
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure, where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern described for the bit and word rotations. It is expected that it remains the same when different parameters n and b are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis described for the bit and word rotations provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, whose vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function f which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference makes it possible to "counter" the effect of the diffusion layer on the differential characteristic: a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \Leftrightarrow \bar{P} \xrightarrow{\bar{K}} \bar{C}$. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
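The round-count rule can be turned into a small helper. The sketch below is a direct transcription of the formula, plus the "add one if even" adjustment. The function name `sea_rounds` is introduced here for illustration.

```python
def sea_rounds(n: int, b: int) -> int:
    """Suggested number of rounds: 3n/4 + 2 * (nb + floor(b/2)), made odd."""
    if b < 8 or n % (6 * b) != 0:
        raise ValueError("requires b >= 8 and n a multiple of 6b")
    nb = n // (2 * b)
    nr = 3 * n // 4 + 2 * (nb + b // 2)
    return nr if nr % 2 == 1 else nr + 1


if __name__ == "__main__":
    for n, b in ((48, 8), (96, 8), (144, 8), (96, 16)):
        print(f"SEA{n},{b}: nr = {sea_rounds(n, b)}")
```

For SEA$_{48,8}$ the formula gives 36 + 2 · (3 + 4) = 50, which is even, so the suggested value becomes 51.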
3.5 Performance Analysis
SEA$_{n,b}$ is targeted for being implemented on low-cost processors, with little code size and a small instruction set. However, the simple structure of SEA$_{n,b}$ makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and registers usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
- Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\gg$, $\ll$
- Branch instructions: goto, subroutine call and return
- Comparison, load RAM in register, store register in RAM
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$. These values are summarized in Table 1. Remark that due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
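These estimates are easy to tabulate. The sketch below evaluates the three expressions from the paragraph above for given $n_b$ and $n_r$. The function name and the returned tuple layout are choices made here, not part of the original analysis.

```python
def sea_cost(nb: int, nr: int):
    """Return (RAM words + registers, code size in ops, encryption time in ops)."""
    memory = 5 * nb + 3
    code_size = 31 * nb + 36
    round_ops = 12 * nb + 11
    key_round_ops = 10 * nb + 11
    time_ops = (nr - 1) * (round_ops + key_round_ops + 7) + round_ops + 8 * nb + 7
    return memory, code_size, time_ops


if __name__ == "__main__":
    # SEA48,8: nb = 3 words per branch, nr = 51 rounds
    print(sea_cost(3, 51))
```

For SEA$_{48,8}$ this evaluates to 18 words of memory, 129 ops of code and 4828 ops per encryption, which shows how strongly the running time is dominated by the number of rounds.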
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
- SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
- The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
- The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is in the number of cycles required for the encryption (i.e. SEA$_{n,b}$ trades space for time, which may be relevant due to present processor speeds). Looking at the code size-cycles product, the efficiency of SEA$_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given. Let p be a large (odd) prime and let $E : y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where p does not divide $4a_4^3 + 27a_6^2$, E can be reduced to an elliptic curve over $F_p$. The number of points of E over $F_p$, denoted by $\#E(F_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Müller and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Müller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime p, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $F_p$ where p has 200 bits.
It is known that
$\#E(F_p) = p + 1 - t$
where t is an integer which satisfies the Hasse bound
$-2\sqrt{p} \le t \le 2\sqrt{p}$
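For very small primes, the relation between the point count and t can be checked by brute force. The sketch below is not the SEA algorithm: it is a naive count that is only usable for tiny p, and it serves just to illustrate the Hasse bound. The curve $y^2 = x^3 + x + 1$ over $F_{23}$ is an arbitrary example chosen here.

```python
def count_points(p: int, a4: int, a6: int) -> int:
    """Count the points of y^2 = x^3 + a4*x + a6 over F_p, including infinity."""
    # Number of square roots of each residue modulo p.
    roots = {}
    for y in range(p):
        roots[y * y % p] = roots.get(y * y % p, 0) + 1
    total = 1  # the point at infinity
    for x in range(p):
        total += roots.get((x ** 3 + a4 * x + a6) % p, 0)
    return total


if __name__ == "__main__":
    p, a4, a6 = 23, 1, 1
    n_points = count_points(p, a4, a6)
    t = p + 1 - n_points
    print(f"#E(F_{p}) = {n_points}, t = {t}, Hasse bound holds: {t * t <= 4 * p}")
```

The check `t * t <= 4 * p` is the Hasse bound written without square roots, which avoids floating-point arithmetic.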
The algorithm works by calculating t modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of $\#E(F_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve E under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine t modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining t.
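The final reconstruction step can be sketched independently of how the residues are obtained. The code below takes t modulo each auxiliary prime, combines the residues with the Chinese Remainder Theorem, and lifts the result to the symmetric interval allowed by the Hasse bound. The function name and the dictionary input format are assumptions made for this illustration. It needs Python 3.8 or later for `math.prod` and the modular inverse form of `pow`.

```python
from math import prod


def recover_trace(residues: dict, p: int) -> int:
    """Recover t from {ell: t mod ell}, given that |t| <= 2*sqrt(p)."""
    modulus = prod(residues)            # product of the auxiliary primes
    if modulus * modulus <= 16 * p:     # i.e. modulus <= 4*sqrt(p)
        raise ValueError("product of auxiliary primes must exceed 4*sqrt(p)")
    t = 0
    for ell, residue in residues.items():
        cofactor = modulus // ell
        t += residue * cofactor * pow(cofactor, -1, ell)
    t %= modulus
    # Lift to the symmetric range so that negative traces are recovered.
    return t - modulus if t > modulus // 2 else t


if __name__ == "__main__":
    # Example: p = 23 with t = -4, so t mod 3 = 2, t mod 5 = 1, t mod 7 = 3.
    t = recover_trace({3: 2, 5: 1, 7: 3}, 23)
    print("t =", t, "and #E(F_p) =", 23 + 1 - t)
```

The product must exceed $4\sqrt{p}$ because the Hasse interval has that width, so only then does exactly one integer in the interval match all the residues.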
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over $F_p$, with the intention of finding an E with $\#E(F_p) = xr$, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms, such as Diffie-Hellman key exchange, El Gamal encryption, etc. If we use 200-bit primes for p and require $x \le 32$, then the probability that $\#E(F_p) = xr$ is about 25, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial in $\mathbb{Z}[F, J]$ that depends on $\ell$. For $\ell \in A_s$, this polynomial is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration, and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store). We start out with a given prime p and an elliptic curve $E : y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram. Blocks shown: Key Computational Block (Mod, SBox, BR, WR, XOR, Round Reg; registers KeyReg[95:0] and KeyReg0[95:0] to KeyReg9[95:0]), Encryption Computational Block (Mod, SBox, BR, WR, XOR, Round Reg; cipher data register) and Decryption Computational Block (Mod, SBox, BR, IWR, XOR, Round Reg; plain text data register). Signals shown: KeyI[95:0], KeyLd, DataI[95:0], DataLd, DataO[95:0], DataRd, Key0[95:0] to Key9[95:0], Clk, Rst, and the SMC signals Clk, Rst, Ena, ED, Ovr.
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key length; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
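Both ways of obtaining a key described above can be sketched with the Python standard library. This is an illustration only: it draws a random key from the operating system's entropy source via `secrets`, and derives a key from a passphrase with PBKDF2. PBKDF2 with SHA-256 is a substitution made here for the generic "hash function such as SHA-1" mentioned in the text, and the salt and iteration count are arbitrary example values.

```python
import hashlib
import secrets


def random_key(bits: int = 128) -> bytes:
    """Generate a random symmetric key from the system entropy source."""
    return secrets.token_bytes(bits // 8)


def key_from_passphrase(passphrase: str, salt: bytes, bits: int = 128,
                        iterations: int = 100_000) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=bits // 8)


if __name__ == "__main__":
    print(random_key().hex())
    print(key_from_passphrase("correct horse", b"example-salt").hex())
    # A brute force attack on a 128-bit key has to search this many values:
    print(2 ** 128)
```

The passphrase route is deterministic for a given salt, so both parties can re-derive the same key, whereas the random route needs the key to be stored or exchanged.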
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis came into use; cryptanalysis is now practically an art and science all on its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
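The conditions on the one-time pad key can be illustrated with a short sketch. It assumes the operating system's random source (via Python's `secrets` module) is an acceptable stand-in for truly random key material, which is a simplification; the function names are hypothetical.

```python
import secrets

def otp_encrypt(message: bytes) -> tuple[bytes, bytes]:
    """Draw a fresh random key as long as the message and XOR it in."""
    key = secrets.token_bytes(len(message))  # used once, never reused
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext

def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR with the same key restores the message."""
    if len(key) < len(ciphertext):
        raise ValueError("key must be at least as long as the message")
    return bytes(c ^ k for c, k in zip(ciphertext, key))

if __name__ == "__main__":
    key, ct = otp_encrypt(b"scalable encryption")
    print(otp_decrypt(ct, key))
```

The key is as long as the message and must be delivered secretly to the receiver, which is why the scheme is rarely practical despite its theoretical strength.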
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by a brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the "work factor" in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is "gardening", used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
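The two DES figures quoted above can be compared directly. The short calculation below uses only those numbers; the helper name `work_factor_bits` is illustrative.

```python
import math

def work_factor_bits(operations: int) -> float:
    """Express an attack cost as log2 of the number of cipher operations."""
    return math.log2(operations)

DES_KEY_BITS = 56
brute_force_ops = 2 ** (DES_KEY_BITS - 1)  # half of the 2^56 keys, on average
linear_ops = 2 ** 43                       # figure quoted in the text above

speedup = brute_force_ops // linear_ops

if __name__ == "__main__":
    print(work_factor_bits(brute_force_ops))  # 55.0
    print(work_factor_bits(linear_ops))       # 43.0
    print(speedup)                            # 4096
```

The linear attack is 2^12 = 4096 times cheaper in cipher operations, but it needs 2^43 known plaintexts where brute force needs only one, so the comparison covers computation only.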
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis exploits weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
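The password or PIN case can be sketched as follows. The first comparison stops at the first wrong byte, so its running time reveals how many leading bytes were correct; the second uses the standard library's `hmac.compare_digest`, which is designed to avoid that dependence. The function names are illustrative, and the example concerns software comparison only, not the hardware core discussed in this report.

```python
import hmac

def leaky_equal(guess: bytes, secret: bytes) -> bool:
    """Early-exit comparison: time depends on the number of matching leading bytes."""
    if len(guess) != len(secret):
        return False
    for g, s in zip(guess, secret):
        if g != s:
            return False  # returns sooner the earlier the mismatch occurs
    return True

def constant_time_equal(guess: bytes, secret: bytes) -> bool:
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(guess, secret)

if __name__ == "__main__":
    pin = b"4921"
    print(leaky_equal(b"4920", pin), constant_time_equal(b"4920", pin))
    print(leaky_equal(b"4921", pin), constant_time_equal(b"4921", pin))
```

Both functions return the same answers; only the timing behaviour differs, which is exactly what a timing attack measures.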
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of its other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file keyreg.ngd ...
Writing NGDBUILD log file keyreg.bld ...
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:  0 out of 0  0%
  Number of Slices containing unrelated logic:     0 out of 0  0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:  194 out of 86  225% (OVERMAPPED)
    IOB Flip Flops:       96
  Number of GCLKs:        1 out of 4  25%
  Number of GCLKIOBs:     1 out of 4  25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line   : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device  : x2s15
Target Package : cs144
Target Speed   : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date    : Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:  0 out of 0  0%
  Number of Slices containing unrelated logic:     0 out of 0  0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:  194 out of 86  225% (OVERMAPPED)
    IOB Flip Flops:       96
  Number of GCLKs:        1 out of 4  25%
  Number of GCLKIOBs:     1 out of 4  25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report

Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:  0 out of 0  0%
  Number of Slices containing unrelated logic:     0 out of 0  0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:  194 out of 86  225% (OVERMAPPED)
    IOB Flip Flops:       96
  Number of GCLKs:        1 out of 4  25%
  Number of GCLKIOBs:     1 out of 4  25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================

WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================

Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
Registers : 1
  96-bit register : 1

=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================

Optimizing unit <keyreg> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 195

Macro Statistics
Registers : 1
  96-bit register : 1

Cell Usage
BELS : 1
  LUT1 : 1
FlipFlops/Latches : 96
  FDCE : 96
Clock Buffers : 1
  BUFGP : 1
IO Buffers : 194
  IBUF : 98
  OBUF : 96
=========================================================================
Device utilization summary
---------------------------

Selected Device : 2s15cs144-6

Number of Slices : 55 out of 192  28%
Number of Slice Flip Flops : 96 out of 384  25%
Number of 4 input LUTs : 1 out of 384  0%
Number of bonded IOBs : 194 out of 90  215% (*)
Number of GCLKs : 1 out of 4  25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade: -6

Minimum period: No path found
Minimum input arrival time before clock: 7.962ns
Maximum output required time after clock: 6.788ns
Maximum combinational path delay: No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                             Gate     Net
    Cell:in->out   fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O          96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                 0.886          Dreg_0
    ----------------------------------------
    Total                   7.962ns (1.662ns logic, 6.300ns route)
                                    (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising

  Data Path: Dreg_95 to KeyO<95>
                             Gate     Net
    Cell:in->out   fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FDCE:C->Q           1   1.085   1.035  Dreg_95 (Dreg_95)
    OBUF:I->O               4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                   6.788ns (5.753ns logic, 1.035ns route)
                                    (84.8% logic, 15.2% route)

=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->

Total memory usage is 54400 kilobytes
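The totals and the logic/route split in the timing detail above can be re-derived from the individual gate and net delays. The sketch below does only that arithmetic; the helper name `path_breakdown` is illustrative, and the delay values are the ones printed in the report.

```python
def path_breakdown(logic_delays_ns, route_delays_ns):
    """Sum gate (logic) and net (route) delays and report each share of the total."""
    logic = sum(logic_delays_ns)
    route = sum(route_delays_ns)
    total = logic + route
    return {
        "total_ns": round(total, 3),
        "logic_ns": round(logic, 3),
        "route_ns": round(route, 3),
        "logic_pct": round(100 * logic / total, 1),
        "route_pct": round(100 * route / total, 1),
    }

# OFFSET IN path: IBUF (0.776) and FDCE clock-enable (0.886) gates, one 6.300 ns net
offset_in = path_breakdown([0.776, 0.886], [6.300])
# OFFSET OUT path: FDCE (1.085) and OBUF (4.668) gates, one 1.035 ns net
offset_out = path_breakdown([1.085, 4.668], [1.035])

if __name__ == "__main__":
    print(offset_in)
    print(offset_out)
```

The input path is dominated by routing because the KeyEna net fans out to all 96 flip-flops, while the output path is dominated by the output buffer delay.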
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================

WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================

Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================

Optimizing unit <sbox8x3> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 7

Cell Usage
BELS : 3
  LUT4 : 3
IO Buffers : 7
  IBUF : 4
  OBUF : 3
===============================================================
Device utilization summary
---------------------------

Selected Device : 2s15cs144-6

Number of Slices : 2 out of 192  1%
Number of 4 input LUTs : 3 out of 384  0%
Number of bonded IOBs : 7 out of 90  7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs:  3 out of 384  1%
Logic Distribution:
  Number of occupied Slices:  2 out of 192  1%
  Number of Slices containing only related logic:  2 out of 2  100%
  Number of Slices containing unrelated logic:     0 out of 2  0%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs:  3 out of 384  1%
  Number of bonded IOBs:  7 out of 86  8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report

Device utilization summary:
  Number of External IOBs:  7 out of 86  8%
    Number of LOCed External IOBs:  0 out of 7  0%
  Number of SLICEs:  2 out of 192  1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0

The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd ...
Writing NGDBUILD log file keygenblock.bld ...
MAPPING REPORT
Design Summary
--------------
Number of errors:   0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers:  419 out of 384  109% (OVERMAPPED)
    Number used as Flip Flops:   415
    Number used as Latches:      4
  Number of 4 input LUTs:  1016 out of 384  264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices:  665 out of 192  346% (OVERMAPPED)
  Number of Slices containing only related logic:  648 out of 665  97%
  Number of Slices containing unrelated logic:     17 out of 665  2%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:  1066 out of 384  277% (OVERMAPPED)
  Number used as logic:         1016
  Number used as a route-thru:  50
  Number of bonded IOBs:  1060 out of 86  1232% (OVERMAPPED)
    IOB Flip Flops:       960
  Number of GCLKs:        1 out of 4  25%
  Number of GCLKIOBs:     1 out of 4  25%

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
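The OVERMAPPED flags in the mapping report above follow from comparing each used resource count with what the xc2s15 provides. The sketch below redoes that check for the key generation block. The truncating percentage is an assumption that happens to reproduce the figures printed in this report; it is not taken from Xilinx documentation, and the names are illustrative.

```python
def utilization(used: int, available: int) -> tuple[int, bool]:
    """Return (percentage truncated to an integer, True if the resource is overmapped)."""
    percent = (100 * used) // available
    return percent, used > available

# (used, available) pairs copied from the keygenblock mapping report
KEYGEN_ON_XC2S15 = {
    "slice registers": (419, 384),
    "4 input LUTs": (1066, 384),
    "occupied slices": (665, 192),
    "bonded IOBs": (1060, 86),
    "GCLKs": (1, 4),
}

def overmapped_resources(report: dict) -> list[str]:
    """Names of all resources whose demand exceeds the device capacity."""
    return [name for name, (used, avail) in report.items() if used > avail]

if __name__ == "__main__":
    for name, (used, avail) in KEYGEN_ON_XC2S15.items():
        pct, over = utilization(used, avail)
        print(f"{name}: {used}/{avail} = {pct}%" + (" (OVERMAPPED)" if over else ""))
```

Every logic and I/O resource except the global clock buffers is exceeded, so the key generation block as synthesized does not fit the selected xc2s15 device.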
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It yields a small encryption routine for any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed for processors with a limited instruction set. Its applications include:
Wireless communication, mobile computing and networking systems
Encryption of JPEG2000 images
Scalable video coding
Sensor networks and RFIDs
CONCLUSION
SEA_{n,b} is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEA_{n,b} allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA_{n,b} do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, M. Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
The simulator supports hybrid simulation: some portions of a design can be simulated in the event-driven kernel while the others run in the cycle-based kernel. Cycle-based simulation is significantly faster than event-driven simulation.
2.1.2 XILINX
Integrated Software Environment (ISE) is the Xilinx design software suite. This overview explains the general progression of a design through ISE from start to finish.
ISE enables you to start your design with any of a number of different source types, including:
HDL (VHDL, Verilog HDL, ABEL)
Schematic design files
EDIF
NGC/NGO
State machines
IP cores
From your source files, ISE enables you to quickly verify the functionality of these sources using the integrated simulation capabilities, including ModelSim Xilinx Edition and the HDL Bencher test bench generator. HDL sources may be synthesized using the Xilinx Synthesis Technology (XST) as well as partner synthesis engines, used standalone or integrated into ISE. The Xilinx implementation tools continue the process into a placed and routed FPGA or fitted CPLD, and finally produce a bitstream for your device configuration.
Design Entry
ISE Text Editor - The ISE Text Editor is provided in ISE for entering design code and viewing reports.
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view and edit schematics and symbols for the Design Entry step of the Xilinx design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input netlists, as well as design constraint information, into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bitstream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps one or more netlists into the specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bitstream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e., with small code size and memory) targeted at processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, and allows efficient combination of encryption/decryption and "on-the-fly" key derivation; its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g., a small processor) and throughput requirements. This yields design criteria such as low memory requirements, small code size and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEAn,b is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g., 8-bit or 32-bit processors). In contrast, SEAn,b yields a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, alternative features were wanted, including the efficient combination of encryption and decryption or the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g., authentication). Finally, the simplicity of SEAn,b makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEAn,b constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEAn,b operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds and is defined with respect to the following parameters:
– n: plaintext size, key size
– b: processor (or word) size
– nb = n/(2b): number of words per Feistel branch
– nr: number of block cipher rounds
As the only constraint, it is required that n is a multiple of 6b. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... -bit block ciphers, respectively denoted as SEA48,8, SEA96,8, SEA144,8, ... Let x be an n/2-bit vector. In the following, we will consider two representations:
– Bit representation: $x_b = x(n/2 - 1)\; x(n/2 - 2) \dots x(2)\; x(1)\; x(0)$
– Word representation: $x_W = x_{n_b - 1}\; x_{n_b - 2} \dots x_2\; x_1\; x_0$
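The parameter constraints above can be checked mechanically. The following is a minimal sketch, not part of the SEA specification: the function names `sea_params` and `to_words` are hypothetical, and storing the least significant word first is an assumption made only for illustration.

```python
def sea_params(n, b):
    """Return nb = n / (2b), the number of b-bit words per Feistel branch."""
    if n <= 0 or b <= 0:
        raise ValueError("n and b must be positive")
    if n % (6 * b) != 0:
        # the text requires n to be a multiple of 6b
        raise ValueError("n=%d is not a multiple of 6b=%d" % (n, 6 * b))
    return n // (2 * b)


def to_words(x, nb, b):
    """Split an n/2-bit integer into nb words of b bits (assumed order: x_0 first)."""
    mask = (1 << b) - 1
    return [(x >> (b * i)) & mask for i in range(nb)]
```

For instance, `sea_params(48, 8)` gives the 3 words per branch of SEA48,8, while a 64-bit block with 8-bit words is rejected because 64 is not a multiple of 48.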
Basic Operations
Due to its simplicity constraints, SEAn,b is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR ⊕, (2) substitution box S, (3) word (left) rotation R and inverse word rotation R^{-1}, (4) bit rotation r, (5) addition mod 2^b ⊞.
These operations are formally defined as follows:
1. Bitwise XOR ⊕
The bitwise XOR is defined on n/2-bit vectors:
$$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \rightarrow \mathbb{Z}_2^{n/2} : x, y \rightarrow z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$$
2. Substitution Box S
SEAn,b uses the following 3-bit substitution table:
ST = {0, 5, 6, 7, 4, 3, 1, 2}
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data, using the following recursive definition:
$$S : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow x = S(x) \Leftrightarrow \begin{cases} x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i} \\ x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1} \\ x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2} \end{cases} \quad 0 \le i \le \frac{n_b}{3} - 1$$
where ∧ and ∨ respectively represent the bitwise AND and OR.
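To see how the recursive definition relates to the table ST, the sketch below evaluates the three equations in order, each one reusing the words already updated by the previous one, and compares the result with the table. Two points are assumptions of this sketch rather than statements of the specification: the helper names are hypothetical, and the 3-bit table index is read with x_{3i+2} as the most significant bit and x_{3i} as the least significant bit.

```python
ST = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table quoted in the text


def sbox_words(x):
    """Apply S to a list of words whose length is a multiple of 3 (x_0 first)."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]   # uses the already updated y[i]
        y[i + 2] ^= y[i] | y[i + 1]   # uses both updated words
    return y


def sbox_value(v):
    """Evaluate the bitslice definition on a single 3-bit value v = x2 x1 x0."""
    x0, x1, x2 = sbox_words([v & 1, (v >> 1) & 1, (v >> 2) & 1])
    return (x2 << 2) | (x1 << 1) | x0
```

Under these conventions, `[sbox_value(v) for v in range(8)]` reproduces ST, and the two fixed points of the table are the inputs 0 and 4.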
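3. Word Rotation R
The word rotation is defined on nb-word vectors:
$$R : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow y = R(x) \Leftrightarrow y_{i+1} = x_i, \; 0 \le i \le n_b - 2, \quad y_0 = x_{n_b - 1}$$
4. Bit Rotation r
The bit rotation is defined on nb-word vectors:
$$r : \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x \rightarrow y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1, \; y_{3i+1} = x_{3i+1}, \; y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le \frac{n_b}{3} - 1$$
where ≫ and ≪ represent the cyclic right and left shifts inside a word.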
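The two rotations can be sketched as follows. This is an illustrative reading of the definitions, assuming that a vector is stored as a Python list with word x_0 first; the function names are hypothetical.

```python
def word_rot(x):
    """R: y[i+1] = x[i] for 0 <= i <= nb-2, and y[0] = x[nb-1]."""
    return [x[-1]] + list(x[:-1])


def word_rot_inv(x):
    """Inverse word rotation: undoes word_rot."""
    return list(x[1:]) + [x[0]]


def rotl1(w, b):
    """Cyclic left shift by one bit inside a b-bit word."""
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)


def rotr1(w, b):
    """Cyclic right shift by one bit inside a b-bit word."""
    return ((w >> 1) | (w << (b - 1))) & ((1 << b) - 1)


def bit_rot(x, b=8):
    """r: in each block of 3 words, rotate word 3i right and word 3i+2 left."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = rotr1(y[i], b)
        y[i + 2] = rotl1(y[i + 2], b)
    return y
```

Note that only R moves data between words; r acts inside a word and leaves the middle word of each block untouched.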
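5. Addition mod 2^b ⊞
The mod 2^b addition is defined on nb-word vectors:
$$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \rightarrow \mathbb{Z}_{2^b}^{n_b} : x, y \rightarrow z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$$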
The Round and Key Round
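Based on the previous definitions, the encrypt round FE, decrypt round FD and key round FK are pictured in Figure 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^{2} \times \mathbb{Z}_{2^{n/2}} \rightarrow \mathbb{Z}_{2^{n/2}}^{2}$ such that:
$$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$$
$$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$$
$$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$$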
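The three round functions can be sketched directly from their definitions. The code below is a minimal illustration under stated assumptions: vectors are Python lists with word x_0 first, the right half is named `Rt` to avoid clashing with the word rotation `R`, and all function names are hypothetical. It is not validated against any published test vector.

```python
def S(x):
    """Bitslice substitution box on blocks of 3 words."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def r(x, b):
    """Bit rotation: word 3i rotated right by 1, word 3i+2 rotated left by 1."""
    m = (1 << b) - 1
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ((y[i] >> 1) | (y[i] << (b - 1))) & m
        y[i + 2] = ((y[i + 2] << 1) | (y[i + 2] >> (b - 1))) & m
    return y


def R(x):
    return [x[-1]] + list(x[:-1])


def R_inv(x):
    return list(x[1:]) + [x[0]]


def xor(x, y):
    return [a ^ c for a, c in zip(x, y)]


def add(x, y, b):
    """Word-wise addition mod 2^b."""
    return [(a + c) & ((1 << b) - 1) for a, c in zip(x, y)]


def FE(L, Rt, K, b=8):
    """Encrypt round: returns (L_next, R_next)."""
    return Rt, xor(R(L), r(S(add(Rt, K, b)), b))


def FD(L, Rt, K, b=8):
    """Decrypt round: returns (L_next, R_next)."""
    return Rt, R_inv(xor(L, r(S(add(Rt, K, b)), b)))


def FK(KL, KR, C, b=8):
    """Key round: returns (KL_next, KR_next)."""
    return KR, xor(KL, R(r(S(add(KR, C, b)), b)))
```

With these definitions, applying FD to the swapped output of FE under the same round key returns the swapped input, and applying FK twice with a swap of the two halves in between (same constant) returns the swapped starting key. The second property is what lets the key schedule run forward and then backward.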
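[Figure: block diagrams of the encrypt/decrypt round (inputs Li, Ri and round key Ki; blocks R, R^{-1}, r and S; outputs Li+1, Ri+1) and of the key round (inputs KLi, KRi and constant Ci; blocks R, r and S; outputs KLi+1, KRi+1)]
FIG 3.1 Encrypt/decrypt round and key round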
The Complete Cipher
The cipher iterates an odd number nr of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.
C = SEAn,b(P, K)
{
    % initialization
    L0 & R0 = P;
    KL0 & KR0 = K;
    % key scheduling
    for i in 1 to ⌊nr/2⌋
        [KLi, KRi] = FK(KLi−1, KRi−1, C(i));
    switch KL⌊nr/2⌋, KR⌊nr/2⌋;
    for i in ⌈nr/2⌉ to nr − 1
        [KLi, KRi] = FK(KLi−1, KRi−1, C(nr − i));
    % encryption
    for i in 1 to ⌈nr/2⌉
        [Li, Ri] = FE(Li−1, Ri−1, KRi−1);
    for i in ⌈nr/2⌉ + 1 to nr
        [Li, Ri] = FE(Li−1, Ri−1, KLi−1);
    % final
    C = Rnr & Lnr;
    switch KLnr−1, KRnr−1;
}
where & is the concatenation operator, KR⌊nr/2⌋ is taken before the switch, and C(i) is an nb-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round FD.
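The pseudo-code can be turned into a short executable sketch. Everything below is an illustration under assumptions that the text does not fix: blocks and keys are passed as (left, right) pairs of word lists with the least significant word first, the helper names are hypothetical, and the result has not been compared with official test vectors. The only property checked is internal consistency, namely that the round-key sequence is symmetric and that decryption inverts encryption.

```python
def f(x, k, b):
    """r(S(x [+] k)) on a list of b-bit words."""
    m = (1 << b) - 1
    y = [(a + c) & m for a, c in zip(x, k)]
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
        y[i] = ((y[i] >> 1) | (y[i] << (b - 1))) & m              # rotate right
        y[i + 2] = ((y[i + 2] << 1) | (y[i + 2] >> (b - 1))) & m  # rotate left
    return y


def R(x):
    return [x[-1]] + x[:-1]


def R_inv(x):
    return x[1:] + [x[0]]


def xor(x, y):
    return [a ^ c for a, c in zip(x, y)]


def round_keys(KL, KR, nr, b):
    """Return the nr round keys in the order the data rounds consume them."""
    nb, half = len(KL), nr // 2

    def const(i):
        return [i] + [0] * (nb - 1)       # C(i): only the LSW is non-zero

    keys = [KR]                           # round 1 uses KR_0
    for i in range(1, half + 1):
        KL, KR = KR, xor(KL, R(f(KR, const(i), b)))
        keys.append(KR)                   # rounds up to ceil(nr/2) use KR
    KL, KR = KR, KL                       # switch
    for i in range(half + 1, nr):         # i = ceil(nr/2) .. nr - 1
        KL, KR = KR, xor(KL, R(f(KR, const(nr - i), b)))
        keys.append(KL)                   # remaining rounds use KL
    return keys


def crypt(block, key, nr, b, decrypt=False):
    """Encrypt (or decrypt) a (left, right) block under a (KL, KR) key."""
    L, Rt = block
    for k in round_keys(key[0], key[1], nr, b):
        if decrypt:
            L, Rt = Rt, R_inv(xor(L, f(Rt, k, b)))
        else:
            L, Rt = Rt, xor(R(L), f(Rt, k, b))
    return Rt, L                          # final swap: C = R_nr & L_nr
```

A usage example with the SEA48,8 parameters (nb = 3, and nr = 51 taken from the suggested-rounds formula discussed later in this chapter): `c = crypt(p, k, 51, 8)` followed by `crypt(c, k, 51, 8, decrypt=True)` returns `p`. The key schedule is recomputed on every call here for brevity; an embedded implementation would instead derive the keys on the fly, as the text describes.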
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– λ-parameter: 1/2
– δ-parameter: 1/4
– Maximum nonlinear order, namely 2
– Recursive definition
– Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters n = 48, b = 8, nb = 3. Looking at the figure, it can be seen that SEAn,b divides its data in 2nb/3 blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most nb rounds to be completed (in our example, nb = 3, which yields 3 rounds). Once every word has at least one active bit, the combination of r and S yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most ⌊b/2⌋ rounds. Combining these observations, the diffusion is complete after nb + ⌊b/2⌋ rounds.
Addition mod 2^b ⊞
Using a mod 2^b key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEAn,b, namely the key schedule and the position of R, R^{-1} in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$$K_0, K_1, K_2, \dots, K_{\lfloor r/2 \rfloor}, K_{\lfloor r/2 \rfloor - 1}, \dots, K_2, K_1, K_0$$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than λ and its differential parameter is smaller than δ, then the linear and differential parameters of the 3-round cipher, Λ and Δ, are respectively smaller than λ^2 and δ^2.
Since our nonlinear function S has parameter δ = 2^{-2} and parameter λ = 2^{-1}, it implies that 3 rounds of SEAn,b have their differential and linear parameters respectively bounded by Δ < 2^{-4} and Λ < 2^{-2}. However, for an n-bit block cipher, it is respectively required that Δ ≤ 2^{-n} and Λ ≤ 2^{-n/2} to resist against differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that:
$$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \qquad (1)$$
In both cases, the required number of rounds is nr ≥ 3n/4. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for nr). In fact, the strategy of Equation (1) is similar to the one of, e.g., the AES Rijndael [17], but we only assume one active s-box per round.
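As a worked check of the round bound, the two conditions of Equation (1) can be rearranged as follows. This is only a rewriting of the inequalities and assumes, as the text does, that the 3-round bounds of Lemma 1 are multiplied over nr/3 groups of rounds.

```latex
\begin{align*}
\left(2^{-2}\right)^{2n_r/3} < 2^{-n}
  &\iff -\tfrac{4 n_r}{3} < -n
  \iff n_r > \tfrac{3n}{4}, \\
\left(2^{-1}\right)^{2n_r/3} < 2^{-n/2}
  &\iff -\tfrac{2 n_r}{3} < -\tfrac{n}{2}
  \iff n_r > \tfrac{3n}{4}.
\end{align*}
```

For example, with n = 48 both conditions require more than 36 rounds.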
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEAn,b a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions actually deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by x ≪ a the left rotation by a bits of each of the nb words of x. The non-linear and diffusion layers have the following properties:
– S(x ≪ a) = S(x) ≪ a
– r(x ≪ a) = r(x) ≪ a
– R(x ≪ a) = R(x) ≪ a
Consider a modified version of our cipher where key addition is performed using ⊕ rather than modular addition, and where all round constants Ci are such that Ci ≪ a = Ci, e.g., all Ci's equal 0. As a consequence of the previous observations, the modified round $F_E^{\oplus}$ and the key round $F_K$ satisfy:
$$F_E^{\oplus}(L \ll a, R \ll a, K \ll a) = F_E^{\oplus}(L, R, K) \ll a$$
$$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys K and K ≪ a. In the actual SEAn,b, the key addition is performed word-wise mod 2^b. As the property (X ≪ a) ⊞ (K ≪ a) = (X ⊞ K) ≪ a is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for 1 < a < b − 1. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g., "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEAn,b from having such weak keys.
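The probability p can be measured exhaustively for small word sizes. The sketch below (the function name is hypothetical) counts, over all pairs (X, K) of b-bit words, how often rotating before the modular addition gives the same result as rotating after it. For a = 1, the property holds exactly when the two most significant bits are not both 1 and no carry enters the most significant position, which gives p = (3/8)(1 + 2^{-(b-1)}).

```python
def rot_add_probability(b, a=1):
    """Count pairs (X, K) with (X<<<a) + (K<<<a) == (X + K)<<<a, all mod 2^b."""
    m = (1 << b) - 1

    def rot(w):
        return ((w << a) | (w >> (b - a))) & m

    hits = sum(
        1
        for x in range(1 << b)
        for k in range(1 << b)
        if (rot(x) + rot(k)) & m == rot((x + k) & m)
    )
    return hits, hits / float(1 << (2 * b))
```

For b = 8 and a = 1 the count is 24768 out of 65536 pairs, i.e. about 0.378, already close to 3/8 = 0.375. Calling the function with other values of a gives the corresponding probabilities for the comparison mentioned in the text.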
Moreover, the round constants Ci are generally not such that Ci ≪ a = Ci (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from K and those derived from K ≪ a rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEAn,b.
Square Attacks
We explored square attacks [16] on SEA48,8. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e., takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from 2^3 (when the input to only one substitution box is active) to 2^21 (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that it remains the same when different parameters n and b are considered, which implies that nb + ⌊b/2⌋ rounds are enough to prevent square attacks. Note that although our observations also hold for ⊕-SEAn,b, the use of addition mod 2^b provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 · (nb + ⌊b/2⌋).
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEAn,b, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEAn,b is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, of which the vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key Ki is computed from the previous round key Ki−1 using a function f which is always the same: Ki = f(Ki−1). However, in the case of SEAn,b, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEAn,b key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference makes it possible to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEAn,b against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \Leftrightarrow \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes the bitwise complement. The non-linear key scheduling and the presence of carry propagations in the actual SEAn,b algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEAn,b has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be 3n/4 + 2 · (nb + ⌊b/2⌋). This roughly corresponds to the number of rounds to resist linear/differential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g., by doubling this number of rounds. Since nr has to be odd, we add one if it is even. We also assume a minimum word size b ≥ 8 bits.
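The suggested minimum can be computed directly from n and b. The sketch below simply evaluates the formula in the text and rounds up to the next odd value; the function name is hypothetical and no additional security margin is applied.

```python
def suggested_rounds(n, b):
    """Evaluate 3n/4 + 2(nb + floor(b/2)) and make the result odd."""
    if b < 8 or n % (6 * b) != 0:
        raise ValueError("expects b >= 8 and n a multiple of 6b")
    nb = n // (2 * b)
    nr = 3 * n // 4 + 2 * (nb + b // 2)   # 3n/4 is an integer since 48 divides n
    return nr if nr % 2 == 1 else nr + 1
```

For SEA48,8 this gives 36 + 2 · (3 + 4) = 50, hence nr = 51, and for SEA96,8 it gives 72 + 2 · (6 + 4) = 92, hence nr = 93.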
3.5 Performance Analysis
SEAn,b is targeted for being implemented on low-cost processors, with little code size and a small instruction set. However, SEAn,b's simple structure makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: or, and, ⊕, ⊞, ≫, ≪
– Branch instructions: goto, subroutine call and return
– Comparison, load RAM in register, store register in RAM
According to the code in the appendix, the performance can be roughly estimated as follows. First, the combined number of RAM words and registers equals 5nb + 3. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, it directly yields 31nb + 36 ops. For the implementation time, the round and key round respectively require 12nb + 11 ops and 10nb + 11 ops. It yields a total of (nr − 1) × (12nb + 11 + 10nb + 11 + 7) + (12nb + 11) + 8nb + 7 ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have K_{nr−1} = K0. Remark also that this implementation uses a low number of registers, namely nb + 3. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
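The estimates above are simple enough to tabulate for any parameter set. The following sketch only evaluates the formulas quoted in the text; the function name and the dictionary keys are hypothetical, and the figures are operation counts, not measured cycles on a particular processor.

```python
def sea_cost_estimate(nb, nr):
    """Evaluate the memory, code-size and time estimates given in the text."""
    round_ops = 12 * nb + 11       # one encrypt/decrypt round
    key_round_ops = 10 * nb + 11   # one key round
    return {
        "ram_words_and_registers": 5 * nb + 3,
        "registers": nb + 3,
        "code_size_ops": 31 * nb + 36,
        "time_ops": (nr - 1) * (round_ops + key_round_ops + 7)
                    + round_ops + 8 * nb + 7,
    }
```

For nb = 3 and nr = 51 the formulas give 18 RAM words and registers, 129 ops of code and 50 × 95 + 47 + 24 + 7 = 4828 ops per encryption.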
For illustration purposes, we implemented SEAn,b on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEAn,b and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– SEAn,b designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g., for the AES Rijndael).
– The combined number of RAM words and registers of SEAn,b implementations (i.e., 5nb + 3) is generally lower than for other block ciphers.
– The code size of SEAn,b is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEAn,b also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is in the number of cycles required for the encryption (i.e., SEAn,b trades space for time, which may be relevant due to present processor speeds). Looking at the code size × cycles product, the efficiency of SEAn,b remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH 4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let p be a large (odd) prime and let E: $y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where p does not divide $4a_4^3 + 27a_6^2$, E can be reduced to an elliptic curve over $F_p$. The number of points of E over $F_p$, denoted by $\#E(F_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime p, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm implemented on a typical personal computer takes several minutes to find the number of points on a typical curve over $F_p$, where p has 200 bits.
It is known that
$$\#E(F_p) = p + 1 - t$$
where t is an integer which satisfies the Hasse bound
$$-2\sqrt{p} \le t \le 2\sqrt{p}$$
The algorithm works by calculating t modulo several small auxiliary primes ℓ. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of $\#E(F_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes given below. For each candidate ℓ, a calculation has to be carried out to generate a certain polynomial associated with ℓ that is necessary for further calculations with this ℓ. These polynomials do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and ℓ is 1/2). For those curves where the algorithm applies, we can determine t modulo ℓ. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining t.
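The final recombination step can be illustrated on a toy curve. The sketch below is not the SEA algorithm itself: it obtains t by brute-force point counting, which is only feasible for very small p, and then shows how t is recovered from its residues modulo a few auxiliary primes once their product exceeds $4\sqrt{p}$. The function names and the example curve are choices made for this illustration.

```python
def count_points_naive(a4, a6, p):
    """#E(F_p) for y^2 = x^3 + a4*x + a6 by brute force (tiny odd primes only)."""
    total = 1  # the point at infinity
    for x in range(p):
        rhs = (x * x * x + a4 * x + a6) % p
        if rhs == 0:
            total += 1                          # single point with y = 0
        elif pow(rhs, (p - 1) // 2, p) == 1:    # Euler's criterion: rhs is a square
            total += 2                          # two square roots +y and -y
    return total


def recover_trace(residues, p):
    """Combine {l: t mod l} by the Chinese Remainder Theorem.

    Returns the unique t with |t| <= 2*sqrt(p), provided the product of the
    moduli exceeds 4*sqrt(p).
    """
    t, M = 0, 1
    for l, res in residues.items():
        # lift t (mod M) to a value that is also congruent to res (mod l)
        t += M * (((res - t) * pow(M, -1, l)) % l)
        M *= l
    if M * M <= 16 * p:
        raise ValueError("product of auxiliary primes must exceed 4*sqrt(p)")
    return t if t <= M // 2 else t - M          # centred representative
```

As a usage example, take p = 101 and the curve y^2 = x^3 + x + 1. With `t = p + 1 - count_points_naive(1, 1, p)`, the residues of t modulo 3, 5 and 7 (product 105, which exceeds 4√101 ≈ 40.2) are enough for `recover_trace` to return t again. Note that `pow(M, -1, l)` requires Python 3.8 or later.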
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over $F_p$ with the intention of finding an E with $\#E(F_p) = xr$, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p and require x ≤ 32, then the probability that $\#E(F_p) = xr$ is about 25 so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each ℓ ∈ A, we need to determine a polynomial in Z[F, J] associated with ℓ. For ℓ ∈ $A_s$, this polynomial is stored in the program. For ℓ ∈ $A_l$, it must be calculated by determining a number of coefficients of a certain q-series f(q) ∈ Z[[q]] and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime p and an elliptic curve E: $y^2 = x^3 + a_4 x + a_6$
CH 5 SEA Architecture Block Diagram
[Figure: SEA architecture block diagram. A key computational block (Mod, SBox, BR, WR, XOR, Round Reg) fills the key registers KeyReg0 to KeyReg9 from the input KeyI under control of KeyLd. An encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg) with a cipher data register takes DataI under control of DataLd and uses Key0 to Key9. A decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg) with a plain text data register produces DataO under control of DataRd and uses Key0 to Key9. A control block SMC has the signals Clk, Rst, Ena, ED and Ovr; every block receives Clk and Rst.]
FIG 5.1
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
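The two ways of obtaining a key described above can be sketched with the Python standard library. This is an illustration only: the function names are hypothetical, the 128-bit default and the iteration count are arbitrary choices, and PBKDF2 with SHA-256 is used here as one example of a passphrase-based key generation algorithm rather than the SHA-1 construction mentioned in the text.

```python
import hashlib
import secrets


def random_key(bits=128):
    """Draw a key from the operating system's entropy-seeded generator."""
    return secrets.token_bytes(bits // 8)


def key_from_passphrase(passphrase, salt, bits=128, iterations=100000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256 (assumed choice)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=bits // 8
    )
```

A typical use would be `salt = secrets.token_bytes(16)` followed by `key_from_passphrase("correct horse", salt)`. The same passphrase and salt always give the same key, so the salt has to be stored alongside the encrypted data.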
The simplest method to read encrypted data is a brute force attack: simply attempting every possible key up to the maximum key length. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
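The exponential growth of the brute force effort with the key length can be made concrete with a short calculation. The sketch below applies to symmetric keys only, assumes that on average half of the key space has to be searched, and takes the guessing rate as a free parameter; the function name and the rate used in the example are hypothetical.

```python
def brute_force_years(key_bits, guesses_per_second):
    """Expected time, in years, to find a key by exhaustive search."""
    keyspace = 1 << key_bits                        # number of possible keys
    seconds = (keyspace // 2) / guesses_per_second  # half the space on average
    return seconds / (365 * 24 * 3600)
```

Each additional key bit doubles the result: at an assumed rate of 10^12 guesses per second, `brute_force_years(56, 1e12)` is well under a year, while `brute_force_years(128, 1e12)` is on the order of 10^18 years.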
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. It requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the work factor, in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
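As an illustration of the one-time pad conditions listed above, here is a minimal Python sketch. It assumes the pad is drawn from the operating system's cryptographic random source through the standard `secrets` module; the function names are hypothetical and chosen only for this example.

```python
import secrets


def otp_encrypt(message: bytes, pad: bytes) -> bytes:
    """XOR each message byte with the matching pad byte."""
    if len(pad) < len(message):
        raise ValueError("the pad must be at least as long as the message")
    return bytes(m ^ k for m, k in zip(message, pad))


# XOR is its own inverse, so decryption is the same operation.
otp_decrypt = otp_encrypt

if __name__ == "__main__":
    message = b"ATTACK AT DAWN"
    pad = secrets.token_bytes(len(message))  # random, as long as the message, used once
    ciphertext = otp_encrypt(message, pad)
    assert otp_decrypt(ciphertext, pad) == message
```

The sketch only demonstrates the mechanics. The security argument depends entirely on the pad being truly random, kept secret, and never reused, which is what makes the scheme impractical for most applications.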
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is "gardening", used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
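The size of that improvement can be worked out directly from the two exponents. The short Python sketch below does the arithmetic; the variable names are illustrative, and the storage estimate assumes each known plaintext is one 64-bit (8-byte) DES block.

```python
DES_KEY_BITS = 56
DES_BLOCK_BYTES = 8  # DES operates on 64-bit blocks

brute_force_decryptions = 2 ** (DES_KEY_BITS - 1)  # about half of the key space
linear_attack_operations = 2 ** 43                 # figure quoted in the text
known_plaintexts = 2 ** 43                         # figure quoted in the text

# How many times less work the linear attack needs than brute force.
speedup = brute_force_decryptions // linear_attack_operations

# Data the attacker must collect first, in tebibytes (2**40 bytes).
plaintext_storage_tib = known_plaintexts * DES_BLOCK_BYTES // 2 ** 40

if __name__ == "__main__":
    print(f"work reduction factor: {speedup}")
    print(f"known plaintext to collect: {plaintext_storage_tib} TiB")
```

The work drops by a factor of 2^12, but the attack trades computation for data: brute force needs a single known plaintext, while the linear attack needs a very large number of them.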
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts, or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
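The password or PIN case can be sketched in Python. The first function below is a deliberately naive comparison whose running time depends on how many leading bytes match; the second uses `hmac.compare_digest` from the Python standard library, which is designed to reduce that leak. Both function names are hypothetical, and the example is a sketch of the idea rather than a description of any device discussed in this report.

```python
import hmac


def leaky_equals(expected: bytes, supplied: bytes) -> bool:
    """Naive check: returns at the first mismatch, so timing reveals the matching prefix length."""
    if len(expected) != len(supplied):
        return False
    for e, s in zip(expected, supplied):
        if e != s:
            return False  # early exit is the timing side channel
    return True


def constant_time_equals(expected: bytes, supplied: bytes) -> bool:
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(expected, supplied)


if __name__ == "__main__":
    pin = b"4921"
    print(leaky_equals(pin, b"4900"), constant_time_equals(pin, b"4900"))
    print(leaky_equals(pin, b"4921"), constant_time_equals(pin, b"4921"))
```

Both functions return the same answers; they differ only in how much the time taken tells an observer about the secret.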
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file "c:/xilinx/bin/vasu/keyreg.ngc" ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file "keyreg.ngd" ...
Writing NGDBUILD log file "keyreg.bld"...
Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'

Design Information
------------------
Command Line   : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device  : x2s15
Target Package : cs144
Target Speed   : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date    : Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s

--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s

--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================

WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================

Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
# Registers : 1
  96-bit register : 1

=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================

Optimizing unit <keyreg> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 195

Macro Statistics :
# Registers : 1
#   96-bit register : 1

Cell Usage :
# BELS : 1
#   LUT1 : 1
# FlipFlops/Latches : 96
#   FDCE : 96
# Clock Buffers : 1
#   BUFGP : 1
# IO Buffers : 194
#   IBUF : 98
#   OBUF : 96
=========================================================================

Device utilization summary:
---------------------------

Selected Device : 2s15cs144-6

 Number of Slices:            55 out of 192   28%
 Number of Slice Flip Flops:  96 out of 384   25%
 Number of 4 input LUTs:       1 out of 384    0%
 Number of bonded IOBs:      194 out of  90  215% (*)
 Number of GCLKs:              1 out of   4   25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -6

   Minimum period: No path found
   Minimum input arrival time before clock: 7.962ns
   Maximum output required time after clock: 6.788ns
   Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock 'Clk'
Offset:              7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     IBUF:I->O            96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
     FDCE:CE                   0.886          Dreg_0
    ----------------------------------------
    Total                      7.962ns (1.662ns logic, 6.300ns route)
                                       (20.9% logic, 79.1% route)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock 'Clk'
Offset:              6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising

  Data Path: Dreg_95 to KeyO<95>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     FDCE:C->Q             1   1.085   1.035  Dreg_95 (Dreg_95)
     OBUF:I->O                 4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                      6.788ns (5.753ns logic, 1.035ns route)
                                       (84.8% logic, 15.2% route)

=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s

-->

Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================

WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================

Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================

Optimizing unit <sbox8x3> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 7

Cell Usage :
# BELS : 3
#   LUT4 : 3
# IO Buffers : 7
#   IBUF : 4
#   OBUF : 3
=========================================================================

Device utilization summary:
---------------------------

Selected Device : 2s15cs144-6

 Number of Slices:         2 out of 192   1%
 Number of 4 input LUTs:   3 out of 384   0%
 Number of bonded IOBs:    7 out of  90   7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB

Mapping Report

Device utilization summary:

  Number of External IOBs: 7 out of 86 8%
    Number of LOCed External IOBs: 0 out of 7 0%

  Number of SLICEs: 2 out of 192 1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0

The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file "c:/xilinx/bin/vasu/keygenblock.ngc" ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file "keygenblock.ngd" ...
Writing NGDBUILD log file "keygenblock.bld"...
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
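The utilization percentages and OVERMAPPED flags in the mapping report above can be re-derived from the raw counts. The following Python sketch does so; it assumes the tool truncates percentages to whole numbers, which is consistent with every figure in this report but is an inference, not documented behaviour. The function and dictionary names are illustrative.

```python
def utilization_percent(used: int, available: int) -> int:
    """Share of a device resource in use, truncated to a whole percent."""
    return used * 100 // available


def is_overmapped(used: int, available: int) -> bool:
    """A resource is overmapped when the design needs more than the device has."""
    return used > available


# (used, available) pairs copied from the keygenblock mapping report for the xc2s15.
keygen_resources = {
    "Slice Registers": (419, 384),
    "4 input LUTs (logic)": (1016, 384),
    "Occupied Slices": (665, 192),
    "4 input LUTs (total)": (1066, 384),
    "Bonded IOBs": (1060, 86),
    "GCLKs": (1, 4),
}

if __name__ == "__main__":
    for name, (used, available) in keygen_resources.items():
        flag = " (OVERMAPPED)" if is_overmapped(used, available) else ""
        print(f"{name}: {used} out of {available} "
              f"{utilization_percent(used, available)}%{flag}")
```

Every logic and I/O resource except the global clock buffers exceeds the capacity of the xc2s15, so the key generation block as synthesized here does not fit the selected device.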
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in text, key and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It yields a small encryption routine for any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
This is a low-cost encryption routine, basically designed for processors with a limited instruction set. It can be used:
In wireless communication, mobile computing and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDs
CONCLUSION
SEA_{n,b} is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEA_{n,b} allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key and a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA_{n,b} do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, M. Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
Schematic Editor - The Engineering Capture System (ECS) is a graphical user interface (GUI) that allows you to create, view, and edit schematics and symbols for the Design Entry step of the Xilinx® design flow.
CORE Generator - The CORE Generator System is a design tool that delivers parameterized cores optimized for Xilinx FPGAs, ranging in complexity from simple arithmetic operators such as adders to system-level building blocks such as filters, transforms, FIFOs, and memories.
Constraints Editor - The Constraints Editor allows you to create and modify the most commonly used timing constraints.
PACE - The Pinout and Area Constraints Editor (PACE) allows you to view and edit I/O, Global logic, and Area Group constraints.
StateCAD State Machine Editor - StateCAD allows you to specify states, transitions, and actions in a graphical editor. The state machine will be created in HDL.
Implementation
Translate - The Translate process runs NGDBuild to merge all of the input netlists, as well as design constraint information, into a Xilinx database file.
Map - The Map program maps a logical design to a Xilinx FPGA.
Place and Route (PAR) - The PAR program accepts the mapped design, places and routes the FPGA, and produces output for the bitstream generator.
Floorplanner - The Floorplanner allows you to view a graphical representation of the FPGA, and to view and modify the placed design.
FPGA Editor - The FPGA Editor allows you to view and modify the physical implementation, including routing.
Timing Analyzer - The Timing Analyzer provides a way to perform static timing analysis on FPGA and CPLD designs. With Timing Analyzer, analysis can be performed immediately after mapping, placing or routing an FPGA design, and after fitting and routing a CPLD design.
Fit (CPLD only) - The CPLDFit process maps one or more netlists into specified devices and creates the JEDEC programming file.
Chip Viewer (CPLD only) - The Chip Viewer tool provides a graphical view of the inputs and outputs, macrocell details, equations, and pin assignments.
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bitstream for Xilinx device configuration.
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device.
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices.
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e., with small code size and memory) targeted at processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, and allows efficient combination of encryption/decryption and "on-the-fly" key derivation. Its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
In this paper, we consequently consider a general context where we have very limited processing resources (e.g., a small processor) and throughput requirements. This yields design criteria such as low memory requirements, small code size, and a limited instruction set. In addition, we propose flexibility as another unusual design principle.
SEAnb is parametric in the text key and processor size Such an approach was motivated by the fact that many algorithms behave differently on different platforms (eg 8-bit or 32-bit processors) In opposition SEAnb allows to obtain a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size Beyond these general guidelines alternative features were wanted including the efficient combination of encryption and decryption or the ability to derive keys ldquoon the flyrdquo
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g. authentication). Finally, the simplicity of SEA$_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. SEA$_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
SEA$_{n,b}$ operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds and is defined with respect to the following parameters:
– $n$: plaintext size, key size;
– $b$: processor (or word) size;
– $n_b = n/(2b)$: number of words per Feistel branch;
– $n_r$: number of block cipher rounds.
As the only constraint, $n$ is required to be a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted SEA$_{48,8}$, SEA$_{96,8}$, SEA$_{144,8}$, ... Let $x$ be an $n/2$-bit vector. In the following, we consider two representations:
– Bit representation: $x_b = x(n/2 - 1)\ x(n/2 - 2)\ \ldots\ x(2)\ x(1)\ x(0)$;
– Word representation: $x_W = x_{n_b-1}\ x_{n_b-2}\ \ldots\ x_2\ x_1\ x_0$.
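The parameter rules above can be sketched in a few lines of Python. The function names are hypothetical, and storing the least significant word $x_0$ at list index 0 is an assumption made here for illustration, not something the specification fixes.

```python
def sea_words_per_branch(n, b):
    """Return n_b = n / (2b), after checking that n is a multiple of 6b."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    return n // (2 * b)


def to_words(x, nb, b):
    """Split an n/2-bit integer into nb words of b bits.

    Assumption: list index 0 holds the least significant word x_0.
    """
    mask = (1 << b) - 1
    return [(x >> (b * i)) & mask for i in range(nb)]


def from_words(words, b):
    """Inverse of to_words: rebuild the integer from its word list."""
    x = 0
    for i, w in enumerate(words):
        x |= w << (b * i)
    return x
```

For instance, SEA$_{48,8}$ has three 8-bit words per branch, and a block size of 64 bits is rejected for $b = 8$ because 64 is not a multiple of 48.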
Basic Operations
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1. Bitwise XOR
The bitwise XOR is defined on $n/2$-bit vectors:
$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1$
2. Substitution Box S
SEA$_{n,b}$ uses the following 3-bit substitution table, in C-like notation:
$S_T := \{0, 5, 6, 7, 4, 3, 1, 2\}$
For efficiency purposes, it is applied bitwise to any set of three words of data, using the following recursive definition:
$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i}$,
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1}$,
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}$, $\quad 0 \le i \le n_b/3 - 1$,
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
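As a check on the recursive definition, the following Python sketch applies the three assignments in order (each one reusing the values updated just before it) and compares the result with a per-bit lookup in the table $S_T$. The function names are hypothetical. The convention that $x_{3i}$ supplies the least significant bit of the table index is an assumption; it is the reading under which the three equations reproduce the table.

```python
S_TABLE = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table from the text


def sbox_bitslice(x, b=8):
    """Apply S to a list of b-bit words whose length is a multiple of 3."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(y), 3):
        # "Recursive": each line uses the words already updated above it.
        y[i] = ((y[i + 2] & y[i + 1]) ^ y[i]) & mask
        y[i + 1] = ((y[i + 2] & y[i]) ^ y[i + 1]) & mask
        y[i + 2] = ((y[i] | y[i + 1]) ^ y[i + 2]) & mask
    return y


def sbox_lookup(x, b=8):
    """Reference version: look up every bit column in S_TABLE."""
    y = [0] * len(x)
    for i in range(0, len(x), 3):
        for j in range(b):
            bit0 = (x[i] >> j) & 1
            bit1 = (x[i + 1] >> j) & 1
            bit2 = (x[i + 2] >> j) & 1
            s = S_TABLE[(bit2 << 2) | (bit1 << 1) | bit0]
            y[i] |= (s & 1) << j
            y[i + 1] |= ((s >> 1) & 1) << j
            y[i + 2] |= ((s >> 2) & 1) << j
    return y
```

Because the three assignments act on whole words, one pass substitutes $b$ independent 3-bit columns at once, which is why the bitslice form needs so few instructions.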
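3. Word Rotation R
The word rotation is defined on $n_b$-word vectors:
$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \Leftrightarrow y_{i+1} = x_i,\ 0 \le i \le n_b - 2, \quad y_0 = x_{n_b-1}$
4. Bit Rotation r
The bit rotation is defined on $n_b$-word vectors:
$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1,\ y_{3i+1} = x_{3i+1},\ y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le n_b/3 - 1$,
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.
5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$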
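The three word-level operations just defined translate almost directly into code. The sketch below uses hypothetical function names and assumes that list index 0 holds the least significant word $x_0$.

```python
def rot_words(x):
    """R: y_{i+1} = x_i for 0 <= i <= nb-2, and y_0 = x_{nb-1}."""
    return [x[-1]] + x[:-1]


def rot_words_inv(y):
    """R^{-1}: undo rot_words."""
    return y[1:] + [y[0]]


def rot_bits(x, b=8):
    """r: per 3-word block, rotate word 3i right and word 3i+2 left by one bit."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & mask
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask
    return y


def add_words(x, y, b=8):
    """Word-wise addition mod 2^b; carries never cross a word boundary."""
    mask = (1 << b) - 1
    return [(u + v) & mask for u, v in zip(x, y)]
```

Note that the middle word of each block is left untouched by `rot_bits`, exactly as in the definition of $r$.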
The Round and Key Round
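Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Figure 1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^2$ such that:
$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r(S(R_i \boxplus K_i)), \quad L_{i+1} = R_i$
$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}(L_i \oplus r(S(R_i \boxplus K_i))), \quad L_{i+1} = R_i$
$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R(r(S(KR_i \boxplus C_i))), \quad KL_{i+1} = KR_i$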
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.
C = SEA_{n,b}(P, K)
{
  /* initialization */
  L_0 & R_0 = P;
  KL_0 & KR_0 = K;
  /* key scheduling */
  for i in 1 to ⌊nr/2⌋
    [KL_i, KR_i] = FK(KL_{i-1}, KR_{i-1}, C(i));
  switch KL_⌊nr/2⌋, KR_⌊nr/2⌋;
  for i in ⌈nr/2⌉ to nr - 1
    [KL_i, KR_i] = FK(KL_{i-1}, KR_{i-1}, C(r - i));
  /* encryption */
  for i in 1 to ⌈nr/2⌉
    [L_i, R_i] = FE(L_{i-1}, R_{i-1}, KR_{i-1});
  for i in ⌈nr/2⌉ + 1 to nr
    [L_i, R_i] = FE(L_{i-1}, R_{i-1}, KL_{i-1});
  /* final */
  C = R_nr & L_nr;
  switch KL_{nr-1}, KR_{nr-1};
}
where & is the concatenation operator, KR_⌊nr/2⌋ is taken before the switch, and C(i) is an nb-word vector in which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round FD.
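The pseudo-code can be turned into a small executable sketch. The Python below is an illustration under stated assumptions, not a reference implementation: all function names are hypothetical, each branch is a list of words with index 0 as the least significant word, the loop bounds are read as floor and ceiling of $n_r/2$, and the round constant in the second key loop is taken to be $C(n_r - i)$. No official test vectors are used; the only properties checked are that the round-key sequence is symmetric and that decryption inverts encryption.

```python
def S(x, m):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ((y[i + 2] & y[i + 1]) ^ y[i]) & m
        y[i + 1] = ((y[i + 2] & y[i]) ^ y[i + 1]) & m
        y[i + 2] = ((y[i] | y[i + 1]) ^ y[i + 2]) & m
    return y

def r(x, b, m):
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & m
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & m
    return y

def R(x):
    return [x[-1]] + x[:-1]

def R_inv(x):
    return x[1:] + [x[0]]

def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]

def f(x, k, b):
    """r(S(x [+] k)) with word-wise addition mod 2^b."""
    m = (1 << b) - 1
    return r(S([(u + v) & m for u, v in zip(x, k)], m), b, m)

def FE(L, Rt, K, b):
    return Rt, xor(R(L), f(Rt, K, b))

def FD(L, Rt, K, b):
    return Rt, R_inv(xor(L, f(Rt, K, b)))

def FK(KL, KR, C, b):
    return KR, xor(KL, R(f(KR, C, b)))

def round_keys(KL, KR, nr, b):
    """Round keys in the order the encryption loops consume them."""
    h, nb = nr // 2, len(KR)
    keys = [KR]                                   # KR_0
    for i in range(1, h + 1):
        KL, KR = FK(KL, KR, [i] + [0] * (nb - 1), b)
        keys.append(KR)                           # KR_h is taken before the switch
    KL, KR = KR, KL                               # switch
    for i in range(h + 1, nr):
        KL, KR = FK(KL, KR, [nr - i] + [0] * (nb - 1), b)
        keys.append(KL)                           # KL_i, used by round i + 1
    return keys

def encrypt(L, Rt, KL, KR, nr, b):
    for K in round_keys(KL, KR, nr, b):
        L, Rt = FE(L, Rt, K, b)
    return Rt, L                                  # C = R_nr & L_nr

def decrypt(L, Rt, KL, KR, nr, b):
    for K in round_keys(KL, KR, nr, b):
        L, Rt = FD(L, Rt, K, b)
    return Rt, L
```

A possible usage with SEA$_{48,8}$-sized data is `encrypt([1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], 51, 8)`, followed by `decrypt` on the two returned halves with the same key. Because the key sequence reads the same forwards and backwards, decryption can reuse the schedule without reversing it.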
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– λ-parameter: 1/2;
– δ-parameter: 1/4;
– maximum nonlinear order, namely 2;
– recursive definition;
– minimum number of instructions.
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table (inputs 0 and 4).
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher, when combined with the bitslice substitution box. This is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data into $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by several reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor r/2 \rfloor}, K_{\lfloor r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, the linear and differential parameters $\Lambda$, $\Delta$ of the 3-round cipher are respectively smaller than $\lambda^2$, $\delta^2$.
Since our nonlinear function $S$ has parameters $\delta = 2^{-2}$ and $\lambda = 2^{-1}$, this implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an $n$-bit block cipher, $\Delta$ must fall below $2^{-n}$ and $\Lambda$ below $2^{-n/2}$ to resist differential [4] and linear [28] cryptanalysis. In order to approach these bounds, we require that:
$\delta^{2n_r/3} = (2^{-2})^{2n_r/3} < 2^{-n}$ and $\lambda^{2n_r/3} = (2^{-1})^{2n_r/3} < 2^{-n/2}$ (1)
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of, e.g., the AES Rijndael [17], but we only assume one active s-box per round.
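The step from Equation (1) to the bound on $n_r$ can be written out by taking base-2 logarithms of both inequalities. The worked form below is a sketch of that arithmetic only; it adds no assumption beyond the values $\delta = 2^{-2}$ and $\lambda = 2^{-1}$ given above.

```latex
\begin{align*}
(2^{-2})^{2n_r/3} < 2^{-n}
  &\iff -\frac{4 n_r}{3} < -n
  \iff n_r > \frac{3n}{4}, \\
(2^{-1})^{2n_r/3} < 2^{-n/2}
  &\iff -\frac{2 n_r}{3} < -\frac{n}{2}
  \iff n_r > \frac{3n}{4}.
\end{align*}
```

Both conditions therefore lead to the same threshold of $3n/4$ rounds; for $n = 48$, for instance, this threshold is 36 rounds.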
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and the boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \ll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
– $S(x \ll a) = S(x) \ll a$;
– $r(x \ll a) = r(x) \ll a$;
– $R(x \ll a) = R(x) \ll a$.
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \ll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F_E^{\oplus}$ and the key round $F_K$ satisfy:
$F_E^{\oplus}(L \ll a, R \ll a, K \ll a) = F_E^{\oplus}(L, R, K) \ll a$
$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \ll a$.
In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to 3/8 as $b$ grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, this implies that the similarity between the round keys derived from $K$ and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from being propagated through more than a few rounds of SEA$_{n,b}$.
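The probability $p$ mentioned above can be checked by exhaustive enumeration for small word sizes. The sketch below (hypothetical function names) counts, for a single $b$-bit word, the pairs $(X, K)$ for which rotation by $a$ bits commutes with addition mod $2^b$; it is only an experiment on one word and says nothing about the full cipher.

```python
def rotl(x, a, b):
    """Cyclic left rotation of a b-bit word by a bits."""
    mask = (1 << b) - 1
    a %= b
    return ((x << a) | (x >> (b - a))) & mask


def commute_probability(a, b):
    """Fraction of pairs (X, K) with (X<<<a) + (K<<<a) == (X + K)<<<a mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            if lhs == rhs:
                hits += 1
    return hits / (1 << (2 * b))
```

Calling `commute_probability(1, 8)` enumerates all 65536 pairs of 8-bit words and returns a value slightly above 3/8, while `commute_probability(1, 4)` is further from 3/8, in line with the convergence described in the text.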
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated earlier for the rotations $r$ and $R$. This is expected to remain the case when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that, although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated earlier for the rotations $r$ and $R$ provides an estimate of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as that of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with that of GOST, whose vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows the attacker to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \Leftrightarrow \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point a subject for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum number of rounds required to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
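The suggested round count is simple enough to compute directly. The sketch below uses a hypothetical function name and implements the minimum formula, not the doubled conservative variant. Rounding $3n/4$ up when it is not an integer is an assumption made here; for $b = 8$ the constraint that $n$ is a multiple of $6b$ makes $3n/4$ an integer anyway.

```python
def suggested_rounds(n, b):
    """n_r = 3n/4 + 2 * (n_b + floor(b/2)), made odd by adding one if needed."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    nb = n // (2 * b)
    stat_rounds = -(-3 * n // 4)          # ceil(3n/4); the rounding is an assumption
    diffusion_rounds = nb + b // 2        # rounds needed for complete diffusion
    nr = stat_rounds + 2 * diffusion_rounds
    return nr + 1 if nr % 2 == 0 else nr
```

For SEA$_{48,8}$ this gives $36 + 2 \cdot (3 + 4) = 50$, which is even and is therefore raised to 51.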
3.5 Performance Analysis
SEA$_{n,b}$ is targeted at low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\gg$, $\ll$.
– Branch instructions: goto, subroutine call and return.
– Comparison, load RAM in register, store register in RAM.
According to the code in the appendix, the performance can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, this directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. This yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
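The cost formulas above can be evaluated for a given parameter set as follows. The function and dictionary key names are hypothetical; the expressions are the ones quoted in the text, so the output is only as accurate as those estimates.

```python
def sea_software_cost(n, b, nr):
    """Evaluate the rough cost estimates quoted in the text (all in ops or words)."""
    nb = n // (2 * b)
    round_ops = 12 * nb + 11
    key_round_ops = 10 * nb + 11
    return {
        "ram_words_and_registers": 5 * nb + 3,
        "code_size_ops": 31 * nb + 36,
        "round_ops": round_ops,
        "key_round_ops": key_round_ops,
        "total_ops": (nr - 1) * (round_ops + key_round_ops + 7)
        + round_ops + 8 * nb + 7,
    }
```

For SEA$_{48,8}$ with $n_r = 51$, the total is $50 \times 95 + 47 + 24 + 7 = 4828$ ops, with a code size of 129 ops.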
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependency on the target devices, the following general comments can be made:
– SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. SEA$_{n,b}$ trades space for time, which may be relevant given present processor speeds). Looking at the code size × cycles product, the efficiency of SEA$_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given. Let $p$ be a large (odd) prime and let $E : y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that $\#E(\mathbb{F}_p) = p + 1 - t$, where $t$ is an integer which satisfies the Hasse bound $-2\sqrt{p} \le t \le 2\sqrt{p}$.
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate, a calculation has to be carried out to generate a certain polynomial associated with $\ell$ that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve $E$ under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining $t$.
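The final recombination step can be illustrated on a toy example. In the Python sketch below, all names are hypothetical. The residues of $t$ are obtained from a brute-force point count on a tiny curve, which merely stands in for the modular computations of the real algorithm; the part being illustrated is only the Chinese Remainder Theorem recovery of $t$ inside the Hasse interval.

```python
def count_points_naive(p, a4, a6):
    """Brute-force #E(F_p) for y^2 = x^3 + a4*x + a6 (feasible only for tiny p)."""
    total = 1  # the point at infinity
    for x in range(p):
        rhs = (x * x * x + a4 * x + a6) % p
        if rhs == 0:
            total += 1                        # single point with y = 0
        elif pow(rhs, (p - 1) // 2, p) == 1:  # Euler's criterion: rhs is a square
            total += 2                        # two square roots
    return total


def crt(residues, moduli):
    """Combine x = r_i (mod l_i) for pairwise coprime l_i; return (x, product)."""
    x, m = 0, 1
    for r_i, l_i in zip(residues, moduli):
        k = ((r_i - x) * pow(m, -1, l_i)) % l_i  # modular inverse needs Python 3.8+
        x += m * k
        m *= l_i
    return x, m


def recover_trace(residues, moduli, p):
    """Recover t from its residues once the product of moduli exceeds 4*sqrt(p)."""
    t, m = crt(residues, moduli)
    if m * m <= 16 * p:
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")
    return t - m if t > m // 2 else t  # pick the representative with |t| < m/2
```

With $p = 101$ and the auxiliary primes 3, 5 and 7 (product 105, which exceeds $4\sqrt{101} \approx 40.2$), the residues of $t$ determine $t$ uniquely, because the Hasse interval is shorter than the product of the moduli.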
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$, with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms, such as Diffie-Hellman key exchange, El Gamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E = xr$ is about 25, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial in $\mathbb{Z}[F, J]$ associated with $\ell$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration, and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store). We start out with a given prime $p$ and an elliptic curve $E : y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram. Recoverable labels: key computational block (Mod, SBox, BR, WR, XOR, Round Reg) with key input KeyI[95:0], KeyLd, Clk, Rst and key registers KeyReg0[95:0] to KeyReg9[95:0]; encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg) with cipher data register, Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst; decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg) with plain text data register, Key0[95:0] to Key9[95:0], DataO[95:0], DataRd, Clk, Rst; SMC block with Clk, Rst, Ena, ED, Ovr.
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt data.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. PRNGs that use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek words kryptos (secret) and graphos (writing).
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio, for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also meant the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis began being used; it is now practically an art and science all on its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
The process of decoding data that has been encrypted into a secret format Decryption requires a secret key or password
It is a commonly held misconception that every encryption method can be broken In connection with his WWII work at Bell Labs Claude Shannon proved that the one-time pad cipher is unbreakable provided the key material is truly random never reused kept secret from all possible attackers and of equal or greater length than the message [22]
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by a brute-force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it were proven that the effort required (i.e., the work factor, in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute-force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is "gardening", used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute-force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute-force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keyregngc keyregngd
Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
Writing NGD file keyregngd
Writing NGDBUILD log file keyregbld
Release 61i Map G23Xilinx Mapping Report File for Design keyreg
Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
MAPPING REPORT
Release 61i Map G23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf
Target Device x2s15
Target Package cs144
Target Speed -6
Mapper Version spartan2 -- $Revision 116 $
Mapped Date Mon Mar 30 124243 2009
Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
Placing & Routing Report
Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
KEY REGISTER
Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj
TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed
Ignore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes
Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltkeyreggt
Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary
inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized
=========================================================================HDL Synthesis Report
Macro Statistics Registers 1 96-bit register 1
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28
========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg
Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 195
Macro Statistics Registers 1 96-bit register 1
Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================
Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25
WARNINGXst1336 - () More than 100 of Device resources are used
=========================================================================TIMING REPORT
NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE
Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+
Timing Summary---------------Speed Grade -6
Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found
Timing Detail--------------All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising
Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------
Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising
Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)
=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized
========================================================================HDL Synthesis Report
Found no macro=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1
========================================================================= Final Report =========================================================================
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 7
Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8
Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB
Mapping Report
Device utilization summary
Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0
Number of SLICEs 2 out of 192 1
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0
The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key and processor size.
It is a low-cost encryption routine targeted for processors with a limited instruction set.
It yields a small encryption routine for any given processor, with the security of the cipher adapted as a function of its key size.
It is well suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed for processors with a limited instruction set. It can be used:
In wireless communication, mobile computing and networking systems.
For the encryption of JPEG2000 images.
In scalable video coding.
In sensor networks and RFIDs.
CONCLUSION
$\mathrm{SEA}_{n,b}$ is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of $\mathrm{SEA}_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo-code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of $\mathrm{SEA}_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, M. Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
Device Download and Program File Formatting
BitGen - The BitGen program receives the placed and routed design and produces a bit stream for Xilinx device configuration
iMPACT - The iMPACT tool generates various programming file formats and subsequently allows you to configure your device
XPower - XPower enables you to interactively and automatically analyze power consumption for Xilinx FPGA and CPLD devices
Integration with ChipScope Pro
CH 3 Introduction to SEA
Most present symmetric encryption algorithms result from a tradeoff between implementation cost and resulting performance. In addition, they generally aim to be implemented efficiently on a large variety of platforms. In this paper, we take an opposite approach and consider a context where we have very limited processing resources and throughput requirements. For this purpose, we propose low-cost encryption routines (i.e., with small code size and memory) targeted for processors with a limited instruction set (i.e., AND, OR, XOR gates, word rotation and modular addition). The proposed design is parametric in the text, key and processor size, allows an efficient combination of encryption/decryption and "on-the-fly" key derivation, and its security against a number of recent cryptanalytic techniques is discussed. Target applications for such routines include any context requiring low-cost encryption and/or authentication.
We consequently consider a general context where we have very limited processing resources (e.g., a small processor) and throughput requirements. This yields design criteria such as low memory requirements, small code size and a limited instruction set. In addition, we propose flexibility as another, unusual design principle.
$\mathrm{SEA}_{n,b}$ is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g., 8-bit or 32-bit processors). By contrast, $\mathrm{SEA}_{n,b}$ yields a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, additional features were wanted, including the efficient combination of encryption and decryption and the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g., authentication). Finally, the simplicity of $\mathrm{SEA}_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. $\mathrm{SEA}_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
$\mathrm{SEA}_{n,b}$ operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds and is defined with respect to the following parameters:
- $n$: plaintext size, key size
- $b$: processor (or word) size
- $n_b = n/2b$: number of words per Feistel branch
- $n_r$: number of block cipher rounds
As the only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48-, 96-, 144-, ... bit block ciphers, respectively denoted as $\mathrm{SEA}_{48,8}$, $\mathrm{SEA}_{96,8}$, $\mathrm{SEA}_{144,8}$, ... Let $x$ be an $n/2$-bit vector. In the following, we will consider two representations:
- Bit representation: $x_b = x(n/2-1)\; x(n/2-2)\; \ldots\; x(2)\; x(1)\; x(0)$
- Word representation: $x_W = x_{n_b-1}\; x_{n_b-2}\; \ldots\; x_2\; x_1\; x_0$
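The parameter constraint and the word representation can be illustrated with a short Python sketch. The helper names `sea_params` and `to_words` are hypothetical, and the mapping of the least significant b bits to the word x_0 is an assumption made for illustration.

```python
def sea_params(n, b):
    """Return nb = n / (2b), the number of b-bit words per Feistel branch."""
    if n <= 0 or b <= 0 or n % (6 * b) != 0:
        raise ValueError("n must be a positive multiple of 6b")
    return n // (2 * b)

def to_words(x, nb, b):
    """Split an n/2-bit integer into nb words, least significant word first.

    Assumption of this sketch: word x_0 holds bits x(b-1) ... x(0).
    """
    mask = (1 << b) - 1
    return [(x >> (b * i)) & mask for i in range(nb)]

nb_48 = sea_params(48, 8)               # three 8-bit words per branch
words = to_words(0x010203, nb_48, 8)    # index 0 is the least significant word
```

Since n is a multiple of 6b, nb is always a multiple of 3, which is what allows the substitution box and bit rotation to work on groups of three words.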
Basic Operations
Due to its simplicity constraints, $\mathrm{SEA}_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows
1 Bitwise XOR
The bitwise XOR is defined on $n/2$-bit vectors:
\[ \oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le n/2 - 1 \]
2. Substitution Box $S$
$\mathrm{SEA}_{n,b}$ uses the following 3-bit substitution table:
$S_T = \{0, 5, 6, 7, 4, 3, 1, 2\}$
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data, using the following recursive definition:
$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i},$
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1},$
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}, \quad 0 \le i \le \frac{n_b}{3} - 1,$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
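The recursive definition can be checked against the table $S_T$ with a short sketch. Python is used purely for illustration; the function names and the convention that word $x_{3i}$ carries the least significant bit of each 3-bit input are assumptions made here, not part of the specification.

```python
S_T = [0, 5, 6, 7, 4, 3, 1, 2]  # 3-bit substitution table quoted in the text


def sbox_words(x):
    """Apply S to a list of words, three words at a time (bitslice form).

    Each assignment reuses the word updated just before it, which is the
    sense in which the definition is recursive.
    """
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def sbox_value(v):
    """Evaluate the bitslice S-box on one 3-bit value, using 1-bit words.

    Assumption: x_{3i} holds the least significant bit of the input.
    """
    x0, x1, x2 = sbox_words([v & 1, (v >> 1) & 1, (v >> 2) & 1])
    return x0 | (x1 << 1) | (x2 << 2)


table = [sbox_value(v) for v in range(8)]
fixed_points = [v for v in range(8) if table[v] == v]
```

Under that bit-ordering assumption, the three logical operations reproduce the table entry by entry, which is why the same code can process $b$ substitution boxes in parallel on $b$-bit words.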
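3. Word Rotation $R$
The word rotation is defined on $n_b$-word vectors:
$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \Leftrightarrow y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \quad y_0 = x_{n_b-1}$
4. Bit Rotation $r$
The bit rotation is defined on $n_b$-word vectors:
$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \Leftrightarrow y_{3i} = x_{3i} \ggg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \lll 1, \quad 0 \le i \le \frac{n_b}{3} - 1,$
where $\ggg$ and $\lll$ represent the cyclic right and left shifts inside a word.
5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$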
The Round and Key Round
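Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Figure 1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^2$ such that:
$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$
$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}\big(L_i \oplus r\big(S(R_i \boxplus K_i)\big)\big), \quad L_{i+1} = R_i$
$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R\big(r\big(S(KR_i \boxplus C_i)\big)\big), \quad KL_{i+1} = KR_i$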
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size $n$. The operations within the cipher are performed considering parametric $b$-bit words.

C = SEA_{n,b}(P, K)
  initialization:
    L_0 & R_0 = P
    KL_0 & KR_0 = K
  key scheduling:
    for i in 1 to floor(n_r/2)
      [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i))
    switch KL_{floor(n_r/2)}, KR_{floor(n_r/2)}
    for i in ceil(n_r/2) to n_r - 1
      [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(n_r - i))
  encryption:
    for i in 1 to ceil(n_r/2)
      [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1})
    for i in ceil(n_r/2) + 1 to n_r
      [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1})
  final:
    C = R_{n_r} & L_{n_r}
    switch KL_{n_r-1}, KR_{n_r-1}

where & is the concatenation operator, $KR_{\lfloor n_r/2 \rfloor}$ is taken before the switch, and C(i) is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round $F_D$.
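The pseudo-code above can be turned into a small executable sketch. The version below is an illustration under stated assumptions, not a reference implementation: each Feistel branch is a Python list of $n_b$ words with index 0 as the least significant word, the way bytes are packed into words is left to the caller, and all function names are hypothetical. No official test vectors are used; the sketch only checks that the round-key sequence is symmetric and that decryption inverts encryption.

```python
def rotl1(w, b):
    """Cyclic left shift of a b-bit word by one position."""
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)

def rotr1(w, b):
    """Cyclic right shift of a b-bit word by one position."""
    return ((w >> 1) | ((w & 1) << (b - 1))) & ((1 << b) - 1)

def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]

def add(x, y, b):                      # word-wise addition mod 2^b
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, y)]

def sbox(x):                           # bitslice S on groups of three words
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y

def bit_rot(x, b):                     # r: rotate words 3i right, 3i+2 left
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = rotr1(y[i], b)
        y[i + 2] = rotl1(y[i + 2], b)
    return y

def word_rot(x):                       # R: y[i+1] = x[i], y[0] = x[nb-1]
    return [x[-1]] + x[:-1]

def inv_word_rot(x):                   # R^-1
    return x[1:] + [x[0]]

def f(x, k, b):
    return bit_rot(sbox(add(x, k, b)), b)

def enc_round(L, R, K, b):             # F_E
    return R, xor(word_rot(L), f(R, K, b))

def dec_round(L, R, K, b):             # F_D
    return R, inv_word_rot(xor(L, f(R, K, b)))

def key_round(KL, KR, C, b):           # F_K
    return KR, xor(KL, word_rot(f(KR, C, b)))

def round_keys(KL, KR, nr, b):
    """Return the nr round keys in the order the encryption rounds use them."""
    nb = len(KL)
    const = lambda i: [i] + [0] * (nb - 1)   # C(i): only the LSW is non-zero
    half = nr // 2                           # floor(nr/2); nr is odd
    keys = [KR]                              # KR_0
    for i in range(1, half + 1):
        KL, KR = key_round(KL, KR, const(i), b)
        keys.append(KR)                      # KR_half is taken before the switch
    KL, KR = KR, KL                          # switch
    for i in range(half + 1, nr):            # ceil(nr/2) .. nr-1
        KL, KR = key_round(KL, KR, const(nr - i), b)
        keys.append(KL)
    return keys

def crypt(L, R, keys, b, round_fn):
    for k in keys:
        L, R = round_fn(L, R, k, b)
    return R, L                              # final swap: C = R_nr & L_nr

# Example with n = 48, b = 8 (so nb = 3) and an odd round count of 51.
b, nr = 8, 51
P = ([0x01, 0x23, 0x45], [0x67, 0x89, 0xAB])     # arbitrary plaintext halves
K = ([0x10, 0x32, 0x54], [0x76, 0x98, 0xBA])     # arbitrary key halves
keys = round_keys(K[0], K[1], nr, b)
C = crypt(P[0], P[1], keys, b, enc_round)
D = crypt(C[0], C[1], keys, b, dec_round)
```

Because the second half of the key schedule undoes the first, the list `keys` reads the same in both directions, so the same list serves encryption and decryption and only the round function changes.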
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– λ-parameter: 1/2
– δ-parameter: 1/4
– Maximum nonlinear order, namely 2
– Recursive definition
– Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher, when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that $\mathrm{SEA}_{n,b}$ divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to $\mathrm{SEA}_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \dots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \dots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the differential and linear parameters of the 3-round cipher, $\Delta$ and $\Lambda$, are respectively smaller than $\delta^2$ and $\lambda^2$.
Since our nonlinear function $S$ has parameters $\delta = 2^{-2}$ and $\lambda = 2^{-1}$, it implies that 3 rounds of $\mathrm{SEA}_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an $n$-bit block cipher, it is respectively required that $\Delta$ be of the order of $2^{-n}$ and $\Lambda$ of the order of $2^{-n/2}$ to resist differential [4] and linear [28] cryptanalysis. In order to approach these bounds, we require that
$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n}$ and $\lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2}. \quad (1)$
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of, e.g., the AES Rijndael [17], but we only assume one active s-box per round.
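The step from Equation (1) to the round count can be written out explicitly. The derivation below only compares exponents of 2 and uses no assumption beyond the values $\delta = 2^{-2}$ and $\lambda = 2^{-1}$ given above; taken literally, the strict inequalities give $n_r > 3n/4$, which the text rounds to $n_r \ge 3n/4$.

```latex
\begin{align*}
\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} = 2^{-4n_r/3} < 2^{-n}
  &\iff \frac{4n_r}{3} > n \iff n_r > \frac{3n}{4},\\
\lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} = 2^{-2n_r/3} < 2^{-n/2}
  &\iff \frac{2n_r}{3} > \frac{n}{2} \iff n_r > \frac{3n}{4}.
\end{align*}
```

For instance, with $n = 48$ both conditions ask for more than 36 rounds.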
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and the boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with $\mathrm{SEA}_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \lll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
– $S(x \lll a) = S(x) \lll a$
– $r(x \lll a) = r(x) \lll a$
– $R(x \lll a) = R(x) \lll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \lll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F'_E$ and the key round $F_K$ satisfy:
$F'_E(L \lll a, R \lll a, K \lll a) = F'_E(L, R, K) \lll a$
$F_K(KL \lll a, KR \lll a, 0) = F_K(KL, KR, 0) \lll a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \lll a$. In the actual $\mathrm{SEA}_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \lll a) \boxplus (K \lll a) = (X \boxplus K) \lll a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to 3/8 as $b$ grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent $\mathrm{SEA}_{n,b}$ from having such weak keys.
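The probability quoted for $a = 1$ can be checked by exhaustive counting on a single word. The sketch below is only an illustration (the function names are hypothetical): it enumerates all pairs $(X, K)$ of $b$-bit words and counts how often rotation commutes with addition mod $2^b$.

```python
from fractions import Fraction


def rotl(w, a, b):
    """Cyclic left rotation of a b-bit word w by a positions."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)


def rotation_commutes_prob(a, b):
    """Exact probability, over all b-bit X and K, that
    (X <<< a) + (K <<< a) == (X + K) <<< a, with additions mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return Fraction(hits, 1 << (2 * b))


p4 = rotation_commutes_prob(1, 4)   # 4-bit words
p8 = rotation_commutes_prob(1, 8)   # 8-bit words, as in SEA_{n,8}
```

For a one-bit rotation the property holds exactly when the two most significant bits are not both set and no carry enters the most significant position, which gives $\frac{3}{4} \cdot \frac{2^{b-1}+1}{2^b}$ and hence the limit 3/8.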
Moreover, the round constants $C_i$ are generally not such that $C_i \lll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from $K$ and those derived from $K \lll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of $\mathrm{SEA}_{n,b}$.
Square Attacks
We explored square attacks [16] on $\mathrm{SEA}_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in the figure. It is expected that this remains the same when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-$\mathrm{SEA}_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in the figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of $\mathrm{SEA}_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of $\mathrm{SEA}_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, of which the vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of $\mathrm{SEA}_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the $\mathrm{SEA}_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference makes it possible to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of $\mathrm{SEA}_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \Leftrightarrow \bar{P} \xrightarrow{\bar{K}} \bar{C}$. The non-linear key scheduling and the presence of carry propagations in the actual $\mathrm{SEA}_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, $\mathrm{SEA}_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks, plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
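The round count suggested above is easy to tabulate. The helper below is a hypothetical sketch, not part of the original design documents; rounding $3n/4$ up when it is not an integer is an assumption made here, since the text does not say how that case is handled.

```python
def suggested_rounds(n, b):
    """Odd round count nr = 3n/4 + 2 * (nb + floor(b/2)), plus one if even."""
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6b")
    nb = n // (2 * b)                  # words per Feistel branch
    stat = (3 * n + 3) // 4            # ceil(3n/4): rounding is an assumption
    nr = stat + 2 * (nb + b // 2)      # statistical part + twice full diffusion
    return nr if nr % 2 == 1 else nr + 1


rounds_48_8 = suggested_rounds(48, 8)
rounds_96_8 = suggested_rounds(96, 8)
```

For $\mathrm{SEA}_{48,8}$ this gives $36 + 2 \cdot (3 + 4) = 50$, raised to 51 to make it odd.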
3.5 Performance Analysis
$\mathrm{SEA}_{n,b}$ is targeted for implementation on low-cost processors, with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\ggg$, $\lll$
– Branch instructions: goto, subroutine call and return
– Comparison, load RAM in register, store register in RAM
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
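The estimates above can be evaluated for a concrete parameter set. The function below is an illustrative sketch that simply transcribes the formulas of this section; its name and the dictionary keys are hypothetical.

```python
def sea_cost(n, b, nr):
    """Software cost estimates (in ops or words) from the formulas in the text."""
    nb = n // (2 * b)                      # words per Feistel branch
    round_ops = 12 * nb + 11               # one encrypt/decrypt round
    key_round_ops = 10 * nb + 11           # one key round
    return {
        "ram_and_registers": 5 * nb + 3,
        "registers": nb + 3,
        "code_size_ops": 31 * nb + 36,
        "time_ops": (nr - 1) * (round_ops + key_round_ops + 7)
                    + round_ops + 8 * nb + 7,
    }


cost = sea_cost(48, 8, 51)   # SEA_{48,8} with 51 rounds
```

For these parameters the round and key round cost 47 and 41 ops, so the total is $50 \times 95 + 47 + 31 = 4828$ ops.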
For illustration purposes, we implemented $\mathrm{SEA}_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between $\mathrm{SEA}_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– $\mathrm{SEA}_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of $\mathrm{SEA}_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of $\mathrm{SEA}_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of $\mathrm{SEA}_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. $\mathrm{SEA}_{n,b}$ trades space for time, which may be relevant given present processor speeds). Looking at the code size × cycles product, the efficiency of $\mathrm{SEA}_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E : y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions, described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that $\#E(\mathbb{F}_p) = p + 1 - t$, where $t$ is an integer which satisfies the Hasse bound $-2\sqrt{p} \le t \le 2\sqrt{p}$.
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial, indexed by $\ell$, that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve $E$ under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we succeeded in determining $t$.
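The final reconstruction step can be illustrated on a toy example. In the sketch below, the residues $t \bmod \ell$ are obtained by brute-force point counting over a very small field, which replaces the isogeny computations of the real algorithm and is used only to have something to feed to the Chinese Remainder Theorem. The prime $p = 101$, the coefficients and all function names are arbitrary choices made for illustration.

```python
from math import prod


def count_points(p, a4, a6):
    """Naive count of #E(F_p) for y^2 = x^3 + a4*x + a6, including infinity."""
    total = 1                                   # point at infinity
    for x in range(p):
        rhs = (x * x * x + a4 * x + a6) % p
        if rhs == 0:
            total += 1                          # single point with y = 0
        elif pow(rhs, (p - 1) // 2, p) == 1:    # Euler's criterion: rhs is a square
            total += 2                          # two square roots +y and -y
    return total


def crt(residues, moduli):
    """Combine residues modulo pairwise coprime moduli into one residue mod M."""
    M = prod(moduli)
    t = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        t += r * Mi * pow(Mi, -1, m)            # pow(..., -1, m): modular inverse
    return t % M, M


p, a4, a6 = 101, 2, 3
assert (4 * a4**3 + 27 * a6**2) % p != 0        # the curve is non-singular mod p

t_true = p + 1 - count_points(p, a4, a6)        # stand-in for the SEA residues
primes = [2, 3, 5, 7]                           # product 210 exceeds 4*sqrt(101)
t_mod, M = crt([t_true % l for l in primes], primes)

# The Hasse interval has width 4*sqrt(p) < M, so the representative of
# t_mod lying in (-M/2, M/2] is the trace itself.
t = t_mod if t_mod <= M // 2 else t_mod - M
num_points = p + 1 - t
```

The choice of representative is what the condition "product exceeds $4\sqrt{p}$" buys: exactly one integer in the Hasse interval is congruent to the combined residue.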
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$, with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms, such as Diffie-Hellman key exchange, El Gamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial in $\mathbb{Z}[F, J]$ that depends on $\ell$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration, and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime $p$ and an elliptic curve $E : y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
[Block diagram. Key computational block: Mod, SBox, BR, WR, XOR, Round Reg, with key registers KeyReg0[95:0] to KeyReg9[95:0]. Encryption computational block: Mod, SBox, BR, WR, XOR, Round Reg, with cipher data register; signals Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst. Decryption computational block: Mod, SBox, BR, IWR, XOR, Round Reg, with plain text data register; signals Key0[95:0] to Key9[95:0], DataO[95:0], DataRd, Clk, Rst. SMC: Clk, Rst, Ena, ED, Ovr. Key input: KeyI[95:0], KeyLd, Clk, Rst.]
FIG 5.1
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
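Both ways of obtaining a key can be sketched with the Python standard library. This is an illustration under assumptions, not the key-loading mechanism of the hardware described in this report: the 12-byte (96-bit) length is chosen to match the $\mathrm{SEA}_{96,8}$ key size, PBKDF2 with SHA-256 is swapped in for the plain SHA-1 hashing mentioned above, and the iteration count and function names are arbitrary.

```python
import hashlib
import secrets

KEY_BYTES = 12  # 96 bits; assumption: sized for a 96-bit SEA key


def random_key(nbytes=KEY_BYTES):
    """Key drawn from the operating system's entropy-seeded generator."""
    return secrets.token_bytes(nbytes)


def key_from_passphrase(passphrase, salt, nbytes=KEY_BYTES, iterations=200_000):
    """Key derived from a passphrase with PBKDF2-HMAC-SHA256.

    The salt must be stored with the ciphertext so the key can be re-derived.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=nbytes
    )


salt = secrets.token_bytes(16)
k_random = random_key()
k_derived = key_from_passphrase("correct horse battery staple", salt)
```

The derived key is reproducible from the passphrase and salt, whereas the random key must itself be stored or exchanged.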
The simplest method to read encrypted data is a brute force attack: simply attempting every number, up to the maximum length of the key. Therefore, it is important to use a sufficiently long key length; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos: secret; graphos: writing
5.2 ENCRYPTION
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also meant the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis began being used; it is now practically an art and science all on its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
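The conditions listed for the one-time pad map directly onto a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and `secrets.token_bytes` draws from the operating system's cryptographic generator, which is a practical stand-in for, not a guarantee of, the truly random key material that Shannon's result assumes.

```python
import secrets


def otp_encrypt(message: bytes):
    """Encrypt with a fresh key as long as the message; returns (key, ciphertext)."""
    key = secrets.token_bytes(len(message))      # one key per message, never reused
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext


def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same key recovers the message."""
    if len(key) < len(ciphertext):
        raise ValueError("key must be at least as long as the ciphertext")
    return bytes(c ^ k for c, k in zip(ciphertext, key))


key, ct = otp_encrypt(b"attack at dawn")
recovered = otp_decrypt(key, ct)
```

The key is as long as the message and must be delivered secretly, which is the practical cost that makes the scheme rare outside special settings.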
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such showing can currently be made, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
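The PIN example can be made concrete with a small Python sketch. It is an illustration under stated assumptions rather than a model of any real device: `leaky_equal` is a hypothetical early-exit comparison, and the number of byte comparisons stands in for elapsed time. `hmac.compare_digest` is the standard-library comparison intended to avoid this kind of leak.

```python
import hmac

def leaky_equal(guess: bytes, secret: bytes):
    """Early-exit comparison; returns (match, number of byte comparisons)."""
    steps = 0
    if len(guess) != len(secret):
        return False, steps
    for g, s in zip(guess, secret):
        steps += 1
        if g != s:
            return False, steps      # stops at the first wrong character
    return True, steps

secret = b"4931"                     # hypothetical PIN
# The step count grows with the length of the correct prefix,
# so an attacker can recover the PIN one character at a time.
print(leaky_equal(b"0000", secret)[1])        # 1
print(leaky_equal(b"4900", secret)[1])        # 3
# Comparison whose running time does not depend on where the mismatch is.
print(hmac.compare_digest(b"4900", secret))   # False
```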
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 Decryption block
FIG 5.5 Decryption operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date : Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0, 0%
  Number of Slices containing unrelated logic: 0 out of 0, 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86, 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4, 25%
  Number of GCLKIOBs: 1 out of 4, 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB

Placing & Routing Report
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
  Found 96-bit register for signal <Dreg>.
  Summary:
    inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report
Macro Statistics
# Registers : 1
  96-bit register : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 195

Macro Statistics :
# Registers : 1
  96-bit register : 1

Cell Usage :
# BELS : 1
  LUT1 : 1
# FlipFlops/Latches : 96
  FDCE : 96
# Clock Buffers : 1
  BUFGP : 1
# IO Buffers : 194
  IBUF : 98
  OBUF : 96
=========================================================================

Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6

Number of Slices: 55 out of 192, 28%
Number of Slice Flip Flops: 96 out of 384, 25%
Number of 4 input LUTs: 1 out of 384, 0%
Number of bonded IOBs: 194 out of 90, 215% (*)
Number of GCLKs: 1 out of 4, 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -6
  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising
  Data Path: KeyEna to Dreg_95

  Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
  IBUF:I->O      96       0.776        6.300       KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                 0.886                    Dreg_0
  Total: 7.962ns (1.662ns logic, 6.300ns route) (20.9% logic, 79.1% route)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising
  Data Path: Dreg_95 to KeyO<95>

  Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
  FDCE:C->Q      1        1.085        1.035       Dreg_95 (Dreg_95)
  OBUF:I->O               4.668                    KeyO_95_OBUF (KeyO<95>)
  Total: 6.788ns (5.753ns logic, 1.035ns route) (84.8% logic, 15.2% route)

=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 7

Cell Usage :
# BELS : 3
  LUT4 : 3
# IO Buffers : 7
  IBUF : 4
  OBUF : 3
=========================================================================

Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6

Number of Slices: 2 out of 192, 1%
Number of 4 input LUTs: 3 out of 384, 0%
Number of bonded IOBs: 7 out of 90, 7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384, 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192, 1%
  Number of Slices containing only related logic: 2 out of 2, 100%
  Number of Slices containing unrelated logic: 0 out of 2, 0%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384, 1%
  Number of bonded IOBs: 7 out of 86, 8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB

Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86, 8%
  Number of LOCed External IOBs: 0 out of 7, 0%
  Number of SLICEs: 2 out of 192, 1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384, 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384, 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192, 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665, 97%
  Number of Slices containing unrelated logic: 17 out of 665, 2%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384, 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86, 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4, 25%
  Number of GCLKIOBs: 1 out of 4, 25%

Total equivalent gate count for design: 17,572
Additional JTAG gate count for IOBs: 50,928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key and processor size.
It is a low-cost encryption routine targeted for processors with a limited instruction set.
It yields a small encryption routine for any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed for processors with a limited instruction set. Its applications include:
Wireless communication, mobile computing and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
$\mathrm{SEA}_{n,b}$ is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of $\mathrm{SEA}_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performances (encryption + decryption) for present key sizes and processors (e.g., 128-bit key, 1 MHz 8-bit RISC) are in the range of an encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo-code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of $\mathrm{SEA}_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell, Kamran Eshraghian
A VHDL Primer, J. Bhaskar
Digital Design, Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
$\mathrm{SEA}_{n,b}$ is parametric in the text, key and processor size. Such an approach was motivated by the fact that many algorithms behave differently on different platforms (e.g., 8-bit or 32-bit processors). In contrast, $\mathrm{SEA}_{n,b}$ yields a small encryption routine targeted to any given processor, the security of the cipher being adapted as a function of its key size. Beyond these general guidelines, alternative features were wanted, including the efficient combination of encryption and decryption or the ability to derive keys "on the fly".
Those goals are particularly relevant in contexts where the same constrained device has to perform encryption and decryption operations (e.g., authentication). Finally, the simplicity of $\mathrm{SEA}_{n,b}$ makes its implementation straightforward. Embedded applications such as building infrastructures present a significant opportunity and challenge for such new cryptosystems.
For example, introducing programmability into the configuration of lights and switches, thermostats and air handlers promises to improve the cost of construction, flexibility in occupancy, and energy efficiency of buildings. But meeting this demand on a scale compatible with the economics of the construction industry is going to require secure, lightweight implementations of peer-to-peer networks in resource-constrained systems. The Internet-0 approach to end-to-end modulation for interdevice internetworking is typically appropriate in this limit [20]. $\mathrm{SEA}_{n,b}$ constitutes a suitable solution for low-cost encryption/authentication within such networks. RFIDs or any power/space-limited applications are similarly targeted.
3.1 Specifications
Parameters and Definitions
$\mathrm{SEA}_{n,b}$ operates on various text, key and word sizes. It is based on a Feistel structure with a variable number of rounds, and is defined with respect to the following parameters:
– $n$: plaintext size, key size
– $b$: processor (or word) size
– $n_b = \frac{n}{2b}$: number of words per Feistel branch
– $n_r$: number of block cipher rounds
As the only constraint, it is required that $n$ is a multiple of $6b$. For example, using an 8-bit processor, we can derive 48, 96, 144, ...-bit block ciphers, respectively denoted as $\mathrm{SEA}_{48,8}$, $\mathrm{SEA}_{96,8}$, $\mathrm{SEA}_{144,8}$, ... Let $x$ be an $\frac{n}{2}$-bit vector. In the following, we will consider two representations:
– Bit representation: $x_b = x(\frac{n}{2}-1)\; x(\frac{n}{2}-2)\; \ldots\; x(2)\; x(1)\; x(0)$
– Word representation: $x_W = x_{n_b-1}\; x_{n_b-2}\; \ldots\; x_2\; x_1\; x_0$
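The parameter constraint can be checked mechanically. The following Python sketch (the helper name `words_per_branch` is ours, not part of the specification) computes $n_b$ and rejects sizes for which $n$ is not a multiple of $6b$.

```python
def words_per_branch(n: int, b: int) -> int:
    """Return n_b = n / (2b), enforcing that n is a positive multiple of 6b."""
    if n <= 0 or b <= 0 or n % (6 * b) != 0:
        raise ValueError("n must be a positive multiple of 6*b")
    return n // (2 * b)

# The three 8-bit examples quoted in the text.
for n in (48, 96, 144):
    print(n, words_per_branch(n, 8))   # 48 3, then 96 6, then 144 9
```

Because $n$ is a multiple of $6b$, $n_b$ is always a multiple of 3, which is what allows each branch to be split into groups of three words.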
Basic Operations
Due to its simplicity constraints, $\mathrm{SEA}_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1. Bitwise XOR $\oplus$
The bitwise XOR is defined on $\frac{n}{2}$-bit vectors:
$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \to z = x \oplus y \Leftrightarrow z(i) = x(i) \oplus y(i), \quad 0 \le i \le \frac{n}{2} - 1$
2. Substitution Box $S$
$\mathrm{SEA}_{n,b}$ uses the following 3-bit substitution table:
$S_T = \{0, 5, 6, 7, 4, 3, 1, 2\}$
in C-like notation. For efficiency purposes, it is applied bitwise to any set of three words of data using the following recursive definition:
$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to x = S(x) \Leftrightarrow$
$x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i}$
$x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1}$
$x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2}, \quad 0 \le i \le \frac{n_b}{3} - 1$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR.
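The recursive definition can be checked against the table. The Python sketch below applies the three assignments in order, each one using the words already updated by the previous ones, and compares every 3-bit input with $S_T$. The bit ordering, with $x_{3i+2}$ as the most significant bit of the table index, is our reading; it is the ordering under which the recursion reproduces the table.

```python
ST = [0, 5, 6, 7, 4, 3, 1, 2]    # 3-bit substitution table from the text

def sbox(x):
    """Apply S to a list of words, three words at a time (bitslice form)."""
    x = list(x)
    assert len(x) % 3 == 0
    for i in range(len(x) // 3):
        x[3*i]     ^= x[3*i + 2] & x[3*i + 1]
        x[3*i + 1] ^= x[3*i + 2] & x[3*i]      # uses the updated x[3i]
        x[3*i + 2] ^= x[3*i] | x[3*i + 1]      # uses both updated words
    return x

# Exhaustive check with 1-bit "words": every 3-bit input must match the table.
for v in range(8):
    x0, x1, x2 = v & 1, (v >> 1) & 1, (v >> 2) & 1
    y0, y1, y2 = sbox([x0, x1, x2])
    assert (y2 << 2) | (y1 << 1) | y0 == ST[v]

print([v for v in range(8) if ST[v] == v])     # fixed points: [0, 4]
```

With $b$-bit words the same three instructions process $b$ independent 3-bit inputs at once, which is why no table lookup is needed.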
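3. Word Rotation $R$
The word rotation is defined on $n_b$-word vectors:
$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = R(x) \Leftrightarrow y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \qquad y_0 = x_{n_b-1}$
4. Bit Rotation $r$
The bit rotation is defined on $n_b$-word vectors:
$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \to y = r(x) \Leftrightarrow y_{3i} = x_{3i} \gg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \ll 1, \quad 0 \le i \le \frac{n_b}{3} - 1$
where $\gg$ and $\ll$ represent the cyclic right and left shifts inside a word.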
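The word and bit rotations are both simple index or shift manipulations. A minimal Python sketch, assuming a branch is stored as a list of words with index 0 holding the least significant word (the function names are ours):

```python
def word_rot(x):
    """R: y[i+1] = x[i] for 0 <= i <= nb-2, and y[0] = x[nb-1]."""
    return [x[-1]] + list(x[:-1])

def inv_word_rot(y):
    """R^-1: undo word_rot."""
    return list(y[1:]) + [y[0]]

def bit_rot(x, b):
    """r: rotate word 3i right by one bit, keep 3i+1, rotate 3i+2 left by one bit."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(len(x) // 3):
        y[3*i]     = ((x[3*i] >> 1) | (x[3*i] << (b - 1))) & mask
        y[3*i + 2] = ((x[3*i + 2] << 1) | (x[3*i + 2] >> (b - 1))) & mask
    return y

x = [0x01, 0x55, 0x80]
print(word_rot(x))      # [128, 1, 85]
print(bit_rot(x, 8))    # [128, 85, 1]
```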
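5. Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$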
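Note that the addition acts word by word: a carry out of one word is discarded and never propagates into the next word. A short Python sketch (the function name is ours):

```python
def add_mod(x, y, b):
    """Word-wise addition mod 2**b; carries do not cross word boundaries."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]

print(add_mod([0xFF, 0x01, 0x80], [0x01, 0x01, 0x80], 8))   # [0, 2, 0]
```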
The Round and Key Round
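Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$ and key round $F_K$ are pictured in Fig. 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^2$ such that:
$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r\bigl(S(R_i \boxplus K_i)\bigr), \quad L_{i+1} = R_i$
$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}\bigl(L_i \oplus r\bigl(S(R_i \boxplus K_i)\bigr)\bigr), \quad L_{i+1} = R_i$
$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R\bigl(r\bigl(S(KR_i \boxplus C_i)\bigr)\bigr), \quad KL_{i+1} = KR_i$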
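As a worked check, using only the definitions of the round functions and the fact that $R^{-1}$ undoes $R$, the decrypt round applied to the swapped output of an encrypt round with the same round key returns the swapped input:

```latex
\begin{aligned}
(L_{i+1}, R_{i+1}) &= F_E(L_i, R_i, K_i)
  = \bigl(R_i,\; R(L_i) \oplus r(S(R_i \boxplus K_i))\bigr) \\
F_D(R_{i+1}, L_{i+1}, K_i)
  &= \bigl(L_{i+1},\; R^{-1}\bigl(R_{i+1} \oplus r(S(L_{i+1} \boxplus K_i))\bigr)\bigr) \\
  &= \bigl(R_i,\; R^{-1}\bigl(R(L_i) \oplus r(S(R_i \boxplus K_i)) \oplus r(S(R_i \boxplus K_i))\bigr)\bigr) \\
  &= \bigl(R_i,\; R^{-1}(R(L_i))\bigr) = (R_i, L_i)
\end{aligned}
```

The two copies of $r(S(R_i \boxplus K_i))$ cancel under XOR, so neither $S$ nor the modular addition has to be inverted.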
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext $P$ under a key $K$ and produces a ciphertext $C$. $P$, $C$ and $K$ have a parametric bit size $n$. The operations within the cipher are performed considering parametric $b$-bit words.

    C = SEA_{n,b}(P, K)
      initialization:
        L_0 & R_0 = P
        KL_0 & KR_0 = K
      key scheduling:
        for i in 1 to ⌊n_r/2⌋
          [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i))
        switch KL_{⌊n_r/2⌋}, KR_{⌊n_r/2⌋}
        for i in ⌈n_r/2⌉ to n_r - 1
          [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(n_r - i))
      encryption:
        for i in 1 to ⌈n_r/2⌉
          [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1})
        for i in ⌈n_r/2⌉ + 1 to n_r
          [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1})
      final:
        C = R_{n_r} & L_{n_r}
        switch KL_{n_r-1}, KR_{n_r-1}

where & is the concatenation operator, $KR_{\lfloor n_r/2 \rfloor}$ is taken before the switch, and $C(i)$ is an $n_b$-word vector of which all the words have value 0 except the least significant word (LSW), which equals $i$. Decryption is exactly the same, using the decrypt round $F_D$.
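The pseudo-code can be turned into a small executable model. The Python sketch below is an illustration under stated assumptions, not a reference implementation: it has not been checked against official test vectors, a branch is stored as a list of $n_b$ words with the least significant word at index 0, the second key-schedule loop is read as starting at $\lceil n_r/2 \rceil$ with constant $C(n_r - i)$, and the round count used in the example (`nr=5`) is arbitrary and far too small for real use. All names (`make_sea`, `round_keys`, ...) are ours.

```python
def make_sea(n, b, nr):
    """Return (encrypt, decrypt, round_keys) for SEA_{n,b} with nr rounds (nr odd)."""
    assert n % (6 * b) == 0 and nr % 2 == 1
    nb, mask = n // (2 * b), (1 << b) - 1

    def S(x):                                   # bitslice substitution box
        x = list(x)
        for i in range(nb // 3):
            x[3*i] ^= x[3*i+2] & x[3*i+1]
            x[3*i+1] ^= x[3*i+2] & x[3*i]
            x[3*i+2] ^= x[3*i] | x[3*i+1]
        return x

    def r(x):                                   # bit rotation
        y = list(x)
        for i in range(nb // 3):
            y[3*i] = ((x[3*i] >> 1) | (x[3*i] << (b - 1))) & mask
            y[3*i+2] = ((x[3*i+2] << 1) | (x[3*i+2] >> (b - 1))) & mask
        return y

    rot = lambda x: [x[-1]] + x[:-1]            # word rotation R
    rot_inv = lambda x: x[1:] + [x[0]]          # inverse word rotation
    xor = lambda x, y: [u ^ v for u, v in zip(x, y)]
    add = lambda x, y: [(u + v) & mask for u, v in zip(x, y)]
    const = lambda i: [i & mask] + [0] * (nb - 1)   # C(i): only the LSW is non-zero

    FE = lambda L, Rh, K: (Rh, xor(rot(L), r(S(add(Rh, K)))))
    FD = lambda L, Rh, K: (Rh, rot_inv(xor(L, r(S(add(Rh, K))))))
    FK = lambda KL, KR, C: (KR, xor(KL, rot(r(S(add(KR, C))))))

    def round_keys(KL, KR):
        half = nr // 2
        keys = [KR]                             # round 1 uses KR_0
        for i in range(1, half + 1):
            KL, KR = FK(KL, KR, const(i))
            keys.append(KR)                     # KR_i, taken before the switch
        KL, KR = KR, KL                         # switch
        for i in range(half + 1, nr):
            KL, KR = FK(KL, KR, const(nr - i))
            keys.append(KL)                     # second half uses KL_i
        return keys

    def crypt(F, text, key):
        L, Rh = text
        for K in round_keys(*key):
            L, Rh = F(L, Rh, K)
        return Rh, L                            # output is R_nr & L_nr

    return (lambda p, k: crypt(FE, p, k)), (lambda c, k: crypt(FD, c, k)), round_keys

encrypt, decrypt, round_keys = make_sea(n=48, b=8, nr=5)   # toy round count
P = ([1, 2, 3], [4, 5, 6])
K = ([7, 8, 9], [10, 11, 12])
C = encrypt(P, K)
print(decrypt(C, K) == P)    # True
```

The round trip succeeds because the round-key list produced by `round_keys` reads the same forwards and backwards, so decryption can reuse the encryption key schedule unchanged.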
3.2 Design Properties of the Components
Substitution Box $S$
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– $\lambda$-parameter: $1/2$
– $\delta$-parameter: $1/4$
– Maximum nonlinear order, namely 2
– Recursive definition
– Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the $3 \times 3$ bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher, when combined with the bitslice substitution box. This is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that $\mathrm{SEA}_{n,b}$ divides its data into $\frac{2 n_b}{3}$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
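The diffusion bound is easy to evaluate for concrete parameters. The Python sketch below reads the second-phase bound as $\lfloor b/2 \rfloor$, which is an assumption about the original notation; the function name is ours.

```python
def diffusion_rounds(n: int, b: int) -> int:
    """Upper bound on rounds to complete diffusion: n_b + floor(b / 2)."""
    nb = n // (2 * b)        # words per Feistel branch
    return nb + b // 2

print(diffusion_rounds(48, 8))    # 7  (3 rounds for phase one, 4 for phase two)
print(diffusion_rounds(96, 8))    # 10
```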
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to $\mathrm{SEA}_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds. The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This gives the sequence of round keys a particular structure, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
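The symmetry of the round-key sequence follows from the Feistel form of the key round. Writing $g(X, C) = R(r(S(X \boxplus C)))$ as shorthand (our notation), one key round followed by a switch and a second key round with the same constant gives:

```latex
\begin{aligned}
F_K(KL, KR, C) &= \bigl(KR,\; KL \oplus g(KR, C)\bigr) \\
\text{switch} &: \bigl(KL \oplus g(KR, C),\; KR\bigr) \\
F_K\bigl(KL \oplus g(KR, C),\; KR,\; C\bigr)
  &= \bigl(KR,\; KL \oplus g(KR, C) \oplus g(KR, C)\bigr) = (KR, KL)
\end{aligned}
```

After the switch, each further key round that reuses the constants in reverse order therefore undoes one earlier round, up to a swap of the two halves. The second half of the schedule retraces the first half backwards, which produces the mirrored sequence above and returns the (swapped) master key at the end.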
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the linear and differential parameters of the 3-round cipher, $\Lambda$ and $\Delta$, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function S has parameters $\delta = 2^{-2}$ and $\lambda = 2^{-1}$, it implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, it is respectively required that $\Delta \le 2^{-n}$ and $\Lambda \le 2^{-n/2}$ to resist against differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that:
$$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \qquad (1)$$
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
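As a worked check of Equation (1), the following sketch evaluates the base-2 exponents of both bounds for a given number of rounds. The function name and its defaults are illustrative assumptions; only the values $\delta = 2^{-2}$ and $\lambda = 2^{-1}$ and the exponent $2n_r/3$ come from the text.

```python
from fractions import Fraction


def characteristic_exponents(n_r, log2_delta=-2, log2_lambda=-1):
    """Return (log2 of delta^(2*n_r/3), log2 of lambda^(2*n_r/3)) as exact fractions."""
    power = Fraction(2 * n_r, 3)
    return power * log2_delta, power * log2_lambda


n = 48                 # block size used in the running example
n_r = 3 * n // 4       # the threshold number of rounds, 3n/4
diff_exp, lin_exp = characteristic_exponents(n_r)
print(n_r, diff_exp, lin_exp)
```

At $n_r = 3n/4$ the two exponents reach $-n$ and $-n/2$ respectively, which is where the round count quoted in the text comes from; every additional round lowers both exponents further.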
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \lll a$ the left rotation by a bits of each of the $n_b$ words of x. The non-linear and diffusion layers have the following properties:
- $S(x \lll a) = S(x) \lll a$
- $r(x \lll a) = r(x) \lll a$
- $R(x \lll a) = R(x) \lll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \lll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F'_E$ and the key round $F_K$ satisfy:
$$F'_E(L \lll a, R \lll a, K \lll a) = F'_E(L, R, K) \lll a$$
$$F_K(K_L \lll a, K_R \lll a, 0) = F_K(K_L, K_R, 0) \lll a$$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \lll a$. In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \lll a) \boxplus (K \lll a) = (X \boxplus K) \lll a$ is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For $a = 1$, p rapidly converges to 3/8 as b grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \lll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from $K$ and those derived from $K \lll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA$_{n,b}$.
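The probability p discussed above can be estimated by exhaustive search for small word sizes. The sketch below is an illustration only: it works on a single b-bit word (not on the full cipher state), and the helper names `rotl` and `commute_probability` are hypothetical.

```python
import operator


def rotl(x, a, b):
    """Rotate the b-bit word x left by a positions."""
    a %= b
    mask = (1 << b) - 1
    return ((x << a) | (x >> (b - a))) & mask


def commute_probability(a, b, op):
    """Fraction of word pairs (x, k) with op(x <<< a, k <<< a) == op(x, k) <<< a."""
    size = 1 << b
    mask = size - 1
    hits = 0
    for x in range(size):
        for k in range(size):
            lhs = op(rotl(x, a, b), rotl(k, a, b)) & mask
            rhs = rotl(op(x, k) & mask, a, b)
            hits += lhs == rhs
    return hits / (size * size)


for b in (4, 6, 8):
    p_add = commute_probability(1, b, operator.add)   # addition mod 2^b
    p_xor = commute_probability(1, b, operator.xor)   # bitwise XOR
    print(b, p_add, p_xor)
```

For XOR the rotation always commutes with the key addition, which is what makes the distinguisher on the modified cipher work; for addition mod $2^b$ the measured fraction approaches 3/8 as b grows, in line with the statement in the text.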
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that this remains the same when different parameters n and b are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, whose vulnerability to slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack was described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function f which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference makes it possible to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \iff \overline{P} \xrightarrow{\overline{K}} \overline{C}$. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
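The round-count rule above is simple arithmetic, sketched here for convenience. The function name, the `double_margin` flag and the rounding up of $3n/4$ for block sizes not divisible by 4 are assumptions made for this illustration; the formula itself and the "make it odd" rule are taken from the text.

```python
def suggested_rounds(n, b, double_margin=False):
    """Evaluate n_r = 3n/4 + 2*(n_b + floor(b/2)), adding one if the result is even.

    n is the block size in bits, b the word size in bits, n_b = n / (2b).
    """
    if b < 8:
        raise ValueError("the text assumes a minimum word size of 8 bits")
    if n % (2 * b) != 0:
        raise ValueError("n is assumed to be a multiple of 2*b")
    n_b = n // (2 * b)
    n_r = -(-3 * n // 4) + 2 * (n_b + b // 2)   # ceil(3n/4) is an assumption
    if double_margin:
        n_r *= 2                                 # the conservative variant
    if n_r % 2 == 0:
        n_r += 1                                 # n_r has to be odd
    return n_r


for n, b in ((48, 8), (96, 8), (128, 16)):
    print(n, b, suggested_rounds(n, b), suggested_rounds(n, b, double_margin=True))
```

For the running example n = 48, b = 8 the formula gives 36 + 14 = 50 rounds, which becomes 51 after the parity adjustment.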
3.5 Performance Analysis
SEA$_{n,b}$ is targeted for implementation on low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
- Arithmetic and logic operators: OR, AND, $\oplus$, $\boxplus$, $\ggg$, $\lll$
- Branch instructions: goto, subroutine call and return
- Comparison, load RAM in register, store register in RAM
According to the code in the appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, this directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. This yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Note that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, since at the end of an encryption/decryption we have $K_{n_r - 1} = K_0$. Note also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
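The cost formulas above can be evaluated directly. In this sketch the function name and the returned dictionary keys are hypothetical, and $n_r = 51$ is used purely as an example value for the parameters $n = 48$, $b = 8$ ($n_b = 3$).

```python
def sea_software_cost(n_b, n_r):
    """Evaluate the rough cost estimates quoted in the text (all in words or ops)."""
    round_ops = 12 * n_b + 11        # one encryption/decryption round
    key_round_ops = 10 * n_b + 11    # one key round
    total_ops = (n_r - 1) * (round_ops + key_round_ops + 7) + round_ops + 8 * n_b + 7
    return {
        "ram_words_plus_registers": 5 * n_b + 3,
        "registers": n_b + 3,
        "code_size_ops": 31 * n_b + 36,
        "time_ops": total_ops,
    }


cost = sea_software_cost(n_b=3, n_r=51)
for name, value in cost.items():
    print(name, value)
```

Since every term is linear in $n_b$ and the time is additionally linear in $n_r$, the estimate makes the space/time trade-off of the design easy to tabulate for other parameter choices.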
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
- SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
- The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
- The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. SEA$_{n,b}$ trades space for time, which may be relevant given present processor speeds). Looking at the code size - cycles product, the efficiency of SEA$_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let p be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where p does not divide $4a_4^3 + 27a_6^2$, E can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of E over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Müller and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Müller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic, and for a 200-bit prime p succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where p has 200 bits.
It is known that
$$\#E(\mathbb{F}_p) = p + 1 - t$$
where t is an integer which satisfies the Hasse bound
$$-2\sqrt{p} \le t \le 2\sqrt{p}$$
The algorithm works by calculating t modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of t and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial associated with $\ell$ that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine t modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining t.
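The final recovery step (Chinese Remainder Theorem plus the Hasse bound) can be sketched on a toy curve. This is not the SEA algorithm itself: here the residues of t are obtained from a naive point count, purely as a stand-in for the modular-polynomial computations, and the curve, the prime p = 101, the prime list and all function names are assumptions chosen for the illustration.

```python
def count_points(p, a4, a6):
    """Naively count the points of y^2 = x^3 + a4*x + a6 over F_p, including infinity."""
    roots = [0] * p                      # roots[v] = number of y with y^2 = v (mod p)
    for y in range(p):
        roots[y * y % p] += 1
    return 1 + sum(roots[(x ** 3 + a4 * x + a6) % p] for x in range(p))


def crt(residues, moduli):
    """Combine t = r_i (mod l_i) for pairwise coprime l_i; return (t mod M, M)."""
    t, M = 0, 1
    for r, l in zip(residues, moduli):
        k = (r - t) * pow(M, -1, l) % l  # solve t + M*k = r (mod l)
        t, M = t + M * k, M * l
    return t, M


def recover_trace(residues, moduli, p):
    """Recover t from its residues once the product of the moduli exceeds 4*sqrt(p)."""
    t, M = crt(residues, moduli)
    if M * M <= 16 * p:                  # equivalent to M <= 4*sqrt(p)
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")
    return t - M if t > M // 2 else t    # representative of smallest absolute value


p, a4, a6 = 101, 1, 1                    # toy parameters; 4*a4^3 + 27*a6^2 = 31 is nonzero mod p
t_true = p + 1 - count_points(p, a4, a6)
primes = [2, 3, 5, 7]                    # product 210 exceeds 4*sqrt(101)
t = recover_trace([t_true % l for l in primes], primes, p)
print(t, p + 1 - t)
```

The centred representative is the right choice because the Hasse bound guarantees that the true trace has absolute value at most $2\sqrt{p}$, which is less than half the product of the moduli.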
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over $\mathbb{F}_p$, with the intention of finding an E with $\#E(\mathbb{F}_p) = xr$, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, El Gamal encryption, etc. If we use 200-bit primes for p and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 25, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial in $\mathbb{Z}[F, J]$ associated with $\ell$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store). We start out with a given prime p and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
FIG 5.1: SEA architecture block diagram. Blocks shown: key computational block (Mod, SBox, BR, WR, XOR, Round Reg; key registers KeyReg0[95:0] to KeyReg9[95:0]), encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg; cipher data register), decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg; plain text data register) and SMC. Signals shown: KeyI[95:0], KeyLd, Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, DataO[95:0], DataRd, Clk, Rst, Ena, ED, Ovr.
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
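The two key-generation routes described above (random generation and derivation from a passphrase) can be sketched with the Python standard library. This is a generic illustration and not the key generation block of the SEA design: the function names are hypothetical, and PBKDF2 with SHA-256 is substituted for the SHA-1 based schemes mentioned in the text.

```python
import hashlib
import secrets


def random_key(bits=128):
    """Draw a key from the operating system's entropy source."""
    return secrets.token_bytes(bits // 8)


def key_from_passphrase(passphrase, salt, bits=128, iterations=100_000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256 (an assumed choice)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=bits // 8
    )


salt = secrets.token_bytes(16)           # stored alongside the ciphertext, not secret
k1 = random_key()
k2 = key_from_passphrase("correct horse battery staple", salt)
print(len(k1) * 8, len(k2) * 8)          # both keys are 128 bits long
```

A 128-bit key leaves $2^{128}$ candidates for a brute force search, which is the sense in which longer keys make the attack exponentially harder.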
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis began being used; it is now practically an art and science all on its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
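A one-time pad is short enough to sketch in full. The function names below are hypothetical, and the sketch assumes the conditions stated above: a fresh random key as long as the message, used once and kept secret.

```python
import secrets


def otp_encrypt(message: bytes):
    """Encrypt with a fresh random key of the same length; return (key, ciphertext)."""
    key = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext


def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR with the same key restores the plaintext."""
    if len(key) != len(ciphertext):
        raise ValueError("the key must be exactly as long as the ciphertext")
    return bytes(c ^ k for c, k in zip(ciphertext, key))


key, ct = otp_encrypt(b"attack at dawn")
print(otp_decrypt(ct, key))
```

Encryption and decryption are the same XOR operation, so the scheme is trivially correct; its practical cost lies entirely in generating, distributing and never reusing keys as long as the traffic they protect.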
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such showing can be made currently, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
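The PIN example above can be illustrated with two comparison routines. The sketch below is a generic illustration, unrelated to the SEA hardware design: `naive_equal` is a hypothetical early-exit comparison whose running time depends on the position of the first mismatch, while `hmac.compare_digest` from the Python standard library is designed to reduce that dependence.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: returns as soon as one byte differs."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False     # the time taken reveals how many leading bytes matched
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison intended not to leak the position of the first mismatch."""
    return hmac.compare_digest(a, b)


secret_pin = b"4921"
for guess in (b"0000", b"4000", b"4900", b"4921"):
    print(guess, naive_equal(guess, secret_pin), constant_time_equal(guess, secret_pin))
```

Both routines return the same answers; they differ only in how much an observer measuring response times could learn, which is exactly the information a timing attack exploits.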
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting too short keys, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 61i - ngdbuild G23
Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keyregngc keyregngd
Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
Writing NGD file keyregngd
Writing NGDBUILD log file keyregbld
Release 61i Map G23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors 0
Number of warnings 0
Logic Utilization
Logic Distribution
 Number of Slices containing only related logic 0 out of 0 0
 Number of Slices containing unrelated logic 0 out of 0 0
 See NOTES below for an explanation of the effects of unrelated logic
 Number of bonded IOBs 194 out of 86 225 (OVERMAPPED)
 IOB Flip Flops 96
 Number of GCLKs 1 out of 4 25
 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768
Additional JTAG gate count for IOBs 9360
Peak Memory Usage 57 MB
MAPPING REPORT
Release 61i Map G23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf
Target Device x2s15
Target Package cs144
Target Speed -6
Mapper Version spartan2 -- $Revision 116 $
Mapped Date Mon Mar 30 124243 2009
Design Summary
--------------
Number of errors 0
Number of warnings 0
Logic Utilization
Logic Distribution
 Number of Slices containing only related logic 0 out of 0 0
 Number of Slices containing unrelated logic 0 out of 0 0
 See NOTES below for an explanation of the effects of unrelated logic
 Number of bonded IOBs 194 out of 86 225 (OVERMAPPED)
 IOB Flip Flops 96
 Number of GCLKs 1 out of 4 25
 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768
Additional JTAG gate count for IOBs 9360
Peak Memory Usage 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors 0
Number of warnings 0
Logic Utilization
Logic Distribution
 Number of Slices containing only related logic 0 out of 0 0
 Number of Slices containing unrelated logic 0 out of 0 0
 See NOTES below for an explanation of the effects of unrelated logic
 Number of bonded IOBs 194 out of 86 225 (OVERMAPPED)
 IOB Flip Flops 96
 Number of GCLKs 1 out of 4 25
 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768
Additional JTAG gate count for IOBs 9360
Peak Memory Usage 57 MB
KEY REGISTER
Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj
TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed
Ignore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5
---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date
=========================================================================
                              HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit <keyreg>.
    Related source file is cxilinxbinvasuKeyRegvhd
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================HDL Synthesis Report
Macro Statistics Registers 1 96-bit register 1
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg
Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 195
Macro Statistics Registers 1 96-bit register 1
Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================
Device utilization summary---------------------------
Selected Device 2s15cs144-6
 Number of Slices:                55 out of  192    28%
 Number of Slice Flip Flops:      96 out of  384    25%
 Number of 4 input LUTs:           1 out of  384     0%
 Number of bonded IOBs:          194 out of   90   215% (*)
 Number of GCLKs:                  1 out of    4    25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -6

   Minimum period: No path found
   Minimum input arrival time before clock: 7.962ns
   Maximum output required time after clock: 6.788ns
   Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset:              7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     IBUF:I->O            96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
     FDCE:CE                   0.886          Dreg_0
    ----------------------------------------
    Total                      7.962ns (1.662ns logic, 6.300ns route)
                                       (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset:              6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising

  Data Path: Dreg_95 to KeyO<95>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     FDCE:C->Q             1   1.085   1.035  Dreg_95 (Dreg_95)
     OBUF:I->O                 4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                      6.788ns (5.753ns logic, 1.035ns route)
                                       (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized
========================================================================HDL Synthesis Report
Found no macro=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1
========================================================================= Final Report =========================================================================
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 7
Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------
Selected Device 2s15cs144-6
 Number of Slices:                 2 out of  192     1%
 Number of 4 input LUTs:           3 out of  384     0%
 Number of bonded IOBs:            7 out of   90     7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
    Number of 4 input LUTs:             3 out of 384     1%
Logic Distribution:
    Number of occupied Slices:          2 out of 192     1%
    Number of Slices containing only related logic:     2 out of 2   100%
    Number of Slices containing unrelated logic:        0 out of 2     0%
        See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs:           3 out of 384     1%
    Number of bonded IOBs:              7 out of  86     8%

Total equivalent gate count for design:  18
Additional JTAG gate count for IOBs:  336
Peak Memory Usage:  56 MB
Mapping Report
Device utilization summary
   Number of External IOBs             7 out of 86      8%
      Number of LOCed External IOBs    0 out of 7       0%
   Number of SLICEs                    2 out of 192     1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
    Total Number Slice Registers:     419 out of 384   109% (OVERMAPPED)
        Number used as Flip Flops:    415
        Number used as Latches:         4
    Number of 4 input LUTs:         1,016 out of 384   264% (OVERMAPPED)
Logic Distribution:
    Number of occupied Slices:        665 out of 192   346% (OVERMAPPED)
    Number of Slices containing only related logic:   648 out of 665    97%
    Number of Slices containing unrelated logic:       17 out of 665     2%
        See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:          1,066 out of 384   277% (OVERMAPPED)
    Number used as logic:           1,016
    Number used as a route-thru:       50
    Number of bonded IOBs:          1,060 out of  86  1232% (OVERMAPPED)
        IOB Flip Flops:               960
    Number of GCLKs:                    1 out of   4    25%
    Number of GCLKIOBs:                 1 out of   4    25%

Total equivalent gate count for design:  17,572
Additional JTAG gate count for IOBs:  50,928
Peak Memory Usage:  72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor (word) sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. Typical application areas are:
Wireless communication, mobile computing, and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEA$_{n,b}$ is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA$_{n,b}$ allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, it is expected that the cipher can be implemented in assembly within a few hours. We note finally that the design criteria of SEA$_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations purposed for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, M. Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
Due to its simplicity constraints, SEA$_{n,b}$ is based on a limited number of elementary operations (selected for their availability in any processing device), denoted as follows: (1) bitwise XOR $\oplus$, (2) substitution box $S$, (3) word (left) rotation $R$ and inverse word rotation $R^{-1}$, (4) bit rotation $r$, (5) addition mod $2^b$ $\boxplus$.
These operations are formally defined as follows.
1 Bitwise XOR
The bitwise XOR is defined on $n/2$-bit vectors:
$$\oplus : \mathbb{Z}_2^{n/2} \times \mathbb{Z}_2^{n/2} \to \mathbb{Z}_2^{n/2} : x, y \mapsto z = x \oplus y \;\Leftrightarrow\; z(i) = x(i) \oplus y(i), \quad 0 \le i \le \tfrac{n}{2} - 1$$
2 Substitution Box S
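SEA$_{n,b}$ uses the following 3-bit substitution table, in C-like notation:
$S_T$ = {0, 5, 6, 7, 4, 3, 1, 2}
For efficiency purposes, it is applied bitwise to any set of three words of data, using the following recursive definition:
$$S : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \mapsto x = S(x) \;\Leftrightarrow\; \begin{cases} x_{3i} = (x_{3i+2} \wedge x_{3i+1}) \oplus x_{3i} \\ x_{3i+1} = (x_{3i+2} \wedge x_{3i}) \oplus x_{3i+1} \\ x_{3i+2} = (x_{3i} \vee x_{3i+1}) \oplus x_{3i+2} \end{cases} \quad 0 \le i \le \tfrac{n_b}{3} - 1$$
where $\wedge$ and $\vee$ respectively represent the bitwise AND and OR. The three assignments are applied in sequence, so each one uses the words already updated by the previous ones; this is what makes the definition recursive and what reproduces the table $S_T$.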
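The relation between the table and the recursive definition can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes that word $x_{3i}$ carries the least significant bit of each 3-bit S-box input, which is the bit ordering under which the two descriptions agree.

```python
# Table and update rules are taken from the text; helper names are illustrative only.
S_T = [0, 5, 6, 7, 4, 3, 1, 2]

def sbox_bitslice(x, b=8):
    """Apply S to a list of b-bit words whose length is a multiple of 3."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(y), 3):
        # Sequential assignments: each line sees the words updated above it.
        y[i] = ((y[i + 2] & y[i + 1]) ^ y[i]) & mask
        y[i + 1] = ((y[i + 2] & y[i]) ^ y[i + 1]) & mask
        y[i + 2] = ((y[i] | y[i + 1]) ^ y[i + 2]) & mask
    return y

def table_from_bitslice():
    """Rebuild the 3-bit table by feeding 1-bit words to the bitslice form."""
    table = []
    for v in range(8):
        # Assumption: x_{3i} holds the least significant bit of the input v.
        words = [(v >> k) & 1 for k in range(3)]
        out = sbox_bitslice(words, b=1)
        table.append(out[0] | (out[1] << 1) | (out[2] << 2))
    return table

if __name__ == "__main__":
    print(table_from_bitslice() == S_T)
```

With $b$-bit words, the same three instructions evaluate $b$ copies of the table in parallel, one per bit position.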
3 Word Rotation R
The word rotation is defined on $n_b$-word vectors:
$$R : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \mapsto y = R(x) \;\Leftrightarrow\; y_{i+1} = x_i, \quad 0 \le i \le n_b - 2, \qquad y_0 = x_{n_b - 1}$$
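4 Bit Rotation r
The bit rotation is defined on $n_b$-word vectors:
$$r : \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x \mapsto y = r(x) \;\Leftrightarrow\; y_{3i} = x_{3i} \ggg 1, \quad y_{3i+1} = x_{3i+1}, \quad y_{3i+2} = x_{3i+2} \lll 1, \quad 0 \le i \le \tfrac{n_b}{3} - 1$$
where $\ggg$ and $\lll$ represent the cyclic right and left shifts inside a word.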
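5 Addition mod $2^b$ $\boxplus$
The mod $2^b$ addition is defined on $n_b$-word vectors:
$$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \mapsto z = x \boxplus y \;\Leftrightarrow\; z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$$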
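The word rotation, bit rotation, and word-wise modular addition translate almost literally into code. The following Python sketch uses hypothetical function names and represents an $n_b$-word vector as a list of integers with $x_0$ at index 0, which is an assumption about the layout rather than something the text fixes.

```python
def rotl(w, a, b):
    """Cyclic left shift of a b-bit word w by a positions."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)

def rotr(w, a, b):
    """Cyclic right shift of a b-bit word w by a positions."""
    return rotl(w, b - (a % b), b)

def word_rot(x):
    """R: y[i+1] = x[i] for 0 <= i <= nb-2, and y[0] = x[nb-1]."""
    return [x[-1]] + x[:-1]

def word_rot_inv(x):
    """R^-1: undoes word_rot."""
    return x[1:] + [x[0]]

def bit_rot(x, b):
    """r: per block of 3 words, rotate word 3i right by 1, keep 3i+1, rotate 3i+2 left by 1."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = rotr(y[i], 1, b)
        y[i + 2] = rotl(y[i + 2], 1, b)
    return y

def add_mod(x, y, b):
    """Word-wise addition mod 2^b; carries never cross word boundaries."""
    mask = (1 << b) - 1
    return [(xi + yi) & mask for xi, yi in zip(x, y)]

if __name__ == "__main__":
    print(word_rot([1, 2, 3]), bit_rot([0x01, 0x55, 0x80], 8))
```

Note that only $R$ moves information between words, while $r$ and $\boxplus$ act inside each word.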
The Round and Key Round
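Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$, and key round $F_K$ are pictured in Figure 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^{2} \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^{2}$ such that:
$$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R(L_i) \oplus r\big(S(R_i \boxplus K_i)\big), \quad L_{i+1} = R_i$$
$$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \;\Leftrightarrow\; R_{i+1} = R^{-1}\big(L_i \oplus r(S(R_i \boxplus K_i))\big), \quad L_{i+1} = R_i$$
$$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \;\Leftrightarrow\; KR_{i+1} = KL_i \oplus R\big(r(S(KR_i \boxplus C_i))\big), \quad KL_{i+1} = KR_i$$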
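A short derivation may help to see why the decrypt round undoes the encrypt round. It is a sketch under one assumption: the decrypt round receives the two halves of an encrypt-round output in swapped order, as happens when the cipher outputs $R_{n_r}$ before $L_{n_r}$. The "right" and "left" labels are introduced here for illustration only.

```latex
\begin{align*}
\text{right output of } F_D(R_{i+1}, L_{i+1}, K_i)
  &= R^{-1}\big(R_{i+1} \oplus r(S(L_{i+1} \boxplus K_i))\big) \\
  &= R^{-1}\big(R(L_i) \oplus r(S(R_i \boxplus K_i)) \oplus r(S(R_i \boxplus K_i))\big) \\
  &= R^{-1}\big(R(L_i)\big) = L_i, \\
\text{left output of } F_D(R_{i+1}, L_{i+1}, K_i)
  &= L_{i+1} = R_i .
\end{align*}
```

So $F_D(R_{i+1}, L_{i+1}, K_i) = [R_i, L_i]$: the previous state is recovered, again with its halves swapped, which lets the same argument be repeated round after round.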
Figure 3.1: Encrypt/decrypt round and key round.
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext $P$ under a key $K$ and produces a ciphertext $C$. $P$, $C$, and $K$ have a parametric bit size $n$. The operations within the cipher are performed on parametric $b$-bit words.

    C = SEA_{n,b}(P, K)
      // initialization
      L_0 & R_0 = P;
      KL_0 & KR_0 = K;
      // key scheduling
      for i in 1 to floor(n_r / 2)
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i));
      switch KL_{floor(n_r / 2)}, KR_{floor(n_r / 2)};
      for i in ceil(n_r / 2) to n_r - 1
        [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(r - i));
      // encryption
      for i in 1 to ceil(n_r / 2)
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1});
      for i in ceil(n_r / 2) + 1 to n_r
        [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1});
      // final
      C = R_{n_r} & L_{n_r};
      switch KL_{n_r - 1}, KR_{n_r - 1};

where & is the concatenation operator, $KR_{\lfloor n_r/2 \rfloor}$ is taken before the switch, and $C(i)$ is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals $i$. Decryption is exactly the same, using the decrypt round $F_D$.
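The pseudocode can be turned into a small executable model. The Python sketch below is not a reference implementation and has not been checked against official test vectors. It assumes that a block is a list of $2n_b$ words with the left half first, that the least significant word of $C(i)$ sits at index 0, that the index $r - i$ in the key schedule means $n_r - i$, and that $i < 2^b$. All function names are hypothetical.

```python
def rotl(w, a, b):
    """Cyclic left shift of a b-bit word by a positions."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)

def S(x):
    """Bitslice substitution box on each block of 3 words (sequential updates)."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y

def r(x, b):
    """Bit rotation: word 3i right by 1, word 3i+2 left by 1."""
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = rotl(y[i], b - 1, b)
        y[i + 2] = rotl(y[i + 2], 1, b)
    return y

def R(x):
    return [x[-1]] + x[:-1]

def R_inv(x):
    return x[1:] + [x[0]]

def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]

def add(x, y, b):
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, y)]

def const(i, nb):
    # C(i): all words zero except the least significant word (assumed index 0).
    return [i] + [0] * (nb - 1)

def f_e(L, Rt, K, b):
    return Rt, xor(R(L), r(S(add(Rt, K, b)), b))

def f_d(L, Rt, K, b):
    return Rt, R_inv(xor(L, r(S(add(Rt, K, b)), b)))

def f_k(KL, KR, C, b):
    return KR, xor(KL, R(r(S(add(KR, C, b)), b)))

def round_keys(key, b, nr):
    """Return the nr round keys in the order the rounds consume them."""
    nb = len(key) // 2
    KL, KR = key[:nb], key[nb:]
    half = nr // 2
    keys = [KR]                          # round 1 uses KR_0
    for i in range(1, half + 1):
        KL, KR = f_k(KL, KR, const(i, nb), b)
        keys.append(KR)                  # KR_i, taken before the switch
    KL, KR = KR, KL                      # switch
    for i in range(half + 1, nr):
        KL, KR = f_k(KL, KR, const(nr - i, nb), b)
        keys.append(KL)                  # later rounds use KL_i
    return keys

def crypt(block, key, b, nr, decrypt=False):
    """Encrypt (or decrypt) one block given as a list of 2*nb b-bit words."""
    nb = len(block) // 2
    L, Rt = block[:nb], block[nb:]
    step = f_d if decrypt else f_e
    for K in round_keys(key, b, nr):
        L, Rt = step(L, Rt, K, b)
    return Rt + L                        # final: C = R_nr & L_nr

if __name__ == "__main__":
    P, K = [1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]   # n = 48, b = 8, nb = 3
    C = crypt(P, K, 8, 51)
    print(C, crypt(C, K, 8, 51, decrypt=True) == P)
```

The round count of 51 in the usage example is only an illustrative odd value. The round trip succeeds because the round-key list reads the same forwards and backwards, so decryption can reuse the key schedule unchanged.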
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
- λ-parameter: 1/2
- δ-parameter: 1/4
- Maximum nonlinear order, namely 2
- Recursive definition
- Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
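The stated parameters can be recomputed from the table itself. The Python sketch below assumes the usual readings of the two parameters, since their definitions are not reproduced here: δ as the largest probability of a non-trivial differential, and λ as the largest absolute correlation $|2p - 1|$ of a linear approximation with a non-zero output mask. Function names are hypothetical.

```python
S_T = [0, 5, 6, 7, 4, 3, 1, 2]

def parity(v):
    """Parity (XOR of all bits) of an integer."""
    return bin(v).count("1") & 1

def max_differential_probability(sbox):
    """Largest Pr[S(x) ^ S(x ^ a) = d] over non-zero input differences a."""
    n = len(sbox)
    best = 0
    for a in range(1, n):
        counts = [0] * n
        for x in range(n):
            counts[sbox[x] ^ sbox[x ^ a]] += 1
        best = max(best, max(counts))
    return best / n

def max_linear_correlation(sbox):
    """Largest |2p - 1| over input masks a and non-zero output masks m."""
    n = len(sbox)
    best = 0
    for a in range(n):
        for m in range(1, n):
            agree = sum(parity(a & x) == parity(m & sbox[x]) for x in range(n))
            best = max(best, abs(2 * agree - n))
    return best / n

def fixed_points(sbox):
    return [x for x in range(len(sbox)) if sbox[x] == x]

if __name__ == "__main__":
    print(max_differential_probability(S_T),
          max_linear_correlation(S_T),
          fixed_points(S_T))
```

Under these definitions the table gives δ = 1/4 and λ = 1/2, and its two fixed points are the inputs 0 and 4.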
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. This is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data into $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by combining the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ $\boxplus$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds. The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$$K_0, K_1, K_2, \ldots, K_{\lfloor r/2 \rfloor}, K_{\lfloor r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
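The mirrored key sequence can be verified directly from the key round. The sketch below writes $m$ for the index at which the halves are switched, assumes $m \ge 2$, and assumes that the first key round after the switch reuses the constant $C(m)$ of the last key round before it. The primed symbols denote the halves right after the switch and are introduced here for illustration.

```latex
\begin{align*}
\text{before the switch:}\quad & KR_m = KL_{m-1} \oplus R\big(r(S(KR_{m-1} \boxplus C(m)))\big),
   \qquad KL_m = KR_{m-1} \\
\text{after the switch:}\quad & KL'_m = KR_m, \qquad KR'_m = KL_m = KR_{m-1} \\
\text{next key round:}\quad & KR_{m+1} = KL'_m \oplus R\big(r(S(KR'_m \boxplus C(m)))\big) \\
  & \phantom{KR_{m+1}} = KR_m \oplus R\big(r(S(KR_{m-1} \boxplus C(m)))\big) = KL_{m-1} = KR_{m-2}, \\
  & KL_{m+1} = KR'_m = KR_{m-1}
\end{align*}
```

The key round after the switch therefore steps the schedule backwards: $KL_{m+1}$ equals the earlier round key $KR_{m-1}$, and repeating the argument with the constants taken in decreasing order walks back to $K_0$.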
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, the linear and differential parameters of the 3-round cipher, $\Lambda$ and $\Delta$, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function $S$ has parameters $\delta = 2^{-2}$ and $\lambda = 2^{-1}$, 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an $n$-bit block cipher, resisting differential [4] and linear [28] cryptanalysis requires $\Delta$ and $\Lambda$ to be brought down to about $2^{-n}$ and $2^{-n/2}$, respectively. In order to approach these bounds, we require that:
$$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \tag{1}$$
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of, e.g., the AES Rijndael [17], but we only assume one active s-box per round.
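For completeness, the step from Equation (1) to the round count is a comparison of exponents. The worked values of $n$ below are illustrative choices.

```latex
\begin{align*}
\left(2^{-2}\right)^{2n_r/3} < 2^{-n}
  &\iff \frac{4n_r}{3} > n \iff n_r > \frac{3n}{4}, \\
\left(2^{-1}\right)^{2n_r/3} < 2^{-n/2}
  &\iff \frac{2n_r}{3} > \frac{n}{2} \iff n_r > \frac{3n}{4}.
\end{align*}
% Examples: n = 48 gives n_r > 36; n = 96 gives n_r > 72.
```

Both conditions give the same threshold because the differential bound is the square of the linear one ($\delta = \lambda^2$) while its target is also squared ($2^{-n}$ against $2^{-n/2}$).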
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and the boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \lll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
- $S(x \lll a) = S(x) \lll a$
- $r(x \lll a) = r(x) \lll a$
- $R(x \lll a) = R(x) \lll a$
Consider a modified version of our cipher where key addition is performed using the bitwise XOR $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \lll a = C_i$ (e.g., all $C_i$ equal 0). As a consequence of the previous observations, the modified round $F'_E$ and the key round $F_K$ satisfy:
$$F'_E(L \lll a, R \lll a, K \lll a) = F'_E(L, R, K) \lll a$$
$$F_K(KL \lll a, KR \lll a, 0) = F_K(KL, KR, 0) \lll a$$
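The rotation property of the modified round is easy to test empirically. The Python sketch below implements the XOR-keyed round on lists of $b$-bit words and compares both sides of the first identity on random inputs. The helper names, the random sampling, and the word layout are illustrative assumptions.

```python
import random

def rotl(w, a, b):
    """Cyclic left shift of a b-bit word by a positions."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)

def rot_words(x, a, b):
    """x <<< a: rotate every word of x left by a bits."""
    return [rotl(w, a, b) for w in x]

def S(x):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y

def r(x, b):
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = rotl(y[i], b - 1, b)      # cyclic right shift by 1
        y[i + 2] = rotl(y[i + 2], 1, b)  # cyclic left shift by 1
    return y

def R(x):
    return [x[-1]] + x[:-1]

def f_e_xor(L, Rt, K, b):
    """Modified encrypt round: key addition by XOR instead of addition mod 2^b."""
    t = r(S([u ^ k for u, k in zip(Rt, K)]), b)
    return Rt, [u ^ v for u, v in zip(R(L), t)]

def check(b=8, nb=3, a=1, trials=200, seed=0):
    """Return True if the rotation identity held on every sampled input."""
    rng = random.Random(seed)
    for _ in range(trials):
        L, Rt, K = ([rng.randrange(1 << b) for _ in range(nb)] for _ in range(3))
        plain = f_e_xor(L, Rt, K, b)
        rotated = f_e_xor(rot_words(L, a, b), rot_words(Rt, a, b),
                          rot_words(K, a, b), b)
        if rotated != tuple(rot_words(h, a, b) for h in plain):
            return False
    return True

if __name__ == "__main__":
    print(check())
```

The identity holds because every operation in the modified round is either bitwise, a rotation inside a word, or a permutation of whole words, and all of these commute with rotating each word by the same amount.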
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \lll a$. In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \lll a) \boxplus (K \lll a) = (X \boxplus K) \lll a$ is prevented by certain carry propagations, it only holds with a probability $p$, which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to 3/8 as $b$ grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g., "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
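The probability $p$ can be computed exactly for small word sizes by exhaustive enumeration. The Python sketch below does this for a single word, with a hypothetical function name and exact fractions so that the convergence towards 3/8 is visible.

```python
from fractions import Fraction

def rotl(w, a, b):
    """Cyclic left shift of a b-bit word by a positions."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)

def commute_probability(b, a=1):
    """Exact Pr[(X <<< a) + (K <<< a) = (X + K) <<< a mod 2^b] over all b-bit X, K."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return Fraction(hits, 1 << (2 * b))

if __name__ == "__main__":
    for b in (2, 4, 8):
        print(b, commute_probability(b))
```

For $a = 1$ the enumeration gives 9/16 for $b = 2$ and 387/1024 for $b = 8$, that is $3/8 + 3 \cdot 2^{-(b+2)}$ in both cases, so the excess over 3/8 halves with every additional bit of word size.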
Moreover, the round constants $C_i$ are generally not such that $C_i \lll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, this implies that the similarity between the round keys derived from $K$ and those derived from $K \lll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA$_{n,b}$.
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e., takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. This is expected to remain the same when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEAn,b, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEAn,b is the same as that of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with that of GOST, whose vulnerability to slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack was described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key K_i is computed from the previous round key K_{i−1} using a function f which is always the same: K_i = f(K_{i−1}). However, in the case of SEAn,b, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEAn,b key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows the attacker to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEAn,b against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that the encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \iff \overline{P} \xrightarrow{\overline{K}} \overline{C}$, where the overline denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEAn,b algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEAn,b has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this point a topic for further research. We note that resistance against algebraic attacks would in any case exclude the use of small substitution boxes, and therefore the possibility of building very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum number of rounds required to provide security against known attacks would be 3n/4 + 2 · (n_b + ⌊b/2⌋). This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since n_r has to be odd, we add one if it is even. We also assume a minimum word size of b ≥ 8 bits.
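As a rough illustration of the estimate above, the following Python sketch evaluates 3n/4 + 2 · (n_b + ⌊b/2⌋) and makes the result odd. It assumes that n_b = n/(2b), i.e. the number of b-bit words in one Feistel branch; this definition is not restated in this section, and the function name is hypothetical.

```python
def suggested_rounds(n: int, b: int) -> int:
    """Evaluate 3n/4 + 2*(n_b + floor(b/2)), adding one if the result is even.

    Assumption: n_b = n / (2b) is the number of b-bit words per Feistel branch.
    """
    if n % (2 * b) != 0:
        raise ValueError("n is assumed to be a multiple of 2b")
    n_b = n // (2 * b)
    rounds = (3 * n) // 4 + 2 * (n_b + b // 2)
    return rounds if rounds % 2 == 1 else rounds + 1


if __name__ == "__main__":
    for n, b in [(48, 8), (96, 8), (144, 8)]:
        print(f"n = {n}, b = {b}: n_r = {suggested_rounds(n, b)}")
```

Doubling the returned value, as in the more conservative approach mentioned above, is left to the caller.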
3.5 Performance Analysis
SEAn,b is targeted at low-cost processors with little code space and a small instruction set. However, its simple structure makes it easy to implement on any processor. In the appendix, we propose pseudo-assembly code for an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
- Arithmetic and logic operators: or, and, ⊕, ⊞, ≫, ≪.
- Branch instructions: goto, subroutine call and return.
- Comparison, load (RAM to register) and store (register to RAM).
According to the code in the appendix, the performance can be roughly estimated as follows. First, the combined number of RAM words and registers equals 5n_b + 3. Then, the code size and execution time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, this directly yields 31n_b + 36 ops. For the execution time, the round and key round respectively require 12n_b + 11 ops and 10n_b + 11 ops. This yields a total of (n_r − 1) × (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7 ops. These values are summarized in Table 1. Note that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, since at the end of an encryption/decryption we have K_{n_r−1} = K_0. Note also that this implementation uses a low number of registers, namely n_b + 3. However, if more registers are available, they can be traded for RAM words, which results in lower code size and a faster implementation.
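The estimates above are simple linear expressions in n_b and n_r, so they are easy to tabulate. The Python sketch below only evaluates the formulas quoted in the text; the function name and the example parameters (n_b = 6, n_r = 93) are illustrative assumptions, not values taken from Table 1.

```python
def sea_software_estimates(n_b: int, n_r: int) -> dict:
    """Evaluate the memory, code size and operation count formulas quoted above."""
    round_ops = 12 * n_b + 11        # one cipher round
    key_round_ops = 10 * n_b + 11    # one key schedule round
    total_ops = (
        (n_r - 1) * (round_ops + key_round_ops + 7)
        + round_ops
        + 8 * n_b
        + 7
    )
    return {
        "memory_words": 5 * n_b + 3,   # RAM words plus registers
        "registers": n_b + 3,
        "code_size_ops": 31 * n_b + 36,
        "round_ops": round_ops,
        "key_round_ops": key_round_ops,
        "total_ops": total_ops,
    }


if __name__ == "__main__":
    for name, value in sea_software_estimates(n_b=6, n_r=93).items():
        print(f"{name}: {value}")
```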
For illustration purposes, we implemented SEAn,b on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEAn,b and the AES Rijndael. While direct comparisons are made difficult by their strong dependence on the target devices, the following general comments can be made:
- SEAn,b designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
- The combined number of RAM words and registers of SEAn,b implementations (i.e. 5n_b + 3) is generally lower than for other block ciphers.
- The code size of SEAn,b is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEAn,b also makes it less sensitive to the choice of a processor than fixed-size algorithms, although large buses obviously improve efficiency. The drawback of these limited resources is the number of cycles required for encryption (i.e. SEAn,b trades space for time, which may be acceptable given present processor speeds). Looking at the product of code size and cycles, the efficiency of SEAn,b remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement it. The mathematical background for the technique is then given.
Let p be a large (odd) prime and let E: y^2 = x^3 + a_4 x + a_6 be an elliptic curve, where a_4 and a_6 are given fixed integers. In the case where p does not divide 4a_4^3 + 27a_6^2, E can be reduced to an elliptic curve over F_p. The number of points of E over F_p, denoted by #E(F_p), is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be used to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime p, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over F_p, where p has 200 bits.
It is known that $\#E(\mathbb{F}_p) = p + 1 - t$, where t is an integer which satisfies the Hasse bound $-2\sqrt{p} \le t \le 2\sqrt{p}$.
The algorithm works by calculating t modulo several small auxiliary primes ℓ. When the product of the auxiliary primes exceeds 4√p, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of #E(F_p). The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate ℓ, a calculation has to be carried out to generate a certain polynomial that is necessary for further calculations with this ℓ. These polynomials do not depend on the curve E under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide whether our algorithm applies (the probability that the algorithm applies for a specific E and ℓ is 1/2). For those curves where the algorithm applies, we can determine t modulo ℓ. When we have finished with all our candidates for the auxiliary primes, we can check whether the product of the auxiliary primes that worked exceeds 4√p or not. In the former case, we have succeeded in determining t.
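The final recombination step can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the residues t mod ℓ are obtained here by naively counting points on a curve over a very small prime, whereas the actual algorithm computes them without knowing t. The function names and the example curve (p = 101, a_4 = 2, a_6 = 3) are hypothetical.

```python
def count_points(p: int, a4: int, a6: int) -> int:
    """Naively count the points of y^2 = x^3 + a4*x + a6 over F_p (p an odd prime)."""
    total = 1  # the point at infinity
    for x in range(p):
        rhs = (x * x * x + a4 * x + a6) % p
        if rhs == 0:
            total += 1  # a single point with y = 0
        elif pow(rhs, (p - 1) // 2, p) == 1:
            total += 2  # rhs is a nonzero square, giving two values of y
    return total


def recover_trace(residues: dict, p: int) -> int:
    """Combine the values t mod l (l distinct primes) into t using the Hasse bound."""
    modulus, value = 1, 0
    for l, r in residues.items():
        # Chinese Remainder step: keep value mod modulus, force value = r mod l.
        step = ((r - value) * pow(modulus, -1, l)) % l
        value += modulus * step
        modulus *= l
    if modulus * modulus <= 16 * p:
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")
    # |t| <= 2*sqrt(p) < modulus/2, so take the representative closest to zero.
    return value if 2 * value <= modulus else value - modulus


if __name__ == "__main__":
    p, a4, a6 = 101, 2, 3
    assert (4 * a4**3 + 27 * a6**2) % p != 0  # the curve reduces well modulo p
    t = p + 1 - count_points(p, a4, a6)
    residues = {l: t % l for l in (3, 5, 7)}  # stand-in for the per-prime computations
    print("t =", t, "recovered:", recover_trace(residues, p))
```

The check on the product of the moduli mirrors the condition in the text: only when it exceeds 4√p does the Hasse bound single out one value of t.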
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over F_p with the intention of finding an E with #E(F_p) = xr, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p and require x ≤ 32, then the probability that #E(F_p) = xr is about 25, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set A_s of small primes and the set A_l of larger primes. For each ℓ ∈ A, we need to determine a polynomial in Z[F, J] associated with ℓ. For ℓ ∈ A_s, this polynomial is stored in the program. For ℓ ∈ A_l, it must be calculated by determining a number of coefficients of a certain q-series f(q) ∈ Z[[q]] and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime p and an elliptic curve E: y^2 = x^3 + a_4 x + a_6
CH 5 SEA Architecture Block Diagram
[Block diagram. Key computational block: Mod, SBox, BR, WR, XOR, Round Reg, with key registers KeyReg0[95:0] to KeyReg9[95:0] and signals KeyI[95:0], KeyLd, Clk, Rst. Encryption computational block: Mod, SBox, BR, WR, XOR, Round Reg, with a cipher data register and signals Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst. Decryption computational block: Mod, SBox, BR, IWR, XOR, Round Reg, with a plain text data register and signals Key0[95:0] to Key9[95:0], DataO[95:0], DataRd, Clk, Rst. Control: SMC, Clk, Rst, Ena, ED, Ovr.]
FIG 5.1
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being protected.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Among PRNGs, those which use system entropy as a seed generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created from a passphrase by a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
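Both approaches described above can be sketched with the Python standard library. The key length of 12 bytes (96 bits) is an assumption chosen to match the 96-bit key registers reported later in this document, and PBKDF2 with HMAC-SHA256 is used here as an illustrative hash-based derivation in place of the SHA-1 construction mentioned in the text. The function names and the iteration count are hypothetical.

```python
import hashlib
import secrets

KEY_BYTES = 12  # assumption: 96-bit key, as in the 96-bit key register


def random_key(num_bytes: int = KEY_BYTES) -> bytes:
    """Draw a key from the operating system's entropy-seeded generator."""
    return secrets.token_bytes(num_bytes)


def key_from_passphrase(passphrase: str, salt: bytes, num_bytes: int = KEY_BYTES) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256 (illustrative choice)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, 200_000, dklen=num_bytes
    )


if __name__ == "__main__":
    salt = secrets.token_bytes(16)  # stored alongside the data, not secret
    print("random key:    ", random_key().hex())
    print("passphrase key:", key_from_passphrase("correct horse", salt).hex())
```

The same passphrase and salt always give the same key, which is what allows the receiver to regenerate it; a fresh salt gives an unrelated key.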
The simplest method to read encrypted data is a brute force attack: simply trying every possible key up to the maximum key length. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of an everyday person. Until the early 1920s, the term cryptography also covered the breaking of encrypted messages; the term cryptanalysis then came into use, and cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
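A one-time pad is short enough to sketch directly. The Python example below is an illustration only: the function names are hypothetical, and the key is drawn from the operating system's generator through the `secrets` module, which is a practical stand-in rather than the truly random source required by Shannon's proof.

```python
import secrets


def otp_encrypt(message: bytes):
    """Return (key, ciphertext); the key is fresh and as long as the message."""
    key = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext


def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR the ciphertext with the same key to recover the message."""
    if len(key) != len(ciphertext):
        raise ValueError("key and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, key))


if __name__ == "__main__":
    key, ciphertext = otp_encrypt(b"attack at dawn")
    print(ciphertext.hex())
    print(otp_decrypt(key, ciphertext))
```

Reusing a key for a second message breaks the conditions listed above, which is why `otp_encrypt` generates a new key on every call.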
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by a brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. the work factor, in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof can currently be given, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
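The brute force figure quoted above is half the key space, which for a 56-bit DES key gives 2^55 trials. The short Python sketch below makes this arithmetic explicit; the function names and the search rate used in the example are purely hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_brute_force_trials(key_bits: int) -> int:
    """Half of the key space: the point where success becomes more likely than not."""
    return 2 ** (key_bits - 1)


def years_to_search(key_bits: int, keys_per_second: float) -> float:
    """Time needed for that many trials at a given (assumed) search rate."""
    return expected_brute_force_trials(key_bits) / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e9  # hypothetical: one billion keys per second
    for bits in (56, 96, 128):
        print(f"{bits}-bit key: {years_to_search(bits, rate):.3e} years")
```

Each additional key bit doubles the work, which is the exponential dependence on key size mentioned earlier.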
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
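The PIN example above can be made concrete with a small Python sketch. It is an illustration under simplifying assumptions: instead of measuring time, the leaky comparison reports how many bytes it examined, which is the quantity a timing measurement would reveal. The function names are hypothetical; `hmac.compare_digest` is the standard-library comparison designed to avoid this kind of leak.

```python
import hmac


def leaky_equal(secret: bytes, guess: bytes):
    """Early-exit comparison; also returns how many bytes were examined."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps  # stops at the first wrong byte
    return True, steps


def constant_time_equal(secret: bytes, guess: bytes) -> bool:
    """Comparison whose running time does not depend on where the inputs differ."""
    return hmac.compare_digest(secret, guess)


if __name__ == "__main__":
    pin = b"1234"
    for guess in (b"9999", b"1999", b"1299", b"1239"):
        print(guess, leaky_equal(pin, guess))
```

With the leaky version, an attacker who can observe the amount of work can confirm the PIN one character at a time instead of guessing all characters at once.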
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keyregngc keyregngd
Reading NGO file cxilinxbinvasukeyregngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyregngd
Writing NGDBUILD log file keyregbld

Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision 116 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design keyregprj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyregprj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constrain : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreglso
Read Cores : YES
cross_clock_analysi : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================

WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library work
Architecture keyreg of Entity keyreg is up to date

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>)
Entity <keyreg> analyzed. Unit <keyreg> generated

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>
  Related source file is cxilinxbinvasuKeyRegvhd
  Found 96-bit register for signal <Dreg>
  Summary:
    inferred 96 D-type flip-flop(s)
Unit <keyreg> synthesized

=========================================================================
HDL Synthesis Report
Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyregngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 195

Macro Statistics
  Registers : 1
    96-bit register : 1

Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================

Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices : 55 out of 192 28%
  Number of Slice Flip Flops : 96 out of 384 25%
  Number of 4 input LUTs : 1 out of 384 0%
  Number of bonded IOBs : 194 out of 90 215% (*)
  Number of GCLKs : 1 out of 4 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade : -6
  Minimum period : No path found
  Minimum input arrival time before clock : 7.962ns
  Maximum output required time after clock : 6.788ns
  Maximum combinational path delay : No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock Clk
  Offset : 7.962ns (Levels of Logic = 1)
  Source : KeyEna (PAD)
  Destination : Dreg_95 (FF)
  Destination Clock : Clk rising

  Data Path : KeyEna to Dreg_95
                          Gate    Net
  Cell:in->out   fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------
  IBUF:I->O      96       0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                 0.886           Dreg_0
  ----------------------------------------
  Total                   7.962ns (1.662ns logic, 6.300ns route)
                                  (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock Clk
  Offset : 6.788ns (Levels of Logic = 1)
  Source : Dreg_95 (FF)
  Destination : KeyO<95> (PAD)
  Source Clock : Clk rising

  Data Path : Dreg_95 to KeyO<95>
                          Gate    Net
  Cell:in->out   fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------
  FDCE:C->Q      1        1.085   1.035   Dreg_95 (Dreg_95)
  OBUF:I->O               4.668           KeyO_95_OBUF (KeyO<95>)
  ----------------------------------------
  Total                   6.788ns (5.753ns logic, 1.035ns route)
                                  (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                  : sbox8x3.prj
Input Format                     : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory        :

---- Target Parameters
Output File Name                 : sbox8x3
Output Format                    : NGC
Target Device                    : xc2s15-6-cs144

---- Source Options
Top Module Name                  : sbox8x3
Automatic FSM Extraction         : YES
FSM Encoding Algorithm           : Auto
FSM Style                        : lut
RAM Extraction                   : Yes
RAM Style                        : Auto
ROM Extraction                   : Yes
ROM Style                        : Auto
Mux Extraction                   : YES
Mux Style                        : Auto
Decoder Extraction               : YES
Priority Encoder Extraction      : YES
Shift Register Extraction        : YES
Logical Shifter Extraction       : YES
XOR Collapsing                   : YES
Resource Sharing                 : YES
Multiplier Style                 : lut
Automatic Register Balancing     : No

---- Target Options
Add IO Buffers                   : YES
Global Maximum Fanout            : 100
Add Generic Clock Buffer(BUFG)   : 4
Register Duplication             : YES
Equivalent register Removal      : YES
Slice Packing                    : YES
Pack IO Registers into IOBs      : auto

---- General Options
Optimization Goal                : Speed
Optimization Effort              : 1
Keep Hierarchy                   : NO
Global Optimization              : AllClockNets
RTL Output                       : Yes
Write Timing Constraints         : NO
Hierarchy Separator              : _
Bus Delimiter                    : <>
Case Specifier                   : maintain
Slice Utilization Ratio          : 100
Slice Utilization Ratio Delta    : 5

---- Other Options
lso                              : sbox8x3.lso
Read Cores                       : YES
cross_clock_analysis             : NO
verilog2001                      : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name   : sbox8x3.ngr
Top Level Output File Name       : sbox8x3
Output Format                    : NGC
Optimization Goal                : Speed
Keep Hierarchy                   : NO

Design Statistics
IOs                              : 7

Cell Usage
BELS                             : 3
     LUT4                        : 3
IO Buffers                       : 7
     IBUF                        : 4
     OBUF                        : 3
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6

 Number of Slices:            2 out of 192    1%
 Number of 4 input LUTs:      3 out of 384    0%
 Number of bonded IOBs:       7 out of  90    7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Number of 4 input LUTs:                           3 out of 384    1%
Logic Distribution:
  Number of occupied Slices:                        2 out of 192    1%
  Number of Slices containing only related logic:   2 out of   2  100%
  Number of Slices containing unrelated logic:      0 out of   2    0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs:                       3 out of 384    1%
  Number of bonded IOBs:                            7 out of  86    8%

Total equivalent gate count for design:  18
Additional JTAG gate count for IOBs:     336
Peak Memory Usage:                       56 MB

Mapping Report

Device utilization summary:

  Number of External IOBs               7 out of  86    8%
    Number of LOCed External IOBs       0 out of   7    0%

  Number of SLICEs                      2 out of 192    1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0

The AVERAGE CONNECTION DELAY for this design is:        0.871
The MAXIMUM PIN DELAY IS:                               1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is:   0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                  : keygenblock.prj
Input Format                     : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory        :

---- Target Parameters
Output File Name                 : keygenblock
Output Format                    : NGC
Target Device                    : xc2s15-6-cs144

---- Source Options
Top Module Name                  : keygenblock
Automatic FSM Extraction         : YES
FSM Encoding Algorithm           : Auto
FSM Style                        : lut
RAM Extraction                   : Yes
RAM Style                        : Auto
ROM Extraction                   : Yes
ROM Style                        : Auto
Mux Extraction                   : YES
Mux Style                        : Auto
Decoder Extraction               : YES
Priority Encoder Extraction      : YES
Shift Register Extraction        : YES
Logical Shifter Extraction       : YES
XOR Collapsing                   : YES
Resource Sharing                 : YES
Multiplier Style                 : lut
Automatic Register Balancing     : No

---- Target Options
Add IO Buffers                   : YES
Global Maximum Fanout            : 100
Add Generic Clock Buffer(BUFG)   : 4
Register Duplication             : YES
Equivalent register Removal      : YES
Slice Packing                    : YES
Pack IO Registers into IOBs      : auto

---- General Options
Optimization Goal                : Speed
Optimization Effort              : 1
Keep Hierarchy                   : NO
Global Optimization              : AllClockNets
RTL Output                       : Yes
Write Timing Constraints         : NO
Hierarchy Separator              : _
Bus Delimiter                    : <>
Case Specifier                   : maintain
Slice Utilization Ratio          : 100
Slice Utilization Ratio Delta    : 5

---- Other Options
lso                              : keygenblock.lso
Read Cores                       : YES
cross_clock_analysis             : NO
verilog2001                      : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc ...
Reading component libraries for design expansion ...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd ...

Writing NGDBUILD log file keygenblock.bld ...
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Total Number Slice Registers:                   419 out of 384   109% (OVERMAPPED)
    Number used as Flip Flops:                    415
    Number used as Latches:                         4
  Number of 4 input LUTs:                        1016 out of 384   264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices:                      665 out of 192   346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665    97%
  Number of Slices containing unrelated logic:     17 out of 665     2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:                       1066 out of 384   277% (OVERMAPPED)
  Number used as logic:                          1016
  Number used as a route-thru:                     50
  Number of bonded IOBs:                         1060 out of  86  1232% (OVERMAPPED)
    IOB Flip Flops:                               960
  Number of GCLKs:                                  1 out of   4    25%
  Number of GCLKIOBs:                               1 out of   4    25%

Total equivalent gate count for design:  17572
Additional JTAG gate count for IOBs:     50928
Peak Memory Usage:                       72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                  : encryption.prj
Input Format                     : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory        :

---- Target Parameters
Output File Name                 : encryption
Output Format                    : NGC
Target Device                    : xc2s15-6-cs144

---- Source Options
Top Module Name                  : encryption
Automatic FSM Extraction         : YES
FSM Encoding Algorithm           : Auto
FSM Style                        : lut
RAM Extraction                   : Yes
RAM Style                        : Auto
ROM Extraction                   : Yes
ROM Style                        : Auto
Mux Extraction                   : YES
Mux Style                        : Auto
Decoder Extraction               : YES
Priority Encoder Extraction      : YES
Shift Register Extraction        : YES
Logical Shifter Extraction       : YES
XOR Collapsing                   : YES
Resource Sharing                 : YES
Multiplier Style                 : lut
Automatic Register Balancing     : No

---- Target Options
Add IO Buffers                   : YES
Global Maximum Fanout            : 100
Add Generic Clock Buffer(BUFG)   : 4
Register Duplication             : YES
Equivalent register Removal      : YES
Slice Packing                    : YES
Pack IO Registers into IOBs      : auto

---- General Options
Optimization Goal                : Speed
Optimization Effort              : 1
Keep Hierarchy                   : NO
Global Optimization              : AllClockNets
RTL Output                       : Yes
Write Timing Constraints         : NO
Hierarchy Separator              : _
Bus Delimiter                    : <>
Case Specifier                   : maintain
Slice Utilization Ratio          : 100
Slice Utilization Ratio Delta    : 5

---- Other Options
lso                              : encryption.lso
Read Cores                       : YES
cross_clock_analysis             : NO
verilog2001                      : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                  : decryption.prj
Input Format                     : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory        :

---- Target Parameters
Output File Name                 : decryption
Output Format                    : NGC
Target Device                    : xc2s15-6-cs144

---- Source Options
Top Module Name                  : decryption
Automatic FSM Extraction         : YES
FSM Encoding Algorithm           : Auto
FSM Style                        : lut
RAM Extraction                   : Yes
RAM Style                        : Auto
ROM Extraction                   : Yes
ROM Style                        : Auto
Mux Extraction                   : YES
Mux Style                        : Auto
Decoder Extraction               : YES
Priority Encoder Extraction      : YES
Shift Register Extraction        : YES
Logical Shifter Extraction       : YES
XOR Collapsing                   : YES
Resource Sharing                 : YES
Multiplier Style                 : lut
Automatic Register Balancing     : No

---- Target Options
Add IO Buffers                   : YES
Global Maximum Fanout            : 100
Add Generic Clock Buffer(BUFG)   : 4
Register Duplication             : YES
Equivalent register Removal      : YES
Slice Packing                    : YES
Pack IO Registers into IOBs      : auto

---- General Options
Optimization Goal                : Speed
Optimization Effort              : 1
Keep Hierarchy                   : NO
Global Optimization              : AllClockNets
RTL Output                       : Yes
Write Timing Constraints         : NO
Hierarchy Separator              : _
Bus Delimiter                    : <>
Case Specifier                   : maintain
Slice Utilization Ratio          : 100
Slice Utilization Ratio Delta    : 5

---- Other Options
lso                              : decryption.lso
Read Cores                       : YES
cross_clock_analysis             : NO
verilog2001                      : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, the security of the cipher being adapted as a function of its key size.
It also suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. Its application areas include:
Wireless communication, mobile computing, and networking systems
Encryption of JPEG2000 images
Scalable video coding
Sensor networks and RFIDs
CONCLUSION
$\mathrm{SEA}_{n,b}$ is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of $\mathrm{SEA}_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo-code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of $\mathrm{SEA}_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, M. Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
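Addition mod $2^b$ ($\boxplus$)
The mod $2^b$ addition is defined on $n_b$-word vectors:
$\boxplus : \mathbb{Z}_{2^b}^{n_b} \times \mathbb{Z}_{2^b}^{n_b} \to \mathbb{Z}_{2^b}^{n_b} : x, y \to z = x \boxplus y \Leftrightarrow z_i = x_i \boxplus y_i, \quad 0 \le i \le n_b - 1$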
The Round and Key Round
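Based on the previous definitions, the encrypt round $F_E$, decrypt round $F_D$, and key round $F_K$ are pictured in Fig. 3.1 and defined as the functions $F : \mathbb{Z}_{2^{n/2}}^2 \times \mathbb{Z}_{2^{n/2}} \to \mathbb{Z}_{2^{n/2}}^2$ such that:
$[L_{i+1}, R_{i+1}] = F_E(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R(L_i) \oplus r(S(R_i \boxplus K_i)), \quad L_{i+1} = R_i$
$[L_{i+1}, R_{i+1}] = F_D(L_i, R_i, K_i) \Leftrightarrow R_{i+1} = R^{-1}(L_i \oplus r(S(R_i \boxplus K_i))), \quad L_{i+1} = R_i$
$[KL_{i+1}, KR_{i+1}] = F_K(KL_i, KR_i, C_i) \Leftrightarrow KR_{i+1} = KL_i \oplus R(r(S(KR_i \boxplus C_i))), \quad KL_{i+1} = KR_i$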
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext $P$ under a key $K$ and produces a ciphertext $C$. $P$, $C$, and $K$ have a parametric bit size $n$. The operations within the cipher are performed considering parametric $b$-bit words.

C = SEA_{n,b}(P, K)
{
  initialization:
    L_0 & R_0 = P;
    KL_0 & KR_0 = K;
  key scheduling:
    for i in 1 to ⌊n_r/2⌋
      [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(i));
    switch KL_{⌊n_r/2⌋}, KR_{⌊n_r/2⌋};
    for i in ⌈n_r/2⌉ to n_r - 1
      [KL_i, KR_i] = F_K(KL_{i-1}, KR_{i-1}, C(n_r - i));
  encryption:
    for i in 1 to ⌈n_r/2⌉
      [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KR_{i-1});
    for i in ⌈n_r/2⌉ + 1 to n_r
      [L_i, R_i] = F_E(L_{i-1}, R_{i-1}, KL_{i-1});
  final:
    C = R_{n_r} & L_{n_r};
    switch KL_{n_r-1}, KR_{n_r-1};
}

where & is the concatenation operator, $KR_{\lfloor n_r/2 \rfloor}$ is taken before the switch, and $C(i)$ is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals $i$. Decryption is exactly the same, using the decrypt round $F_D$.
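The round, key round, and complete cipher described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: each Feistel branch is a list of $n_b$ words of $b$ bits, word 0 is assumed to be the LSW of the constant $C(i)$, and the bodies of `sbox` and `bit_rot` follow the bitslice definitions of $S$ and $r$ as recalled from the published SEA specification (they are not restated in this section and should be checked against it). All function names are hypothetical.

```python
"""Sketch of SEA_{n,b} on lists of b-bit words (assumed layout, see lead-in)."""

def rotl1(w, b):
    return ((w << 1) | (w >> (b - 1))) & ((1 << b) - 1)

def rotr1(w, b):
    return ((w >> 1) | (w << (b - 1))) & ((1 << b) - 1)

def sbox(x):
    # Assumed recursive bitslice S-box, applied to each block of 3 words.
    x = list(x)
    for i in range(0, len(x), 3):
        x[i] ^= x[i + 2] & x[i + 1]
        x[i + 1] ^= x[i + 2] & x[i]
        x[i + 2] ^= x[i] | x[i + 1]
    return x

def bit_rot(x, b):
    # Assumed r: first word of each block rotated right, third rotated left.
    y = list(x)
    for i in range(0, len(x), 3):
        y[i], y[i + 2] = rotr1(x[i], b), rotl1(x[i + 2], b)
    return y

def word_rot(x):        # R: cyclic shift of the words by one position
    return x[-1:] + x[:-1]

def word_rot_inv(x):    # R^-1
    return x[1:] + x[:1]

def add(x, y, b):       # word-wise addition mod 2^b
    return [(u + v) & ((1 << b) - 1) for u, v in zip(x, y)]

def xor(x, y):
    return [u ^ v for u, v in zip(x, y)]

def f(right, key, b):   # r(S(right [+] key))
    return bit_rot(sbox(add(right, key, b)), b)

def enc_round(L, R, K, b):      # F_E
    return R, xor(word_rot(L), f(R, K, b))

def dec_round(L, R, K, b):      # F_D
    return R, word_rot_inv(xor(L, f(R, K, b)))

def key_round(KL, KR, C, b):    # F_K
    return KR, xor(KL, word_rot(f(KR, C, b)))

def round_keys(KL, KR, nr, b):
    """Return the nr round keys in the order the rounds consume them."""
    nb, half = len(KL), nr // 2
    const = lambda i: [i] + [0] * (nb - 1)   # assumed: word 0 is the LSW
    keys = [KR]                              # round 1 uses KR_0
    for i in range(1, half + 1):
        KL, KR = key_round(KL, KR, const(i), b)
        keys.append(KR)                      # KR_i, taken before the switch
    KL, KR = KR, KL                          # switch
    for i in range(half + 1, nr):
        KL, KR = key_round(KL, KR, const(nr - i), b)
        keys.append(KL)                      # second half uses KL_i
    return keys

def sea(L, R, KL, KR, nr, b, round_fn=enc_round):
    """Encrypt with enc_round (default) or decrypt with dec_round."""
    if nr % 2 == 0 or len(L) % 3 != 0:
        raise ValueError("nr must be odd and n_b a multiple of 3")
    for K in round_keys(KL, KR, nr, b):
        L, R = round_fn(L, R, K, b)
    return R, L                              # C = R_nr & L_nr
```

Calling `sea(..., round_fn=dec_round)` on a ciphertext with the same key recovers the plaintext, because the list returned by `round_keys` reads the same in both directions. The sketch recomputes the whole key schedule up front for clarity; a constrained implementation would instead derive the round keys on the fly.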
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– λ-parameter: 1/2
– δ-parameter: 1/4
– Maximum nonlinear order, namely 2
– Recursive definition
– Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
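The criteria above can be checked mechanically on a 3-bit table. The sketch below is illustrative only and rests on two assumptions that this section does not spell out: the recursive definition of $S$ is taken as recalled from the published SEA specification, and the δ- and λ-parameters are read in the usual way, as the largest differential probability and the largest value of $|2p - 1|$ over non-trivial linear approximations. The function names are hypothetical.

```python
from fractions import Fraction

def sbox3(v):
    """Apply the assumed recursive S-box definition to one 3-bit value."""
    x0, x1, x2 = v & 1, (v >> 1) & 1, (v >> 2) & 1
    x0 ^= x2 & x1
    x1 ^= x2 & x0
    x2 ^= x0 | x1
    return x0 | (x1 << 1) | (x2 << 2)

TABLE = [sbox3(v) for v in range(8)]
FIXED_POINTS = [v for v in range(8) if TABLE[v] == v]

def parity(v):
    return bin(v).count("1") & 1

def delta_parameter(table):
    """Largest probability of any non-zero input difference mapping to one output difference."""
    size = len(table)
    best = 0
    for a in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[table[x] ^ table[x ^ a]] += 1
        best = max(best, max(counts))
    return Fraction(best, size)

def lambda_parameter(table):
    """Largest |2p - 1| over input masks a and non-zero output masks b."""
    size = len(table)
    best = 0
    for a in range(size):
        for b in range(1, size):
            agree = sum(parity(a & x) == parity(b & table[x]) for x in range(size))
            best = max(best, abs(2 * agree - size))
    return Fraction(best, size)
```

Under these assumptions the table has exactly two fixed points, and the two parameters evaluate to 1/4 and 1/2, matching the criteria listed above.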
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher, when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme, with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that $\mathrm{SEA}_{n,b}$ divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation $R$ (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of $r$ and $S$ yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$ ($\boxplus$)
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to $\mathrm{SEA}_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds.
The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the linear and differential parameters $\Lambda$ and $\Delta$ of the 3-round cipher are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function $S$ has parameters $\delta = 2^{-2}$ and $\lambda = 2^{-1}$, it implies that 3 rounds of $\mathrm{SEA}_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an $n$-bit block cipher, it is respectively required that $\Delta \approx 2^{-n}$ and $\Lambda \approx 2^{-n/2}$ to resist against differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that:
$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n}$ and $\lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2}$ (1)
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
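The step from Equation (1) to the bound on the number of rounds can be spelled out as follows. This is only a worked restatement of the inequality, using the values $\delta = 2^{-2}$ and $\lambda = 2^{-1}$ quoted above; the exponent $2n_r/3$ reflects that each group of 3 rounds contributes a squared parameter.

```latex
\begin{align*}
\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} = 2^{-4n_r/3} < 2^{-n}
  &\iff \frac{4n_r}{3} > n \iff n_r > \frac{3n}{4},\\
\lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} = 2^{-2n_r/3} < 2^{-n/2}
  &\iff \frac{2n_r}{3} > \frac{n}{2} \iff n_r > \frac{3n}{4}.
\end{align*}
```

For example, a block size of $n = 96$ bits gives $3n/4 = 72$ rounds from this criterion alone.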
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31], and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with $\mathrm{SEA}_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \ll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
– $S(x \ll a) = S(x) \ll a$
– $r(x \ll a) = r(x) \ll a$
– $R(x \ll a) = R(x) \ll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \ll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F_E^{\oplus}$ and the key round $F_K$ satisfy:
$F_E^{\oplus}(L \ll a, R \ll a, K \ll a) = F_E^{\oplus}(L, R, K) \ll a$
$F_K(KL \ll a, KR \ll a, 0) = F_K(KL, KR, 0) \ll a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \ll a$. In the actual $\mathrm{SEA}_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to 3/8 as $b$ grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged for all possible $(X, K)$, and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent $\mathrm{SEA}_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from $K$ and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of $\mathrm{SEA}_{n,b}$.
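The probability $p$ with which rotation commutes with addition mod $2^b$ can be measured exhaustively for small word sizes. The sketch below is illustrative only (the function names are hypothetical): it counts, over all pairs of $b$-bit words, how often rotating both operands before the addition gives the same result as rotating the sum.

```python
from fractions import Fraction

def rotl(w, a, b):
    """Rotate the b-bit word w left by a positions."""
    a %= b
    return ((w << a) | (w >> (b - a))) & ((1 << b) - 1)

def commute_probability(a, b):
    """Exact fraction of pairs (x, k) with rotl(x) + rotl(k) == rotl(x + k) mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return Fraction(hits, 1 << (2 * b))
```

For $b = 8$ and $a = 1$ the exhaustive count gives 387/1024, roughly 0.378, which is already close to the limit of 3/8; for $a = 2$ it drops to 325/1024, in line with the remark that the probability is smaller for intermediate rotation amounts.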
Square Attacks
We explored square attacks [16] on $\mathrm{SEA}_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure, where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that it remains the same when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-$\mathrm{SEA}_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of $\mathrm{SEA}_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of $\mathrm{SEA}_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, of which the vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of $\mathrm{SEA}_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the $\mathrm{SEA}_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows one to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of $\mathrm{SEA}_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \Leftrightarrow \bar{P} \xrightarrow{\bar{K}} \bar{C}$. The non-linear key scheduling and the presence of carry propagations in the actual $\mathrm{SEA}_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, $\mathrm{SEA}_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks, plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. $n_r$ has to be odd: we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
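The suggested number of rounds can be computed directly from the parameters. The helper below is a sketch under assumptions: the function name and the `margin` argument are hypothetical, the number of words per branch is taken as $n_b = n/(2b)$ (consistent with the example $n = 48$, $b = 8$, $n_b = 3$ used earlier), and the $b/2$ term is rounded down, which makes no difference for even word sizes.

```python
def suggested_rounds(n, b, margin=1):
    """Rounds from 3n/4 + 2*(n_b + b//2), scaled by `margin`, forced odd."""
    if b < 8:
        raise ValueError("the text assumes a word size of at least 8 bits")
    if n % (2 * b) != 0 or (3 * n) % 4 != 0:
        raise ValueError("n must be a multiple of 2b and 3n/4 an integer")
    nb = n // (2 * b)                      # words per Feistel branch (assumed)
    rounds = margin * (3 * n // 4 + 2 * (nb + b // 2))
    return rounds if rounds % 2 == 1 else rounds + 1
```

For instance, `suggested_rounds(96, 8)` evaluates 72 + 2·(6 + 4) = 92 and rounds it up to 93, while `suggested_rounds(48, 8, margin=2)` applies the doubling mentioned above and returns 101.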
3.5 Performance Analysis
$\mathrm{SEA}_{n,b}$ is targeted for being implemented on low-cost processors, with little code size and a small instruction set. However, $\mathrm{SEA}_{n,b}$'s simple structure makes it easy to implement on any processor. In appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and registers usage, (2) low code size, and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: OR, AND, $\oplus$, $\boxplus$, $\gg$, $\ll$
– Branch instructions: goto, subroutine call and return
– Comparison, load RAM in register, store register in RAM
According to the code in appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r - 1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
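The estimates quoted above are simple closed forms in $n_b$ and $n_r$, so they are easy to tabulate. The helper below only evaluates the formulas from the text; the function name and the dictionary keys are hypothetical.

```python
def sea_cost_estimates(nb, nr):
    """Evaluate the memory, code-size and time estimates for given n_b and n_r."""
    round_ops = 12 * nb + 11          # one encrypt/decrypt round
    key_round_ops = 10 * nb + 11      # one key round
    return {
        "ram_words_and_registers": 5 * nb + 3,
        "registers": nb + 3,
        "code_size_ops": 31 * nb + 36,
        "time_ops": (nr - 1) * (round_ops + key_round_ops + 7)
                    + round_ops + 8 * nb + 7,
    }
```

As an illustration, with $n_b = 6$ and $n_r = 93$ the formulas give 33 RAM words and registers, a code size of 222 ops, and 14950 ops for one encryption.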
For illustration purposes, we implemented $\mathrm{SEA}_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between $\mathrm{SEA}_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– $\mathrm{SEA}_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of $\mathrm{SEA}_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of $\mathrm{SEA}_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of $\mathrm{SEA}_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is in the number of cycles required for the encryption (i.e. $\mathrm{SEA}_{n,b}$ trades space for time, which may be relevant given present processor speeds). Looking at the code size × cycles product, the efficiency of $\mathrm{SEA}_{n,b}$ remains similar to the one of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The SchoofElkiesAtkin algorithm is an e_cient way to count the number of points on an elliptic curve de_ned over a large prime _eld This expository paper describes the algorithm in su_cient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm The mathematical background for the technique is then givenLet p be a large (odd) prime and let E y2 = x3 + a4x + a6 be an elliptic curve where a4 and a6 are given _xed integers In the case where p does not divide 4a34 +27a26 E can be reduced to an elliptic curve over Fp The number of points of E over Fp denoted by E(Fp) is of cryptographic interest since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies's paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that $\#E(\mathbb{F}_p) = p + 1 - t$, where $t$ is an integer which satisfies the Hasse bound $-2\sqrt{p} \le t \le 2\sqrt{p}$.
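The relation between the point count and $t$ can be checked by brute force for a small prime. The following is a minimal sketch, not the SEA algorithm itself: it enumerates every $x$ in $\mathbb{F}_p$, so it is only usable for tiny $p$. The function names and the sample curve are illustrative assumptions.

```python
def count_points(p, a4, a6):
    """Count points on y^2 = x^3 + a4*x + a6 over F_p, including the point at infinity."""
    # roots[v] = number of y in F_p with y^2 = v (mod p)
    roots = [0] * p
    for y in range(p):
        roots[y * y % p] += 1
    total = 1  # the point at infinity
    for x in range(p):
        total += roots[(x * x * x + a4 * x + a6) % p]
    return total


def trace(p, a4, a6):
    """Return t such that #E(F_p) = p + 1 - t."""
    return p + 1 - count_points(p, a4, a6)


# Sample curve (an arbitrary choice for illustration).
p, a4, a6 = 101, 2, 3
assert (4 * a4**3 + 27 * a6**2) % p != 0  # p does not divide the discriminant term
t = trace(p, a4, a6)
assert t * t <= 4 * p  # Hasse bound: |t| <= 2*sqrt(p)
```

The cost grows linearly in $p$, which is exactly why a polynomial-time method is needed for 200-bit primes.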
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$ and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial $\Phi_\ell$ that is necessary for further calculations with this $\ell$. These polynomials $\Phi_\ell$ do not depend on the curve $E$ under consideration and hence might be precomputed and stored if memory allows. Then for any elliptic curve $E$ we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case we have succeeded in determining $t$.
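The final recombination step can be sketched as follows. This is a hedged illustration of the Chinese Remainder Theorem step only; it assumes the residues of $t$ modulo each auxiliary prime have already been obtained by some other means, and the function names are hypothetical.

```python
def crt_pair(r1, m1, r2, m2):
    """Combine x = r1 (mod m1) and x = r2 (mod m2) for coprime moduli m1, m2."""
    # pow(m1, -1, m2) is the modular inverse of m1 modulo m2 (Python 3.8+).
    k = ((r2 - r1) * pow(m1, -1, m2)) % m2
    return r1 + m1 * k, m1 * m2


def recover_trace(residues, p):
    """residues maps each auxiliary prime ell to t mod ell.

    Returns t, or None when the product of the primes does not exceed 4*sqrt(p).
    """
    r, m = 0, 1
    for ell, t_mod in residues.items():
        r, m = crt_pair(r, m, t_mod % ell, ell)
    # m > 4*sqrt(p) is equivalent to m*m > 16*p; integers avoid rounding issues.
    if m * m <= 16 * p:
        return None
    # t lies in [-2*sqrt(p), 2*sqrt(p)], so take the representative centered on 0.
    return r if r <= m // 2 else r - m


# Toy data: p = 101 and a pretend trace t = -7 reduced modulo 2, 3, 5, 7.
residues = {ell: -7 % ell for ell in (2, 3, 5, 7)}
print(recover_trace(residues, 101))
```

The centered representative is unique because the Hasse interval has length $4\sqrt{p}$, which is smaller than the product of the moduli.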
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$ with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 25 so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$ we need to determine a polynomial $\Phi_\ell(F, J) \in \mathbb{Z}[F, J]$. For $\ell \in A_s$ this is stored in the program. For $\ell \in A_l$, $\Phi_\ell$ must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime $p$ and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
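FIG 5.1 SEA architecture block diagram (figure not reproduced). Labels recoverable from the figure: a key computational block (Mod, SBox, BR, WR, XOR, Round Reg) with key registers KeyReg0[95:0] to KeyReg9[95:0] and inputs KeyI[95:0], KeyLd, Clk, Rst; an encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg) with a cipher data register, round keys Key0[95:0] to Key9[95:0] and signals DataI[95:0], DataLd, Clk, Rst; a decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg) with a plain-text data register, round keys Key0[95:0] to Key9[95:0] and signals DataO[95:0], DataRd, Clk, Rst; and an SMC block with signals Clk, Rst, Ena, ED, Ovr.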
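The SBox block named in the diagram can be illustrated in software. The sketch below assumes the 3-bit substitution box of the SEA specification [1], written with AND, OR and XOR only. The recurrence and the bit ordering (x0 as the least significant bit) are recalled from that specification rather than taken from this report, so they should be checked against [1] before use. The function names are hypothetical.

```python
def sea_sbox_bits(x0, x1, x2):
    """Assumed SEA 3-bit S-box as a sequence of AND/OR/XOR steps.

    Each step uses the already-updated values, as in a register-based routine.
    """
    x0 = (x2 & x1) ^ x0
    x1 = (x2 & x0) ^ x1
    x2 = (x0 | x1) ^ x2
    return x0, x1, x2


def sea_sbox(v):
    """Apply the S-box to a 3-bit value v, with x0 taken as the least significant bit."""
    x0, x1, x2 = v & 1, (v >> 1) & 1, (v >> 2) & 1
    y0, y1, y2 = sea_sbox_bits(x0, x1, x2)
    return y0 | (y1 << 1) | (y2 << 2)


table = [sea_sbox(v) for v in range(8)]
print(table)  # [0, 5, 6, 7, 4, 3, 1, 2]
```

Because the three steps are purely bitwise, `sea_sbox_bits` also works unchanged on whole machine words, applying the same substitution to every bit position in parallel.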
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt the data being protected.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy as a seed generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
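Both ways of obtaining a key can be sketched with the Python standard library. This is an illustration under stated assumptions, not the key schedule of the design in this report: the function names are hypothetical, the iteration count is an arbitrary example value, and PBKDF2 with SHA-256 is swapped in for the SHA-1 mentioned in the text.

```python
import hashlib
import secrets


def random_key(num_bits=128):
    """Draw a key from the operating system's entropy source."""
    return secrets.token_bytes(num_bits // 8)


def key_from_passphrase(passphrase, salt, num_bits=128, iterations=100_000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=num_bits // 8
    )


key = random_key(96)             # 96 bits, the register width used in this design
salt = secrets.token_bytes(16)   # the salt is stored alongside the derived key
derived = key_from_passphrase("correct horse", salt, num_bits=96)
print(key.hex(), derived.hex())
```

The same passphrase and salt always yield the same derived key, whereas `random_key` returns a fresh value on every call.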
The simplest method to read encrypted data is a brute force attack: simply trying every possible key up to the maximum key length. It is therefore important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
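The exponential growth can be made concrete with a short calculation. The sketch below assumes an attacker who tests a fixed number of keys per second and finds the key, on average, after searching half of the key space; the rate used in the example is an arbitrary assumption, not a measured figure.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(key_bits, keys_per_second):
    """Expected years to find a key by exhaustive search (half the key space)."""
    expected_trials = 2 ** (key_bits - 1)
    return expected_trials / keys_per_second / SECONDS_PER_YEAR


# Hypothetical attacker testing one billion keys per second.
for bits in (56, 96, 128):
    print(bits, years_to_search(bits, 1e9))
```

Each additional key bit doubles the expected effort, so going from 56 to 128 bits multiplies it by 2^72.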
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek words kryptos (secret) and graphos (writing).
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is mathematical and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. It deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today many security systems and companies use cryptography to transfer information over the Internet or by radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help protect the privacy of an ordinary person. Until the early 1920s, the term cryptography also covered the breaking of encrypted messages; the term cryptanalysis then came into use, and cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are ciphers and codes.
A code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
A cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
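The difference between the two methods can be shown with a toy example. The codebook entries and the shift value below are invented for illustration, and neither scheme offers any real security.

```python
# Hypothetical codebook: whole words are replaced by code words.
CODEBOOK = {"attack": "apple", "dawn": "river"}


def encode_with_code(message):
    """Code: replace complete words using the codebook, leave other words alone."""
    return " ".join(CODEBOOK.get(word, word) for word in message.split())


def caesar(text, shift):
    """Cipher: replace each lowercase letter by the letter `shift` places later."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


print(encode_with_code("attack at dawn"))  # apple at river
print(caesar("attack at dawn", 3))         # dwwdfn dw gdzq
```

Applying `caesar` again with the negated shift restores the original text, whereas reversing the code requires the inverse codebook.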
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
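A one-time pad is simple enough to sketch in a few lines. This is a minimal illustration assuming byte-wise XOR as the combining operation; the function names are hypothetical, and the security argument only holds if the conditions listed above (random, secret, never reused, at least as long as the message) are all met.

```python
import secrets


def otp_encrypt(message):
    """Encrypt bytes with a fresh random pad of the same length; returns (ciphertext, pad)."""
    pad = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, pad))
    return ciphertext, pad


def otp_decrypt(ciphertext, pad):
    """XOR with the same pad restores the message."""
    if len(pad) < len(ciphertext):
        raise ValueError("pad must be at least as long as the ciphertext")
    return bytes(c ^ k for c, k in zip(ciphertext, pad))


ciphertext, pad = otp_encrypt(b"meet at noon")
assert otp_decrypt(ciphertext, pad) == b"meet at noon"
```

The pad is as long as the message and can be used only once, which is the practical cost of the scheme.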
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the work factor, in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
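The password or PIN case can be illustrated with two comparison routines. The first is a deliberately leaky sketch whose running time depends on how many leading characters match; the second uses `hmac.compare_digest` from the Python standard library, which is designed to reduce that dependence. The function names are hypothetical.

```python
import hmac


def leaky_equal(a, b):
    """Compare byte strings, returning at the first mismatch.

    The early exit means the running time reveals the length of the matching prefix.
    """
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # early exit: the source of the timing leak
    return True


def constant_time_equal(a, b):
    """Compare byte strings without an early exit on the first mismatch."""
    return hmac.compare_digest(a, b)


stored_pin = b"4921"
print(leaky_equal(b"4000", stored_pin), constant_time_equal(b"4000", stored_pin))
```

Both functions return the same answers; only their timing behaviour differs, which is what a side-channel attacker exploits.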
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0, 0%
  Number of Slices containing unrelated logic: 0 out of 0, 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86, 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4, 25%
  Number of GCLKIOBs: 1 out of 4, 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0, 0%
  Number of Slices containing unrelated logic: 0 out of 0, 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86, 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4, 25%
  Number of GCLKIOBs: 1 out of 4, 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
# Registers : 1
  96-bit register : 1
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 195
Macro Statistics :
# Registers : 1
#      96-bit register : 1
Cell Usage :
# BELS : 1
#      LUT1 : 1
# FlipFlops/Latches : 96
#      FDCE : 96
# Clock Buffers : 1
#      BUFGP : 1
# IO Buffers : 194
#      IBUF : 98
#      OBUF : 96
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6
 Number of Slices: 55 out of 192, 28%
 Number of Slice Flip Flops: 96 out of 384, 25%
 Number of 4 input LUTs: 1 out of 384, 0%
 Number of bonded IOBs: 194 out of 90, 215% (*)
 Number of GCLKs: 1 out of 4, 25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
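The percentages in the utilization summary, and the reason for the over-use warning, can be rechecked with a few lines of arithmetic. The used and available counts below are copied from the summary above; the truncating integer division is an assumption about how the tool rounds, chosen because it reproduces the printed figures.

```python
def utilization(used, available):
    """Utilization in percent, truncated to an integer."""
    return 100 * used // available


# (used, available) pairs taken from the keyreg device utilization summary.
report = {
    "Slices": (55, 192),
    "Slice Flip Flops": (96, 384),
    "4 input LUTs": (1, 384),
    "bonded IOBs": (194, 90),
    "GCLKs": (1, 4),
}

for name, (used, available) in report.items():
    print(f"{name}: {used} out of {available}, {utilization(used, available)}%")

# Resources that exceed what the selected device provides.
overused = [name for name, (used, available) in report.items() if used > available]
print(overused)  # ['bonded IOBs']
```

Only the bonded IOBs exceed the device capacity: the 96-bit key register exposes all of its inputs and outputs as pins when synthesized on its own.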
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary:
---------------
Speed Grade: -6
   Minimum period: No path found
   Minimum input arrival time before clock: 7.962ns
   Maximum output required time after clock: 6.788ns
   Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset:              7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising
  Data Path: KeyEna to Dreg_95
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O             96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                    0.886          Dreg_0
    ----------------------------------------
    Total                      7.962ns (1.662ns logic, 6.300ns route)
                                       (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset:              6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising
  Data Path: Dreg_95 to KeyO<95>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FDCE:C->Q              1   1.085   1.035  Dreg_95 (Dreg_95)
    OBUF:I->O                  4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                      6.788ns (5.753ns logic, 1.035ns route)
                                       (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 7
Cell Usage :
# BELS : 3
#      LUT4 : 3
# IO Buffers : 7
#      IBUF : 4
#      OBUF : 3
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6
 Number of Slices: 2 out of 192, 1%
 Number of 4 input LUTs: 3 out of 384, 0%
 Number of bonded IOBs: 7 out of 90, 7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384, 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192, 1%
  Number of Slices containing only related logic: 2 out of 2, 100%
  Number of Slices containing unrelated logic: 0 out of 2, 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384, 1%
  Number of bonded IOBs: 7 out of 86, 8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86, 8%
  Number of LOCed External IOBs: 0 out of 7, 0%
  Number of SLICEs: 2 out of 192, 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384, 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384, 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192, 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665, 97%
  Number of Slices containing unrelated logic: 17 out of 665, 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384, 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86, 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4, 25%
  Number of GCLKIOBs: 1 out of 4, 25%
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name: "decryption.prj"
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:

---- Target Parameters
Output File Name: "decryption"
Output Format: NGC
Target Device: xc2s15-6-cs144

---- Source Options
Top Module Name: decryption
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No

---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto

---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5

---- Other Options
lso: decryption.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, the security of the cipher being adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
As a low-cost encryption routine for processors with a limited instruction set
In wireless communication, mobile computing, and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDs
CONCLUSION
SEA$_{n,b}$ is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA$_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) is in the range of an encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo-code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA$_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
FIG 3.1 Encrypt/decrypt round and key round
The Complete Cipher
The cipher iterates an odd number $n_r$ of rounds. The following pseudo-C code encrypts a plaintext P under a key K and produces a ciphertext C. P, C and K have a parametric bit size n. The operations within the cipher are performed considering parametric b-bit words.

C = SEA_{n,b}(P, K)
{
    % initialization:
    L_0 & R_0 = P;
    KL_0 & KR_0 = K;

    % key scheduling:
    for i in 1 to ⌊nr/2⌋
        [KL_i, KR_i] = FK(KL_{i−1}, KR_{i−1}, C(i));
    switch KL_{⌊nr/2⌋}, KR_{⌊nr/2⌋};
    for i in ⌈nr/2⌉ to nr − 1
        [KL_i, KR_i] = FK(KL_{i−1}, KR_{i−1}, C(r − i));

    % encryption:
    for i in 1 to ⌈nr/2⌉
        [L_i, R_i] = FE(L_{i−1}, R_{i−1}, KR_{i−1});
    for i in ⌈nr/2⌉ + 1 to nr
        [L_i, R_i] = FE(L_{i−1}, R_{i−1}, KL_{i−1});

    % final:
    C = R_nr & L_nr;
    switch KL_{nr−1}, KR_{nr−1};
}

where & is the concatenation operator, KR_{⌊nr/2⌋} is taken before the switch, and C(i) is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals i. Decryption is exactly the same, using the decrypt round FD.
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
– λ-parameter: 1/2
– δ-parameter: 1/4
– Maximum nonlinear order, namely 2
– Recursive definition
– Minimum number of instructions
Remark that if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher when combined with the bitslice substitution box. It is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that SEA$_{n,b}$ divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of r and S yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
Addition mod $2^b$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA$_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds. The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor r/2 \rfloor}, K_{\lfloor r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
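The palindromic round-key sequence described above can be checked with a small sketch. It assumes a generic Feistel-style key round, (KL, KR) -> (KR, KL xor g(KR, c)), with a stand-in nonlinear function `toy_g` and toy 24-bit key halves; these are illustrative assumptions and not the actual SEA key round. The names `toy_g`, `key_round` and `round_keys` are hypothetical.

```python
MASK = (1 << 24) - 1  # toy 24-bit key halves, chosen only for illustration


def toy_g(x, c):
    """Stand-in nonlinear function; NOT the SEA key-round function."""
    return ((((x + c) & MASK) * 0x9E3779) ^ (x >> 5)) & MASK


def key_round(kl, kr, c, g=toy_g):
    """Generic Feistel key round: (KL, KR) -> (KR, KL xor g(KR, c))."""
    return kr, kl ^ g(kr, c)


def round_keys(kl, kr, nr, g=toy_g):
    """Return the nr round keys and the final (KL, KR) key state."""
    if nr % 2 == 0:
        raise ValueError("nr must be odd")
    half = nr // 2
    keys = [kr]                       # KR_0, used by round 1
    for i in range(1, half + 1):      # first half: constants 1..half
        kl, kr = key_round(kl, kr, i, g)
        keys.append(kr)               # KR_i (taken before the switch)
    kl, kr = kr, kl                   # switch KL and KR
    for i in range(half + 1, nr):     # second half: constants half..1
        kl, kr = key_round(kl, kr, nr - i, g)
        keys.append(kl)               # KL_i
    kl, kr = kr, kl                   # final switch
    return keys, (kl, kr)


if __name__ == "__main__":
    keys, final_state = round_keys(0x123456, 0xABCDEF, 51)
    print(keys == keys[::-1])                    # round keys read the same both ways
    print(final_state == (0x123456, 0xABCDEF))   # key state returns to the master key
```

Because the second half of the schedule undoes the first half, any function `g` gives the same symmetry; only the Feistel shape of the key round matters for this property.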
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, then the differential and linear parameters of the 3-round cipher, $\Delta$ and $\Lambda$, are respectively smaller than $\delta^2$ and $\lambda^2$.
Since our nonlinear function S has parameter $\delta = 2^{-2}$ and parameter $\lambda = 2^{-1}$, it implies that 3 rounds of SEA$_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, $\Delta$ and $\Lambda$ are respectively required to be no larger than roughly $2^{-n}$ and $2^{-n/2}$ to resist differential [4] and linear cryptanalysis [28]. In order to approach these bounds, we require that
$$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad\text{and}\quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \qquad (1)$$
In both cases, the required number of rounds is $n_r \ge 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
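The exponent arithmetic behind Equation (1) can be worked through with exact fractions. The sketch below only evaluates the base-2 logarithms of the two bounds for a given number of rounds; the function name `log2_bounds` is hypothetical.

```python
from fractions import Fraction

LOG2_DELTA = -2   # delta  = 2^-2 (differential parameter of S)
LOG2_LAMBDA = -1  # lambda = 2^-1 (linear parameter of S)


def log2_bounds(nr):
    """Return (log2 of delta^(2nr/3), log2 of lambda^(2nr/3)) as exact fractions."""
    exponent = Fraction(2 * nr, 3)
    return LOG2_DELTA * exponent, LOG2_LAMBDA * exponent


if __name__ == "__main__":
    n = 48
    nr = 3 * n // 4                      # the threshold nr = 3n/4
    diff_bound, lin_bound = log2_bounds(nr)
    # At nr = 3n/4 the bounds reach 2^-n and 2^-(n/2) respectively.
    print(diff_bound == -n, lin_bound == Fraction(-n, 2))
```

Both conditions of Equation (1) reduce to the same threshold because the differential target exponent (n) is twice the linear one (n/2), exactly as $\log_2 \delta$ is twice $\log_2 \lambda$.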
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than the one of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \lll a$ the left rotation by a bits of each of the $n_b$ words of x. The non-linear and diffusion layers have the following properties:
– $S(x \lll a) = S(x) \lll a$
– $r(x \lll a) = r(x) \lll a$
– $R(x \lll a) = R(x) \lll a$
Consider a modified version of our cipher where key addition is performed using XOR ($\oplus$) rather than modular addition, and where all round constants $C_i$ are such that $C_i \lll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F'_E$ and the key round $F_K$ satisfy:
$F'_E(L \lll a, R \lll a, K \lll a) = F'_E(L, R, K) \lll a$
$F_K(KL \lll a, KR \lll a, 0) = F_K(KL, KR, 0) \lll a$
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys K and $K \lll a$. In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \lll a) \boxplus (K \lll a) = (X \boxplus K) \lll a$ is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For $a = 1$, p rapidly converges to 3/8 as b grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \lll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, it implies that the similarity between the round keys derived from K and those derived from $K \lll a$ rapidly vanishes. These properties prevent this structural distinguisher from being propagated through more than a few rounds of SEA$_{n,b}$.
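The probability that rotation commutes with addition mod $2^b$ can be estimated by exhaustive enumeration for a small word size. The following is a minimal sketch for a single b-bit word; the helper names `rotl` and `rotation_commutes_probability` are hypothetical.

```python
def rotl(x, a, b):
    """Rotate the b-bit word x left by a bits."""
    a %= b
    mask = (1 << b) - 1
    return ((x << a) | (x >> (b - a))) & mask


def rotation_commutes_probability(a, b):
    """Fraction of pairs (X, K) with (X<<<a) + (K<<<a) == (X + K)<<<a mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return hits / (1 << (2 * b))


if __name__ == "__main__":
    b = 8
    for a in (1, 2, 4):
        print(a, rotation_commutes_probability(a, b))
```

For b = 8 the enumeration covers all 65536 pairs; the value for a = 1 should be close to 3/8 and larger than the value for intermediate rotation amounts, in line with the text.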
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure, where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure 2. It is expected that this remains the same when different parameters n and b are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure 2 provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as the one of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with the one of GOST, of which the vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function f which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows one to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of P under key K gives ciphertext C, then $P \xrightarrow{K} C \iff \bar{P} \xrightarrow{\bar{K}} \bar{C}$. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds to resist linear/differential attacks, plus twice the number of rounds to obtain complete diffusion (to prevent both structural attacks and outer rounds improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
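The suggested number of rounds is simple to evaluate for given parameters. The sketch below assumes $n_b = n/(2b)$ (consistent with the example n = 48, b = 8, $n_b = 3$ used earlier) and rounds $3n/4$ up when it is not an integer, which is an assumption of this sketch; the function name `suggested_rounds` is hypothetical.

```python
def suggested_rounds(n, b):
    """Evaluate 3n/4 + 2*(nb + floor(b/2)), made odd by adding one if even."""
    if b < 8:
        raise ValueError("the text assumes a word size b >= 8")
    if n % (2 * b) != 0:
        raise ValueError("n must be a multiple of 2b so that nb = n/(2b) is an integer")
    nb = n // (2 * b)                 # number of words per Feistel branch
    base = -(-3 * n // 4)             # ceil(3n/4); rounding up is an assumption here
    nr = base + 2 * (nb + b // 2)
    return nr + 1 if nr % 2 == 0 else nr


if __name__ == "__main__":
    for n, b in [(48, 8), (96, 8), (96, 16)]:
        print(n, b, suggested_rounds(n, b))
```

For n = 48 and b = 8 the formula gives 36 + 2·(3 + 4) = 50, which is even, so one round is added.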
3.5 Performance Analysis
SEA$_{n,b}$ is targeted at low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size, and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: or, and, $\oplus$, $\boxplus$, ≫, ≪
– Branch instructions: goto, subroutine call and return
– Comparison, load RAM in register, store register in RAM
According to the code in the appendix, the performance can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, it directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. It yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have $K_{n_r - 1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and faster implementation.
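The cost formulas above can be evaluated directly for a given parameter set. This is a minimal sketch that only transcribes the three expressions from the text; the function name `estimate_costs` and the dictionary keys are hypothetical.

```python
def estimate_costs(nb, nr):
    """Evaluate the RAM, code-size and time estimates for nb words and nr rounds."""
    round_ops = 12 * nb + 11          # one encrypt/decrypt round
    key_round_ops = 10 * nb + 11      # one key round
    return {
        "ram_words_and_registers": 5 * nb + 3,
        "code_size_ops": 31 * nb + 36,
        "time_ops": (nr - 1) * (round_ops + key_round_ops + 7)
                    + round_ops + 8 * nb + 7,
    }


if __name__ == "__main__":
    # Example parameters: nb = 6 words per branch, nr = 93 rounds.
    print(estimate_costs(6, 93))
```

The time estimate is dominated by the $(n_r - 1)$ iterations of a round plus a key round, so it grows roughly linearly in both $n_b$ and $n_r$.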
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. SEA$_{n,b}$ trades space for time, which may be relevant given present processor speeds). Looking at the code size × cycles product, the efficiency of SEA$_{n,b}$ remains similar to the one of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given. Let p be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where p does not divide $4a_4^3 + 27a_6^2$, E can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of E over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic, and for a 200-bit prime p succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$ where p has 200 bits.
It is known that
$$\#E(\mathbb{F}_p) = p + 1 - t$$
where t is an integer which satisfies the Hasse bound
$$-2\sqrt{p} \le t \le 2\sqrt{p}$$
The algorithm works by calculating t modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial associated with $\ell$ that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve E under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine t modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining t.
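The final recombination step described above can be sketched for a toy curve. In this sketch the residues t mod ℓ are obtained by brute-force point counting over a tiny prime field, which is only a stand-in: the actual algorithm obtains them without knowing t. The curve, the prime p = 101, the auxiliary primes and all function names are assumptions made for illustration.

```python
def count_points(a4, a6, p):
    """Brute-force #E(F_p) for y^2 = x^3 + a4*x + a6 (feasible for tiny p only)."""
    square_counts = {}
    for y in range(p):
        s = y * y % p
        square_counts[s] = square_counts.get(s, 0) + 1
    total = 1  # the point at infinity
    for x in range(p):
        total += square_counts.get((x ** 3 + a4 * x + a6) % p, 0)
    return total


def crt(residues, moduli):
    """Combine residues modulo pairwise coprime moduli; return (x, product)."""
    x, m = 0, 1
    for r, l in zip(residues, moduli):
        k = ((r - x) * pow(m, -1, l)) % l   # modular inverse needs Python 3.8+
        x += m * k
        m *= l
    return x, m


def recover_trace(residues, primes, p):
    """Recover t from t mod l, assuming the product of the primes exceeds 4*sqrt(p)."""
    t, m = crt(residues, primes)
    if m * m <= 16 * p:
        raise ValueError("product of auxiliary primes must exceed 4*sqrt(p)")
    return t - m if t > m // 2 else t       # lift into the Hasse interval


if __name__ == "__main__":
    p, a4, a6 = 101, 1, 1
    true_t = p + 1 - count_points(a4, a6, p)
    primes = [2, 3, 5, 7]                   # product 210 > 4*sqrt(101)
    residues = [true_t % l for l in primes]
    print(recover_trace(residues, primes, p) == true_t)
```

The centered lift works because the Hasse bound confines t to an interval of length $4\sqrt{p}$, which is shorter than the product of the auxiliary primes, so exactly one representative of the residue class lies in it.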
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over $\mathbb{F}_p$, with the intention of finding an E with $\#E(\mathbb{F}_p) = xr$, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms, such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p and require $x \le 32$, then the probability that $\#E = xr$ is about 25 so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial in $\mathbb{Z}[F, J]$ associated with $\ell$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration, and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store). We start out with a given prime p and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram: a key computational block (Mod, SBox, BR, WR, XOR, round register, key registers KeyReg0[95:0] to KeyReg9[95:0]), an encryption computational block (Mod, SBox, BR, WR, XOR, round register, cipher data register), and a decryption computational block (Mod, SBox, BR, IWR, XOR, round register, plain text data register). Signals: Clk, Rst, KeyI[95:0], KeyLd, Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, DataO[95:0], DataRd, and an SMC block with Clk, Rst, Ena, ED, Ovr.
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
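Both ways of obtaining a key can be sketched with the Python standard library. This is only an illustration: it uses the operating system's entropy source through `secrets`, and it swaps in PBKDF2 with HMAC-SHA-256 as the passphrase-based key generation algorithm (the text mentions SHA-1; the choice of SHA-256, the iteration count and the function names are assumptions of this sketch).

```python
import hashlib
import secrets


def random_key(n_bytes=16):
    """Generate a random key (16 bytes = 128 bits) from the OS entropy source."""
    return secrets.token_bytes(n_bytes)


def key_from_passphrase(passphrase, salt, n_bytes=16, iterations=100_000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA-256 (illustrative choice)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=n_bytes
    )


if __name__ == "__main__":
    salt = secrets.token_bytes(16)       # stored alongside the ciphertext, not secret
    print(random_key().hex())
    print(key_from_passphrase("correct horse battery staple", salt).hex())
```

The salt makes the same passphrase yield different keys in different contexts, and the iteration count slows down guessing attacks on weak passphrases.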
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key length; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also meant the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis began being used; it is now practically an art and science all on its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. It requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
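As an illustration of the conditions listed above, a one-time pad can be sketched as a byte-wise XOR with a fresh random key of the same length as the message. The function names are hypothetical, and the sketch says nothing about how the key is transported or kept secret.

```python
import secrets


def otp_encrypt(message: bytes):
    """Encrypt with a fresh random key as long as the message."""
    key = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext


def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same key restores the plaintext."""
    if len(key) != len(ciphertext):
        raise ValueError("key and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, key))


if __name__ == "__main__":
    key, ct = otp_encrypt(b"attack at dawn")
    print(otp_decrypt(key, ct))
```

The key returned by `otp_encrypt` must never be used for a second message; reusing it breaks the "never reused" condition on which the proof depends.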
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by a brute-force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it were proven that the effort required (i.e. the work factor, in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute-force method) can be found to break the cipher. Since no such proof can currently be given, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute-force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute-force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis exploits weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
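The password or PIN example can be illustrated with two comparison routines. This is a sketch, not a description of any particular device: the first function returns as soon as a character differs, so its running time depends on how many leading characters are correct, while the second relies on the standard-library function `hmac.compare_digest`, which is designed to avoid that dependence.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: run time depends on the first mismatching byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns earlier when the mismatch is earlier
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose run time does not depend on where the inputs differ."""
    return hmac.compare_digest(a, b)


if __name__ == "__main__":
    stored_pin = b"4921"
    print(leaky_equal(stored_pin, b"4000"), constant_time_equal(stored_pin, b"4921"))
```

Both functions return the same results; only the timing behaviour differs, which is exactly what a timing attack measures.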
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of its other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date : Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
  Registers : 1
    96-bit register : 1

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 195

Macro Statistics
  Registers : 1
    96-bit register : 1

Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

  Number of Slices : 55 out of 192 28%
  Number of Slice Flip Flops : 96 out of 384 25%
  Number of 4 input LUTs : 1 out of 384 0%
  Number of bonded IOBs : 194 out of 90 215% (*)
  Number of GCLKs : 1 out of 4 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade : -6

  Minimum period : No path found
  Minimum input arrival time before clock : 7.962ns
  Maximum output required time after clock : 6.788ns
  Maximum combinational path delay : No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock Clk
Offset : 7.962ns (Levels of Logic = 1)
  Source : KeyEna (PAD)
  Destination : Dreg_95 (FF)
  Destination Clock : Clk rising

  Data Path : KeyEna to Dreg_95
    Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
    IBUF:I->O      96       0.776        6.300       KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                 0.886                    Dreg_0
    Total : 7.962ns (1.662ns logic, 6.300ns route) (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock Clk
Offset : 6.788ns (Levels of Logic = 1)
  Source : Dreg_95 (FF)
  Destination : KeyO<95> (PAD)
  Source Clock : Clk rising

  Data Path : Dreg_95 to KeyO<95>
    Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
    FDCE:C->Q      1        1.085        1.035       Dreg_95 (Dreg_95)
    OBUF:I->O               4.668                    KeyO_95_OBUF (KeyO<95>)
    Total : 6.788ns (5.753ns logic, 1.035ns route) (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 7

Cell Usage
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

  Number of Slices : 2 out of 192 1%
  Number of 4 input LUTs : 3 out of 384 0%
  Number of bonded IOBs : 7 out of 90 7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB

Mapping Report

Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
    Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, the security of the cipher being adapted as a function of its key size.
It can also be used in applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
Low-cost encryption on processors with a limited instruction set, for which the routine was basically designed.
Wireless communication, mobile computing and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
$\mathrm{SEA}_{n,b}$ is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of $\mathrm{SEA}_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performances (encryption + decryption) for present key sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) are in the range of an encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of $\mathrm{SEA}_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered as an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations purposed for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhaskar
Digital Design, Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
final: $C = R_{n_r} \,\&\, L_{n_r}$; switch $KL_{n_r-1}$, $KR_{n_r-1}$,
where & is the concatenation operator, $KR_{\lfloor n_r/2 \rfloor}$ is taken before the switch, and $C(i)$ is an $n_b$-word vector of which all the words have value 0 except the LSW, which equals $i$. Decryption is exactly the same, using the decrypt round $F_D$.
3.2 Design Properties of the Components
Substitution Box S
The substitution box was searched exhaustively in order to meet the following security and efficiency criteria:
- λ-parameter: 1/2
- δ-parameter: 1/4
- Maximum nonlinear order, namely 2
- Recursive definition
- Minimum number of instructions
Remark that, if 3-operand instructions are available, the recursive definition allows the substitution box to be performed in 2 operations per word of data. As a comparison, the 3 × 3 bitwise substitution box used in 3-WAY [15] requires 3. The counterpart of this efficiency is the presence of two fixed points in the table.
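The recursive definition and the two fixed points can be checked with a short Python sketch. The three update lines below are an assumption: they follow the recursion given in the published SEA specification, which is not reproduced in this section, and the function names are hypothetical.

```python
def sbox_bits(x0, x1, x2):
    """Recursive 3-bit substitution; each line is one AND/OR plus one XOR.

    The same code works on single bits or on whole words (bitslice),
    because only bitwise operators are used.
    """
    x0 = (x2 & x1) ^ x0
    x1 = (x2 & x0) ^ x1  # uses the already updated x0
    x2 = (x0 | x1) ^ x2  # uses the already updated x0 and x1
    return x0, x1, x2


def sbox_table():
    """Tabulate the substitution for the 8 possible 3-bit inputs (x0 is the LSB)."""
    table = []
    for value in range(8):
        x0, x1, x2 = value & 1, (value >> 1) & 1, (value >> 2) & 1
        y0, y1, y2 = sbox_bits(x0, x1, x2)
        table.append(y0 | (y1 << 1) | (y2 << 2))
    return table


if __name__ == "__main__":
    table = sbox_table()
    print("table:", table)
    print("fixed points:", [v for v in range(8) if table[v] == v])
```

Under this assumption the table is a permutation of the values 0 to 7 with exactly two fixed points, in agreement with the remark above.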
Bit and Word Rotations r and R
The cyclic rotations were defined in order to provide predictable low-cost diffusion within the cipher, when combined with the bitslice substitution box. This is illustrated in Figure 2 for a single substitution box scheme with parameters $n = 48$, $b = 8$, $n_b = 3$. Looking at the figure, it can be seen that $\mathrm{SEA}_{n,b}$ divides its data in $2n_b/3$ blocks of 3 words. The substitution box is applied in parallel to these blocks. Therefore, the diffusion process (starting with one single active bit in the left branch) is divided into two steps.
The first phase is obtained by the combination of the word rotation R (which is the only transform to provide inter-word diffusion) with the substitution box. It requires at most $n_b$ rounds to be completed (in our example, $n_b = 3$, which yields 3 rounds). Once every word has at least one active bit, the combination of r and S yields six more active bits per block in each round. Therefore, finishing the diffusion of all the blocks requires at most $\lfloor b/2 \rfloor$ rounds. Combining these observations, the diffusion is complete after $n_b + \lfloor b/2 \rfloor$ rounds.
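The round count derived above can be evaluated for concrete parameters. The sketch below assumes $n_b = n/(2b)$ (the number of words per Feistel branch) and that $n$ is a multiple of $6b$, so that each branch splits into blocks of 3 words; both are taken from the SEA specification rather than from this section, and the function names are hypothetical.

```python
def words_per_branch(n, b):
    """n_b = n / (2b), assuming n is a multiple of 6b."""
    if n % (6 * b) != 0:
        raise ValueError("n is assumed to be a multiple of 6b")
    return n // (2 * b)


def diffusion_rounds(n, b):
    """Upper bound n_b + floor(b/2) on the rounds needed for complete diffusion."""
    return words_per_branch(n, b) + b // 2


if __name__ == "__main__":
    for n, b in ((48, 8), (96, 8), (144, 8)):
        print(n, b, words_per_branch(n, b), diffusion_rounds(n, b))
```

For the example of the text ($n = 48$, $b = 8$), this gives $n_b = 3$ and a bound of 7 rounds.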
Addition mod $2^b$
Using a mod $2^b$ key addition in place of a bitwise XOR was motivated by different reasons: (1) improvement of the diffusion process, (2) improvement of the non-linearity, (3) same cost/speed as the bitwise XOR in most processors, (4) necessity to avoid structural attacks.
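The diffusion argument can be illustrated with one word: two inputs that differ in a single bit still differ in a single bit after a XOR with the key, whereas the carry chain of a modular addition can spread the difference over the whole word. This is a small illustration with hypothetical helper names, not part of the cipher specification.

```python
def add_mod(x, k, b):
    """Key addition modulo 2**b on one b-bit word."""
    return (x + k) & ((1 << b) - 1)


def changed_bits(u, v):
    """Number of bit positions in which two words differ."""
    return bin(u ^ v).count("1")


if __name__ == "__main__":
    b, key = 8, 0x01
    x, x_flipped = 0x7F, 0x7E  # inputs differing in the least significant bit
    print("XOR:", changed_bits(x ^ key, x_flipped ^ key))
    print("ADD:", changed_bits(add_mod(x, key, b), add_mod(x_flipped, key, b)))
```

In this example the XOR keeps the difference at one bit, while the addition turns it into a difference in all eight bits of the word.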
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to $\mathrm{SEA}_{n,b}$, namely the key schedule and the position of $R$, $R^{-1}$ in the encrypt/decrypt rounds. The key schedule is designed such that the master key is encrypted during half the rounds and decrypted during the other half. This yields a particular structure of the sequence of round keys, such that the key expansion is exactly the same in encryption and decryption. Namely, we have:
$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0$
As a consequence of this structure, the encryption/decryption rounds cannot keep the traditional Feistel structure: it would result in having identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
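The symmetric ordering of the round keys can be made explicit by listing the key indices round by round. The sketch only generates indices, not the keys themselves, and it assumes an odd number of rounds $n_r$ so that the sequence has exactly $n_r$ entries; the function name is hypothetical.

```python
def round_key_indices(n_r):
    """Indices 0, 1, ..., floor(n_r/2), floor(n_r/2) - 1, ..., 1, 0."""
    if n_r % 2 == 0:
        raise ValueError("this sketch assumes an odd number of rounds")
    half = n_r // 2
    return list(range(half + 1)) + list(range(half - 1, -1, -1))


if __name__ == "__main__":
    indices = round_key_indices(7)
    print(indices)
    # Reading the sequence backwards gives the same sequence, which is why
    # the key expansion is identical in encryption and decryption.
    print(indices == indices[::-1])
```

Because the index sequence is a palindrome, running the key schedule for decryption visits the round keys in the same order as for encryption.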
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].
Lemma 1. Let f be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of f is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, the linear and differential parameters of the 3-round cipher, $\Lambda$ and $\Delta$, are respectively smaller than $\lambda^2$ and $\delta^2$.
Since our nonlinear function S has parameters $\delta = 2^{-2}$ and $\lambda = 2^{-1}$, it follows that 3 rounds of $\mathrm{SEA}_{n,b}$ have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an n-bit block cipher, resisting differential [4] and linear [28] cryptanalysis respectively requires $\Delta$ of the order of $2^{-n}$ and $\Lambda$ of the order of $2^{-n/2}$. In order to approach these bounds, we require that
$$\delta^{2n_r/3} = \left(2^{-2}\right)^{2n_r/3} < 2^{-n} \quad \text{and} \quad \lambda^{2n_r/3} = \left(2^{-1}\right)^{2n_r/3} < 2^{-n/2} \qquad (1)$$
In both cases, the required number of rounds is $n_r \geq 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to the one of e.g. the AES Rijndael [17], but we only assume one active s-box per round.
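Equation (1) can be checked numerically by working with base-2 logarithms, using exact fractions to avoid rounding. The sketch below uses hypothetical function names and searches for the smallest number of rounds that satisfies both strict inequalities.

```python
from fractions import Fraction


def log2_bounds(n_r):
    """Return log2 of delta**(2*n_r/3) and of lambda**(2*n_r/3).

    With delta = 2**-2 and lambda = 2**-1, these are -4*n_r/3 and -2*n_r/3.
    """
    exponent = Fraction(2 * n_r, 3)
    return -2 * exponent, -1 * exponent


def satisfies_equation_1(n, n_r):
    """Check both inequalities of Equation (1) for an n-bit block."""
    differential, linear = log2_bounds(n_r)
    return differential < -n and linear < Fraction(-n, 2)


def min_rounds(n):
    """Smallest n_r for which both strict inequalities hold."""
    n_r = 1
    while not satisfies_equation_1(n, n_r):
        n_r += 1
    return n_r


if __name__ == "__main__":
    for n in (48, 96, 144):
        print(n, Fraction(3 * n, 4), min_rounds(n))
```

Both inequalities reduce to $n_r > 3n/4$, so the search returns the first integer above $3n/4$, which is consistent with the bound on $n_r$ stated in the text.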
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and the boomerang [31] and rectangle [8] attacks.
However, these extensions usually yield only a small improvement over the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement over classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA$_{n,b}$ a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding.
The conclusion is that these extensions deserve to be considered when estimating the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \lll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
– $S(x \lll a) = S(x) \lll a$
– $r(x \lll a) = r(x) \lll a$
– $R(x \lll a) = R(x) \lll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ (bitwise XOR) rather than modular addition, and where all round constants $C_i$ are such that $C_i \lll a = C_i$, e.g. all $C_i$'s equal 0. As a consequence of the previous observations, the modified round $F_E$ and the key round $F_K$ satisfy

$$F_E(L \lll a, R \lll a, K \lll a) = F_E(L, R, K) \lll a,$$
$$F_K(K_L \lll a, K_R \lll a, 0) = F_K(K_L, K_R, 0) \lll a.$$

These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. A distinguisher on the modified cipher follows immediately; it requires 2 chosen encryption queries under 2 related keys $K$ and $K \lll a$.
In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \lll a) \boxplus (K \lll a) = (X \boxplus K) \lll a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to $3/8$ as $b$ grows. It is smaller for $1 < a < b-1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \lll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, this implies that the similarity between the round keys derived from $K$ and those derived from $K \lll a$ rapidly vanishes. These properties prevent the structural distinguisher from propagating through more than a few rounds of SEA$_{n,b}$.
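The carry argument can be checked exhaustively for small word sizes. The sketch below is an illustration only: the helper names `rotl` and `rotation_commutes_probability` are hypothetical, and the code covers a single $b$-bit word rather than the full cipher. It counts how often rotation by $a$ bits commutes with addition mod $2^b$.

```python
def rotl(x: int, a: int, b: int) -> int:
    """Rotate the b-bit word x left by a bits."""
    a %= b
    mask = (1 << b) - 1
    return ((x << a) | (x >> (b - a))) & mask


def rotation_commutes_probability(a: int, b: int) -> float:
    """Fraction of pairs (X, K) of b-bit words for which
    (X <<< a) + (K <<< a) == (X + K) <<< a, with addition mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
            rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return hits / (1 << (2 * b))


def xor_always_commutes(a: int, b: int) -> bool:
    """With XOR in place of modular addition the property always holds."""
    return all(
        rotl(x, a, b) ^ rotl(k, a, b) == rotl(x ^ k, a, b)
        for x in range(1 << b)
        for k in range(1 << b)
    )


if __name__ == "__main__":
    for b in (4, 6, 8):
        print(b, rotation_commutes_probability(1, b))
```

For $a = 1$ the measured fraction stays slightly above $3/8$ and approaches it as $b$ grows, whereas the XOR variant satisfies the property for every pair, which is what the distinguisher on the modified cipher relies on.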
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in the diffusion figure. This is expected to remain the case when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in the diffusion figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as that of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with that of GOST, whose vulnerability to slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack was described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference makes it possible to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \iff \bar{P} \xrightarrow{\bar{K}} \bar{C}$. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would in any case exclude the use of small substitution boxes, and therefore the possibility of building very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum number of rounds required to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds needed to resist linear/differential attacks plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
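The round-count rule can be written as a small helper. This is a sketch under two assumptions that are not stated in this section: the function name `suggested_rounds` is hypothetical, and $n_b$ is taken to be $n/(2b)$, the number of $b$-bit words per Feistel branch as in the SEA specification [1].

```python
def suggested_rounds(n: int, b: int, double_margin: bool = False) -> int:
    """Number of rounds n_r = 3n/4 + 2 * (n_b + floor(b/2)), made odd.

    n: block size in bits, b: word size in bits.
    Assumption: n_b = n / (2b) words per Feistel branch.
    """
    if b < 8:
        raise ValueError("a minimum word size of 8 bits is assumed")
    if n % (2 * b) != 0:
        raise ValueError("n must be a multiple of 2b")
    n_b = n // (2 * b)
    lin_diff_rounds = (3 * n + 3) // 4      # ceil(3n/4), linear/differential part
    diffusion_rounds = n_b + b // 2         # rounds for complete diffusion
    n_r = lin_diff_rounds + 2 * diffusion_rounds
    if double_margin:                       # optional conservative margin
        n_r *= 2
    if n_r % 2 == 0:                        # n_r has to be odd
        n_r += 1
    return n_r


if __name__ == "__main__":
    print(suggested_rounds(48, 8))   # 36 + 2 * (3 + 4) = 50, made odd
    print(suggested_rounds(96, 8))   # 72 + 2 * (6 + 4) = 92, made odd
```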
3.5 Performance Analysis
SEA$_{n,b}$ is targeted at low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size, and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: OR, AND, $\oplus$, $\boxplus$, $\gg$, $\ll$
– Branch instructions: goto, subroutine call, and return
– Comparison; load RAM in register; store register in RAM
According to the code in the appendix, the performances can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and the execution time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, this directly yields $31n_b + 36$ ops. For the execution time, the round and the key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. This yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Note that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, since at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Note also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which results in lower code size and faster execution.
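These estimates are simple linear expressions in $n_b$ and $n_r$, so they are easy to tabulate. The following sketch only evaluates the formulas quoted above; the names `SoftwareCost` and `estimate_cost`, and the example parameters $n_b = 6$, $n_r = 93$, are illustrative assumptions rather than figures taken from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareCost:
    memory_words: int    # RAM words plus registers: 5*n_b + 3
    registers: int       # registers alone: n_b + 3
    code_size_ops: int   # 31*n_b + 36
    time_ops: int        # total operation count for one encryption


def estimate_cost(n_b: int, n_r: int) -> SoftwareCost:
    """Evaluate the rough cost formulas for n_b words per branch and n_r rounds."""
    round_ops = 12 * n_b + 11
    key_round_ops = 10 * n_b + 11
    time_ops = (n_r - 1) * (round_ops + key_round_ops + 7) + round_ops + 8 * n_b + 7
    return SoftwareCost(
        memory_words=5 * n_b + 3,
        registers=n_b + 3,
        code_size_ops=31 * n_b + 36,
        time_ops=time_ops,
    )


if __name__ == "__main__":
    print(estimate_cost(n_b=6, n_r=93))
```

The dominant term is $(n_r - 1) \times (22n_b + 29)$, which makes explicit that the number of rounds, rather than the code size, drives the execution time.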
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependency on the target devices, the following general comments can be made:
– SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. SEA$_{n,b}$ trades space for time, which may be acceptable given present processor speeds). Looking at the product of code size and cycles, the efficiency of SEA$_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH 4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Müller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Müller are described in [15] and [16]. The implementation of Izu, Kogure, Noro, and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions, described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that

$$\#E(\mathbb{F}_p) = p + 1 - t,$$

where $t$ is an integer which satisfies the Hasse bound

$$-2\sqrt{p} \le t \le 2\sqrt{p}.$$
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial $\Phi_\ell$ that is necessary for further calculations with this $\ell$. These polynomials $\Phi_\ell$ do not depend on the curve $E$ under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide whether our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining $t$.
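The final recombination step can be illustrated on a toy curve. The sketch below is not the SEA algorithm: the residues $t \bmod \ell$ are obtained by brute-force point counting instead of the modular-polynomial machinery, and the function names, the curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{101}$, and the prime list are all illustrative assumptions. It only shows why a product of auxiliary primes larger than $4\sqrt{p}$ pins down $t$ inside the Hasse interval.

```python
from math import prod


def count_points_naive(p: int, a4: int, a6: int) -> int:
    """#E(F_p) for y^2 = x^3 + a4*x + a6 by brute force (small p only)."""
    roots = [0] * p                      # roots[v] = number of y with y^2 = v
    for y in range(p):
        roots[y * y % p] += 1
    affine = sum(roots[(x * x * x + a4 * x + a6) % p] for x in range(p))
    return affine + 1                    # plus the point at infinity


def crt_symmetric(residues: dict) -> int:
    """Combine {prime: t mod prime} into the unique t with -M/2 < t <= M/2."""
    modulus = prod(residues)             # product of the auxiliary primes
    t = 0
    for ell, r in residues.items():
        m = modulus // ell
        t += r * m * pow(m, -1, ell)     # modular inverse, Python 3.8+
    t %= modulus
    return t - modulus if t > modulus // 2 else t


def recover_trace(p: int, a4: int, a6: int, primes: list) -> int:
    """Recover t from its residues; the residues are cheated from a naive count."""
    m = prod(primes)
    if m * m <= 16 * p:                  # need product > 4*sqrt(p)
        raise ValueError("product of auxiliary primes is too small")
    t_true = p + 1 - count_points_naive(p, a4, a6)
    residues = {ell: t_true % ell for ell in primes}
    return crt_symmetric(residues)


if __name__ == "__main__":
    p, a4, a6 = 101, 2, 3
    t = recover_trace(p, a4, a6, [2, 3, 5, 7])
    print("t =", t, " #E(F_p) =", p + 1 - t)
```

Because $|t| \le 2\sqrt{p}$, any two candidates for $t$ differ by at most $4\sqrt{p}$, so a modulus larger than that leaves exactly one candidate in the symmetric range.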
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$ with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E = xr$ is about 25 so we expect to have to run our algorithm on about 55 curves.
Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial $\Phi_\ell(F, J) \in \mathbb{Z}[F, J]$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, $\Phi_\ell$ must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and may therefore be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime $p$ and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram: key computational block (Mod, SBox, BR, WR, XOR, Round Reg; key registers KeyReg0 to KeyReg9, 96 bits each; inputs KeyI[95:0], KeyLd, Clk, Rst), encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg; cipher data register; Key0 to Key9, DataI[95:0], DataLd, Clk, Rst), decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg; plain text data register; Key0 to Key9, DataO[95:0], DataRd, Clk, Rst), and SMC control block (Clk, Rst, Ena, ED, Ovr).
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. PRNGs that use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
The simplest method to read encrypted data is a brute force attack: simply attempting every possible key up to the maximum key length. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
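The exponential growth of brute-force effort with key length is easy to quantify for symmetric keys. The sketch below is a back-of-the-envelope estimate only: the function name and the trial rate of $10^{12}$ keys per second are arbitrary assumptions, not measurements.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_bruteforce_years(key_bits: int, keys_per_second: float) -> float:
    """Expected time to find a symmetric key by exhaustive search.

    On average half of the 2**key_bits candidate keys must be tried.
    """
    expected_trials = 2 ** (key_bits - 1)
    return expected_trials / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e12  # hypothetical attacker speed, keys per second
    for bits in (56, 64, 96, 128):
        print(bits, "bits:", expected_bruteforce_years(bits, rate), "years")
```

Each additional key bit doubles the expected effort, so going from a 56-bit to a 128-bit key multiplies it by $2^{72}$.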
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek words kryptos (secret) and graphos (writing).
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math-oriented and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the term cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it were proven that the effort required (i.e. the "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
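The one-time pad itself is only a byte-wise XOR with a random pad as long as the message. The sketch below is a minimal illustration: the function names are hypothetical, and `secrets.token_bytes` draws on the operating system's cryptographic random source, which stands in here for the "truly random" key material that the unbreakability proof assumes.

```python
import secrets
from typing import Tuple


def otp_encrypt(message: bytes) -> Tuple[bytes, bytes]:
    """Return (pad, ciphertext); the pad is fresh and as long as the message."""
    pad = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, pad))
    return pad, ciphertext


def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    """XOR with the same pad recovers the message; the pad must never be reused."""
    if len(pad) != len(ciphertext):
        raise ValueError("pad and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, pad))


if __name__ == "__main__":
    pad, ct = otp_encrypt(b"attack at dawn")
    print(ct.hex())
    print(otp_decrypt(ct, pad))
```

The practical cost is visible in the signature: every message needs a new secret pad of the same length, which is why block ciphers with short keys are used instead.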
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file cxilinxbinvasukeyregngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line: CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision 116 $
Mapped Date: Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constrain : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysi : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 195

Macro Statistics
  Registers : 1
    96-bit register : 1

Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================

Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

  Number of Slices:            55 out of 192    28%
  Number of Slice Flip Flops:  96 out of 384    25%
  Number of 4 input LUTs:       1 out of 384     0%
  Number of bonded IOBs:      194 out of  90   215% (*)
  Number of GCLKs:              1 out of   4    25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
      FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
      GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade: -6

  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset:              7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                               Gate     Net
    Cell:in->out     fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O            96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                   0.886          Dreg_0
    ----------------------------------------
    Total                     7.962ns (1.662ns logic, 6.300ns route)
                                      (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset:              6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising

  Data Path: Dreg_95 to KeyO<95>
                               Gate     Net
    Cell:in->out     fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FDCE:C->Q             1   1.085   1.035  Dreg_95 (Dreg_95)
    OBUF:I->O                 4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                     6.788ns (5.753ns logic, 1.035ns route)
                                      (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s

Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 7

Cell Usage
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================

Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

  Number of Slices:         2 out of 192    1%
  Number of 4 input LUTs:   3 out of 384    0%
  Number of bonded IOBs:    7 out of  90    7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
  Number of 4 input LUTs : 3 out of 384 1%
Logic Distribution
  Number of occupied Slices : 2 out of 192 1%
  Number of Slices containing only related logic : 2 out of 2 100%
  Number of Slices containing unrelated logic : 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs : 3 out of 384 1%
Number of bonded IOBs : 7 out of 86 8%

Total equivalent gate count for design : 18
Additional JTAG gate count for IOBs : 336
Peak Memory Usage : 56 MB

Mapping Report

Device utilization summary
  Number of External IOBs : 7 out of 86 8%
  Number of LOCed External IOBs : 0 out of 7 0%
  Number of SLICEs : 2 out of 192 1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is : 0
The AVERAGE CONNECTION DELAY for this design is : 0.871
The MAXIMUM PIN DELAY IS : 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is : 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
  Total Number Slice Registers : 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops : 415
    Number used as Latches : 4
  Number of 4 input LUTs : 1016 out of 384 264% (OVERMAPPED)
Logic Distribution
  Number of occupied Slices : 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic : 648 out of 665 97%
  Number of Slices containing unrelated logic : 17 out of 665 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs : 1066 out of 384 277% (OVERMAPPED)
  Number used as logic : 1016
  Number used as a route-thru : 50
Number of bonded IOBs : 1060 out of 86 1232% (OVERMAPPED)
  IOB Flip Flops : 960
Number of GCLKs : 1 out of 4 25%
Number of GCLKIOBs : 1 out of 4 25%

Total equivalent gate count for design : 17572
Additional JTAG gate count for IOBs : 50928
Peak Memory Usage : 72 MB
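The percentages in the mapping report above can be rechecked by hand. The short Python sketch below does this for the key generation block; the function name `utilization` and the dictionary layout are illustrative choices, not part of the Xilinx tools, and the truncation to a whole percent is an assumption chosen because it reproduces the figures printed in the report.

```python
def utilization(used: int, available: int) -> tuple[int, bool]:
    """Return (percent, overmapped) for one resource line of a map report."""
    # Integer division truncates, which matches the percentages quoted in the report.
    percent = 100 * used // available
    return percent, used > available


# (used, available) pairs copied from the keygenblock mapping report (xc2s15 target).
keygen_report = {
    "Slice registers": (419, 384),
    "4 input LUTs (logic)": (1016, 384),
    "Occupied slices": (665, 192),
    "4 input LUTs (total)": (1066, 384),
    "Bonded IOBs": (1060, 86),
    "GCLKs": (1, 4),
}

for name, (used, available) in keygen_report.items():
    percent, overmapped = utilization(used, available)
    flag = " (OVERMAPPED)" if overmapped else ""
    print(f"{name}: {used} out of {available} {percent}%{flag}")
```

Every resource except the clock buffers exceeds what the xc2s15 offers, which is why the report marks the design as overmapped on this device.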
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0

Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
As a low-cost encryption routine designed for processors with a limited instruction set, SEA can be used:
In wireless communication, mobile computing, and networking systems.
For the encryption of JPEG2000 images.
In scalable video coding.
In sensor networks and RFIDs.
CONCLUSION
SEA_{n,b} is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA_{n,b} allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA_{n,b} do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Douglas A. Pucknell and Kamran Eshraghian, Basic VLSI Design, 3rd Edition
J. Bhasker, A VHDL Primer
M. Morris Mano, Digital Design
William Stallings, Data and Computer Communications
Andrew S. Tanenbaum, Computer Networks
William Stallings, Network Cryptology
Reference websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
most processors (4) necessity to avoid structural attacks
3.3 Overall Structure
The overall structure of the cipher follows the Feistel strategy. However, a few points are specific to SEA_{n,b}, namely the key schedule and the position of $R$ and $R^{-1}$ in the encryption and decryption rounds. The key schedule is designed so that the master key is encrypted during half the rounds and decrypted during the other half. This gives the sequence of round keys a particular structure, such that the key expansion is exactly the same in encryption and decryption. Namely, we have

$K_0, K_1, K_2, \ldots, K_{\lfloor n_r/2 \rfloor}, K_{\lfloor n_r/2 \rfloor - 1}, \ldots, K_2, K_1, K_0.$

As a consequence of this structure, the encryption and decryption rounds cannot keep the traditional Feistel structure: it would result in identical encryption and decryption functions. This is the reason for moving the word rotation to the left branch of the Feistel round.
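The symmetry of the round-key sequence can be illustrated with a few lines of Python. This is only a sketch of the index pattern described above, assuming an odd number of rounds $n_r$ (as required later in the text) and one round key per round; the function name `round_key_indices` is hypothetical and the actual key round function is not modelled.

```python
def round_key_indices(nr: int) -> list[int]:
    """Indices i of the round keys K_i used over nr rounds.

    The sequence climbs from 0 to floor(nr/2) and then walks back down to 0.
    Assumption: nr is odd, so the sequence has exactly nr entries.
    """
    if nr % 2 == 0:
        raise ValueError("nr is assumed to be odd")
    half = nr // 2
    return list(range(half + 1)) + list(range(half - 1, -1, -1))


seq = round_key_indices(7)
print(seq)              # [0, 1, 2, 3, 2, 1, 0]
print(seq == seq[::-1])  # True: reading the keys backwards gives the same sequence
```

Because the sequence is a palindrome, decryption consumes the round keys in the same order as encryption, which is the property the text relies on.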
3.4 Security Analysis
Resistance Against Known Attacks
Linear and Differential Cryptanalysis
From the properties of the substitution box, we can compute bounds for the best linear and differential characteristics through the cipher. We first use the following lemma [29].

Lemma 1. Let $f$ be the bijective nonlinear function of a 3-round Feistel cipher. Assuming that the linear parameter of $f$ is smaller than $\lambda$ and its differential parameter is smaller than $\delta$, the linear and differential parameters $\Lambda$ and $\Delta$ of the 3-round cipher are respectively smaller than $\lambda^2$ and $\delta^2$.

Since our nonlinear function S has parameters $\delta = 2^{-2}$ and $\lambda = 2^{-1}$, 3 rounds of SEA_{n,b} have their differential and linear parameters respectively bounded by $\Delta < 2^{-4}$ and $\Lambda < 2^{-2}$. However, for an $n$-bit block cipher, $\Delta$ and $\Lambda$ are respectively required to be of the order of $2^{-n}$ and $2^{-n/2}$ to resist differential [4] and linear [28] cryptanalysis. In order to approach these bounds, we require that

$\delta^{2n_r/3} = (2^{-2})^{2n_r/3} < 2^{-n}$ and $\lambda^{2n_r/3} = (2^{-1})^{2n_r/3} < 2^{-n/2}$. (1)

In both cases, the required number of rounds is $n_r \geq 3n/4$. We note that we used a hybrid approach between the provable security against linear and differential attacks, which consists in bounding the parameter of the best differential/hull as in Lemma 1, and the usual heuristics to estimate the best linear/differential characteristic through a cipher (as in the previous estimation for $n_r$). In fact, the strategy of Equation (1) is similar to that of, e.g., the AES Rijndael [17], but we only assume one active s-box per round.
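The step from Equation (1) to the round bound can be written out explicitly. The derivation below is a worked restatement using only the symbols already defined ($\delta$, $\lambda$, $n$, $n_r$); the exponent $2n_r/3$ comes from applying the squared bound of Lemma 1 once per group of 3 rounds.

```latex
\begin{align*}
\delta^{2n_r/3} = (2^{-2})^{2n_r/3} = 2^{-4n_r/3} < 2^{-n}
  &\iff \frac{4n_r}{3} > n \iff n_r > \frac{3n}{4}, \\
\lambda^{2n_r/3} = (2^{-1})^{2n_r/3} = 2^{-2n_r/3} < 2^{-n/2}
  &\iff \frac{2n_r}{3} > \frac{n}{2} \iff n_r > \frac{3n}{4}.
\end{align*}
```

Both conditions give the same threshold, which the text states as $n_r \geq 3n/4$. For example, a 96-bit block gives a threshold of 72 rounds for this criterion alone.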
Extensions of Linear and Differential Cryptanalysis
Classical extensions of linear and differential cryptanalysis are non-linear approximations of outer rounds [26], bi-linear cryptanalysis [14], differential-linear cryptanalysis [27], multiple linear cryptanalysis [22, 10], and the boomerang [31] and rectangle [8] attacks.
However, these extensions usually bring only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA_{n,b} a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability $pq$ is in general highly preferable to applying a boomerang attack with two short differentials of probability $p$ and $q$. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered when estimating the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
A Dedicated Related-Key Attack Against a Modified Version
For $x \in \mathbb{Z}_{2^b}^{n_b}$, we denote by $x \ll a$ the left rotation by $a$ bits of each of the $n_b$ words of $x$. The non-linear and diffusion layers have the following properties:
- $S(x \ll a) = S(x) \ll a$
- $r(x \ll a) = r(x) \ll a$
- $R(x \ll a) = R(x) \ll a$
Consider a modified version of our cipher where key addition is performed using $\oplus$ rather than modular addition, and where all round constants $C_i$ are such that $C_i \ll a = C_i$ (e.g. all $C_i$ equal 0). As a consequence of the previous observations, the modified round $F_E$ and the key round $F_K$ satisfy

$F_E(L \ll a, R \ll a, K \ll a) = F_E(L, R, K) \ll a$
$F_K(K_L \ll a, K_R \ll a, 0) = F_K(K_L, K_R, 0) \ll a$

These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys $K$ and $K \ll a$. In the actual SEA_{n,b}, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to 3/8 as $b$ grows. It is smaller for $1 < a < b - 1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA_{n,b} from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, this implies that the similarity between the round keys derived from $K$ and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA_{n,b}.
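The claim about carry propagation can be checked exhaustively on a single small word. The Python sketch below is an illustration only, not part of the cipher specification: it considers one $b$-bit word rather than a full $n_b$-word branch, and the helper names `rotl` and `commute_probability` are hypothetical. It counts how often rotation commutes with XOR and with addition mod $2^b$ over all pairs $(X, K)$.

```python
def rotl(x: int, a: int, b: int) -> int:
    """Rotate the b-bit word x left by a bits."""
    a %= b
    mask = (1 << b) - 1
    return ((x << a) | (x >> (b - a))) & mask


def commute_probability(a: int, b: int, use_xor: bool = False) -> float:
    """Fraction of pairs (x, k) of b-bit words for which rotating by a bits
    commutes with the key addition (XOR, or addition mod 2**b)."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            if use_xor:
                lhs = rotl(x, a, b) ^ rotl(k, a, b)
                rhs = rotl(x ^ k, a, b)
            else:
                lhs = (rotl(x, a, b) + rotl(k, a, b)) & mask
                rhs = rotl((x + k) & mask, a, b)
            hits += lhs == rhs
    return hits / (1 << (2 * b))


print(commute_probability(1, 8, use_xor=True))  # XOR: the property always holds
print(commute_probability(1, 8))                # modular addition, a = 1: close to 3/8
print(commute_probability(4, 8))                # modular addition, 1 < a < b - 1: smaller
```

With XOR the property holds for every pair, which is what makes the modified cipher distinguishable; with modular addition it holds only for a fraction of the pairs, in line with the 3/8 limit quoted above for $a = 1$.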
Square Attacks
We explored square attacks [16] on SEA_{48,8}. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in Figure. It is expected that this remains the same when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA_{n,b}, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in Figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA_{n,b}, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA_{n,b} is the same as that of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with that of GOST, whose vulnerability to slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack was described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA_{n,b}, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA_{n,b} key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows the attacker to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA_{n,b} against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \iff \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes bitwise complementation. The non-linear key scheduling and the presence of carry propagations in the actual SEA_{n,b} algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA_{n,b} has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility of building very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum number of rounds required to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \geq 8$ bits.
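The round-count rule above is simple arithmetic and can be sketched in Python. The function name `suggested_rounds` and its `conservative` flag are illustrative, and the relation $n_b = n / (2b)$ (number of words per Feistel branch) is taken as an assumption, since it is not restated in this section.

```python
def suggested_rounds(n: int, b: int, conservative: bool = False) -> int:
    """Number of rounds from the rule 3n/4 + 2*(nb + floor(b/2)), made odd.

    n: block (and key) size in bits, b: word size in bits.
    Assumption: nb = n / (2*b) words per Feistel branch.
    """
    if b < 8:
        raise ValueError("the text assumes a word size b >= 8")
    if n % (2 * b) != 0:
        raise ValueError("n is assumed to be a multiple of 2*b")
    nb = n // (2 * b)
    nr = 3 * n // 4 + 2 * (nb + b // 2)
    if conservative:
        nr *= 2  # large security margin: double the number of rounds
    if nr % 2 == 0:
        nr += 1  # nr has to be odd
    return nr


print(suggested_rounds(96, 8))                     # 93
print(suggested_rounds(96, 8, conservative=True))  # 185
```

For $n = 96$ and $b = 8$, the rule gives $72 + 2 \cdot (6 + 4) = 92$ rounds, rounded up to 93 to obtain an odd count.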
3.5 Performance Analysis
SEA_{n,b} is targeted at low-cost processors with little code size and a small instruction set. However, the simple structure of SEA_{n,b} makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size, and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
- Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\gg$, $\ll$.
- Branch instructions: goto, subroutine call, and return.
- Comparison, load RAM in register, store register in RAM.
According to the code in the appendix, the performance can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, this directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. This yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, since at the end of an encryption/decryption we have $K_{n_r - 1} = K_0$. Remark also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and a faster implementation.
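The cost formulas above can be evaluated for a given parameter set with a short Python sketch. The function name `sea_cost` and the dictionary keys are illustrative; the formulas themselves are the ones quoted in the paragraph, and the example values $n_b = 6$, $n_r = 93$ are an assumed parameter set used only to show the arithmetic.

```python
def sea_cost(nb: int, nr: int) -> dict[str, int]:
    """Rough software cost estimates (in words or ops) from the formulas in the text.

    nb: number of words per Feistel branch, nr: number of rounds.
    """
    round_ops = 12 * nb + 11      # one cipher round
    key_round_ops = 10 * nb + 11  # one key round
    return {
        "ram_words_plus_registers": 5 * nb + 3,
        "registers": nb + 3,
        "code_size_ops": 31 * nb + 36,
        "time_ops": (nr - 1) * (round_ops + key_round_ops + 7)
                    + round_ops + 8 * nb + 7,
    }


# Example parameter set (an assumption for illustration): nb = 6, nr = 93.
for name, value in sea_cost(6, 93).items():
    print(f"{name}: {value}")
```

The estimate shows how the execution time grows with the product of the number of rounds and the branch width, while the memory and code size depend on $n_b$ only.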
For illustration purposes, we implemented SEA_{n,b} on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA_{n,b} and the AES Rijndael. While direct comparisons are made difficult by their high dependence on the target devices, the following general comments can be made:
- SEA_{n,b} designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
- The combined number of RAM words and registers of SEA_{n,b} implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
- The code size of SEA_{n,b} is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA_{n,b} also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. SEA_{n,b} trades space for time, which may be relevant given present processor speeds). Looking at the product of code size and cycles, the efficiency of SEA_{n,b} remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof–Elkies–Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let p be a large (odd) prime and let E: $y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where p does not divide $4a_4^3 + 27a_6^2$, E can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of E over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be used to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof–Elkies–Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes, Dewaghe and Morain. The SEA algorithm was implemented by Morain, by Müller, and by Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Müller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes, Dewaghe and Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime p, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where p has 200 bits.
It is known that $\#E(\mathbb{F}_p) = p + 1 - t$, where t is an integer which satisfies the Hasse bound $-2\sqrt{p} \le t \le 2\sqrt{p}$.
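For very small primes, the relation between the point count, the value t and the Hasse bound can be checked by exhaustive search. The Python sketch below is not the SEA algorithm; it is a naive counter that only works for tiny p, and the function name and the example curve y^2 = x^3 + x + 1 over F_5 are illustrative assumptions.

```python
def count_points(p: int, a4: int, a6: int) -> int:
    """Naively count the points of y^2 = x^3 + a4*x + a6 over F_p (small odd prime p)."""
    # sqrt_count[v] = number of y in F_p with y^2 = v
    sqrt_count = [0] * p
    for y in range(p):
        sqrt_count[y * y % p] += 1
    total = 1  # the point at infinity
    for x in range(p):
        total += sqrt_count[(x ** 3 + a4 * x + a6) % p]
    return total


if __name__ == "__main__":
    p, a4, a6 = 5, 1, 1
    n = count_points(p, a4, a6)
    t = p + 1 - n
    # Hasse bound |t| <= 2*sqrt(p), written without floating point as t^2 <= 4p
    print(n, t, t * t <= 4 * p)
```

This brute-force count takes time proportional to p, which is why it is hopeless for the 200-bit primes considered here.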
The algorithm works by calculating t modulo several small auxiliary primes ℓ. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate ℓ, a calculation has to be carried out to generate a certain polynomial that is necessary for further calculations with this ℓ. These polynomials do not depend on the curve E under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and ℓ is 1/2). For those curves where the algorithm applies, we can determine t modulo ℓ. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining t.
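The final recombination step can be sketched independently of how the residues are obtained. The Python sketch below assumes the residues t mod ℓ are already known; the function name and the sample numbers (p = 1009 and residues corresponding to a hypothetical t = −17) are illustrative assumptions, not output of the SEA algorithm.

```python
from math import prod


def recover_trace(residues: dict, p: int) -> int:
    """Recover t from {auxiliary prime l: t mod l} using the Chinese Remainder Theorem."""
    M = prod(residues)  # product of the auxiliary primes (the dict keys)
    # Require M > 4*sqrt(p), written without floating point as M^2 > 16p.
    if M * M <= 16 * p:
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")
    t = 0
    for l, r in residues.items():
        Ml = M // l
        t += r * Ml * pow(Ml, -1, l)  # pow(x, -1, l): modular inverse (Python 3.8+)
    t %= M
    # Pick the representative closest to zero; the Hasse interval has length
    # 4*sqrt(p) < M, so at most one representative can lie inside it.
    if t > M // 2:
        t -= M
    return t


if __name__ == "__main__":
    p = 1009
    t = recover_trace({3: 1, 5: 3, 11: 5}, p)
    print(t, p + 1 - t)  # the value t and the corresponding group order
```

The check on the product mirrors the condition in the text: without it, several values of t inside the Hasse interval could share the same residues.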
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over $\mathbb{F}_p$, with the intention of finding an E with $\#E(\mathbb{F}_p) = xr$, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms such as Diffie–Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 25, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$ we need to determine a polynomial in $\mathbb{Z}[F, J]$ associated with ℓ. For $\ell \in A_s$ this is stored in the program. For $\ell \in A_l$ it must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime p and an elliptic curve E: $y^2 = x^3 + a_4 x + a_6$
CH 5 SEA Architecture Block Diagram
[Figure: SEA architecture block diagram. Key computational block: Mod, SBox, BR, WR, XOR, Round Reg, with key registers KeyReg0[95:0] to KeyReg9[95:0]; inputs KeyI[95:0], KeyLd, Clk, Rst. Encryption computational block: Mod, SBox, BR, WR, XOR, Round Reg, cipher data register; signals Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst. Decryption computational block: Mod, SBox, BR, IWR, XOR, Round Reg, plain text data register; signals Key0[95:0] to Key9[95:0], DataO[95:0], DataRd, Clk, Rst. SMC control signals: Clk, Rst, Ena, ED, Ovr.]
FIG 5.1
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted or decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
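Both ways of obtaining a key described above can be sketched with the Python standard library. This is a minimal illustration, not the key generation used by the hardware design: the 96-bit key size is an assumption based on the 96-bit key register reported in the synthesis results, and PBKDF2 with SHA-256, the iteration count and the function names are choices made for this sketch.

```python
import hashlib
import secrets

KEY_BITS = 96  # assumption: width of the key register in this design


def random_key(bits: int = KEY_BITS) -> int:
    """Draw a key as an integer from the operating system's entropy source."""
    return secrets.randbits(bits)


def key_from_passphrase(passphrase: str, salt: bytes, bits: int = KEY_BITS) -> int:
    """Derive a key from a passphrase with a hash-based key derivation function."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, 100_000, dklen=bits // 8
    )
    return int.from_bytes(raw, "big")


if __name__ == "__main__":
    print(hex(random_key()))
    print(hex(key_from_passphrase("correct horse", salt=b"example-salt")))
```

The passphrase-derived key is reproducible from the same passphrase and salt, whereas the random key must be stored or exchanged.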
The simplest method to read encrypted data is a brute force attack, that is, simply attempting every number up to the maximum length of the key. It is therefore important to use a sufficiently long key: longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
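The exponential growth of the brute force effort with key length is easy to make concrete. In the Python sketch below, the function names and the search rate of 10^9 keys per second are assumptions for illustration only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def keyspace_size(bits: int) -> int:
    """Number of distinct keys of the given length."""
    return 1 << bits


def expected_trials(bits: int) -> int:
    """On average, half of the key space is searched before the key is found."""
    return 1 << (bits - 1)


def expected_years(bits: int, keys_per_second: float) -> float:
    """Expected search time in years at a given (hypothetical) trial rate."""
    return expected_trials(bits) / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for bits in (56, 96, 128):
        print(bits, expected_years(bits, keys_per_second=1e9))
```

Each additional key bit doubles the expected effort, so going from 96 to 128 bits multiplies it by 2^32.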
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. Until the early 1920s, the term cryptography also covered the breaking of encrypted messages; the concept of cryptanalysis then came into use, and it is now practically an art and science of its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
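The "message plus key in, single output out" structure, and the fact that decryption reverses encryption, can be illustrated with a toy letter-substitution cipher. The shift cipher below is only a teaching example chosen for this sketch; it is not SEA and offers no security.

```python
import string

ALPHABET = string.ascii_uppercase


def encrypt(message: str, key: int) -> str:
    """Replace each uppercase letter by the letter `key` positions later (mod 26)."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + key) % 26] if c in ALPHABET else c
        for c in message
    )


def decrypt(ciphertext: str, key: int) -> str:
    """Decryption is encryption with the opposite shift."""
    return encrypt(ciphertext, -key)


if __name__ == "__main__":
    c = encrypt("HELLO WORLD", 3)
    print(c, decrypt(c, 3))
```

With only 26 possible keys, this cipher falls to the brute force attack described earlier, which is why practical ciphers use much larger key spaces.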
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
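The conditions listed above map directly onto a few lines of code. The following Python sketch of a one-time pad uses XOR over bytes; the function names are assumptions for illustration, and the pad must still be kept secret and never reused, which the code itself cannot enforce.

```python
import secrets


def otp_encrypt(message: bytes):
    """Encrypt with a fresh random pad of the same length as the message."""
    pad = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, pad))
    return ciphertext, pad


def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    """XOR with the same pad restores the plaintext."""
    if len(pad) < len(ciphertext):
        raise ValueError("pad must be at least as long as the ciphertext")
    return bytes(c ^ k for c, k in zip(ciphertext, pad))


if __name__ == "__main__":
    ct, pad = otp_encrypt(b"attack at dawn")
    print(otp_decrypt(ct, pad))
```

The practical difficulty is key management: the pad is as long as the message and has to be shared securely in advance.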
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. the "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof can currently be given, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is "gardening", used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
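The PIN example can be made concrete by counting how much work a naive comparison does before it reports an error. In the Python sketch below, `leaky_equals` is a hypothetical early-exit comparison written for illustration, and the step count stands in for the running time an attacker would measure; `hmac.compare_digest` is the standard-library comparison designed to reduce this kind of leak.

```python
import hmac


def leaky_equals(a: bytes, b: bytes):
    """Early-exit comparison; returns (equal, number of bytes compared)."""
    if len(a) != len(b):
        return False, 0
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            return False, steps  # stops at the first mismatch
    return True, steps


def constant_time_equals(a: bytes, b: bytes) -> bool:
    """Comparison whose running time does not depend on where the inputs differ."""
    return hmac.compare_digest(a, b)


if __name__ == "__main__":
    secret = b"1234"
    for guess in (b"9999", b"1999", b"1299", b"1239"):
        print(guess, leaky_equals(secret, guess))
```

The later the first wrong character, the more steps the leaky version takes, so an attacker who can measure time can recover the secret one character at a time.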
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
MAPPING REPORT

Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date : Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================

WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================

Synthesizing Unit <keyreg>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
  Found 96-bit register for signal <Dreg>.
  Summary:
    inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
Registers : 1
  96-bit register : 1

=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 195

Macro Statistics
Registers : 1
  96-bit register : 1

Cell Usage
BELS : 1
  LUT1 : 1
FlipFlops/Latches : 96
  FDCE : 96
Clock Buffers : 1
  BUFGP : 1
IO Buffers : 194
  IBUF : 98
  OBUF : 96
=========================================================================

Device utilization summary
---------------------------

Selected Device : 2s15cs144-6

  Number of Slices : 55 out of 192 (28%)
  Number of Slice Flip Flops : 96 out of 384 (25%)
  Number of 4 input LUTs : 1 out of 384 (0%)
  Number of bonded IOBs : 194 out of 90 (215%) (*)
  Number of GCLKs : 1 out of 4 (25%)

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade : -6

  Minimum period : No path found
  Minimum input arrival time before clock : 7.962ns
  Maximum output required time after clock : 6.788ns
  Maximum combinational path delay : No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock Clk
Offset : 7.962ns (Levels of Logic = 1)
  Source : KeyEna (PAD)
  Destination : Dreg_95 (FF)
  Destination Clock : Clk rising

  Data Path : KeyEna to Dreg_95
    Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
    ----------------------------------------------------------------------
    IBUF:I->O      96       0.776        6.300       KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                 0.886                    Dreg_0
    ----------------------------------------------------------------------
    Total : 7.962ns (1.662ns logic, 6.300ns route)
            (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock Clk
Offset : 6.788ns (Levels of Logic = 1)
  Source : Dreg_95 (FF)
  Destination : KeyO<95> (PAD)
  Source Clock : Clk rising

  Data Path : Dreg_95 to KeyO<95>
    Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
    ----------------------------------------------------------------------
    FDCE:C->Q      1        1.085        1.035       Dreg_95 (Dreg_95)
    OBUF:I->O               4.668                    KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------------------------------------
    Total : 6.788ns (5.753ns logic, 1.035ns route)
            (84.8% logic, 15.2% route)

=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->

Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================

WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================

Synthesizing Unit <sbox8x3>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================

Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 7

Cell Usage
BELS : 3
  LUT4 : 3
IO Buffers : 7
  IBUF : 4
  OBUF : 3
=========================================================================

Device utilization summary
---------------------------

Selected Device : 2s15cs144-6

  Number of Slices : 2 out of 192 (1%)
  Number of 4 input LUTs : 3 out of 384 (0%)
  Number of bonded IOBs : 7 out of 90 (7%)

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 (1%)
Logic Distribution:
  Number of occupied Slices: 2 out of 192 (1%)
  Number of Slices containing only related logic: 2 out of 2 (100%)
  Number of Slices containing unrelated logic: 0 out of 2 (0%)
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 (1%)
  Number of bonded IOBs: 7 out of 86 (8%)

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report

Device utilization summary:

  Number of External IOBs: 7 out of 86 (8%)
    Number of LOCed External IOBs: 0 out of 7 (0%)

  Number of SLICEs: 2 out of 192 (1%)

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0

The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 (109%) (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 (264%) (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 (346%) (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 (97%)
  Number of Slices containing unrelated logic: 17 out of 665 (2%)
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 (277%) (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 (1232%) (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
Number of errors : 0
Number of warnings : 0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
Number of errors : 0
Number of warnings : 0

Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key and processor sizes.
It is a low-cost encryption routine targeted for processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, the security of the cipher being adapted as a function of its key size.
It can also be used in applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
As a low-cost encryption routine on processors with a limited instruction set.
In wireless communication, mobile computing and networking systems.
For the encryption of JPEG2000 images.
In scalable video coding.
In sensor networks and RFIDs.
CONCLUSION
SEA_{n,b} is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEA_{n,b} allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudo-code provided in this paper, the cipher can be expected to be implemented in assembly within a few hours. We note finally that the design criteria of SEA_{n,b} do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
cryptanalysis [27], multiple linear cryptanalysis [22, 10], boomerang [31] and rectangle [8] attacks.
However, these extensions usually imply only a small improvement compared to the basic attacks. As a matter of fact, non-linear approximations of outer rounds improve the bias of one or two rounds only. Regarding bi-linear cryptanalysis, we quote the author of [14]: "For ciphers similar to DES, based on small substitution boxes, we claim that bi-linear cryptanalysis is very closely related to LC, and we do not expect to find a bi-linear attack much faster than by LC."
It is difficult to evaluate the efficiency of multiple linear cryptanalysis, but it seems more promising for big substitution boxes (as mentioned in [22]). Moreover, the improvement on classical cryptanalysis obtained in [10] for the case of DES (which shares with SEA_{n,b} a Feistel structure and a poor diffusion) is limited. Finally, the complexity of differential-linear cryptanalysis and of the boomerang attack and its variants is inherently greater than that of the basic attacks.
As an example, the boomerang (or rectangle) attack allows us to use two short differentials instead of a long one, but using a long differential with probability pq is in general highly preferable to applying a boomerang attack with two short differentials of probability p and q. Therefore, although these attacks can perform slightly better in specific cases, the expected improvement is never outstanding. The conclusion is that these extensions deserve to be considered in the estimation of the number of rounds necessary to achieve security, but that a reasonable multiplicative factor should be enough to take them into account.
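The comparison between one long differential and a boomerang can be made concrete with a rough calculation. The sketch below assumes the usual textbook estimates: a differential of probability P needs on the order of 1/P chosen pairs, and a boomerang built from two short differentials of probabilities p and q yields a right quartet with probability (pq)^2. The function names and the example probabilities are illustrative assumptions, not values from the SEA analysis.

```python
from fractions import Fraction

def long_differential_pairs(p, q):
    """Rough number of pairs for one long differential of probability p*q."""
    return 1 / (p * q)

def boomerang_quartets(p, q):
    """Rough number of quartets for a boomerang with success probability (p*q)**2."""
    return 1 / (p * q) ** 2

# Hypothetical probabilities, chosen only for illustration.
p = Fraction(1, 2**10)
q = Fraction(1, 2**12)

pairs = long_differential_pairs(p, q)
quartets = boomerang_quartets(p, q)
print(f"long differential: about 2^{pairs.numerator.bit_length() - 1} pairs")
print(f"boomerang:         about 2^{quartets.numerator.bit_length() - 1} quartets")
```

Under these assumptions the boomerang cost is the square of the cost of the long differential, which is why the long differential is preferred whenever it exists.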
A Dedicated Related-Key Attack Against a Modified Version
For x ∈ Z_{2^b}^{n_b}, we denote by x ≪ a the left rotation by a bits of each of the n_b words of x. The non-linear and diffusion layers have the following properties:
– S(x ≪ a) = S(x) ≪ a
– r(x ≪ a) = r(x) ≪ a
– R(x ≪ a) = R(x) ≪ a
Consider a modified version of our cipher where key addition is performed using ⊕ rather than modular addition, and where all round constants C_i are such that C_i ≪ a = C_i, e.g. all C_i's equal 0. As a consequence of the previous observations, the modified round F_E and the key round F_K satisfy:
F_E(L ≪ a, R ≪ a, K ≪ a) = F_E(L, R, K) ≪ a
F_K(K_L ≪ a, K_R ≪ a, 0) = F_K(K_L, K_R, 0) ≪ a
These properties are iterative, in the sense that they also hold for the composition of several block cipher rounds. It is immediate to deduce from them a distinguisher on the modified cipher, which requires 2 chosen encryption queries under 2 related keys K and K ≪ a. In the actual SEA_{n,b}, the key addition is performed word-wise mod 2^b. As the property (X ≪ a) ⊞ (K ≪ a) = (X ⊞ K) ≪ a is prevented by certain carry propagations, it only holds with a probability p which depends on a and the word size b. For a = 1, p rapidly converges to 3/8 as b grows. It is smaller for 1 < a < b−1. Of course, this probability is averaged over all possible (X, K), and certain keys (e.g. "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA_{n,b} from having such weak keys.
Moreover, the round constants C_i are generally not such that C_i ≪ a = C_i (because they are generated from a counter). Combined with the diffusion in the key schedule, this implies that the similarity between the round keys derived from K and those derived from K ≪ a rapidly vanishes. These properties prevent this structural distinguisher from propagating through more than a few rounds of SEA_{n,b}.
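The difference between XOR and modular key addition under rotation can be checked exhaustively for a small word size. The sketch below is only an illustration: it works on a single b-bit word with b = 8 (an assumed example value), and the helper names are hypothetical. It counts how often rotating the operands and then combining them gives the same result as combining and then rotating.

```python
def rotl(x, a, b):
    """Rotate the b-bit word x left by a bits."""
    a %= b
    mask = (1 << b) - 1
    return ((x << a) | (x >> (b - a))) & mask

def commute_probability(op, a, b):
    """Fraction of all (X, K) pairs with op(X<<<a, K<<<a) == op(X, K)<<<a."""
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            if op(rotl(x, a, b), rotl(k, a, b)) == rotl(op(x, k), a, b):
                hits += 1
    return hits / (1 << (2 * b))

B = 8                      # word size in bits (assumed example value)
MASK = (1 << B) - 1

def xor(x, k):
    return x ^ k

def add(x, k):
    return (x + k) & MASK  # addition mod 2^b

p_xor = commute_probability(xor, 1, B)   # XOR always commutes with rotation
p_add = commute_probability(add, 1, B)   # addition only does so when no carry interferes
print(p_xor, p_add)
```

For b = 8 and a = 1 the measured value for modular addition is already close to 3/8, and repeating the experiment with a = 4 gives a smaller value, in line with the statement above.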
Square Attacks
We explored square attacks [16] on SEA_{48,8}. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e. takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from 2^3 (when the input to only one substitution box is active) to 2^21 (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in the corresponding figure. It is expected that this remains the same when different parameters n and b are considered, which implies that n_b + ⌊b/2⌋ rounds are enough to prevent square attacks. Note that although our observations also hold for ⊕-SEA_{n,b}, the use of addition mod 2^b provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in the corresponding figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as 2 · (n_b + ⌊b/2⌋).
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA_{n,b}, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA_{n,b} is the same as that of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with that of GOST, whose vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack has been described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key K_i is computed from the previous round key K_{i−1} using a function f which is always the same: K_i = f(K_{i−1}). However, in the case of SEA_{n,b}, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA_{n,b} key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference makes it possible to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA_{n,b} against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if P →_K C denotes the fact that the encryption of P under key K gives ciphertext C, then P →_K C ⇔ ~P →_{~K} ~C, where ~ denotes the bitwise complement. The non-linear key scheduling and the presence of carry propagations in the actual SEA_{n,b} algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA_{n,b} has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point as a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility to build very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be 3n/4 + 2 · (n_b + ⌊b/2⌋). This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since n_r has to be odd, we add one if it is even. We also assume a minimum word size b ≥ 8 bits.
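The round-count rule above is easy to evaluate for a given parameter set. The following is a minimal sketch, assuming that n_b = n/(2b) is the number of words per Feistel branch (as in the SEA specification) and that the division 3n/4 is exact for the parameters of interest; the function name is hypothetical.

```python
def suggested_rounds(n, b):
    """Evaluate 3n/4 + 2*(n_b + floor(b/2)) and make the result odd."""
    if b < 8:
        raise ValueError("the text assumes a minimum word size b >= 8")
    if n % (2 * b) != 0:
        raise ValueError("n must be a multiple of 2b so that n_b is an integer")
    n_b = n // (2 * b)                     # assumed definition of n_b
    n_r = 3 * n // 4 + 2 * (n_b + b // 2)
    return n_r if n_r % 2 == 1 else n_r + 1  # n_r has to be odd

for n, b in [(48, 8), (96, 8)]:
    print(f"SEA_{{{n},{b}}}: n_r = {suggested_rounds(n, b)}")
```

Doubling the result, as in the more conservative approach, would again have to be followed by the same adjustment to an odd value.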
3.5 Performance Analysis
SEA_{n,b} is targeted for being implemented on low-cost processors with little code size and a small instruction set. However, the simple structure of SEA_{n,b} makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: OR, AND, ⊕, ⊞, ≫, ≪
– Branch instructions: goto, subroutine call and return
– Comparison, load (RAM to register), store (register to RAM)
According to the code in the appendix, the performance can be roughly estimated as follows. First, the combined number of RAM words and registers equals 5n_b + 3. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, this directly yields 31n_b + 36 ops. For the implementation time, the round and key round respectively require 12n_b + 11 ops and 10n_b + 11 ops. This yields a total of (n_r − 1) × (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7 ops. These values are summarized in Table 1. Remark that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, as at the end of an encryption/decryption we have K_{n_r−1} = K_0. Remark also that this implementation uses a low number of registers, namely n_b + 3. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and a faster implementation.
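The estimates above can be evaluated directly for a chosen parameter set. The sketch below simply transcribes the three formulas; the function name, the dictionary keys and the example parameters (n = 96, b = 8, n_r = 93) are assumptions made for illustration, with n_b = n/(2b) as in the SEA specification.

```python
def sea_cost_estimate(n, b, n_r):
    """Evaluate the memory, code-size and time formulas quoted in the text."""
    n_b = n // (2 * b)                 # assumed definition of n_b
    round_ops = 12 * n_b + 11          # one cipher round
    key_round_ops = 10 * n_b + 11      # one key round
    return {
        "ram_words_and_registers": 5 * n_b + 3,
        "code_size_ops": 31 * n_b + 36,
        "time_ops": (n_r - 1) * (round_ops + key_round_ops + 7)
                    + round_ops + 8 * n_b + 7,
    }

estimate = sea_cost_estimate(n=96, b=8, n_r=93)
for name, value in estimate.items():
    print(f"{name}: {value}")
```

These are operation counts for the pseudo-assembly model only; cycle counts on a real processor depend on how each op maps to its instruction set.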
For illustration purposes, we implemented SEA_{n,b} on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA_{n,b} and the AES Rijndael. While direct comparisons are made difficult by their high dependency on the target devices, the following general comments can be made:
– SEA_{n,b} designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of SEA_{n,b} implementations (i.e. 5n_b + 3) is generally lower than for other block ciphers.
– The code size of SEA_{n,b} is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA_{n,b} also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. SEA_{n,b} trades space for time, which may be relevant given present processor speeds). Looking at the code size × cycles product, the efficiency of SEA_{n,b} remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let p be a large (odd) prime and let E: y^2 = x^3 + a_4 x + a_6 be an elliptic curve, where a_4 and a_6 are given fixed integers. In the case where p does not divide 4a_4^3 + 27a_6^2, E can be reduced to an elliptic curve over F_p. The number of points of E over F_p, denoted by #E(F_p), is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on E against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes, Dewaghe and Morain. The SEA algorithm was implemented by Morain, Müller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Müller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes, Dewaghe and Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime p, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set A of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over F_p, where p has 200 bits.
It is known that
#E(F_p) = p + 1 − t
where t is an integer which satisfies the Hasse bound
−2√p ≤ t ≤ 2√p
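For very small primes the relation between the point count and t can be checked by brute force. The sketch below is only an illustration of the definitions (it has nothing to do with the SEA method itself, which is needed precisely because this enumeration is infeasible for 200-bit p); the function name and the example curve are assumptions.

```python
from collections import Counter

def count_points(p, a4, a6):
    """Count points on y^2 = x^3 + a4*x + a6 over F_p, including the point at infinity."""
    if (4 * a4**3 + 27 * a6**2) % p == 0:
        raise ValueError("the curve is singular modulo p")
    # For each value v, how many y in F_p satisfy y^2 = v.
    roots = Counter(y * y % p for y in range(p))
    affine = sum(roots[(x**3 + a4 * x + a6) % p] for x in range(p))
    return affine + 1

p, a4, a6 = 101, 1, 1          # small example values, chosen arbitrarily
order = count_points(p, a4, a6)
t = p + 1 - order
assert t * t <= 4 * p          # Hasse bound: |t| <= 2*sqrt(p)
print(order, t)
```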
The algorithm works by calculating t modulo several small auxiliary primes ℓ. When the product of the auxiliary primes exceeds 4√p, the Chinese Remainder Theorem is used to recover the exact value of t, and hence that of #E(F_p). The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate ℓ, a calculation has to be carried out to generate a certain polynomial Φ_ℓ that is necessary for further calculations with this ℓ. These polynomials Φ_ℓ do not depend on the curve E under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve E, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific E and ℓ is 1/2). For those curves where the algorithm applies, we can determine t modulo ℓ. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of auxiliary primes that worked exceeds 4√p or not. In the former case, we have succeeded in determining t.
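The final recombination step can be sketched as follows. This is a minimal illustration of the Chinese Remainder Theorem step only, assuming the residues t mod ℓ have already been obtained by some other means; the function name and the toy residues are hypothetical. The condition "product exceeds 4√p" is tested as modulus² > 16p to stay in integer arithmetic.

```python
from math import prod

def recover_trace(residues, p):
    """residues maps each auxiliary prime ell to (t mod ell).

    Returns t, or None when the product of the primes does not exceed 4*sqrt(p).
    """
    modulus = prod(residues)               # product of the auxiliary primes
    if modulus * modulus <= 16 * p:        # modulus <= 4*sqrt(p): not enough primes
        return None
    t = 0
    for ell, r in residues.items():
        m = modulus // ell
        t += r * m * pow(m, -1, ell)       # standard CRT reconstruction
    t %= modulus
    # The Hasse interval is centred on zero, so take the symmetric representative.
    if t > modulus // 2:
        t -= modulus
    return t

# Toy example: the residues of t = -7 modulo 3, 5 and 7, with p = 101.
print(recover_trace({3: 2, 5: 3, 7: 0}, p=101))
```

Because the Hasse interval has width 4√p, a modulus larger than that leaves exactly one candidate for t in the interval.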
A typical application for this point counting would be to take a random prime p and a random elliptic curve E over F_p, with the intention of finding an E with #E(F_p) = xr, where r is a prime and x is small. Given such a curve, a point P of order r can be located easily, and the pair (E, P) could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for p and require x ≤ 32, then the probability that #E(F_p) = xr is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set A of potential auxiliary primes is the union of the set A_s of small primes and the set A_l of larger primes. For each ℓ ∈ A, we need to determine a polynomial Φ_ℓ(F, J) ∈ Z[F, J]. For ℓ ∈ A_s, this is stored in the program. For ℓ ∈ A_l, Φ_ℓ must be calculated by determining a number of coefficients of a certain q-series f(q) ∈ Z[[q]] and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime p and an elliptic curve E: y^2 = x^3 + a_4 x + a_6.
CH 5 SEA Architecture Block Diagram
[FIG 5.1: SEA architecture block diagram. Key computational block (Mod, SBox, BR, WR, XOR, Round Reg) with signals KeyI[95:0], KeyLd, Clk, Rst and registers KeyReg[95:0] and KeyReg0[95:0] to KeyReg9[95:0]. Encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg) with cipher data register and signals Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst. Decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg) with plain text data register and signals Key0[95:0] to Key9[95:0], DataO[95:0], DataRd, Clk, Rst. SMC with signals Clk, Rst, Ena, ED, Ovr.]
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
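Both ways of obtaining a key can be sketched with the Python standard library. This is an illustration only, not the key loading used by the hardware design in this report: the function names are hypothetical, and the passphrase variant uses PBKDF2 with SHA-256 as the hash function rather than the SHA-1 mentioned above.

```python
import hashlib
import secrets

def random_key(bits=128):
    """Draw a key from the operating system's entropy source."""
    return secrets.token_bytes(bits // 8)

def key_from_passphrase(passphrase, salt, bits=128, iterations=100_000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256 (an assumed choice)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=bits // 8
    )

salt = secrets.token_bytes(16)          # must be stored alongside the derived key's owner
k1 = random_key()
k2 = key_from_passphrase("correct horse battery staple", salt)
print(len(k1) * 8, len(k2) * 8)         # key lengths in bits
```

The same passphrase and salt always give the same derived key, which is what allows the receiver to regenerate it.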
The simplest method to read encrypted data is a brute force attack: simply trying every possible key up to the maximum key length. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek words kryptos (secret) and graphos (writing).
5.2 ENCRYPTION
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are ciphers and codes.
A code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
A cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. It requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
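A one-time pad is short enough to sketch in full. The code below is a minimal illustration assuming byte strings and XOR as the combining operation; the function names are hypothetical. It draws a fresh key as long as the message, which is the condition stated above, and the security argument fails as soon as a key is reused.

```python
import secrets

def otp_encrypt(message: bytes):
    """Encrypt with a fresh random key of the same length as the message."""
    key = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, key))
    return key, ciphertext

def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same key restores the plaintext."""
    if len(key) != len(ciphertext):
        raise ValueError("key and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, key))

key, ct = otp_encrypt(b"attack at dawn")
print(otp_decrypt(key, ct))
```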
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. the "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
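The figures in this comparison follow from simple arithmetic, sketched below. The brute-force estimate assumes that, on average, half of the key space is searched before the key is found; the constant and function names are illustrative, and the 2^43 figure is the one quoted in the text.

```python
def brute_force_expected_trials(key_bits):
    """Expected trials when half of the 2**key_bits keys are searched on average."""
    return 2 ** (key_bits - 1)

DES_KEY_BITS = 56                       # effective DES key length
LINEAR_CRYPTANALYSIS_OPS = 2 ** 43      # figure quoted in the text

brute = brute_force_expected_trials(DES_KEY_BITS)
ratio = brute // LINEAR_CRYPTANALYSIS_OPS
print(f"brute force: 2^{brute.bit_length() - 1} trials")
print(f"linear cryptanalysis is about 2^{ratio.bit_length() - 1} times cheaper in operations")
```

The comparison covers operation counts only; the linear attack additionally needs the 2^43 known plaintexts, which brute force does not.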
Public-key algorithms are based on the computational difficulty of various problems The most famous of these is integer factorization (eg the RSA algorithm is
39
based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)
For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s
While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 DECRYPTION OPERATION
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0    0%
  Number of Slices containing unrelated logic:    0 out of 0    0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86    225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs:    1 out of 4    25%
  Number of GCLKIOBs: 1 out of 4    25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0    0%
  Number of Slices containing unrelated logic:    0 out of 0    0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86    225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs:    1 out of 4    25%
  Number of GCLKIOBs: 1 out of 4    25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keyreg.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :
---- Target Parameters
Output File Name                   : keyreg
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144
---- Source Options
Top Module Name                    : keyreg
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No
---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto
---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5
---- Other Options
lso                                : keyreg.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
  Found 96-bit register for signal <Dreg>.
  Summary:
    inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
  Registers         : 1
    96-bit register : 1
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name     : keyreg
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
  IOs : 195
Macro Statistics
  Registers         : 1
    96-bit register : 1
Cell Usage
  BELS              : 1
    LUT1            : 1
  FlipFlops/Latches : 96
    FDCE            : 96
  Clock Buffers     : 1
    BUFGP           : 1
  IO Buffers        : 194
    IBUF            : 98
    OBUF            : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device: 2s15cs144-6
  Number of Slices:           55 out of 192    28%
  Number of Slice Flip Flops: 96 out of 384    25%
  Number of 4 input LUTs:      1 out of 384     0%
  Number of bonded IOBs:     194 out of 90    215% (*)
  Number of GCLKs:             1 out of 4      25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary
---------------
Speed Grade: -6
  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found
Timing Detail
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising
  Data Path: KeyEna to Dreg_95
                            Gate     Net
  Cell:in->out    fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------  ------------
  IBUF:I->O           96   0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                  0.886           Dreg_0
  ----------------------------------------
  Total                    7.962ns (1.662ns logic, 6.300ns route)
                                   (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source:       Dreg_95 (FF)
  Destination:  KeyO<95> (PAD)
  Source Clock: Clk rising
  Data Path: Dreg_95 to KeyO<95>
                            Gate     Net
  Cell:in->out    fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------  ------------
  FDCE:C->Q            1   1.085   1.035   Dreg_95 (Dreg_95)
  OBUF:I->O                4.668           KeyO_95_OBUF (KeyO<95>)
  ----------------------------------------
  Total                    6.788ns (5.753ns logic, 1.035ns route)
                                   (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : sbox8x3.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :
---- Target Parameters
Output File Name                   : sbox8x3
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144
---- Source Options
Top Module Name                    : sbox8x3
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No
---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto
---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5
---- Other Options
lso                                : sbox8x3.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name     : sbox8x3
Output Format                  : NGC
Optimization Goal              : Speed
Keep Hierarchy                 : NO
Design Statistics
  IOs : 7
Cell Usage
  BELS       : 3
    LUT4     : 3
  IO Buffers : 7
    IBUF     : 4
    OBUF     : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device: 2s15cs144-6
  Number of Slices:       2 out of 192    1%
  Number of 4 input LUTs: 3 out of 384    0%
  Number of bonded IOBs:  7 out of 90     7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384    1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192    1%
  Number of Slices containing only related logic: 2 out of 2    100%
  Number of Slices containing unrelated logic:    0 out of 2      0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384    1%
  Number of bonded IOBs: 7 out of 86    8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Placing & Routing Report
Device utilization summary:
  Number of External IOBs:       7 out of 86    8%
  Number of LOCed External IOBs: 0 out of 7     0%
  Number of SLICEs:              2 out of 192   1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keygenblock.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :
---- Target Parameters
Output File Name                   : keygenblock
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144
---- Source Options
Top Module Name                    : keygenblock
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No
---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto
---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5
---- Other Options
lso                                : keygenblock.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384    109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384    264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192    346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665    97%
  Number of Slices containing unrelated logic:     17 out of 665     2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384    277% (OVERMAPPED)
    Number used as logic: 1016
    Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86    1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs:    1 out of 4    25%
  Number of GCLKIOBs: 1 out of 4    25%
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : encryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :
---- Target Parameters
Output File Name                   : encryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144
---- Source Options
Top Module Name                    : encryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No
---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto
---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5
---- Other Options
lso                                : encryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : decryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :
---- Target Parameters
Output File Name                   : decryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144
---- Source Options
Top Module Name                    : decryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No
---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto
---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5
---- Other Options
lso                                : decryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
Low-cost encryption on processors with a limited instruction set.
Wireless communication, mobile computing, and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEA$_{n,b}$ is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA$_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA$_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations purposed for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
$K \ll a$. In the actual SEA$_{n,b}$, the key addition is performed word-wise mod $2^b$. As the property $(X \ll a) \boxplus (K \ll a) = (X \boxplus K) \ll a$ is prevented by certain carry propagations, it only holds with a probability $p$ which depends on $a$ and the word size $b$. For $a = 1$, $p$ rapidly converges to $3/8$ as $b$ grows. It is smaller for $1 < a < b-1$. Of course, this probability is averaged over all possible $(X, K)$, and certain keys (e.g., "all zeroes") yield no carry propagation at all. However, the design properties of the key schedule prevent SEA$_{n,b}$ from having such weak keys.
Moreover, the round constants $C_i$ are generally not such that $C_i \ll a = C_i$ (because they are generated from a counter). Combined with the diffusion in the key schedule, this implies that the similarity between the round keys derived from $K$ and those derived from $K \ll a$ rapidly vanishes. These properties prevent this structural distinguisher from being propagated through more than a few rounds of SEA$_{n,b}$.
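The probability that word rotation commutes with addition mod $2^b$ can be checked exhaustively for a small word size. The sketch below is an illustration under stated assumptions: the rotation is taken to be a left rotation of a single $b$-bit word, the helper names are hypothetical, and the closed form in the comment is a cross-check derived for this example rather than a formula taken from the text.

```python
from fractions import Fraction


def rotl(x: int, a: int, b: int) -> int:
    """Rotate the b-bit word x left by a positions."""
    a %= b
    mask = (1 << b) - 1
    return ((x << a) | (x >> (b - a))) & mask


def rotation_commutes_probability(a: int, b: int) -> Fraction:
    """Fraction of pairs (X, K) with rotl(X) + rotl(K) == rotl(X + K) mod 2^b."""
    mask = (1 << b) - 1
    hits = 0
    for x in range(1 << b):
        for k in range(1 << b):
            left = (rotl(x, a, b) + rotl(k, a, b)) & mask
            right = rotl((x + k) & mask, a, b)
            hits += left == right
    return Fraction(hits, 1 << (2 * b))


b = 8
for a in (1, 2, 4):
    p = rotation_commutes_probability(a, b)
    # Cross-check: equality needs no carry out of either part of the word,
    # which gives (1 + 2^-a) * (1 + 2^-(b-a)) / 4 for 0 < a < b.
    closed_form = (1 + Fraction(1, 2**a)) * (1 + Fraction(1, 2 ** (b - a))) / 4
    print(a, p, float(p), p == closed_form)
```

For $b = 8$ and $a = 1$ the exhaustive count gives $387/1024 \approx 0.378$, already close to the limit $3/8$, and the value for $a = 4$ is smaller, in line with the statement above. With XOR in place of modular addition the property would hold for every pair.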
Square Attacks
We explored square attacks [16] on SEA$_{48,8}$. More precisely, we considered all possible sets of inputs to one branch of the Feistel structure where the input to some of the substitution boxes is active (i.e., takes all possible input values the same number of times) and the input to the other substitution boxes is constant. The other branch is also constant. Therefore, the number of plaintexts considered goes from $2^3$ (when the input to only one substitution box is active) to $2^{21}$ (when the input to 7 substitution boxes is active). Our experiments showed that square attacks do not pass through more rounds than the diffusion pattern illustrated in the corresponding figure. It is expected that this remains the same when different parameters $n$ and $b$ are considered, which implies that $n_b + \lfloor b/2 \rfloor$ rounds are enough to prevent square attacks. Note that although our observations also hold for $\oplus$-SEA$_{n,b}$, the use of addition mod $2^b$ provides better resistance against square attacks.
Truncated and Impossible Differentials
As for square attacks, the diffusion analysis illustrated in the corresponding figure provides an estimation of the number of rounds required to prevent truncated differential attacks [25]. Impossible differentials [7] are usually built by concatenating two incompatible truncated differentials. As a consequence, we estimate the number of rounds necessary to prevent the construction of an impossible differential distinguisher as $2 \cdot (n_b + \lfloor b/2 \rfloor)$.
Interpolation Attacks
The interpolation attack [21] is possible when the whole cipher can be written as a relatively simple algebraic expression. It requires the substitution box to have a compact expression and the diffusion layer to permit the composition of these expressions. In the case of SEA$_{n,b}$, there is a priori no such expression, and the bitwise diffusion would make the combination of algebraic expressions difficult anyway.
Slide Attacks
The sequence of round keys of SEA$_{n,b}$ is the same as that of ICEBERG. Therefore, the analysis done in [30] is still valid. Namely, the non-periodicity of the sequence should make slide attacks [11, 12] irrelevant. The particular structure of this sequence also has some similarities with that of GOST, whose vulnerability to slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack was described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ which is always the same: $K_i = f(K_{i-1})$. However, in the case of SEA$_{n,b}$, a round constant that changes for each key round is used, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA$_{n,b}$ key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round key difference allows the attacker to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA$_{n,b}$ against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \iff \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes the bitwise complement. The non-linear key scheduling and the presence of carry propagations in the actual SEA$_{n,b}$ algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA$_{n,b}$ has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point a scope for further research. We note that resistance against algebraic attacks would anyway exclude the use of small substitution boxes, and therefore the possibility of building very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum required number of rounds to provide security against known attacks would be $n_r = 3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g., by doubling this number of rounds. Note that $n_r$ has to be odd: we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
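The round-count rule above is easy to evaluate for concrete parameters. The following sketch assumes $n_b = n/(2b)$, i.e., the number of $b$-bit words in one Feistel branch; this definition is not stated in this section and is taken as an assumption here, and the function names are illustrative.

```python
def words_per_branch(n: int, b: int) -> int:
    """n_b: number of b-bit words per Feistel branch (assumed n_b = n / (2b))."""
    if b < 8:
        raise ValueError("the text assumes a minimum word size of 8 bits")
    if n % (2 * b) != 0:
        raise ValueError("n must be a multiple of 2b")
    return n // (2 * b)


def suggested_rounds(n: int, b: int) -> int:
    """Minimum rounds 3n/4 + 2*(n_b + floor(b/2)), made odd by adding one if even."""
    n_b = words_per_branch(n, b)
    n_r = (3 * n) // 4 + 2 * (n_b + b // 2)
    return n_r if n_r % 2 == 1 else n_r + 1


for n, b in [(48, 8), (96, 8), (128, 8), (96, 16)]:
    print(f"n={n:3d} b={b:2d} n_b={words_per_branch(n, b)} n_r={suggested_rounds(n, b)}")
```

For example, $n = 96$ and $b = 8$ give $n_b = 6$ and $72 + 2 \cdot (6 + 4) = 92$, which is rounded up to 93 to keep the round count odd.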
3.5 Performance Analysis
SEA$_{n,b}$ is targeted for implementation on low-cost processors with little code size and a small instruction set. However, SEA$_{n,b}$'s simple structure makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size, and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: $\vee$, $\wedge$, $\oplus$, $\boxplus$, $\gg$, $\ll$.
– Branch instructions: goto, subroutine call, and return.
– Comparison, load RAM in register, store register in RAM.
According to the code in the appendix, the performance can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, this directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. This yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Note that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, since at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Note also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and a faster implementation.
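The cost formulas in this paragraph can be evaluated directly. The sketch below simply transcribes them; the class and function names are hypothetical, and the sample parameters ($n_b = 6$, $n_r = 93$) are an illustrative choice rather than values read from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeaSoftwareEstimate:
    ram_words_and_registers: int  # 5*n_b + 3
    registers: int                # n_b + 3
    code_size_ops: int            # 31*n_b + 36
    execution_ops: int            # total ops for one encryption/decryption


def estimate(n_b: int, n_r: int) -> SeaSoftwareEstimate:
    """Evaluate the op-count formulas quoted in the text for given n_b and n_r."""
    round_ops = 12 * n_b + 11
    key_round_ops = 10 * n_b + 11
    execution_ops = (
        (n_r - 1) * (round_ops + key_round_ops + 7)  # full rounds with key rounds
        + round_ops                                   # last round, no key round
        + 8 * n_b + 7                                 # remaining fixed cost
    )
    return SeaSoftwareEstimate(
        ram_words_and_registers=5 * n_b + 3,
        registers=n_b + 3,
        code_size_ops=31 * n_b + 36,
        execution_ops=execution_ops,
    )


print(estimate(n_b=6, n_r=93))
```

Since every term is linear in $n_b$ and the total is linear in $n_r$, the estimate makes the space/time trade-off visible: a wider word size $b$ lowers $n_b$ for a fixed $n$, and with it all three costs.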
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible for other ciphers (e.g., for the AES Rijndael).
– The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e., $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEAn,b also makes it less sensitive to the choice of processor than fixed-size algorithms, although large buses obviously improve efficiency. The drawback of these limited resources is the number of cycles required for encryption (i.e., SEAn,b trades space for time, which may be acceptable given present processor speeds). Looking at the product of code size and cycles, the efficiency of SEAn,b remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH 4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
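To make the quantity being counted concrete, the sketch below counts the points by brute force for a very small prime. This is not the Schoof-Elkies-Atkin algorithm (its running time is linear in $p$, so it is useless for cryptographic sizes); it only illustrates the definition. The function name and the example curve are illustrative assumptions.

```python
def count_points_naive(a4: int, a6: int, p: int) -> int:
    """Count the points of y^2 = x^3 + a4*x + a6 over F_p (p an odd prime),
    including the point at infinity, by exhaustive enumeration."""
    if (4 * a4**3 + 27 * a6**2) % p == 0:
        raise ValueError("curve is singular modulo p")
    # sqrt_count[v] = number of y in F_p with y^2 = v
    sqrt_count = [0] * p
    for y in range(p):
        sqrt_count[y * y % p] += 1
    total = 1  # the point at infinity
    for x in range(p):
        total += sqrt_count[(x**3 + a4 * x + a6) % p]
    return total


if __name__ == "__main__":
    # Toy example: y^2 = x^3 + x + 1 over F_5
    print(count_points_naive(1, 1, 5))
```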
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro, and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic, and for a 200-bit prime $p$ it succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that
$\#E(\mathbb{F}_p) = p + 1 - t$,
where $t$ is an integer which satisfies the Hasse bound
$-2\sqrt{p} \le t \le 2\sqrt{p}$.
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial $\Phi_\ell$ that is necessary for further calculations with this $\ell$. These polynomials $\Phi_\ell$ do not depend on the curve $E$ under consideration, and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining $t$.
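The final recombination step can be sketched as follows. This is only an illustration of the Chinese Remainder step under the Hasse bound, assuming the residues of $t$ modulo each auxiliary prime have already been obtained by other means; the function name and the toy residues are hypothetical.

```python
from math import prod


def recover_trace(residues: dict, p: int) -> int:
    """Recover t from {ell: t mod ell}, given that |t| <= 2*sqrt(p).

    The product M of the auxiliary primes must exceed 4*sqrt(p), so that
    the interval [-2*sqrt(p), 2*sqrt(p)] holds at most one value per
    residue class modulo M.
    """
    modulus = prod(residues)  # product of the primes (the dict keys)
    if modulus * modulus <= 16 * p:  # integer form of M <= 4*sqrt(p)
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")
    t = 0
    for ell, r in residues.items():
        m = modulus // ell
        t += r * m * pow(m, -1, ell)  # standard CRT combination
    t %= modulus
    if t * t > 4 * p:  # outside the Hasse interval: take the negative representative
        t -= modulus
    if t * t > 4 * p:
        raise ValueError("residues are inconsistent with the Hasse bound")
    return t


if __name__ == "__main__":
    p = 101
    t = recover_trace({3: 2, 5: 3, 7: 0}, p)  # toy residues
    print(t, p + 1 - t)  # trace and resulting group order
```

Working with $M^2 > 16p$ instead of $M > 4\sqrt{p}$ keeps the check in exact integer arithmetic, which matters for 200-bit primes.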
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$, with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms, such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E = xr$ is about 2.5%, so, given the success probability of about 3/4 per curve, we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial $\Phi_\ell(F, J) \in \mathbb{Z}[F, J]$. For $\ell \in A_s$, this is stored in the program. For $\ell \in A_l$, $\Phi_\ell$ must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration, and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime $p$ and an elliptic curve
$E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
[Block diagram. Key computational block: Mod, SBox, BR, WR, XOR, Round Reg; key registers KeyReg0[95:0] to KeyReg9[95:0]; signals KeyI[95:0], KeyLd, Clk, Rst. Encryption computational block: Mod, SBox, BR, WR, XOR, Round Reg; cipher data register; signals Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst. Decryption computational block: Mod, SBox, BR, IWR, XOR, Round Reg; plain text data register; signals Key0[95:0] to Key9[95:0], DataO[95:0], DataRd, Clk, Rst. SMC: Clk, Rst, Ena, ED, Ovr.]
FIG 5.1
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt the data being protected.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. PRNGs that use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
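Both ways of obtaining a key can be sketched with the Python standard library. This is a minimal illustration, not the key generation used in this design: the function names, the 96-bit example size, and the iteration count are assumptions, and PBKDF2 with SHA-256 is swapped in here for the generic "hash-based key generation algorithm" mentioned above.

```python
import hashlib
import secrets


def random_key(bits: int = 128) -> bytes:
    """Draw a key from the operating system's entropy source."""
    if bits % 8 != 0:
        raise ValueError("key size must be a multiple of 8 bits")
    return secrets.token_bytes(bits // 8)


def key_from_passphrase(passphrase: str, salt: bytes, bits: int = 128,
                        iterations: int = 100_000) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256.

    The salt is not secret, but should be random and stored with the data.
    """
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=bits // 8)


if __name__ == "__main__":
    salt = secrets.token_bytes(16)
    print(random_key(96).hex())
    print(key_from_passphrase("correct horse", salt, bits=96).hex())
```

The same passphrase and salt always give the same key, whereas each call to `random_key` gives a fresh one.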
The simplest method to read encrypted data is a brute force attack: simply trying every possible key up to the maximum key length. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
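The exponential growth can be made concrete with a little arithmetic. The sketch below assumes a purely hypothetical attacker testing $10^{12}$ keys per second; the function names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_trials(key_bits: int) -> int:
    """Average number of keys tried before the right one is found
    (half of the 2**key_bits possible keys)."""
    return 2 ** (key_bits - 1)


def years_to_search(key_bits: int, keys_per_second: float) -> float:
    """Expected brute-force time in years at a given trial rate."""
    return expected_trials(key_bits) / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for bits in (56, 96, 128):
        print(bits, years_to_search(bits, 1e12))  # assumed rate: 1e12 keys/s
```

Each additional key bit doubles the expected work, so going from 56 to 128 bits multiplies it by $2^{72}$.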
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek words kryptos (secret) and graphos (writing).
5.2 ENCRYPTION
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the term cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
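The message-plus-key view of a cipher can be illustrated with a toy letter-substitution (shift) cipher. This is only a teaching sketch: it is trivially breakable, has nothing to do with the SEA round function, and the function names are illustrative.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_encrypt(message: str, key: int) -> str:
    """Replace each uppercase letter by the letter `key` positions later;
    all other characters are passed through unchanged."""
    out = []
    for ch in message:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)
    return "".join(out)


def shift_decrypt(ciphertext: str, key: int) -> str:
    """Decryption applies the inverse shift with the same key."""
    return shift_encrypt(ciphertext, -key)


if __name__ == "__main__":
    c = shift_encrypt("HELLO WORLD", 3)
    print(c, "->", shift_decrypt(c, 3))
```

Both directions take the same two inputs (text and key) and return a single output, as described above.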
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
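The one-time pad itself is only a few lines of code. The sketch below is an illustration under the conditions stated above (a fresh random key at least as long as the message, never reused); the function names are illustrative.

```python
import secrets


def otp_keygen(length: int) -> bytes:
    """Generate `length` random key bytes from the OS entropy source."""
    return secrets.token_bytes(length)


def otp_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with the key. The same call encrypts and decrypts."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


if __name__ == "__main__":
    message = b"attack at dawn"
    key = otp_keygen(len(message))  # must never be reused
    ciphertext = otp_xor(message, key)
    print(otp_xor(ciphertext, key))
```

The price of this unconditional security is key material as long as all the traffic, which is why practical systems rely on block ciphers instead.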
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the "work factor" in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
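The password/PIN example above can be illustrated in software. The sketch below contrasts a comparison that stops at the first mismatch (its running time depends on how many leading bytes are correct) with the constant-time comparison provided by the Python standard library. The function names are illustrative, and this says nothing about the timing behaviour of the hardware design in this report.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Byte-by-byte comparison with early exit.

    The time taken grows with the length of the matching prefix, which is
    exactly the kind of information a timing attack exploits.
    """
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(a, b)


if __name__ == "__main__":
    secret_pin = b"4921"
    print(leaky_equal(b"4900", secret_pin), constant_time_equal(b"4921", secret_pin))
```

Both functions return the same answers; only the second avoids leaking the position of the first wrong character through its running time.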
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld

Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0, 0%
  Number of Slices containing unrelated logic: 0 out of 0, 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86, 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4, 25%
  Number of GCLKIOBs: 1 out of 4, 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0, 0%
  Number of Slices containing unrelated logic: 0 out of 0, 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86, 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4, 25%
  Number of GCLKIOBs: 1 out of 4, 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0, 0%
  Number of Slices containing unrelated logic: 0 out of 0, 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86, 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4, 25%
  Number of GCLKIOBs: 1 out of 4, 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
      inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 195

Macro Statistics
  Registers : 1
    96-bit register : 1

Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================

Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

  Number of Slices : 55 out of 192, 28%
  Number of Slice Flip Flops : 96 out of 384, 25%
  Number of 4 input LUTs : 1 out of 384, 0%
  Number of bonded IOBs : 194 out of 90, 215% (*)
  Number of GCLKs : 1 out of 4, 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------
Clock Signal | Clock buffer (FF name) | Load
Clk          | BUFGP                  | 96

Timing Summary
---------------
Speed Grade: -6

  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
  Cell:in->out | fanout | Gate Delay | Net Delay | Logical Name (Net Name)
  IBUF:I->O    | 96     | 0.776      | 6.300     | KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE      |        | 0.886      |           | Dreg_0
  Total: 7.962ns (1.662ns logic, 6.300ns route) (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising

  Data Path: Dreg_95 to KeyO<95>
  Cell:in->out | fanout | Gate Delay | Net Delay | Logical Name (Net Name)
  FDCE:C->Q    | 1      | 1.085      | 1.035     | Dreg_95 (Dreg_95)
  OBUF:I->O    |        | 4.668      |           | KeyO_95_OBUF (KeyO<95>)
  Total: 6.788ns (5.753ns logic, 1.035ns route) (84.8% logic, 15.2% route)

=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 7

Cell Usage
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

  Number of Slices : 2 out of 192, 1%
  Number of 4 input LUTs : 3 out of 384, 0%
  Number of bonded IOBs : 7 out of 90, 7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384, 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192, 1%
  Number of Slices containing only related logic: 2 out of 2, 100%
  Number of Slices containing unrelated logic: 0 out of 2, 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384, 1%
  Number of bonded IOBs: 7 out of 86, 8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB

Mapping Report

Device utilization summary:

  Number of External IOBs: 7 out of 86, 8%
    Number of LOCed External IOBs: 0 out of 7, 0%

  Number of SLICEs: 2 out of 192, 1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0

The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384, 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384, 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192, 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665, 97%
  Number of Slices containing unrelated logic: 17 out of 665, 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384, 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86, 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4, 25%
  Number of GCLKIOBs: 1 out of 4, 25%

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, the security of the cipher being adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
This is a low-cost encryption routine designed primarily for processors with a limited instruction set.
In wireless communication, mobile computing, and networking systems.
For the encryption of JPEG2000 images.
In scalable video coding.
In sensor networks and RFIDs.
CONCLUSION
SEA_{n,b} is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA_{n,b} allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA_{n,b} do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations purposed for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
some similarities with the one of GOST, of which the vulnerability against slide attacks is examined in [12]. None of the attacks presented in [12] seems to be applicable to our cipher.
Related-Key Attacks
The first related-key attack was described in [5]. It is the related-key counterpart of the slide attack. Such an attack is applicable when a round key $K_i$ is computed from the previous round key $K_{i-1}$ using a function $f$ that is always the same: $K_i = f(K_{i-1})$. However, SEA_{n,b} uses a round constant that changes for each key round, which prevents this attack. Another type of related-key attack is the differential related-key attack [23, 24]. The non-linearity of the SEA_{n,b} key schedule should prevent it. Moreover, note that the improvement of the differential related-key attack over classical differential cryptanalysis usually results from the fact that choosing a given round-key difference allows the attacker to "counter" the effect of the diffusion layer on the differential characteristic; a typical example is the attack on 3-WAY [24]. As the security of SEA_{n,b} against differential cryptanalysis results from its large number of rounds rather than from its diffusion, this effect is not relevant here.
Complementation Properties
The DES has the following complementation property: if $P \xrightarrow{K} C$ denotes the fact that encryption of $P$ under key $K$ gives ciphertext $C$, then $P \xrightarrow{K} C \iff \bar{P} \xrightarrow{\bar{K}} \bar{C}$, where the bar denotes bitwise complement. The non-linear key scheduling and the presence of carry propagations in the actual SEA_{n,b} algorithm prevent this property. We are not aware of any other similar structural feature in the design.
Algebraic Attacks
Algebraic attacks intend to exploit the simple algebraic structure of a block cipher. For example, certain block ciphers can be written as an overdefined system of quadratic equations. Reference [13] argues that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. Clearly, SEA_{n,b} has a simple algebraic structure, as it is based on a 3-bit substitution box. Therefore, if such an attack practically applies to a cipher like Serpent [1], it is likely applicable to one of the versions of our routines. As the complexity of XSL is supposedly polynomial in the plaintext size and number of rounds, this is especially true when those values increase. However, as the criteria for these techniques to be successful are still being discussed [9], we consider this latter point a scope for further research. We note that resistance against algebraic attacks would in any case exclude the use of small substitution boxes, and therefore the possibility of building very low-cost encryption routines.
Suggested Number of Rounds
From the previous descriptions, the minimum number of rounds required to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size $b \ge 8$ bits.
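The round-count rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `suggested_rounds` is hypothetical, and $n_b$ is taken to be $n/(2b)$ (the number of $b$-bit words per half block), which is not restated in this section.

```python
def suggested_rounds(n: int, b: int, conservative: bool = False) -> int:
    """Evaluate 3n/4 + 2*(n_b + floor(b/2)), then force the result to be odd.

    Assumptions (not stated in this section): n_b = n / (2b), and n is a
    multiple of 2b so that n_b is an integer.
    """
    if b < 8:
        raise ValueError("the text assumes a word size b >= 8 bits")
    if n % (2 * b) != 0:
        raise ValueError("assumed: n must be a multiple of 2*b")
    n_b = n // (2 * b)
    n_r = (3 * n) // 4 + 2 * (n_b + b // 2)
    if conservative:
        n_r *= 2          # the "doubling" security margin mentioned in the text
    if n_r % 2 == 0:
        n_r += 1          # n_r has to be odd
    return n_r


if __name__ == "__main__":
    print(suggested_rounds(96, 8))
    print(suggested_rounds(96, 8, conservative=True))
```

For a 96-bit block with 8-bit words, the formula gives 72 + 2·(6 + 4) = 92, which is even and is therefore raised to the next odd value.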
3.5 Performance Analysis
SEA_{n,b} is targeted at implementation on low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In the appendix, we propose pseudo-assembly code for an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size, and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: or, and, ⊕, ⊞ (addition mod $2^b$), ≫, ≪.
– Branch instructions: goto, subroutine call, and return.
– Comparison, load RAM in register, store register in RAM.
According to the code in the appendix, the performance can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then the code size and implementation time (both expressed in ops) are evaluated by summing the values given in the appendix. For the code size, this directly yields $31n_b + 36$ ops. For the implementation time, the round and key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops. This yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Note that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, since at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Note also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which will result in lower code size and a faster implementation.
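The cost formulas in this paragraph are easy to evaluate mechanically. The following sketch does only that; the function name `sea_cost_estimate` and the dictionary keys are hypothetical, and the numbers it produces are plain evaluations of the formulas, not a reproduction of Table 1.

```python
def sea_cost_estimate(n_b: int, n_r: int) -> dict:
    """Evaluate the software cost formulas quoted in the text (units: ops/words)."""
    round_ops = 12 * n_b + 11        # one cipher round
    key_round_ops = 10 * n_b + 11    # one key round
    total_ops = ((n_r - 1) * (round_ops + key_round_ops + 7)
                 + round_ops + 8 * n_b + 7)
    return {
        "ram_words_plus_registers": 5 * n_b + 3,
        "registers": n_b + 3,
        "code_size_ops": 31 * n_b + 36,
        "time_ops": total_ops,
    }


if __name__ == "__main__":
    # Example parameters: n_b = 6 words and n_r = 93 rounds (illustrative values).
    for name, value in sea_cost_estimate(6, 93).items():
        print(f"{name}: {value}")
```

With $n_b = 6$ and $n_r = 93$, the per-round term is 83 + 71 + 7 = 161 ops, so the total is 92 × 161 + 83 + 55 ops.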
For illustration purposes, we implemented SEA_{n,b} on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA_{n,b} and the AES Rijndael. While direct comparisons are made difficult by their high dependence on the target devices, the following general comments can be made:
– SEA_{n,b} designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of SEA_{n,b} implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of SEA_{n,b} is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA_{n,b} also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. SEA_{n,b} trades space for time, which may be acceptable given present processor speeds). Looking at the product of code size and cycles, the efficiency of SEA_{n,b} remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro, and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic, and for a 200-bit prime $p$ succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$ where $p$ has 200 bits.
It is known that $\#E(\mathbb{F}_p) = p + 1 - t$, where $t$ is an integer which satisfies the Hasse bound $-2\sqrt{p} \le t \le 2\sqrt{p}$.
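For very small primes, the relation between the point count and $t$ can be checked by brute force. The sketch below is not the SEA algorithm: it is a naive count, quadratic-free but linear in $p$, and is useful only for checking the Hasse bound on toy examples. The function names are hypothetical.

```python
def count_points(a4: int, a6: int, p: int) -> int:
    """Naively count the points of y^2 = x^3 + a4*x + a6 over F_p (p an odd prime).

    Includes the point at infinity. Assumes p does not divide 4*a4^3 + 27*a6^2.
    """
    if (4 * a4**3 + 27 * a6**2) % p == 0:
        raise ValueError("singular curve modulo p")
    # square_roots[v] = number of y in F_p with y^2 = v
    square_roots = {}
    for y in range(p):
        v = (y * y) % p
        square_roots[v] = square_roots.get(v, 0) + 1
    total = 1  # the point at infinity
    for x in range(p):
        rhs = (x**3 + a4 * x + a6) % p
        total += square_roots.get(rhs, 0)
    return total


def trace(a4: int, a6: int, p: int) -> int:
    """Return t such that the point count equals p + 1 - t."""
    return p + 1 - count_points(a4, a6, p)


if __name__ == "__main__":
    p = 5
    n_points = count_points(1, 1, p)
    t = trace(1, 1, p)
    print(n_points, t, t * t <= 4 * p)  # the last value checks the Hasse bound
```

The check `t * t <= 4 * p` is the Hasse bound written without square roots, which avoids floating-point arithmetic.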
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial associated with $\ell$ that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve $E$ under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining $t$.
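The final recombination step can be illustrated in isolation. The sketch below assumes the residues of $t$ modulo each auxiliary prime are already known (computing them is the hard part of the algorithm and is not shown). The function names and the toy numbers are hypothetical.

```python
from math import prod


def crt(residues, moduli):
    """Chinese Remainder Theorem for pairwise coprime moduli."""
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        x += r * m_i * pow(m_i, -1, m)  # pow(..., -1, m) is the modular inverse
    return x % big_m


def recover_trace(residues, moduli, p):
    """Recover t from its residues, using the Hasse bound |t| <= 2*sqrt(p)."""
    big_m = prod(moduli)
    # Require big_m > 4*sqrt(p), written as an integer comparison.
    if big_m * big_m <= 16 * p:
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")
    t = crt(residues, moduli)
    if t > big_m // 2:   # pick the representative closest to zero
        t -= big_m
    return t


if __name__ == "__main__":
    p = 101                      # toy prime, far smaller than a real 200-bit p
    moduli = [3, 5, 7]           # toy auxiliary primes, product 105 > 4*sqrt(101)
    secret_t = -7                # pretend this is the unknown trace
    residues = [secret_t % m for m in moduli]
    t = recover_trace(residues, moduli, p)
    print(t, p + 1 - t)
```

Because the interval allowed by the Hasse bound has width $4\sqrt{p}$, a modulus larger than that leaves exactly one candidate for $t$, which is why the centered representative is the right one.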
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$ with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, El Gamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$ we need to determine a polynomial in $\mathbb{Z}[F, J]$ associated with $\ell$. For $\ell \in A_s$ this polynomial is stored in the program. For $\ell \in A_l$ it must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration, and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store). We start out with a given prime $p$ and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
[FIG 5.1: SEA architecture block diagram. Labeled blocks: key computational block (Mod, SBox, BR, WR, XOR, Round Reg) with key registers KeyReg0[95:0] to KeyReg9[95:0]; encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg) with cipher data register; decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg) with plain text data register; SMC. Labeled signals: KeyI[95:0], KeyLd, Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, DataO[95:0], DataRd, Clk, Rst, Ena, ED, Ovr.]
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender will encrypt data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
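Both ways of obtaining a key described above can be sketched with the Python standard library. This is an illustration only, under stated assumptions: the function names, the 128-bit default, and the iteration count are hypothetical choices, and the passphrase path uses PBKDF2 with SHA-256 (swapped in here in place of the SHA-1 mentioned in the text).

```python
import hashlib
import secrets


def random_key(num_bits: int = 128) -> bytes:
    """Draw a key from the operating system's entropy-seeded generator."""
    return secrets.token_bytes(num_bits // 8)


def key_from_passphrase(passphrase: str, salt: bytes,
                        num_bits: int = 128, iterations: int = 100_000) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=num_bits // 8)


if __name__ == "__main__":
    salt = secrets.token_bytes(16)   # stored alongside the ciphertext, not secret
    print(random_key().hex())
    print(key_from_passphrase("correct horse battery staple", salt).hex())
```

The same passphrase and salt always yield the same key, which is what lets the receiving side re-derive it; the random salt keeps identical passphrases from producing identical keys.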
The simplest method to read encrypted data is a brute force attack: simply trying every possible key up to the maximum key length. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also meant the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis began being used; cryptanalysis is now practically an art and science all on its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
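The one-time pad conditions listed above translate directly into code. The sketch below is a toy illustration, not production cryptography: the function names are hypothetical, and the pad is drawn from the operating system's random source, which is an approximation of the "truly random" requirement.

```python
import secrets


def otp_generate_pad(length: int) -> bytes:
    """Generate a fresh pad as long as the message (never reuse it)."""
    return secrets.token_bytes(length)


def otp_apply(data: bytes, pad: bytes) -> bytes:
    """XOR data with the pad; the same call encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("the pad must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, pad))


if __name__ == "__main__":
    message = b"attack at dawn"
    pad = otp_generate_pad(len(message))
    ciphertext = otp_apply(message, pad)
    print(ciphertext.hex())
    print(otp_apply(ciphertext, pad))   # decryption recovers the message
```

Encryption and decryption are the same operation because XOR is its own inverse, which also shows why reusing a pad is fatal: XORing two ciphertexts made with the same pad cancels the pad out.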
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. the "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
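The arithmetic behind these figures is short enough to write out. The sketch below only restates the numbers in the paragraph (a 56-bit DES key, half the key space searched on average, $2^{43}$ operations for linear cryptanalysis); the function name is hypothetical.

```python
def expected_brute_force_trials(key_bits: int) -> int:
    """Trials needed to cover half of a key space of 2**key_bits keys."""
    return 2 ** (key_bits - 1)


if __name__ == "__main__":
    des_brute_force = expected_brute_force_trials(56)   # 2**55 decryptions
    des_linear = 2 ** 43                                # operations, per the text
    print(des_brute_force)
    print(des_brute_force // des_linear)                # work-factor ratio
    # Each extra key bit doubles the brute-force effort:
    print(expected_brute_force_trials(128) // expected_brute_force_trials(56))
```

The ratio $2^{55} / 2^{43} = 2^{12}$ is the "considerable improvement" referred to above, although the linear attack pays for it with a very large number of known plaintexts.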
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
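The password/PIN case mentioned above is the classic example of a timing leak. The sketch below contrasts a comparison that stops at the first wrong character with the constant-time comparison from the Python standard library. The function names are hypothetical, and the example illustrates the idea only; it does not measure real timings.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on the first mismatch."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False   # returns sooner the earlier the mismatch occurs
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison designed not to reveal where the inputs differ."""
    return hmac.compare_digest(a, b)


if __name__ == "__main__":
    stored_pin = b"4921"
    for guess in (b"0000", b"4000", b"4921"):
        print(guess, leaky_equal(stored_pin, guess),
              constant_time_equal(stored_pin, guess))
```

Both functions return the same answers; the difference is that an attacker timing `leaky_equal` could, in principle, learn the secret one character at a time.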
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
    See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 (25%)
Number of GCLKIOBs: 1 out of 4 (25%)

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision 116 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
    See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 (25%)
Number of GCLKIOBs: 1 out of 4 (25%)

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
    See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 (25%)
Number of GCLKIOBs: 1 out of 4 (25%)

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
Registers : 1
  96-bit register : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 195

Macro Statistics
Registers : 1
  96-bit register : 1

Cell Usage
BELS : 1
  LUT1 : 1
FlipFlops/Latches : 96
  FDCE : 96
Clock Buffers : 1
  BUFGP : 1
IO Buffers : 194
  IBUF : 98
  OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

Number of Slices : 55 out of 192 (28%)
Number of Slice Flip Flops : 96 out of 384 (25%)
Number of 4 input LUTs : 1 out of 384 (0%)
Number of bonded IOBs : 194 out of 90 (215%) (*)
Number of GCLKs : 1 out of 4 (25%)

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade : -6

Minimum period : No path found
Minimum input arrival time before clock : 7.962ns
Maximum output required time after clock : 6.788ns
Maximum combinational path delay : No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock 'Clk'
Offset : 7.962ns (Levels of Logic = 1)
  Source : KeyEna (PAD)
  Destination : Dreg_95 (FF)
  Destination Clock : Clk rising

  Data Path : KeyEna to Dreg_95
                            Gate    Net
  Cell:in->out     fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------  ------------
  IBUF:I->O        96       0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                   0.886           Dreg_0
  ----------------------------------------
  Total                     7.962ns (1.662ns logic, 6.300ns route)
                                    (20.9% logic, 79.1% route)

-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock 'Clk'
Offset : 6.788ns (Levels of Logic = 1)
  Source : Dreg_95 (FF)
  Destination : KeyO<95> (PAD)
  Source Clock : Clk rising

  Data Path : Dreg_95 to KeyO<95>
                            Gate    Net
  Cell:in->out     fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------  ------------
  FDCE:C->Q        1        1.085   1.035   Dreg_95 (Dreg_95)
  OBUF:I->O                 4.668           KeyO_95_OBUF (KeyO<95>)
  ----------------------------------------
  Total                     6.788ns (5.753ns logic, 1.035ns route)
                                    (84.8% logic, 15.2% route)

=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 7

Cell Usage
BELS : 3
  LUT4 : 3
IO Buffers : 7
  IBUF : 4
  OBUF : 3
=========================================================================

Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

Number of Slices : 2 out of 192 (1%)
Number of 4 input LUTs : 3 out of 384 (0%)
Number of bonded IOBs : 7 out of 90 (7%)

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization:
  Number of 4 input LUTs : 3 out of 384 (1%)
Logic Distribution:
  Number of occupied Slices : 2 out of 192 (1%)
  Number of Slices containing only related logic : 2 out of 2 (100%)
  Number of Slices containing unrelated logic : 0 out of 2 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs : 3 out of 384 (1%)
Number of bonded IOBs : 7 out of 86 (8%)

Total equivalent gate count for design : 18
Additional JTAG gate count for IOBs : 336
Peak Memory Usage : 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs : 7 out of 86 (8%)
  Number of LOCed External IOBs : 0 out of 7 (0%)
  Number of SLICEs : 2 out of 192 (1%)

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file "c:/xilinx/bin/vasu/keygenblock.ngc" ...
Reading component libraries for design expansion...
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization:
  Total Number Slice Registers : 419 out of 384 (109%) (OVERMAPPED)
    Number used as Flip Flops : 415
    Number used as Latches : 4
  Number of 4 input LUTs : 1016 out of 384 (264%) (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices : 665 out of 192 (346%) (OVERMAPPED)
  Number of Slices containing only related logic : 648 out of 665 (97%)
  Number of Slices containing unrelated logic : 17 out of 665 (2%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs : 1066 out of 384 (277%) (OVERMAPPED)
  Number used as logic : 1016
  Number used as a route-thru : 50
Number of bonded IOBs : 1060 out of 86 (1232%) (OVERMAPPED)
  IOB Flip Flops : 960
Number of GCLKs : 1 out of 4 (25%)
Number of GCLKIOBs : 1 out of 4 (25%)

Total equivalent gate count for design : 17572
Additional JTAG gate count for IOBs : 50928
Peak Memory Usage : 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text size, key size, and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted at any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
As a low-cost encryption routine designed for processors with a limited instruction set, SEA is applicable:
In wireless communication, mobile computing, and networking systems.
For the encryption of JPEG2000 images.
In scalable video coding.
In sensor networks and RFIDs.
CONCLUSION
$\mathrm{SEA}_{n,b}$ is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of $\mathrm{SEA}_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of $\mathrm{SEA}_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered as an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations purposed for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, M. Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
From the previous descriptions, the minimum number of rounds required to provide security against known attacks would be $3n/4 + 2 \cdot (n_b + \lfloor b/2 \rfloor)$. This roughly corresponds to the number of rounds needed to resist linear/differential attacks, plus twice the number of rounds needed to obtain complete diffusion (to prevent both structural attacks and outer-round improvements of statistical attacks). A more conservative approach (applied in most present block ciphers) would be to take a large security margin, e.g. by doubling this number of rounds. Since $n_r$ has to be odd, we add one if it is even. We also assume a minimum word size of $b \ge 8$ bits.
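As a worked illustration of the round-count rule, the following minimal Python sketch evaluates the formula and applies the "make it odd" adjustment. It assumes that $n_b = n/(2b)$ (the number of $b$-bit words per half-block, which is not restated in this section), and the function name `sea_min_rounds` is hypothetical.

```python
def sea_min_rounds(n: int, b: int) -> int:
    """Evaluate n_r = 3n/4 + 2*(n_b + floor(b/2)), then force n_r to be odd.

    Assumption (not restated in this section): n_b = n / (2*b).
    """
    if b < 8:
        raise ValueError("the text assumes a minimum word size b >= 8 bits")
    if n % (2 * b) != 0 or (3 * n) % 4 != 0:
        raise ValueError("n must make both n/(2b) and 3n/4 integers")

    n_b = n // (2 * b)                        # words per half-block (assumed)
    n_r = (3 * n) // 4 + 2 * (n_b + b // 2)   # formula from the text
    if n_r % 2 == 0:                          # n_r has to be odd
        n_r += 1
    return n_r


if __name__ == "__main__":
    # 96-bit text/key with 8-bit words: 72 + 2*(6 + 4) = 92, rounded up to 93.
    print(sea_min_rounds(96, 8))
```

For the 96-bit registers used in this design with 8-bit words, the formula gives 92, which is raised to 93 to make it odd.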
3.5 Performance Analysis
$\mathrm{SEA}_{n,b}$ is targeted for implementation on low-cost processors with little code size and a small instruction set. However, its simple structure makes it easy to implement on any processor. In the appendix, we propose a pseudo-assembly code of an encryption/decryption design with "on the fly" key scheduling. The implementation objectives were, in decreasing order of importance: (1) low RAM and register usage, (2) low code size, and (3) speed. It is based on the following (very) reduced instruction set (assuming 2-operand instructions only):
– Arithmetic and logic operators: OR, AND, XOR ($\oplus$), modular addition, and the shift/rotate operators $\gg$ and $\ll$.
– Branch instructions: goto, subroutine call, and return.
– Comparison, load RAM in register, store register in RAM.
According to the code in the appendix, the performance can be roughly estimated as follows. First, the combined number of RAM words and registers equals $5n_b + 3$. Then, the code size and the implementation time (both expressed in operations, "ops") are evaluated by summing the values given in the appendix. For the code size, this directly yields $31n_b + 36$ ops. For the implementation time, the round and the key round respectively require $12n_b + 11$ ops and $10n_b + 11$ ops, which yields a total of $(n_r - 1) \times (12n_b + 11 + 10n_b + 11 + 7) + (12n_b + 11) + 8n_b + 7$ ops. These values are summarized in Table 1. Note that, due to the particular structure of the key scheduling, we do not need to keep the master key in memory, since at the end of an encryption/decryption we have $K_{n_r-1} = K_0$. Note also that this implementation uses a low number of registers, namely $n_b + 3$. However, if more registers are available, they can be traded for RAM words, which results in a lower code size and a faster implementation.
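The estimates above are simple linear expressions in $n_b$ and $n_r$, so they can be tabulated directly. The sketch below is only an illustration: it assumes $n_b = n/(2b)$, takes $n_r$ as an input, and uses the hypothetical function name `sea_cost_estimates`.

```python
def sea_cost_estimates(n: int, b: int, n_r: int) -> dict:
    """Evaluate the software cost formulas quoted in the text.

    Assumption: n_b = n / (2*b). All counts are in 'ops' or in words.
    """
    n_b = n // (2 * b)
    round_ops = 12 * n_b + 11        # one cipher round
    key_round_ops = 10 * n_b + 11    # one key-schedule round
    return {
        "ram_words_plus_registers": 5 * n_b + 3,
        "registers": n_b + 3,
        "code_size_ops": 31 * n_b + 36,
        "time_ops": (n_r - 1) * (round_ops + key_round_ops + 7)
                    + round_ops + 8 * n_b + 7,
    }


if __name__ == "__main__":
    # Example: n = 96, b = 8, n_r = 93 rounds.
    for name, value in sea_cost_estimates(96, 8, 93).items():
        print(f"{name}: {value}")
```

With $n = 96$, $b = 8$ and $n_r = 93$, this gives 33 RAM words plus registers, 222 ops of code, and 14950 ops per encryption/decryption.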
For illustration purposes, we implemented $\mathrm{SEA}_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between $\mathrm{SEA}_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their strong dependence on the target devices, the following general comments can be made:
– $\mathrm{SEA}_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g. for the AES Rijndael).
– The combined number of RAM words and registers of $\mathrm{SEA}_{n,b}$ implementations (i.e. $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of $\mathrm{SEA}_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of $\mathrm{SEA}_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e. $\mathrm{SEA}_{n,b}$ trades space for time, which may be acceptable given present processor speeds). Looking at the code size × cycles product, the efficiency of $\mathrm{SEA}_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof–Elkies–Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement it. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof–Elkies–Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes, Dewaghe, and Morain. The SEA algorithm was implemented by Morain, Müller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Müller are described in [15] and [16]. The implementation of Izu, Kogure, Noro, and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes, Dewaghe, and Morain is published in [5]. Atkin never formally published his contributions, described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about $3/4$ (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that $\#E(\mathbb{F}_p) = p + 1 - t$, where $t$ is an integer which satisfies the Hasse bound $-2\sqrt{p} \le t \le 2\sqrt{p}$.
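To make the relation between the point count, $t$, and the Hasse bound concrete, the sketch below counts points by brute force on a toy curve. This is not the SEA algorithm itself (it takes time linear in $p$ and is only usable for very small primes); the function names and the example curve $y^2 = x^3 + x + 1$ over $\mathbb{F}_5$ are chosen here purely for illustration.

```python
def count_points(p: int, a4: int, a6: int) -> int:
    """Naive count of points on y^2 = x^3 + a4*x + a6 over F_p (p an odd prime).

    Runs in O(p) time, so it is only meant for tiny illustrative primes.
    """
    if (4 * a4**3 + 27 * a6**2) % p == 0:
        raise ValueError("p divides 4*a4^3 + 27*a6^2: the curve is singular mod p")

    # roots[v] = number of y in F_p with y^2 = v
    roots = [0] * p
    for y in range(p):
        roots[(y * y) % p] += 1

    total = 1  # the point at infinity
    for x in range(p):
        total += roots[(x**3 + a4 * x + a6) % p]
    return total


def trace(p: int, a4: int, a6: int) -> int:
    """Return t such that #E(F_p) = p + 1 - t."""
    return p + 1 - count_points(p, a4, a6)


if __name__ == "__main__":
    p, a4, a6 = 5, 1, 1
    t = trace(p, a4, a6)
    print(count_points(p, a4, a6), t)   # 9 points, t = -3
    print(t * t <= 4 * p)               # Hasse bound |t| <= 2*sqrt(p)
```

On this toy curve there are 9 points, so $t = -3$, and $t^2 = 9 \le 4p = 20$ as the Hasse bound requires.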
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial $\Phi_\ell$ that is necessary for further calculations with this $\ell$. These polynomials $\Phi_\ell$ do not depend on the curve $E$ under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is $1/2$). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case, we have succeeded in determining $t$.
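The final recombination step can be sketched as follows. Given the residues of $t$ modulo auxiliary primes whose product $M$ exceeds $4\sqrt{p}$, the Chinese Remainder Theorem determines $t$ modulo $M$, and the Hasse interval (of width $4\sqrt{p} < M$) then pins down $t$ as the representative of smallest absolute value. The function name `recover_trace` and the toy inputs are assumptions made for this illustration; computing the residues themselves is the hard part of SEA and is not shown.

```python
from math import prod


def recover_trace(residues: dict, p: int) -> int:
    """Recover t from {l: t mod l} using the Chinese Remainder Theorem.

    Requires the product of the primes l to exceed 4*sqrt(p).
    """
    modulus = prod(residues)                 # product of the auxiliary primes
    if modulus * modulus <= 16 * p:          # same test as modulus > 4*sqrt(p)
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")

    t = 0
    for l, r in residues.items():
        m = modulus // l
        t += r * m * pow(m, -1, l)           # CRT basis element for prime l
    t %= modulus

    # |t| <= 2*sqrt(p) < modulus/2, so take the centered representative.
    if t > modulus // 2:
        t -= modulus
    return t


if __name__ == "__main__":
    # Toy example: p = 5 with t = -3, i.e. t = 0 (mod 3) and t = 2 (mod 5).
    p = 5
    t = recover_trace({3: 0, 5: 2}, p)
    print(t, p + 1 - t)                      # t = -3, #E(F_p) = 9
```

Modular inverses via `pow(m, -1, l)` require Python 3.8 or later.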
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$ with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$, we need to determine a polynomial $\Phi_\ell(F, J) \in \mathbb{Z}[F, J]$. For $\ell \in A_s$, this polynomial is stored in the program. For $\ell \in A_l$, it must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be precalculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime $p$ and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
[Figure: SEA architecture block diagram. Key computational block (Mod, SBox, BR, WR, XOR, Round Reg) with key input KeyI[95:0], KeyLd, Clk, Rst and key registers KeyReg0[95:0] to KeyReg9[95:0]. Encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg) with cipher data register, inputs DataI[95:0], DataLd, Clk, Rst and keys Key0[95:0] to Key9[95:0]. Decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg) with plain text data register, outputs DataO[95:0], DataRd, Clk, Rst and keys Key0[95:0] to Key9[95:0]. SMC with signals Clk, Rst, Ena, ED, Ovr.]
FIG 5.1
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted/decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases, keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. Of the PRNGs, those which use system entropy to seed data generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations, the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
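The two ways of obtaining a key described above can be sketched with the Python standard library. This is an illustration only: the text names SHA-1, whereas the sketch uses PBKDF2 with HMAC-SHA-256 as one common hash-based key derivation function, and the iteration count, key length, and function names are assumptions chosen here.

```python
import hashlib
import secrets


def random_key(length: int = 16) -> bytes:
    """Random key drawn from the operating system's entropy source."""
    return secrets.token_bytes(length)


def derive_key(passphrase: str, salt: bytes,
               length: int = 16, iterations: int = 100_000) -> bytes:
    """Passphrase-based key using PBKDF2-HMAC-SHA-256 (assumed parameters)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=length
    )


if __name__ == "__main__":
    salt = secrets.token_bytes(16)   # stored alongside the ciphertext, not secret
    print(random_key().hex())
    print(derive_key("correct horse battery staple", salt).hex())
```

The same passphrase and salt always yield the same key, which is what allows the receiver to regenerate it; a fresh random salt makes precomputed dictionary attacks less effective.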
The simplest method to read encrypted data is a brute force attack: simply attempting every number up to the maximum length of the key. Therefore, it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret, graphos - writing
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also meant the breaking of encrypted messages until the early 1920s, when the concept of cryptanalysis began being used; cryptanalysis is now practically an art and science all on its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. This works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (aka plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
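The one-time pad itself is short enough to sketch. The code below is a minimal illustration, with hypothetical function names, in which a fresh random pad of the same length as the message is XORed with it; the security conditions listed above (truly random, never reused, kept secret) are the caller's responsibility and are not enforced by the code.

```python
import secrets
from typing import Tuple


def otp_encrypt(message: bytes) -> Tuple[bytes, bytes]:
    """Encrypt with a fresh pad as long as the message; returns (pad, ciphertext)."""
    pad = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, pad))
    return pad, ciphertext


def otp_decrypt(pad: bytes, ciphertext: bytes) -> bytes:
    """Decrypt by XORing the ciphertext with the same pad."""
    if len(pad) != len(ciphertext):
        raise ValueError("pad and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, pad))


if __name__ == "__main__":
    pad, ct = otp_encrypt(b"attack at dawn")
    print(ct.hex())
    print(otp_decrypt(pad, ct))
```

Because XOR is its own inverse, encryption and decryption are the same operation; the practical cost is that the pad is as long as the message and can be used only once.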
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute-force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute-force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is "gardening", used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute-force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute-force attacks.
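The figures quoted above follow from simple arithmetic, sketched below. The 56-bit DES key length is a known property of the cipher; the function name `expected_bruteforce_trials` is an assumption made for this example.

```python
def expected_bruteforce_trials(key_bits: int) -> int:
    """Trials needed to cover half of a key space of 2**key_bits keys."""
    return 2 ** (key_bits - 1)


DES_KEY_BITS = 56                                   # DES uses a 56-bit key
brute = expected_bruteforce_trials(DES_KEY_BITS)    # 2**55 decryptions
linear = 2 ** 43                                    # DES operations quoted for linear cryptanalysis
speedup = brute // linear                           # 2**12

print(brute == 2 ** 55)  # True
print(speedup)           # 4096
```

The comparison counts cipher operations only; the linear attack also needs $2^{43}$ known plaintexts, which is a cost of a different kind.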
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
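The PIN example above can be sketched as follows. The early-exit comparison stops at the first wrong character, so the work it does reveals how many leading characters of a guess are correct. The function names are hypothetical; `hmac.compare_digest` is the Python standard-library comparison intended to avoid this kind of leak.

```python
import hmac


def comparisons_made(secret: bytes, guess: bytes) -> int:
    """Count the byte comparisons an early-exit check performs before stopping."""
    count = 0
    for x, y in zip(secret, guess):
        count += 1
        if x != y:
            break  # early exit: time spent depends on the first mismatch
    return count


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Compare without stopping at the first mismatching byte."""
    return hmac.compare_digest(a, b)


pin = b"1234"
print(comparisons_made(pin, b"9999"))  # 1: first digit already wrong
print(comparisons_made(pin, b"1999"))  # 2: first digit right, second wrong
print(constant_time_equal(pin, b"1234"))  # True
```

An attacker who can measure the difference between the first two cases can recover the PIN one digit at a time instead of trying all combinations.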
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file "c:/xilinx/bin/vasu/keyreg.ngc" ...
Reading component libraries for design expansion...
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file "keyreg.ngd" ...
Writing NGDBUILD log file "keyreg.bld"...
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision 116 $
Mapped Date: Mon Mar 30 12:42:43 2009
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name: keyreg.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:
---- Target Parameters
Output File Name: keyreg
Output Format: NGC
Target Device: xc2s15-6-cs144
---- Source Options
Top Module Name: keyreg
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No
---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto
---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5
---- Other Options
lso: keyreg.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
  Registers: 1
    96-bit register: 1
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name: keyreg.ngr
Top Level Output File Name: keyreg
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO
Design Statistics
  IOs: 195
Macro Statistics
  Registers: 1
    96-bit register: 1
Cell Usage
  BELS: 1
    LUT1: 1
  FlipFlops/Latches: 96
    FDCE: 96
  Clock Buffers: 1
    BUFGP: 1
  IO Buffers: 194
    IBUF: 98
    OBUF: 96
=========================================================================
Device utilization summary
---------------------------
Selected Device: 2s15cs144-6
  Number of Slices: 55 out of 192 (28%)
  Number of Slice Flip Flops: 96 out of 384 (25%)
  Number of 4 input LUTs: 1 out of 384 (0%)
  Number of bonded IOBs: 194 out of 90 (215%) (*)
  Number of GCLKs: 1 out of 4 (25%)
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary
---------------
Speed Grade: -6
  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found
Timing Detail
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock 'Clk'
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising
  Data Path: KeyEna to Dreg_95
                           Gate     Net
    Cell:in->out   fanout  Delay   Delay  Logical Name (Net Name)
    -------------------------------------  ------------
    IBUF:I->O          96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                 0.886          Dreg_0
    -------------------------------------
    Total                   7.962ns (1.662ns logic, 6.300ns route)
                                    (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock 'Clk'
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising
  Data Path: Dreg_95 to KeyO<95>
                           Gate     Net
    Cell:in->out   fanout  Delay   Delay  Logical Name (Net Name)
    -------------------------------------  ------------
    FDCE:C->Q           1   1.085   1.035  Dreg_95 (Dreg_95)
    OBUF:I->O               4.668          KeyO_95_OBUF (KeyO<95>)
    -------------------------------------
    Total                   6.788ns (5.753ns logic, 1.035ns route)
                                    (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name: sbox8x3.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:
---- Target Parameters
Output File Name: sbox8x3
Output Format: NGC
Target Device: xc2s15-6-cs144
---- Source Options
Top Module Name: sbox8x3
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No
---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto
---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5
---- Other Options
lso: sbox8x3.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete: default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name: sbox8x3.ngr
Top Level Output File Name: sbox8x3
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO
Design Statistics
  IOs: 7
Cell Usage
  BELS: 3
    LUT4: 3
  IO Buffers: 7
    IBUF: 4
    OBUF: 3
=========================================================================
Device utilization summary
---------------------------
Selected Device: 2s15cs144-6
  Number of Slices: 2 out of 192 (1%)
  Number of 4 input LUTs: 3 out of 384 (0%)
  Number of bonded IOBs: 7 out of 90 (7%)
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 (1%)
Logic Distribution:
  Number of occupied Slices: 2 out of 192 (1%)
  Number of Slices containing only related logic: 2 out of 2 (100%)
  Number of Slices containing unrelated logic: 0 out of 2 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 (1%)
  Number of bonded IOBs: 7 out of 86 (8%)
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 (8%)
    Number of LOCed External IOBs: 0 out of 7 (0%)
  Number of SLICEs: 2 out of 192 (1%)
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name: keygenblock.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:
---- Target Parameters
Output File Name: keygenblock
Output Format: NGC
Target Device: xc2s15-6-cs144
---- Source Options
Top Module Name: keygenblock
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No
---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto
---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5
---- Other Options
lso: keygenblock.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file "c:/xilinx/bin/vasu/keygenblock.ngc" ...
Reading component libraries for design expansion...
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file "keygenblock.ngd" ...
Writing NGDBUILD log file "keygenblock.bld"...
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 (109%) (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1,016 out of 384 (264%) (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 (346%) (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 (97%)
  Number of Slices containing unrelated logic: 17 out of 665 (2%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1,066 out of 384 (277%) (OVERMAPPED)
  Number used as logic: 1,016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1,060 out of 86 (1232%) (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 17,572
Additional JTAG gate count for IOBs: 50,928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name: encryption.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:
---- Target Parameters
Output File Name: encryption
Output Format: NGC
Target Device: xc2s15-6-cs144
---- Source Options
Top Module Name: encryption
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No
---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto
---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5
---- Other Options
lso: encryption.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name: decryption.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:
---- Target Parameters
Output File Name: decryption
Output Format: NGC
Target Device: xc2s15-6-cs144
---- Source Options
Top Module Name: decryption
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No
---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto
---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5
---- Other Options
lso: decryption.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It is well suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
As a low-cost encryption routine in processors with a limited instruction set.
In wireless communication, mobile computing, and networking systems.
For the encryption of JPEG2000 images.
In scalable video coding.
In sensor networks and RFIDs.
CONCLUSION
SEA$_{n,b}$ is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA$_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performances (encryption + decryption) for present key sizes and processors (e.g., 128-bit key, 1 MHz 8-bit RISC) are in the range of an encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudo code provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA$_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations purposed for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, M. Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
For illustration purposes, we implemented SEA$_{n,b}$ on Atmel AVR ATtiny [3] and ARM [2] microprocessors. The Atmel ATtiny represents a typical target for such a low-cost encryption routine. We chose the ARM platform in order to provide rough comparisons between SEA$_{n,b}$ and the AES Rijndael. While direct comparisons are made difficult by their high dependencies on the target devices, the following general comments can be made:
– SEA$_{n,b}$ designs combine encryption and decryption more efficiently than most other encryption algorithms. In particular, key agility in decryption is usually not possible (e.g., for the AES Rijndael).
– The combined number of RAM words and registers of SEA$_{n,b}$ implementations (i.e., $5n_b + 3$) is generally lower than for other block ciphers.
– The code size of SEA$_{n,b}$ is generally lower than for other block ciphers implemented on similar platforms.
The flexibility of SEA$_{n,b}$ also makes it less sensitive to the choice of a processor than fixed-size algorithms, although it is obvious that large buses improve efficiency. The drawback of these limited resources is the number of cycles required for the encryption (i.e., SEA$_{n,b}$ trades space for time, which may be relevant given present processor speeds). Looking at the code size × cycles product, the efficiency of SEA$_{n,b}$ remains similar to that of Rijndael (encryption only), which is well known for its efficient smart card implementations.
CH4 AN EXPOSITION OF THE SEA ALGORITHM
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial-time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first, Schoof's algorithm was considered impractical, but Elkies suggested the use of "good" primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and by Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro, and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that
$\#E(\mathbb{F}_p) = p + 1 - t$,
where $t$ is an integer which satisfies the Hasse bound
$-2\sqrt{p} \le t \le 2\sqrt{p}$.
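For very small primes, the relation above can be checked by direct enumeration. The sketch below is a naive point count (not the SEA algorithm, whose purpose is precisely to avoid such enumeration for large $p$); the function names and the example curve $y^2 = x^3 + x + 1$ over $\mathbb{F}_5$ are assumptions made for illustration.

```python
def count_points(p: int, a4: int, a6: int) -> int:
    """Naively count points on y^2 = x^3 + a4*x + a6 over F_p, including infinity."""
    if (4 * a4**3 + 27 * a6**2) % p == 0:
        raise ValueError("curve is singular modulo p")
    total = 1  # the point at infinity
    for x in range(p):
        rhs = (x**3 + a4 * x + a6) % p
        if rhs == 0:
            total += 1                          # single point (x, 0)
        elif pow(rhs, (p - 1) // 2, p) == 1:    # Euler's criterion: rhs is a square
            total += 2                          # two points (x, y) and (x, -y)
    return total


def satisfies_hasse(p: int, t: int) -> bool:
    """Check |t| <= 2*sqrt(p) using integers only (t^2 <= 4p)."""
    return t * t <= 4 * p


p, a4, a6 = 5, 1, 1
n = count_points(p, a4, a6)   # 9 points
t = p + 1 - n                 # t = -3
print(n, t, satisfies_hasse(p, t))  # 9 -3 True
```

The loop costs on the order of $p$ field operations, which is hopeless for a 200-bit prime and is the reason a polynomial-time method such as SEA is needed.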
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial (indexed by $\ell$) that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve $E$ under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide whether our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case we have succeeded in determining $t$.
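The final recombination step can be illustrated with a minimal Python sketch. It assumes that the residues of $t$ modulo each auxiliary prime are already known (computing them is the hard part and is not shown); the function names `crt` and `recover_trace` are hypothetical and chosen only for this illustration.

```python
from math import prod

def crt(residues, moduli):
    """Combine t = r_i (mod m_i), for pairwise coprime m_i, into t mod M."""
    M = prod(moduli)
    t = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the inverse of Mi modulo m (Python 3.8+)
        t += r * Mi * pow(Mi, -1, m)
    return t % M, M

def recover_trace(residues, moduli, p):
    """Recover the trace t from its residues, using the Hasse bound."""
    t, M = crt(residues, moduli)
    # The product M must exceed 4*sqrt(p), i.e. M^2 > 16*p.
    if M * M <= 16 * p:
        raise ValueError("product of auxiliary primes is too small")
    # Centered lift: pick the representative of t in (-M/2, M/2].
    if t > M // 2:
        t -= M
    return t

# Toy example: p = 101 and residues of t modulo 3, 5 and 7.
t = recover_trace([2, 3, 0], [3, 5, 7], 101)
order = 101 + 1 - t   # group order p + 1 - t
```

The centered lift is what makes the condition on the product necessary: the Hasse interval has length $4\sqrt{p}$, so a modulus larger than that leaves exactly one candidate for $t$.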
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$, with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
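The search for a curve with $\#E(\mathbb{F}_p) = xr$ can be sketched for toy-sized primes as follows. This is only an illustration: it counts points by brute force in time proportional to $p$, which is hopeless for 200-bit primes, where the SEA algorithm itself is needed. The helper names `count_points`, `is_prime` and `split_cofactor` are hypothetical.

```python
def count_points(a4, a6, p):
    """Naive #E(F_p) for y^2 = x^3 + a4*x + a6 over F_p, p an odd prime."""
    # roots[v] = number of y in F_p with y^2 = v
    roots = [0] * p
    for y in range(p):
        roots[y * y % p] += 1
    total = 1  # the point at infinity
    for x in range(p):
        total += roots[(x * x * x + a4 * x + a6) % p]
    return total

def is_prime(n):
    """Trial division; adequate only for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def split_cofactor(n, max_x=32):
    """Return (x, r) with n = x*r, r prime and x <= max_x, or None."""
    for x in range(1, max_x + 1):
        if n % x == 0 and is_prime(n // x):
            return x, n // x
    return None

n = count_points(1, 1, 5)      # curve y^2 = x^3 + x + 1 over F_5
factors = split_cofactor(n)    # small cofactor x and prime r, if any
```

In a real search the call to `count_points` would be replaced by a run of the point-counting algorithm, and curves for which `split_cofactor` returns `None` would be discarded.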
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$ we need to determine a polynomial in $\mathbb{Z}[F, J]$ attached to $\ell$. For $\ell \in A_s$ this polynomial is stored in the program. For $\ell \in A_l$ it must be calculated by determining a number of coefficients of a certain q-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store). We start out with a given prime $p$ and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
[Block diagram: Key Computational Block (Mod, SBox, BR, WR, XOR, Round Reg; key registers KeyReg0[95:0] through KeyReg9[95:0]; signals KeyI[95:0], KeyLd, Clk, Rst). Encryption Computational Block (Mod, SBox, BR, WR, XOR, Round Reg; cipher data register; signals Key0[95:0] through Key9[95:0], DataI[95:0], DataLd, Clk, Rst). Decryption Computational Block (Mod, SBox, BR, IWR, XOR, Round Reg; plain text data register; signals Key0[95:0] through Key9[95:0], DataO[95:0], DataRd, Clk, Rst). SMC (signals Clk, Rst, Ena, ED, Ovr).]
FIG 5.1
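The blocks named in the diagram (Mod, SBox, BR, WR, XOR, and IWR in the decryption path) can be read as the basic operations of the SEA cipher. The following Python sketch models one encryption round and one decryption round on a 96-bit block with 8-bit words. The operation definitions are recalled from the published SEA description [1] and are not taken from this report's VHDL, so they should be treated as assumptions; the key schedule is omitted and the round key is just a placeholder value.

```python
B = 8                    # word size in bits (assumed)
MASK = (1 << B) - 1      # each half-block is 6 words: 96 / (2 * 8)

def mod_add(x, k):       # "Mod": word-wise addition modulo 2^B
    return [(a + b) & MASK for a, b in zip(x, k)]

def sbox(x):             # "SBox": bitwise substitution on triples of words
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y

def bit_rot(x):          # "BR": rotate first word right, third word left
    y = list(x)
    for i in range(0, len(y), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (B - 1))) & MASK
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (B - 1))) & MASK
    return y

def word_rot(x):         # "WR": rotate the list of words by one position
    return [x[-1]] + x[:-1]

def inv_word_rot(x):     # "IWR": inverse of word_rot
    return x[1:] + [x[0]]

def xor(x, y):           # "XOR": word-wise exclusive or
    return [a ^ b for a, b in zip(x, y)]

def f(r, k):             # round function applied to one half-block
    return bit_rot(sbox(mod_add(r, k)))

def enc_round(l, r, k):  # Feistel step used for encryption
    return r, xor(word_rot(l), f(r, k))

def dec_round(l, r, k):  # Feistel step used for decryption
    return r, inv_word_rot(xor(l, f(r, k)))

left, right = [1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]
key = [0x3C, 0xA5, 0x0F, 0xF0, 0x99, 0x66]   # placeholder round key
l2, r2 = enc_round(left, right, key)
r0, l0 = dec_round(r2, l2, key)   # halves are swapped before and after
```

Because the round is a Feistel step, decryption reuses the same Mod, SBox and BR logic and differs only in the direction of the word rotation, which is consistent with the IWR block being the only block that differs between the two data paths in the figure.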
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being encrypted or decrypted.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. PRNGs that are seeded from system entropy generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations the key is created from a passphrase by a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
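Both ways of producing a key can be sketched with the Python standard library. This is an illustration only, not the key generation block of the hardware design: `random_key` draws a key from the operating system's entropy source, and `key_from_passphrase` derives one with PBKDF2. The function names, the iteration count and the choice of SHA-256 (rather than the SHA-1 mentioned above) are assumptions made for the example.

```python
import hashlib
import secrets

def random_key(bits=128):
    """Return a random key of the given size from the OS entropy source."""
    return secrets.token_bytes(bits // 8)

def key_from_passphrase(passphrase, salt, bits=128, iterations=200_000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=bits // 8
    )

salt = secrets.token_bytes(16)          # stored alongside the ciphertext
session_key = random_key(96)            # e.g. a 96-bit key
derived_key = key_from_passphrase("correct horse", salt, bits=96, iterations=1000)
```

The salt is not secret, but it must be stored, because the same passphrase and salt are needed to derive the same key again.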
The simplest method to read encrypted data is a brute force attack: simply attempting every possible key up to the maximum key length. It is therefore important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
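The exponential growth can be made concrete with a short calculation. The sketch below assumes an attacker who tests a fixed number of keys per second and who, on average, finds the key after trying half of the key space; the rate of one billion keys per second is an arbitrary assumption, not a measured figure.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def brute_force_years(key_bits, keys_per_second):
    """Expected time, in years, to find a key by exhaustive search."""
    expected_trials = 2 ** (key_bits - 1)   # half of the key space
    return expected_trials / keys_per_second / SECONDS_PER_YEAR

rate = 1e9   # assumed attacker speed in keys per second
for bits in (56, 96, 128):
    print(f"{bits:3d}-bit key: about {brute_force_years(bits, rate):.3g} years")
```

Each additional key bit doubles the expected search time, which is why a modest increase in key length is enough to put exhaustive search out of reach.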
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language: kryptos (secret) and graphein (to write).
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today many security systems and companies use cryptography to transfer information over the Internet or by radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the term cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main methods of cryptography are code and cipher.
A code replaces complete words or phrases by code words or numbers.
A cipher works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
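The "message plus key in, single output out" view can be shown with a deliberately simple letter-substitution cipher. The shift cipher below is a classroom toy with no security value and is unrelated to SEA; the function names are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase

def encrypt(message, key):
    """Shift every letter of the message by `key` positions."""
    out = []
    for ch in message.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)   # leave spaces and punctuation unchanged
    return "".join(out)

def decrypt(ciphertext, key):
    """Undo encrypt() by shifting in the opposite direction."""
    return encrypt(ciphertext, -key)

secret = encrypt("HELLO WORLD", 3)
plain = decrypt(secret, 3)
```

Encryption and decryption take the same two inputs and differ only in the direction of the transformation, which is the pattern that every symmetric cipher follows.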
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
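A one-time pad is short enough to write out in full. The following Python sketch generates a fresh random pad as long as the message and combines the two with XOR; the function names are hypothetical, and the security argument only holds if the pad is kept secret and never reused.

```python
import secrets

def otp_encrypt(message: bytes):
    """Encrypt with a fresh random pad of the same length as the message."""
    pad = secrets.token_bytes(len(message))
    ciphertext = bytes(m ^ k for m, k in zip(message, pad))
    return ciphertext, pad

def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    """XOR with the same pad recovers the message."""
    if len(pad) < len(ciphertext):
        raise ValueError("pad must be at least as long as the ciphertext")
    return bytes(c ^ k for c, k in zip(ciphertext, pad))

ciphertext, pad = otp_encrypt(b"meet at noon")
recovered = otp_decrypt(ciphertext, pad)
```

The practical drawback is visible in the code: the pad is as long as the message, so distributing pads is as hard as distributing the messages themselves.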
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. the "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is "gardening", used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
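The PIN example can be illustrated in software. In the Python sketch below, `naive_equal` stops at the first mismatching byte, so its running time reveals how many leading bytes of a guess are correct, while `constant_time_equal` uses the standard library function `hmac.compare_digest`, which is designed to avoid that leak. The two wrapper names are hypothetical; whether a timing difference is measurable in practice depends on the platform.

```python
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    """Comparison with early exit: time depends on the first mismatch."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False   # returns sooner the earlier the mismatch occurs
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose duration does not depend on where inputs differ."""
    return hmac.compare_digest(a, b)

stored_pin = b"4921"
ok = constant_time_equal(stored_pin, b"4921")
bad = constant_time_equal(stored_pin, b"4920")
```

Both functions return the same answers; only the information leaked through their running time differs.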
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file cxilinxbinvasukeyregngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line : CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision 116 $
Mapped Date : Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  (See NOTES below for an explanation of the effects of unrelated logic)
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  (See NOTES below for an explanation of the effects of unrelated logic)
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is cxilinxbinvasuKeyRegvhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
# Registers : 1
  96-bit register : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file 2s15.nph in environment CXilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 195

Macro Statistics :
# Registers : 1
#   96-bit register : 1

Cell Usage :
# BELS : 1
#   LUT1 : 1
# FlipFlops/Latches : 96
#   FDCE : 96
# Clock Buffers : 1
#   BUFGP : 1
# IO Buffers : 194
#   IBUF : 98
#   OBUF : 96
=========================================================================

Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices : 55 out of 192 28%
  Number of Slice Flip Flops : 96 out of 384 25%
  Number of 4 input LUTs : 1 out of 384 0%
  Number of bonded IOBs : 194 out of 90 215% (*)
  Number of GCLKs : 1 out of 4 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade : -6
  Minimum period : No path found
  Minimum input arrival time before clock : 7.962ns
  Maximum output required time after clock : 6.788ns
  Maximum combinational path delay : No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

Timing constraint : Default OFFSET IN BEFORE for Clock Clk
Offset : 7.962ns (Levels of Logic = 1)
  Source : KeyEna (PAD)
  Destination : Dreg_95 (FF)
  Destination Clock : Clk rising
  Data Path : KeyEna to Dreg_95
    Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
    IBUF:I->O      96       0.776        6.300       KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                 0.886                    Dreg_0
    Total : 7.962ns (1.662ns logic, 6.300ns route) (20.9% logic, 79.1% route)

Timing constraint : Default OFFSET OUT AFTER for Clock Clk
Offset : 6.788ns (Levels of Logic = 1)
  Source : Dreg_95 (FF)
  Destination : KeyO<95> (PAD)
  Source Clock : Clk rising
  Data Path : Dreg_95 to KeyO<95>
    Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
    FDCE:C->Q      1        1.085        1.035       Dreg_95 (Dreg_95)
    OBUF:I->O               4.668                    KeyO_95_OBUF (KeyO<95>)
    Total : 6.788ns (5.753ns logic, 1.035ns route) (84.8% logic, 15.2% route)

=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - cxilinxbinvasuKeyRegvhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is cxilinxbinvasuKeyRegvhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file 2s15.nph in environment CXilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 7

Cell Usage :
# BELS : 3
#   LUT4 : 3
# IO Buffers : 7
#   IBUF : 4
#   OBUF : 3
=========================================================================

Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices : 2 out of 192 1%
  Number of 4 input LUTs : 3 out of 384 0%
  Number of bonded IOBs : 7 out of 90 7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  (See NOTES below for an explanation of the effects of unrelated logic)
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report

Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
  Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file cxilinxbinvasukeygenblockngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  (See NOTES below for an explanation of the effects of unrelated logic)
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
    Number used as logic: 1016
    Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
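The OVERMAPPED flags in the mapping report above follow directly from the used and available counts. The short Python sketch below recomputes the percentages for the key generation block on the xc2s15 device; the resource figures are copied from the report, while the function name `utilization` and the dictionary layout are only illustrative.

```python
def utilization(used, available):
    """Return (percentage truncated to an integer, overmapped flag)."""
    percent = 100 * used // available
    return percent, used > available

# (used, available) pairs taken from the keygenblock mapping report
keygen_resources = {
    "Slice registers": (419, 384),
    "4 input LUTs": (1016, 384),
    "Occupied slices": (665, 192),
    "Bonded IOBs": (1060, 86),
    "GCLKs": (1, 4),
}

for name, (used, available) in keygen_resources.items():
    percent, over = utilization(used, available)
    flag = " (OVERMAPPED)" if over else ""
    print(f"{name}: {used} out of {available} {percent}%{flag}")
```

Every logic resource of the key generation block exceeds what the xc2s15 provides, so according to this report the block does not fit the selected device as synthesized; the IOB figure in particular reflects that all key registers are brought out to pins when the block is synthesized on its own.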
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. Typical application areas are:
Wireless communication, mobile computing and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
The Schoof-Elkies-Atkin algorithm is an efficient way to count the number of points on an elliptic curve defined over a large prime field. This expository paper describes the algorithm in sufficient detail to allow a reader not familiar with arithmetic geometry to implement the algorithm. The mathematical background for the technique is then given.
Let $p$ be a large (odd) prime and let $E: y^2 = x^3 + a_4 x + a_6$ be an elliptic curve, where $a_4$ and $a_6$ are given fixed integers. In the case where $p$ does not divide $4a_4^3 + 27a_6^2$, $E$ can be reduced to an elliptic curve over $\mathbb{F}_p$. The number of points of $E$ over $\mathbb{F}_p$, denoted by $\#E(\mathbb{F}_p)$, is of cryptographic interest, since the properties of this number determine the security of elliptic curve cryptosystems based on $E$ against various known attacks.
The first polynomial time algorithm for determining the number of rational points on an elliptic curve defined over a finite field is due to Schoof. He used calculations with torsion points on the curve to arrive at the number of points. At first Schoof's algorithm was considered impractical, but Elkies suggested the use of good primes (now known as Elkies primes), where isogenies and modular curves can be involved to speed up the calculation. Atkin also made a number of important contributions to the algorithm, which then became known as the Schoof-Elkies-Atkin (SEA) algorithm.
Further improvements were later proposed by Dewaghe and Couveignes-Dewaghe-Morain. The SEA algorithm was implemented by Morain, Muller, and Izu et al. Schoof's seminal paper [18] describes the original algorithm. He later also published a paper [19] that is a lovely overview of the developments in the subject up to 1995. Elkies' paper [9] describes the ideas of his original manuscript [8] and contains many other theoretical insights and illuminating examples. The implementations of Morain and Muller are described in [15] and [16]. The implementation of Izu, Kogure, Noro and Yokoyama, which focuses on speeding up the algorithm as much as possible, is described in [13].
Dewaghe's improvement is published in [7]. The improvement by Couveignes-Dewaghe-Morain is published in [5]. Atkin never formally published his contributions described in [1], but they are discussed extensively in [9, 19]. This paper, which is not aimed at the experts in the area, describes in detail a reasonably fast implementation of the SEA algorithm that is closely modeled upon Morain's. The algorithm considered below is probabilistic and, for a 200-bit prime $p$, succeeds with a probability of about 3/4 (which can be brought arbitrarily close to 1 by enlarging the set $A$ of auxiliary primes below). The algorithm, implemented on a typical personal computer, takes several minutes to find the number of points on a typical curve over $\mathbb{F}_p$, where $p$ has 200 bits.
It is known that $\#E(\mathbb{F}_p) = p + 1 - t$, where $t$ is an integer which satisfies the Hasse bound $-2\sqrt{p} \le t \le 2\sqrt{p}$.
The algorithm works by calculating $t$ modulo several small auxiliary primes $\ell$. When the product of the auxiliary primes exceeds $4\sqrt{p}$, the Chinese Remainder Theorem is used to recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial, indexed by $\ell$, that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve $E$ under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide if our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is 1/2). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case we have succeeded in determining $t$.
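The recovery step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the residues of t modulo each auxiliary prime are taken as already known (computing them is the hard part of the algorithm and is not shown), and the names `crt` and `recover_trace` as well as the toy values p = 101 and t = -7 are hypothetical, chosen only to keep the numbers small.

```python
from math import prod


def crt(residues, moduli):
    """Return (x, M) with x = r_i (mod m_i) for every i, where M = product of the m_i.

    Assumes the moduli are pairwise coprime (true for distinct primes).
    """
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the inverse of Mi modulo m (Python 3.8+).
        x += r * Mi * pow(Mi, -1, m)
    return x % M, M


def recover_trace(p, residues, primes):
    """Recover t from its residues modulo the auxiliary primes, using the Hasse bound."""
    t, M = crt(residues, primes)
    # M > 4*sqrt(p) is equivalent to M*M > 16*p, which avoids floating point.
    if M * M <= 16 * p:
        raise ValueError("product of auxiliary primes does not exceed 4*sqrt(p)")
    # t lies in [-2*sqrt(p), 2*sqrt(p)], so pick the representative closest to 0.
    if t > M // 2:
        t -= M
    return t


# Toy illustration (hypothetical values): p = 101, true trace t = -7.
p, primes = 101, [3, 5, 7]
residues = [-7 % l for l in primes]
t = recover_trace(p, residues, primes)
num_points = p + 1 - t
print(t, num_points)
```

Because the product of the primes exceeds the width of the Hasse interval, exactly one representative of the combined residue falls inside it, which is why the final centering step is unambiguous.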
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$, with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E = xr$ is about 2.5%, so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$ we need to determine a polynomial in $\mathbb{Z}[F, J]$, indexed by $\ell$, in the variables $F$ and $J$. For $\ell \in A_s$ this is stored in the program. For $\ell \in A_l$ it must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be pre-calculated and stored if there is enough space for them (they require just under half a megabyte to store).
We start out with a given prime $p$ and an elliptic curve $E: y^2 = x^3 + a_4 x + a_6$.
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram. Blocks shown: a key computational block (Mod, SBox, BR, WR, XOR, Round Reg, with key registers KeyReg0[95:0] to KeyReg9[95:0]; signals KeyI[95:0], KeyLd, Clk, Rst); an encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg, cipher data register; signals Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst); a decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg, plain text data register; signals Key0[95:0] to Key9[95:0], DataO[95:0], DataRd, Clk, Rst); and the SMC control block (signals Clk, Rst, Ena, ED, Ovr).
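The diagram pairs an encryption block with a decryption block built from almost the same components (modular addition, S-box, rotations, XOR). The following minimal sketch shows why such sharing is possible, assuming a generic balanced Feistel structure on a 96-bit block. It is not the SEA round function: `round_function`, the rotation amount and the ten placeholder round keys are hypothetical stand-ins chosen only for illustration.

```python
HALF_BITS = 48                    # half of a 96-bit block (width of the [95:0] buses)
MASK = (1 << HALF_BITS) - 1


def rotl(x, r):
    """Rotate a HALF_BITS-wide word left by r bits."""
    r %= HALF_BITS
    return ((x << r) | (x >> (HALF_BITS - r))) & MASK


def round_function(half, round_key):
    """Hypothetical F: modular addition, then a bit rotation (a stand-in, not SEA's F)."""
    return rotl((half + round_key) & MASK, 7)


def encrypt(block, round_keys):
    left, right = block >> HALF_BITS, block & MASK
    for k in round_keys:
        # One Feistel round: (L, R) -> (R, L xor F(R, k))
        left, right = right, left ^ round_function(right, k)
    return (left << HALF_BITS) | right


def decrypt(block, round_keys):
    left, right = block >> HALF_BITS, block & MASK
    for k in reversed(round_keys):
        # Inverse round: the same F is reused, only the key order is reversed.
        left, right = right ^ round_function(left, k), left
    return (left << HALF_BITS) | right


# Ten placeholder round keys, mirroring Key0..Key9 in the diagram.
round_keys = [0x0123456789AB + i for i in range(10)]
plaintext = 0x00112233445566778899AABB
ciphertext = encrypt(plaintext, round_keys)
recovered = decrypt(ciphertext, round_keys)
```

In a Feistel structure the round function never has to be inverted, so a decryption datapath can reuse the same logic with the round keys applied in reverse order.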
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt data.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. PRNGs that are seeded from system entropy generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
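Both ways of obtaining a key can be sketched with the Python standard library. This is an illustration, not part of the design described in this report: the function names are hypothetical, the 16-byte length is an assumed 128-bit key size, and PBKDF2 with SHA-256 is used here in place of the SHA-1 mentioned in the text.

```python
import hashlib
import secrets

KEY_BYTES = 16  # assumed 128-bit symmetric key


def random_key(n_bytes=KEY_BYTES):
    """Draw a key from the operating system's cryptographic random source."""
    return secrets.token_bytes(n_bytes)


def key_from_passphrase(passphrase, salt, iterations=100_000, n_bytes=KEY_BYTES):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256.

    The salt should be random and stored next to the ciphertext; the
    iteration count is an assumed value, not a recommendation.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=n_bytes
    )


salt = secrets.token_bytes(16)
k1 = random_key()
k2 = key_from_passphrase("correct horse battery staple", salt)
```

The derived key is reproducible from the same passphrase and salt, whereas the random key must be stored or exchanged.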
The simplest method to read encrypted data is a brute force attack: simply trying every possible key up to the maximum key length. It is therefore important to use a sufficiently long key; the cost of a brute force attack grows exponentially with the key length, which makes the attack impractical for long keys. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
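The exponential growth can be made concrete with a short calculation. The sketch below assumes a hypothetical attacker testing 10^12 keys per second; the rate and the function names are illustrative only.

```python
def keyspace(bits):
    """Number of distinct keys of the given length."""
    return 1 << bits


def expected_trials(bits):
    """On average, half of the keyspace is searched before the key is found."""
    return keyspace(bits) // 2


def expected_years(bits, keys_per_second):
    """Average brute force time in years at an assumed constant search rate."""
    seconds = expected_trials(bits) / keys_per_second
    return seconds / (365 * 24 * 3600)


RATE = 1e12  # hypothetical search rate, keys per second
for bits in (56, 96, 128):
    print(bits, expected_trials(bits), f"{expected_years(bits, RATE):.3g} years")
```

Each additional key bit doubles the expected work, so going from 56 to 128 bits multiplies it by 2^72.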
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek words kryptos (secret) and graphos (writing).
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money and other applications.
Today many security systems and companies use cryptography to transfer information over the Internet or by radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help protect the privacy of an ordinary person. Until the early 1920s the term cryptography also covered the breaking of encrypted messages; the term cryptanalysis then came into use, and the field is now practically an art and science of its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
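The "message plus key in, single output out" view, and the letter-by-letter replacement that defines a cipher, can be illustrated with a toy shift cipher. This is a deliberately insecure teaching example and has nothing to do with the SEA round function; all names are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_cipher(text, key):
    """Replace each uppercase letter by the letter `key` positions later."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation unchanged
    return "".join(out)


def encrypt(message, key):
    return shift_cipher(message, key)


def decrypt(ciphertext, key):
    # Decryption applies the same transformation with the opposite shift.
    return shift_cipher(ciphertext, -key)


ciphertext = encrypt("ATTACK AT DAWN", 3)
plaintext = decrypt(ciphertext, 3)
```

Encryption and decryption are the same function with different key-dependent parameters, which is the two-input, one-output pattern described above.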
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
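A one-time pad is simple enough to sketch directly. The example below is a minimal illustration with hypothetical function names; it draws the pad from the operating system's random source, which is only an approximation of the "truly random" requirement, and leaves the other conditions (never reuse the pad, keep it secret) to the caller.

```python
import secrets


def otp_keygen(length):
    """Generate a pad as long as the message; it must never be reused."""
    return secrets.token_bytes(length)


def otp_xor(data, pad):
    """XOR data with the pad; the same call both encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, pad))


message = b"MEET AT NOON"
pad = otp_keygen(len(message))
ciphertext = otp_xor(message, pad)
recovered = otp_xor(ciphertext, pad)
```

Because XOR is its own inverse, applying the pad a second time restores the message.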
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. the work factor, in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof can currently be made, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute force attacks.
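As a worked check of the DES figures: DES uses a 56-bit key, so searching half of the key space costs $2^{55}$ trial decryptions, and the linear attack reduces the number of DES operations by a factor of $2^{12}$ (at the price of needing $2^{43}$ known plaintexts).

```latex
\[
\frac{2^{56}}{2} = 2^{55} \approx 3.6 \times 10^{16},
\qquad
\frac{2^{55}}{2^{43}} = 2^{12} = 4096 .
\]
```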
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices, and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
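The password or PIN case can be illustrated by comparing two ways of checking a secret. This is a minimal sketch with hypothetical function names: the early-exit loop returns sooner the earlier the first mismatch occurs, which is the kind of timing difference an attacker can measure, while `hmac.compare_digest` from the Python standard library is designed to take time independent of where the inputs differ.

```python
import hmac


def naive_equal(a, b):
    """Early-exit comparison: running time depends on the first mismatching byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner when the mismatch is near the start
    return True


def constant_time_equal(a, b):
    """Comparison intended to avoid leaking the mismatch position through timing."""
    return hmac.compare_digest(a, b)


stored_pin = b"4921"
print(naive_equal(b"4921", stored_pin), constant_time_equal(b"4921", stored_pin))
print(naive_equal(b"4000", stored_pin), constant_time_equal(b"4000", stored_pin))
```

Both functions return the same answers; they differ only in how much their running time reveals about the secret.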
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed
Ignore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes
Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltkeyreggt
Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary
inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized
=========================================================================HDL Synthesis Report
Macro Statistics Registers 1 96-bit register 1
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28
========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg
Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics
IOs: 195
Macro Statistics
Registers: 1
  96-bit register: 1
Cell Usage
BELS: 1
  LUT1: 1
FlipFlops/Latches: 96
  FDCE: 96
Clock Buffers: 1
  BUFGP: 1
IO Buffers: 194
  IBUF: 98
  OBUF: 96
=========================================================================
Device utilization summary
---------------------------
Selected Device: 2s15cs144-6
  Number of Slices: 55 out of 192 28%
  Number of Slice Flip Flops: 96 out of 384 25%
  Number of 4 input LUTs: 1 out of 384 0%
  Number of bonded IOBs: 194 out of 90 215% (*)
  Number of GCLKs: 1 out of 4 25%
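The percentages in this summary can be checked with a few lines of arithmetic. The sketch below assumes the tool reports the ratio truncated to an integer percentage, which is consistent with every figure in the summary; the function and dictionary names are hypothetical.

```python
def utilization(used, available):
    """Integer percentage of a resource in use, truncated toward zero."""
    return used * 100 // available


# (used, available) pairs copied from the keyreg device utilization summary.
summary = {
    "Slices": (55, 192),
    "Slice Flip Flops": (96, 384),
    "4 input LUTs": (1, 384),
    "bonded IOBs": (194, 90),
    "GCLKs": (1, 4),
}

percentages = {name: utilization(u, a) for name, (u, a) in summary.items()}
overused = [name for name, (u, a) in summary.items() if u > a]
print(percentages)
print("over 100%:", overused)
```

Only the bonded IOBs exceed the device capacity: the 96-bit key register exposes its full input and output buses as pins when synthesized on its own, which accounts for the warning about device resources.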
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary
---------------
Speed Grade: -6
  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found
Timing Detail
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising
  Data Path: KeyEna to Dreg_95
  Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
  IBUF:I->O      96       0.776        6.300       KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                 0.886                    Dreg_0
  Total: 7.962ns (1.662ns logic, 6.300ns route) (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising
  Data Path: Dreg_95 to KeyO<95>
  Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
  FDCE:C->Q      1        1.085        1.035       Dreg_95 (Dreg_95)
  OBUF:I->O               4.668                    KeyO_95_OBUF (KeyO<95>)
  Total: 6.788ns (5.753ns logic, 1.035ns route) (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized
========================================================================HDL Synthesis Report
Found no macro=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1
========================================================================= Final Report =========================================================================
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics
IOs: 7
Cell Usage
BELS: 3
  LUT4: 3
IO Buffers: 7
  IBUF: 4
  OBUF: 3
=========================================================================
Device utilization summary
---------------------------
Selected Device: 2s15cs144-6
  Number of Slices: 2 out of 192 1%
  Number of 4 input LUTs: 3 out of 384 0%
  Number of bonded IOBs: 7 out of 90 7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary
  Number of External IOBs: 7 out of 86 8%
  Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name: keygenblock.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:
---- Target Parameters
Output File Name: keygenblock
Output Format: NGC
Target Device: xc2s15-6-cs144
---- Source Options
Top Module Name: keygenblock
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No
---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto
---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5
---- Other Options
lso: keygenblock.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 (109%) (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 (264%) (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 (346%) (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 (97%)
  Number of Slices containing unrelated logic: 17 out of 665 (2%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 (277%) (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
Number of bonded IOBs: 1060 out of 86 (1232%) (OVERMAPPED)
  IOB Flip Flops: 960
Number of GCLKs: 1 out of 4 (25%)
Number of GCLKIOBs: 1 out of 4 (25%)

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
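The OVERMAPPED flags in the key generation mapping summary follow directly from dividing each requested resource count by the device capacity. The short Python sketch below redoes that arithmetic with the figures copied from the summary; the dictionary keys and the function name are illustrative choices, not names used by the Xilinx tools, and the integer truncation is an assumption that happens to match the percentages printed in the report.

```python
# Capacities of the xc2s15 as printed in the mapping summary.
CAPACITY = {"slice_registers": 384, "lut4_total": 384, "slices": 192, "bonded_iobs": 86}
# Resources requested by keygenblock, copied from the same summary.
USED = {"slice_registers": 419, "lut4_total": 1066, "slices": 665, "bonded_iobs": 1060}


def utilization(used, capacity):
    """Return {resource: (percent, overmapped)}; percent is truncated to an integer."""
    return {
        name: (100 * used[name] // capacity[name], used[name] > capacity[name])
        for name in used
    }


if __name__ == "__main__":
    for name, (percent, over) in utilization(USED, CAPACITY).items():
        flag = " (OVERMAPPED)" if over else ""
        print(f"{name}: {USED[name]} out of {CAPACITY[name]} = {percent}%{flag}")
```

Every resource class exceeds 100%, so this block as synthesized does not fit the selected device; the bonded IOB figure in particular reflects that all key register buses are exposed as top-level ports.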
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name: encryption.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:
---- Target Parameters
Output File Name: encryption
Output Format: NGC
Target Device: xc2s15-6-cs144
---- Source Options
Top Module Name: encryption
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No
---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto
---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5
---- Other Options
lso: encryption.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name: decryption.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:
---- Target Parameters
Output File Name: decryption
Output Format: NGC
Target Device: xc2s15-6-cs144
---- Source Options
Top Module Name: decryption
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No
---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto
---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5
---- Other Options
lso: decryption.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor (word) sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted at any given processor, the security of the cipher being adapted as a function of its key size.
It suits applications in which the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. Typical application areas are:
Wireless communication, mobile computing, and networking systems
Encryption of JPEG2000 images
Scalable video coding
Sensor networks and RFIDs
CONCLUSION
SEA$_{n,b}$ is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA$_{n,b}$ allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. We note finally that the design criteria of SEA$_{n,b}$ do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
recover the exact value of $t$, and hence that of $\#E(\mathbb{F}_p)$. The algorithm works its way through a fixed list of 40 candidates for auxiliary primes $\ell$, given below. For each candidate $\ell$, a calculation has to be carried out to generate a certain polynomial that is necessary for further calculations with this $\ell$. These polynomials do not depend on the curve $E$ under consideration and hence might be precomputed and stored if memory allows. Then, for any elliptic curve $E$, we can quickly decide whether our algorithm applies (the probability that the algorithm applies for a specific $E$ and $\ell$ is $1/2$). For those curves where the algorithm applies, we can determine $t$ modulo $\ell$. When we have finished with all our candidates for the auxiliary primes, we can look at the elliptic curve and check whether the product of the auxiliary primes that worked exceeds $4\sqrt{p}$ or not. In the former case we have succeeded in determining $t$.
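The last step of this procedure, combining the residues of $t$ modulo the auxiliary primes and checking their product against $4\sqrt{p}$, can be sketched in Python as follows. This is only an illustration under stated assumptions: the residues are obtained here by naive point counting on a toy curve (which defeats the purpose of the real algorithm), the relation $t = p + 1 - \#E(\mathbb{F}_p)$ is the standard definition of the trace, and the function names are hypothetical.

```python
def count_points_naive(p, a4, a6):
    """Count points on y^2 = x^3 + a4*x + a6 over F_p, including the point at infinity."""
    # roots[v] = number of y in F_p with y^2 = v.
    roots = [0] * p
    for y in range(p):
        roots[y * y % p] += 1
    return 1 + sum(roots[(x * x * x + a4 * x + a6) % p] for x in range(p))


def crt(residues):
    """Combine (t mod l, l) pairs for pairwise coprime l into (t mod M, M)."""
    t, m = 0, 1
    for r, l in residues:
        # Choose k so that t + m*k is congruent to r modulo l.
        k = (r - t) * pow(m, -1, l) % l
        t, m = t + m * k, m * l
    return t, m


def recover_trace(residues, p):
    """Recover t from its residues, using the bound |t| <= 2*sqrt(p)."""
    t, m = crt(residues)
    # The product of the primes must exceed 4*sqrt(p), i.e. m^2 > 16*p.
    if m * m <= 16 * p:
        raise ValueError("product of auxiliary primes too small")
    # Pick the representative of t mod m closest to zero.
    return t - m if t > m // 2 else t


if __name__ == "__main__":
    p, a4, a6 = 101, 1, 1
    t_true = p + 1 - count_points_naive(p, a4, a6)
    residues = [(t_true % l, l) for l in (3, 5, 7)]  # stand-in for the real per-prime step
    print(recover_trace(residues, p) == t_true)
```

Because the product $3 \cdot 5 \cdot 7 = 105$ exceeds $4\sqrt{101}$, the centered representative is the only candidate within the admissible range, which is why the product test in the text is sufficient.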
A typical application for this point counting would be to take a random prime $p$ and a random elliptic curve $E$ over $\mathbb{F}_p$ with the intention of finding an $E$ with $\#E(\mathbb{F}_p) = xr$, where $r$ is a prime and $x$ is small. Given such a curve, a point $P$ of order $r$ can be located easily, and the pair $(E, P)$ could be used for a number of cryptographic algorithms such as Diffie-Hellman key exchange, ElGamal encryption, etc. If we use 200-bit primes for $p$ and require $x \le 32$, then the probability that $\#E(\mathbb{F}_p) = xr$ is about 25 so we expect to have to run our algorithm on about 55 curves. Section 2 describes the algorithm in detail. Section 3 presents the mathematical background of the algorithm. Section 4 presents ideas by which the algorithm could be improved. Section 5 contains certain tables of data that need to be hardwired into a program implementing this algorithm.
The Algorithm
4.1 Overview
The set $A$ of potential auxiliary primes is the union of the set $A_s$ of small primes and the set $A_l$ of larger primes. For each $\ell \in A$ we need to determine a polynomial in $\mathbb{Z}[F, J]$. For $\ell \in A_s$ this is stored in the program. For $\ell \in A_l$ it must be calculated by determining a number of coefficients of a certain $q$-series $f(q) \in \mathbb{Z}[[q]]$ and carrying out certain algebraic operations on it. The polynomials do not depend on the elliptic curve under consideration and therefore may be precalculated and stored if there is enough space for them (they require just under half a megabyte to store). We start out with a given prime $p$ and an elliptic curve
$E: y^2 = x^3 + a_4 x + a_6$
CH 5 SEA Architecture Block Diagram
FIG 5.1 SEA architecture block diagram. Blocks: key computational block (Mod, SBox, BR, WR, XOR, Round Reg) with key registers KeyReg0[95:0] to KeyReg9[95:0]; encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg) with cipher data register; decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg) with plain text data register; SMC control block. Signals: KeyI[95:0], KeyLd, Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, DataO[95:0], DataRd, Clk, Rst, Ena, ED, Ovr.
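The SBox block named in the architecture diagram can be illustrated with a short Python sketch. The 3-bit substitution table and its bitslice form are given here as recalled from the SEA specification [1]; both are assumptions of this sketch and should be checked against that reference. The function names and the `mask` parameter are hypothetical.

```python
# 3-bit substitution table as recalled from the SEA specification [1]
# (assumption: verify against the reference before relying on it).
SBOX_TABLE = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox_lookup(value):
    """Table form: substitute one 3-bit value."""
    return SBOX_TABLE[value & 0b111]


def sbox_bitslice(x0, x1, x2, mask):
    """Bitslice form: bit i of (x2, x1, x0) is one 3-bit input; all bit positions are processed at once.

    Only AND, OR and XOR are used, matching the limited instruction set SEA targets.
    """
    x0 = ((x2 & x1) ^ x0) & mask
    x1 = ((x2 & x0) ^ x1) & mask
    x2 = ((x0 | x1) ^ x2) & mask
    return x0, x1, x2


if __name__ == "__main__":
    for v in range(8):
        y0, y1, y2 = sbox_bitslice(v & 1, (v >> 1) & 1, (v >> 2) & 1, mask=1)
        print(v, "->", y0 | (y1 << 1) | (y2 << 2))
```

With one bit per word the bitslice form reproduces the table entry for each of the eight inputs; with wider words the same three operations substitute every bit position in parallel, which is what makes this form cheap in both software and LUT-based hardware.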
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt the data being protected.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. PRNGs that are seeded from system entropy generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations the key is created from a passphrase by a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
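Both ways of obtaining a key can be sketched with the Python standard library. This is a minimal illustration, not part of the SEA design: the function names, the default 96-bit key length (chosen to match the 96-bit key registers in the synthesis reports), the iteration count, and the use of PBKDF2 with SHA-256 in place of the SHA-1 mentioned in the text are all assumptions.

```python
import hashlib
import secrets


def random_key(n_bits=96):
    """Draw a key from the operating system's entropy source."""
    return secrets.token_bytes(n_bits // 8)


def key_from_passphrase(passphrase, salt, n_bits=96, iterations=100_000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256 (a swapped-in, hash-based KDF)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=n_bits // 8
    )


if __name__ == "__main__":
    salt = secrets.token_bytes(16)  # stored alongside the ciphertext, not secret
    print(random_key().hex())
    print(key_from_passphrase("correct horse battery staple", salt).hex())
```

The random key is different on every call, whereas the passphrase-derived key is reproducible from the same passphrase and salt, which is what lets the receiving side regenerate it.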
The simplest method of reading encrypted data is a brute-force attack: simply trying every possible key value up to the maximum key length. It is therefore important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute-force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
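The exponential growth can be made concrete with a few lines of Python. The sketch assumes an attacker who tests keys at a fixed rate and finds the key after searching half the key space on average; the rate of $10^9$ keys per second and the function name are purely illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_years(key_bits, keys_per_second):
    """Expected exhaustive-search time in years (half the key space on average)."""
    expected_trials = 2 ** (key_bits - 1)
    return expected_trials / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e9  # hypothetical attacker speed, keys per second
    for bits in (56, 96, 128):
        print(f"{bits}-bit key: about {brute_force_years(bits, rate):.3g} years")
```

Each additional key bit doubles the expected search time, so going from a 56-bit to a 96-bit key multiplies the work by $2^{40}$.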
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek:
kryptos - secret, graphein - to write
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is mathematical and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. It deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today many security systems and companies use cryptography to transfer information over the Internet or by radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help protect the privacy of any everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the term cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are ciphers and codes.
A code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
A cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function: they take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
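A one-time pad is simple enough to show in full, and it also illustrates the general pattern of a cipher taking a message and a key and producing a single output. The Python sketch below is a toy illustration with hypothetical function names; it enforces only the length condition, while the other conditions stated above (truly random key, never reused, kept secret) remain the caller's responsibility.

```python
import secrets


def otp_keygen(length):
    """Generate a pad of the given length from the OS entropy source."""
    return secrets.token_bytes(length)


def otp_apply(data, pad):
    """XOR data with the pad; the same call both encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, pad))


if __name__ == "__main__":
    message = b"attack at dawn"
    pad = otp_keygen(len(message))
    ciphertext = otp_apply(message, pad)
    print(otp_apply(ciphertext, pad) == message)
```

Because XOR is its own inverse, applying the same pad a second time restores the plaintext; reusing the pad for a second message, however, voids the security argument.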
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute-force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e. the "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute-force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute-force attack against DES requires one known plaintext and $2^{55}$ decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires $2^{43}$ known plaintexts and approximately $2^{43}$ DES operations [23]. This is a considerable improvement on brute-force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
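The PIN example can be made concrete in Python. The sketch below contrasts an early-exit comparison, whose amount of work depends on the position of the first wrong character, with the standard library's `hmac.compare_digest`, which is designed to take time independent of where the inputs differ. The function `leaky_equal` and its step counter are hypothetical teaching devices: the counter stands in for the run time an attacker would measure.

```python
import hmac


def leaky_equal(secret, guess):
    """Early-exit comparison; returns (result, number of byte comparisons performed)."""
    if len(secret) != len(guess):
        return False, 0
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps  # stops at the first mismatch: this is the leak
    return True, steps


def constant_time_equal(secret, guess):
    """Comparison whose duration does not depend on the position of the first mismatch."""
    return hmac.compare_digest(secret, guess)


if __name__ == "__main__":
    pin = b"1234"
    for guess in (b"9234", b"1934", b"1294", b"1234"):
        print(guess, leaky_equal(pin, guess), constant_time_equal(pin, guess))
```

With the leaky version, a guess that gets more leading characters right does more work before failing, so an attacker can recover the PIN one character at a time instead of searching the whole space.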
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable regardless of its other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 (25%)
Number of GCLKIOBs: 1 out of 4 (25%)

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009
Placing & Routing Report
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name: keyreg.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:
---- Target Parameters
Output File Name: keyreg
Output Format: NGC
Target Device: xc2s15-6-cs144
---- Source Options
Top Module Name: keyreg
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No
---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto
---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5
---- Other Options
lso: keyreg.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report
Macro Statistics
# Registers: 1
  96-bit register: 1

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name: keyreg.ngr
Top Level Output File Name: keyreg
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO

Design Statistics
# IOs: 195

Macro Statistics:
# Registers: 1
  96-bit register: 1

Cell Usage:
# BELS: 1
    LUT1: 1
# FlipFlops/Latches: 96
    FDCE: 96
# Clock Buffers: 1
    BUFGP: 1
# IO Buffers: 194
    IBUF: 98
    OBUF: 96
=========================================================================
Device utilization summary:
---------------------------
Selected Device: 2s15cs144-6

Number of Slices: 55 out of 192 (28%)
Number of Slice Flip Flops: 96 out of 384 (25%)
Number of 4 input LUTs: 1 out of 384 (0%)
Number of bonded IOBs: 194 out of 90 (215%) (*)
Number of GCLKs: 1 out of 4 (25%)

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary:
---------------
Speed Grade: -6

   Minimum period: No path found
   Minimum input arrival time before clock: 7.962ns
   Maximum output required time after clock: 6.788ns
   Maximum combinational path delay: No path found

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset:              7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O             96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                    0.886          Dreg_0
    ----------------------------------------
    Total                      7.962ns (1.662ns logic, 6.300ns route)
                                       (20.9% logic, 79.1% route)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset:              6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising

  Data Path: Dreg_95 to KeyO<95>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FDCE:C->Q              1   1.085   1.035  Dreg_95 (Dreg_95)
    OBUF:I->O                  4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                      6.788ns (5.753ns logic, 1.035ns route)
                                       (84.8% logic, 15.2% route)

=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name: sbox8x3.prj
Input Format: mixed
Ignore Synthesis Constraint File: NO
Verilog Include Directory:
---- Target Parameters
Output File Name: sbox8x3
Output Format: NGC
Target Device: xc2s15-6-cs144
---- Source Options
Top Module Name: sbox8x3
Automatic FSM Extraction: YES
FSM Encoding Algorithm: Auto
FSM Style: lut
RAM Extraction: Yes
RAM Style: Auto
ROM Extraction: Yes
ROM Style: Auto
Mux Extraction: YES
Mux Style: Auto
Decoder Extraction: YES
Priority Encoder Extraction: YES
Shift Register Extraction: YES
Logical Shifter Extraction: YES
XOR Collapsing: YES
Resource Sharing: YES
Multiplier Style: lut
Automatic Register Balancing: No
---- Target Options
Add IO Buffers: YES
Global Maximum Fanout: 100
Add Generic Clock Buffer(BUFG): 4
Register Duplication: YES
Equivalent register Removal: YES
Slice Packing: YES
Pack IO Registers into IOBs: auto
---- General Options
Optimization Goal: Speed
Optimization Effort: 1
Keep Hierarchy: NO
Global Optimization: AllClockNets
RTL Output: Yes
Write Timing Constraints: NO
Hierarchy Separator: _
Bus Delimiter: <>
Case Specifier: maintain
Slice Utilization Ratio: 100
Slice Utilization Ratio Delta: 5
---- Other Options
lso: sbox8x3.lso
Read Cores: YES
cross_clock_analysis: NO
verilog2001: YES
Optimize Instantiated Primitives: NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete: default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 7

Cell Usage
BELS : 3
  LUT4 : 3
IO Buffers : 7
  IBUF : 4
  OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

Number of Slices : 2 out of 192 1%
Number of 4 input LUTs : 3 out of 384 0%
Number of bonded IOBs : 7 out of 90 7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization:
  Number of 4 input LUTs : 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices : 2 out of 192 1%
  Number of Slices containing only related logic : 2 out of 2 100%
  Number of Slices containing unrelated logic : 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs : 3 out of 384 1%
  Number of bonded IOBs : 7 out of 86 8%

Total equivalent gate count for design : 18
Additional JTAG gate count for IOBs : 336
Peak Memory Usage : 56 MB
Mapping Report
Device utilization summary
  Number of External IOBs : 7 out of 86 8%
    Number of LOCed External IOBs : 0 out of 7 0%
  Number of SLICEs : 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file "c:/xilinx/bin/vasu/keygenblock.ngc" ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors : 0
  Number of warnings : 0

Total memory usage is 42092 kilobytes

Writing NGD file "keygenblock.ngd" ...
Writing NGDBUILD log file "keygenblock.bld"...
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization:
  Total Number Slice Registers : 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops : 415
    Number used as Latches : 4
  Number of 4 input LUTs : 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices : 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic : 648 out of 665 97%
  Number of Slices containing unrelated logic : 17 out of 665 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs : 1066 out of 384 277% (OVERMAPPED)
  Number used as logic : 1016
  Number used as a route-thru : 50
  Number of bonded IOBs : 1060 out of 86 1232% (OVERMAPPED)
    IOB Flip Flops : 960
  Number of GCLKs : 1 out of 4 25%
  Number of GCLKIOBs : 1 out of 4 25%

Total equivalent gate count for design : 17572
Additional JTAG gate count for IOBs : 50928
Peak Memory Usage : 72 MB
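The OVERMAPPED flags in the keygenblock mapping figures above follow from simple arithmetic: demand divided by device capacity. The sketch below recomputes the percentages from the counts in the report. The dictionary keys and function names are illustrative, and truncating the percentage is an assumption that happens to match these figures; it is not a statement of how the Xilinx mapper rounds in general.

```python
# Resource counts copied from the keygenblock mapping report:
# (used, available) on the xc2s15 device. Key names are illustrative.
REPORT = {
    "slice_registers": (419, 384),
    "lut4_logic": (1016, 384),
    "occupied_slices": (665, 192),
    "bonded_iobs": (1060, 86),
    "gclks": (1, 4),
}


def utilization_percent(used, available):
    """Integer percentage, truncated (assumed; it matches the report's figures)."""
    return used * 100 // available


def overmapped(report):
    """Names of resources whose demand exceeds the device capacity."""
    return [name for name, (used, avail) in report.items() if used > avail]


if __name__ == "__main__":
    for name, (used, avail) in REPORT.items():
        flag = " (OVERMAPPED)" if used > avail else ""
        pct = utilization_percent(used, avail)
        print(f"{name}: {used} out of {avail} {pct}%{flag}")
```

The bonded IOB figure dominates because every key register bit is brought out to a pin in this stand-alone synthesis run; the count says nothing about the logic cost of the block itself.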
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor (word) sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted at any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
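The parametric nature of SEA can be made concrete with a small parameter check. The sketch below assumes the constraint stated in the original SEA specification, namely that the text/key size n must be a multiple of 6b for word size b, giving n/(2b) words per Feistel branch. The class and attribute names are illustrative and are not taken from the VHDL sources of this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeaParams:
    """Illustrative container for the SEA(n, b) parameters."""
    n: int  # plaintext and key size in bits
    b: int  # processor (word) size in bits

    def __post_init__(self):
        if self.n <= 0 or self.b <= 0:
            raise ValueError("n and b must be positive")
        # Assumed from the SEA specification: n is a multiple of 6*b.
        if self.n % (6 * self.b) != 0:
            raise ValueError("n must be a multiple of 6*b")

    @property
    def words_per_branch(self):
        """Number of b-bit words in each half of the Feistel state."""
        return self.n // (2 * self.b)


if __name__ == "__main__":
    params = SeaParams(n=96, b=8)  # 96-bit registers, as in the reports here
    print(params, "words per branch:", params.words_per_branch)
```

With n = 96 and b = 8 each branch holds six 8-bit words, which is consistent with the 96-bit key and data registers reported in the synthesis results.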
APPLICATIONS
In processors with a limited instruction set, as a low-cost encryption routine
In wireless communication, mobile computing, and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDs
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption plus decryption) for present key sizes and processors (e.g. 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, M. Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
[Figure: block diagram of the design. Recoverable labels: key computational block (Mod, SBox, BR, WR, XOR, Round Reg) with inputs KeyI[95:0], KeyLd, Clk, Rst and key registers KeyReg0[95:0] to KeyReg9[95:0]; encryption computational block (Mod, SBox, BR, WR, XOR, Round Reg) with cipher data register and inputs Key0[95:0] to Key9[95:0], DataI[95:0], DataLd, Clk, Rst; decryption computational block (Mod, SBox, BR, IWR, XOR, Round Reg) with plain text data register, inputs Key0[95:0] to Key9[95:0], Clk, Rst and outputs DataO[95:0], DataRd; control block SMC with Clk, Rst, Ena, ED, Ovr.]
FIG 5.1
5.1 KEY GENERATION
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt the data being protected.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or a pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. PRNGs that use system entropy as a seed generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations the key is created from a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
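The two key sources described above, a system-entropy random generator and a passphrase run through a hash-based derivation, can be sketched with the Python standard library. This is a minimal illustration and not the key generation block of this design; the function names, the 96-bit default (chosen to match the key register width in the reports), the iteration count, and the use of PBKDF2 with SHA-256 instead of the SHA-1 mentioned in the text are all assumptions.

```python
import hashlib
import secrets


def random_key(num_bits=96):
    """Draw a key from the operating system's entropy source."""
    if num_bits % 8 != 0:
        raise ValueError("num_bits must be a multiple of 8")
    return secrets.token_bytes(num_bits // 8)


def key_from_passphrase(passphrase, salt, num_bits=96, iterations=100_000):
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256 (assumed choice)."""
    if num_bits % 8 != 0:
        raise ValueError("num_bits must be a multiple of 8")
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=num_bits // 8
    )


if __name__ == "__main__":
    salt = secrets.token_bytes(16)  # stored alongside the ciphertext, not secret
    print("random key     :", random_key().hex())
    print("passphrase key :", key_from_passphrase("example passphrase", salt).hex())
```

The same passphrase and salt always yield the same key, while a fresh salt yields an unrelated key, which is what makes the derived key reproducible by the legitimate user but costly to precompute for an attacker.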
The simplest method to read encrypted data is a brute force attack: simply trying every possible key up to the maximum key length. It is therefore important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
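The exponential growth of brute force cost with key length is easy to quantify. The sketch below counts the keys in a k-bit key space and the expected number of trials (half the key space), then converts that to years for a hypothetical attacker. The function names and the trial rate of one billion keys per second are illustrative assumptions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def keyspace(bits):
    """Number of distinct keys of the given length."""
    return 1 << bits


def expected_trials(bits):
    """Average number of keys tried before the right one is found."""
    return 1 << (bits - 1)


def years_to_search(bits, keys_per_second):
    """Expected search time in years at a fixed trial rate."""
    return expected_trials(bits) / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e9  # hypothetical attacker: one billion keys per second
    for bits in (56, 96, 128):
        print(f"{bits}-bit key: about {years_to_search(bits, rate):.3g} years")
```

Each additional key bit doubles the expected effort, so moving from a 56-bit to a 128-bit key multiplies the work by 2^72.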
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek language:
kryptos - secret; graphein - to write
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. It deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of an everyday person. Until the early 1920s the term cryptography also covered the breaking of encrypted messages; the concept of cryptanalysis then came into use and is now practically an art and science of its own.
The two main areas of cryptography are cipher and code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function: they take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (i.e. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
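The one-time pad conditions listed above map directly onto a few lines of code. The following is a minimal sketch with illustrative function names: the key comes from the system entropy source, must be at least as long as the message, and is combined with it by XOR. Decryption is the same operation with the same key.

```python
import secrets


def otp_keygen(length):
    """Fresh random key material, one byte per message byte."""
    return secrets.token_bytes(length)


def otp_xor(data, key):
    """XOR data with the key; used for both encryption and decryption."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


if __name__ == "__main__":
    message = b"scalable encryption"
    key = otp_keygen(len(message))   # must never be reused for another message
    ciphertext = otp_xor(message, key)
    recovered = otp_xor(ciphertext, key)
    print(ciphertext.hex(), recovered)
```

The sketch enforces only the length condition; randomness, secrecy, and single use of the key are operational requirements that code cannot check.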
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases effective security could be achieved if it is proven that the effort required (i.e. the work factor, in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g. the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e. in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
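The PIN example above can be illustrated in software. The sketch below contrasts a comparison that stops at the first wrong byte, whose running time therefore reveals how many leading bytes were correct, with the constant-time comparison from the Python standard library. The function names are illustrative, and the step counter stands in for the time an attacker would measure.

```python
import hmac


def leaky_equal(a, b):
    """Early-exit comparison; returns (equal, number of byte comparisons made)."""
    if len(a) != len(b):
        return False, 0
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            return False, steps  # the step count leaks the mismatch position
    return True, steps


def constant_time_equal(a, b):
    """Comparison whose duration does not depend on where the inputs differ."""
    return hmac.compare_digest(a, b)


if __name__ == "__main__":
    secret_pin = b"1234"
    for guess in (b"9234", b"1934", b"1294", b"1234"):
        print(guess, leaky_equal(secret_pin, guess), constant_time_equal(secret_pin, guess))
```

An attacker who can observe the step count of the leaky version recovers the PIN one byte at a time instead of searching the whole space, which is the kind of leakage a hardware implementation also has to avoid.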
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g. bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file "c:/xilinx/bin/vasu/keyreg.ngc" ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors : 0
  Number of warnings : 0

Total memory usage is 37996 kilobytes

Writing NGD file "keyreg.ngd" ...
Writing NGDBUILD log file "keyreg.bld"...
Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'

Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic : 0 out of 0 0%
  Number of Slices containing unrelated logic : 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs : 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops : 96
  Number of GCLKs : 1 out of 4 25%
  Number of GCLKIOBs : 1 out of 4 25%

Total equivalent gate count for design : 768
Additional JTAG gate count for IOBs : 9360
Peak Memory Usage : 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design 'keyreg'

Design Information
------------------
Command Line : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date : Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic : 0 out of 0 0%
  Number of Slices containing unrelated logic : 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs : 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops : 96
  Number of GCLKs : 1 out of 4 25%
  Number of GCLKIOBs : 1 out of 4 25%

Total equivalent gate count for design : 768
Additional JTAG gate count for IOBs : 9360
Peak Memory Usage : 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic : 0 out of 0 0%
  Number of Slices containing unrelated logic : 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs : 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops : 96
  Number of GCLKs : 1 out of 4 25%
  Number of GCLKIOBs : 1 out of 4 25%

Total equivalent gate count for design : 768
Additional JTAG gate count for IOBs : 9360
Peak Memory Usage : 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
      inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report

Macro Statistics
Registers : 1
  96-bit register : 1
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 195

Macro Statistics
Registers : 1
  96-bit register : 1

Cell Usage
BELS : 1
  LUT1 : 1
FlipFlops/Latches : 96
  FDCE : 96
Clock Buffers : 1
  BUFGP : 1
IO Buffers : 194
  IBUF : 98
  OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

Number of Slices : 55 out of 192 28%
Number of Slice Flip Flops : 96 out of 384 25%
Number of 4 input LUTs : 1 out of 384 0%
Number of bonded IOBs : 194 out of 90 215% (*)
Number of GCLKs : 1 out of 4 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade : -6

Minimum period : No path found
Minimum input arrival time before clock : 7.962ns
Maximum output required time after clock : 6.788ns
Maximum combinational path delay : No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock 'Clk'
Offset : 7.962ns (Levels of Logic = 1)
  Source : KeyEna (PAD)
  Destination : Dreg_95 (FF)
  Destination Clock : Clk rising

  Data Path : KeyEna to Dreg_95
                             Gate    Net
  Cell:in->out      fanout   Delay   Delay   Logical Name (Net Name)
  ------------------------------------------------------------------
  IBUF:I->O           96     0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                    0.886           Dreg_0
  ------------------------------------------------------------------
  Total                      7.962ns (1.662ns logic, 6.300ns route)
                                     (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock 'Clk'
Offset : 6.788ns (Levels of Logic = 1)
  Source : Dreg_95 (FF)
  Destination : KeyO<95> (PAD)
  Source Clock : Clk rising

  Data Path : Dreg_95 to KeyO<95>
                             Gate    Net
  Cell:in->out      fanout   Delay   Delay   Logical Name (Net Name)
  ------------------------------------------------------------------
  FDCE:C->Q            1     1.085   1.035   Dreg_95 (Dreg_95)
  OBUF:I->O                  4.668           KeyO_95_OBUF (KeyO<95>)
  ------------------------------------------------------------------
  Total                      6.788ns (5.753ns logic, 1.035ns route)
                                     (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s

Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1
========================================================================= Final Report =========================================================================
59
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 7
Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
  Number of 4 input LUTs : 3 out of 384 1%
Logic Distribution
  Number of occupied Slices : 2 out of 192 1%
  Number of Slices containing only related logic : 2 out of 2 100%
  Number of Slices containing unrelated logic : 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs : 3 out of 384 1%
Number of bonded IOBs : 7 out of 86 8%
Total equivalent gate count for design : 18
Additional JTAG gate count for IOBs : 336
Peak Memory Usage : 56 MB
Mapping Report
Device utilization summary
  Number of External IOBs : 7 out of 86 8%
  Number of LOCed External IOBs : 0 out of 7 0%
  Number of SLICEs : 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is : 0
The AVERAGE CONNECTION DELAY for this design is : 0.871
The MAXIMUM PIN DELAY IS : 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is : 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line : ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
  Total Number Slice Registers : 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops : 415
    Number used as Latches : 4
  Number of 4 input LUTs : 1016 out of 384 264% (OVERMAPPED)
Logic Distribution
  Number of occupied Slices : 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic : 648 out of 665 97%
  Number of Slices containing unrelated logic : 17 out of 665 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs : 1066 out of 384 277% (OVERMAPPED)
  Number used as logic : 1016
  Number used as a route-thru : 50
Number of bonded IOBs : 1060 out of 86 1232% (OVERMAPPED)
  IOB Flip Flops : 960
Number of GCLKs : 1 out of 4 25%
Number of GCLKIOBs : 1 out of 4 25%
Total equivalent gate count for design : 17572
Additional JTAG gate count for IOBs : 50928
Peak Memory Usage : 72 MB
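The OVERMAPPED flags in the key generation mapping report can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are copied from the report above, while the dictionary layout and function names are assumptions made for this example. The percentages printed by the mapper are consistent with integer truncation, which is what the helper reproduces.

```python
# Resource counts copied from the key generation mapping report:
# name -> (used, available on the xc2s15 device).
REPORT = {
    "Slice Registers": (419, 384),
    "4 input LUTs (total)": (1066, 384),
    "Occupied Slices": (665, 192),
    "Bonded IOBs": (1060, 86),
    "GCLKs": (1, 4),
}


def utilization_percent(used: int, available: int) -> int:
    """Return utilization as a whole-number percentage, truncated."""
    return used * 100 // available


def overmapped(report: dict) -> dict:
    """Map each resource that exceeds the device capacity to its percentage."""
    return {
        name: utilization_percent(used, available)
        for name, (used, available) in report.items()
        if used > available
    }


if __name__ == "__main__":
    for name, percent in overmapped(REPORT).items():
        print(f"{name}: {percent}% (OVERMAPPED)")
```

Every resource except the global clock buffers exceeds the capacity of the selected device, so this block would need a larger part (or fewer top-level I/O pins) before it could be placed and routed as a standalone design.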
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor (word) sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It can also be used in applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. Its application areas include:
Wireless communication, mobile computing, and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations purposed for RFIDs, as well as further cryptanalysis efforts and security evaluations.
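The area/throughput trade-off mentioned above follows directly from the loop architecture: with one round per clock cycle, one block is completed every `rounds` cycles. The sketch below only illustrates that relation. The function name is hypothetical, load and unload cycles are ignored, and the block size, round count, and clock frequency in the usage lines are placeholder values, not measured results from this work.

```python
def loop_throughput_bps(block_bits: int, rounds: int, clock_hz: float) -> float:
    """Throughput of a loop architecture computing one round per clock cycle.

    One block leaves the core every `rounds` cycles, so
    throughput = block_bits * clock_hz / rounds (load/unload cycles ignored).
    """
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    return block_bits * clock_hz / rounds


if __name__ == "__main__":
    # Placeholder parameters, chosen only to show the scaling.
    for rounds in (50, 100, 200):
        mbps = loop_throughput_bps(96, rounds, 100e6) / 1e6
        print(f"{rounds:3d} rounds at 100 MHz: {mbps:.1f} Mbit/s")
```

Doubling the number of rounds halves the throughput at a fixed clock frequency, which is why an iterative, small-area core is slower than an unrolled or pipelined one.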
Key generation is the process of generating keys for cryptography. A key is used to encrypt and decrypt whatever data is being protected.
Modern cryptographic systems include symmetric-key algorithms (such as DES and AES) and public-key algorithms (such as RSA). Symmetric-key algorithms use a single shared key; keeping data secret requires keeping this key secret. Public-key algorithms use a public key and a private key. The public key is made available to anyone (often by means of a digital certificate). A sender encrypts data with the public key; only the holder of the private key can decrypt this data.
Since public-key algorithms tend to be much slower than symmetric-key algorithms, modern systems such as TLS and SSH use a combination of the two: one party receives the other's public key and encrypts a small piece of data (either a symmetric key or some data that will be used to generate it). The remainder of the conversation uses a (typically faster) symmetric-key algorithm for encryption.
In computer cryptography, keys are integers. In some cases keys are randomly generated using a random number generator (RNG) or a pseudorandom number generator (PRNG), the latter being a computer algorithm that produces data which appears random under analysis. PRNGs that are seeded from system entropy generally produce better results, since this makes the initial conditions of the PRNG much more difficult for an attacker to guess. In other situations the key is created using a passphrase and a key generation algorithm, usually involving a cryptographic hash function such as SHA-1.
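The two key generation routes described above can be sketched with the Python standard library. This is a minimal illustration, not the key schedule used in this project: the function names, the 128-bit key size, the iteration count, and the choice of PBKDF2 with SHA-256 (rather than the SHA-1 mentioned in the text) are all assumptions made for the example.

```python
import hashlib
import secrets

KEY_BYTES = 16  # 128-bit symmetric key (assumed size for this example)


def random_key(num_bytes: int = KEY_BYTES) -> bytes:
    """Draw a key from the operating system's entropy source."""
    return secrets.token_bytes(num_bytes)


def key_from_passphrase(passphrase: str, salt: bytes,
                        iterations: int = 100_000,
                        num_bytes: int = KEY_BYTES) -> bytes:
    """Derive a key from a passphrase with PBKDF2-HMAC-SHA256.

    The hash function and iteration count are illustrative assumptions.
    """
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, iterations, dklen=num_bytes)


if __name__ == "__main__":
    salt = secrets.token_bytes(16)
    print("random key      :", random_key().hex())
    print("passphrase key  :", key_from_passphrase("example passphrase", salt).hex())
```

The same passphrase and salt always yield the same key, so the salt must be stored alongside the ciphertext, whereas a randomly generated key must itself be stored or exchanged.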
The simplest method to read encrypted data is a brute-force attack: simply trying every possible key of the given key length. Therefore it is important to use a sufficiently long key; longer keys take exponentially longer to attack, rendering a brute-force attack impractical. Currently, key lengths of 128 bits (for symmetric-key algorithms) and 1024 bits (for public-key algorithms) are common.
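The exponential growth of brute-force effort with key length can be made concrete with a short calculation. The sketch below assumes a hypothetical attacker testing one billion keys per second; that rate and the function names are illustrative choices, and the estimate counts the average case in which half of the key space is searched.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def keyspace(bits: int) -> int:
    """Number of distinct keys of the given length."""
    return 2 ** bits


def expected_trials(bits: int) -> int:
    """On average, half of the key space is searched before the key is found."""
    return 2 ** (bits - 1)


def expected_years(bits: int, keys_per_second: float) -> float:
    """Average brute-force time in years at a fixed trial rate."""
    return expected_trials(bits) / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1e9  # hypothetical attacker speed, keys per second
    for bits in (40, 56, 96, 128):
        print(f"{bits:3d}-bit key: about {expected_years(bits, rate):.3e} years")
```

Each additional key bit doubles the expected search time, which is why the effort is said to grow exponentially with the key length.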
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek words kryptos (secret) and graphos (writing).
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is mathematical and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. It deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today many security systems and companies use cryptography to transfer information over the Internet or by radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help protect the privacy of an everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the term cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main areas of cryptography are ciphers and codes.
A code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
A cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other letters or numbers.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (plaintext) form.
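The message-plus-key view of a cipher can be illustrated with the simplest letter-replacement scheme, a shift cipher. This toy example is not secure and is unrelated to the SEA round function; the function names and the restriction to uppercase letters are assumptions made for the illustration.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_encrypt(message: str, key: int) -> str:
    """Replace each uppercase letter by the letter `key` positions later."""
    out = []
    for ch in message:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)  # leave spaces, digits and punctuation unchanged
    return "".join(out)


def shift_decrypt(ciphertext: str, key: int) -> str:
    """Decryption is encryption with the opposite shift."""
    return shift_encrypt(ciphertext, -key)


if __name__ == "__main__":
    secret = shift_encrypt("HELLO WORLD", 3)
    print(secret)                    # KHOOR ZRUOG
    print(shift_decrypt(secret, 3))  # HELLO WORLD
```

Both functions take the same two inputs, a text and a key, and decryption with the same key restores the plaintext, which mirrors the encryption and decryption blocks described above.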
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
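A one-time pad is short enough to sketch in full. The example below is a minimal illustration with hypothetical function names; it enforces only the length condition, while the other conditions stated above (a truly random key that is never reused and is kept secret) remain the caller's responsibility.

```python
import secrets


def otp_keygen(length: int) -> bytes:
    """Generate a random pad as long as the message."""
    return secrets.token_bytes(length)


def otp_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with the pad; the same call encrypts and decrypts."""
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


if __name__ == "__main__":
    message = b"attack at dawn"
    pad = otp_keygen(len(message))
    ciphertext = otp_xor(message, pad)
    print(ciphertext.hex())
    print(otp_xor(ciphertext, pad))
```

Because XOR is its own inverse, applying the pad a second time recovers the message, and the length check makes the "equal or greater length than the message" requirement explicit.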
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by a brute-force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the work factor, in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute-force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see the cryptanalysis of the Enigma for some historical examples).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute-force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute-force attacks.
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
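The password or PIN timing leak described above can be illustrated by contrasting an early-exit comparison with a constant-time one. This is a software sketch with hypothetical function names, not a countermeasure implemented in the FPGA design; `hmac.compare_digest` is the Python standard-library routine intended for this purpose.

```python
import hmac


def leaky_equals(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: run time depends on the first mismatching byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner the earlier the mismatch occurs
    return True


def constant_time_equals(a: bytes, b: bytes) -> bool:
    """Comparison designed not to reveal where the inputs differ."""
    return hmac.compare_digest(a, b)


if __name__ == "__main__":
    stored_pin = b"4921"
    for guess in (b"0000", b"4000", b"4921"):
        print(guess, leaky_equals(stored_pin, guess),
              constant_time_equals(stored_pin, guess))
```

Both functions return the same answers; the difference is that the first one finishes faster on guesses that fail early, which is exactly the information a timing attack exploits to recover a secret one character at a time.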
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line : ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
Logic Distribution
  Number of Slices containing only related logic : 0 out of 0 0%
  Number of Slices containing unrelated logic : 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs : 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops : 96
  Number of GCLKs : 1 out of 4 25%
  Number of GCLKIOBs : 1 out of 4 25%
Total equivalent gate count for design : 768
Additional JTAG gate count for IOBs : 9360
Peak Memory Usage : 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date : Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
Logic Distribution
  Number of Slices containing only related logic : 0 out of 0 0%
  Number of Slices containing unrelated logic : 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs : 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops : 96
  Number of GCLKs : 1 out of 4 25%
  Number of GCLKIOBs : 1 out of 4 25%
Total equivalent gate count for design : 768
Additional JTAG gate count for IOBs : 9360
Peak Memory Usage : 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
Logic Distribution
  Number of Slices containing only related logic : 0 out of 0 0%
  Number of Slices containing unrelated logic : 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs : 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops : 96
  Number of GCLKs : 1 out of 4 25%
  Number of GCLKIOBs : 1 out of 4 25%
Total equivalent gate count for design : 768
Additional JTAG gate count for IOBs : 9360
Peak Memory Usage : 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
  Found 96-bit register for signal <Dreg>.
  Summary:
    inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 195
Macro Statistics
  Registers : 1
    96-bit register : 1
Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices : 55 out of 192 28%
  Number of Slice Flip Flops : 96 out of 384 25%
  Number of 4 input LUTs : 1 out of 384 0%
  Number of bonded IOBs : 194 out of 90 215% (*)
  Number of GCLKs : 1 out of 4 25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary
---------------
Speed Grade : -6
  Minimum period : No path found
  Minimum input arrival time before clock : 7.962ns
  Maximum output required time after clock : 6.788ns
  Maximum combinational path delay : No path found
Timing Detail
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock Clk
Offset : 7.962ns (Levels of Logic = 1)
  Source : KeyEna (PAD)
  Destination : Dreg_95 (FF)
  Destination Clock : Clk rising
  Data Path : KeyEna to Dreg_95
                            Gate    Net
  Cell:in->out    fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------------------
  IBUF:I->O           96   0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                  0.886           Dreg_0
  ----------------------------------------------------
  Total                    7.962ns (1.662ns logic, 6.300ns route)
                                   (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock Clk
Offset : 6.788ns (Levels of Logic = 1)
  Source : Dreg_95 (FF)
  Destination : KeyO<95> (PAD)
  Source Clock : Clk rising
  Data Path : Dreg_95 to KeyO<95>
                            Gate    Net
  Cell:in->out    fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------------------
  FDCE:C->Q            1   1.085   1.035   Dreg_95 (Dreg_95)
  OBUF:I->O                4.668           KeyO_95_OBUF (KeyO<95>)
  ----------------------------------------------------
  Total                    6.788ns (5.753ns logic, 1.035ns route)
                                   (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
57
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library work
Architecture sbox8x3 of Entity sbox8x3 is up to date
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>)
INFO:Xst:1561 - cxilinxbinvasuKeyRegvhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>
  Related source file is cxilinxbinvasuKeyRegvhd
Unit <sbox8x3> synthesized
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 7
Cell Usage
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices : 2 out of 192 1%
  Number of 4 input LUTs : 3 out of 384 0%
  Number of bonded IOBs : 7 out of 90 7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization :
  Number of 4 input LUTs : 3 out of 384 1%
Logic Distribution :
  Number of occupied Slices : 2 out of 192 1%
  Number of Slices containing only related logic : 2 out of 2 100%
  Number of Slices containing unrelated logic : 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs : 3 out of 384 1%
Number of bonded IOBs : 7 out of 86 8%
Total equivalent gate count for design : 18
Additional JTAG gate count for IOBs : 336
Peak Memory Usage : 56 MB
Mapping Report
Device utilization summary
  Number of External IOBs : 7 out of 86 8%
  Number of LOCed External IOBs : 0 out of 7 0%
  Number of SLICEs : 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is : 0
The AVERAGE CONNECTION DELAY for this design is : 0.871
The MAXIMUM PIN DELAY IS : 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is : 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblockprj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keygenblocklso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line : ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization :
  Total Number Slice Registers : 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops : 415
    Number used as Latches : 4
  Number of 4 input LUTs : 1016 out of 384 264% (OVERMAPPED)
Logic Distribution :
  Number of occupied Slices : 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic : 648 out of 665 97%
  Number of Slices containing unrelated logic : 17 out of 665 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs : 1066 out of 384 277% (OVERMAPPED)
  Number used as logic : 1016
  Number used as a route-thru : 50
Number of bonded IOBs : 1060 out of 86 1232% (OVERMAPPED)
  IOB Flip Flops : 960
Number of GCLKs : 1 out of 4 25%
Number of GCLKIOBs : 1 out of 4 25%
Total equivalent gate count for design : 17572
Additional JTAG gate count for IOBs : 50928
Peak Memory Usage : 72 MB
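The utilization figures in the key generation mapping report above can be cross-checked with a few lines of Python. This is only a sketch: the helper name `utilization` is hypothetical, and the use of integer truncation for the percentage is an assumption that happens to reproduce the values printed in this particular report.

```python
from typing import Tuple

def utilization(used: int, available: int) -> Tuple[int, bool]:
    """Return (percentage, overmapped) for one resource type.

    The percentage is truncated to an integer, which is an assumption
    that matches the figures quoted in the report above.
    """
    percent = (100 * used) // available
    return percent, used > available

# (used, available) pairs copied from the key generation mapping report.
resources = {
    "Slice Registers": (419, 384),
    "4 input LUTs (logic)": (1016, 384),
    "occupied Slices": (665, 192),
    "bonded IOBs": (1060, 86),
    "GCLKs": (1, 4),
}

for name, (used, available) in resources.items():
    percent, overmapped = utilization(used, available)
    flag = " (OVERMAPPED)" if overmapped else ""
    print(f"{name}: {used} out of {available} {percent}%{flag}")
```

Every resource flagged as overmapped exceeds what the selected xc2s15 device offers, so the key generation block as reported would not fit this part without a larger device or a narrower I/O interface.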
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryptionprj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : encryptionlso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryptionprj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : decryptionlso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It can also be used in applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set.
In wireless communication, mobile computing, and networking systems.
For the encryption of JPEG2000 images.
In scalable video coding.
In sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scopes for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
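Because the loop architecture executes one round per clock cycle, its throughput can be estimated from the block size, the number of rounds, and the clock frequency. The sketch below illustrates that relation only. The function name is hypothetical, the example values (96-bit block, 100 rounds, 100 MHz) are placeholders rather than measured results from this report, and cycles spent loading data or keys are ignored.

```python
def loop_throughput_bps(block_bits: int, rounds: int, clock_hz: float) -> float:
    """Estimate throughput of a one-round-per-cycle loop architecture.

    One block is produced every `rounds` clock cycles, so the rate in
    bits per second is block_bits * clock_hz / rounds. Load and unload
    cycles are not modelled (simplifying assumption).
    """
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    return block_bits * clock_hz / rounds

# Placeholder figures for illustration only, not results from this report.
estimate = loop_throughput_bps(block_bits=96, rounds=100, clock_hz=100e6)
print(f"{estimate / 1e6:.1f} Mbit/s")
```

The relation makes the area/throughput trade-off explicit: a loop design reuses a single round function, so its throughput falls in proportion to the number of rounds.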
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek words kryptos (secret) and graphos (writing).
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. Cryptography deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today many security systems and companies use cryptography to transfer information over the Internet or by radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of an everyday person. Until the early 1920s the term cryptography also covered the breaking of encrypted messages; the concept of cryptanalysis then came into use, and it is now practically an art and science of its own.
The two main areas of cryptography are Cipher and Code.
Code is one of the two major methods of cryptography. This method involves the replacement of complete words or phrases by code words or numbers.
Cipher is the other major method of cryptography. It works on the principle of replacing individual letters by other numbers or letters.
Cryptographic algorithms all perform the same basic function. They take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
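As a minimal illustration of the message-plus-key idea described above, the following Python sketch implements a classical shift (Caesar) cipher, which replaces each letter by another letter. It is a teaching toy chosen for this example, not SEA and not secure. The function names and the sample message are assumptions made for the illustration.

```python
import string

ALPHABET = string.ascii_uppercase

def shift_encrypt(plaintext: str, key: int) -> str:
    """Replace each letter by the letter `key` positions later (mod 26)."""
    out = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)

def shift_decrypt(ciphertext: str, key: int) -> str:
    """Undo shift_encrypt by shifting in the opposite direction."""
    return shift_encrypt(ciphertext, -key)

ciphertext = shift_encrypt("ATTACK AT DAWN", 3)
recovered = shift_decrypt(ciphertext, 3)
print(ciphertext)  # DWWDFN DW GDZQ
print(recovered)   # ATTACK AT DAWN
```

Both functions take the same two inputs, a message and a key, and decryption with the same key restores the original text.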
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. Decryption requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
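The one-time pad conditions listed above can be made concrete with a short Python sketch. The function names are hypothetical, the sample message is arbitrary, and the pad is drawn from the operating system's random source via the standard `secrets` module, which stands in for the "truly random" key material the proof requires.

```python
import secrets

def otp_encrypt(message: bytes, pad: bytes) -> bytes:
    """XOR each message byte with the corresponding pad byte."""
    if len(pad) < len(message):
        raise ValueError("pad must be at least as long as the message")
    return bytes(m ^ k for m, k in zip(message, pad))

# XOR is its own inverse, so decryption is the same operation.
otp_decrypt = otp_encrypt

message = b"SEA test block"
pad = secrets.token_bytes(len(message))  # fresh pad, to be used only once
ciphertext = otp_encrypt(message, pad)
print(otp_decrypt(ciphertext, pad) == message)
```

The length check enforces the "equal or greater length" condition. The "never reused" and "kept secret" conditions cannot be enforced in code and remain the user's responsibility.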
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute-force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the work factor, in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute-force method) can be found to break the cipher. Since no such proof has been found to date, the one-time pad remains the only theoretically unbreakable cipher.
There are a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute-force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute-force attacks.
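The DES figures quoted above can be compared with a short calculation. This is only a sketch: the 56-bit effective key length is the well-known DES parameter rather than a number stated in the text, and the variable names are chosen for this illustration. The comparison covers operation counts only and does not account for the 2^43 known plaintexts the linear attack also needs.

```python
DES_KEY_BITS = 56  # effective DES key length (assumption: standard figure)

# Expected brute-force work: half of the key space, i.e. 2^55 decryptions.
brute_force_trials = 2 ** (DES_KEY_BITS - 1)

# Work quoted in the text for linear cryptanalysis, in DES operations.
linear_attack_work = 2 ** 43

speedup = brute_force_trials // linear_attack_work
print(brute_force_trials)  # 36028797018963968
print(linear_attack_work)  # 8796093022208
print(speedup)             # 4096, i.e. 2^12
```

Each additional key bit doubles the brute-force count, which is why the effort grows exponentially with key size while the cost of using the cipher does not.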
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, they may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
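The PIN example above can be sketched in Python. The function `leaky_equal` and the secret value are hypothetical. Instead of measuring wall-clock time, which is noisy, the sketch counts how many bytes an early-exit comparison examines, since that count is what a timing measurement would reveal. The standard-library `hmac.compare_digest` is shown as a comparison designed not to depend on where the first mismatch occurs.

```python
import hmac
from typing import Tuple

def leaky_equal(guess: bytes, secret: bytes) -> Tuple[bool, int]:
    """Early-exit comparison; also returns how many bytes were examined."""
    examined = 0
    if len(guess) != len(secret):
        return False, examined
    for g, s in zip(guess, secret):
        examined += 1
        if g != s:
            return False, examined  # stops at the first wrong byte
    return True, examined

SECRET_PIN = b"4729"  # hypothetical secret for the demonstration

# The work done grows with the number of correct leading characters,
# so an attacker can recover the PIN one character at a time.
for guess in (b"0000", b"4000", b"4700", b"4720"):
    ok, examined = leaky_equal(guess, SECRET_PIN)
    print(guess, ok, examined)

# Comparison intended to take the same time regardless of mismatch position.
print(hmac.compare_digest(b"4720", SECRET_PIN))
```

The cipher or PIN check can be mathematically sound and still leak through its implementation, which is why such attacks are classed separately from pure cryptanalysis.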
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable regardless of its other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line : ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keyregngc keyregngd
Reading NGO file cxilinxbinvasukeyregngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 37996 kilobytes
Writing NGD file keyregngd
Writing NGDBUILD log file keyregbld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization :
Logic Distribution :
  Number of Slices containing only related logic : 0 out of 0 0%
  Number of Slices containing unrelated logic : 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs : 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops : 96
  Number of GCLKs : 1 out of 4 25%
  Number of GCLKIOBs : 1 out of 4 25%
Total equivalent gate count for design : 768
Additional JTAG gate count for IOBs : 9360
Peak Memory Usage : 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line : CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision 116 $
Mapped Date : Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization :
Logic Distribution :
  Number of Slices containing only related logic : 0 out of 0 0%
  Number of Slices containing unrelated logic : 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs : 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops : 96
  Number of GCLKs : 1 out of 4 25%
  Number of GCLKIOBs : 1 out of 4 25%
Total equivalent gate count for design : 768
Additional JTAG gate count for IOBs : 9360
Peak Memory Usage : 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization :
Logic Distribution :
  Number of Slices containing only related logic : 0 out of 0 0%
  Number of Slices containing unrelated logic : 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs : 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops : 96
  Number of GCLKs : 1 out of 4 25%
  Number of GCLKIOBs : 1 out of 4 25%
Total equivalent gate count for design : 768
Additional JTAG gate count for IOBs : 9360
Peak Memory Usage : 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design keyregprj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyregprj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keyreglso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library work
Architecture keyreg of Entity keyreg is up to date
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>)
Entity <keyreg> analyzed. Unit <keyreg> generated
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>
  Related source file is cxilinxbinvasuKeyRegvhd
  Found 96-bit register for signal <Dreg>
  Summary :
    inferred 96 D-type flip-flop(s)
Unit <keyreg> synthesized
=========================================================================
HDL Synthesis Report
Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyregngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 195
Macro Statistics
  Registers : 1
    96-bit register : 1
Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices : 55 out of 192 28%
  Number of Slice Flip Flops : 96 out of 384 25%
  Number of 4 input LUTs : 1 out of 384 0%
  Number of bonded IOBs : 194 out of 90 215% (*)
  Number of GCLKs : 1 out of 4 25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary
---------------
Speed Grade : -6
  Minimum period : No path found
  Minimum input arrival time before clock : 7.962ns
  Maximum output required time after clock : 6.788ns
  Maximum combinational path delay : No path found
Timing Detail
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock Clk
  Offset : 7.962ns (Levels of Logic = 1)
  Source : KeyEna (PAD)
  Destination : Dreg_95 (FF)
  Destination Clock : Clk rising
  Data Path : KeyEna to Dreg_95
                         Gate    Net
    Cell:in->out  fanout Delay   Delay  Logical Name (Net Name)
    ----------------------------------------
    IBUF:I->O       96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE              0.886          Dreg_0
    ----------------------------------------
    Total                7.962ns (1.662ns logic, 6.300ns route)
                                 (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock Clk
  Offset : 6.788ns (Levels of Logic = 1)
  Source : Dreg_95 (FF)
  Destination : KeyO<95> (PAD)
  Source Clock : Clk rising
  Data Path : Dreg_95 to KeyO<95>
                         Gate    Net
    Cell:in->out  fanout Delay   Delay  Logical Name (Net Name)
    ----------------------------------------
    FDCE:C->Q        1   1.085   1.035  Dreg_95 (Dreg_95)
    OBUF:I->O            4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                6.788ns (5.753ns logic, 1.035ns route)
                                 (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
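The logic/route split quoted for the two key register paths in the timing report above can be re-derived from the per-cell delays. The sketch below assumes the delay figures are nanoseconds with three decimal places (for example 0.776 and 6.300), and the helper name `path_breakdown` is hypothetical.

```python
def path_breakdown(logic_delays, route_delays):
    """Return (total_ns, logic_percent, route_percent) for one data path."""
    logic = sum(logic_delays)
    route = sum(route_delays)
    total = logic + route
    return (
        round(total, 3),
        round(100 * logic / total, 1),
        round(100 * route / total, 1),
    )

# Input path KeyEna -> Dreg_95: IBUF and FDCE gate delays, one net delay.
print(path_breakdown([0.776, 0.886], [6.300]))  # (7.962, 20.9, 79.1)

# Output path Dreg_95 -> KeyO<95>: FDCE and OBUF gate delays, one net delay.
print(path_breakdown([1.085, 4.668], [1.035]))  # (6.788, 84.8, 15.2)
```

The input path is dominated by routing because the KeyEna signal fans out to all 96 register enables, while the output path is dominated by the output buffer delay.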
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : sbox8x3lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library work
Architecture sbox8x3 of Entity sbox8x3 is up to date
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>)
INFO:Xst:1561 - cxilinxbinvasuKeyRegvhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>
  Related source file is cxilinxbinvasuKeyRegvhd
Unit <sbox8x3> synthesized
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 7
Cell Usage
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices : 2 out of 192 1%
  Number of 4 input LUTs : 3 out of 384 0%
  Number of bonded IOBs : 7 out of 90 7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary
  Number of errors : 0
  Number of warnings : 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization :
  Number of 4 input LUTs : 3 out of 384 1%
Logic Distribution :
  Number of occupied Slices : 2 out of 192 1%
  Number of Slices containing only related logic : 2 out of 2 100%
  Number of Slices containing unrelated logic : 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs : 3 out of 384 1%
Number of bonded IOBs : 7 out of 86 8%
Total equivalent gate count for design : 18
Additional JTAG gate count for IOBs : 336
Peak Memory Usage : 56 MB
Mapping Report
Device utilization summary
  Number of External IOBs : 7 out of 86 8%
  Number of LOCed External IOBs : 0 out of 7 0%
  Number of SLICEs : 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is : 0
The AVERAGE CONNECTION DELAY for this design is : 0.871
The MAXIMUM PIN DELAY IS : 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is : 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file cxilinxbinvasukeygenblockngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 (109%) (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 (264%) (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 (346%) (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 (97%)
  Number of Slices containing unrelated logic: 17 out of 665 (2%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 (277%) (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
Number of bonded IOBs: 1060 out of 86 (1232%) (OVERMAPPED)
  IOB Flip Flops: 960
Number of GCLKs: 1 out of 4 (25%)
Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor (word) sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
Low-cost encryption on processors with a limited instruction set
Wireless communication, mobile computing, and networking systems
Encryption of JPEG2000 images
Scalable video coding
Sensor networks and RFIDs
CONCLUSION
SEA_{n,b} is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA_{n,b} allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC processor) is on the order of a few milliseconds per encryption or decryption, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. We note finally that the design criteria of SEA_{n,b} do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
Cryptography
Cryptography is the art and science of secret writing. The term is derived from the Greek words kryptos (secret) and graphos (writing).
5.2 Encryption
Encryption is the actual process of applying cryptography. Much of cryptography is math oriented and uses patterns and algorithms to encrypt messages, text, words, signals, and other forms of communication. Cryptography has many uses, especially in the areas of espionage, intelligence, and military operations. It deals with all aspects of secure messaging, authentication, digital signatures, electronic money, and other applications.
Today, many security systems and companies use cryptography to transfer information over the Internet or radio for fear of interception. Some of this encryption is highly advanced; however, even simple encryption techniques can help uphold the privacy of an everyday person. The term cryptography also covered the breaking of encrypted messages until the early 1920s, when the term cryptanalysis came into use; cryptanalysis is now practically an art and science of its own.
The two main methods of cryptography are codes and ciphers.
A code replaces complete words or phrases with code words or numbers.
A cipher works on the principle of replacing individual letters with other letters or numbers.
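The difference between the two methods can be illustrated with a minimal Python sketch. The codebook entries and the shift-by-3 key are hypothetical values chosen only for illustration, and neither scheme offers any real security.

```python
import string

# Hypothetical codebook: whole words are swapped for code words.
CODEBOOK = {"attack": "apple", "dawn": "river"}


def encode(message: str, codebook: dict) -> str:
    """Code: replace complete words using the codebook."""
    return " ".join(codebook.get(word, word) for word in message.split())


def shift_cipher(message: str, key: int) -> str:
    """Cipher: replace each lowercase letter by the letter `key` places later."""
    alphabet = string.ascii_lowercase
    shifted = alphabet[key % 26:] + alphabet[:key % 26]
    return message.translate(str.maketrans(alphabet, shifted))


coded = encode("attack at dawn", CODEBOOK)
enciphered = shift_cipher("attack at dawn", 3)
deciphered = shift_cipher(enciphered, -3)  # the inverse shift recovers the text

print(coded)
print(enciphered)
print(deciphered)
```

The code operates on words and needs a shared codebook, while the cipher operates on individual letters and needs only a shared key.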
Cryptographic algorithms all perform the same basic function: they take two inputs, a message and a key, and transform them into a single output. There are two ways to perform this function. Encryption, as shown in Figure 1, uses the cryptographic key to transform the original message into an encrypted form. Decryption, as shown in Figure 2, does the reverse: it uses a cryptographic key to transform an encrypted message back into its original (a.k.a. plaintext) form.
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. It requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
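As a minimal sketch of the one-time pad conditions listed above, the following Python example draws a pad as long as the message and XORs it with the plaintext. The function names are hypothetical, and `secrets.token_bytes` is a cryptographically strong pseudo-random source standing in for the truly random key material that Shannon's proof assumes.

```python
import secrets


def otp_encrypt(message: bytes):
    """Return (ciphertext, pad); the pad is as long as the message."""
    pad = secrets.token_bytes(len(message))  # stand-in for truly random key material
    ciphertext = bytes(m ^ p for m, p in zip(message, pad))
    return ciphertext, pad


def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    """XOR with the same pad undoes the encryption."""
    if len(pad) < len(ciphertext):
        raise ValueError("pad must be at least as long as the ciphertext")
    return bytes(c ^ p for c, p in zip(ciphertext, pad))


message = b"meet me at noon"
ciphertext, pad = otp_encrypt(message)
recovered = otp_decrypt(ciphertext, pad)
print(recovered)
```

The pad must be discarded after a single use: reusing it for a second message would violate the "never reused" condition and leak the XOR of the two plaintexts.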
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by brute-force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute-force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is "gardening", used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see the cryptanalysis of the Enigma for some historical examples).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute-force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach the point at which the chances are better than even that the key sought will have been found. But this may not be enough assurance: a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute-force attacks.
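The size of that improvement can be checked with a few lines of Python. This is only a worked arithmetic sketch using the figures quoted above and the 56-bit DES key length; it ignores the cost of collecting the known plaintexts.

```python
DES_KEY_BITS = 56

# Brute force: on average half of the 2^56 key space must be tried.
brute_force_trials = 2 ** (DES_KEY_BITS - 1)   # 2^55 decryptions

# Linear cryptanalysis: about 2^43 DES operations (figure quoted in the text).
linear_operations = 2 ** 43

speedup = brute_force_trials // linear_operations

print(f"brute force : {brute_force_trials} trials")
print(f"linear      : {linear_operations} operations")
print(f"work ratio  : {speedup} (2^{speedup.bit_length() - 1})")
```

Under these assumptions the linear attack needs roughly 4096 times fewer cipher operations, at the price of requiring 2^43 known plaintexts instead of one.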
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis exploits weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
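The PIN example can be sketched in Python. The `naive_equal` helper and the PIN values are hypothetical; the helper returns the number of bytes it inspected as a stand-in for execution time, which shows why an early-exit comparison leaks information. `hmac.compare_digest` from the standard library is one commonly used way to compare secrets without that data-dependent early exit.

```python
import hmac


def naive_equal(secret: bytes, guess: bytes):
    """Early-exit comparison; also returns how many bytes were inspected."""
    if len(secret) != len(guess):
        return False, 0
    for index, (s, g) in enumerate(zip(secret, guess)):
        if s != g:
            return False, index + 1   # stops at the first mismatch
    return True, len(secret)


secret_pin = b"4921"  # hypothetical PIN, for illustration only

_, work_first_wrong = naive_equal(secret_pin, b"0921")  # mismatch at byte 1
_, work_last_wrong = naive_equal(secret_pin, b"4920")   # mismatch at byte 4

# The work (and hence the time) grows with the number of correct leading
# characters, so an attacker can recover the PIN one character at a time.
print(work_first_wrong, work_last_wrong)

# Comparison designed not to short-circuit on the first mismatch.
accepted = hmac.compare_digest(secret_pin, b"4920")
print(accepted)
```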
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file cxilinxbinvasukeyregngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision 116 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 (0%)
  Number of Slices containing unrelated logic: 0 out of 0 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 (225%) (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 (25%)
  Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design keyreg.prj
TABLE OF CONTENTS
1) Synthesis Options Summary
2) HDL Compilation
3) HDL Analysis
4) HDL Synthesis
4.1) HDL Synthesis Report
5) Advanced HDL Synthesis
6) Low Level Synthesis
7) Final Report
7.1) Device utilization summary
7.2) TIMING REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed
Ignore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes
Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library work
Architecture keyreg of Entity keyreg is up to date
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>)
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>
Related source file is cxilinxbinvasuKeyRegvhd
Found 96-bit register for signal <Dreg>
Summary: inferred 96 D-type flip-flop(s)
Unit <keyreg> synthesized
=========================================================================
HDL Synthesis Report
Macro Statistics
# Registers: 1
  96-bit register: 1
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name: keyreg.ngr
Top Level Output File Name: keyreg
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO
Design Statistics
# IOs: 195
Macro Statistics
# Registers: 1
  96-bit register: 1
Cell Usage
# BELS: 1
  LUT1: 1
# FlipFlops/Latches: 96
  FDCE: 96
# Clock Buffers: 1
  BUFGP: 1
# IO Buffers: 194
  IBUF: 98
  OBUF: 96
=========================================================================
Device utilization summary
---------------------------
Selected Device: 2s15cs144-6
Number of Slices: 55 out of 192 (28%)
Number of Slice Flip Flops: 96 out of 384 (25%)
Number of 4 input LUTs: 1 out of 384 (0%)
Number of bonded IOBs: 194 out of 90 (215%) (*)
Number of GCLKs: 1 out of 4 (25%)
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
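The utilization figures in this summary can be reproduced with a short Python sketch. It assumes the tool reports truncated (not rounded) integer percentages, which is consistent with the values shown; the function and variable names are hypothetical.

```python
def utilization_percent(used: int, available: int) -> int:
    """Integer percentage, truncated, as the report appears to print it."""
    return 100 * used // available


# (used, available) pairs copied from the key register synthesis summary.
resources = {
    "Slices": (55, 192),
    "Slice Flip Flops": (96, 384),
    "4 input LUTs": (1, 384),
    "bonded IOBs": (194, 90),
    "GCLKs": (1, 4),
}

report = {name: utilization_percent(u, a) for name, (u, a) in resources.items()}
overmapped = [name for name, pct in report.items() if pct > 100]

for name, pct in report.items():
    flag = " (*)" if pct > 100 else ""
    print(f"{name}: {pct}%{flag}")
```

Only the bonded IOBs exceed the device capacity here, because every bit of the 96-bit key register is brought out to a pin when the block is synthesized on its own; the slice and flip-flop usage stays well within the xc2s15.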
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary
---------------
Speed Grade: -6
Minimum period: No path found
Minimum input arrival time before clock: 7.962ns
Maximum output required time after clock: 6.788ns
Maximum combinational path delay: No path found
Timing Detail
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
Source: KeyEna (PAD)
Destination: Dreg_95 (FF)
Destination Clock: Clk rising
Data Path: KeyEna to Dreg_95
Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
IBUF:I->O      96       0.776        6.300       KeyEna_IBUF (KeyEna_IBUF)
FDCE:CE                 0.886                    Dreg_0
Total: 7.962ns (1.662ns logic, 6.300ns route) (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
Source: Dreg_95 (FF)
Destination: KeyO<95> (PAD)
Source Clock: Clk rising
Data Path: Dreg_95 to KeyO<95>
Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
FDCE:C->Q      1        1.085        1.035       Dreg_95 (Dreg_95)
OBUF:I->O               4.668                    KeyO_95_OBUF (KeyO<95>)
Total: 6.788ns (5.753ns logic, 1.035ns route) (84.8% logic, 15.2% route)
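The totals and logic/route percentages in the two timing paths follow directly from the per-cell delays. The Python sketch below recomputes them, assuming the delay values are in nanoseconds with three decimal places (the decimal points appear to have been lost in the pasted report); the helper name is hypothetical.

```python
def path_breakdown(gate_delays, net_delays):
    """Return (total_ns, logic_percent, route_percent) for one data path."""
    logic = sum(gate_delays)
    route = sum(net_delays)
    total = logic + route
    return total, 100 * logic / total, 100 * route / total


# Input path KeyEna -> Dreg_95: IBUF and FDCE gate delays, one net delay.
in_total, in_logic_pct, in_route_pct = path_breakdown([0.776, 0.886], [6.300])

# Output path Dreg_95 -> KeyO<95>: FDCE and OBUF gate delays, one net delay.
out_total, out_logic_pct, out_route_pct = path_breakdown([1.085, 4.668], [1.035])

print(f"input : {in_total:.3f} ns ({in_logic_pct:.1f}% logic, {in_route_pct:.1f}% route)")
print(f"output: {out_total:.3f} ns ({out_logic_pct:.1f}% logic, {out_route_pct:.1f}% route)")
```

The input path is dominated by routing because the KeyEna net fans out to all 96 flip-flops, whereas the output path is dominated by the output buffer delay.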
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library work
Architecture sbox8x3 of Entity sbox8x3 is up to date
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>)
INFO:Xst:1561 - cxilinxbinvasuKeyRegvhd line 29: Mux is complete: default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit <sbox8x3>
Related source file is cxilinxbinvasuKeyRegvhd
Unit <sbox8x3> synthesized
========================================================================HDL Synthesis Report
Found no macro=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1
========================================================================= Final Report =========================================================================
Final Results
RTL Top Level Output File Name: sbox8x3.ngr
Top Level Output File Name: sbox8x3
Output Format: NGC
Optimization Goal: Speed
Keep Hierarchy: NO
Design Statistics
# IOs: 7
Cell Usage
# BELS: 3
  LUT4: 3
# IO Buffers: 7
  IBUF: 4
  OBUF: 3
=========================================================================
Device utilization summary
---------------------------
Selected Device: 2s15cs144-6
Number of Slices: 2 out of 192 (1%)
Number of 4 input LUTs: 3 out of 384 (0%)
Number of bonded IOBs: 7 out of 90 (7%)
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 (1%)
Logic Distribution:
  Number of occupied Slices: 2 out of 192 (1%)
  Number of Slices containing only related logic: 2 out of 2 (100%)
  Number of Slices containing unrelated logic: 0 out of 2 (0%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 (1%)
Number of bonded IOBs: 7 out of 86 (8%)
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary
Number of External IOBs: 7 out of 86 (8%)
Number of LOCed External IOBs: 0 out of 7 (0%)
Number of SLICEs: 2 out of 192 (1%)
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file cxilinxbinvasukeygenblockngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 (109%) (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 (264%) (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 (346%) (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 (97%)
  Number of Slices containing unrelated logic: 17 out of 665 (2%)
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 (277%) (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
Number of bonded IOBs: 1060 out of 86 (1232%) (OVERMAPPED)
  IOB Flip Flops: 960
Number of GCLKs: 1 out of 4 (25%)
Number of GCLKIOBs: 1 out of 4 (25%)
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
69
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
70
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
71
DECRYPTION
GATE LEVEL
72
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
73
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
74
75
ADVANTAGES
SEA is parametric in text key and processor size
It is a low cost encryption routine targeted for the processors with limited instruction set
It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size
It is also used in applications where the same constrained device has to perform both encryption and decryption
APPLICATIONS
This is a low-cost encryption routine basically designed for processors with a limited instruction set
In wireless communication and mobile computing and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDrsquos
76
CONCLUSION
SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations
77
Bibliography
Reference books
Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian
A VHDL Primer J Bhaskar
Digital Design Morris Mano
Data and Computer Communications William Stalling
Computer Networks Andrew S Tannenbaum
Network Cryptology William Stalling
Reference Websites
IEEE Transactions wwwwikipediacom
wwwwebopediacom
78
FIG 5.2 ENCRYPTION BLOCK
FIG 5.3 Encryption Operation
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. It requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
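The one-time pad conditions listed above can be illustrated with a minimal Python sketch. The function names `otp_encrypt` and `otp_decrypt` are illustrative and not part of this project, and the standard `secrets` module only stands in for a source of truly random key material:

```python
import secrets


def otp_encrypt(message: bytes, key: bytes) -> bytes:
    """XOR each message byte with the matching key byte."""
    if len(key) < len(message):
        raise ValueError("key must be at least as long as the message")
    return bytes(m ^ k for m, k in zip(message, key))


# Decryption is the same XOR, because (m ^ k) ^ k == m.
otp_decrypt = otp_encrypt


if __name__ == "__main__":
    message = b"ATTACK AT DAWN"
    key = secrets.token_bytes(len(message))  # fresh key, to be used only once
    ciphertext = otp_encrypt(message, key)
    assert otp_decrypt(ciphertext, key) == message
```

The length check mirrors the requirement that the key be of equal or greater length than the message. Reusing `key` for a second message would break the "never reused" condition.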
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by a brute-force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the "work factor", in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute-force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is "gardening", used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute-force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which the chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute-force attacks.
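The gap between the two DES figures quoted above can be checked with a short calculation. This is only a worked arithmetic sketch; the 56-bit DES key length is a well-known fact rather than something stated in this report, and the variable names are illustrative:

```python
KEY_BITS = 56  # DES key length in bits (assumption stated in the lead-in)

total_keys = 2 ** KEY_BITS
brute_force_trials = total_keys // 2  # half of the key space, i.e. 2^55
linear_operations = 2 ** 43           # figure quoted in the text

# How many times cheaper the linear attack is, counted in cipher operations.
speedup = brute_force_trials // linear_operations

if __name__ == "__main__":
    print(f"brute force : 2^{brute_force_trials.bit_length() - 1} decryptions")
    print(f"linear      : 2^{linear_operations.bit_length() - 1} DES operations")
    print(f"ratio       : 2^{speedup.bit_length() - 1} = {speedup}")
```

The ratio only compares operation counts. The linear attack also needs 2^43 known plaintexts, whereas brute force needs a single one, so the two attacks are not directly interchangeable in practice.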
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, the attacker may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
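The password or PIN case mentioned above can be sketched in Python. Both functions are illustrative (the names `leaky_equal` and `constant_time_equal` are not from this project); the second relies on `hmac.compare_digest` from the standard library, which is intended to reduce this kind of timing leak:

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on the first mismatch."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner when the mismatch is early
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison designed not to reveal where the inputs differ."""
    return hmac.compare_digest(a, b)


if __name__ == "__main__":
    stored_pin = b"4921"
    print(leaky_equal(stored_pin, b"4000"), constant_time_equal(stored_pin, b"4921"))
```

Both functions return the same results. The difference is only in how long they take, which is exactly the information a timing attack exploits.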
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24] and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line   : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device  : x2s15
Target Package : cs144
Target Speed   : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date    : Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keyreg.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :
---- Target Parameters
Output File Name                   : keyreg
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144
---- Source Options
Top Module Name                    : keyreg
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No
---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto
---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5
---- Other Options
lso                                : keyreg.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
  Found 96-bit register for signal <Dreg>.
  Summary:
    inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
Registers                          : 1
  96-bit register                  : 1
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : keyreg.ngr
Top Level Output File Name         : keyreg
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 195
Macro Statistics
Registers                          : 1
  96-bit register                  : 1
Cell Usage
BELS                               : 1
  LUT1                             : 1
FlipFlops/Latches                  : 96
  FDCE                             : 96
Clock Buffers                      : 1
  BUFGP                            : 1
IO Buffers                         : 194
  IBUF                             : 98
  OBUF                             : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device: 2s15cs144-6
  Number of Slices:            55 out of 192    28%
  Number of Slice Flip Flops:  96 out of 384    25%
  Number of 4 input LUTs:       1 out of 384     0%
  Number of bonded IOBs:      194 out of  90   215% (*)
  Number of GCLKs:              1 out of   4    25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information
-----------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary
---------------
Speed Grade: -6
  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found
Timing Detail
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising
  Data Path: KeyEna to Dreg_95
                              Gate     Net
    Cell:in->out    fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O           96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                  0.886          Dreg_0
    ----------------------------------------
    Total                    7.962ns (1.662ns logic, 6.300ns route)
                                     (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising
  Data Path: Dreg_95 to KeyO<95>
                              Gate     Net
    Cell:in->out    fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FDCE:C->Q            1   1.085   1.035  Dreg_95 (Dreg_95)
    OBUF:I->O                4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                    6.788ns (5.753ns logic, 1.035ns route)
                                     (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
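The logic/route split reported for the two key register paths can be re-derived from the individual gate and net delays. The following Python sketch is only an arithmetic check under the assumption that the delay columns read 0.776/6.300/0.886 ns and 1.085/1.035/4.668 ns (decimal points restored from the report); the helper name `path_breakdown` is hypothetical:

```python
def path_breakdown(logic_delays_ns, route_delays_ns):
    """Return (total ns, % logic, % route) for one timing path."""
    logic = sum(logic_delays_ns)
    route = sum(route_delays_ns)
    total = logic + route
    return (
        round(total, 3),
        round(100 * logic / total, 1),
        round(100 * route / total, 1),
    )


if __name__ == "__main__":
    # Input path KeyEna -> Dreg: IBUF and FDCE gate delays, one net delay.
    print("offset in :", path_breakdown([0.776, 0.886], [6.300]))
    # Output path Dreg_95 -> KeyO<95>: FDCE and OBUF gate delays, one net delay.
    print("offset out:", path_breakdown([1.085, 4.668], [1.035]))
```

The input path is dominated by routing because the KeyEna net fans out to all 96 register clock-enable pins, while the output path is dominated by the output buffer delay.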
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : sbox8x3.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :
---- Target Parameters
Output File Name                   : sbox8x3
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144
---- Source Options
Top Module Name                    : sbox8x3
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No
---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto
---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5
---- Other Options
lso                                : sbox8x3.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : sbox8x3.ngr
Top Level Output File Name         : sbox8x3
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO
Design Statistics
IOs                                : 7
Cell Usage
BELS                               : 3
  LUT4                             : 3
IO Buffers                         : 7
  IBUF                             : 4
  OBUF                             : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device: 2s15cs144-6
  Number of Slices:         2 out of 192   1%
  Number of 4 input LUTs:   3 out of 384   0%
  Number of bonded IOBs:    7 out of  90   7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
Number of bonded IOBs: 7 out of 86 8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
    Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keygenblock.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :
---- Target Parameters
Output File Name                   : keygenblock
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144
---- Source Options
Top Module Name                    : keygenblock
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No
---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto
---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5
---- Other Options
lso                                : keygenblock.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
  IOB Flip Flops: 960
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
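The utilization percentages in the key generation mapping report follow directly from the used and available counts. The Python sketch below recomputes them; it assumes the tool truncates to a whole percent (1060 of 86 IOBs is listed as 1232%), and the names `utilization_percent`, `overmapped`, and `KEYGEN_USAGE` are illustrative only:

```python
def utilization_percent(used: int, available: int) -> int:
    """Utilization as a whole percentage, truncated as the report appears to do."""
    return (100 * used) // available


# (resource, used, available) as listed in the keygenblock mapping report
KEYGEN_USAGE = [
    ("Slice registers", 419, 384),
    ("4-input LUTs (logic)", 1016, 384),
    ("Occupied slices", 665, 192),
    ("4-input LUTs (total)", 1066, 384),
    ("Bonded IOBs", 1060, 86),
    ("GCLKs", 1, 4),
]


def overmapped(usage):
    """Names of the resources whose demand exceeds what the device offers."""
    return [name for name, used, available in usage if used > available]


if __name__ == "__main__":
    for name, used, available in KEYGEN_USAGE:
        flag = " (OVERMAPPED)" if used > available else ""
        pct = utilization_percent(used, available)
        print(f"{name}: {used} of {available} = {pct}%{flag}")
```

Every resource except the global clock buffers exceeds the capacity of the xc2s15 target, which is why the mapper flags the block as OVERMAPPED. As synthesized, it would need a larger device or fewer top-level I/O pins.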
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : encryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :
---- Target Parameters
Output File Name                   : encryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144
---- Source Options
Top Module Name                    : encryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No
---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto
---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5
---- Other Options
lso                                : encryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : decryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :
---- Target Parameters
Output File Name                   : decryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144
---- Source Options
Top Module Name                    : decryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No
---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto
---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5
---- Other Options
lso                                : decryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set.
In wireless communication, mobile computing, and networking systems.
For the encryption of JPEG2000 images.
In scalable video coding.
In sensor networks and RFIDs.
76
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the cipher can be implemented in assembly within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, M. Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
5.3 DECRYPTION
Decryption is the process of decoding data that has been encrypted into a secret format. It requires a secret key or password.
It is a commonly held misconception that every encryption method can be broken. In connection with his WWII work at Bell Labs, Claude Shannon proved that the one-time pad cipher is unbreakable, provided the key material is truly random, never reused, kept secret from all possible attackers, and of equal or greater length than the message [22].
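The one-time pad conditions listed above can be illustrated with a minimal Python sketch. It is only an illustration and not part of the SEA design: the function names `otp_encrypt` and `otp_decrypt` are hypothetical, and the standard-library `secrets` module stands in for a truly random key source, which is an assumption rather than a guarantee.

```python
import secrets


def otp_encrypt(message: bytes, key: bytes) -> bytes:
    """XOR each message byte with the matching key byte (hypothetical helper)."""
    # The pad must be at least as long as the message, as Shannon's proof requires.
    if len(key) < len(message):
        raise ValueError("key must be at least as long as the message")
    return bytes(m ^ k for m, k in zip(message, key))


# XOR is its own inverse, so decryption is the same operation.
otp_decrypt = otp_encrypt

message = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(message))  # fresh key, used once, then discarded
ciphertext = otp_encrypt(message, key)
recovered = otp_decrypt(ciphertext, key)
```

Reusing `key` for a second message would break the "never reused" condition, because XORing the two ciphertexts cancels the key.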
Most ciphers, apart from the one-time pad, can be broken with enough computational effort by a brute force attack, but the amount of effort needed may be exponentially dependent on the key size, as compared to the effort needed to use the cipher.
In such cases, effective security could be achieved if it is proven that the effort required (i.e., the work factor, in Shannon's terms) is beyond the ability of any adversary. This means it must be shown that no efficient method (as opposed to the time-consuming brute force method) can be found to break the cipher. Since no such proof is currently available, the one-time pad remains the only theoretically unbreakable cipher.
There is a wide variety of cryptanalytic attacks, and they can be classified in any of several ways. A common distinction turns on what an attacker knows and what capabilities are available. In a ciphertext-only attack, the cryptanalyst has access only to the ciphertext (good modern cryptosystems are usually effectively immune to ciphertext-only attacks). In a known-plaintext attack, the cryptanalyst has access to a ciphertext and its corresponding plaintext (or to many such pairs). In a chosen-plaintext attack, the cryptanalyst may choose a plaintext and learn its corresponding ciphertext (perhaps many times); an example is gardening, used by the British during WWII.
Finally, in a chosen-ciphertext attack, the cryptanalyst may be able to choose ciphertexts and learn their corresponding plaintexts [10]. Also important, often overwhelmingly so, are mistakes (generally in the design or use of one of the protocols involved; see Cryptanalysis of the Enigma for some historical examples of this).
Cryptanalysis of symmetric-key ciphers typically involves looking for attacks against the block ciphers or stream ciphers that are more efficient than any attack that could be mounted against a perfect cipher. For example, a simple brute force attack against DES requires one known plaintext and 2^55 decryptions, trying approximately half of the possible keys, to reach a point at which chances are better than even that the key sought will have been found. But this may not be enough assurance; a linear cryptanalysis attack against DES requires 2^43 known plaintexts and approximately 2^43 DES operations [23]. This is a considerable improvement on brute force attacks.
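The DES figures quoted above follow from simple arithmetic on the key size, sketched below in Python. The helper name `expected_brute_force_trials` is hypothetical, and the comparison counts cipher operations only: it ignores the 2^43 known plaintexts that the linear attack also needs.

```python
def expected_brute_force_trials(key_bits: int) -> int:
    """Trials needed to search half of a key space of 2**key_bits keys."""
    return 2 ** (key_bits - 1)


# DES has a 56-bit effective key, so half the key space is 2**55 trials.
des_brute_force = expected_brute_force_trials(56)

# Operation count quoted in the text for linear cryptanalysis of DES.
des_linear = 2 ** 43

# Ratio of the two work factors, in operations only.
speedup = des_brute_force // des_linear
```

Under these assumptions the linear attack needs 2^12 = 4096 times fewer cipher operations than the brute force search.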
Public-key algorithms are based on the computational difficulty of various problems. The most famous of these is integer factorization (e.g., the RSA algorithm is based on a problem related to integer factoring), but the discrete logarithm problem is also important. Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems, or some of them, efficiently (i.e., in a practical time).
For instance, the best known algorithms for solving the elliptic curve-based version of the discrete logarithm problem are much more time-consuming than the best known algorithms for factoring, at least for problems of more or less equivalent size. Thus, other things being equal, to achieve an equivalent strength of attack resistance, factoring-based encryption techniques must use larger keys than elliptic curve techniques. For this reason, public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1980s.
While pure cryptanalysis uses weaknesses in the algorithms themselves, other attacks on cryptosystems are based on the actual use of the algorithms in real devices and are called side-channel attacks. If a cryptanalyst has access to, say, the amount of time the device took to encrypt a number of plaintexts or to report an error in a password or PIN character, he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis.
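The PIN example above can be made concrete with a short Python sketch. It is an illustration under stated assumptions and is not taken from the SEA implementation: `naive_equal` is a hypothetical early-exit comparison that also reports how many byte comparisons it performed (a deterministic stand-in for elapsed time), and the PIN value is invented. `hmac.compare_digest` is the standard-library comparison intended to reduce this kind of leak.

```python
import hmac
from typing import Tuple


def naive_equal(a: bytes, b: bytes) -> Tuple[bool, int]:
    """Early-exit comparison; returns the result and the number of byte checks."""
    if len(a) != len(b):
        return False, 0
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            # Returning here makes the running time depend on the secret.
            return False, steps
    return True, steps


secret_pin = b"4921"  # hypothetical stored PIN

# A guess that is wrong in the first character is rejected after one check,
# while a guess that is wrong only in the last character needs four checks.
_, steps_wrong_first = naive_equal(secret_pin, b"0000")
_, steps_wrong_last = naive_equal(secret_pin, b"4920")

# Standard-library comparison designed not to stop at the first mismatch.
safe_result = hmac.compare_digest(secret_pin, b"4920")
```

An attacker who can observe the difference between one check and four checks can recover the PIN one character at a time instead of trying every combination.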
An attacker might also study the pattern and length of messages to derive valuable information; this is known as traffic analysis [24], and can be quite useful to an alert adversary. Poor administration of a cryptosystem, such as permitting keys that are too short, will make any system vulnerable, regardless of other virtues. And, of course, social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (e.g., bribery, extortion, blackmail, espionage, torture) may be the most productive attacks of all.
FIG 5.4 DECRYPTION BLOCK
FIG 5.5 DECRYPTION OPERATION
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line   : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device  : x2s15
Target Package : cs144
Target Speed   : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date    : Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:   0 out of   0    0%
  Number of Slices containing unrelated logic:      0 out of   0    0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:                          194 out of  86  225% (OVERMAPPED)
    IOB Flip Flops:                                96
  Number of GCLKs:                                  1 out of   4   25%
  Number of GCLKIOBs:                               1 out of   4   25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic:   0 out of   0    0%
  Number of Slices containing unrelated logic:      0 out of   0    0%
    See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs:                          194 out of  86  225% (OVERMAPPED)
    IOB Flip Flops:                                96
  Number of GCLKs:                                  1 out of   4   25%
  Number of GCLKIOBs:                               1 out of   4   25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s

--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s

--> Reading design keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keyreg.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : keyreg
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : keyreg
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : keyreg.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
Registers                          : 1
  96-bit register                  : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : keyreg.ngr
Top Level Output File Name         : keyreg
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 195

Macro Statistics
Registers                          : 1
  96-bit register                  : 1

Cell Usage
BELS                               : 1
  LUT1                             : 1
FlipFlops/Latches                  : 96
  FDCE                             : 96
Clock Buffers                      : 1
  BUFGP                            : 1
IO Buffers                         : 194
  IBUF                             : 98
  OBUF                             : 96
=========================================================================
Device utilization summary
---------------------------

Selected Device : 2s15cs144-6

  Number of Slices:              55 out of 192    28%
  Number of Slice Flip Flops:    96 out of 384    25%
  Number of 4 input LUTs:         1 out of 384     0%
  Number of bonded IOBs:        194 out of  90   215% (*)
  Number of GCLKs:                1 out of   4    25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
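The percentages in the device utilization summary above can be reproduced with a few lines of Python. This is a sketch only: the helper `utilization_percent` is hypothetical, and truncating the result to a whole percent is an assumption, chosen because it matches the figures in this particular report.

```python
def utilization_percent(used: int, available: int) -> int:
    """Whole-percent utilization, truncated (assumed to match the report)."""
    return (100 * used) // available


# (used, available) pairs copied from the keyreg device utilization summary.
resources = {
    "Slices": (55, 192),
    "Slice Flip Flops": (96, 384),
    "4 input LUTs": (1, 384),
    "bonded IOBs": (194, 90),
    "GCLKs": (1, 4),
}

report = {name: utilization_percent(used, avail)
          for name, (used, avail) in resources.items()}

# Any resource above 100% cannot fit in the selected device.
overused = [name for name, percent in report.items() if percent > 100]
```

Here only the bonded IOBs exceed the device capacity, which is what triggers the warning: the 96-bit key register exposes far more pins than the cs144 package offers, so this block can only be used inside a larger design and not as a standalone top level.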
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade: -6

  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset:              7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                              Gate     Net
  Cell:in->out     fanout    Delay   Delay  Logical Name (Net Name)
  ----------------------------------------  ------------
  IBUF:I->O            96    0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                    0.886          Dreg_0
  ----------------------------------------
  Total                      7.962ns (1.662ns logic, 6.300ns route)
                                     (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset:              6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising

  Data Path: Dreg_95 to KeyO<95>
                              Gate     Net
  Cell:in->out     fanout    Delay   Delay  Logical Name (Net Name)
  ----------------------------------------  ------------
  FDCE:C->Q             1    1.085   1.035  Dreg_95 (Dreg_95)
  OBUF:I->O                  4.668          KeyO_95_OBUF (KeyO<95>)
  ----------------------------------------
  Total                      6.788ns (5.753ns logic, 1.035ns route)
                                     (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s

Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : sbox8x3.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : sbox8x3
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : sbox8x3
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : sbox8x3.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO

=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.

Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : sbox8x3.ngr
Top Level Output File Name         : sbox8x3
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 7

Cell Usage
BELS                               : 3
  LUT4                             : 3
IO Buffers                         : 7
  IBUF                             : 4
  OBUF                             : 3
=========================================================================
Device utilization summary
---------------------------

Selected Device : 2s15cs144-6

  Number of Slices:            2 out of 192    1%
  Number of 4 input LUTs:      3 out of 384    0%
  Number of bonded IOBs:       7 out of  90    7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Number of 4 input LUTs:                           3 out of 384    1%
Logic Distribution:
  Number of occupied Slices:                        2 out of 192    1%
  Number of Slices containing only related logic:   2 out of   2  100%
  Number of Slices containing unrelated logic:      0 out of   2    0%
    See NOTES below for an explanation of the effects of unrelated logic
  Total Number of 4 input LUTs:                     3 out of 384    1%
  Number of bonded IOBs:                            7 out of  86    8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB

Mapping Report
Device utilization summary:
  Number of External IOBs:           7 out of  86    8%
  Number of LOCed External IOBs:     0 out of   7    0%
  Number of SLICEs:                  2 out of 192    1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keygenblock.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : keygenblock
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : keygenblock
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : keygenblock.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Total Number Slice Registers:                   419 out of 384   109% (OVERMAPPED)
    Number used as Flip Flops:                    415
    Number used as Latches:                         4
  Number of 4 input LUTs:                        1016 out of 384   264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices:                      665 out of 192   346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665    97%
  Number of Slices containing unrelated logic:     17 out of 665     2%
    See NOTES below for an explanation of the effects of unrelated logic
  Total Number 4 input LUTs:                     1066 out of 384   277% (OVERMAPPED)
    Number used as logic:                        1016
    Number used as a route-thru:                   50
  Number of bonded IOBs:                         1060 out of  86  1232% (OVERMAPPED)
    IOB Flip Flops:                               960
  Number of GCLKs:                                  1 out of   4    25%
  Number of GCLKIOBs:                               1 out of   4    25%

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : encryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : encryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : encryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : encryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors:   0
  Number of warnings: 0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
based on a problem related to integer factoring) but the discrete logarithm problem is also important Much public-key cryptanalysis concerns numerical algorithms for solving these computational problems or some of them efficiently (ie in a practical time)
For instance the best known algorithms for solving the elliptic curve-based version of discrete logarithm are much more time-consuming than the best known algorithms for factoring at least for problems of more or less equivalent size Thus other things being equal to achieve an equivalent strength of attack resistance factoring-based encryption techniques must use larger keys than elliptic curve techniques For this reason public-key cryptosystems based on elliptic curves have become popular since their invention in the mid-1990s
While pure cryptanalysis uses weaknesses in the algorithms themselves other attacks on cryptosystems are based on actual use of the algorithms in real devices and are called side-channel attacks If a cryptanalyst has access to say the amount of time the device took to encrypt a number of plaintexts or report an error in a password or PIN character he may be able to use a timing attack to break a cipher that is otherwise resistant to analysis
An attacker might also study the pattern and length of messages to derive valuable information this is known as traffic analysis[24] and can be quite useful to an alert adversary Poor administration of a cryptosystem such as permitting too short keys will make any system vulnerable regardless of other virtues And of course social engineering and other attacks against the personnel who work with cryptosystems or the messages they handle (eg bribery extortion blackmail espionage torture ) may be the most productive attacks of all
40
FIG 54 DECRYPTION BLOCK
41
FIG 55 Decryption Operation
SIMULATION RESULTS
42
Key Generation Results
43
Encryption Results
44
Decryption Results
45
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line: C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device: x2s15
Target Package: cs144
Target Speed: -6
Mapper Version: spartan2 -- $Revision: 1.16 $
Mapped Date: Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 195
Macro Statistics
  Registers : 1
    96-bit register : 1
Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices : 55 out of 192 28%
  Number of Slice Flip Flops : 96 out of 384 25%
  Number of 4 input LUTs : 1 out of 384 0%
  Number of bonded IOBs : 194 out of 90 215% (*)
  Number of GCLKs : 1 out of 4 25%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
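The percentages in the device utilization summary above can be re-derived from the used and available counts. The short Python sketch below does this for the keyreg figures. The helper names are illustrative, and the assumption that the tool truncates rather than rounds is inferred only from the numbers in this report.

```python
def utilization_percent(used: int, available: int) -> int:
    """Whole-number utilization, truncated (an assumption that matches
    the figures printed in the report above)."""
    return used * 100 // available


# (resource, used, available) copied from the keyreg utilization summary
KEYREG_SUMMARY = [
    ("Slices", 55, 192),
    ("Slice Flip Flops", 96, 384),
    ("4 input LUTs", 1, 384),
    ("bonded IOBs", 194, 90),
    ("GCLKs", 1, 4),
]


def overused(summary):
    """Names of the resources whose demand exceeds what the device offers."""
    return [name for name, used, available in summary if used > available]


if __name__ == "__main__":
    for name, used, available in KEYREG_SUMMARY:
        pct = utilization_percent(used, available)
        print(f"{name}: {used} out of {available} = {pct}%")
    print("Over 100%:", overused(KEYREG_SUMMARY))
```

Only the bonded IOBs exceed the device capacity. The logic itself (55 slices, 96 flip-flops) fits comfortably; the overflow comes from every port of the 96-bit register being brought out to a pin when the block is synthesized on its own.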
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary:
---------------
Speed Grade: -6
  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising
  Data Path: KeyEna to Dreg_95
                          Gate    Net
  Cell:in->out   fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------  ------------
  IBUF:I->O      96       0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                 0.886           Dreg_0
  ----------------------------------------
  Total                   7.962ns (1.662ns logic, 6.300ns route)
                                  (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising
  Data Path: Dreg_95 to KeyO<95>
                          Gate    Net
  Cell:in->out   fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------  ------------
  FDCE:C->Q      1        1.085   1.035   Dreg_95 (Dreg_95)
  OBUF:I->O               4.668           KeyO_95_OBUF (KeyO<95>)
  ----------------------------------------
  Total                   6.788ns (5.753ns logic, 1.035ns route)
                                  (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
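The totals and the logic/route split in the timing detail above follow directly from the per-cell gate and net delays. The following Python sketch reproduces that arithmetic for the two reported paths. The class and function names are illustrative and not part of any Xilinx tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathElement:
    cell: str
    gate_delay_ns: float
    net_delay_ns: float = 0.0  # 0.0 where the report lists no net delay


def summarize(path):
    """Total delay of a path and its split into logic and routing."""
    logic = sum(e.gate_delay_ns for e in path)
    route = sum(e.net_delay_ns for e in path)
    total = logic + route
    return {
        "total_ns": round(total, 3),
        "logic_ns": round(logic, 3),
        "route_ns": round(route, 3),
        "logic_pct": round(100 * logic / total, 1),
        "route_pct": round(100 * route / total, 1),
    }


# Values copied from the keyreg timing detail above
INPUT_PATH = [
    PathElement("IBUF:I->O", 0.776, 6.300),  # KeyEna pad, fanout 96
    PathElement("FDCE:CE", 0.886),
]
OUTPUT_PATH = [
    PathElement("FDCE:C->Q", 1.085, 1.035),
    PathElement("OBUF:I->O", 4.668),
]

if __name__ == "__main__":
    print("KeyEna -> Dreg_95 :", summarize(INPUT_PATH))
    print("Dreg_95 -> KeyO<95>:", summarize(OUTPUT_PATH))
```

The input path is dominated by routing because the single KeyEna net drives the clock-enable pins of all 96 flip-flops, whereas the output path is dominated by the output buffer delay.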
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 7
Cell Usage
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices : 2 out of 192 1%
  Number of 4 input LUTs : 3 out of 384 0%
  Number of bonded IOBs : 7 out of 90 7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
    Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
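The SBOX block reported above maps to three LUT4 cells and three outputs, which is what one would expect from a 3-bit substitution table with one LUT per output bit. As a hedged illustration, the Python sketch below models such a table in software. The table values and the three-step AND/OR/XOR form are assumed from the original SEA proposal [1]; they are not restated in this section and should be checked against the project's VHDL source.

```python
# 3-bit substitution table assumed from the original SEA proposal [1];
# not confirmed by the synthesis logs in this section.
SBOX = (0, 5, 6, 7, 4, 3, 1, 2)


def sbox_lookup(x: int) -> int:
    """Table form: one 3-bit value in, one 3-bit value out."""
    return SBOX[x & 0b111]


def sbox_bitslice(x: int) -> int:
    """Same mapping computed with AND, OR and XOR only, i.e. the kind of
    operations available on a processor with a limited instruction set."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    x0 ^= x2 & x1   # uses the original x1 and x2
    x1 ^= x2 & x0   # uses the updated x0
    x2 ^= x0 | x1   # uses the updated x0 and x1
    return (x2 << 2) | (x1 << 1) | x0


if __name__ == "__main__":
    for value in range(8):
        print(value, "->", sbox_lookup(value), sbox_bitslice(value))
```

Both forms describe the same permutation of the values 0 to 7. In hardware, each output bit is a Boolean function of the input bits and therefore fits in a single 4-input LUT.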
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
Low-cost encryption on processors with a limited instruction set, the setting for which the routine was originally designed.
Wireless communication, mobile computing, and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the cipher can be implemented in assembly within a few hours. We note, finally, that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
Bibliography
Reference books
Basic VLSI Design, 3rd Edition, Douglas A. Pucknell and Kamran Eshraghian
A VHDL Primer, J. Bhasker
Digital Design, M. Morris Mano
Data and Computer Communications, William Stallings
Computer Networks, Andrew S. Tanenbaum
Network Cryptology, William Stallings
Reference Websites
IEEE Transactions
www.wikipedia.com
www.webopedia.com
FIG 54 DECRYPTION BLOCK
41
FIG 55 Decryption Operation
SIMULATION RESULTS
42
Key Generation Results
43
Encryption Results
44
Decryption Results
45
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p
46
xc2s15-cs144-6 keyregngc keyregngd
Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
Writing NGD file keyregngd
Writing NGDBUILD log file keyregbld
Release 61i Map G23Xilinx Mapping Report File for Design keyreg
Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
MAPPING REPORT
Relekeyreg
Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm
47
area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009
Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
Placing amp Routing Report
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization
48
Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
KEY REGISTER
Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj
TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed
49
Ignore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes
50
Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltkeyreggt
51
Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary
inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized
=========================================================================HDL Synthesis Report
Macro Statistics Registers 1 96-bit register 1
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28
========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg
52
Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 195
Macro Statistics Registers 1 96-bit register 1
Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================
Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25
WARNINGXst1336 - () More than 100 of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary:
---------------
Speed Grade: -6
  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising
  Data Path: KeyEna to Dreg_95
                           Gate    Net
  Cell:in->out    fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------  ------------
  IBUF:I->O           96   0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                  0.886           Dreg_0
  ----------------------------------------
  Total                    7.962ns (1.662ns logic, 6.300ns route)
                                   (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising
  Data Path: Dreg_95 to KeyO<95>
                           Gate    Net
  Cell:in->out    fanout   Delay   Delay   Logical Name (Net Name)
  ----------------------------------------  ------------
  FDCE:C->Q            1   1.085   1.035   Dreg_95 (Dreg_95)
  OBUF:I->O                4.668           KeyO_95_OBUF (KeyO<95>)
  ----------------------------------------
  Total                    6.788ns (5.753ns logic, 1.035ns route)
                                   (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
Total memory usage is 54400 kilobytes
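The totals in the two timing paths of the key register report are the sum of the gate (logic) delays and the net (route) delays. The sketch below re-derives them, reading the delay figures as nanoseconds with three decimal places; the function name and the returned dictionary layout are hypothetical and used only for illustration.

```python
def path_breakdown(logic_delays_ns, route_delays_ns):
    """Sum gate and net delays of one path and split the total into shares."""
    logic = sum(logic_delays_ns)
    route = sum(route_delays_ns)
    total = logic + route
    return {
        "total_ns": round(total, 3),
        "logic_ns": round(logic, 3),
        "route_ns": round(route, 3),
        "logic_pct": round(100 * logic / total, 1),
        "route_pct": round(100 * route / total, 1),
    }


# Input path KeyEna -> Dreg_95: IBUF and FDCE gate delays, one net delay.
input_path = path_breakdown([0.776, 0.886], [6.300])

# Output path Dreg_95 -> KeyO<95>: FDCE and OBUF gate delays, one net delay.
output_path = path_breakdown([1.085, 4.668], [1.035])

if __name__ == "__main__":
    print("OFFSET IN :", input_path)
    print("OFFSET OUT:", output_path)
```

The input path is dominated by routing because the KeyEna enable net fans out to all 96 flip-flops, whereas the output path is dominated by the output buffer delay.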
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 7
Cell Usage:
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices: 2 out of 192 1%
  Number of 4 input LUTs: 3 out of 384 0%
  Number of bonded IOBs: 7 out of 90 7%
Total memory usage is 53376 kilobytes
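The three LUT4 cells reported for sbox8x3 fit a substitution box with three output bits, each bit being one Boolean function of the inputs. The VHDL source of sbox8x3 is not reproduced in this report, so the sketch below is an illustration only: it assumes the 3-bit S-box of the SEA specification, given there as the table {0, 5, 6, 7, 4, 3, 1, 2} and as three AND/OR/XOR steps, and checks that the two descriptions agree. The function names are hypothetical.

```python
# 3-bit substitution table, assumed to be the one from the SEA specification.
SBOX_TABLE = (0, 5, 6, 7, 4, 3, 1, 2)


def sbox_lookup(value: int) -> int:
    """Table form: index with the 3-bit input value."""
    return SBOX_TABLE[value & 0b111]


def sbox_bits(x2: int, x1: int, x0: int):
    """Gate form: three sequential updates using only AND, OR and XOR."""
    x0 ^= x2 & x1  # step 1 updates x0
    x1 ^= x2 & x0  # step 2 uses the updated x0
    x2 ^= x0 | x1  # step 3 uses the updated x0 and x1
    return x2, x1, x0


def sbox_from_bits(value: int) -> int:
    """Split a 3-bit value into bits, apply the gate form, and repack."""
    x2, x1, x0 = (value >> 2) & 1, (value >> 1) & 1, value & 1
    y2, y1, y0 = sbox_bits(x2, x1, x0)
    return (y2 << 2) | (y1 << 1) | y0


if __name__ == "__main__":
    for v in range(8):
        print(v, "->", sbox_lookup(v), sbox_from_bits(v))
```

In hardware each of the three output bits is a function of at most four inputs, so one 4-input LUT per output bit is enough, which is consistent with the cell usage above.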
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
Number of bonded IOBs: 7 out of 86 8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
  Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc ...
Reading component libraries for design expansion...
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd ...
Writing NGDBUILD log file keygenblock.bld...
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
  IOB Flip Flops: 960
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text size, key size, and processor (word) size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It can be tailored to any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. It can be used in:
Wireless communication, mobile computing, and networking systems
Encryption of JPEG2000 images
Scalable video coding
Sensor networks and RFIDs
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Directions for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
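The conclusion names the plaintext/key size n and the word size b as the parameters of the design. A generic description has to reject parameter pairs that the cipher does not support, and the sketch below illustrates such a check. It rests on two assumptions taken from the SEA specification rather than from this report: n must be a multiple of 6b, and each Feistel branch holds n/(2b) words. The class and attribute names are hypothetical, and b = 8 in the usage lines is chosen only as an example alongside the 96-bit register seen in the synthesis reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeaParams:
    """Parameter pair (n, b) of a SEA instance, validated on construction."""

    n: int  # plaintext size and key size, in bits
    b: int  # processor (word) size, in bits

    def __post_init__(self):
        if self.n <= 0 or self.b <= 0:
            raise ValueError("n and b must be positive")
        # Assumed constraint from the SEA specification: n is a multiple of 6b.
        if self.n % (6 * self.b) != 0:
            raise ValueError(f"n={self.n} is not a multiple of 6*b={6 * self.b}")

    @property
    def words_per_branch(self) -> int:
        """Number of b-bit words in each of the two Feistel branches."""
        return self.n // (2 * self.b)


if __name__ == "__main__":
    params = SeaParams(n=96, b=8)
    print(params, "words per branch:", params.words_per_branch)
```

In a generic VHDL design the same role would be played by generics on the top-level entity, with the word count derived from them at elaboration time rather than written into the code.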
FIG 55 Decryption Operation
SIMULATION RESULTS
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc ...
Reading component libraries for design expansion...
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd ...
Writing NGDBUILD log file keyreg.bld...
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date : Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj
TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
  Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
  Found 96-bit register for signal <Dreg>.
  Summary:
    inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
  IOs : 195
Macro Statistics :
  Registers : 1
    96-bit register : 1
Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================
Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25
WARNINGXst1336 - () More than 100 of Device resources are used
=========================================================================TIMING REPORT
NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
53
GENERATED AFTER PLACE-and-ROUTE
Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+
Timing Summary---------------Speed Grade -6
Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found
Timing Detail--------------All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising
Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------
Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising
54
Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)
=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt
Total memory usage is 54400 kilobytes
SBOX
55
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
56
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
57
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated
========================================================================= HDL Synthesis =========================================================================
58
Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized
========================================================================HDL Synthesis Report
Found no macro=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1
========================================================================= Final Report =========================================================================
59
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 7
Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
60
FLOOR PLANNING
61
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8
Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB
Maping Report
Device utilization summary
Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0
Number of SLICEs 2 out of 192 1
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0
The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707
62
KEY GENERATION
RTL SCHEMATIC
63
GATE LEVEL
64
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
65
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
66
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key and processor (word) sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, the security of the cipher being adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed for processors with a limited instruction set. Typical application areas are:
Wireless communication, mobile computing and networking systems
Encryption of JPEG2000 images
Scalable video coding
Sensor networks and RFIDs
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
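Because the architecture executes one round per clock cycle, its throughput can be estimated from the block size, the number of rounds and the clock frequency. The sketch below is only that first-order estimate: it ignores any cycles spent loading or unloading data, and the example operating point (a 96-bit block, 93 rounds, a 100 MHz clock) is an illustrative assumption, not a measurement from this report.

```python
def loop_throughput_mbps(block_bits, rounds, clock_mhz):
    """Estimate the throughput of an iterative core that takes one cycle per round.

    One block leaves the core every `rounds` clock cycles, so the rate is
    block_bits * f_clk / rounds. With the clock in MHz the result is in Mbit/s.
    """
    if block_bits <= 0 or rounds <= 0 or clock_mhz <= 0:
        raise ValueError("all parameters must be positive")
    return block_bits * clock_mhz / rounds


if __name__ == "__main__":
    # Hypothetical operating point, chosen only to illustrate the formula.
    estimate = loop_throughput_mbps(block_bits=96, rounds=93, clock_mhz=100.0)
    print(f"{estimate:.1f} Mbit/s")
```

The estimate makes the area/throughput trade-off explicit: a loop architecture keeps a single round in hardware, so the throughput falls in proportion to the number of rounds.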
Key Generation Results
Encryption Results
Decryption Results
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc ...
Reading component libraries for design expansion...
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd ...
Writing NGDBUILD log file keyreg.bld ...
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date : Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
  IOB Flip Flops: 96
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design: keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT

=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 195

Macro Statistics
  Registers : 1
    96-bit register : 1

Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================

Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices: 55 out of 192 28%
  Number of Slice Flip Flops: 96 out of 384 25%
  Number of 4 input LUTs: 1 out of 384 0%
  Number of bonded IOBs: 194 out of 90 215% (*)
  Number of GCLKs: 1 out of 4 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade: -6
  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                          Gate    Net
    Cell:in->out  fanout  Delay   Delay   Logical Name (Net Name)
    ------------------------------------  ------------
    IBUF:I->O         96  0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE               0.886           Dreg_0
    ------------------------------------
    Total                 7.962ns (1.662ns logic, 6.300ns route)
                                  (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising

  Data Path: Dreg_95 to KeyO<95>
                          Gate    Net
    Cell:in->out  fanout  Delay   Delay   Logical Name (Net Name)
    ------------------------------------  ------------
    FDCE:C->Q          1  1.085   1.035   Dreg_95 (Dreg_95)
    OBUF:I->O             4.668           KeyO_95_OBUF (KeyO<95>)
    ------------------------------------
    Total                 6.788ns (5.753ns logic, 1.035ns route)
                                  (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
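Each offset in the timing detail above is the sum of the gate (logic) and net (route) delays along the path. The sketch below re-adds the figures for the two reported keyreg paths and recomputes the logic/route split. The tuple layout and the function name are assumptions made for illustration; the delays themselves are copied from the synthesis estimate.

```python
def path_summary(stages):
    """Summarise a timing path.

    stages: list of (cell, gate_delay_ns, net_delay_ns) tuples.
    Returns (total_ns, logic_ns, route_ns, logic_percent, route_percent).
    """
    logic = sum(gate for _, gate, _ in stages)
    route = sum(net for _, _, net in stages)
    total = logic + route
    return (
        round(total, 3),
        round(logic, 3),
        round(route, 3),
        round(100 * logic / total, 1),
        round(100 * route / total, 1),
    )


# Delays copied from the keyreg synthesis estimate (speed grade -6).
INPUT_PATH = [("IBUF:I->O", 0.776, 6.300), ("FDCE:CE", 0.886, 0.0)]
OUTPUT_PATH = [("FDCE:C->Q", 1.085, 1.035), ("OBUF:I->O", 4.668, 0.0)]

if __name__ == "__main__":
    for name, path in (("OFFSET IN", INPUT_PATH), ("OFFSET OUT", OUTPUT_PATH)):
        total, logic, route, logic_pct, route_pct = path_summary(path)
        print(f"{name}: {total}ns ({logic}ns logic, {route}ns route) "
              f"({logic_pct}% logic, {route_pct}% route)")
```

The input path is dominated by routing because the KeyEna net fans out to all 96 flip-flops, whereas the output path is dominated by the delay of the output buffer.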
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 7

Cell Usage
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================

Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
  Number of Slices: 2 out of 192 1%
  Number of 4 input LUTs: 3 out of 384 0%
  Number of bonded IOBs: 7 out of 90 7%

Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
Number of bonded IOBs: 7 out of 86 8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
    Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
Encryption Results
44
Decryption Results
45
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -p
46
xc2s15-cs144-6 keyregngc keyregngd
Reading NGO file cxilinxbinvasukeyregngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
Writing NGD file keyregngd
Writing NGDBUILD log file keyregbld
Release 61i Map G23Xilinx Mapping Report File for Design keyreg
Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
MAPPING REPORT
Relekeyreg
Design Information------------------Command Line CXilinxbinntmapexe -intstyle ise -p xc2s15-cs144-6 -cm
47
area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009
Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
Placing amp Routing Report
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization
48
Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
KEY REGISTER
Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj
TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================
=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 195

Macro Statistics
  Registers : 1
    96-bit register : 1

Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

  Number of Slices: 55 out of 192 28%
  Number of Slice Flip Flops: 96 out of 384 25%
  Number of 4 input LUTs: 1 out of 384 0%
  Number of bonded IOBs: 194 out of 90 215% (*)
  Number of GCLKs: 1 out of 4 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
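The cell usage above lists 96 FDCE cells, that is, D flip-flops with a clock enable and an asynchronous clear, for the signal Dreg. The following Python fragment is only a behavioural sketch of such a register, written to make the enable and clear behaviour explicit. The names Dreg and KeyEna come from the report; key_in, clear and the class itself are hypothetical and are not taken from KeyReg.vhd.

```python
class KeyRegister:
    """Behavioural model of a register built from FDCE-style flip-flops."""

    def __init__(self, width=96):
        self.width = width
        self.mask = (1 << width) - 1
        self.dreg = 0  # plays the role of the Dreg signal named in the report

    def clear(self):
        """Asynchronous clear: force the register to zero, independent of the clock."""
        self.dreg = 0

    def rising_edge(self, key_in, key_ena):
        """One rising edge of Clk: load key_in only when key_ena is asserted."""
        if key_ena:
            self.dreg = key_in & self.mask
        return self.dreg


reg = KeyRegister()
reg.rising_edge(0x0123456789ABCDEF01234567, key_ena=True)   # load a 96-bit value
held = reg.rising_edge(0xFFFF, key_ena=False)               # enable low: value is held
```

With the enable low the stored value is unchanged, which is the behaviour expected from the CE pin that appears in the timing paths of the report.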
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade: -6

  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                              Gate     Net
  Cell:in->out     fanout    Delay   Delay   Logical Name (Net Name)
  ----------------------------------------   ------------
  IBUF:I->O            96    0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
  FDCE:CE                    0.886           Dreg_0
  ----------------------------------------
  Total                      7.962ns (1.662ns logic, 6.300ns route)
                                     (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising

  Data Path: Dreg_95 to KeyO<95>
                              Gate     Net
  Cell:in->out     fanout    Delay   Delay   Logical Name (Net Name)
  ----------------------------------------   ------------
  FDCE:C->Q             1    1.085   1.035   Dreg_95 (Dreg_95)
  OBUF:I->O                  4.668           KeyO_95_OBUF (KeyO<95>)
  ----------------------------------------
  Total                      6.788ns (5.753ns logic, 1.035ns route)
                                     (84.8% logic, 15.2% route)
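The totals in the two data paths above are simply the sums of the gate and net delays, and the percentages are the logic and routing shares of each total. The short Python check below reproduces them from the per-cell figures in the report; the dictionary layout and function name are illustrative only.

```python
# Delay components in ns, copied from the two data paths in the timing report.
PATHS = {
    "OFFSET IN (KeyEna to Dreg_95)": {"logic": [0.776, 0.886], "route": [6.300]},
    "OFFSET OUT (Dreg_95 to KeyO<95>)": {"logic": [1.085, 4.668], "route": [1.035]},
}


def summarize(path):
    """Return (total ns, logic %, route %) for one data path."""
    logic = sum(path["logic"])
    route = sum(path["route"])
    total = logic + route
    return (round(total, 3),
            round(100 * logic / total, 1),
            round(100 * route / total, 1))


for name, path in PATHS.items():
    total, logic_pct, route_pct = summarize(path)
    print(f"{name}: {total} ns ({logic_pct}% logic, {route_pct}% route)")
```

The input path is dominated by routing because the single KeyEna buffer drives all 96 clock-enable pins (fanout 96), whereas the output path is dominated by the output buffer delay.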
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
-->
Total memory usage is 54400 kilobytes
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete, default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 7

Cell Usage
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

  Number of Slices: 2 out of 192 1%
  Number of 4 input LUTs: 3 out of 384 0%
  Number of bonded IOBs: 7 out of 90 7%
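The report shows that the S-box maps to three LUT4 cells, one per output bit. As an illustration of why so little logic is needed, the Python sketch below models a 3-bit substitution both as a lookup table and as three Boolean equations, and checks that the two agree. The table values and equations are an assumption: they are the 3-bit S-box as recalled from the published SEA description [1], not something stated in this report or read from the VHDL source, and the fourth input pin counted in the report is not modelled.

```python
# Assumed 3-bit substitution table (recalled from the SEA specification).
SBOX = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox_bits(x2, x1, x0):
    """Bit-level form: three sequential updates using AND, OR and XOR only."""
    x0 ^= x2 & x1
    x1 ^= x2 & x0   # uses the already updated x0
    x2 ^= x0 | x1   # uses the updated x0 and x1
    return x2, x1, x0


def sbox_from_logic(value):
    """Apply the bit-level form to a 3-bit integer (x2 is the most significant bit)."""
    x2, x1, x0 = (value >> 2) & 1, (value >> 1) & 1, value & 1
    y2, y1, y0 = sbox_bits(x2, x1, x0)
    return (y2 << 2) | (y1 << 1) | y0


for v in range(8):
    print(v, SBOX[v], sbox_from_logic(v))
```

Each output bit depends on at most three input bits, so it fits in a single 4-input LUT, which is consistent with the three LUT4 cells reported.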
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
    Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
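The OVERMAPPED flags above mean that the key generation block needs more of a resource than the xc2s15 provides. The following Python sketch recomputes the percentages from the used and available counts in the report and lists the resources that exceed the device. The integer truncation is inferred from the reported figures (for example 665 of 192 is shown as 346%), not from tool documentation, and the dictionary and function names are illustrative.

```python
# (used, available) pairs copied from the mapping report of keygenblock.
KEYGEN_MAP = {
    "Slice Registers": (419, 384),
    "4 input LUTs (logic)": (1016, 384),
    "Occupied Slices": (665, 192),
    "Total 4 input LUTs": (1066, 384),
    "Bonded IOBs": (1060, 86),
    "GCLKs": (1, 4),
}


def utilization_percent(used, available):
    """Utilization as a whole percentage, truncated as the report appears to do."""
    return 100 * used // available


def overmapped(resources):
    """Names of the resources whose demand exceeds what the device offers."""
    return [name for name, (used, avail) in resources.items() if used > avail]


for name, (used, avail) in KEYGEN_MAP.items():
    print(f"{name}: {used} out of {avail} = {utilization_percent(used, avail)}%")
print("Overmapped:", overmapped(KEYGEN_MAP))
```

Only the global clock buffers fit; the slices, LUTs, registers and I/O pins all exceed the xc2s15, so this block cannot be placed on that device with the I/O buffers enabled as configured in the synthesis options.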
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text size, the key size and the processor (word) size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
Low-cost encryption on processors with a limited instruction set.
Wireless communication, mobile computing and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
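Because the architecture executes one round per clock cycle, the trade-off between area and throughput mentioned above can be estimated with a one-line formula: a block of n bits leaves the core every n_r cycles, where n_r is the number of rounds. The Python sketch below shows this estimate. It assumes that loading and unloading cycles are negligible, and the round count and clock frequency in the example are placeholders chosen for illustration, not results measured in this work.

```python
def loop_throughput_bps(block_bits, rounds, clock_hz):
    """Throughput of a loop architecture that computes one round per clock cycle.

    Assumes one block is processed at a time and ignores any cycles spent
    loading the plaintext and key or reading out the result.
    """
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    return block_bits * clock_hz / rounds


# Placeholder figures for illustration only: 96-bit block, 93 rounds, 100 MHz.
example = loop_throughput_bps(block_bits=96, rounds=93, clock_hz=100e6)
print(f"{example / 1e6:.1f} Mbit/s")
```

The estimate makes the cost of the small area visible: throughput falls in proportion to the number of rounds, since the single round function is reused on every cycle rather than being unrolled or pipelined.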
area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009
Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
Placing amp Routing Report
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization
48
Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
KEY REGISTER
Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj
TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed
49
Ignore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes
50
Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltkeyreggt
51
Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary
inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized
=========================================================================HDL Synthesis Report
Macro Statistics Registers 1 96-bit register 1
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28
========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg
52
Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 195
Macro Statistics Registers 1 96-bit register 1
Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================
Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25
WARNINGXst1336 - () More than 100 of Device resources are used
=========================================================================TIMING REPORT
NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
53
GENERATED AFTER PLACE-and-ROUTE
Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+
Timing Summary---------------Speed Grade -6
Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found
Timing Detail--------------All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising
Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------
Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising
54
Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)
=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt
Total memory usage is 54400 kilobytes
SBOX
55
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
56
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
57
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete: default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 7
Cell Usage
BELS : 3
LUT4 : 3
IO Buffers : 7
IBUF : 4
OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
Number of Slices : 2 out of 192 (1%)
Number of 4 input LUTs : 3 out of 384 (0%)
Number of bonded IOBs : 7 out of 90 (7%)
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary
Number of errors : 0
Number of warnings : 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
Number of 4 input LUTs : 3 out of 384 (1%)
Logic Distribution
Number of occupied Slices : 2 out of 192 (1%)
Number of Slices containing only related logic : 2 out of 2 (100%)
Number of Slices containing unrelated logic : 0 out of 2 (0%)
See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs : 3 out of 384 (1%)
Number of bonded IOBs : 7 out of 86 (8%)
Total equivalent gate count for design : 18
Additional JTAG gate count for IOBs : 336
Peak Memory Usage : 56 MB
Mapping Report
Device utilization summary
Number of External IOBs : 7 out of 86 (8%)
Number of LOCed External IOBs : 0 out of 7 (0%)
Number of SLICEs : 2 out of 192 (1%)
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is : 0
The AVERAGE CONNECTION DELAY for this design is : 0.871
The MAXIMUM PIN DELAY IS : 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is : 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line : ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary
Number of errors : 0
Number of warnings : 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
Total Number Slice Registers : 419 out of 384 (109%) (OVERMAPPED)
Number used as Flip Flops : 415
Number used as Latches : 4
Number of 4 input LUTs : 1016 out of 384 (264%) (OVERMAPPED)
Logic Distribution
Number of occupied Slices : 665 out of 192 (346%) (OVERMAPPED)
Number of Slices containing only related logic : 648 out of 665 (97%)
Number of Slices containing unrelated logic : 17 out of 665 (2%)
See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs : 1066 out of 384 (277%) (OVERMAPPED)
Number used as logic : 1016
Number used as a route-thru : 50
Number of bonded IOBs : 1060 out of 86 (1232%) (OVERMAPPED)
IOB Flip Flops : 960
Number of GCLKs : 1 out of 4 (25%)
Number of GCLKIOBs : 1 out of 4 (25%)
Total equivalent gate count for design : 17572
Additional JTAG gate count for IOBs : 50928
Peak Memory Usage : 72 MB
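The OVERMAPPED entries in the key generation summary can be rechecked by hand. The following is a minimal sketch, assuming the trailing number on each report line is a percentage truncated to an integer; the function and dictionary names are hypothetical, and only the usage counts are taken from the report.

```python
# Sketch: recompute utilization percentages for the key generation block.
# The (used, available) pairs are copied from the mapping report above;
# everything else (names, truncation rule) is an assumption for illustration.

def utilization(used, available):
    """Return (percent, overmapped), with percent truncated to an integer."""
    percent = (100 * used) // available
    return percent, used > available

KEYGEN_USAGE = {
    "slice registers": (419, 384),
    "4 input LUTs (logic)": (1016, 384),
    "occupied slices": (665, 192),
    "4 input LUTs (total)": (1066, 384),
    "bonded IOBs": (1060, 86),
}

def report(usage):
    """Format one line per resource, flagging anything above capacity."""
    lines = []
    for name, (used, available) in usage.items():
        percent, over = utilization(used, available)
        flag = " (OVERMAPPED)" if over else ""
        lines.append(f"{name}: {used} out of {available} {percent}%{flag}")
    return lines

if __name__ == "__main__":
    print("\n".join(report(KEYGEN_USAGE)))
```

Every resource class listed here exceeds the capacity of the selected xc2s15 device, so this block cannot be placed on that part as mapped.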
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary
Number of errors : 0
Number of warnings : 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary
Number of errors : 0
Number of warnings : 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in text, key, and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. Typical application areas are:
In wireless communication, mobile computing, and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDs
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the cipher can be implemented in assembly within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Directions for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
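The trade-off between area and throughput follows from executing one round per clock cycle: a block occupies the core for as many cycles as the cipher has rounds. The following is a rough sketch of that relation, assuming load and unload cycles are negligible; the function name and the example values are placeholders for illustration, not figures measured in this report.

```python
# Sketch: throughput of a loop architecture that runs one round per cycle.
# Assumption: one block keeps the core busy for exactly n_rounds cycles.

def loop_throughput_mbps(block_bits, n_rounds, f_clk_mhz):
    """Return the throughput in Mbit/s for the given block size, round
    count, and clock frequency in MHz."""
    if n_rounds <= 0:
        raise ValueError("n_rounds must be positive")
    return block_bits * f_clk_mhz / n_rounds

if __name__ == "__main__":
    # Placeholder values: a 96-bit block, 96 rounds, 100 MHz clock.
    print(loop_throughput_mbps(96, 96, 100.0))
```

For a fixed clock frequency the throughput falls in proportion to the number of rounds, which is why a compact iterative core is slower than an unrolled or pipelined one.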
SYNTHESIS REPORTS
KEY INPUT
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line : ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keyreg.ngc keyreg.ngd
Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary
Number of errors : 0
Number of warnings : 0
Total memory usage is 37996 kilobytes
Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
Logic Distribution
Number of Slices containing only related logic : 0 out of 0 (0%)
Number of Slices containing unrelated logic : 0 out of 0 (0%)
See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs : 194 out of 86 (225%) (OVERMAPPED)
IOB Flip Flops : 96
Number of GCLKs : 1 out of 4 (25%)
Number of GCLKIOBs : 1 out of 4 (25%)
Total equivalent gate count for design : 768
Additional JTAG gate count for IOBs : 9360
Peak Memory Usage : 57 MB
MAPPING REPORT
Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg
Design Information
------------------
Command Line : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date : Mon Mar 30 12:42:43 2009
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
Logic Distribution
Number of Slices containing only related logic : 0 out of 0 (0%)
Number of Slices containing unrelated logic : 0 out of 0 (0%)
See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs : 194 out of 86 (225%) (OVERMAPPED)
IOB Flip Flops : 96
Number of GCLKs : 1 out of 4 (25%)
Number of GCLKIOBs : 1 out of 4 (25%)
Total equivalent gate count for design : 768
Additional JTAG gate count for IOBs : 9360
Peak Memory Usage : 57 MB
Placing & Routing Report
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization
Logic Distribution
Number of Slices containing only related logic : 0 out of 0 (0%)
Number of Slices containing unrelated logic : 0 out of 0 (0%)
See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs : 194 out of 86 (225%) (OVERMAPPED)
IOB Flip Flops : 96
Number of GCLKs : 1 out of 4 (25%)
Number of GCLKIOBs : 1 out of 4 (25%)
Total equivalent gate count for design : 768
Additional JTAG gate count for IOBs : 9360
Peak Memory Usage : 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design : keyreg.prj
TABLE OF CONTENTS
1) Synthesis Options Summary
2) HDL Compilation
3) HDL Analysis
4) HDL Synthesis
4.1) HDL Synthesis Report
5) Advanced HDL Synthesis
6) Low Level Synthesis
7) Final Report
7.1) Device utilization summary
7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Found 96-bit register for signal <Dreg>.
Summary: inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.
=========================================================================
HDL Synthesis Report
Macro Statistics
Registers : 1
96-bit register : 1
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 195
Macro Statistics
Registers : 1
96-bit register : 1
Cell Usage
BELS : 1
LUT1 : 1
FlipFlops/Latches : 96
FDCE : 96
Clock Buffers : 1
BUFGP : 1
IO Buffers : 194
IBUF : 98
OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
Number of Slices : 55 out of 192 (28%)
Number of Slice Flip Flops : 96 out of 384 (25%)
Number of 4 input LUTs : 1 out of 384 (0%)
Number of bonded IOBs : 194 out of 90 (215%) (*)
Number of GCLKs : 1 out of 4 (25%)
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.
Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+
Timing Summary
---------------
Speed Grade : -6
Minimum period : No path found
Minimum input arrival time before clock : 7.962ns
Maximum output required time after clock : 6.788ns
Maximum combinational path delay : No path found
Timing Detail
--------------
All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock Clk
Offset : 7.962ns (Levels of Logic = 1)
Source : KeyEna (PAD)
Destination : Dreg_95 (FF)
Destination Clock : Clk rising
Data Path : KeyEna to Dreg_95
Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
IBUF:I->O      96       0.776        6.300       KeyEna_IBUF (KeyEna_IBUF)
FDCE:CE                 0.886                    Dreg_0
Total : 7.962ns (1.662ns logic, 6.300ns route) (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock Clk
Offset : 6.788ns (Levels of Logic = 1)
Source : Dreg_95 (FF)
Destination : KeyO<95> (PAD)
Source Clock : Clk rising
Data Path : Dreg_95 to KeyO<95>
Cell:in->out   fanout   Gate Delay   Net Delay   Logical Name (Net Name)
FDCE:C->Q      1        1.085        1.035       Dreg_95 (Dreg_95)
OBUF:I->O               4.668                    KeyO_95_OBUF (KeyO<95>)
Total : 6.788ns (5.753ns logic, 1.035ns route) (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
Total memory usage is 54400 kilobytes
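The two offset figures in the key register timing report are sums of the per-cell gate and net delays. The following is a small sketch that re-adds them, assuming the flattened figures are nanoseconds with three decimal places (for example 7962 read as 7.962 ns); the function and constant names are hypothetical.

```python
# Sketch: re-add the gate and net delays of the two reported data paths.
# Assumption: each delay is in nanoseconds with three decimal places.

def path_summary(cells):
    """cells is a list of (gate_delay_ns, net_delay_ns) pairs along a path."""
    logic = sum(gate for gate, _ in cells)
    route = sum(net for _, net in cells)
    total = logic + route
    return {
        "total_ns": round(total, 3),
        "logic_ns": round(logic, 3),
        "route_ns": round(route, 3),
        "logic_pct": round(100 * logic / total, 1),
        "route_pct": round(100 * route / total, 1),
    }

# KeyEna pad to the Dreg clock-enable pins: IBUF, then FDCE CE (no net delay listed).
INPUT_PATH = [(0.776, 6.300), (0.886, 0.0)]
# Dreg_95 to the KeyO<95> pad: FDCE clock-to-Q, then OBUF (no net delay listed).
OUTPUT_PATH = [(1.085, 1.035), (4.668, 0.0)]

if __name__ == "__main__":
    print(path_summary(INPUT_PATH))
    print(path_summary(OUTPUT_PATH))
```

On the input side most of the delay is routing on the KeyEna net, which the report lists with a fanout of 96; on the output side the output buffer dominates.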
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144
CONCLUSION
SEA_{n,b} is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA_{n,b} allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., 128-bit key, 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the implementation of the cipher in assembly can be done within a few hours. We note finally that the design criteria of SEA_{n,b} do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
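As an illustration of what a single round computes, here is a small Python sketch of a SEA-style Feistel round on lists of b-bit words. The function names are hypothetical. The operations (word-wise modular addition, bit-sliced S-box, bit rotation, word rotation) follow the SEA specification as recalled here, not the VHDL of this project. The decryption function is simply the algebraic inverse of the sketched round and is not necessarily the form used in the hardware.

```python
def rot_words(x):
    """R: cyclic word rotation, y[i+1] = x[i]."""
    return x[-1:] + x[:-1]


def rot_words_inv(x):
    """Inverse of rot_words."""
    return x[1:] + x[:1]


def rot_bits(x, b):
    """r: per triple, rotate word 0 right by one bit and word 2 left by one bit."""
    mask = (1 << b) - 1
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] = ((x[i] >> 1) | (x[i] << (b - 1))) & mask
        y[i + 2] = ((x[i + 2] << 1) | (x[i + 2] >> (b - 1))) & mask
    return y


def sbox_words(x):
    """S: bit-sliced 3-bit S-box applied to each triple of words."""
    y = list(x)
    for i in range(0, len(x), 3):
        y[i] ^= y[i + 2] & y[i + 1]
        y[i + 1] ^= y[i + 2] & y[i]
        y[i + 2] ^= y[i] | y[i + 1]
    return y


def add_words(x, k, b):
    """Word-wise addition modulo 2**b."""
    mask = (1 << b) - 1
    return [(xi + ki) & mask for xi, ki in zip(x, k)]


def xor_words(x, y):
    return [a ^ c for a, c in zip(x, y)]


def f(right, key, b):
    return rot_bits(sbox_words(add_words(right, key, b)), b)


def encrypt_round(left, right, key, b):
    """(L, R) -> (R, R(L) xor f(R, K))."""
    return right, xor_words(rot_words(left), f(right, key, b))


def decrypt_round(left, right, key, b):
    """Algebraic inverse of encrypt_round."""
    old_right = left
    old_left = rot_words_inv(xor_words(right, f(old_right, key, b)))
    return old_left, old_right
```

In a loop architecture of the kind described above, one such round would be evaluated per clock cycle, with the key round computed alongside it.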
xc2s15-cs144-6 keyreg.ngc keyreg.ngd

Reading NGO file c:/xilinx/bin/vasu/keyreg.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

Writing NGD file keyreg.ngd
Writing NGDBUILD log file keyreg.bld
MAPPING REPORT

Release 6.1i Map G.23
Xilinx Mapping Report File for Design keyreg

Design Information
------------------
Command Line : C:/Xilinx/bin/nt/map.exe -intstyle ise -p xc2s15-cs144-6 -cm area -pr b -k 4 -c 100 -tx off -o keyreg_map.ncd keyreg.ngd keyreg.pcf
Target Device : x2s15
Target Package : cs144
Target Speed : -6
Mapper Version : spartan2 -- $Revision: 1.16 $
Mapped Date : Mon Mar 30 12:42:43 2009

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB

Placing & Routing Report

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
  Number of Slices containing only related logic: 0 out of 0 0%
  Number of Slices containing unrelated logic: 0 out of 0 0%
  See NOTES below for an explanation of the effects of unrelated logic
  Number of bonded IOBs: 194 out of 86 225% (OVERMAPPED)
    IOB Flip Flops: 96
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 768
Additional JTAG gate count for IOBs: 9,360
Peak Memory Usage: 57 MB
KEY REGISTER
Release 6.1i - xst G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to __projnav
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.67 s | Elapsed : 0.00 / 1.00 s
--> Reading design keyreg.prj

TABLE OF CONTENTS
  1) Synthesis Options Summary
  2) HDL Compilation
  3) HDL Analysis
  4) HDL Synthesis
     4.1) HDL Synthesis Report
  5) Advanced HDL Synthesis
  6) Low Level Synthesis
  7) Final Report
     7.1) Device utilization summary
     7.2) TIMING REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keyreg.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keyreg
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keyreg
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keyreg.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
  Registers : 1
    96-bit register : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 195

Macro Statistics
  Registers : 1
    96-bit register : 1

Cell Usage
  BELS : 1
    LUT1 : 1
  FlipFlops/Latches : 96
    FDCE : 96
  Clock Buffers : 1
    BUFGP : 1
  IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

  Number of Slices: 55 out of 192 28%
  Number of Slice Flip Flops: 96 out of 384 25%
  Number of 4 input LUTs: 1 out of 384 0%
  Number of bonded IOBs: 194 out of 90 215% (*)
  Number of GCLKs: 1 out of 4 25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
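The percentages in the summary above can be reproduced with a few lines of Python. This is only a sanity check on the reported figures. The use of truncating integer division is an assumption that happens to match every percentage in this summary, not documented tool behaviour, and the function names are illustrative.

```python
# Resource counts copied from the device utilization summary:
# name -> (used, available)
RESOURCES = {
    "slices": (55, 192),
    "slice flip-flops": (96, 384),
    "4-input LUTs": (1, 384),
    "bonded IOBs": (194, 90),
    "GCLKs": (1, 4),
}


def utilization_percent(used: int, available: int) -> int:
    """Integer percentage, truncated (assumed to match the report)."""
    return used * 100 // available


def overused(resources: dict) -> list:
    """Names of resources whose demand exceeds what the device offers."""
    return [name for name, (used, avail) in resources.items() if used > avail]


if __name__ == "__main__":
    for name, (used, avail) in RESOURCES.items():
        print(f"{name}: {used} of {avail} = {utilization_percent(used, avail)}%")
    print("over 100%:", overused(RESOURCES))
```

Only the bonded IOBs exceed the device capacity: the standalone key register exposes all 96 input and 96 output bits as pins, which is what triggers the warning. The register itself uses about a quarter of the flip-flops.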
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade: -6

  Minimum period: No path found
  Minimum input arrival time before clock: 7.962ns
  Maximum output required time after clock: 6.788ns
  Maximum combinational path delay: No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset: 7.962ns (Levels of Logic = 1)
  Source: KeyEna (PAD)
  Destination: Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                              Gate     Net
    Cell:in->out    fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O           96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                  0.886          Dreg_0
    ----------------------------------------
    Total                    7.962ns (1.662ns logic, 6.300ns route)
                                     (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset: 6.788ns (Levels of Logic = 1)
  Source: Dreg_95 (FF)
  Destination: KeyO<95> (PAD)
  Source Clock: Clk rising

  Data Path: Dreg_95 to KeyO<95>
                              Gate     Net
    Cell:in->out    fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FDCE:C->Q            1   1.085   1.035  Dreg_95 (Dreg_95)
    OBUF:I->O                4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                    6.788ns (5.753ns logic, 1.035ns route)
                                     (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s

Total memory usage is 54400 kilobytes
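The path totals in the timing detail are plain sums of the per-cell gate and net delays. The short Python sketch below recomputes them from the figures in the report, in integer picoseconds to avoid floating-point noise. The names `summarize`, `OFFSET_IN_PS` and `OFFSET_OUT_PS` are illustrative.

```python
# Delays in picoseconds, copied from the two data paths in the report:
# (cell, gate delay, net delay)
OFFSET_IN_PS = [
    ("IBUF:I->O", 776, 6300),
    ("FDCE:CE", 886, 0),
]
OFFSET_OUT_PS = [
    ("FDCE:C->Q", 1085, 1035),
    ("OBUF:I->O", 4668, 0),
]


def summarize(path):
    """Total a path and split it into logic (gate) and route (net) shares."""
    logic = sum(gate for _, gate, _ in path)
    route = sum(net for _, _, net in path)
    total = logic + route
    return {
        "total_ns": total / 1000,
        "logic_ns": logic / 1000,
        "route_ns": route / 1000,
        "logic_pct": round(100 * logic / total, 1),
        "route_pct": round(100 * route / total, 1),
    }


if __name__ == "__main__":
    print("OFFSET IN :", summarize(OFFSET_IN_PS))
    print("OFFSET OUT:", summarize(OFFSET_OUT_PS))
```

The input path is dominated by routing (about 79%) because the KeyEna net fans out to all 96 flip-flops, whereas the output path is dominated by the output buffer delay.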
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
  IOs : 7

Cell Usage
  BELS : 3
    LUT4 : 3
  IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

  Number of Slices: 2 out of 192 1%
  Number of 4 input LUTs: 3 out of 384 0%
  Number of bonded IOBs: 7 out of 90 7%

Total memory usage is 53376 kilobytes
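The three LUT4 cells in this report are consistent with a 3-bit substitution box, where each output bit is one Boolean function of the inputs. The VHDL source of sbox8x3 is not reproduced here, so the Python sketch below assumes it implements the standard SEA substitution table {0, 5, 6, 7, 4, 3, 1, 2}. It checks that the three bit-level equations from the SEA specification, as recalled here, reproduce that table. The fourth input buffer listed in the report is not explained by this sketch.

```python
# Assumed SEA substitution table (index = 3-bit input, value = 3-bit output).
SBOX_TABLE = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox_bits(x2, x1, x0):
    """Bit-level S-box: three sequential AND/OR/XOR updates."""
    x0 ^= x2 & x1
    x1 ^= x2 & x0
    x2 ^= x0 | x1
    return x2, x1, x0


def sbox(value: int) -> int:
    """Apply the S-box to a 3-bit integer (bit 2 is the most significant)."""
    x2, x1, x0 = (value >> 2) & 1, (value >> 1) & 1, value & 1
    y2, y1, y0 = sbox_bits(x2, x1, x0)
    return (y2 << 2) | (y1 << 1) | y0


if __name__ == "__main__":
    for v in range(8):
        print(v, "->", sbox(v))
```

Because each output bit depends on at most three input bits, each fits in a single 4-input LUT, which agrees with the two slices reported above.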
TRANSLATION REPORT

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 37996 kilobytes

FLOOR PLANNING

MAPPING REPORT

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
  Number of occupied Slices: 2 out of 192 1%
  Number of Slices containing only related logic: 2 out of 2 100%
  Number of Slices containing unrelated logic: 0 out of 2 0%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
  Number of bonded IOBs: 7 out of 86 8%

Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB

Placing & Routing Report

Device utilization summary:
  Number of External IOBs: 7 out of 86 8%
    Number of LOCed External IOBs: 0 out of 7 0%
  Number of SLICEs: 2 out of 192 1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT

Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion

Checking timing specifications
Checking expanded design

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld

MAPPING REPORT

Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
  Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
    IOB Flip Flops: 960
  Number of GCLKs: 1 out of 4 25%
  Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 17,572
Additional JTAG gate count for IOBs: 50,928
Peak Memory Usage: 72 MB
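One way to sanity-check the "Additional JTAG gate count for IOBs" lines across the three mapping reports is to divide each figure by the number of pads in that design. The Python sketch below does this with numbers copied from the reports. The assumption that the clock pad (GCLKIOB) counts as a pad, and that the combinational S-box has none, is an inference from the figures rather than documented tool behaviour.

```python
# (design, bonded IOBs, GCLKIOBs, additional JTAG gate count)
# The sbox8x3 report lists no GCLKIOB, so 0 is assumed for it.
MAP_FIGURES = [
    ("keyreg", 194, 1, 9360),
    ("sbox8x3", 7, 0, 336),
    ("keygenblock", 1060, 1, 50928),
]


def jtag_gates_per_pad(bonded: int, gclkiobs: int, jtag_gates: int) -> int:
    """Return the JTAG gate count per pad, requiring an exact division."""
    pads = bonded + gclkiobs
    if jtag_gates % pads != 0:
        raise ValueError("gate count is not a whole multiple of the pad count")
    return jtag_gates // pads


if __name__ == "__main__":
    for name, bonded, gclk, gates in MAP_FIGURES:
        print(name, jtag_gates_per_pad(bonded, gclk, gates), "gates per pad")
```

All three designs give the same per-pad figure. This suggests the JTAG gate count reflects only the number of pins and says nothing about the logic cost of the cipher itself, which is better read from the slice and LUT counts.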
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
area -pr b -k 4 -c 100 -tx off -o keyreg_mapncd keyregngd keyregpcf Target Device x2s15Target Package cs144Target Speed -6Mapper Version spartan2 -- $Revision 116 $ase 61i Map G23Xilinx Mapping Report File for Design Mapped Date Mon Mar 30 124243 2009
Design Summary--------------Number of errors 0Number of warnings 0Logic UtilizationLogic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
Placing amp Routing Report
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization
48
Logic Distribution Number of Slices containing only related logic 0 out of 0 0 Number of Slices containing unrelated logic 0 out of 0 0 See NOTES below for an explanation of the effects of unrelated logic Number of bonded IOBs 194 out of 86 225 (OVERMAPPED) IOB Flip Flops 96 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 768Additional JTAG gate count for IOBs 9360Peak Memory Usage 57 MB
KEY REGISTER
Release 61i - xst G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved--gt Parameter TMPDIR set to __projnavCPU 000 067 s | Elapsed 000 100 s --gt Parameter xsthdpdir set to xstCPU 000 067 s | Elapsed 000 100 s --gt Reading design keyregprj
TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) HDL Analysis 4) HDL Synthesis 41) HDL Synthesis Report 5) Advanced HDL Synthesis 6) Low Level Synthesis 7) Final Report 71) Device utilization summary 72) TIMING REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keyregprjInput Format mixed
49
Ignore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name keyregOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name keyregAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output Yes
50
Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltkeyreggt
51
Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary
inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized
=========================================================================HDL Synthesis Report
Macro Statistics Registers 1 96-bit register 1
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28
========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg
52
Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 195
Macro Statistics Registers 1 96-bit register 1
Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================
Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25
WARNINGXst1336 - () More than 100 of Device resources are used
=========================================================================TIMING REPORT
NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
53
GENERATED AFTER PLACE-and-ROUTE
Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+
Timing Summary---------------Speed Grade -6
Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found
Timing Detail--------------All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock 'Clk'
Offset:              7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     IBUF:I->O            96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
     FDCE:CE                   0.886          Dreg_0
    ----------------------------------------
    Total                      7.962ns (1.662ns logic, 6.300ns route)
                                       (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock 'Clk'
Offset:              6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising

  Data Path: Dreg_95 to KeyO<95>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     FDCE:C->Q             1   1.085   1.035  Dreg_95 (Dreg_95)
     OBUF:I->O                 4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                      6.788ns (5.753ns logic, 1.035ns route)
                                       (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s

-->

Total memory usage is 54400 kilobytes
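The totals in the timing detail are the sum of the gate (logic) delays and the net (route) delays along each path. As a quick cross-check, the sketch below recomputes them, taking the delays of the two reported paths as 0.776, 0.886 and 6.300 ns on the input side and 1.085, 4.668 and 1.035 ns on the output side. The function name path_summary is hypothetical and the code is only an illustration of the arithmetic, not a timing tool.

```python
def path_summary(logic_delays_ns, route_delays_ns):
    """Sum gate and net delays of one path and split the total into shares."""
    logic = sum(logic_delays_ns)
    route = sum(route_delays_ns)
    total = logic + route
    return {
        "total_ns": round(total, 3),
        "logic_ns": round(logic, 3),
        "route_ns": round(route, 3),
        "logic_pct": round(100 * logic / total, 1),
        "route_pct": round(100 * route / total, 1),
    }


# KeyEna pad -> Dreg clock-enable pin (OFFSET IN BEFORE).
input_path = path_summary([0.776, 0.886], [6.300])
# Dreg_95 output -> KeyO<95> pad (OFFSET OUT AFTER).
output_path = path_summary([1.085, 4.668], [1.035])

print(input_path)
print(output_path)
```

The input path is dominated by routing because the single KeyEna buffer drives all 96 flip-flops (fanout 96), whereas the output path is dominated by the delay of the output buffer itself.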
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : sbox8x3.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : sbox8x3
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : sbox8x3
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : sbox8x3.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : sbox8x3.ngr
Top Level Output File Name         : sbox8x3
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
IOs                                : 7

Cell Usage
BELS                               : 3
  LUT4                             : 3
IO Buffers                         : 7
  IBUF                             : 4
  OBUF                             : 3
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6

 Number of Slices:                  2 out of 192     1%
 Number of 4 input LUTs:            3 out of 384     0%
 Number of bonded IOBs:             7 out of  90     7%
Total memory usage is 53376 kilobytes
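The synthesis result above uses three 4-input LUTs for the sbox8x3 unit, which is consistent with one LUT per output bit of a 3-bit substitution box. The sketch below illustrates this under a stated assumption: that the unit implements the 3-bit substitution table {0, 5, 6, 7, 4, 3, 1, 2} of the SEA specification, together with its AND/OR/XOR formulation. Neither the table nor the bit ordering is taken from the report itself, and the names SEA_SBOX, sbox_bitwise and output_truth_tables are hypothetical.

```python
# Assumed 3-bit substitution table (index = input value, entry = output value).
SEA_SBOX = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox_bitwise(x):
    """Evaluate the substitution with AND/OR/XOR only, updating bits in place."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    x0 ^= x2 & x1   # new x0 uses the old x1 and x2
    x1 ^= x2 & x0   # new x1 uses the updated x0
    x2 ^= x0 | x1   # new x2 uses the updated x0 and x1
    return (x2 << 2) | (x1 << 1) | x0


def output_truth_tables(table):
    """For each output bit, the 8-entry truth table a single LUT would hold."""
    return [[(table[x] >> bit) & 1 for x in range(8)] for bit in range(3)]


for bit, truth_table in enumerate(output_truth_tables(SEA_SBOX)):
    print(f"output bit {bit}: {truth_table}")
```

Each output bit is a Boolean function of only three inputs, so it fits in one 4-input LUT with an input to spare. The report lists four input buffers; what the fourth input is used for cannot be determined from the log.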
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0

Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Number of 4 input LUTs:               3 out of 384    1%
Logic Distribution:
  Number of occupied Slices:            2 out of 192    1%
  Number of Slices containing only related logic:   2 out of 2  100%
  Number of Slices containing unrelated logic:      0 out of 2    0%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs:           3 out of 384    1%
  Number of bonded IOBs:                7 out of  86    8%

Total equivalent gate count for design:  18
Additional JTAG gate count for IOBs:  336
Peak Memory Usage:  56 MB
Mapping Report
Device utilization summary:
   Number of External IOBs             7 out of 86      8%
      Number of LOCed External IOBs    0 out of 7       0%
   Number of SLICEs                    2 out of 192     1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keygenblock.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : keygenblock
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : keygenblock
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : keygenblock.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc.  All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file "c:/xilinx/bin/vasu/keygenblock.ngc" ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0

Total memory usage is 42092 kilobytes

Writing NGD file "keygenblock.ngd" ...

Writing NGDBUILD log file "keygenblock.bld"...
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Total Number Slice Registers:       419 out of 384   109% (OVERMAPPED)
    Number used as Flip Flops:        415
    Number used as Latches:             4
  Number of 4 input LUTs:            1016 out of 384   264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices:          665 out of 192   346% (OVERMAPPED)
  Number of Slices containing only related logic:   648 out of 665   97%
  Number of Slices containing unrelated logic:       17 out of 665    2%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:           1066 out of 384   277% (OVERMAPPED)
  Number used as logic:              1016
  Number used as a route-thru:         50
  Number of bonded IOBs:             1060 out of  86  1232% (OVERMAPPED)
    IOB Flip Flops:                   960
  Number of GCLKs:                      1 out of   4    25%
  Number of GCLKIOBs:                   1 out of   4    25%

Total equivalent gate count for design:  17572
Additional JTAG gate count for IOBs:  50928
Peak Memory Usage:  72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : encryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : encryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : encryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : encryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : decryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : decryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : decryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : decryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0

Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted at any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed for processors with a limited instruction set. Its application areas include:
Wireless communication, mobile computing and networking systems
Encryption of JPEG2000 images
Scalable video coding
Sensor networks and RFIDs
CONCLUSION
SEA(n,b) is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size and processor (or word) size are parameters of the design. The structure of SEA(n,b) allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g. a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. Finally, we note that the design criteria of SEA(n,b) do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
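The area versus throughput trade-off of a loop architecture that executes one round per clock cycle can be made concrete with a short calculation: one block leaves the core every time all rounds have been iterated. The sketch below is only an illustration. The function name loop_throughput_mbps is hypothetical, and the clock frequency and round count in the example are placeholder assumptions rather than measured results from this report; only the 96-bit width is taken from the key register synthesized above.

```python
def loop_throughput_mbps(block_bits, clock_mhz, rounds, overhead_cycles=0):
    """Throughput in Mbit/s of a core that iterates one round per clock cycle.

    block_bits      -- bits processed per encryption or decryption
    clock_mhz       -- clock frequency in MHz
    rounds          -- number of cipher rounds, one clock cycle each
    overhead_cycles -- extra cycles per block (e.g. loading), assumed 0 here
    """
    cycles = rounds + overhead_cycles
    if cycles <= 0:
        raise ValueError("a block needs at least one clock cycle")
    return block_bits * clock_mhz / cycles


# Placeholder example: 96-bit blocks, an assumed 50 MHz clock, an assumed 93 rounds.
print(f"{loop_throughput_mbps(96, 50.0, 93):.1f} Mbit/s")
```

Because the round count sits in the denominator, a cipher that uses many simple rounds keeps the round logic (and hence the area) small but pays for it with proportionally lower throughput at a given clock frequency.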
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28
========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg
52
Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 195
Macro Statistics Registers 1 96-bit register 1
Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================
Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25
WARNINGXst1336 - () More than 100 of Device resources are used
=========================================================================TIMING REPORT
NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
53
GENERATED AFTER PLACE-and-ROUTE
Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+
Timing Summary---------------Speed Grade -6
Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found
Timing Detail--------------All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising
Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------
Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising
54
Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)
=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt
Total memory usage is 54400 kilobytes
SBOX
55
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
56
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
57
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated
========================================================================= HDL Synthesis =========================================================================
58
Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized
========================================================================HDL Synthesis Report
Found no macro=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1
========================================================================= Final Report =========================================================================
59
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 7
Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
60
FLOOR PLANNING
61
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8
Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB
Maping Report
Device utilization summary
Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0
Number of SLICEs 2 out of 192 1
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0
The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707
62
KEY GENERATION
RTL SCHEMATIC
63
GATE LEVEL
64
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
65
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
66
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
69
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
70
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
71
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : decryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : decryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : decryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : decryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO

Translation Report
NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications in which the same constrained device has to perform both encryption and decryption.
APPLICATIONS
Low-cost encryption on processors with a limited instruction set, for which the routine was designed.
Wireless communication, mobile computing, and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
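The area/throughput trade-off mentioned above can be made concrete with a rough estimate. The sketch below assumes an iterative (loop) core that processes one round per clock cycle and ignores any cycles spent loading or unloading data; the function name and the numbers in the usage note are hypothetical and are not measurements from this report.

```python
def loop_throughput_mbps(block_bits, rounds, clock_mhz):
    """Estimate throughput (Mbit/s) of a core that runs one round per clock.

    One block occupies the core for `rounds` cycles, so
    throughput = block_bits * f_clk / rounds.
    Load/unload cycles are ignored (a simplifying assumption).
    """
    if block_bits <= 0 or rounds <= 0 or clock_mhz <= 0:
        raise ValueError("all parameters must be positive")
    return block_bits * clock_mhz / rounds


if __name__ == "__main__":
    # Hypothetical figures chosen only to show the scaling behaviour.
    for rounds in (48, 96):
        mbps = loop_throughput_mbps(block_bits=96, rounds=rounds, clock_mhz=50.0)
        print(f"{rounds} rounds at 50 MHz -> {mbps:.1f} Mbit/s")
```

For a fixed clock, doubling the number of rounds halves the estimate, which is why a single-round loop architecture buys its small area with reduced throughput.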
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : keyreg
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : keyreg
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : keyreg.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture keyreg of Entity keyreg is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <keyreg> (Architecture <keyreg>).
Entity <keyreg> analyzed. Unit <keyreg> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <keyreg>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
 Registers                         : 1
   96-bit register                 : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : keyreg.ngr
Top Level Output File Name         : keyreg
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
 IOs                               : 195

Macro Statistics
 Registers                         : 1
   96-bit register                 : 1

Cell Usage
 BELS                              : 1
   LUT1                            : 1
 FlipFlops/Latches                 : 96
   FDCE                            : 96
 Clock Buffers                     : 1
   BUFGP                           : 1
 IO Buffers                        : 194
   IBUF                            : 98
   OBUF                            : 96
=========================================================================

Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

 Number of Slices                  :  55 out of 192    28%
 Number of Slice Flip Flops        :  96 out of 384    25%
 Number of 4 input LUTs            :   1 out of 384     0%
 Number of bonded IOBs             : 194 out of  90   215% (*)
 Number of GCLKs                   :   1 out of   4    25%

WARNING:Xst:1336 - (*) More than 100% of Device resources are used
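The percentages in the keyreg utilization summary follow directly from the used/available counts, and the warning is triggered by the one resource (bonded IOBs) whose demand exceeds the device. The sketch below recomputes them as a cross-check; the dictionary layout and function names are choices made for this illustration, and integer truncation is assumed only because it reproduces the figures printed in this report.

```python
# Resource counts copied from the keyreg device utilization summary.
# The (used, available) tuple layout is a convention of this sketch.
KEYREG_USAGE = {
    "slices": (55, 192),
    "slice_flip_flops": (96, 384),
    "lut4": (1, 384),
    "bonded_iobs": (194, 90),
    "gclks": (1, 4),
}


def utilization_percent(used, available):
    """Return utilization as an integer percentage, truncated toward zero."""
    return (100 * used) // available


def overused(usage):
    """Return the names of resources whose demand exceeds the device supply."""
    return [name for name, (used, avail) in usage.items() if used > avail]


if __name__ == "__main__":
    for name, (used, avail) in KEYREG_USAGE.items():
        pct = utilization_percent(used, avail)
        print(f"{name:18s} {used:4d} / {avail:4d} = {pct}%")
    print("over capacity:", overused(KEYREG_USAGE))
```

The IOB overuse arises because the stand-alone key register exposes all of its 96-bit input and output buses on pins; it says nothing about the logic area, which uses only 55 of 192 slices.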
=========================================================================
TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
      FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
      GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade: -6

 Minimum period                           : No path found
 Minimum input arrival time before clock  : 7.962ns
 Maximum output required time after clock : 6.788ns
 Maximum combinational path delay         : No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint: Default OFFSET IN BEFORE for Clock Clk
Offset:              7.962ns (Levels of Logic = 1)
  Source:            KeyEna (PAD)
  Destination:       Dreg_95 (FF)
  Destination Clock: Clk rising

  Data Path: KeyEna to Dreg_95
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    IBUF:I->O             96   0.776   6.300  KeyEna_IBUF (KeyEna_IBUF)
    FDCE:CE                    0.886          Dreg_0
    ----------------------------------------
    Total                      7.962ns (1.662ns logic, 6.300ns route)
                                       (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint: Default OFFSET OUT AFTER for Clock Clk
Offset:              6.788ns (Levels of Logic = 1)
  Source:            Dreg_95 (FF)
  Destination:       KeyO<95> (PAD)
  Source Clock:      Clk rising

  Data Path: Dreg_95 to KeyO<95>
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
    FDCE:C->Q              1   1.085   1.035  Dreg_95 (Dreg_95)
    OBUF:I->O                  4.668          KeyO_95_OBUF (KeyO<95>)
    ----------------------------------------
    Total                      6.788ns (5.753ns logic, 1.035ns route)
                                       (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s

Total memory usage is 54400 kilobytes
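Each path total in the timing detail is the sum of the gate and net delays along the path, and the logic/route split is each share of that total. The following sketch recomputes both reported paths from the per-cell delays (in ns, read with three decimal places); the data layout and function name are choices made for this illustration.

```python
from decimal import Decimal

# (gate_delay_ns, net_delay_ns) per cell along each path, from the report.
# Decimal keeps the sums exact to the report's three decimal places.
OFFSET_IN_PATH = [
    (Decimal("0.776"), Decimal("6.300")),  # IBUF, driving net KeyEna_IBUF (fanout 96)
    (Decimal("0.886"), Decimal("0")),      # FDCE clock-enable pin
]
OFFSET_OUT_PATH = [
    (Decimal("1.085"), Decimal("1.035")),  # FDCE clock-to-Q, driving net Dreg_95
    (Decimal("4.668"), Decimal("0")),      # OBUF
]


def summarize(path):
    """Return (total_ns, logic_percent, route_percent) for one data path."""
    logic = sum(gate for gate, _ in path)
    route = sum(net for _, net in path)
    total = logic + route
    tenth = Decimal("0.1")
    return (
        total,
        (100 * logic / total).quantize(tenth),
        (100 * route / total).quantize(tenth),
    )


if __name__ == "__main__":
    for name, path in (("OFFSET IN", OFFSET_IN_PATH), ("OFFSET OUT", OFFSET_OUT_PATH)):
        total, logic_pct, route_pct = summarize(path)
        print(f"{name}: {total} ns ({logic_pct}% logic, {route_pct}% route)")
```

The input path is dominated by routing because the single KeyEna net fans out to all 96 flip-flops, whereas the output path is dominated by the output buffer delay.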
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : sbox8x3.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : sbox8x3
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : sbox8x3
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : sbox8x3.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : sbox8x3.ngr
Top Level Output File Name         : sbox8x3
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
 IOs                               : 7

Cell Usage
 BELS                              : 3
   LUT4                            : 3
 IO Buffers                        : 7
   IBUF                            : 4
   OBUF                            : 3
=========================================================================

Device utilization summary
---------------------------
Selected Device : 2s15cs144-6

 Number of Slices                  :   2 out of 192     1%
 Number of 4 input LUTs            :   3 out of 384     0%
 Number of bonded IOBs             :   7 out of  90     7%

Total memory usage is 53376 kilobytes
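The three LUT4 cells reported for sbox8x3 match what one would expect for a block with three output bits: any single-bit function of at most four inputs fits in one 4-input LUT, so each output bit costs one LUT. The sketch below illustrates this by splitting a 3-bit substitution table into one truth table per output bit. The table contents are an assumption (the 3-bit table recalled from the SEA specification); this report does not list what sbox8x3 actually implements, and the fourth input pin it reports is not modelled here.

```python
# 3-bit substitution table, ASSUMED for illustration: recalled from the SEA
# specification, not taken from the sbox8x3 source, which is not shown here.
SBOX = (0, 5, 6, 7, 4, 3, 1, 2)


def output_bit_truth_table(table, bit):
    """Truth table of one output bit: entry i is that bit of table[i]."""
    return tuple((value >> bit) & 1 for value in table)


def lut_contents(truth_table):
    """Pack a truth table into an integer, with entry i at bit position i."""
    return sum(b << i for i, b in enumerate(truth_table))


def lookup_via_luts(luts, x):
    """Rebuild the substitution output for input x from the per-bit LUTs."""
    return sum(((lut >> x) & 1) << bit for bit, lut in enumerate(luts))


# One LUT per output bit, i.e. three LUTs for a 3-bit-wide substitution.
LUTS = [lut_contents(output_bit_truth_table(SBOX, bit)) for bit in range(3)]

if __name__ == "__main__":
    for bit, lut in enumerate(LUTS):
        print(f"output bit {bit}: LUT contents 0x{lut:02X}")
    assert all(lookup_via_luts(LUTS, x) == SBOX[x] for x in range(8))
```

Whatever table the design really uses, the count is the same: three output bits, three LUTs, packed into two slices.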
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Number of 4 input LUTs:                            3 out of 384     1%
Logic Distribution:
  Number of occupied Slices:                         2 out of 192     1%
  Number of Slices containing only related logic:    2 out of   2   100%
  Number of Slices containing unrelated logic:       0 out of   2     0%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs:                        3 out of 384     1%
  Number of bonded IOBs:                             7 out of  86     8%

Total equivalent gate count for design:  18
Additional JTAG gate count for IOBs:  336
Peak Memory Usage:  56 MB

Mapping Report
Device utilization summary:
  Number of External IOBs:          7 out of  86     8%
  Number of LOCed External IOBs:    0 out of   7     0%
  Number of SLICEs:                 2 out of 192     1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keygenblock.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : keygenblock
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : keygenblock
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : keygenblock.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc ...
Reading component libraries for design expansion ...
Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0
Total memory usage is 42092 kilobytes

Writing NGD file keygenblock.ngd ...
Writing NGDBUILD log file keygenblock.bld ...
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Total Number Slice Registers:                    419 out of 384   109% (OVERMAPPED)
    Number used as Flip Flops:                     415
    Number used as Latches:                          4
  Number of 4 input LUTs:                         1016 out of 384   264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices:                       665 out of 192   346% (OVERMAPPED)
  Number of Slices containing only related logic:  648 out of 665    97%
  Number of Slices containing unrelated logic:      17 out of 665     2%
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:                        1066 out of 384   277% (OVERMAPPED)
  Number used as logic:                           1016
  Number used as a route-thru:                      50
  Number of bonded IOBs:                          1060 out of  86  1232% (OVERMAPPED)
    IOB Flip Flops:                                960
  Number of GCLKs:                                   1 out of   4    25%
  Number of GCLKIOBs:                                1 out of   4    25%

Total equivalent gate count for design:  17,572
Additional JTAG gate count for IOBs:  50,928
Peak Memory Usage:  72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
Write Timing Constrain NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keyreglsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture keyreg of Entity keyreg is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltkeyreggt (Architecture ltkeyreggt)Entity ltkeyreggt analyzed Unit ltkeyreggt generated
========================================================================= HDL Synthesis =========================================================================
Synthesizing Unit ltkeyreggt
51
Related source file is cxilinxbinvasuKeyRegvhd Found 96-bit register for signal ltDreggt Summary
inferred 96 D-type flip-flop(s)Unit ltkeyreggt synthesized
=========================================================================HDL Synthesis Report
Macro Statistics Registers 1 96-bit register 1
=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltkeyreggt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block keyreg actual ratio is 28
========================================================================= Final Report =========================================================================Final ResultsRTL Top Level Output File Name keyregngrTop Level Output File Name keyreg
52
Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 195
Macro Statistics Registers 1 96-bit register 1
Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================
Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25
WARNINGXst1336 - () More than 100 of Device resources are used
=========================================================================TIMING REPORT
NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
53
GENERATED AFTER PLACE-and-ROUTE
Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+
Timing Summary---------------Speed Grade -6
Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found
Timing Detail--------------All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising
Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------
Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising
54
Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)
=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt
Total memory usage is 54400 kilobytes
SBOX
55
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
56
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
57
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 7

Cell Usage
BELS : 3
    LUT4 : 3
IO Buffers : 7
    IBUF : 4
    OBUF : 3
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
Number of Slices : 2 out of 192 (1%)
Number of 4 input LUTs : 3 out of 384 (0%)
Number of bonded IOBs : 7 out of 90 (7%)

Total memory usage is 53376 kilobytes
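For orientation, three LUT4s driving three outputs is roughly what one would expect from a 3-bit substitution box. The sketch below is not the synthesized VHDL source. It assumes the 3-bit table {0, 5, 6, 7, 4, 3, 1, 2} given in the original SEA specification, and checks it against a gate-level form that uses only AND, OR and XOR. The names SBOX, sbox_gates and sbox_is_bijective are illustrative and do not appear in the report.

```python
# Assumed 3-bit substitution table from the original SEA specification.
# This is an illustration, not the contents of the synthesized entity.
SBOX = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox_gates(x):
    """Evaluate the same substitution with AND, OR and XOR only.

    The three bits are updated in sequence, so each step sees the
    bits already modified by the previous steps.
    """
    x0 = x & 1
    x1 = (x >> 1) & 1
    x2 = (x >> 2) & 1
    x0 ^= x2 & x1
    x1 ^= x2 & x0
    x2 ^= x0 | x1
    return x0 | (x1 << 1) | (x2 << 2)


def sbox_is_bijective(table=SBOX):
    """A substitution box must map the 8 inputs onto 8 distinct outputs."""
    return sorted(table) == list(range(len(table)))


if __name__ == "__main__":
    for value in range(8):
        print(value, SBOX[value], sbox_gates(value))
```

Each output bit depends on at most the three input bits, which fits comfortably in one 4-input LUT per bit.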
TRANSLATION REPORT
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
    Number of errors : 0
    Number of warnings : 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization:
    Number of 4 input LUTs : 3 out of 384 (1%)
Logic Distribution:
    Number of occupied Slices : 2 out of 192 (1%)
    Number of Slices containing only related logic : 2 out of 2 (100%)
    Number of Slices containing unrelated logic : 0 out of 2 (0%)
    See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs : 3 out of 384 (1%)
Number of bonded IOBs : 7 out of 86 (8%)
Total equivalent gate count for design : 18
Additional JTAG gate count for IOBs : 336
Peak Memory Usage : 56 MB
Mapping Report
Device utilization summary:
    Number of External IOBs : 7 out of 86 (8%)
        Number of LOCed External IOBs : 0 out of 7 (0%)
    Number of SLICEs : 2 out of 192 (1%)
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc
Reading component libraries for design expansion
Checking timing specifications
Checking expanded design
NGDBUILD Design Results Summary:
    Number of errors : 0
    Number of warnings : 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd
Writing NGDBUILD log file keygenblock.bld
MAPPING REPORT
Design Summary
--------------
Number of errors : 0
Number of warnings : 0
Logic Utilization:
    Total Number Slice Registers : 419 out of 384 (109%) (OVERMAPPED)
        Number used as Flip Flops : 415
        Number used as Latches : 4
    Number of 4 input LUTs : 1016 out of 384 (264%) (OVERMAPPED)
Logic Distribution:
    Number of occupied Slices : 665 out of 192 (346%) (OVERMAPPED)
    Number of Slices containing only related logic : 648 out of 665 (97%)
    Number of Slices containing unrelated logic : 17 out of 665 (2%)
    See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs : 1066 out of 384 (277%) (OVERMAPPED)
    Number used as logic : 1016
    Number used as a route-thru : 50
Number of bonded IOBs : 1060 out of 86 (1232%) (OVERMAPPED)
    IOB Flip Flops : 960
Number of GCLKs : 1 out of 4 (25%)
Number of GCLKIOBs : 1 out of 4 (25%)
Total equivalent gate count for design : 17572
Additional JTAG gate count for IOBs : 50928
Peak Memory Usage : 72 MB
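The utilization figures in this design summary can be cross-checked with a few lines of arithmetic. The sketch below is only a reader's aid and not part of the tool flow. It assumes the tool reports whole-number percentages by truncation, which happens to reproduce every figure in this summary. The names utilization_percent, overmapped, RESOURCES and summarize are hypothetical.

```python
def utilization_percent(used, available):
    """Whole-number percentage, truncated.

    Truncation is an assumption: it reproduces the figures in this
    report, but the tool's actual rounding rule is not stated there.
    """
    return used * 100 // available


def overmapped(used, available):
    """A resource is overmapped when the design needs more than the device has."""
    return used > available


# (used, available) pairs copied from the keygenblock design summary.
RESOURCES = {
    "slice registers": (419, 384),
    "4-input LUTs (logic)": (1016, 384),
    "occupied slices": (665, 192),
    "4-input LUTs (total)": (1066, 384),
    "bonded IOBs": (1060, 86),
    "GCLKs": (1, 4),
}


def summarize(resources=RESOURCES):
    """Return {name: (percent, is_overmapped)} for each resource."""
    return {
        name: (utilization_percent(used, avail), overmapped(used, avail))
        for name, (used, avail) in resources.items()
    }


if __name__ == "__main__":
    for name, (percent, over) in summarize().items():
        flag = " (OVERMAPPED)" if over else ""
        print(f"{name}: {percent}%{flag}")
```

Every logic resource except the clock buffers exceeds the capacity of the xc2s15, and the bonded IOB count (1060 against 86) suggests that the top-level ports of this block were never meant to be brought out to package pins as they are.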
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
    Number of errors : 0
    Number of warnings : 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
    Number of errors : 0
    Number of warnings : 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It is also suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
Low-cost encryption on processors with a limited instruction set, the setting for which the routine was originally designed.
Wireless communication, mobile computing, and networking systems.
Encryption of JPEG2000 images.
Scalable video coding.
Sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, it is expected that the cipher can be implemented in assembly within a few hours. Finally, we note that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Directions for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
    Found 96-bit register for signal <Dreg>.
    Summary:
        inferred 96 D-type flip-flop(s).
Unit <keyreg> synthesized.

=========================================================================
HDL Synthesis Report
Macro Statistics
Registers : 1
    96-bit register : 1
=========================================================================

=========================================================================
Advanced HDL Synthesis
=========================================================================

=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <keyreg>
Loading device for application Xst from file 2s15.nph in environment C:/Xilinx.
Mapping all equations
Building and optimizing final netlist
Found area constraint ratio of 100 (+ 5) on block keyreg, actual ratio is 28.

=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : keyreg.ngr
Top Level Output File Name : keyreg
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
IOs : 195

Macro Statistics
Registers : 1
    96-bit register : 1

Cell Usage
BELS : 1
    LUT1 : 1
FlipFlops/Latches : 96
    FDCE : 96
Clock Buffers : 1
    BUFGP : 1
IO Buffers : 194
    IBUF : 98
    OBUF : 96
=========================================================================
Device utilization summary
---------------------------
Selected Device : 2s15cs144-6
Number of Slices : 55 out of 192 (28%)
Number of Slice Flip Flops : 96 out of 384 (25%)
Number of 4 input LUTs : 1 out of 384 (0%)
Number of bonded IOBs : 194 out of 90 (215%) (*)
Number of GCLKs : 1 out of 4 (25%)
WARNING:Xst:1336 - (*) More than 100% of Device resources are used

=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE. FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT GENERATED AFTER PLACE-and-ROUTE.

Clock Information
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
Clk                                | BUFGP                  | 96    |
-----------------------------------+------------------------+-------+

Timing Summary
---------------
Speed Grade : -6
Minimum period : No path found
Minimum input arrival time before clock : 7.962ns
Maximum output required time after clock : 6.788ns
Maximum combinational path delay : No path found

Timing Detail
--------------
All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------
Timing constraint : Default OFFSET IN BEFORE for Clock Clk
Offset : 7.962ns (Levels of Logic = 1)
Source : KeyEna (PAD)
Destination : Dreg_95 (FF)
Destination Clock : Clk rising

Data Path : KeyEna to Dreg_95
                        Gate    Net
Cell:in->out   fanout   Delay   Delay   Logical Name (Net Name)
----------------------------------------  ------------
IBUF:I->O      96       0.776   6.300   KeyEna_IBUF (KeyEna_IBUF)
FDCE:CE                 0.886           Dreg_0
----------------------------------------
Total                   7.962ns (1.662ns logic, 6.300ns route)
                                (20.9% logic, 79.1% route)
-------------------------------------------------------------------------
Timing constraint : Default OFFSET OUT AFTER for Clock Clk
Offset : 6.788ns (Levels of Logic = 1)
Source : Dreg_95 (FF)
Destination : KeyO<95> (PAD)
Source Clock : Clk rising

Data Path : Dreg_95 to KeyO<95>
                        Gate    Net
Cell:in->out   fanout   Delay   Delay   Logical Name (Net Name)
----------------------------------------  ------------
FDCE:C->Q      1        1.085   1.035   Dreg_95 (Dreg_95)
OBUF:I->O               4.668           KeyO_95_OBUF (KeyO<95>)
----------------------------------------
Total                   6.788ns (5.753ns logic, 1.035ns route)
                                (84.8% logic, 15.2% route)
=========================================================================
CPU : 3.59 / 4.64 s | Elapsed : 4.00 / 5.00 s
Total memory usage is 54400 kilobytes
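The logic/route split quoted for the two offset paths of the key register can be recomputed from the individual cell and net delays. The sketch below is illustrative only. It works in picoseconds, so the input offset appears as 7962, on the assumption that the report's figure reads as 7.962 ns. The names path_summary, INPUT_PATH and OUTPUT_PATH are hypothetical.

```python
def path_summary(logic_ps, route_ps):
    """Return (total_ps, logic_percent, route_percent) for one data path.

    logic_ps and route_ps are lists of cell (gate) delays and net
    delays in picoseconds.
    """
    logic = sum(logic_ps)
    route = sum(route_ps)
    total = logic + route
    return (
        total,
        round(100 * logic / total, 1),
        round(100 * route / total, 1),
    )


# KeyEna pad to the Dreg flip-flops: input buffer and clock-enable
# setup are logic, the single fanout-96 net is routing.
INPUT_PATH = path_summary([776, 886], [6300])

# Dreg flip-flop to the KeyO pad: clock-to-Q and output buffer are
# logic, the net between them is routing.
OUTPUT_PATH = path_summary([1085, 4668], [1035])

if __name__ == "__main__":
    print("input offset :", INPUT_PATH)
    print("output offset:", OUTPUT_PATH)
```

On the input side most of the delay is routing, which is consistent with a single enable net fanning out to all 96 flip-flops.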
SBOX
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : sbox8x3.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : sbox8x3
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : sbox8x3
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 195
Macro Statistics Registers 1 96-bit register 1
Cell Usage BELS 1 LUT1 1 FlipFlopsLatches 96 FDCE 96 Clock Buffers 1 BUFGP 1 IO Buffers 194 IBUF 98 OBUF 96=========================================================================
Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 55 out of 192 28 Number of Slice Flip Flops 96 out of 384 25 Number of 4 input LUTs 1 out of 384 0 Number of bonded IOBs 194 out of 90 215 () Number of GCLKs 1 out of 4 25
WARNINGXst1336 - () More than 100 of Device resources are used
=========================================================================TIMING REPORT
NOTE THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
53
GENERATED AFTER PLACE-and-ROUTE
Clock Information-----------------------------------------------------+------------------------+-------+Clock Signal | Clock buffer(FF name) | Load |-----------------------------------+------------------------+-------+Clk | BUFGP | 96 |-----------------------------------+------------------------+-------+
Timing Summary---------------Speed Grade -6
Minimum period No path found Minimum input arrival time before clock 7962ns Maximum output required time after clock 6788ns Maximum combinational path delay No path found
Timing Detail--------------All values displayed in nanoseconds (ns)
-------------------------------------------------------------------------Timing constraint Default OFFSET IN BEFORE for Clock ClkOffset 7962ns (Levels of Logic = 1) Source KeyEna (PAD) Destination Dreg_95 (FF) Destination Clock Clk rising
Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------
Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising
54
Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)
=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt
Total memory usage is 54400 kilobytes
SBOX
55
RTL SCHEMATIC
GATE LEVEL
=========================================================================
                      Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : sbox8x3.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : sbox8x3
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : sbox8x3
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : sbox8x3.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used

=========================================================================
                          HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.

=========================================================================
                            HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.

=========================================================================
                           HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
    Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
                       Advanced HDL Synthesis
=========================================================================

=========================================================================
                         Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.

Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
                            Final Report
=========================================================================
Final Results
RTL Top Level Output File Name     : sbox8x3.ngr
Top Level Output File Name         : sbox8x3
Output Format                      : NGC
Optimization Goal                  : Speed
Keep Hierarchy                     : NO

Design Statistics
# IOs                              : 7

Cell Usage :
# BELS                             : 3
#      LUT4                        : 3
# IO Buffers                       : 7
#      IBUF                        : 4
#      OBUF                        : 3
=========================================================================

Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6

 Number of Slices:                 2  out of    192     1%
 Number of 4 input LUTs:           3  out of    384     0%
 Number of bonded IOBs:            7  out of     90     7%
Total memory usage is 53376 kilobytes
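The three LUT4 cells reported for sbox8x3 are what one would expect from a 3-bit substitution box, since each output bit depends on at most three input bits and therefore fits in a single 4-input LUT. The sketch below is a Python model of such an S-box. It assumes the 3-bit substitution table {0, 5, 6, 7, 4, 3, 1, 2} and the AND/OR/XOR recurrence given in the published SEA specification; the VHDL source of sbox8x3 is not reproduced in this report, so the table, the bit ordering, and the function names are assumptions for illustration only.

```python
# Assumed 3-bit substitution table (from the SEA specification, not from
# the sbox8x3 source, which this report does not show).
SBOX_TABLE = [0, 5, 6, 7, 4, 3, 1, 2]


def sbox_bitwise(x0, x1, x2):
    """Substitute three bits (x0 is the least significant) with AND/OR/XOR only.

    The three assignments are sequential: each one uses the values already
    updated by the assignments before it.
    """
    x0 = (x2 & x1) ^ x0
    x1 = (x2 & x0) ^ x1
    x2 = (x0 | x1) ^ x2
    return x0, x1, x2


def sbox_lookup(value):
    """Table-lookup form of the same substitution for a 3-bit value."""
    return SBOX_TABLE[value & 0b111]


def forms_agree():
    """Check that the gate-level form and the table agree on all 8 inputs."""
    for value in range(8):
        y0, y1, y2 = sbox_bitwise(value & 1, (value >> 1) & 1, (value >> 2) & 1)
        if (y0 | (y1 << 1) | (y2 << 2)) != sbox_lookup(value):
            return False
    return True


if __name__ == "__main__":
    print("bitwise form matches table:", forms_agree())
```

The bitwise form is the one that maps naturally onto logic cells, while the table form is convenient for checking a simulation against known values.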
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Number of 4 input LUTs:                             3 out of 384    1%
Logic Distribution:
  Number of occupied Slices:                          2 out of 192    1%
  Number of Slices containing only related logic:     2 out of 2    100%
  Number of Slices containing unrelated logic:        0 out of 2      0%
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs:                         3 out of 384    1%
Number of bonded IOBs:                                7 out of 86     8%

Total equivalent gate count for design:  18
Additional JTAG gate count for IOBs:  336
Peak Memory Usage:  56 MB

Mapping Report
Device utilization summary:
  Number of External IOBs                7 out of 86      8%
  Number of LOCed External IOBs          0 out of 7       0%
  Number of SLICEs                       2 out of 192     1%

The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
                      Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keygenblock.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : keygenblock
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : keygenblock
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : keygenblock.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc.  All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file "c:/xilinx/bin/vasu/keygenblock.ngc" ...
Reading component libraries for design expansion...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0

Total memory usage is 42092 kilobytes

Writing NGD file "keygenblock.ngd" ...
Writing NGDBUILD log file "keygenblock.bld"...
MAPPING REPORT
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Total Number Slice Registers:                     419 out of 384   109% (OVERMAPPED)
    Number used as Flip Flops:                      415
    Number used as Latches:                           4
  Number of 4 input LUTs:                          1016 out of 384   264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices:                        665 out of 192   346% (OVERMAPPED)
  Number of Slices containing only related logic:   648 out of 665    97%
  Number of Slices containing unrelated logic:       17 out of 665     2%
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:                         1066 out of 384   277% (OVERMAPPED)
  Number used as logic:                            1016
  Number used as a route-thru:                       50
Number of bonded IOBs:                             1060 out of 86   1232% (OVERMAPPED)
  IOB Flip Flops:                                   960
Number of GCLKs:                                      1 out of 4      25%
Number of GCLKIOBs:                                   1 out of 4      25%

Total equivalent gate count for design:  17572
Additional JTAG gate count for IOBs:  50928
Peak Memory Usage:  72 MB
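The percentages in the key generation mapping summary follow directly from the used and available counts, and every resource above 100% means the block does not fit on the selected xc2s15 device as mapped. The short Python sketch below redoes that arithmetic using the counts printed in the summary; the dictionary and function names are illustrative assumptions, and integer division is used only because it reproduces the whole-number percentages printed for this design.

```python
# (used, available) pairs copied from the key generation mapping summary.
KEYGEN_RESOURCES = {
    "slice registers": (419, 384),
    "4-input LUTs (total)": (1066, 384),
    "occupied slices": (665, 192),
    "bonded IOBs": (1060, 86),
    "GCLKs": (1, 4),
}


def utilization(used, available):
    """Return (whole-number percentage, overmapped flag) for one resource."""
    percent = (100 * used) // available
    return percent, used > available


def overmapped(resources):
    """List the resources whose demand exceeds what the device provides."""
    return [name for name, (used, avail) in resources.items()
            if utilization(used, avail)[1]]


if __name__ == "__main__":
    for name, (used, avail) in KEYGEN_RESOURCES.items():
        percent, over = utilization(used, avail)
        flag = " (OVERMAPPED)" if over else ""
        print(f"{name}: {used} out of {avail} = {percent}%{flag}")
```

The bonded IOB figure stands out: 1060 pins are requested where 86 exist, which reflects the wide key ports being brought out to pads in this stand-alone synthesis run.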
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
                      Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : encryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : encryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : encryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : encryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
                      Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : decryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : decryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : decryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : decryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
Translation Report
NGDBUILD Design Results Summary:
  Number of errors:     0
  Number of warnings:   0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It is suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. Its application areas include the following.
In wireless communication, mobile computing, and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDs
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption plus decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is on the order of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. An additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, an assembly implementation of the cipher is expected to take only a few hours. Finally, we note that the design criteria of SEAn,b do not make it a conservative algorithm by nature; further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization, which comes at the cost of reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Directions for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
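Because the loop architecture executes one round per clock cycle, with the key round computed in parallel, the time to process one block is roughly the number of rounds, and throughput scales as block size times clock frequency divided by round count. The Python sketch below illustrates that relation only. The 96-bit block width matches the 96-bit register seen in the timing report, but the round count and clock frequency are placeholder values chosen for illustration, not measured results, and cycles spent loading and unloading data are ignored.

```python
def cycles_per_block(rounds, rounds_per_cycle=1):
    """Clock cycles needed for one block in an iterative (loop) design."""
    # Ceiling division, so a partial final step still costs a full cycle.
    return -(-rounds // rounds_per_cycle)


def throughput_bps(block_bits, clock_hz, rounds):
    """Approximate throughput in bits per second at one round per cycle."""
    return block_bits * clock_hz / cycles_per_block(rounds)


if __name__ == "__main__":
    # Placeholder parameters (assumptions, not figures from this report).
    block_bits = 96          # register width seen in the timing report
    rounds = 93              # hypothetical round count
    clock_hz = 100e6         # hypothetical clock frequency
    mbps = throughput_bps(block_bits, clock_hz, rounds) / 1e6
    print(f"{cycles_per_block(rounds)} cycles per block, about {mbps:.1f} Mbit/s")
```

The same relation shows why the small area comes at the cost of throughput: a loop design reuses one round of hardware for every round, so a cipher with many rounds needs correspondingly many cycles per block.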
Data Path KeyEna to Dreg_95 Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ IBUFI-gtO 96 0776 6300 KeyEna_IBUF (KeyEna_IBUF) FDCECE 0886 Dreg_0 ---------------------------------------- Total 7962ns (1662ns logic 6300ns route) (209 logic 791 route)-------------------------------------------------------------------------
Timing constraint Default OFFSET OUT AFTER for Clock ClkOffset 6788ns (Levels of Logic = 1) Source Dreg_95 (FF) Destination KeyOlt95gt (PAD) Source Clock Clk rising
54
Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)
=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt
Total memory usage is 54400 kilobytes
SBOX
55
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
56
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
57
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated
========================================================================= HDL Synthesis =========================================================================
58
Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized
========================================================================HDL Synthesis Report
Found no macro=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1
========================================================================= Final Report =========================================================================
59
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 7
Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
60
FLOOR PLANNING
61
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8
Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB
Maping Report
Device utilization summary
Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0
Number of SLICEs 2 out of 192 1
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0
The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707
62
KEY GENERATION
RTL SCHEMATIC
63
GATE LEVEL
64
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
65
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
66
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
69
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
70
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
71
DECRYPTION
GATE LEVEL
72
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
73
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
74
75
ADVANTAGES
SEA is parametric in text key and processor size
It is a low cost encryption routine targeted for the processors with limited instruction set
It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size
It is also used in applications where the same constrained device has to perform both encryption and decryption
APPLICATIONS
This is a low-cost encryption routine basically designed for processors with a limited instruction set
In wireless communication and mobile computing and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDrsquos
76
CONCLUSION
SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations
77
Bibliography
Reference books
Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian
A VHDL Primer J Bhaskar
Digital Design Morris Mano
Data and Computer Communications William Stalling
Computer Networks Andrew S Tannenbaum
Network Cryptology William Stalling
Reference Websites
IEEE Transactions wwwwikipediacom
wwwwebopediacom
78
Data Path Dreg_95 to KeyOlt95gt Gate Net Cellin-gtout fanout Delay Delay Logical Name (Net Name) ---------------------------------------- ------------ FDCEC-gtQ 1 1085 1035 Dreg_95 (Dreg_95) OBUFI-gtO 4668 KeyO_95_OBUF (KeyOlt95gt) ---------------------------------------- Total 6788ns (5753ns logic 1035ns route) (848 logic 152 route)
=========================================================================CPU 359 464 s | Elapsed 400 500 s --gt
Total memory usage is 54400 kilobytes
SBOX
55
RTL SCHEMATIC
GATE LEVEL
========================================================================= Synthesis Options Summary =========================================================================---- Source Parameters
56
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
57
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : sbox8x3.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
=========================================================================
WARNING:Xst:1885 - LSO file is empty, default list of libraries is used
=========================================================================
HDL Compilation
=========================================================================
Compiling vhdl file c:/xilinx/bin/vasu/KeyReg.vhd in Library work.
Architecture sbox8x3 of Entity sbox8x3 is up to date.
=========================================================================
HDL Analysis
=========================================================================
Analyzing Entity <sbox8x3> (Architecture <sbox8x3>).
INFO:Xst:1561 - c:/xilinx/bin/vasu/KeyReg.vhd line 29: Mux is complete : default of case is discarded
Entity <sbox8x3> analyzed. Unit <sbox8x3> generated.
=========================================================================
HDL Synthesis
=========================================================================
Synthesizing Unit <sbox8x3>.
Related source file is c:/xilinx/bin/vasu/KeyReg.vhd.
Unit <sbox8x3> synthesized.
=========================================================================
HDL Synthesis Report
Found no macro
=========================================================================
Advanced HDL Synthesis
=========================================================================
Low Level Synthesis
=========================================================================
Optimizing unit <sbox8x3> ...
Loading device for application Xst from file '2s15.nph' in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sbox8x3, actual ratio is 1.
=========================================================================
Final Report
=========================================================================
Final Results
RTL Top Level Output File Name : sbox8x3.ngr
Top Level Output File Name : sbox8x3
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
IOs : 7
Cell Usage :
BELS : 3
LUT4 : 3
IO Buffers : 7
IBUF : 4
OBUF : 3
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2s15cs144-6
Number of Slices: 2 out of 192 1%
Number of 4 input LUTs: 3 out of 384 0%
Number of bonded IOBs: 7 out of 90 7%
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
Number of errors: 0
Number of warnings: 0
Total memory usage is 37996 kilobytes
FLOOR PLANNING
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Number of 4 input LUTs: 3 out of 384 1%
Logic Distribution:
Number of occupied Slices: 2 out of 192 1%
Number of Slices containing only related logic: 2 out of 2 100%
Number of Slices containing unrelated logic: 0 out of 2 0%
See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 3 out of 384 1%
Number of bonded IOBs: 7 out of 86 8%
Total equivalent gate count for design: 18
Additional JTAG gate count for IOBs: 336
Peak Memory Usage: 56 MB
Mapping Report
Device utilization summary:
Number of External IOBs: 7 out of 86 8%
Number of LOCed External IOBs: 0 out of 7 0%
Number of SLICEs: 2 out of 192 1%
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
The AVERAGE CONNECTION DELAY for this design is: 0.871
The MAXIMUM PIN DELAY IS: 1.512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is: 0.707
KEY GENERATION
RTL SCHEMATIC
GATE LEVEL
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : keygenblock.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : keygenblock
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : keygenblock
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : keygenblock.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.
Command Line: ngdbuild -intstyle ise -dd c:/xilinx/bin/vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd
Reading NGO file c:/xilinx/bin/vasu/keygenblock.ngc ...
Reading component libraries for design expansion...
Checking timing specifications ...
Checking expanded design ...
NGDBUILD Design Results Summary:
Number of errors: 0
Number of warnings: 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblock.ngd ...
Writing NGDBUILD log file keygenblock.bld ...
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
Number used as Flip Flops: 415
Number used as Latches: 4
Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
Number of Slices containing only related logic: 648 out of 665 97%
Number of Slices containing unrelated logic: 17 out of 665 2%
See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
Number used as logic: 1016
Number used as a route-thru: 50
Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
IOB Flip Flops: 960
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%
Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
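The utilization percentages and OVERMAPPED flags in the key generation mapping report can be reproduced with simple integer arithmetic. The following is a minimal Python sketch and is not part of the original tool flow: the names REPORT, utilization, and overmapped are illustrative, and the truncating division is an assumption that happens to match the figures printed in this report.

```python
# Resource figures copied from the keygenblock mapping report:
# name -> (used, available on the xc2s15 device).
REPORT = {
    "slice registers": (419, 384),
    "4-input LUTs": (1066, 384),
    "occupied slices": (665, 192),
    "bonded IOBs": (1060, 86),
    "GCLKs": (1, 4),
}


def utilization(used, available):
    """Return utilization as an integer percentage, truncated (assumed)."""
    return used * 100 // available


def overmapped(report):
    """Return the resources whose usage exceeds the device capacity."""
    return {
        name: utilization(used, available)
        for name, (used, available) in report.items()
        if used > available
    }


if __name__ == "__main__":
    for name, (used, available) in REPORT.items():
        flag = " (OVERMAPPED)" if used > available else ""
        pct = utilization(used, available)
        print(f"{name}: {used} out of {available} {pct}%{flag}")
```

Four of the five listed resources exceed the capacity of the xc2s15, which is consistent with the OVERMAPPED markers in the report. The very high IOB count likely reflects that the block was synthesized on its own with I/O buffers added to every port (Add IO Buffers is set to YES), so it should not be read as the pin count of a complete design.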
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
NGDBUILD Design Results Summary:
Number of errors: 0
Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :
---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144
---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
TRANSLATION REPORT
NGDBUILD Design Results Summary:
Number of errors: 0
Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor (word) sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. Its applications include:
Wireless communication, mobile computing, and networking systems
Encryption of JPEG2000 images
Scalable video coding
Sensor networks and RFIDs
CONCLUSION
SEA(n, b) is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA(n, b) allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. Finally, we note that the design criteria of SEA(n, b) do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Directions for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
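Because the architecture executes one round per clock cycle, its throughput follows directly from the block size, the number of rounds, and the clock frequency. A minimal Python sketch of that relation is given here; the function names and the example parameter values are hypothetical, chosen only for illustration, and are not results from this work. The estimate also ignores any cycles spent loading the plaintext or unloading the ciphertext.

```python
def cycles_per_block(rounds):
    """Clock cycles needed for one block when one round runs per cycle."""
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    return rounds


def throughput_bps(block_bits, rounds, clock_hz):
    """Estimated throughput in bits per second for a loop architecture."""
    if block_bits <= 0 or clock_hz <= 0:
        raise ValueError("block_bits and clock_hz must be positive")
    return block_bits * clock_hz / cycles_per_block(rounds)


if __name__ == "__main__":
    # Hypothetical parameters: 96-bit block, 96 rounds, 50 MHz clock.
    print(throughput_bps(96, 96, 50e6))
```

The relation makes the area/throughput trade-off explicit: halving the number of cycles per block, for example by unrolling two rounds per cycle, would double the estimated throughput at the same clock frequency, at the cost of additional logic.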
Input File Name sbox8x3prjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name sbox8x3Output Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name sbox8x3Automatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NO
57
Global Optimization AllClockNetsRTL Output YesWrite Timing Constraint NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso sbox8x3lsoRead Cores YEScross_clock_analysi NOverilog2001 YESOptimize Instantiated Primitives NO
=========================================================================
WARNINGXst1885 - LSO file is empty default list of libraries is used
========================================================================= HDL Compilation =========================================================================Compiling vhdl file cxilinxbinvasuKeyRegvhd in Library workArchitecture sbox8x3 of Entity sbox8x3 is up to date
========================================================================= HDL Analysis =========================================================================Analyzing Entity ltsbox8x3gt (Architecture ltsbox8x3gt)INFOXst1561 - cxilinxbinvasuKeyRegvhd line 29 Mux is complete default of case is discardedEntity ltsbox8x3gt analyzed Unit ltsbox8x3gt generated
========================================================================= HDL Synthesis =========================================================================
58
Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized
========================================================================HDL Synthesis Report
Found no macro=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1
========================================================================= Final Report =========================================================================
59
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 7
Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
60
FLOOR PLANNING
61
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8
Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB
Maping Report
Device utilization summary
Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0
Number of SLICEs 2 out of 192 1
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0
The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707
62
KEY GENERATION
RTL SCHEMATIC
63
GATE LEVEL
64
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
65
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
66
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
*                      Synthesis Options Summary                        *
=========================================================================
---- Source Parameters
Input File Name : encryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : encryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : encryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : encryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

TRANSLATION REPORT
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
*                      Synthesis Options Summary                        *
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO

TRANSLATION REPORT
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, the security of the cipher being adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
As a low-cost encryption routine designed for processors with a limited instruction set, SEA can be used:
In wireless communication, mobile computing, and networking systems.
For the encryption of JPEG2000 images.
In scalable video coding.
In sensor networks and RFIDs.
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. Finally, we note that the design criteria of SEAn,b do not make it a conservative algorithm by nature; further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Directions for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
65
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
66
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
69
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
70
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
71
DECRYPTION
GATE LEVEL
72
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
73
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
74
75
ADVANTAGES
SEA is parametric in text key and processor size
It is a low cost encryption routine targeted for the processors with limited instruction set
It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size
It is also used in applications where the same constrained device has to perform both encryption and decryption
APPLICATIONS
This is a low-cost encryption routine basically designed for processors with a limited instruction set
In wireless communication and mobile computing and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDrsquos
76
CONCLUSION
SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations
77
Bibliography
Reference books
Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian
A VHDL Primer J Bhaskar
Digital Design Morris Mano
Data and Computer Communications William Stalling
Computer Networks Andrew S Tannenbaum
Network Cryptology William Stalling
Reference Websites
IEEE Transactions wwwwikipediacom
wwwwebopediacom
78
Synthesizing Unit ltsbox8x3gt Related source file is cxilinxbinvasuKeyRegvhdUnit ltsbox8x3gt synthesized
========================================================================HDL Synthesis Report
Found no macro=========================================================================
========================================================================= Advanced HDL Synthesis =========================================================================
========================================================================= Low Level Synthesis =========================================================================
Optimizing unit ltsbox8x3gt Loading device for application Xst from file 2s15nph in environment CXilinx
Mapping all equationsBuilding and optimizing final netlist Found area constraint ratio of 100 (+ 5) on block sbox8x3 actual ratio is 1
========================================================================= Final Report =========================================================================
59
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 7
Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
60
FLOOR PLANNING
61
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8
Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB
Maping Report
Device utilization summary
Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0
Number of SLICEs 2 out of 192 1
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0
The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707
62
KEY GENERATION
RTL SCHEMATIC
63
GATE LEVEL
64
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
65
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
66
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
69
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
70
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
71
DECRYPTION
GATE LEVEL
72
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
73
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
74
75
ADVANTAGES
SEA is parametric in text key and processor size
It is a low cost encryption routine targeted for the processors with limited instruction set
It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size
It is also used in applications where the same constrained device has to perform both encryption and decryption
APPLICATIONS
This is a low-cost encryption routine basically designed for processors with a limited instruction set
In wireless communication and mobile computing and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDrsquos
76
CONCLUSION
SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations
77
Bibliography
Reference books
Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian
A VHDL Primer J Bhaskar
Digital Design Morris Mano
Data and Computer Communications William Stalling
Computer Networks Andrew S Tannenbaum
Network Cryptology William Stalling
Reference Websites
IEEE Transactions wwwwikipediacom
wwwwebopediacom
78
Final ResultsRTL Top Level Output File Name sbox8x3ngrTop Level Output File Name sbox8x3Output Format NGCOptimization Goal SpeedKeep Hierarchy NO
Design Statistics IOs 7
Cell Usage BELS 3 LUT4 3 IO Buffers 7 IBUF 4 OBUF 3===============================================================Device utilization summary---------------------------
Selected Device 2s15cs144-6
Number of Slices 2 out of 192 1 Number of 4 input LUTs 3 out of 384 0 Number of bonded IOBs 7 out of 90 7
Total memory usage is 53376 kilobytes
TRANSLATION REPORT
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 37996 kilobytes
60
FLOOR PLANNING
61
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8
Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB
Maping Report
Device utilization summary
Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0
Number of SLICEs 2 out of 192 1
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0
The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707
62
KEY GENERATION
RTL SCHEMATIC
63
GATE LEVEL
64
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : keygenblock.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : keygenblock
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : keygenblock
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : keygenblock.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
Release 6.1i - ngdbuild G.23
Copyright (c) 1995-2003 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -intstyle ise -dd c:\xilinx\bin\vasu/_ngo -i -p xc2s15-cs144-6 keygenblock.ngc keygenblock.ngd

Reading NGO file "c:/xilinx/bin/vasu/keygenblock.ngc" ...
Reading component libraries for design expansion ...

Checking timing specifications ...
Checking expanded design ...

NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes

Writing NGD file "keygenblock.ngd" ...
Writing NGDBUILD log file "keygenblock.bld" ...
MAPPING REPORT
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Total Number Slice Registers: 419 out of 384 109% (OVERMAPPED)
    Number used as Flip Flops: 415
    Number used as Latches: 4
  Number of 4 input LUTs: 1016 out of 384 264% (OVERMAPPED)
Logic Distribution:
  Number of occupied Slices: 665 out of 192 346% (OVERMAPPED)
  Number of Slices containing only related logic: 648 out of 665 97%
  Number of Slices containing unrelated logic: 17 out of 665 2%
  See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 1066 out of 384 277% (OVERMAPPED)
  Number used as logic: 1016
  Number used as a route-thru: 50
Number of bonded IOBs: 1060 out of 86 1232% (OVERMAPPED)
  IOB Flip Flops: 960
Number of GCLKs: 1 out of 4 25%
Number of GCLKIOBs: 1 out of 4 25%

Total equivalent gate count for design: 17572
Additional JTAG gate count for IOBs: 50928
Peak Memory Usage: 72 MB
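The OVERMAPPED flags in the key generation mapping report can be checked by hand: a resource is overmapped when the design needs more of it than the xc2s15 device provides. The short Python sketch below recomputes the percentages from the "used out of available" pairs in that report. The dictionary layout and the function names are illustrative only and are not part of the Xilinx tools; integer truncation is used simply because it reproduces the figures printed in this particular report.

```python
# "used out of available" pairs copied from the keygenblock mapping report
# (target device xc2s15). Names and layout here are illustrative only.
REPORT = {
    "Slice registers": (419, 384),
    "4-input LUTs (total)": (1066, 384),
    "Occupied slices": (665, 192),
    "Bonded IOBs": (1060, 86),
    "GCLKs": (1, 4),
}


def utilization(used, available):
    """Return the utilization as an integer percentage (truncated)."""
    return used * 100 // available


def overmapped(report):
    """Return the names of resources that exceed what the device offers."""
    return [name for name, (used, avail) in report.items() if used > avail]


if __name__ == "__main__":
    for name, (used, avail) in REPORT.items():
        flag = " (OVERMAPPED)" if used > avail else ""
        print(f"{name}: {used} out of {avail} {utilization(used, avail)}%{flag}")
```

Read this way, the report shows that the key generation block with all of its ports brought out to pins does not fit the xc2s15: the bonded IOB count in particular exceeds the 86 available pins by more than a factor of twelve.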
ENCRYPTION
RTL SCHEMATIC
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : encryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : encryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : encryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : encryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 42092 kilobytes
DECRYPTION
GATE LEVEL
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : decryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : decryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : decryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : decryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO
TRANSLATION REPORT
NGDBUILD Design Results Summary:
  Number of errors: 0
  Number of warnings: 0

Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted at any given processor, with the security of the cipher adapted as a function of its key size.
It can also be used in applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
SEA is a low-cost encryption routine designed for processors with a limited instruction set. Its applications include:
Wireless communication, mobile computing, and networking systems
Encryption of JPEG2000 images
Scalable video coding
Sensor networks and RFIDs
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. An additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. Finally, we note that the design criteria of SEAn,b do not make it a conservative algorithm by nature; further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization, which comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
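The trade-off between area and throughput in a loop architecture can be made concrete with a little arithmetic. Because one round is executed per clock cycle, a block of n bits needs about as many cycles as there are rounds, so the throughput is roughly n times the clock frequency divided by the round count. The Python sketch below is only an illustration under stated assumptions: the parameter relations (n a multiple of 6b, and n/(2b) words per Feistel branch) are taken from the original SEA specification, the function names are hypothetical, and the round count and clock frequency in the usage lines are placeholders rather than results from this report. Cycles spent on loading data or keys are ignored.

```python
def words_per_branch(n, b):
    """Number of b-bit words in each Feistel branch, n / (2b).

    Assumes the SEA parameter constraint that n is a multiple of 6b.
    """
    if n % (6 * b) != 0:
        raise ValueError("n must be a multiple of 6*b")
    return n // (2 * b)


def throughput_mbps(n, rounds, f_mhz):
    """Estimated throughput in Mbit/s for one round per clock cycle.

    A block of n bits is produced every `rounds` cycles, so the rate is
    n * f / rounds. Load and control overhead is not modelled.
    """
    return n * f_mhz / rounds


if __name__ == "__main__":
    # Placeholder values for illustration only (not measured results).
    n, b = 96, 8          # text/key size and word size in bits
    rounds = 93           # assumed round count
    f_mhz = 100.0         # assumed clock frequency
    print("words per branch:", words_per_branch(n, b))
    print("throughput (Mbit/s):", round(throughput_mbps(n, rounds, f_mhz), 1))
```

The formula shows why the design trades speed for area: the hardware for a single round is reused in every cycle, so doubling the number of rounds halves the throughput at a fixed clock frequency.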
FLOOR PLANNING
61
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8
Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB
Maping Report
Device utilization summary
Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0
Number of SLICEs 2 out of 192 1
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0
The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707
62
KEY GENERATION
RTL SCHEMATIC
63
GATE LEVEL
64
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
65
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
66
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
69
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
70
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
71
DECRYPTION
GATE LEVEL
72
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
73
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
74
75
ADVANTAGES
SEA is parametric in text key and processor size
It is a low cost encryption routine targeted for the processors with limited instruction set
It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size
It is also used in applications where the same constrained device has to perform both encryption and decryption
APPLICATIONS
This is a low-cost encryption routine basically designed for processors with a limited instruction set
In wireless communication and mobile computing and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDrsquos
76
CONCLUSION
SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations
77
Bibliography
Reference books
Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian
A VHDL Primer J Bhaskar
Digital Design Morris Mano
Data and Computer Communications William Stalling
Computer Networks Andrew S Tannenbaum
Network Cryptology William Stalling
Reference Websites
IEEE Transactions wwwwikipediacom
wwwwebopediacom
78
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Number of 4 input LUTs 3 out of 384 1Logic Distribution Number of occupied Slices 2 out of 192 1 Number of Slices containing only related logic 2 out of 2 100 Number of Slices containing unrelated logic 0 out of 2 0 See NOTES below for an explanation of the effects of unrelated logicTotal Number of 4 input LUTs 3 out of 384 1 Number of bonded IOBs 7 out of 86 8
Total equivalent gate count for design 18Additional JTAG gate count for IOBs 336Peak Memory Usage 56 MB
Maping Report
Device utilization summary
Number of External IOBs 7 out of 86 8 Number of LOCed External IOBs 0 out of 7 0
Number of SLICEs 2 out of 192 1
The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is 0
The AVERAGE CONNECTION DELAY for this design is 0871 The MAXIMUM PIN DELAY IS 1512
The AVERAGE CONNECTION DELAY on the 10 WORST NETS is 0707
62
KEY GENERATION
RTL SCHEMATIC
63
GATE LEVEL
64
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
65
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
66
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
69
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
70
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
71
DECRYPTION
GATE LEVEL
72
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
73
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
74
75
ADVANTAGES
SEA is parametric in text key and processor size
It is a low cost encryption routine targeted for the processors with limited instruction set
It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size
It is also used in applications where the same constrained device has to perform both encryption and decryption
APPLICATIONS
This is a low-cost encryption routine basically designed for processors with a limited instruction set
In wireless communication and mobile computing and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDrsquos
76
CONCLUSION
SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations
77
Bibliography
Reference books
Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian
A VHDL Primer J Bhaskar
Digital Design Morris Mano
Data and Computer Communications William Stalling
Computer Networks Andrew S Tannenbaum
Network Cryptology William Stalling
Reference Websites
IEEE Transactions wwwwikipediacom
wwwwebopediacom
78
KEY GENERATION
RTL SCHEMATIC
63
GATE LEVEL
64
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
65
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
66
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
69
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
70
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
71
DECRYPTION
GATE LEVEL
72
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name : decryption.prj
Input Format : mixed
Ignore Synthesis Constraint File : NO
Verilog Include Directory :

---- Target Parameters
Output File Name : decryption
Output Format : NGC
Target Device : xc2s15-6-cs144

---- Source Options
Top Module Name : decryption
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : lut
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 100
Add Generic Clock Buffer(BUFG) : 4
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : _
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5

---- Other Options
lso : decryption.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
Optimize Instantiated Primitives : NO
Translation Report
NGDBUILD Design Results Summary:
Number of errors: 0
Number of warnings: 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in text, key, and processor size.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted at any given processor, with the security of the cipher adapted as a function of its key size.
It suits applications where the same constrained device has to perform both encryption and decryption.
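The first advantage, the parametric design, can be illustrated with a minimal sketch of how one parameter set might be represented in software. The class name `SeaParams`, its field names, and the derived `words_per_branch` value are hypothetical and not taken from this report. The sketch assumes a single size n for both plaintext and key, and that the block splits into two equal branches of b-bit words; check both assumptions against the SEA specification before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeaParams:
    """Hypothetical container for one SEA parameter set (illustration only)."""

    n: int  # plaintext and key size in bits (assumed to be the same value)
    b: int  # processor (word) size in bits

    def __post_init__(self) -> None:
        if self.n <= 0 or self.b <= 0:
            raise ValueError("n and b must be positive")
        # Assumption: the block is split into two equal branches of b-bit words.
        if self.n % (2 * self.b) != 0:
            raise ValueError("n must be a multiple of 2*b")

    @property
    def words_per_branch(self) -> int:
        """Number of b-bit words in each of the two assumed branches."""
        return self.n // (2 * self.b)


params = SeaParams(n=96, b=8)
print(params.words_per_branch)  # 96 / (2 * 8) = 6
```

Changing the text, key, or word size then amounts to constructing a different `SeaParams` value, which mirrors the way a generic hardware description is re-parameterised without editing its body.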
APPLICATIONS
SEA is a low-cost encryption routine designed primarily for processors with a limited instruction set. It can be applied:
In wireless communication, mobile computing, and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDs
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted for small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity. Based on the pseudocode provided in this paper, it is expected that the cipher can be implemented in assembly within a few hours. We note finally that the design criteria of SEAn,b do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Scope for further research includes low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
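The trade-off between area and throughput follows from the loop architecture itself: with one round per clock cycle, a block takes as many cycles as the cipher has rounds. The sketch below works through that arithmetic. The function name and the example figures (96-bit block, 96 rounds, 100 MHz clock) are hypothetical and are not results from this report, and cycles spent loading or unloading data are ignored.

```python
def loop_throughput_bps(block_bits: int, rounds: int, clock_hz: float) -> float:
    """Estimate the throughput of a loop architecture running one round per cycle."""
    if block_bits <= 0 or rounds <= 0 or clock_hz <= 0:
        raise ValueError("all arguments must be positive")
    # One round per clock cycle, so a block needs `rounds` cycles.
    # Assumption: no extra cycles for loading or unloading the block.
    cycles_per_block = rounds
    return block_bits * clock_hz / cycles_per_block


# Hypothetical figures chosen for easy arithmetic, not measured values.
mbps = loop_throughput_bps(block_bits=96, rounds=96, clock_hz=100e6) / 1e6
print(mbps)  # 100.0
```

Because the round count sits in the denominator, a cipher that needs many simple rounds can stay very small in area while delivering a lower throughput than a design that unrolls or pipelines its rounds.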
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
69
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
70
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
71
DECRYPTION
GATE LEVEL
72
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
73
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
74
75
ADVANTAGES
SEA is parametric in text key and processor size
It is a low cost encryption routine targeted for the processors with limited instruction set
It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size
It is also used in applications where the same constrained device has to perform both encryption and decryption
APPLICATIONS
This is a low-cost encryption routine basically designed for processors with a limited instruction set
In wireless communication and mobile computing and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDrsquos
76
CONCLUSION
SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations
77
Bibliography
Reference books
Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian
A VHDL Primer J Bhaskar
Digital Design Morris Mano
Data and Computer Communications William Stalling
Computer Networks Andrew S Tannenbaum
Network Cryptology William Stalling
Reference Websites
IEEE Transactions wwwwikipediacom
wwwwebopediacom
78
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name keygenblockprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory ---- Target ParametersOutput File Name keygenblockOutput Format NGCTarget Device xc2s15-6-cs144 ---- Source OptionsTop Module Name keygenblockAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
65
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
66
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
69
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
70
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
71
DECRYPTION
GATE LEVEL
72
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
73
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
74
75
ADVANTAGES
SEA is parametric in text key and processor size
It is a low cost encryption routine targeted for the processors with limited instruction set
It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size
It is also used in applications where the same constrained device has to perform both encryption and decryption
APPLICATIONS
This is a low-cost encryption routine basically designed for processors with a limited instruction set
In wireless communication and mobile computing and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDrsquos
76
CONCLUSION
SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations
77
Bibliography
Reference books
Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian
A VHDL Primer J Bhaskar
Digital Design Morris Mano
Data and Computer Communications William Stalling
Computer Networks Andrew S Tannenbaum
Network Cryptology William Stalling
Reference Websites
IEEE Transactions wwwwikipediacom
wwwwebopediacom
78
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso keygenblocklsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
TRANSLATION REPORT
Release 61i - ngdbuild G23Copyright (c) 1995-2003 Xilinx Inc All rights reserved
Command Line ngdbuild -intstyle ise -dd cxilinxbinvasu_ngo -i -pxc2s15-cs144-6 keygenblockngc keygenblockngd
Reading NGO file cxilinxbinvasukeygenblockngc Reading component libraries for design expansion
Checking timing specifications Checking expanded design
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
Writing NGD file keygenblockngd
Writing NGDBUILD log file keygenblockbld
66
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
69
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
70
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
71
DECRYPTION
GATE LEVEL
72
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
73
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
74
75
ADVANTAGES
SEA is parametric in text key and processor size
It is a low cost encryption routine targeted for the processors with limited instruction set
It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size
It is also used in applications where the same constrained device has to perform both encryption and decryption
APPLICATIONS
This is a low-cost encryption routine basically designed for processors with a limited instruction set
In wireless communication and mobile computing and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDrsquos
76
CONCLUSION
SEAnb is a scalable encryption algorithm targeted for small embedded applications The plaintext size key size and processor (or word) size are parameters of the design The structure of SEAnb allows a fast evaluation of the cipher efficiency on any RISC machine Its typical performances (encryption + decryption) for present key sizes and processors (eg 128-bit key 1 Mhz 8-bit RISC) are in the range of an encryptiondecryption in a few milliseconds using a few hundreds bytes of ROM One additional advantage of the design is its extreme simplicity Based on the pseudo code provided in this paper it is expected that the implementation of the cipher in assembly can be done within a few hours We note finally that the design criteria of SEAnb do not make it a conservative algorithm by nature Further cryptanalysis efforts are consequently required
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters The presented parametric architecture allows keeping the flexibility of the algorithm by takingadvantage of generic VHDL coding It executes one round per clock cycle computes the round and the key round in parallel and supports both encryption and decryption at a minimal cost Compared to other recent block ciphers SEA exhibits a very small area utilization that comes at the cost of a reduced throughput Consequently it can be considered as an interesting alternative for constrained environments Scopes for further research include low power ASIC implementations purposed for RFIDs as well as further cryptanalysis efforts and security evaluations
77
Bibliography
Reference books
Basic VLSI design 3rd Edition Douglas APucknell Kamran Eshraghian
A VHDL Primer J Bhaskar
Digital Design Morris Mano
Data and Computer Communications William Stalling
Computer Networks Andrew S Tannenbaum
Network Cryptology William Stalling
Reference Websites
IEEE Transactions wwwwikipediacom
wwwwebopediacom
78
MAPPING REPORT
Design Summary--------------Number of errors 0Number of warnings 0Logic Utilization Total Number Slice Registers 419 out of 384 109 (OVERMAPPED) Number used as Flip Flops 415 Number used as Latches 4 Number of 4 input LUTs 1016 out of 384 264 (OVERMAPPED)Logic Distribution Number of occupied Slices 665 out of 192 346(OVERMAPPED) Number of Slices containing only related logic 648 out of 665 97 Number of Slices containing unrelated logic 17 out of 665 2 See NOTES below for an explanation of the effects of unrelated logicTotal Number 4 input LUTs 1066 out of 384 277 (OVERMAPPED) Number used as logic 1016 Number used as a route-thru 50 Number of bonded IOBs 1060 out of 86 1232 (OVERMAPPED) IOB Flip Flops 960 Number of GCLKs 1 out of 4 25 Number of GCLKIOBs 1 out of 4 25
Total equivalent gate count for design 17572Additional JTAG gate count for IOBs 50928Peak Memory Usage 72 MB
ENCRYPTION
67
RTL SCHEMATIC
68
GATE LEVEL
SYNTHESIS REPORT
69
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name encryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name encryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name encryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
70
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso encryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 42092 kilobytes
71
DECRYPTION
GATE LEVEL
72
SYNTHESIS REPORT
=========================================================================
Synthesis Options Summary
=========================================================================
---- Source Parameters
Input File Name                    : decryption.prj
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO
Verilog Include Directory          :

---- Target Parameters
Output File Name                   : decryption
Output Format                      : NGC
Target Device                      : xc2s15-6-cs144

---- Source Options
Top Module Name                    : decryption
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
ROM Style                          : Auto
Mux Extraction                     : YES
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
Resource Sharing                   : YES
Multiplier Style                   : lut
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 100
Add Generic Clock Buffer(BUFG)     : 4
Register Duplication               : YES
Equivalent register Removal        : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
Global Optimization                : AllClockNets
RTL Output                         : Yes
Write Timing Constraints           : NO
Hierarchy Separator                : _
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : decryption.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
Optimize Instantiated Primitives   : NO

Translation Report
NGDBUILD Design Results Summary
Number of errors                   : 0
Number of warnings                 : 0
Total memory usage is 45122 kilobytes
ADVANTAGES
SEA is parametric in the text, key, and processor sizes.
It is a low-cost encryption routine targeted at processors with a limited instruction set.
It is a small encryption routine that can be targeted to any given processor, with the security of the cipher adapted as a function of its key size.
It is well suited to applications where the same constrained device has to perform both encryption and decryption.
APPLICATIONS
On processors with a limited instruction set, for which this low-cost encryption routine was originally designed
In wireless communication, mobile computing, and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDs
CONCLUSION
SEA_{n,b} is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEA_{n,b} allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. One additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. We note finally that the design criteria of SEA_{n,b} do not make it a conservative algorithm by nature. Further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture keeps the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at a minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization that comes at the cost of a reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Directions for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
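The loop architecture summarized in the conclusion (one round per clock cycle, round and key round computed together, shared encryption/decryption datapath) can be illustrated with a toy software model. The sketch below is only an illustration under stated assumptions: the round function `toy_f` and the rotating key register are placeholders and are not SEA's actual round or key schedule, and all function names and the figures in the usage lines are hypothetical rather than taken from the report. It shows why a Feistel loop can decrypt with the same round function, and how the cycle count of such a loop relates to throughput.

```python
def rotl(x: int, r: int, width: int) -> int:
    """Rotate a width-bit value left by r bits."""
    r %= width
    mask = (1 << width) - 1
    return ((x << r) | (x >> (width - r))) & mask


def rotr(x: int, r: int, width: int) -> int:
    """Rotate a width-bit value right by r bits."""
    return rotl(x, width - (r % width), width)


def toy_f(x: int, k: int, w: int) -> int:
    """Placeholder round function (NOT SEA's): add the key mod 2^w, rotate by 3."""
    return rotl((x + k) & ((1 << w) - 1), 3, w)


def feistel_loop(block: int, key: int, n: int, rounds: int,
                 decrypt: bool = False) -> int:
    """Model a loop architecture: each iteration stands for one clock cycle.

    The data registers (left, right) and the key register are updated in the
    same iteration, mimicking a round and a key round computed in parallel.
    The key register here is a simple rotating register (an assumption made
    for this sketch only).
    """
    w = n // 2
    mask = (1 << w) - 1
    left, right = block >> w, block & mask
    # Decryption starts from the key state used by the last encryption round.
    kreg = rotl(key, rounds - 1, n) if decrypt else key
    for _ in range(rounds):
        rk = kreg & mask  # round key taken from the low half of the register
        if decrypt:
            left, right = right ^ toy_f(left, rk, w), left
            kreg = rotr(kreg, 1, n)
        else:
            left, right = right, left ^ toy_f(right, rk, w)
            kreg = rotl(kreg, 1, n)
    return (left << w) | right


def throughput_bps(n: int, rounds: int, f_hz: float) -> float:
    """Throughput of a one-round-per-cycle loop: n bits every `rounds` cycles."""
    return n * f_hz / rounds


if __name__ == "__main__":
    # Hypothetical parameters for illustration only, not measured results.
    n, rounds = 96, 32
    key = 0x0123456789ABCDEF01234567
    plaintext = 0xFEDCBA9876543210FEDCBA98
    ciphertext = feistel_loop(plaintext, key, n, rounds)
    recovered = feistel_loop(ciphertext, key, n, rounds, decrypt=True)
    print(hex(ciphertext), recovered == plaintext)
    print(throughput_bps(n, rounds, 50e6), "bit/s at an assumed 50 MHz clock")
```

Because the same `toy_f` serves both directions, only the register update order and the key register direction change between encryption and decryption, which is the software analogue of sharing the round logic in hardware. The throughput expression also makes the area/throughput trade-off visible: a loop design reuses one round per cycle, so throughput falls as the round count grows.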
SYNTHESIS REPORT
========================================================================= Synthesis Options Summary =========================================================================---- Source ParametersInput File Name decryptionprjInput Format mixedIgnore Synthesis Constraint File NOVerilog Include Directory
---- Target ParametersOutput File Name decryptionOutput Format NGCTarget Device xc2s15-6-cs144
---- Source OptionsTop Module Name decryptionAutomatic FSM Extraction YESFSM Encoding Algorithm AutoFSM Style lutRAM Extraction YesRAM Style AutoROM Extraction YesROM Style AutoMux Extraction YESMux Style AutoDecoder Extraction YESPriority Encoder Extraction YESShift Register Extraction YESLogical Shifter Extraction YESXOR Collapsing YESResource Sharing YESMultiplier Style lutAutomatic Register Balancing No
---- Target OptionsAdd IO Buffers YESGlobal Maximum Fanout 100Add Generic Clock Buffer(BUFG) 4Register Duplication YESEquivalent register Removal YESSlice Packing YESPack IO Registers into IOBs auto
73
---- General OptionsOptimization Goal SpeedOptimization Effort 1Keep Hierarchy NOGlobal Optimization AllClockNetsRTL Output YesWrite Timing Constraints NOHierarchy Separator _Bus Delimiter ltgtCase Specifier maintainSlice Utilization Ratio 100Slice Utilization Ratio Delta 5
---- Other Optionslso decryptionlsoRead Cores YEScross_clock_analysis NOverilog2001 YESOptimize Instantiated Primitives NO
Translation Report
NGDBUILD Design Results Summary Number of errors 0 Number of warnings 0
Total memory usage is 45122 kilobytes
74
75
ADVANTAGES
SEA is parametric in text key and processor size
It is a low cost encryption routine targeted for the processors with limited instruction set
It is a small encryption routine targeted to any given processor the security of the cipher being adapted in function of its key size
It is also used in applications where the same constrained device has to perform both encryption and decryption
APPLICATIONS
This is a low-cost encryption routine basically designed for processors with a limited instruction set
In wireless communication and mobile computing and networking systems
For the encryption of JPEG2000 images
In scalable video coding
In sensor networks and RFIDrsquos
CONCLUSION
SEAn,b is a scalable encryption algorithm targeted at small embedded applications. The plaintext size, key size, and processor (or word) size are parameters of the design. The structure of SEAn,b allows a fast evaluation of the cipher's efficiency on any RISC machine. Its typical performance (encryption + decryption) for present key sizes and processors (e.g., a 128-bit key on a 1 MHz 8-bit RISC) is in the range of one encryption/decryption in a few milliseconds, using a few hundred bytes of ROM. An additional advantage of the design is its extreme simplicity: based on the pseudocode provided in this paper, the cipher is expected to be implementable in assembly within a few hours. Finally, we note that the design criteria of SEAn,b do not make it a conservative algorithm by nature; further cryptanalysis efforts are consequently required.
This paper presented FPGA implementations of a scalable encryption algorithm for various sets of parameters. The presented parametric architecture preserves the flexibility of the algorithm by taking advantage of generic VHDL coding. It executes one round per clock cycle, computes the round and the key round in parallel, and supports both encryption and decryption at minimal cost. Compared to other recent block ciphers, SEA exhibits a very small area utilization, which comes at the cost of reduced throughput. Consequently, it can be considered an interesting alternative for constrained environments. Directions for further research include low-power ASIC implementations intended for RFIDs, as well as further cryptanalysis efforts and security evaluations.
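Because the loop architecture executes one round per clock cycle, its throughput can be estimated from the block size, the number of rounds, and the clock frequency. The following Python function is a rough sketch under that assumption only. The function name and the example figures are hypothetical, not measured results, and any cycles spent loading or unloading data are ignored.

```python
def loop_throughput_bps(block_bits: int, rounds: int, clock_hz: float) -> float:
    """Estimate throughput (bits/s) of a one-round-per-cycle loop architecture.

    One block of `block_bits` bits is assumed to complete every `rounds`
    clock cycles, so throughput = block_bits * clock_hz / rounds.
    """
    if block_bits <= 0 or rounds <= 0 or clock_hz <= 0:
        raise ValueError("all arguments must be positive")
    return block_bits * clock_hz / rounds


if __name__ == "__main__":
    # Illustrative values only: a 96-bit block, 96 rounds, 100 MHz clock.
    print(loop_throughput_bps(96, 96, 100_000_000))
```

The formula makes the area/throughput trade-off explicit: a loop architecture reuses a single round function, so throughput falls in proportion to the number of rounds while the area stays close to that of one round.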