2005 IEEE International Symposium on Signal Processing and Information Technology, 0-7803-9314-7/05/$20.00 ©2005 IEEE


Cryptography: Circuits and Systems Approach

O. Koufopavlou, G. Selimis, N. Sklavos and P. Kitsos

VLSI Design Lab, Electrical and Computer Engineering Department, University of Patras, Rio 26500, Patras, Greece

Email: [email protected]

Abstract- More and more sensitive data is stored digitally today. Bank accounts, medical records and personal emails are some of the categories of data that must be kept secure. The science of cryptography addresses this lack of security. Data confidentiality, authentication, non-repudiation and data integrity are some of its main goals. The evolution of cryptography has led to very complex cryptographic models that could not have been implemented a few years ago. Systems of increasing complexity, which are usually more secure, tend to have lower throughput and higher energy consumption. However, the evolution of a cipher has no practical impact if it has only a theoretical background. Every encryption algorithm should exploit the conditions of the specific system as far as possible without neglecting the physical, area and timing limitations. This requires new design architectures for secure and reliable crypto systems. A main issue in the design of crypto systems is the reduction of power consumption, especially for portable systems such as smart cards.

Keywords-cryptography, secret key algorithms, public key, low power, VLSI, hardware

I. INTRODUCTION

Cryptography is one of the basic countermeasures against system attacks. The fundamental objective of cryptography is to enable two people, usually referred to as Alice and Bob, to communicate over an insecure channel in such a way that an opponent, Oscar, cannot understand what is being said. What makes cryptography a complicated subject of study is that each application requires a different approach. Designers try to find a golden ratio among a number of system parameters without making concessions on security. Often the resulting designs are inefficient. As the requirements for wireless portable systems keep increasing, the difficulties in hardware design grow.

Wireless communications have become a very attractive sector for the provision of electronic services. Mobile networks are available almost anytime and anywhere, and user acceptance of wireless hand-held devices is high. The services offered are increasing strongly because of the wide range of users’ needs.

Today, wireless communication protocols specify security layers that provide a high level of security. These security layers use encryption algorithms that in many cases have proved unsuitable and outdated for hardware implementation.

The evolution of portable devices requires a complete security system that satisfies the demands of fast and secure transactions. Unfortunately, software-based approaches, especially for public-key cryptography, lead to slow and very inefficient implementations, so supplementary hardware is essential. In addition, next-generation portable devices embed wireless communication protocols and can operate only under extremely low power conditions, so power management is required to support cryptographic capabilities.

This paper is organized as follows. Section II briefly describes the main cryptographic categories and their implementation characteristics. Section III presents low power design approaches, and Section IV presents known low power implementations. New design approaches are given in Section V. Finally, conclusions and observations are discussed in the last section.

II. CRYPTOGRAPHIC ALGORITHMS

In recent years many cryptographic implementations have been proposed, in software or in hardware [2], [3], [4]. The choice of implementation depends on the application and on the algorithm to be implemented. Hardware implementations are more expensive, but they perform better in terms of throughput and power.

A. Secret key Algorithms

DES, Triple DES and, in recent years, AES are commonly found in many cryptographic systems. Software implementations are 10 to 100 times slower than hardware implementations.

B. Public-key encryption algorithms

These are based on modular multiplication, for example RSA, Diffie-Hellman (DH), or the Digital Signature Standard (DSS). RSA signatures and verifications are supported with a choice of 512-, 768-, or 1024-bit key lengths. The algorithms typically use the Chinese Remainder Theorem (CRT) to speed up processing.
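The CRT speed-up mentioned above can be illustrated with a minimal Python sketch. The key below is a toy (the primes 61 and 53 and the exponent 17 are assumptions chosen only for readability), and the function names are hypothetical. The point is that the private-key operation is split into two exponentiations with half-size moduli and exponents, which are then recombined.

```python
# Toy RSA parameters (far too small for real use), chosen only for illustration.
p, q = 61, 53
n = p * q                              # public modulus
e = 17                                 # public exponent
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (Python 3.8+)

# CRT precomputation, done once per key.
dp = d % (p - 1)
dq = d % (q - 1)
q_inv = pow(q, -1, p)


def rsa_private_plain(c: int) -> int:
    """Private-key operation with one full-size modular exponentiation."""
    return pow(c, d, n)


def rsa_private_crt(c: int) -> int:
    """Same result from two half-size exponentiations plus a recombination."""
    m1 = pow(c, dp, p)                 # result modulo p
    m2 = pow(c, dq, q)                 # result modulo q
    h = (q_inv * (m1 - m2)) % p        # Garner's recombination step
    return m2 + h * q


message = 65
ciphertext = pow(message, e, n)
recovered = rsa_private_crt(ciphertext)
```

Because each half-size exponentiation works on operands of half the length, the CRT form is commonly estimated to be about four times faster in software; the exact gain depends on the multiplier implementation.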

The use of public-key algorithms based on elliptic curves is quite novel and not yet widespread. Two commonly used types of curve determine the need for computing power: curves over GF(p) (a Galois field over the prime p), which require resources similar to those for standard public-key cryptography; and curves over GF(2^n) (a Galois field of polynomials of size n), whose computations do not require carries (addition and subtraction are an XOR, and multiplication is done without internal carries).
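The carry-free arithmetic of GF(2^n) can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the example uses the small field GF(2^8) with the AES reduction polynomial rather than the much larger fields used for elliptic curves.

```python
def gf2_add(a: int, b: int) -> int:
    """Addition and subtraction in GF(2^n) are the same operation: XOR."""
    return a ^ b


def gf2_mul(a: int, b: int, modulus: int, n: int) -> int:
    """Shift-and-XOR multiplication in GF(2^n).

    `a` and `b` are field elements below 2**n; `modulus` is the irreducible
    polynomial of degree n, written as an integer with bit n set.
    """
    result = 0
    for _ in range(n):
        if b & 1:
            result ^= a            # "add" the current multiple without carries
        b >>= 1
        a <<= 1                    # multiply a by x
        if (a >> n) & 1:
            a ^= modulus           # reduce as soon as the degree reaches n
    return result


# Example in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
product = gf2_mul(0x57, 0x83, 0x11B, 8)
```

No carry chain appears anywhere in the loop, which is why hardware multipliers for GF(2^n) tend to be smaller and faster than integer multipliers of the same width.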

C. Hashing algorithms

Commonly found hashing algorithms include SHA-1 and MD5. The main role of a cryptographic hash function is in the provision of digital signatures. Since hash functions are generally faster than digital signature algorithms, it is typical to compute the digital signature of a document by signing its hash value, which is small compared to the document itself.
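The hash-then-sign idea can be sketched as follows. This is a toy under stated assumptions: the RSA key is built from two small Mersenne primes, no padding scheme is applied, and the digest is reduced modulo N so that it fits the toy modulus. The names `sign` and `verify` are hypothetical, and SHA-1 is used only because the text names it.

```python
import hashlib

# Toy textbook-RSA key (assumption for illustration: no padding, tiny modulus).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)


def digest_as_int(document: bytes) -> int:
    """Hash the whole document down to a fixed 20-byte SHA-1 value."""
    digest = hashlib.sha1(document).digest()
    return int.from_bytes(digest, "big") % N   # reduced only to fit the toy key


def sign(document: bytes) -> int:
    """Sign the short hash value instead of the long document."""
    return pow(digest_as_int(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Recompute the hash and compare it with the opened signature."""
    return pow(signature, E, N) == digest_as_int(document)


document = b"x" * 1_000_000            # a 1 MB document
signature = sign(document)
```

However long the document is, the expensive public-key operation always processes a value of fixed, small size; only the fast hash function touches the full data.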

D. Random numbers

Random numbers are always required for cryptographic procedures. Smart cards require random numbers for:

1. Key generation to authenticate the card and terminal;

2. Creating padding bytes and blinding values for encryption, and as initial values for transmission sequence counters; and

3. Implementation of algorithmic counter-measures against side-channel attacks.
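The three uses listed above can be sketched in software with Python’s `secrets` module, which draws from the operating system’s cryptographically secure generator. A smart card would use an on-chip hardware generator instead; the function names and sizes below are assumptions made for illustration.

```python
import secrets


def new_session_key(num_bytes: int = 16) -> bytes:
    """Fresh key material, e.g. for card/terminal authentication."""
    return secrets.token_bytes(num_bytes)


def random_padding(message: bytes, block_size: int = 8) -> bytes:
    """Pad a message with random bytes up to a multiple of the block size."""
    missing = -len(message) % block_size
    return message + secrets.token_bytes(missing)


def blinding_value(modulus: int) -> int:
    """Random value in [2, modulus - 1], usable to blind a private-key operation."""
    return 2 + secrets.randbelow(modulus - 2)


def initial_sequence_counter(bits: int = 32) -> int:
    """Unpredictable starting value for a transmission sequence counter."""
    return secrets.randbits(bits)


key = new_session_key()
padded = random_padding(b"hello")
blind = blinding_value(3233)
counter = initial_sequence_counter()
```

The general-purpose `random` module is deliberately not used here, since its output is predictable and unsuitable for any of these purposes.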

III. LOW POWER DESIGN APPROACHES IN CRYPTOGRAPHY

Cryptographic algorithms are commonly implemented as software running on a CPU, microcontroller, or DSP, as a fixed function ASIC, or within a programmable logic device (FPGA). The pros and cons of these various methods are briefly discussed below.

A. ASIC Approach

Low-power ASIC design appears to be more heavily researched than high performance ASIC design [1], perhaps because low-power techniques are used in many areas besides cryptographic algorithm design. Typical methodologies include:

Clock gating: The clock tree in a synchronously designed device can consume a large fraction of a chip’s overall energy usage; isolating unused logic sections of the chip on a clock-by-clock basis can significantly reduce this problem.

Asynchronous logic design: Asynchronous design techniques (based on ‘request’/‘acknowledge’ handshaking signals rather than clock edges) tend to produce lower power designs, and occasionally provide additional throughput as well.

Variable voltage logic supplies: While halving an IC’s supply voltage will tend to make it about twice as slow, its power consumption will typically drop to a quarter or less, because dynamic power scales with the square of the supply voltage. In systems that have variable encryption rate needs, this fact can be exploited to reduce overall power requirements very significantly.

Glitch reduction: Since digital gates only dissipate (significant) power when switching, ‘stage to stage’ coupling (e.g., pipeline stage interconnects) should be designed to produce as few spurious glitches as possible; this can often be accomplished with some redundant gates. This solution is somewhat similar to ‘clock gating,’ but on a smaller scale.

Parallelism and pipelining: Some algorithms are amenable to having their constituent operations computed in parallel or in a fully pipelined order. This is often a ‘win’ in that throughput is increased and, at least initially, until control circuitry complexity becomes large, overall power consumption drops. Unfortunately, pipelining does not work with ‘chaining’ variants of encryption algorithms (e.g., for DES, only the electronic codebook, ECB, mode works with pipelining).

Functional optimizations: The system architecture itself is optimised for low power.
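The effect of supply voltage scaling in the list above can be made concrete with the usual first-order model of CMOS dynamic power, P = a · C · Vdd² · f. The sketch below is an approximation under that assumption; the capacitance, voltage and frequency values are illustrative and not taken from any cited design.

```python
def dynamic_power(c_eff: float, vdd: float, freq: float, activity: float = 1.0) -> float:
    """First-order dynamic power: P = activity * C * Vdd^2 * f (watts)."""
    return activity * c_eff * vdd ** 2 * freq


def energy_per_operation(c_eff: float, vdd: float, activity: float = 1.0) -> float:
    """Switching energy of one operation: E = activity * C * Vdd^2 (joules)."""
    return activity * c_eff * vdd ** 2


C_EFF = 1e-9          # assumed effective switched capacitance (farads)
F_FULL = 8e6          # assumed clock at full supply voltage (hertz)

# Halving the supply voltage quarters the energy spent per operation.
energy_ratio = energy_per_operation(C_EFF, 2.5) / energy_per_operation(C_EFF, 5.0)

# If the clock is also halved, because the logic is about twice as slow,
# the power falls to one eighth in this simple model.
power_ratio = dynamic_power(C_EFF, 2.5, F_FULL / 2) / dynamic_power(C_EFF, 5.0, F_FULL)
```

The model ignores leakage and short-circuit currents, so it is only a guide to how much can be saved when the required encryption rate drops.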

B. FPGA Approach

Only a very small subset of FPGAs is suitable for use in, e.g., battery powered devices. Nevertheless, low-power design for FPGAs encompasses the same criteria as for ASICs, so long as the chip ‘allows’ it. For example, no commercial FPGA yet supports large-circuit asynchronous design techniques, and clock gating cannot be performed with a granularity of one clock cycle.

C. Software Approach

There are attempts to improve software implementations. To this end the processor has to include instructions for rotating the contents of a register and for cryptographic functions such as bit permutation, expansion and substitution, enabled by a configuration register that determines how the bits of a register are placed; fast memory accesses are also required. Low power techniques are applied to all of these instructions.

IV. KNOWN LOW POWER IMPLEMENTATIONS IN CRYPTOGRAPHY

A. Asynchronous VLSI implementations

Asynchronous VLSI implementations of the cryptographic algorithms IDEA and DES are presented in [5] and [6]. In [5] an asynchronous VLSI implementation of the International Data Encryption Algorithm (IDEA) is presented. To evaluate the asynchronous design, a synchronous version of the algorithm was also designed. The algorithm was described in VHDL, and the VHDL code was synthesized with commercially available Synopsys tools. After placement and routing, both designs were fabricated in 0.6 um CMOS technology. With a system clock of up to 8 MHz and a 5 V power supply, the two chips were tested and evaluated against the software implementation of the IDEA algorithm. The results show that the asynchronous implementation consumes less power than the synchronous one. The asynchronous chip is therefore well suited to WEP (Wireless Encryption Protocols) and high speed networks.

In [6] the authors designed an asynchronous Data Encryption Standard (DES) encryption chip. Many cryptographic applications demand both high speed and low power, and an asynchronous hardware design was adopted to meet these requirements.

B. Variable Voltage Supplies

The overall architecture of the scalable encryption processor presented in [7] is shown in Figure 1.

Figure 1. The Scalable Encryption Processor

The processor consists of two main functional blocks: a variable security encryption engine, and a variable output DC/DC converter. The encryption engine utilizes an algorithm known as the Quadratic Residue Generator (QRG) to generate a cryptographically-secure pseudorandom keystream sequence that is then XORed with a serial data stream to form the encrypted data stream. The variable output DC/DC converter allows us to utilize variable supply techniques which dynamically adjust the supply voltage as the amount of computation varies in order to minimize the energy dissipation. The two blocks are coupled through the use of an external look up table (LUT) that translates the current throughput and security requirements (as specified by the Width input) into a digital word representing the desired supply voltage. The embedded DC/DC converter then translates this digital word into a pulse-width modulated (PWM) signal that is filtered through an external LC filter to create the QRG’s supply voltage. The voltage is also sampled by the converter in order to perform closed loop voltage regulation.
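The keystream path of the encryption engine can be sketched in Python, assuming the usual form of the quadratic residue generator: x(i+1) = x(i)² mod N, with N the product of two primes congruent to 3 modulo 4 and one low-order bit output per step. The tiny modulus, the seed and the function names are assumptions for illustration, not the parameters of the processor in [7].

```python
from math import gcd

# Toy modulus: both primes are congruent to 3 mod 4 (far too small for real use).
P, Q = 383, 503
MODULUS = P * Q


def qrg_bits(seed: int, modulus: int = MODULUS):
    """Yield keystream bits from repeated squaring modulo `modulus`."""
    if gcd(seed, modulus) != 1:
        raise ValueError("seed must be coprime to the modulus")
    x = seed * seed % modulus          # start from a quadratic residue
    while True:
        x = x * x % modulus
        yield x & 1                    # emit the least significant bit


def xor_with_keystream(data: bytes, seed: int, modulus: int = MODULUS) -> bytes:
    """XOR the data stream with the keystream; the same call decrypts."""
    bits = qrg_bits(seed, modulus)
    out = bytearray()
    for byte in data:
        key_byte = 0
        for _ in range(8):
            key_byte = (key_byte << 1) | next(bits)
        out.append(byte ^ key_byte)
    return bytes(out)


plaintext = b"serial data stream"
ciphertext = xor_with_keystream(plaintext, seed=2027)
decrypted = xor_with_keystream(ciphertext, seed=2027)
```

Each keystream bit costs one modular squaring whose operand width equals the modulus width, so a wider modulus means more computation per bit; this is the kind of workload variation that a variable supply voltage can exploit.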

C. Optimised Architectures in Terms of Power

In [8] the authors developed a low-power S-Box architecture: a multi-stage PPRM (Positive Polarity Reed-Muller form) architecture for compact S-Boxes. It improves on the composite field S-Box. In this S-Box the gates are arranged so that: (i) the signal arrival times at gates of the same depth from the primary inputs are as close as possible, to avoid generating dynamic hazards, and (ii) the hazard-transparent XOR gates are located after the other gates that may block the hazards, to avoid the propagation of dynamic hazards. The multi-stage PPRM S-Box achieves the lowest power consumption, 29 μW at 10 MHz in 0.13 μm 1.5 V CMOS technology, and its circuit size is still much smaller than that of conventional S-Box implementations, whose power consumption is around 140 μW.

V. NEW APPROACHES IN CRYPTOGRAPHY

A. Basic operations in Cryptography for Cryptographic Processors

Table 1 [10] shows that eight types of operations are used in block ciphers: arithmetic operations, logical operations, multiplication, load, extract, concatenation, rotation, and bit permutation.

The arithmetic operations include addition and subtraction. The logical operations include AND, OR, NOT and XOR. Both types of operations are normally performed by the arithmetic and logic unit (ALU) of a processor. Multiplication is listed separately because it is more expensive to implement than addition or subtraction, and many processors do not even support it in hardware.

The load operation fetches data from memory. In a block cipher, it loads subkeys or performs the substitutions defined by S-Boxes.

Extract is an operation that pulls out a consecutive block of bits from a register. It is used when only a subset of the bits in a register needs to be processed. An extract operation can be represented as Rd = Extract(Rs, s, k), where Rs is the source operand and Rd is the result. Let Rs[i] denote bit i of an n-bit register Rs, with bit 0 the least significant bit. The k consecutive bits of Rs starting from position s are extracted into Rd: bit s of Rs becomes the least significant bit of Rd, and the bits of Rd that are not extracted from Rs are set to 0.
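The extract operation described above maps onto a shift and a mask. The sketch below is one possible software model; the function name and the default word size of 32 bits are assumptions.

```python
def extract(rs: int, s: int, k: int, n: int = 32) -> int:
    """Model of Rd = Extract(Rs, s, k) on an n-bit register.

    Bits s .. s+k-1 of rs are moved to bits 0 .. k-1 of the result;
    all other result bits are 0.
    """
    rs &= (1 << n) - 1                 # keep only the n register bits
    return (rs >> s) & ((1 << k) - 1)


# Bits 2..5 of 0b10110110 are 1101.
field = extract(0b10110110, 2, 4)
```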

Concatenation combines the bits in two registers. It can be represented with Rd = Concat(Rs1, n1, Rs2, n2), where Rs1 and Rs2 are two registers to be combined; n1 and n2 specify how many consecutive bits from the lower end are valid in Rs1 and Rs2, respectively. Only the valid bits in Rs1 and Rs2 are placed in the result. If the sum of n1 and n2 exceeds the word size n, only the lower n bits are placed in Rd.
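A software model of the concatenation operation might look as follows. The text does not say which operand ends up in the upper part of the result; the sketch assumes that the valid bits of Rs1 are placed above those of Rs2. The function name and the default word size are also assumptions.

```python
def concat(rs1: int, n1: int, rs2: int, n2: int, n: int = 32) -> int:
    """Model of Rd = Concat(Rs1, n1, Rs2, n2) on n-bit registers.

    Assumption: the n1 valid low bits of rs1 form the upper part of the
    result and the n2 valid low bits of rs2 form the lower part.
    """
    high = rs1 & ((1 << n1) - 1)       # valid bits of Rs1
    low = rs2 & ((1 << n2) - 1)        # valid bits of Rs2
    combined = (high << n2) | low
    return combined & ((1 << n) - 1)   # only the lower n bits are kept


joined = concat(0b101, 3, 0b01, 2)                 # 0b10101
truncated = concat(0xFFFF, 16, 0xABCDE, 20)        # 36 bits cut to 32
```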

The operations in block ciphers are performed on data blocks of different sizes. For instance, S-Boxes normally operate on small blocks, while permutations operate on larger blocks. The extract and concatenation operations are used to divide data blocks into smaller blocks and to combine small blocks into larger blocks, respectively.

The rotation operation moves all bits in the first operand to the left or right by an amount specified by the second operand. The bits moved beyond the word boundary are placed back at the other end. Rotation is a restricted form of bit permutation [9].
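On a processor without a rotate instruction, rotation is usually built from two shifts and an OR. A minimal sketch, with hypothetical function names and an assumed default word size of 32 bits:

```python
def rotl(x: int, r: int, n: int = 32) -> int:
    """Rotate the n-bit word x left by r positions."""
    mask = (1 << n) - 1
    x &= mask
    r %= n
    # Bits shifted out at the top re-enter at the bottom.
    return ((x << r) | (x >> (n - r))) & mask


def rotr(x: int, r: int, n: int = 32) -> int:
    """Rotate right by r, expressed as a left rotation by n - r."""
    return rotl(x, n - (r % n), n)


wrapped = rotl(0x80000001, 1)          # the top bit wraps around to bit 0
```

A dedicated rotate instruction replaces this three-operation sequence with a single one, which matters for ciphers such as RC5 and RC6 that rotate by data-dependent amounts.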

Table 2 [11] shows the most common operations used in public key cryptography. Diffie-Hellman, El-Gamal, the Digital Signature Algorithm (DSA), RSA and RSA signatures are common public key protocols. Diffie-Hellman is used in TLS/SSL and SSH. El-Gamal is used in DSS, the US federal standard for digital signatures. RSA is used in many security standards and protocols: S/MIME, IPSec, TLS/SSL, S/WAN, PKCS, IEEE P1363, and WAP. These protocols are based on the hard problems of the discrete logarithm or integer factoring. The dominant operation is integer multiplication.
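Why integer multiplication dominates can be seen from a square-and-multiply modular exponentiation, the core of Diffie-Hellman, El-Gamal, DSA and RSA. The sketch below counts the modular multiplications it performs. The 127-bit prime, the base and the two secret exponents are toy assumptions; real systems use sizes such as the 1024 bits listed in Table 2.

```python
def mod_exp(base: int, exponent: int, modulus: int):
    """Left-to-right square-and-multiply.

    Returns (base**exponent mod modulus, number of modular multiplications).
    """
    result = 1
    multiplications = 0
    for bit in bin(exponent)[2:]:
        result = result * result % modulus         # square for every bit
        multiplications += 1
        if bit == "1":
            result = result * base % modulus       # multiply for every 1 bit
            multiplications += 1
    return result, multiplications


# Toy Diffie-Hellman exchange (parameters are illustrative assumptions).
PRIME = 2**127 - 1
GENERATOR = 3
alice_secret = 0x1F2E3D4C5B6A79881726354
bob_secret = 0x0A1B2C3D4E5F60718293A4B

alice_public, _ = mod_exp(GENERATOR, alice_secret, PRIME)
bob_public, _ = mod_exp(GENERATOR, bob_secret, PRIME)

alice_shared, mults = mod_exp(bob_public, alice_secret, PRIME)
bob_shared, _ = mod_exp(alice_public, bob_secret, PRIME)
```

The multiplication count equals the bit length of the exponent plus the number of its 1 bits, so a 1024-bit exponent costs on the order of 1500 full-width modular multiplications per exponentiation.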

Elliptic Curve Cryptography offers security equal to that of the classic public key algorithms but with lower complexity. Elliptic curve schemes are based on the elliptic-curve discrete logarithm problem, and the dominant operation is polynomial multiplication. Elliptic curves can support public key protocols such as Diffie-Hellman, El-Gamal and DSA. RSA does not have an elliptic curve counterpart.

Table 1. Basic operations in block ciphers [10]. Ciphers compared: DES, AES, RC6, MARS, Serpent, Twofish, Kasumi, RC5 and IDEA.

Operation          Number of ciphers using it (of 9)
Arithmetic         5
Logical            9
Multiplication     3
Load               8
Extract            6
Concatenation      5
Rotation           6
Bit permutation    3

Table 2. Basic operations in public key cryptography [11]

Name                             Typical key size   Based on                             Dominant operation
Diffie-Hellman, El-Gamal, DSA    1024               Discrete logarithm                   Integer multiplication
RSA, RSA Signature               1024               Integer factoring                    Integer multiplication
eDH, eEl-Gamal, eDSA             163                Elliptic-curve discrete logarithm    Polynomial multiplication

B. Application Specific Instruction Set Processors

Energy-efficiency and flexibility are competing goals for a hardware implementation. So-called application-specific instruction set processors (ASIPs) can fill the energy-flexibility gap between dedicated hardware and programmable DSPs for a given application. ASIPs take advantage of user-defined instructions and a user-defined data path optimized for a certain target application. The result of this optimization is higher computational performance and better energy-efficiency than general purpose approaches. This is one reason for the current industrial trend towards customized processors, a trend that can be explained from the perspective of both hardware and software designers.

From the hardware designers’ point of view, ASIPs considerably facilitate the implementation of tasks that require a high degree of flexibility. This flexibility is needed to track evolving standards and for implementations that are prone to late design changes. Furthermore, design time is reduced, especially because of the high reuse factor of software-based implementations. This is particularly important for redesigns that aim to add distinguishing features to an existing product for competitive reasons. Finally, ASIP tasks can be modelled in high level languages, which provide a rapid and methodical approach to the design of resource-shared hardware. Synthesizable ASIPs are technology-independent and can easily be integrated into any established semi-custom design flow together with other hardware blocks.

From the software point of view, ASIPs offer a new degree of freedom for optimization: the design input for an ASIP is both the software implementation, in the form of a high level language description, and the ASIP hardware architecture, in the form of a hardware description language. This new degree of freedom for software designers, the hardware architecture, removes the traditional upper bound on the computational performance of conventional fixed processor architectures by introducing scalability of processor resources. Therefore, oversized and energy-wasting fixed processor cores can be replaced by energy-efficient ASIPs that meet the performance constraints of an embedded application.

ASIP design is a complex optimization problem requiring expertise in VLSI logic, computer architecture and application software design. The complexity of this design task makes it difficult for the designer to explore a large number of design alternatives in order to find an optimum implementation within a competitive design time. Furthermore, ASIP design for systems with tight energy constraints adds complexity, which aggravates this issue.

There are no implementations of low power cryptographic ASIPs in the technical literature. There are, however, some implementations of programmable processors that target performance. The most significant are the following:

CRYPTARRAY [12]: The paper proposes a reconfigurable and scalable architecture, called CRYPTARRAY, in which bus-based communication is replaced by distributed shared memory communication. CRYPTARRAY is organized as a chessboard in which the dark and light squares represent Processing Elements (PEs) and shared memory blocks (SMBs) respectively. The granularity and resource composition of the PEs are specifically designed to support the computing operations encountered in cryptographic algorithms. Because of the chessboard layout, the architecture can be reconfigured to allow computation to proceed as a pipelined wave in any direction. This organization offers a high computational density in terms of datapath resources and a large number of distributed storage resources that easily support a high degree of parallelism and pipelining. Experimental prototyping of a modest size array on FPGAs shows that this architecture can run at 80.9 MHz, producing 26,968,716 computations per second in static reconfiguration mode and approximately 20,226,537 computations per second in dynamic reconfiguration mode. With moderate resource requirements, this array can produce up to 1.02 Gbps of bandwidth.

CryptoManiac [13]: This paper introduces the CryptoManiac processor, a fast and flexible co-processor for cryptographic workloads. The design is extremely efficient: the authors present an analysis of a 0.25 um physical design that runs the standard Rijndael cipher algorithm 2.25 times faster than a 600 MHz Alpha 21264 processor. Moreover, the implementation requires 1/100th the area and power in the same technology. The authors demonstrate that the performance of the design rivals a state-of-the-art dedicated hardware implementation of the 3DES (triple DES) algorithm, while retaining the flexibility to simultaneously support multiple cipher algorithms. Finally, they define a scalable system architecture that combines CryptoManiac processing elements to exploit the inter-session and inter-packet parallelism available in many communication protocols. Using I/O traces and detailed timing simulation, they show that chip multiprocessor configurations can effectively service high throughput applications, including secure web and disk I/O processing.

CRYPTONITE [14]: Depending on the algorithm, the CRYPTONITE architecture shows a raw crypto performance of 250 to 780 Mbit/s including round key calculation. Typical software implementations, and also some hardware implementations, do not embed round key generation in the ongoing encryption or decryption process but operate on precomputed round keys instead. The achieved throughput is not only in the range of comparable dedicated hardware solutions but even outperforms a number of them. In addition, the architecture shows superior performance within the sparse field of truly programmable solutions. This very promising performance results from the general architectural concept, which was tailored to the demands of typical cryptographic algorithms, and from a special memory access technique.

VI. CONCLUSIONS

The state of the art in cryptographic hardware is reconfigurable systems built around Application Specific Instruction Set Processors. These systems can support all the basic operations of cryptography and, as a result, the main cryptographic algorithms. Finally, cryptographic ASIPs can be optimized for critical design parameters such as low power consumption or performance.

REFERENCES

[1] Chandrakasan A., Bowhill W. J., Fox F., Design of High-Performance Microprocessor Circuits, IEEE Press, 2001.

[2] Dhem J. F. and Feyt N., Hardware and Software Symbiosis Helps Smart Card Evolution, IEEE Micro, 21(6): 14-25, 2001.

[3] Ho-Won Kim, Yonge Choi and Moo-Seop Kim, Design and Implementation of a Crypto Processor and its Applications to Security System, in proc. of the 2002 International Technical Conference on Circuits/Systems, Computers and Communications.

[4] Naccache D. and Raohi D., Cryptographic Smart Cards, IEEE Micro, Volume 16, Issue 3, June 1996, pp. 14-24.

[5] Sklavos N. and Koufopavlou O., Asynchronous Low Power VLSI Implementation of the International Data Encryption Algorithm, in proc. of the 8th IEEE International Conference on Electronics, Circuits and Systems (ICECS'01), Malta, Vol. III, pp. 1425-1428, September 2-5, 2001.

[6] Pui-Lam Siu, Chiou-Sing Choy, Butas J., Chan C. F., A Low Power Asynchronous DES, in proc. of the 2001 Circuits and Systems Symposium (ISCAS 2001), vol. 4, pp. 538-541.

[7] Goodman J., Chandrakasan A., Dancy A. P., Design and Implementation of a Scalable Encryption Processor with Embedded Variable DC/DC Converter, in proc. of the 36th ACM/IEEE Conference on Design Automation, June 1999.

[8] Morioka S., Satoh A., An Optimized S-Box Circuit Architecture for Low Power AES Design, in proc. of CHES 2002, pp. 172-186.

[9] J. B. Kam and G. I. Davida, A Structured Design of Substitution-Permutation Encryption Networks, IEEE Transactions on Computers, vol. 28, no. 10, pp. 747-753, 1979.

[10] Zhijie Jerry Shi, Bit Permutation Instructions: Architecture, Implementation, and Cryptographic Properties, PhD thesis, June 2004.

[11] Murat Fiskiran and Ruby B. Lee, Performance Scaling of Cryptography Algorithms in Servers and Mobile Clients, in proc. of the Workshop on Building Block Engine Architectures for Computer Networks (BEACON), October 2004.

[12] A. Ejnioui, M. Lomonaco, CRYPTARRAY: A Scalable and Reconfigurable Architecture for Cryptographic Applications, available as UCF Technical Report UCF-ECE-0406.

[13] Lisa Wu, Chris Weaver, Todd Austin, CryptoManiac: A Fast Flexible Architecture for Secure Communication, in proc. of the 28th Annual International Symposium on Computer Architecture, Göteborg, Sweden, pp. 110-119, 2001.

[14] Dino Oliva, Rainer Buchty, Nevin Heintze, AES and the Cryptonite Crypto Processor, in proc. of the 2003 International Conference on Compilers, Architecture and Synthesis for Embedded Systems.