Accelerating in Cryptosystem With Simultaneously Encryptions_Decryptions HW-Threads and Self-Dynamic Reconfiguration


Accelerating in Cryptosystem with Simultaneously Encryptions/Decryptions

HW-Threads and Self-Dynamic Reconfiguration

Trong-Tuan NGUYEN, Van-Cuong NGUYEN, Mai-Duyen Le NGUYEN, Hung-Manh PHAM

Acronics Systems, Inc; Faculty of Electronics & Telecommunications; Student Member, IEEE

DANANG University of Technology, Vietnam; DUY TAN University, Vietnam

Abstract - Information security in Ethernet environments is always a critical problem. As the processing power of new microprocessors grows, attacks that steal information become increasingly imminent threats. In this paper, we propose a novel Cryptosystem with Simultaneous Cipher Hardware Threads and a Partial Dynamic Reconfiguration (PDR) engine. With these features, the Cryptosystem enhances the security level, speeds up information processing to suit applications with real-time requirements, and reduces power consumption and FPGA area…

Keywords - Reconfigurable SoC, Multiple Cryptography, Simultaneous Multiple Hardware Threads, FPGA-PDR

I. INTRODUCTION

With the development of information technology, protecting sensitive information via encryption is becoming more and more important in daily life. In 2001, the National Institute of Standards and Technology (NIST) selected the Rijndael algorithm as the Advanced Encryption Standard (AES) [1], which replaced the Data Encryption Standard (DES) [2]. Since then, AES has been widely used in various applications, such as secure communication systems, high-performance database servers, and digital video/audio recorders. However, as semiconductor fabrication advances, the logic-gate density and processing speed of ICs are increasing rapidly, so attacks that break secured information may become a reality. Although AES is currently used in almost all secure applications, some studies show that it is vulnerable to "related-key" attacks [3], [4]. Information in military and intelligence fields requires a high level of security and authentication. Moreover, encrypted images and encrypted video streams can be recovered from a few pixels by methods such as the edge-directed bicubic interpolation algorithm and the bilateral filter mentioned in [9], [10].

While waiting for a new encryption algorithm that inspires more confidence than AES, a mechanism of multiple (cascade) ciphers is needed to enhance system security, and several research works have proposed such schemes [10], [11], [12]. Many current commercial ASIC-level products, such as the TI AM387x Cortex-A8 and the Maxim Crypto MAXQ1850, also integrate AES, 3DES, RSA, SHA, etc. in their systems. The works mentioned above show a clear need for methods that enhance the security level of information. This paper proposes a novel Cryptosystem architecture with the following targets: enhancing the security level, accelerating processing to suit applications with real-time requirements, and reducing power consumption and the FPGA area in use…

This paper is organized in five sections, the first of which is this Introduction. Section II reviews related research on enhancing security based on current algorithms such as AES and 3DES; we analyze the advantages and disadvantages of those works, which motivates the new Cryptosystem architecture proposed in Section III. Section IV presents performance evaluations in terms of throughput, FPGA resources, power consumption, and the efficiency of the proposed system compared with current research. The final section gives the conclusion and future work.

II. RELATED WORKS

This section outlines current research on multiple encryption, the disadvantages of these solutions, and a new proposal that addresses them. It also introduces two related topics that support the new system: ReconOS and the PDR engine.

A. Current research

To enhance the security level of information, many research works have been proposed and commercial products released on the market. There are many methods for achieving these targets, such as embedding several encryption algorithms (as in the TI AM387x Cortex-A8 or the Maxim Crypto MAXQ1850), employing encryption and authentication in the same system [7], or dynamically reconfiguring the Cipher-Key module when a threat is detected [13]. In the latter approach, the system can self-reconfigure the Cipher-Key module to match the security requirements and release the current module from the FPGA platform. The advantages are that the security level is greatly enhanced and that FPGA area, gate latency, and power consumption are reduced. However, in this IP Core architecture, all input/output data, control signals, and status signals are mapped to registers, as in Fig. 1, and all operations of the IP Core are under CPU administration. Because the CPU executes the program step by step as its instruction pointer advances, this architecture cannot speed up the processing of the system.

Fig. 1. Hardware Accelerator

Besides that, some applications, such as secure electronic transactions [5], watermarking and identification of IP Cores [6], [7], or remote configuration of bit-streams [8], have implemented several algorithms for encryption and authentication, and several proposals concentrate on multiple encryption to enhance security. The work in [5] applies multiple encryption to all secure electronic transactions using the SHA and MD algorithms. In [11], four well-known cryptographic algorithms, AES, DES, Rivest-Shamir-Adleman (RSA), and Caesar, are used to implement multilevel security or to create multiple ciphertexts, while [12] proposes double encryption to protect sensitive images. Almost all of the works mentioned above are presented at the algorithm level and programmed on a CPU platform. The critical problem is that multiple encryption, or combined encryption and authentication, deployed on a CPU platform cannot satisfy real-time applications, for example run-time video conferencing or live video captured by UAVs and sent to a base station. These applications have very strict requirements that cannot be met on a CPU platform or with the single hardware thread presented in [7]. This paper addresses the acceleration of encryption processing to meet real-time requirements, with reduced FPGA resources and low power consumption, by using simultaneous hardware threads and a Self-Dynamic Partial Reconfiguration engine.

B. ReconOS Architecture

The ReconOS project, developed by the Computer Engineering Group of the University of Paderborn, supports both software and hardware threads with a single unified programming model. ReconOS is based on the eCos and Linux operating systems, as presented in Fig. 2.

Fig. 2. The Linux operation [14]

The ReconOS system architecture [14] is presented in Fig. 3. All threads share the same physical memory space; therefore, hardware threads have direct access to any location in the system's memory, or to memory-mapped peripherals, if desired.

Fig.3. ReconOS system architecture

The system is built from three parts:

- Delegate thread: a module that provides transparent thread-to-thread communication and synchronization, regardless of the execution context (hardware or software) of the respective communication partners. This enables the designer to easily replace, for example, a software thread with a functionally equivalent hardware thread, allowing rapid design-space exploration with respect to hardware/software partitioning.

- Hardware thread: consists of at least two VHDL processes, the synchronization state machine and the actual user logic. The state transitions in the synchronization state machine always depend on control signals from the OSIF; the next state can be reached only after the previous operating system call "returns". Thus, communication with the operating system is purely sequential, while the processing inside the hardware thread itself can be highly parallel. It is up to the programmer to decompose a hardware thread into a collection of user logic modules and one synchronization state machine.

- Hardware/Software interface: this module provides a mechanism for low-level synchronization and communication between the hardware circuitry and the operating system, called the OSIF (Operating System Interface). An overview of the OSIF's structure and its interfaces to the hardware thread, the system buses, and the FIFO cores is given in Fig. 4 [14].

Fig.4. OSIF overview and interfaces
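The delegate idea can be illustrated with a small behavioural model. In this Python sketch, which is an analogy and not ReconOS code, a hardware thread is a generator that yields one OS request at a time, and a delegate running in software services each request before resuming the thread, so communication with the OS is strictly sequential. The request names `mbox_get` and `mbox_put` and the mailbox dictionary are simplified, hypothetical stand-ins loosely modelled on ReconOS mailbox calls.

```python
from collections import deque


def hw_thread(n_items):
    """Hypothetical hardware thread: each yield is one OS call issued by the
    synchronization FSM; execution resumes only when the call 'returns'."""
    total = 0
    for _ in range(n_items):
        item = yield ("mbox_get", "in")   # wait for the delegate's reply
        total += 2 * item                 # stand-in for the (parallel) user logic
    yield ("mbox_put", "out", total)      # hand the result back to the OS


def delegate(thread, mailboxes):
    """Software delegate: executes the thread's OS calls one after another."""
    reply = None
    while True:
        try:
            request = thread.send(reply)
        except StopIteration:
            return
        op, box, *args = request
        if op == "mbox_get":
            reply = mailboxes[box].popleft()
        elif op == "mbox_put":
            mailboxes[box].append(args[0])
            reply = None
        else:
            raise ValueError(f"unknown OS call {op!r}")


mailboxes = {"in": deque([1, 2, 3]), "out": deque()}
delegate(hw_thread(3), mailboxes)
print(mailboxes["out"][0])  # 12
```

Because the thread only sees replies, the same `hw_thread` logic could be driven by a different delegate, which mirrors how ReconOS hides the execution context from communication partners.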

C. Self-Dynamic Partial Reconfiguration

A dynamically reconfigurable system can change parts of its logic resources without disturbing the operation of the remaining circuit. This property lets the system change its behavior in response to external events. Dynamic reconfiguration takes place in a Partially Reconfigurable Region (PRR), which can be reconfigured independently [15]. Designing a dynamically reconfigurable system therefore always requires declaring the PRRs. The partial bit-streams for these regions are stored in external memory and contain all the information about the positions and functionalities of the corresponding PRRs. A dynamically reconfigurable system usually has a central processor connected to the Internal Configuration Access Port (ICAP), which controls the partial reconfiguration process by downloading bit-streams through this port. The ICAP and this controller are implemented in a static (i.e., never reconfigured) region of the FPGA. Except for the dynamic region being reconfigured, the whole FPGA remains in operation during the entire reconfiguration process. Several Partially Reconfigurable Modules (PRMs) can be loaded into one PRR, one at a time. Each PRM is individually designed and implemented using partial reconfiguration design tools [15]. All PRMs for a given PRR must be pin-compatible with each other, i.e., have the same port definitions and entity names.
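The PRR/PRM rules above (one PRM per PRR at a time, and pin compatibility) can be captured as a bookkeeping model. This Python sketch is hypothetical: the class names, the port tuples, and the module names are illustrative, and it models only the compatibility check, not the Xilinx tool flow or the ICAP transfer.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

Port = Tuple[str, str, int]  # (name, direction, width in bits)


@dataclass(frozen=True)
class PRM:
    """A partially reconfigurable module: one partial bit-stream for a PRR."""
    name: str
    entity: str
    ports: Tuple[Port, ...]


@dataclass
class PRR:
    """A partially reconfigurable region that hosts at most one PRM at a time."""
    name: str
    entity: str
    ports: Tuple[Port, ...]
    loaded: Optional[PRM] = None
    history: list = field(default_factory=list)

    def load(self, prm: PRM) -> None:
        # Pin compatibility: identical entity name and port definitions.
        if prm.entity != self.entity or prm.ports != self.ports:
            raise ValueError(f"{prm.name} is not pin-compatible with {self.name}")
        # Loading a new PRM replaces the previous one (one PRM per PRR).
        self.loaded = prm
        self.history.append(prm.name)


AES_PORTS = (("clk", "in", 1), ("start", "in", 1), ("done", "out", 1),
             ("data_in", "in", 128), ("data_out", "out", 128))
prr0 = PRR("PRR0", "aes_thread", AES_PORTS)
prr0.load(PRM("aes_encrypt", "aes_thread", AES_PORTS))
prr0.load(PRM("aes_decrypt", "aes_thread", AES_PORTS))
print(prr0.loaded.name, prr0.history)
```

A module with a different entity name or port list is rejected and the currently loaded PRM stays in place.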

III. PROPOSAL FOR A NOVEL CRYPTOSYSTEM ARCHITECTURE WITH SIMULTANEOUS MULTIPLE HARDWARE THREADS (SMHT) AND DPR ENGINE

In this section, we propose a Cryptosystem architecture with SMHT and a PDR engine, in which cipher threads built around an AES_Core can operate simultaneously. Depending on the security level required by the application, the system creates multiple hw_tasks, one hw_task per AES_Thread. There are three aspects of the new Cryptosystem to consider:

A. Hardware design

. OS Synchronization communication module

This RTL module synchronizes the thread with operating system calls. The state transitions in the synchronization state machine always depend on control signals from the OSIF; the next state can be reached only after the previous operating system call "returns". To initiate a new Encryption/Decryption transaction, this module issues the API call reconos_mbox_get() to ask the OS whether the semaphore is ready for a new thread. If the system is available, an indication signal initiates the transaction. The state machine transfers the data from main memory into the Local_Ram of the hardware thread with the API call reconos_read_burst(); after the transfer completes, the hardware thread enters the AES_Initial state by asserting the "Start" signal of the AES_Core (user_logic) core. "Start" is port-mapped to AES_Core, which then starts the Encryption/Decryption operation. Meanwhile, the OS Synchronization module keeps polling the "Done" signal from AES_Core; once "Done" is asserted high, the OS Synchronization module releases the semaphore flag to terminate the current transaction. Fig. 6 shows the "Done" and "Start" control signals.

Fig. 5. FSM of OS Synchronization Communication



Fig. 6. Control signals for user_logic core
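The synchronization sequence described above can be sketched as a small state machine. The state names follow the text; treating "Start" as a one-step pulse and letting the RELEASE state wait for its OS call to return are assumptions of this Python sketch, not details given in the paper.

```python
from enum import Enum, auto


class SyncState(Enum):
    MBOX_GET = auto()     # reconos_mbox_get(): wait until a new transaction may start
    READ_BURST = auto()   # reconos_read_burst(): copy main memory into Local_Ram
    AES_INITIAL = auto()  # assert "Start" towards AES_Core (user_logic)
    WAIT_DONE = auto()    # poll "Done" from AES_Core
    RELEASE = auto()      # release the flag and end the transaction


def next_state(state, os_returned=False, done=False):
    """One step of the synchronization FSM; returns (next_state, start)."""
    if state is SyncState.MBOX_GET:
        return (SyncState.READ_BURST if os_returned else state), False
    if state is SyncState.READ_BURST:
        return (SyncState.AES_INITIAL if os_returned else state), False
    if state is SyncState.AES_INITIAL:
        return SyncState.WAIT_DONE, True            # Start asserted for one step
    if state is SyncState.WAIT_DONE:
        return (SyncState.RELEASE if done else state), False
    # RELEASE is also an OS call: move on only after it returns.
    return (SyncState.MBOX_GET if os_returned else state), False


trace = [SyncState.MBOX_GET]
inputs = [dict(os_returned=True), dict(os_returned=True), dict(),
          dict(done=False), dict(done=True), dict(os_returned=True)]
for kw in inputs:
    state, start = next_state(trace[-1], **kw)
    trace.append(state)
print([s.name for s in trace])
```

Every OS-call state holds until its call returns, which is what makes the OS communication purely sequential while the user logic runs freely in WAIT_DONE.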

. FSM of AES_Core (user_logic)

The data copied into Local_Ram is divided into two areas. The first area contains the cipher key, the key-length information (128, 192, or 256 bits), the command selecting encryption or decryption, the length of the frame for the cipher operation, and the first frame of plaintext/ciphertext. The remaining area holds all the remaining plaintext/ciphertext to be encrypted or decrypted. After receiving the "Start" signal from the OS Synchronization module, AES_Core begins the encrypt/decrypt operation: the RTL decodes and executes the commands stored in Local_Ram. The cipher operation runs entirely inside the AES hardware thread. When the last data block has been processed, AES_Core asserts the "Done" signal to the OS Synchronization module to finalize the operation cycle. Fig. 7 presents the FSM of AES_Core.

Fig. 7. FSM of AES_Core (user_logic)
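The Local_Ram decoding step can be sketched in Python. The paper lists the header fields but not their sizes or order, so the layout below is a hypothetical one: a 32-byte key field, key length in bits, a command byte, a reserved byte, the frame length in 16-byte blocks, and padding to 48 bytes (chosen here so that the header plus 253 data blocks fills 4 KB). The XOR `toy_cipher` is a placeholder for the AES datapath and is not encryption.

```python
import struct

# Hypothetical header layout (field sizes are assumptions, not from the paper).
HEADER = struct.Struct("<32sHBBI8x")   # 32 + 2 + 1 + 1 + 4 + 8 = 48 bytes
BLOCK = 16
CMD_ENCRYPT, CMD_DECRYPT = 0, 1


def run_aes_core(local_ram, cipher):
    """Decode the header area, process each data block, and return
    (output_bytes, done); `cipher(block, key, decrypt)` stands in for the RTL."""
    key_field, key_bits, command, _reserved, n_blocks = HEADER.unpack_from(local_ram, 0)
    if key_bits not in (128, 192, 256):
        raise ValueError(f"unsupported key length: {key_bits}")
    if command not in (CMD_ENCRYPT, CMD_DECRYPT):
        raise ValueError(f"unknown command: {command}")
    key = key_field[: key_bits // 8]
    out = bytearray()
    for i in range(n_blocks):
        start = HEADER.size + i * BLOCK
        out += cipher(local_ram[start:start + BLOCK], key, command == CMD_DECRYPT)
    return bytes(out), True  # True models asserting "Done"


def toy_cipher(block, key, decrypt):
    # Toy stand-in (XOR with the key), NOT AES: it is its own inverse.
    return bytes(b ^ k for b, k in zip(block, key))


key = bytes(range(16))
data = b"0123456789abcdef" * 2
ram = HEADER.pack(key, 128, CMD_ENCRYPT, 0, 2) + data
ciphertext, done = run_aes_core(ram, toy_cipher)
```

Keeping all commands and parameters inside Local_Ram is what lets the core run to completion without further register accesses from the CPU.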

Compared with common IP cores such as the one in Fig. 1, the new architecture performs all processing inside the Hardware_thread. Besides accelerating data processing through parallel RTL, it offers two further advantages:

- Once the CPU has transferred the data into Local_Ram, AES_Core reads the data from this memory for encryption/decryption and works independently of the rest of the system. Therefore, a larger Local_Ram reduces the number of handshakes with the CPU for new data and increases the amount of data processed per transfer.

- While the AES_Thread is processing the data in Local_Ram, the CPU has free time slots to run other tasks, which significantly increases system performance. These advantages are analyzed in the performance evaluation of Section IV.
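The effect of Local_Ram capacity on CPU handshakes is simple arithmetic. This sketch assumes, for illustration only, that a register-mapped core needs one CPU handshake per 16-byte block, while the hardware thread needs one per Local_Ram frame of 253 blocks (the frame size used in Section IV).

```python
import math

FRAME_BLOCKS = 253  # 16-byte blocks per full Local_Ram frame (Section IV)


def handshakes_register_mapped(n_blocks):
    # Assumption: a register-mapped core (Fig. 1) needs one CPU handshake per block.
    return n_blocks


def handshakes_local_ram(n_blocks, frame_blocks=FRAME_BLOCKS):
    # One burst transfer, hence one CPU handshake, per Local_Ram frame.
    return math.ceil(n_blocks / frame_blocks)


n = (1 << 20) // 16  # 1 MiB of data = 65,536 blocks
print(handshakes_register_mapped(n), handshakes_local_ram(n))  # 65536 260
```

Doubling `frame_blocks` roughly halves the handshake count, which is the sense in which a larger Local_Ram frees the CPU.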

B. Software design

The operations of each hw_task are Burst_Ram_Read, Burst_Ram_Write, and cryptographic processing, denoted Tx.r, Tx.w, and Tx.p respectively, as shown in Fig. 8.

Fig. 8. The operation of two HW_threads with shared memory

For double encryption/decryption tasks, we use a shared memory to store the results of the AES_0 thread, which are the input data processed by the AES_1 thread.

To guarantee the integrity of data in shared memory and to avoid conflicting read/write operations at the same time, communication and synchronization mechanisms are provided for the shared memory: creating the shared memory space, attaching it to the address space of a process, locking before writing/reading to protect the data, writing/reading data in the shared memory, and detaching and destroying the shared memory from the current process. In both flowcharts, a lock command is issued before entering the read/write routines to block access to the memory from other threads, as detailed in Fig. 9.

Fig. 9. Flowcharts for Writing/Reading data

After the transaction completes, the unlock command is issued.
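The shared-memory handshake between AES_0 and AES_1 can be sketched with Python's standard `multiprocessing.shared_memory` module and a `threading.Lock`, following the create, attach, lock, write/read, unlock, detach, and destroy steps listed above (Python 3.8 or later). The XOR stages are toy placeholders for the AES threads, not encryption, and the `threading.Event` that makes AES_1 wait for AES_0's output is an added assumption.

```python
import threading
from multiprocessing import shared_memory


def toy_stage(data, key):
    # Placeholder for one AES thread: XOR with a one-byte key (NOT encryption).
    return bytes(b ^ key for b in data)


def double_encrypt(plaintext):
    n = len(plaintext)
    shm = shared_memory.SharedMemory(create=True, size=n)   # create the segment
    lock = threading.Lock()
    written = threading.Event()   # assumption: AES_1 waits for AES_0's output
    result = {}

    def aes_0():
        seg = shared_memory.SharedMemory(name=shm.name)      # attach
        with lock:                                           # lock ... unlock
            seg.buf[:n] = toy_stage(plaintext, 0x5A)         # write stage-1 output
        seg.close()                                          # detach
        written.set()

    def aes_1():
        written.wait()
        seg = shared_memory.SharedMemory(name=shm.name)      # attach
        with lock:
            stage1 = bytes(seg.buf[:n])                      # read stage-1 output
        seg.close()                                          # detach
        result["out"] = toy_stage(stage1, 0xC3)

    workers = [threading.Thread(target=f) for f in (aes_1, aes_0)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    shm.close()
    shm.unlink()                                             # destroy the segment
    return result["out"]


out = double_encrypt(b"sixteen byte blk")
print(out == bytes(b ^ 0x5A ^ 0xC3 for b in b"sixteen byte blk"))  # True
```

The lock alone prevents simultaneous access; ordering the two stages needs a separate signal, which is why the sketch adds the event.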

C. Integrated system

The Cryptosystem is developed on an FPGA platform. To accelerate system processing, threads that consume a lot of CPU resources are mapped into IP Cores, which are stored in ACE Flash in partial bit-stream format. The main program running on the PPC/MicroBlaze monitors the threads and detects attacks from the outside environment that attempt to steal data, based on events such as a wrong IP/MAC address or anomalies in power consumption or electromagnetic radiation [16]. When a threat is detected, the system upgrades the security level to double encryption: it creates a new hw_thread, as shown in Fig. 11, and dynamically reconfigures the AES_Thread partial bit-stream into that hw_thread separately. Fig. 12 shows the two AES_Threads built for the two hw_tasks by the DPR engine.

Fig. 10. Flowchart of Software/Hardware Co-operation

Fig. 11. Slots for hw_tasks reconfiguration with OSIF bus
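The threat-driven upgrade from a single AES_Thread to double encryption can be sketched as a small event handler. This Python model is hypothetical: the event identifiers, the bit-stream file name `aes_thread_1.bit`, and the `reconfigure` placeholder are illustrative assumptions, not part of the described system.

```python
from dataclasses import dataclass, field

# Hypothetical event identifiers for the threat indicators named in the text.
THREAT_EVENTS = {"wrong_ip_mac", "power_anomaly", "em_radiation"}


@dataclass
class Cryptosystem:
    mode: str = "stand-alone"  # a single AES_Thread
    loaded: list = field(default_factory=list)

    def reconfigure(self, bitstream):
        # Placeholder: a real system would stream this partial bit-stream
        # from ACE Flash into a free hw_task slot via the ICAP.
        self.loaded.append(bitstream)

    def on_event(self, event):
        if event in THREAT_EVENTS and self.mode == "stand-alone":
            self.reconfigure("aes_thread_1.bit")  # hypothetical file name
            self.mode = "double"                  # two simultaneous AES_Threads


cs = Cryptosystem()
for ev in ["heartbeat", "wrong_ip_mac", "em_radiation"]:
    cs.on_event(ev)
print(cs.mode, cs.loaded)  # double ['aes_thread_1.bit']
```

The second AES_Thread is loaded only once; later threat events find the system already in double-encryption mode.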

IV. ANALYSIS AND PERFORMANCE EVALUATIONS

A. RTL simulation and implementation results

To verify the correctness of both the RTL code and the encryption/decryption test cycles, the test material for the AES IP Core is based on the FIPS-197 specification [1]:

Fig. 12. Creating the slots for each partial bit –stream AES_Thread

Plaintext = 32 43 f6 a8 88 5a 30 8d 31 31 98 a2 e0 37 07 34

Cipher Key = 2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c

Ciphertext = 39 25 84 1d 02 dc 09 fb dc 11 85 97 19 6a 0b 32

We test the AES IP Core with a 128-bit Key_in; tests with the other key lengths give the same results. In our design, the Key_expander_128 procedure takes 11 clock cycles, comprising the first clock in which key_in_128 is read in and 10 clocks for the expansion procedure, and the AES transformation procedure for each block also takes 11 cycles, as shown in Fig. 13.

Fig. 13. The AES processes within 11 Clock cycles
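To cross-check the FIPS-197 vector in software, for example against RTL simulation output, a compact AES-128 reference model can be written directly from the standard. This Python sketch is not the authors' RTL; it derives the S-box at start-up instead of tabulating it and is far too slow for production use. Its key expansion performs 10 rounds after the key is loaded, and encryption applies an initial AddRoundKey followed by 10 rounds, which is consistent with the 11-step counts quoted above for an iterative one-round-per-clock datapath.

```python
def xtime(a):
    """Multiply a byte by x (i.e. 2) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def gmul(a, b):
    """General GF(2^8) multiplication by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def build_sbox():
    """S-box = affine transform of the multiplicative inverse (FIPS-197, Sec. 5.1.1)."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for i in range(1, 5):                      # XOR of four left rotations
            s ^= ((inv << i) | (inv >> (8 - i))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key_128(key):
    """Return the 11 round keys; the loop performs the 10 expansion rounds."""
    w = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]   # RotWord, then SubWord
            t[0] ^= rcon
            rcon = xtime(rcon)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]


def mix_column(a):
    return [gmul(a[0], 2) ^ gmul(a[1], 3) ^ a[2] ^ a[3],
            a[0] ^ gmul(a[1], 2) ^ gmul(a[2], 3) ^ a[3],
            a[0] ^ a[1] ^ gmul(a[2], 2) ^ gmul(a[3], 3),
            gmul(a[0], 3) ^ a[1] ^ a[2] ^ gmul(a[3], 2)]


def encrypt_block_128(block, key):
    """AES-128 encryption of one 16-byte block (state stored column by column)."""
    rks = expand_key_128(key)
    s = [b ^ k for b, k in zip(block, rks[0])]          # initial AddRoundKey
    for rnd in range(1, 11):                            # 10 rounds
        s = [SBOX[b] for b in s]                        # SubBytes
        s = [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]  # ShiftRows
        if rnd < 10:                                    # last round skips MixColumns
            s = [v for c in range(4) for v in mix_column(s[4 * c:4 * c + 4])]
        s = [b ^ k for b, k in zip(s, rks[rnd])]        # AddRoundKey
    return bytes(s)


pt = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
print(encrypt_block_128(pt, key).hex())  # 3925841d02dc09fbdc118597196a0b32
```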

B. Performance analysis

In the traditional IP core structure, all Data_in, Data_out, control signals, and status signals are mapped into registers, as in Fig. 1; following the usual mechanism of embedded systems, these registers are controlled by the microprocessor.

Fig. 14. AES thread is processed in sequence

Fig. 15. Simultaneously AES- threads (Proposed model)

So when the system needs to upgrade the security level with double encryption, all data must be processed completely by the first AES thread before the next thread starts. This method causes large latency and low system throughput, as illustrated in Fig. 14. In the novel structure, by contrast, both AES cores are active at the same time: one core fetches data to begin a new encryption/decryption while the other is still processing. Thus, no time slot is wasted during processing. Each 16-byte block that needs AES processing passes through three stages, which together we call an AES cycle:

AES cycle (per block) = 1 read cycle + 11 processing cycles + 1 write cycle = 13 cycles



In the sequential model (Fig. 14), a stream that needs double encryption spends, for each block:

1 block = 2 x AES cycles = 26 cycles

In the proposed model (Fig. 15), following the organization of the Hardware_Thread, the data is transferred into Local_Ram and the AES Core takes these data to begin a new encryption/decryption. Local_Ram is set up with a 4 Kbyte capacity; with 16-byte Read_Burst API transfers, a full Local_Ram holds 253 blocks, which we call a Data Frame:

Data Frame = 253 blocks x 16 bytes = 4,048 bytes (about 4 Kbytes)

In the initial and final stages of cryptography:

1st Frame (or final Frame) = 253 read cycles + (253 x 13) AES cycles + 253 write cycles = 3,795 cycles

So each block = 2 cycles + AES cycle = 15 cycles

In the remaining stages, both AES Cores take part in the cryptographic processing.

If m blocks need to be processed, the cycles required by each method are calculated as follows.

In the sequential model:

Cycles for m blocks = 2 x m x AES Processing (*)

In the proposed model:

Cycles for m blocks = 1st AES Processing + m x AES Processing + Final AES Processing = (m + 2) x AES Processing (**)

Fig. 15 details the above analysis.
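The formulas (*) and (**) and the frame arithmetic can be checked with a few lines of Python. The sketch assumes that "AES Processing" stands for 13 cycles per block in (*) and 15 cycles per block in (**), matching the clock counts listed for the sequence and simultaneous models in TABLE I; the text does not state these units explicitly.

```python
READ, PROCESS, WRITE = 1, 11, 1
AES_CYCLE = READ + PROCESS + WRITE   # 13 cycles per 16-byte block
PROPOSED_BLOCK = 2 + AES_CYCLE       # 15 cycles: "2 cycles + AES cycle"
FRAME_BLOCKS = 253                   # blocks per 4 KB Local_Ram frame


def first_or_final_frame_cycles(blocks=FRAME_BLOCKS):
    # 253 read cycles + (253 x 13) AES cycles + 253 write cycles
    return blocks * READ + blocks * AES_CYCLE + blocks * WRITE


def sequence_cycles(m):
    return 2 * m * AES_CYCLE          # (*), assuming AES Processing = 13 cycles


def proposed_cycles(m):
    return (m + 2) * PROPOSED_BLOCK   # (**), assuming AES Processing = 15 cycles


m = 10_000
print(first_or_final_frame_cycles())           # 3795
print(sequence_cycles(m), proposed_cycles(m))  # 260000 150030
```

For large m the fixed "+ 2" term becomes negligible, and the speed-up of the proposed model over the sequential one approaches 26/15, about 1.73.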

The synthesis results and the throughput and performance parameters of this design are given in TABLE I.

Our approach clearly gives the Cryptosystem flexibility. When low-level security is sufficient, the Cryptosystem operates in the stand-alone model with a single AES_Thread. When high-level security is required, the system dynamically reconfigures the bit-stream of a second AES_Thread and initializes the simultaneous model. According to TABLE I, the new proposal offers considerably higher throughput than the sequential double-thread model [13] (about 1296 vs. 389 Mbps), and its throughput per slice (TpS) is about 1.6 times that of the sequential model (58.1 vs. 35.7 Kbps/slice).
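The throughput and TpS figures can be related to the clock frequency and clocks per 128-bit block by throughput = f x 128 / clocks, with TpS = throughput / slices. The helpers below are a hypothetical sketch that assumes one 128-bit block completes every `clocks_per_block` cycles.

```python
BLOCK_BITS = 128


def throughput_mbps(f_mhz, clocks_per_block):
    """Throughput when one 128-bit block completes every `clocks_per_block` cycles."""
    return f_mhz * BLOCK_BITS / clocks_per_block


def tps_kbps_per_slice(throughput_mbps_value, slices):
    """Throughput per slice, converted from Mbps to Kbps."""
    return throughput_mbps_value * 1000 / slices


stand_alone = throughput_mbps(81, 13)
print(round(stand_alone, 1), round(tps_kbps_per_slice(stand_alone, 11_167), 1))  # 797.5 71.4
print(round(tps_kbps_per_slice(1296, 22_304), 1))                                # 58.1
```

These reproduce the stand-alone throughput and the stand-alone and simultaneous TpS values in TABLE I, and show that TpS is expressed in Kbps per slice.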

TABLE I. THE DETAILED COMPARISON BETWEEN THREE MODELS: STAND-ALONE, SEQUENCED AND PROPOSED SYSTEM

Parameter               | AES_Thread stand-alone (*) | Sequence double-threads model (**) | Simultaneous threads model (***)
------------------------|----------------------------|------------------------------------|---------------------------------
Maximum frequency (MHz) | 81                         | 81                                 | 81
Number of clocks        | 13                         | 26                                 | 15
Throughput (Mbps)       | ~798                       | ~389                               | ~1296
Slices used             | 11,167                     | 11,167                             | 22,304
TpS (Kbps/slice)        | 71.4                       | 35.7                               | 58.1
Power consumption (W)   | 4.472                      | 8.944                              | 8.944

Platform: Virtex-6 XC6VLX240-1

V. CONCLUSION AND FUTURE WORK

In this paper, we have proposed a Cryptosystem that combines a simultaneous-thread engine with a partial reconfiguration scheme to reduce the required hardware resources and to greatly improve both the bandwidth and the security of the implemented encryption algorithm. We plan to implement a scrambler system to protect the content of BRAM against attacks. The scrambler module, based on the unique device identifier and a pseudo-random number generator (PRNG) that securely encrypts the key stored in BRAM, could further enhance the robustness of the whole system. A complete investigation of this complex system will also be carried out.



REFERENCES

[1] NIST, "Advanced Encryption Standard (AES)," Nov. 2001, http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf

[2] NIST, "Data Encryption Standard (DES)," Oct. 1999, http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf

[3] Alan Kaminsky, Michael Kurdziel, Stanisław Radziszowski, "An Overview of Cryptanalysis Research for the Advanced Encryption Standard," http://www.cs.rit.edu/~spr/PUBL/aes.pdf

[4] Alex Biryukov, Dmitry Khovratovich, "Related-key Cryptanalysis of the Full AES-192 and AES-256," http://impic.org/papers/Aes-192-256.pdf

[5] Himanshu Gupta, "Role of multiple encryption in secure electronic transaction," International Journal of Network Security & Its Applications (IJNSA), Vol. 3, No. 6, November 2011

[6] Daniel Ziener, Jürgen Teich, "Power Signature Watermarking of IP Cores for FPGAs," http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.1509

[7] Thanh Tran, Pham Ngoc Nam, Tran Hoang Vu, Nguyen Van Cuong, "A framework for secure remote updating of bitstream on runtime reconfigurable embedded platforms," http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6315952

[8] K.-W. Hung, W.-C. Siu, "Fast image interpolation using the bilateral filter," IET Image Processing

[9] Zhou Dengwen, "An Edge-Directed Bicubic Interpolation Algorithm," 2010 3rd International Congress on Image and Signal Processing (CISP2010)

[10] Melek Önen, Refik Molva, "Secure Data Aggregation with Multiple Encryption," link.springer.com/chapter/10.1007%2F978-3-540-69830-2_8

[11] Sairam Natarajan et al., "A Novel Approach for Data Security Enhancement Using Multi Level Encryption Scheme," International Journal of Computer Science and Information Technologies (IJCSIT), Vol. 2 (1), 2011, 469-473

[12] Jayant Kushwaha, Bhola Nath Roy, "Secure Image Data by Double Encryption," International Journal of Computer Applications (0975-8887), Volume 5, No. 10, August 2010

[13] Trong-Tuan NGUYEN, Van-Cuong NGUYEN, Hung-Manh PHAM, "Enhance the performance and security of SoC using pipeline and dynamic partial reconfiguration," The 2012 International Conference on Integrated Circuits and Devices in Vietnam (ICDV 2012)

[14] Enno Lübbers, Marco Platzner, "Communication and Synchronization in Multithreaded Reconfigurable Computing Systems," In Proceedings of the 8th International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), Las Vegas, July 2008

[15] Xilinx Inc., "Partial Reconfiguration User Guide UG702 (v14.1)," April 2012

[16] Daniel Ziener, Jürgen Teich, "Power Signature Watermarking of IP Cores for FPGAs," http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.1509
