8/22/2019 Accelerating in Cryptosystem With Simultaneously Encryptions_Decryptions HW-Threads and Self-Dynamic Reconfi…
http://slidepdf.com/reader/full/accelerating-in-cryptosystem-with-simultaneously-encryptionsdecryptions-hw-threads 1/7
Accelerating in Cryptosystem with Simultaneously Encryptions/Decryptions
HW-Threads and Self-Dynamic Reconfiguration
Trong-Tuan NGUYEN Van-Cuong NGUYEN Mai-Duyen Le NGUYEN Hung-Manh PHAM
Acronics Systems, Inc Faculty of Electronics & Telecommunications Student Member, IEEE
Tuan. [email protected] DANANG University of Technology, Vietnam DUY TAN University, Vietnam [email protected]
[email protected] [email protected]
Abstract - Information security on Ethernet networks is a persistent
and critical problem. As the processing power of new microprocessors
grows, attacks on and theft of information become an ever closer
threat. In this paper, we propose a novel Cryptosystem with
Simultaneous Cipher Hardware Threads and a Partial Dynamic
Reconfiguration (PDR) engine. With these features, the Cryptosystem
raises the security level, speeds up information processing to suit
applications with real-time requirements, and reduces power
consumption and FPGA area…
Keywords - Reconfigurable SoC, Multiple Cryptography,
Simultaneous Multiple Hardware Threads, FPGA-PDR
I. INTRODUCTION
With the development of information technology, protecting sensitive
information via encryption is becoming more and more important to
daily life. In 2001, the National Institute of Standards and Technology
(NIST) selected the Rijndael algorithm as the Advanced Encryption
Standard (AES) [1], which replaced the Data Encryption Standard
(DES) [2]. Since then, AES has been widely used in various
applications, such as secured communication systems, high-performance
database servers, and digital video/audio recorders. However, as
semiconductor fabrication advances, the logic-gate density and
processing speed of ICs grow rapidly, so attacks on secured
information are becoming practical. AES is currently used in almost
all secured applications, but some research shows that it is
vulnerable to "related-key" attacks [3], [4]. Information in the
military and intelligence fields requires a high level of security
and authenticity. Encrypted images and encrypted video streams can be
recovered from a few pixels by methods such as edge-directed bicubic
interpolation and the bilateral filter, as mentioned in [9], [10].
While waiting for a new encryption algorithm that is more trustworthy
than AES, a mechanism of multiple or cascaded ciphers is needed to
enhance the security level of the system; several research works have
proposed this, such as [10], [11], [12]. Many current commercial
products at the ASIC level, such as the TI AM387x Cortex-A8 or the
Maxim Crypto MAXQ1850, also integrate AES, 3DES, RSA, SHA… in their
systems. The works mentioned above point to the essential need for
methods that enhance the security level of information. This paper
proposes a novel Cryptosystem architecture with the following
targets: enhancing the security level, accelerating processing to
suit applications with real-time requirements, and reducing power
consumption and FPGA area…
This paper is organized in five sections, of which this is the
Introduction. Section II reviews related research on enhancing
security based on current algorithms such as AES and 3DES. In that
section we analyze the advantages and disadvantages of this research
in order to present a solution, the newly proposed Cryptographic
architecture, in Section III. Section IV lists the performance
evaluations in terms of throughput, FPGA resources, power consumption
and efficiency of the proposal versus current research. The final
section gives the conclusion and future work.
II. RELATED WORKS
This section outlines current research on multiple encryption, the
disadvantages of these solutions and the new proposal for overcoming
them. It also introduces two related topics that support the new
system: ReconOS and the PDR engine.
A. Current research
To enhance the security level of information, many research works
have been proposed and commercial products released on the market.
There are many methods for achieving this target: embedding several
encryption algorithms, as in the TI AM387x Cortex-A8 or Maxim Crypto
MAXQ1850; employing encryption and authentication on the same system
[7]; or dynamically reconfiguring the Cipher-Key module when an
incoming threat is detected [13]. In the latter, the system can
self-reconfigure a Cipher-Key module that matches the security
requirements and release the current module from the FPGA platform.
The advantages are that the security level is greatly enhanced while
FPGA area, gate latency and power consumption are reduced. However,
following the IP Core architecture, all input/output data, control
signals and status signals are mapped to registers, as in Fig. 1. All
operations of the IP Core are under CPU administration. By the nature
of the CPU architecture, the instruction pointer executes the program
in sequential steps, which is why the processing of the system is not
sped up.
Fig. 1. Hardware Accelerator
Besides that, some applications, such as secure electronic
transactions [5], watermarking and identification for IP Cores [6],
[7] and remote bit-stream configuration [8], have implemented several
algorithms for encryption and authentication, and several proposals
concentrate on multiple encryption to enhance security. In [5],
multiple encryption is applied to all secure electronic transactions
using the SHA and MD algorithms. In [11], four well-known
cryptographic algorithms are used to implement multilevel security,
or to create multiple cipher text: AES, DES, Rivest-Shamir-Adleman
(RSA) and Caesar. [12] proposes a double-encryption method for
protecting sensitive images. Almost all of the works mentioned above
are presented at the algorithm level and programmed on a CPU
platform. The critical problem is that deploying multiple encryption,
or combined encryption and authentication, on a CPU platform does not
satisfy real-time applications, for example run-time video
conferencing or on-scene video sent from UAVs to a base. These
applications have very strict requirements that cannot be met on a
CPU platform or with the single hardware thread presented in [7].
This paper addresses the acceleration of encryption processing to
suit real-time requirements, with reduced FPGA resources and low
power consumption, by means of simultaneous hardware threads and a
Self-Dynamic Partial Reconfiguration engine.
B. ReconOS Architecture
The ReconOS project, developed by the Computer Engineering Group of
the University of Paderborn, supports both software and hardware
threads with a single unified programming model. ReconOS is based on
eCos and Linux, as presented in Fig. 2.
Fig. 2. The Linux operation [14]
The ReconOS system architecture [14] is presented in Fig. 3. All
threads share the same physical memory space. Therefore, hardware
threads have direct access to any location in the system’s memory, or
memory mapped peripherals, if desired.
Fig.3. ReconOS system architecture
The system is built from three parts:
. Delegate thread: a module that ensures the transparency of
thread-to-thread communication and synchronization, regardless of
the execution context (hardware or software) of the respective
communication partners. This enables the designer to easily replace,
for example, a software thread with a functionally equivalent
hardware thread, allowing for rapid design space exploration with
respect to the hardware/software partitioning.
. Hardware thread: consists of at least two VHDL processes:
the synchronization state machine and the actual user logic. The state
transitions in the synchronization state machine are always dependent
on control signals from the OSIF; only after a previous operating
system call “returns”, the next state can be reached. Thus, the
communication with the operating system is purely sequential, while
the processing of the hardware thread itself can be highly parallel. It is
up to the programmer to decompose a hardware thread into a
collection of user logic modules and one synchronization state
machine.
. Hardware/Software interfacing: this module provides a mechanism for
low-level synchronization and communication between the hardware
circuitry and the operating system, called the OSIF (Operating System
Interface). An overview of the OSIF’s structure and its interfaces to
the hardware thread, the system buses and the FIFO cores is given in
Fig. 4 [14].
Fig.4. OSIF overview and interfaces
C. Self-Dynamic Partial Reconfiguration
A dynamically reconfigurable system allows parts of its logic
resources to be changed without disturbing the functioning of the
remaining circuit. This property permits the system to change its
behavior according to external events. Dynamic reconfiguration takes
place in Partially Reconfigurable Regions (PRRs), each of which can be
reconfigured independently [15]. Designing a dynamically
reconfigurable system always requires the declaration of PRRs. The
partial bit-streams of these zones are stored in an external memory
and they contain all the information about the positions and
functionalities of the considered PRRs. A dynamic reconfigurable
system usually has a central processor connecting to the internal
reconfiguration port (ICAP) and controlling the partial reconfiguration
process by downloading bit-streams onto this port. The ICAP and this
controller are implemented in a static zone (i.e. not reconfigured) of
the FPGA. Except for the dynamic zone that is being reconfigured,
the whole FPGA remains in operation during the entire reconfiguration
process. In one PRR, several Partially Reconfigurable Modules
(PRMs) could be loaded (one at a time). Each PRM is individually
designed and implemented using partial reconfiguration design tools
[15]. All PRMs for a given PRR must be pin compatible with each
other, i.e., have the same port definitions and entity names.
III. PROPOSAL FOR NOVEL ARCHITECTURE OF CRYPTOSYSTEM WITH SIMULTANEOUSLY MULTIPLE HARDWARE THREADS ARCHITECTURE (SMHT) AND DPR ENGINE
In this section, we propose the Cryptosystem architecture with SMHT
and a PDR engine. In it, the Cryptosystem provides a mechanism by
which cipher threads, each with an AES_Core, can operate
simultaneously. Based on the security level of the application, the
system creates multiple hw_tasks, one hw_task for each AES_Thread. In
the novel Cryptosystem, three issues need consideration:
A. Hardware design
. OS Synchronization communication module
This RTL module synchronizes the threads with operating system calls.
The state transitions in the synchronization state machine are always
dependent on control signals from the OSIF; only after a previous
operating system call “returns” can the next state be reached. To
initiate a new Encryption/Decryption transaction, this module issues
the query API reconos_mbox_get() to the OS to ask whether the
semaphore is ready for a new thread. If the system is available, it
returns an indication signal to initiate the transaction. The state
machine transfers the data from main memory into the Local_ram of the
hardware thread with the API reconos_read_burst(). After the data
transfer is completed, the hardware thread enters the AES_Initial
state by asserting the “Start” signal to the AES_Core (user_logic)
core. “Start” is port-mapped to AES_Core, and this core starts the
Encryption/Decryption operation. From this time, the OS
Synchronization module keeps polling the “Done” signal from AES_Core;
once “Done” is asserted high, the OS Synchronization module releases
the semaphore flag to terminate the current transaction. Fig. 6 shows
the “Done” and “Start” control signals.
Fig. 5. FSM of OS Synchronization Communication
Fig. 6. Control signals for user_logic core
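The handshake described for the OS Synchronization module can be
sketched as a small software state machine. This is only an
illustrative model, not the authors' RTL: the state names WAIT_MBOX,
READ_BURST, WAIT_DONE and RELEASE and the helper functions next_state()
and run_transaction() are assumptions introduced here; only
AES_Initial, "Start" and "Done" come from the text.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_MBOX = auto()    # waiting for reconos_mbox_get() to return (assumed name)
    READ_BURST = auto()   # copying a frame into Local_ram (assumed name)
    AES_INITIAL = auto()  # "Start" is asserted towards AES_Core
    WAIT_DONE = auto()    # polling "Done" from AES_Core (assumed name)
    RELEASE = auto()      # releasing the semaphore (assumed name)


def next_state(state, os_call_returned, done):
    """Return the next state given the OSIF return flag and the Done level."""
    if state is State.WAIT_MBOX:
        return State.READ_BURST if os_call_returned else state
    if state is State.READ_BURST:
        return State.AES_INITIAL if os_call_returned else state
    if state is State.AES_INITIAL:
        return State.WAIT_DONE
    if state is State.WAIT_DONE:
        return State.RELEASE if done else state
    return State.WAIT_MBOX  # after RELEASE, wait for the next transaction


def run_transaction(inputs):
    """Apply a sequence of (os_call_returned, done) samples, one per clock."""
    state = State.WAIT_MBOX
    trace = [state]
    for os_call_returned, done in inputs:
        state = next_state(state, os_call_returned, done)
        trace.append(state)
    return trace
```

The model makes the sequential nature of the OS communication
explicit: no state is left until the pending call has returned or
"Done" has gone high.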
. FSM of AES_Core (user_logic)
The data copied into Local_Ram is divided into two areas. The first
area contains: the cipher key; the key length (128, 192 or 256); the
command for the encryption or decryption action; the frame length for
the cipher operation; and the first frame of plaintext/cipher-text.
The remaining area contains all remaining plaintext/cipher-text that
needs encryption/decryption. After receiving the “Start” signal from
the OS Synchronization module, AES_Core enters the encrypt/decrypt
operation. The RTL decodes and executes the commands stored in
Local_Ram. The cipher operation takes place entirely inside the AES
hardware thread. When the final data has been processed, AES_Core
asserts the “Done” signal to the OS Synchronization module to finish
the operation cycle. Fig. 7 presents the FSM of AES_Core.
Fig. 7. FSM of AES_Core ( user_logic)
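The two-area organization of Local_Ram can be illustrated with a
packing routine. The paper does not give field widths or offsets, so
the 48-byte header layout below (32-byte key slot, 16-bit key length,
8-bit command, 16-bit block count, padding) is purely an assumption,
chosen so that a 4096-byte Local_Ram holds 253 blocks of 16 bytes;
pack_frame() and unpack_header() are hypothetical helpers.

```python
import struct

# Assumed header: key slot, key length in bits, command, pad, block count, pad.
HEADER = struct.Struct("<32sHBxH10x")   # 48 bytes in total (assumption)
BLOCK_BYTES = 16
LOCAL_RAM_BYTES = 4096
MAX_BLOCKS = (LOCAL_RAM_BYTES - HEADER.size) // BLOCK_BYTES

CMD_ENCRYPT, CMD_DECRYPT = 0, 1         # assumed command encoding


def pack_frame(key, command, data):
    """Build a Local_Ram image: header area followed by the data blocks."""
    if len(key) * 8 not in (128, 192, 256):
        raise ValueError("key must be 128, 192 or 256 bits")
    if len(data) % BLOCK_BYTES or len(data) // BLOCK_BYTES > MAX_BLOCKS:
        raise ValueError("data must be whole blocks that fit in Local_Ram")
    header = HEADER.pack(key, len(key) * 8, command, len(data) // BLOCK_BYTES)
    return header + data


def unpack_header(image):
    """Decode the header area the way the user logic would before starting."""
    key_slot, key_bits, command, n_blocks = HEADER.unpack_from(image)
    return key_slot[: key_bits // 8], key_bits, command, n_blocks
```

With this assumed layout a full frame exactly fills the 4 Kbyte
Local_Ram, which is one possible reading of the 253-block frame size
used later in the analysis.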
In comparison with the common IP core of Fig. 1, this new
architecture performs all processing in the Hardware_thread. Besides
the acceleration from parallel processing in RTL, there are two
further advantages:
- Once the CPU has transferred the data into Local_Ram, AES_Core
fetches data from this memory for encryption/decryption. From then
on, AES_Core works independently of the system. So if Local_Ram has a
large capacity, fewer handshakes with the CPU are needed to get new
data, and more data is processed per transfer.
- While AES_Thread is processing the data in Local_Ram, the CPU has
free time slots for running other tasks, so system performance
increases considerably. These advantages are discussed in the
analysis and performance evaluations of Section IV.
B. Software design
The operations of each hw_task are Burst_Ram_Read, Burst_Ram_Write
and cryptographic processing, denoted Tx.r, Tx.w and Tx.p
respectively, as indicated in Fig. 8.
Fig. 8. The operation of two HW_threads with shared memory
In double encryption/decryption tasks, we use a shared memory to
store the results of the AES_0 thread, which are the input data of
the AES_1 thread.
To guarantee the integrity of data in shared memory and avoid
conflicting simultaneous read/write operations, communication and
synchronization mechanisms are provided for the shared memory: create
space for the shared memory, attach it to the address space of a
process, lock the thread before writing/reading to protect the data,
write/read data in shared memory, and detach and destroy the shared
memory from the current process. In both flow charts, before entering
the read/write routine, a command locks the memory against access
from other threads, as detailed in Fig. 9.
Fig. 9. Flow charts for Writing/Reading data
After the transaction completes, the lock is released.
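The lock, access, unlock discipline on the shared memory can be
sketched in software with two threads standing in for AES_0 and
AES_1. This is a minimal model under stated assumptions: Python's
threading.Condition replaces the semaphore mechanism of the real
system, the XOR transforms are placeholders and not AES, and the names
SharedMemory, run_stage() and double_pass() are hypothetical.

```python
import threading
from collections import deque


class SharedMemory:
    """Buffer between two stages; every access happens under one lock."""

    def __init__(self):
        self._cond = threading.Condition()   # owns the lock guarding _frames
        self._frames = deque()

    def write(self, frame):
        with self._cond:                     # lock before writing
            self._frames.append(frame)
            self._cond.notify()
        # leaving the with-block releases the lock

    def read(self):
        with self._cond:                     # lock before reading
            while not self._frames:
                self._cond.wait()            # lock is released while waiting
            return self._frames.popleft()


def run_stage(transform, read_input, write_output):
    """Process frames until the end-of-stream marker (None) is seen."""
    while True:
        frame = read_input()
        if frame is None:
            write_output(None)               # pass the marker downstream
            return
        write_output(transform(frame))


def double_pass(frames):
    """Run two placeholder stages concurrently over a list of byte frames."""
    shared = SharedMemory()
    source = iter(list(frames) + [None])
    results = []

    def xor_with(mask):
        return lambda frame: bytes(b ^ mask for b in frame)

    stage0 = threading.Thread(
        target=run_stage, args=(xor_with(0x5A), lambda: next(source), shared.write))
    stage1 = threading.Thread(
        target=run_stage, args=(xor_with(0xA5), shared.read, results.append))
    stage0.start()
    stage1.start()
    stage0.join()
    stage1.join()
    return results[:-1]                      # drop the end-of-stream marker
```

Because the second stage can start on a frame as soon as the first
stage has written it, the two stages overlap in time, which is the
behavior the simultaneous model relies on.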
C. Integrated system
The Cryptosystem is developed on an FPGA platform. To accelerate
system processing, threads that consume a lot of CPU resources are
mapped to IP Cores. These IP Cores are stored in ACE Flash in partial
bit-stream format. The main program, running on the PPC/MicroBlaze,
tracks the threads and monitors attacks from the outside environment
that attempt to steal data, based on events such as a wrong IP/MAC,
power consumption or electromagnetic radiation [16].
Fig. 10. Flow chart of Software/Hardware Co-operation
When it determines that a threat is coming, the system upgrades the
security level to double encryption: it creates a new hw_thread, as
indicated in Fig. 11, and dynamically reconfigures the AES_Thread
partial bitstream into that hw_thread separately. Fig. 12 shows two
AES_Threads built for two hw_tasks by the DPR engine.
Fig. 11. Slots for hw_tasks reconfiguration with OSIF bus
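The upgrade path from single to double encryption can be summarized
as a slot-management routine. The sketch below is a toy model only:
it performs no ICAP access, and the class ReconfigController, the
function handle_event() and the event names are assumptions made for
illustration; only AES_Thread and the three kinds of event come from
the text.

```python
class ReconfigController:
    """Toy model of reconfigurable slot management (no real ICAP access)."""

    def __init__(self, n_slots, bitstreams):
        self.slots = [None] * n_slots     # one entry per reconfigurable slot
        self.bitstreams = bitstreams      # module name -> partial bitstream bytes

    def load(self, name):
        """Place module `name` in the first free slot and return its index."""
        if name not in self.bitstreams:
            raise KeyError(f"no partial bitstream stored for {name!r}")
        for index, occupant in enumerate(self.slots):
            if occupant is None:
                # A real system would stream self.bitstreams[name] to the ICAP here.
                self.slots[index] = name
                return index
        raise RuntimeError("no free reconfigurable slot")

    def active(self):
        """Number of slots currently holding a module."""
        return sum(1 for occupant in self.slots if occupant is not None)


# Assumed event labels for wrong IP/MAC, power and electromagnetic anomalies.
SUSPICIOUS_EVENTS = {"wrong_ip_mac", "power_anomaly", "em_radiation"}


def handle_event(ctrl, event):
    """Load a second AES_Thread when a suspicious event is reported."""
    if event in SUSPICIOUS_EVENTS and ctrl.active() < 2:
        ctrl.load("AES_Thread")
    return ctrl.active()
```

A typical use would load one AES_Thread at start-up for the
stand-alone model and call handle_event() from the monitoring loop of
the main program.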
IV. ANALYSIS AND PERFORMANCE EVALUATIONS
A. RTL simulation and implementation results
To verify the correctness of both the RTL code and the test cycles
for encryption/decryption, the test material for the AES IP Core is
based on the FIPS 197 specification [1].
Fig. 12. Creating the slots for each partial bit-stream AES_Thread
Plaintext = 32 43 f6 a8 88 5a 30 8d 31 31 98 a2 e0 37 07 34
Cipher Key = 2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c
Ciphertext = 39 25 84 1d 02 dc 09 fb dc 11 85 97 19 6a 0b 32
We test the AES IP Core with a 128-bit Key_in; the other key lengths
are expected to behave likewise. Following the proposal in this
paper, the Key_expander_128 procedure takes 11 clock cycles,
comprising the first clock in which key_in_128 is read in and 10
clocks for the expansion procedure, and the AES transformation
procedure for each block also takes 11 cycles, as indicated in
Fig. 13.
Fig. 13. The AES processes within 11 Clock cycles
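The test vector above can be reproduced with a small software
reference model. The code below is a from-scratch AES-128 encryption
written for checking purposes only; it is not the authors' RTL, and
the function names are chosen here for illustration. It also shows
that AES-128 uses 11 round keys (the initial key plus 10 rounds),
which is consistent with the 11-cycle figures quoted for key
expansion and block transformation.

```python
def xtime(a):
    """Multiply by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gmul(a, b):
    """Multiply two elements of GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    """S-box: multiplicative inverse followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """Return the 11 round keys of AES-128 as lists of 16 bytes."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def mix_columns(state):
    """Apply MixColumns to a state stored column by column."""
    out = []
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        out += [
            gmul(a[0], 2) ^ gmul(a[1], 3) ^ a[2] ^ a[3],
            a[0] ^ gmul(a[1], 2) ^ gmul(a[2], 3) ^ a[3],
            a[0] ^ a[1] ^ gmul(a[2], 2) ^ gmul(a[3], 3),
            gmul(a[0], 3) ^ a[1] ^ a[2] ^ gmul(a[3], 2),
        ]
    return out


def encrypt_block(plaintext, key):
    """Encrypt one 16-byte block with AES-128."""
    round_keys = expand_key(key)
    state = [p ^ k for p, k in zip(plaintext, round_keys[0])]
    for rnd in range(1, 11):
        state = [SBOX[b] for b in state]                       # SubBytes
        state = [state[4 * ((c + r) % 4) + r]                  # ShiftRows
                 for c in range(4) for r in range(4)]
        if rnd != 10:
            state = mix_columns(state)                         # MixColumns
        state = [s ^ k for s, k in zip(state, round_keys[rnd])]  # AddRoundKey
    return bytes(state)
```

Calling encrypt_block() with the plaintext and cipher key listed
above gives a software value to compare against the ciphertext
observed in RTL simulation.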
B. Performance analysis
With the traditional IP core structure, all Data_in, Data_out,
control signals and status signals are mapped into registers as in
Fig. 1; following the mechanism of embedded systems, these registers
are controlled by the microprocessor.
Fig. 14. AES thread is processed in sequence
Fig. 15. Simultaneous AES threads (Proposed model)
So when the system must upgrade the security level to double
encryption, all data must be processed completely in the first AES
thread before passing to the next thread. This method causes large
latency and low system throughput, as Fig. 14 illustrates. In the
novel structure, both AES cores are active at the same time: one core
fetches data to begin a new encryption/decryption while the other is
processing. Thus, no time slot is wasted during processing. Each
16-byte block of data that needs AES processing passes through three
stages, together named the AES cycles:
AES cycles / 1 block = 1 read cycle + 11 processing cycles + 1 write cycle = 13 cycles
In the sequence model of Fig. 14: for streams that need double
encryption of each block, the time spent is:
1 Block = 2 x AES cycles
In the proposed model of Fig. 15: following the organization of the
Hardware_Thread, the data is transferred into Local_Ram and the AES
Core takes this data to begin a new encryption/decryption. The
Local_Ram is set up with a 4 Kbyte capacity; with 16 bytes per
Read_Burst API call, a full Local_Ram holds 253 blocks, named a Data
Frame.
Data Frame = 253 blocks x 16 bytes = 4048 bytes (about 4 Kbytes)
In the initial stage and the final stage of cryptography:
1st Frame (or final Frame) = 253 read cycles + (253 x 13) AES cycles + 253 write cycles
= 3795 cycles
So each block = 2 cycles + AES cycles = 15 cycles
In the remaining stages, two AES Cores take part in the cryptographic
processing.
If m blocks need to be processed, the cycles needed by each method
are calculated as below:
In the sequence model:
Cycles for m blocks = 2 x m x AES Processing (*)
In the proposed model:
Cycles for m blocks = 1st AES Processing + m x AES Processing + Final AES Processing
= (m + 2) x AES Processing (**)
Fig. 15 details the above analysis.
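The cycle counts in this subsection can be worked through
numerically. The sketch below simply evaluates the expressions given
in the text; the function names are chosen here, and taking "AES
Processing" in (*) and (**) to be the 13-cycle per-block figure is an
assumption made for the example.

```python
READ_CYCLES = 1
PROCESS_CYCLES = 11
WRITE_CYCLES = 1
AES_CYCLES = READ_CYCLES + PROCESS_CYCLES + WRITE_CYCLES   # cycles per block
BLOCKS_PER_FRAME = 253


def frame_cycles(blocks=BLOCKS_PER_FRAME):
    """First or final frame: read cycles + AES cycles + write cycles."""
    return blocks * READ_CYCLES + blocks * AES_CYCLES + blocks * WRITE_CYCLES


def sequential_cycles(m, unit=AES_CYCLES):
    """Expression (*): each block passes through two AES threads in turn."""
    return 2 * m * unit


def proposed_cycles(m, unit=AES_CYCLES):
    """Expression (**): first pass + m overlapped passes + final pass."""
    return (m + 2) * unit


def speedup(m):
    """Ratio of sequential to proposed cycles for m blocks."""
    return sequential_cycles(m) / proposed_cycles(m)
```

For one frame of 253 blocks the ratio is already close to 2, and it
approaches 2 as m grows, since the two extra passes in (**) are
amortized over more blocks.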
The synthesis results and the throughput and performance parameters
of this design are given in TABLE I.
It is evident that the proposed approach gives the Cryptosystem
flexibility. When a low security level is needed, the Cryptosystem
operates in the stand-alone model with one AES_Thread. When a high
security level is required, the system dynamically reconfigures the
bit-stream of a second AES_Thread and starts the simultaneous model.
According to TABLE I, the new proposal offers much higher throughput
than the sequence double-thread model [13] (1296 Mbps vs. 389 Mbps),
and its throughput per slice (TpS) is about 1.6 times that of the
sequence model (58.1 vs. 35.7).
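The stand-alone figures reported in TABLE I can be cross-checked with
a short calculation. The formula below (clock frequency times 128
bits per block, divided by the cycles per block) is an assumption
about how the throughput was derived, not something the paper states,
and the helper names are hypothetical; only the stand-alone row is
checked here.

```python
def throughput_mbps(freq_mhz, cycles_per_block, block_bits=128):
    """Throughput in Mbps if one block completes every cycles_per_block clocks."""
    return freq_mhz * block_bits / cycles_per_block


def throughput_per_slice_kbps(throughput_in_mbps, slices):
    """Throughput per slice, expressed in Kbps per slice."""
    return throughput_in_mbps * 1000 / slices


# Stand-alone AES_Thread: 81 MHz, 13 cycles per block, 11,167 slices.
standalone_mbps = throughput_mbps(81, 13)
standalone_tps = throughput_per_slice_kbps(798, 11167)
```

Under this assumption the stand-alone throughput comes out at roughly
798 Mbps and the per-slice figure at roughly 71 Kbps per slice, in
line with the corresponding table entries.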
TABLE I. THE DETAILED COMPARISON BETWEEN THREE MODELS: STAND-ALONE, SEQUENCED AND PROPOSED SYSTEM
V. CONCLUSION AND FUTURE WORKS
In this paper, we have proposed a Cryptosystem combining a
simultaneous-thread engine and a partial reconfiguration scheme to
reduce the required hardware resources and greatly improve the
bandwidth as well as the security of the implemented encryption
algorithm. We plan to implement a scrambler system to protect the
content of BRAM against attack. The scrambler module, which will be
based on the unique device identifier and a pseudo-random number
generator (PRNG) to securely encrypt the key stored in the BRAM,
could further enhance the robustness of the whole system. A complete
investigation of this complex system will also be carried out.
Solutions                             Parameters                 Values
AES_Thread Stand-alone (*)            Maximum Frequency (MHz)    81
                                      Number of Clock            13
                                      Throughput (Mbps)          # 798
                                      Slice used                 11,167
                                      TpS (Kbps/slice)           71.4
                                      Power consumption (W)      4.472
Sequence double threads model (**)    Maximum Frequency (MHz)    81
                                      Number of Clock            26
                                      Throughput (Mbps)          # 389
                                      Slice used                 11,167
                                      TpS (Kbps/slice)           35.7
                                      Power consumption (W)      8.944
Simultaneous threads model (***)      Maximum Frequency (MHz)    81
                                      Number of Clock            15
                                      Throughput (Mbps)          # 1296
                                      Slice used                 22,304
                                      TpS (Kbps/slice)           58.1
                                      Power consumption (W)      8.944
Platform: Virtex6 – XC6VLX240-1
REFERENCES
[1] NIST, “Advanced encryption standard (AES),” Nov.20, http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[2] NIST, “Data encryption standard (DES),” Oct. 1999, http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf
[3] Alan Kaminsky, Michael Kurdziel, Stanisław Radziszowski, “An Overview of Cryptanalysis Research for the Advanced Encryption Standard”, http://www.cs.rit.edu/~spr/PUBL/aes.pdf
[4] Alex Biryukov, Dmitry Khovratovich, “Related-key Cryptanalysis of the Full AES-192 and AES-256”, http://impic.org/papers/Aes-192-256.pdf
[5] Himanshu Gupta, “Role of multiple encryption in secure electronic transaction”, International Journal of Network Security & Its Applications (IJNSA), Vol.3, No.6, November 2011
[6] Daniel Ziener, Jurgen Teich, “Power Signature Watermarking of IP Cores for FPGAs”, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.1509
[7] Thanh Tran, Pham Ngoc Nam, Tran Hoang Vu, Nguyen Van Cuong, “A framework for secure remote updating of bitstream on runtime reconfigurable embedded platforms”, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6315952
[8] K.-W. Hung, W.-C. Siu, “Fast image interpolation using the bilateral filter”, IET Image Processing
[9] Zhou Dengwen, “An Edge-Directed Bicubic Interpolation Algorithm”, 2010 3rd International Congress on Image and Signal Processing (CISP2010)
[10] Melek Önen, Refik Molva, “Secure Data Aggregation with Multiple Encryption”, link.springer.com/chapter/10.1007%2F978-3-540-69830-2_8
[11] Sairam Natarajan et al., “A Novel Approach for Data Security Enhancement Using Multi Level Encryption Scheme”, (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (1), 2011, 469-473
[12] Jayant Kushwaha, Bhola Nath Roy, “Secure Image Data by Double encryption”, International Journal of Computer Applications (0975 – 8887), Volume 5, No.10, August 2010
[13] Trong-Tuan NGUYEN, Van-Cuong NGUYEN, Hung-Manh PHAM, “Enhance the performance and security of SoC using pipeline and dynamic partial reconfiguration”, The 2012 International Conference on Integrated Circuits and Devices in Vietnam (ICDV 2012)
[14] Enno Lubbers, Marco Platzner, “Communication and Synchronization in Multithreaded Reconfigurable Computing Systems”, in Proceedings of the 8th International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), Las Vegas, July 2008
[15] Xilinx Inc., “Partial Reconfiguration User Guide UG702 (v14.1)”, April 2012
[16] Daniel Ziener, Jürgen Teich, “Power Signature Watermarking of IP Cores for FPGAs”, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.1509