
A Zynq-based Testbed for the Experimental Benchmarking of Algorithms Competing in Cryptographic Contests

Farnoud Farahmand, Ekawat Homsirikamol, and Kris Gaj
Department of Electrical and Computer Engineering, George Mason University, Fairfax, Virginia 22030, USA

INTRODUCTION

▶ Hardware performance evaluation of candidates competing in cryptographic contests, such as SHA-3 and CAESAR, is very important for ranking their suitability for standardization.

▶ One of the most essential performance metrics is the throughput, which highly depends on the algorithm, hardware implementation architecture, coding style, and tool options. The maximum throughput is calculated based on the maximum clock frequency supported by each algorithm.
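In formula form (the per-algorithm instances are tabulated under RESULTS below), the long-message throughput follows from the input block size, the number of clock cycles per block, and the clock period:

\[
\text{Throughput} = \frac{\text{block size [bits]}}{\text{cycles per block} \cdot T}, \qquad T = \frac{1}{f_{\max}}
\]

so measuring the true maximum clock frequency directly tightens the throughput estimate.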

▶ In this project, we have developed a universal testbed capable of measuring the maximum clock frequency experimentally, using a prototyping board. We target cryptographic hardware cores, such as implementations of SHA-3 candidates. Our testbed is designed around a Zynq platform and takes advantage of software/hardware co-design.
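The poster does not show the measurement loop itself; the following bare-metal C sketch illustrates one plausible realization of the experimental search. The helper names set_uut_clk_mhz() and run_hash_and_check() are our assumptions: the former would reprogram the Clocking Wizard (see SYSTEM DESIGN), the latter would stream a test message through the hash core via DMA and compare the digest against a known-good value.

```c
#include <stdint.h>

/* Hypothetical helpers -- names and signatures are assumptions,
 * not taken from the poster. */
int set_uut_clk_mhz(uint32_t mhz);          /* reprogram the Clocking Wizard */
int run_hash_and_check(uint32_t msg_bytes); /* hash via DMA, verify digest   */

/* Binary-search the highest UUT_clk frequency at which the core
 * still produces correct digests for a given input size. */
uint32_t find_max_freq_mhz(uint32_t msg_bytes)
{
    uint32_t lo = 50, hi = 400;             /* assumed search window in MHz */
    while (lo < hi) {
        uint32_t mid = (lo + hi + 1) / 2;   /* upper midpoint */
        set_uut_clk_mhz(mid);
        if (run_hash_and_check(msg_bytes))
            lo = mid;                       /* correct digest: try faster */
        else
            hi = mid - 1;                   /* wrong digest: back off     */
    }
    return lo;
}
```

Since the poster reports Max Freq. Experimental as a worst case over input sizes from 10 to 5000 kB, a wrapper would simply take the minimum of find_max_freq_mhz() over that set of sizes.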

[Figure: Zynq device architecture. The Processing System (PS), with its dual-core ARM Cortex-A9 MPCore, memory interfaces, and common peripherals, connects to the Programmable Logic (PL), hosting common and custom peripherals and accelerators, through high-bandwidth AMBA AXI interfaces.]

▶ We measured the maximum clock frequency and the execution time of 12 Round 2 SHA-3 candidates experimentally on ZedBoard and compared the results with the frequencies reported by Xilinx Vivado.

▶ Experimental benchmarking of cryptographic algorithms has been performed previously on platforms other than Zynq:
1. The maximum frequency of SHA-256 was measured experimentally using the SLAAC-1V board based on the Xilinx Virtex XCV1000.
2. The hardware performance of 14 Round 2 SHA-3 candidates was measured experimentally using the SASEBO-GII FPGA board.

SYSTEM DESIGN

▶ Simplified block diagram of the PL side, with the indication of two independent clocks:

[Figure: Simplified block diagram of the PL side. The AXI DMA connects to the Input FIFO and Output FIFO over AXI-Stream interfaces, and the FIFOs connect to the Hash Core over FIFO interfaces. The Processing System reaches the DMA and the Clocking Wizard over AXI-Lite interfaces, reaches memory through an AXI Full interface, and receives an IRQ from the DMA. The Clocking Wizard generates UUT_clk for the Hash Core and the FIFO ports facing it (rd_clk/wr_clk), while the main clock (clk) drives the DMA side, so the dual-clock FIFOs cross between the two clock domains.]
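The poster does not give the Clocking Wizard programming details. The sketch below shows how UUT_clk could be changed at run time through the wizard's AXI-Lite dynamic-reconfiguration port; the register offsets (0x200, 0x208, 0x25C) and the load sequence follow our reading of the Clocking Wizard product guide (PG065) and should be verified against the exact IP version, as should the XPAR_* base-address name, which depends on the block design.

```c
#include <stdint.h>
#include "xil_io.h"        /* Xil_Out32 / Xil_In32 */
#include "xparameters.h"

#define CLKWIZ_BASE    XPAR_CLK_WIZ_0_BASEADDR /* name is design-dependent */

/* Dynamic-reconfiguration offsets -- our assumption, check PG065. */
#define CLKWIZ_STATUS  0x004   /* bit 0: MMCM/PLL locked            */
#define CLKWIZ_CFG0    0x200   /* CLKFBOUT multiply / DIVCLK divide */
#define CLKWIZ_CFG2    0x208   /* CLKOUT0 (UUT_clk) divide          */
#define CLKWIZ_LOAD    0x25C   /* commit the new configuration      */

/* Change UUT_clk by loading a new CLKOUT0 divider; assumes the
 * VCO multiply/divide stage programmed in CLKWIZ_CFG0 stays fixed. */
void set_uut_clk_divider(uint32_t clkout0_div)
{
    Xil_Out32(CLKWIZ_BASE + CLKWIZ_CFG2, clkout0_div & 0xFFu);
    Xil_Out32(CLKWIZ_BASE + CLKWIZ_LOAD, 0x3u);  /* load and apply */
    while ((Xil_In32(CLKWIZ_BASE + CLKWIZ_STATUS) & 0x1u) == 0u)
        ;  /* wait until the MMCM/PLL re-locks before using UUT_clk */
}
```

Note that the divider settings quantize the achievable UUT_clk frequencies, so a frequency search steps through the discrete settings the wizard can reach.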

▶ Block diagram of the testbed, with the division into Programmable Logic (PL), Interconnects, and Processing System (PS):

[Figure: Block diagram of the testbed. On the PL side, the Input FIFO (Input_Stream), Hash Core, Output FIFO (Output_Stream), AXI DMA, AXI Timer, and Clocking Wizard connect through two AXI Interconnects to the PS side (Cortex-A9 ARM cores and DDR). The DMA master ports (M_AXI_MM2S, M_AXI_S2MM) reach DDR through the S_AXI_HP0 high-performance port; the PS configures the DMA, AXI Timer, Output FIFO, and Clocking Wizard over M_AXI_GP0 and the AXI-Lite control bus; the DMA interrupts (mm2s_introut, s2mm_introut) are concatenated into IRQ_F2P. The Clocking Wizard output UUT_clk clocks the Hash Core.]
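To make the data path concrete, here is a minimal bare-metal sketch of one hashing transaction using the standard Xilinx AXI DMA driver in simple (direct register) mode. The driver calls are the stock xaxidma API; the buffer names, the polling (the actual design routes DMA interrupts to the PS via IRQ_F2P), and the 32-byte digest length (a 256-bit hash; algorithm-dependent) are our assumptions.

```c
#include <stdint.h>
#include "xaxidma.h"
#include "xil_cache.h"
#include "xparameters.h"

#define DMA_DEV_ID    XPAR_AXIDMA_0_DEVICE_ID
#define DIGEST_BYTES  32u   /* 256-bit digest -- algorithm-dependent */

static XAxiDma AxiDma;

int hash_message(const uint8_t *msg, uint32_t msg_bytes, uint8_t *digest)
{
    XAxiDma_Config *cfg = XAxiDma_LookupConfig(DMA_DEV_ID);
    if (!cfg || XAxiDma_CfgInitialize(&AxiDma, cfg) != XST_SUCCESS)
        return -1;

    /* Keep DDR contents and the DMA's view of the buffers coherent. */
    Xil_DCacheFlushRange((UINTPTR)msg, msg_bytes);
    Xil_DCacheInvalidateRange((UINTPTR)digest, DIGEST_BYTES);

    /* Arm the receive channel first so the digest is captured
     * as soon as the Output FIFO streams it back. */
    XAxiDma_SimpleTransfer(&AxiDma, (UINTPTR)digest, DIGEST_BYTES,
                           XAXIDMA_DEVICE_TO_DMA);
    XAxiDma_SimpleTransfer(&AxiDma, (UINTPTR)msg, msg_bytes,
                           XAXIDMA_DMA_TO_DEVICE);

    /* Poll both channels until the transaction completes. */
    while (XAxiDma_Busy(&AxiDma, XAXIDMA_DMA_TO_DEVICE) ||
           XAxiDma_Busy(&AxiDma, XAXIDMA_DEVICE_TO_DMA))
        ;
    return 0;
}
```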

▶ Custom IPs:

Input FIFO (S_AXIS_to_FWFT_FIFO):
  AXI-Stream slave side: s_axis_tdata, s_axis_tvalid, s_axis_tready, s_axis_aclk, s_axis_aresetn
  FIFO side: fifo_dout, fifo_read, fifo_empty, hash_core_aclk

Output FIFO (FWFT_FIFO_to_M_AXIS):
  AXI-Stream master side: m_axis_tdata, m_axis_tvalid, m_axis_tready, m_axis_tlast, m_axis_aclk, m_axis_aresetn, plus an S_AXI-Lite control interface
  FIFO side: fifo_din, fifo_write, fifo_empty, hash_core_aclk

VERIFICATION METHODOLOGY

▶ A universal testbench has been developed in the Vivado environment to verify the operation of our testbed using simulation.

▶ The AXI Traffic Generator (ATG) IP has a limitation when generating specific data in AXI-Stream mode through its tdata port. As a result, we used a separate FIFO, pre-filled with the desired data, while the AXI-Stream ATG provides only the control signals.

▶ AXI-Lite ATGs are used to configure the Output FIFO and the AXI-Stream ATG.

[Figure: Universal testbench for the Vivado environment. Two AXI Traffic Generators in AXI-Lite mode perform system initialization (configuring the Output FIFO and the AXI-Stream ATG), while an AXI Traffic Generator in AXI-Stream mode, fed from a FIFO pre-filled with the input data, drives the Input FIFO; the Hash Core output is collected through the Output FIFO.]

RESULTS: Maximum Frequency

▶ ZedBoard and Vivado 2015.4 were used for result generation. All options of the Vivado Design Suite, including synthesis and implementation settings, were left at their defaults.

▶ On the software side, a bare-metal environment and Xilinx SDK were used to run the C code on the ARM core of the Zynq.

[Bar chart: Maximum clock frequencies (0-400 MHz) obtained using static timing analysis and experimental measurement, respectively, for each algorithm.]

Maximum frequencies and throughputs

Algorithm    Max Freq. Static      Max Freq.      Throughput (Formula,     Throughput (Exp. HW
             Timing Analysis [MHz] Exp. [MHz]     Max Exp. Freq.) [Gb/s]   Exe. Time) [Gb/s]
BLAKE              76.4             145.4               3.546                   3.544
CubeHash          152.9             275.8               4.413                   4.399
ECHO              100.1             101.1               5.999                   6.000
Fugue             122.9             200.0               3.200                   3.191
Grøstl            197.2             258.6               6.305                   5.821
Hamsi             105.0             124.9               1.333                   1.332
JH                211.6             333.3               4.740                   4.726
Keccak            102.6             123.1               5.292                   5.314
Luffa             152.5             247.4               7.037                   7.213
Shabal            119.7             122.5               0.981                   0.983
SHAvite-3         119.0             205.7               2.846                   2.828
Skein              70.6             140.3               3.782                   3.772

▶ Max Freq. Experimental was determined as a worst-case value across all investigated input sizes, from 10 to 5000 kB.

▶ Throughput Based on Exp. HW Exe. Time was obtained by dividing the message input size by the actual execution time of hashing in hardware, measured using the AXI Timer for an input size of 1000 kB.
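A minimal sketch of this measurement with the standard Xilinx AXI Timer driver (xtmrctr) follows. The 100 MHz timer clock is an assumption about the design (the timer sits in the AXI clock domain, not the UUT_clk domain), and hash_message() is the hypothetical DMA transaction sketched under SYSTEM DESIGN.

```c
#include <stdint.h>
#include "xtmrctr.h"
#include "xparameters.h"

#define TMR_DEV_ID  XPAR_TMRCTR_0_DEVICE_ID
#define TMR_CLK_HZ  100000000.0  /* AXI clock feeding the timer (assumed) */

extern int hash_message(const uint8_t *msg, uint32_t msg_bytes,
                        uint8_t *digest);

/* Measure the hardware hashing time and derive throughput in Gbit/s. */
double measure_throughput_gbps(const uint8_t *msg, uint32_t msg_bytes,
                               uint8_t *digest)
{
    XTmrCtr timer;
    XTmrCtr_Initialize(&timer, TMR_DEV_ID);
    XTmrCtr_SetOptions(&timer, 0, 0);     /* counter 0: plain count-up mode */
    XTmrCtr_Reset(&timer, 0);

    XTmrCtr_Start(&timer, 0);
    hash_message(msg, msg_bytes, digest); /* DMA in, hash, DMA digest out */
    XTmrCtr_Stop(&timer, 0);

    u32 cycles  = XTmrCtr_GetValue(&timer, 0);
    double secs = cycles / TMR_CLK_HZ;
    return (8.0 * msg_bytes) / secs / 1e9; /* bits per second -> Gbit/s */
}
```

Note that this interval includes the PS-PL data-transfer overhead as well as the hashing itself, which is why the poster reports that overhead separately (below 5% for 100 kB messages).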

Formulas for the execution time and throughput. Notation: T = clock period in ns, N = number of input blocks.

Algorithm    I/O Bus width  Hash Time [cycles]         Throughput [Gbit/s]
BLAKE             64        2 + 8 + 21·N + 4           512/(21·T)
CubeHash          64        2 + 4 + 16·N + 160 + 4     256/(16·T)
ECHO              64        3 + 24 + 26·N + 1 + 4      1536/(26·T)
Fugue             32        2 + 2·N + 37 + 8           32/(2·T)
Grøstl            64        3 + 21·N + 4               512/(21·T)
Hamsi             32        3 + 1 + 3·(N−1) + 6 + 8    32/(3·T)
JH                64        3 + 8 + 36·N + 4           512/(36·T)
Keccak            64        3 + 17 + 24·N + 4          1088/(24·T)
Luffa             64        3 + 4 + 9·N + 9 + 1 + 4    256/(9·T)
Shabal            64        2 + 64·N + 64·3 + 16       512/(64·T)
SHAvite-3         64        3 + 8 + 37·N + 4           512/(37·T)
Skein             64        2 + 8 + 19·N + 4           512/(19·T)
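As a sanity check of these formulas against the measured results, consider BLAKE at its experimental maximum frequency:

\[
T = \frac{1}{145.4\ \text{MHz}} \approx 6.88\ \text{ns}, \qquad
\text{Throughput} = \frac{512}{21 \cdot 6.88\ \text{ns}} \approx 3.545\ \text{Gbit/s},
\]

in agreement with the 3.546 Gb/s formula-based entry and within 0.1% of the 3.544 Gb/s value derived from the measured hardware execution time.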

RESULTS: Speedup vs. Software

[Bar chart: HW/SW speedup for three different input sizes (10 kB / 500 kB / 5000 kB). Data labels: BLAKE 73/88/89, ECHO 287/397/400, Fugue 127/119/119, Luffa 302/410/412, Shabal 6/6/6, SHAvite-3 81/104/105, Skein 92/114/115.]

▶ Only algorithms with an optimized software implementation supporting the ARM architecture are shown in this graph.

RESULTS: Data Transaction Overhead

[Chart: Data transaction overhead with the DMA core running at 100 MHz for all algorithms.]

[Chart: Data transaction overhead with the DMA core running at 150 MHz for Luffa and 100 MHz for all other algorithms.]

RESULTS: Experiment on Two Different ZedBoards

[Bar chart: Maximum experimental frequencies (0-400 MHz) measured on two different ZedBoards (Board 1 vs. Board 2), together with the average experimental maximum frequency.]

CONCLUSIONS

▶ The testbed can be used to correctly measure the performance of designs with a maximum throughput of up to 64 bit · 150 MHz = 9.6 Gbit/s.

▶ For all investigated hash functions, the overhead of the communication between PS and PL was below 5% for 100 kB messages and negligible for messages above 500 kB.

▶ All algorithms also demonstrated a significant speedup vs. their execution in software on the same chip, in spite of the substantial speed of the ARM core, operating at 667 MHz.

▶ Our experiments also demonstrated that the maximum experimental clock frequency was always higher than the post-place-and-route frequency calculated by Vivado using static timing analysis.

▶ At the same time, somewhat unexpectedly, the spread of the ratios of experimental to post-place-and-route frequency is very large, ranging from 1 to 2. This fact can be explained by the varying influence of parameter variations and operating conditions on the critical path of each hash core, due to the different physical locations (placement) of these critical paths in the FPGA fabric.

Acknowledgment

This material is based upon work supported by the National Science Foundation under Grant No. 1314540.

Cryptographic Engineering Research Group (CERG), Department of Electrical and Computer Engineering, George Mason University · http://cryptography.gmu.edu
