35
1 A Universal Hardware API for Authenticated Ciphers Ekawat Homsirikamol, William Diehl, Ahmed Ferozpuri, Farnoud Farahmand, Malik Umar Sharif, and Kris Gaj George Mason University USA http://cryptography.gmu.edu https://cryptography.gmu.edu/athena

A Universal Hardware API for Authenticated Ciphersece.gmu.edu/~kgaj/publications/conferences/GMU_ReConFig_2015... · A Universal Hardware API for Authenticated Ciphers Ekawat Homsirikamol,

Embed Size (px)

Citation preview

1

AUniversalHardwareAPIforAuthenticatedCiphers

EkawatHomsirikamol,WilliamDiehl,AhmedFerozpuri,

FarnoudFarahmand,MalikUmarSharif,andKrisGaj

GeorgeMasonUniversityUSA

http://cryptography.gmu.eduhttps://cryptography.gmu.edu/athena

2

Co-Authors from the GMU CAESAR Team

“Ice” Will Farnoud UmarHomsirikamol Diehl Farahmand Sharif

KrisGaj

3

CAESARContest

2013-2017

4

Secret-Key Ciphers MACs&Hash Functions

Authenticated Ciphers

Cryptographic Transformations

Block Ciphers Stream Ciphers

confidentiality message integrity & authentication

confidentiality + message integrity & authentication

5time

97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17

AES

NESSIE

CRYPTREC

eSTREAM

SHA-3

34 stream 4 HW winnersciphers → + 4 SW winners

51 hash functions → 1 winner

15 block ciphers → 1 winner

IX.1997 X.2000

I.2000 XII.2002

IV.2008

X.2007 X.2012

XI.2004

CAESARI.2013

57 authenticated ciphers → multiple winners

XII.2017

Cryptographic Standard Contests

6

• 2014.03.15: Deadline for first-round submissions• 2014.04.15: Deadline for first-round software• 2015.07.07: Announcement of second-round candidates• 2015.09.15: Deadline for second-round software• 2015.12.15: Deadline for second-round Verilog/VHDL• 2016.03.15: Announcement of third-round candidates• 2016.12.15: Announcement of finalists• 2017.12.15: Announcement of final portfolio

CAESAR Contest Timeline

7

Evaluation Criteria

Security

Software Efficiency Hardware Efficiency

Simplicity

FPGAs ASICs

Flexibility Licensing

µProcessors µControllers

8

Software API vs. Hardware API - Similarities

Software API Hardware API

Function Arguments

ArgumentFormat

Core Ports= Interface

Input/Output Format

9

Software API vs. Hardware API - Differences

Software API Hardware API

All inputs available at the same time

Inputs accepted,outputs produced sequentially

Physically constrained

Abstracted

10

• Hardware API can have a high influence on Area and Throughput/Area ratio of all implementations

• Hardware API typically much more difficult to modify than Software API

• Without a comprehensive hardware API, the comparison of existing and future implementations highly unreliable and potentially unfair

• Need for a uniform hardware API, endorsed by the CAESAR Committee, and adopted by all future implementers

Hardware API and the CAESAR Contest

11

• Popular general-purpose interfaces• ARM: AXI4, AXI4-Lite, AXI4-Stream (Advanced eXtensible Interface)• IBM: PLB (Processor Local Bus), OPB (On-chip Peripheral Bus)• Altera: Avalon• Xilinx: FSL (Fast Simplex Link)• Silicore Corp.: Wishbone (used by opencores.org)

• Hardware APIs used during the SHA-3 Contest• GMU, Virginia Tech, University College Cork, etc.

• Interfaces used so far in the CAESAR competition• minimalistic, algorithm specific• AXI4-Stream-based proposed by ETH

(non-uniform, algorithm-specific user and data ports)

Previous Work

12

Proposed APISpecification

13

Input and Output of an Authenticated Cipher

Npub(PublicMessageNumber),typicallyNonceNsec(SecretMessageNumber)[supportedbyfewalgorithms]

EncNsec– EncryptedSecretMessageNumberAD– AssociatedData

or

Key

TagNpub AD CiphertextNsecEnc

Key

Encryption

Npub Nsec AD Message

Npub AD CiphertextNsecEnc Tag Nsec AD MessageInvalid

Decryption

14

• inputs of arbitrary size in bytes (but a multiple of a byte only)• size of the entire message/ciphertext does not need to be

known before the encryption/decryption starts (unless required by the algorithm itself)

• independent data and key inputs• wide range of data port widths, 8 ≤ w ≤ 256• simple high-level communication protocol• support for the burst mode• possible overlap among processing the current input block,

reading the next input block, and storing the previous output block

Proposed Features (1)

15

• storing decrypted messages internally, until the result of authentication is known

• support for encryption and decryption within the same core, but only one of these two operations performed at a time

• ability to communicate with very simple, passive devices, such as FIFOs

• ease of extension to support existing communication interfaces and protocols, such as• AMBA-AXI4 from ARM - a de-facto standard for the System-on-Chip

buses• PCI Express – high-bandwidth serial communication between PCs

and hardware accelerator boards

Proposed Features (2)

16

Authenticated Encryption with Associated Data –AEAD Interface

clk

sdi

sdi_valid

sdi_ready

sw

pdi

pdi_valid

pdi_ready

w

Public Data Input

Ports

PDI

Secret Data Input

Ports

SDI

clk rst

Data Output

Ports

DOw

do

do_ready

do_valid

AEAD

rst

17

Typical External Circuits (1) - FIFOs

full

pdi_valid

pdi_ready

clk rst

dout

empty

read

wdout

empty

read

rst clkwr_clkrd_clk =

sw

rstclk rd_clkwr_clk =

din

write

pdiw

AEAD

rstclk

SDI

FIFO

FIFO

PDI

rd_clk =rstwr_clkclk

sdi

sdi_valid

sdi_ready

do

do_ready

do_validFIFO

DO

18

Typical External Circuits (2) – AXI4 IPs

rst

w

sw

pdi

pdi_valid

pdi_ready

sdi

sdi_valid

sdi_ready

dout

empty

read

m_axis_tdata

m_axis_tvalid

m_axis_tready

Master

AXI4−Stream

clk rstclk rst

clk rst

clk rst

s_axis_tdata

s_axis_tvalid

s_axis_tready

AXI4−Stream

Slavew

do

do_ready

do_valid

AEAD

clk

SDI

FIFO

19

Input/Output Format

status = FAIL

w−bit

instruction = ACTKEY instruction = ACTKEY

instruction = DEC

seg_0_header

seg_0 = Npub

seg_1_header

seg_2_header

seg_2 = AD

seg_3_header

seg_3 = Ciphertext

seg_1 = Enc Nsec

seg_0 = Npub

seg_2_header

seg_2 = AD

seg_3_header

seg_3 = Message

seg_1_header

seg_1 = Nsec

(a) (b)(d)

(c)

seg_0_header

instruction = ENCseg_0_header

seg_1_header

seg_2_header

seg_2 = Message

seg_1 = AD

seg_0 = Nsec

status = PASS

seg_4_headerseg_4 = Tag

PDI for encryption DO for encryption =PDI for decryption DO for decryption

20

High-Speed Implementations

of AEAD

npubnpub

TAG_SIZE

sdi_ready

sdi

sdi_valid

byp

ass_

full

byp

ass_

wr

KEY_SIZE

W

sdi

sdi_ready

pdi_ready

pdipdi_valid

sdi_valid

write

din

full empty

dout

read

FIFO

Bypass

DBLK_SIZE/8

DBLK_SIZE/8

bdi_decryptbdi_decrypt

nsec_readynsec_ready

bdi_pad_loc

bdi_valid_bytesbdi_valid_bytes

bdi_pad_loc

bdi_sizebdi_size

bdi_read bdi_read

exp_tag_ready exp_tag_ready

bdi_eot

bdi_eoi

bdi_eot

bdi_eoi

bdi_nodatabdi_nodata

BS_BYTES

bdi_proc

bdi_readybdi_ready

bdi_proc

bdi_adbdi_ad

nsec_readnsec_read

npub_readnpub_read

key_updated

key_needs_update

key_ready

key_needs_update

key_ready

key_updated

rdkey_ready

rdkey_read

rdkey_ready

rdkey_read

npub_ready npub_ready

W

W 4 W 3

do

do_ready

do_valid

do_ready

do

tag_ready

tag_write

msg_auth_valid

msg_auth_done

byp

ass_

em

pty

byp

ass_

rd

statusdoutctrldin

au

x_fifo

_sta

tus

au

x_fifo

_d

ou

t

au

x_fifo

_d

in

au

x_fifo

_ctrl

byp

ass_

da

ta

do_validProcessorPost

AUX FIFO

bdo_nsec

bdo_size

bdo_ready

bdo_write

bdo_data tag_dataProcessorPre

Controller

CipherCore

Datapath

CipherCore

W

TAG_SIZE

len_a

len_d

len_a

len_d

exp_tag exp_tag

CTR_AD_SIZE

tag

bdo

bdibdi

keykey

rdkeyrdkey

nsecnsecNSEC_SIZE

AEAD Core

CipherCore

pdi

pdi_valid

pdi_ready

DBLK_SIZE

RDKEY_SIZE

AEAD

SW

msg_auth_done

tag_ready

tag_write

msg_auth_valid

msg_auth_done

bdo_nsec

bdo_ready

bdo_write

bdo_size

DBLK_SIZE

BS_BYTES+1

CTR_D_SIZE

NPUB_SIZE

Block Diagram of a High-Speed Implementation of AEAD

22

PreProcessor:• parsing segment headers• loading and activating keys• Serial-In-Parallel-Out loading of input blocks• padding input blocks• keeping track of the number of data bytes left to process

PostProcessor:• clearing any portions of output blocks not belonging to ciphertext

or plaintext• Parallel-In-Serial-Out conversion of output blocks into words• formatting output words into segments• storing decrypted messages in AUX FIFO, until the result of

authentication is known• generating an error word if authentication fails

PreProcessor and PostProcessor for High-Speed Implementations (1)

23

Features:• Ease of use• No influence on the maximum clock frequency of AEAD

(up to 300 MHz in Virtex 7)• Limited area overhead• Clear separation between the AEAD Core unit and internal FIFOs

• Bypass FIFO – for passing headers and associated data directly to PostProcessor

• AUX FIFO – for temporarily storing unauthenticated messages after decryption

Benefits:• The designers can focus on designing the CipherCore specific to a

given algorithm, without worrying about the functionality common for multiple algorithms

• Full-block width interface of the CipherCore

PreProcessor and PostProcessor for High-Speed Implementations (2)

24

Verification & Result

Generation

25

ManualDesign

HDLCode

Automated OptimizationFPGATools[+ATHENa]

PostP&RNetlist

PostPlace&Route

Results

Functional Verification

GMU Testbench

Timing Verification

GMU Testbench

InformalSpecification TestVectors

Proposed Development & Benchmarking Flow

1. Xilinx ISE + ATHENa2. Vivado + Default Strategies

AETVgenCcode

ImplementationParameters

26

• Universal Testbench supporting any authenticated cipher core following GMU AEAD API

• Change of cipher requires only changing test vector file• A Python script, AETVgen, created to automatically

generate test vector files representing multiple test cases• Encryption and Decryption• Empty Associated Data and/or Empty Message/Ciphertext• Various, randomly selected sizes of AD and Message/Ciphertext• Valid tag and invalid tag cases

• All source codes made available at GMU ATHENa website

Universal Testbench & Automated Test Vector Generation

27

• Generation of results possible for• CipherCore – full block width interface, incomplete functionality• AEAD Core - recommended• AEAD – difficulty with setting BRAM usage to 0 (if desired)

• Use of wrappers:• Out-of-context (OOC) mode available in Xilinx Vivado (no pin limit)• Generic wrappers available in case the number of port bits exceeds

the total number of user pins, when using Xilinx ISE• GMU Wrappers: 5 ports only (clk, rst, sin, sout, piso_mux_sel)

• Recommended Automated Optimization Procedure• ATHENa for Xilinx ISE and Altera Quartus II• 25 default optimization strategies for Xilinx Vivado

Result Generation

28

1. AES-COPA2. Ascon3. CLOC4. Deoxys5. ICEPOLE6. Joltik7. Keyak

Authenticated Ciphers Implemented Using Our Hardware API

8. Minalpher9. OCB10. PAEQ11. POET12. PRIMATEs-HANUMAN13. PRIMATES-GIBBON14. SCREAM

15. AES-GCM

CAESAR Candidates:

Current Standard:

29

Preliminary Results for 13 Round 1 CAESAR Candidates & AES-GCM

30

ATHENa Database of Results for Authenticated Ciphershttp://cryptography.gmu.edu/athena

31

ATHENa Database of Results for Authenticated Ciphers

32

AEAD Core vs. CipherCore Area Overhead for Virtex-6

clk, rst, sin, sout, piso_mux_selclk, rst, sin, sout, piso_mux_sel

Overhead = LUT(AEAD_Core)-LUT(CipherCore)

LUT(AEAD_Core)× 100%

33

Conclusions

clk, rst, sin, sout, piso_mux_selclk, rst, sin, sout, piso_mux_sel

• Complete Hardware API for authenticated ciphers• The Detailed Specification - ePrint 2015/669

• GMU hardware API with supporting materials• Available at: https://cryptography.gmu.edu/athena• Universal testbench and Automated Test Vector Generation• PreProcessor and PostProcessor Units for high-speed

implementations• Universal wrappers and scripts for generating results• Ease of recording and comparing results using ATHENa database

• GMU proposal open for discussion and possible improvements through• Better specification• Better implementation of supporting codes

34

• Support for two-pass algorithms• Pipelining of multiple streams of data • Detection and reporting of input formatting errors• Accepting inputs with padding done in software

Future Work

Comments?

Thank you!

35

Questions?

http://cryptography.gmu.edu/athenaChoose: Download

http://cryptography.gmu.edu

Suggestions?