Upload
dangkhanh
View
216
Download
0
Embed Size (px)
Citation preview
1
AUniversalHardwareAPIforAuthenticatedCiphers
EkawatHomsirikamol,WilliamDiehl,AhmedFerozpuri,
FarnoudFarahmand,MalikUmarSharif,andKrisGaj
GeorgeMasonUniversityUSA
http://cryptography.gmu.eduhttps://cryptography.gmu.edu/athena
2
Co-Authors from the GMU CAESAR Team
“Ice” Will Farnoud UmarHomsirikamol Diehl Farahmand Sharif
KrisGaj
4
Secret-Key Ciphers MACs&Hash Functions
Authenticated Ciphers
Cryptographic Transformations
Block Ciphers Stream Ciphers
confidentiality message integrity & authentication
confidentiality + message integrity & authentication
5time
97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17
AES
NESSIE
CRYPTREC
eSTREAM
SHA-3
34 stream 4 HW winnersciphers → + 4 SW winners
51 hash functions → 1 winner
15 block ciphers → 1 winner
IX.1997 X.2000
I.2000 XII.2002
IV.2008
X.2007 X.2012
XI.2004
CAESARI.2013
57 authenticated ciphers → multiple winners
XII.2017
Cryptographic Standard Contests
6
• 2014.03.15: Deadline for first-round submissions• 2014.04.15: Deadline for first-round software• 2015.07.07: Announcement of second-round candidates• 2015.09.15: Deadline for second-round software• 2015.12.15: Deadline for second-round Verilog/VHDL• 2016.03.15: Announcement of third-round candidates• 2016.12.15: Announcement of finalists• 2017.12.15: Announcement of final portfolio
CAESAR Contest Timeline
7
Evaluation Criteria
Security
Software Efficiency Hardware Efficiency
Simplicity
FPGAs ASICs
Flexibility Licensing
µProcessors µControllers
8
Software API vs. Hardware API - Similarities
Software API Hardware API
Function Arguments
ArgumentFormat
Core Ports= Interface
Input/Output Format
9
Software API vs. Hardware API - Differences
Software API Hardware API
All inputs available at the same time
Inputs accepted,outputs produced sequentially
Physically constrained
Abstracted
10
• Hardware API can have a high influence on Area and Throughput/Area ratio of all implementations
• Hardware API typically much more difficult to modify than Software API
• Without a comprehensive hardware API, the comparison of existing and future implementations highly unreliable and potentially unfair
• Need for a uniform hardware API, endorsed by the CAESAR Committee, and adopted by all future implementers
Hardware API and the CAESAR Contest
11
• Popular general-purpose interfaces• ARM: AXI4, AXI4-Lite, AXI4-Stream (Advanced eXtensible Interface)• IBM: PLB (Processor Local Bus), OPB (On-chip Peripheral Bus)• Altera: Avalon• Xilinx: FSL (Fast Simplex Link)• Silicore Corp.: Wishbone (used by opencores.org)
• Hardware APIs used during the SHA-3 Contest• GMU, Virginia Tech, University College Cork, etc.
• Interfaces used so far in the CAESAR competition• minimalistic, algorithm specific• AXI4-Stream-based proposed by ETH
(non-uniform, algorithm-specific user and data ports)
Previous Work
13
Input and Output of an Authenticated Cipher
Npub(PublicMessageNumber),typicallyNonceNsec(SecretMessageNumber)[supportedbyfewalgorithms]
EncNsec– EncryptedSecretMessageNumberAD– AssociatedData
or
Key
TagNpub AD CiphertextNsecEnc
Key
Encryption
Npub Nsec AD Message
Npub AD CiphertextNsecEnc Tag Nsec AD MessageInvalid
Decryption
14
• inputs of arbitrary size in bytes (but a multiple of a byte only)• size of the entire message/ciphertext does not need to be
known before the encryption/decryption starts (unless required by the algorithm itself)
• independent data and key inputs• wide range of data port widths, 8 ≤ w ≤ 256• simple high-level communication protocol• support for the burst mode• possible overlap among processing the current input block,
reading the next input block, and storing the previous output block
Proposed Features (1)
15
• storing decrypted messages internally, until the result of authentication is known
• support for encryption and decryption within the same core, but only one of these two operations performed at a time
• ability to communicate with very simple, passive devices, such as FIFOs
• ease of extension to support existing communication interfaces and protocols, such as• AMBA-AXI4 from ARM - a de-facto standard for the System-on-Chip
buses• PCI Express – high-bandwidth serial communication between PCs
and hardware accelerator boards
Proposed Features (2)
16
Authenticated Encryption with Associated Data –AEAD Interface
clk
sdi
sdi_valid
sdi_ready
sw
pdi
pdi_valid
pdi_ready
w
Public Data Input
Ports
PDI
Secret Data Input
Ports
SDI
clk rst
Data Output
Ports
DOw
do
do_ready
do_valid
AEAD
rst
17
Typical External Circuits (1) - FIFOs
full
pdi_valid
pdi_ready
clk rst
dout
empty
read
wdout
empty
read
rst clkwr_clkrd_clk =
sw
rstclk rd_clkwr_clk =
din
write
pdiw
AEAD
rstclk
SDI
FIFO
FIFO
PDI
rd_clk =rstwr_clkclk
sdi
sdi_valid
sdi_ready
do
do_ready
do_validFIFO
DO
18
Typical External Circuits (2) – AXI4 IPs
rst
w
sw
pdi
pdi_valid
pdi_ready
sdi
sdi_valid
sdi_ready
dout
empty
read
m_axis_tdata
m_axis_tvalid
m_axis_tready
Master
AXI4−Stream
clk rstclk rst
clk rst
clk rst
s_axis_tdata
s_axis_tvalid
s_axis_tready
AXI4−Stream
Slavew
do
do_ready
do_valid
AEAD
clk
SDI
FIFO
19
Input/Output Format
status = FAIL
w−bit
instruction = ACTKEY instruction = ACTKEY
instruction = DEC
seg_0_header
seg_0 = Npub
seg_1_header
seg_2_header
seg_2 = AD
seg_3_header
seg_3 = Ciphertext
seg_1 = Enc Nsec
seg_0 = Npub
seg_2_header
seg_2 = AD
seg_3_header
seg_3 = Message
seg_1_header
seg_1 = Nsec
(a) (b)(d)
(c)
seg_0_header
instruction = ENCseg_0_header
seg_1_header
seg_2_header
seg_2 = Message
seg_1 = AD
seg_0 = Nsec
status = PASS
seg_4_headerseg_4 = Tag
PDI for encryption DO for encryption =PDI for decryption DO for decryption
npubnpub
TAG_SIZE
sdi_ready
sdi
sdi_valid
byp
ass_
full
byp
ass_
wr
KEY_SIZE
W
sdi
sdi_ready
pdi_ready
pdipdi_valid
sdi_valid
write
din
full empty
dout
read
FIFO
Bypass
DBLK_SIZE/8
DBLK_SIZE/8
bdi_decryptbdi_decrypt
nsec_readynsec_ready
bdi_pad_loc
bdi_valid_bytesbdi_valid_bytes
bdi_pad_loc
bdi_sizebdi_size
bdi_read bdi_read
exp_tag_ready exp_tag_ready
bdi_eot
bdi_eoi
bdi_eot
bdi_eoi
bdi_nodatabdi_nodata
BS_BYTES
bdi_proc
bdi_readybdi_ready
bdi_proc
bdi_adbdi_ad
nsec_readnsec_read
npub_readnpub_read
key_updated
key_needs_update
key_ready
key_needs_update
key_ready
key_updated
rdkey_ready
rdkey_read
rdkey_ready
rdkey_read
npub_ready npub_ready
W
W 4 W 3
do
do_ready
do_valid
do_ready
do
tag_ready
tag_write
msg_auth_valid
msg_auth_done
byp
ass_
em
pty
byp
ass_
rd
statusdoutctrldin
au
x_fifo
_sta
tus
au
x_fifo
_d
ou
t
au
x_fifo
_d
in
au
x_fifo
_ctrl
byp
ass_
da
ta
do_validProcessorPost
AUX FIFO
bdo_nsec
bdo_size
bdo_ready
bdo_write
bdo_data tag_dataProcessorPre
Controller
CipherCore
Datapath
CipherCore
W
TAG_SIZE
len_a
len_d
len_a
len_d
exp_tag exp_tag
CTR_AD_SIZE
tag
bdo
bdibdi
keykey
rdkeyrdkey
nsecnsecNSEC_SIZE
AEAD Core
CipherCore
pdi
pdi_valid
pdi_ready
DBLK_SIZE
RDKEY_SIZE
AEAD
SW
msg_auth_done
tag_ready
tag_write
msg_auth_valid
msg_auth_done
bdo_nsec
bdo_ready
bdo_write
bdo_size
DBLK_SIZE
BS_BYTES+1
CTR_D_SIZE
NPUB_SIZE
Block Diagram of a High-Speed Implementation of AEAD
22
PreProcessor:• parsing segment headers• loading and activating keys• Serial-In-Parallel-Out loading of input blocks• padding input blocks• keeping track of the number of data bytes left to process
PostProcessor:• clearing any portions of output blocks not belonging to ciphertext
or plaintext• Parallel-In-Serial-Out conversion of output blocks into words• formatting output words into segments• storing decrypted messages in AUX FIFO, until the result of
authentication is known• generating an error word if authentication fails
PreProcessor and PostProcessor for High-Speed Implementations (1)
23
Features:• Ease of use• No influence on the maximum clock frequency of AEAD
(up to 300 MHz in Virtex 7)• Limited area overhead• Clear separation between the AEAD Core unit and internal FIFOs
• Bypass FIFO – for passing headers and associated data directly to PostProcessor
• AUX FIFO – for temporarily storing unauthenticated messages after decryption
Benefits:• The designers can focus on designing the CipherCore specific to a
given algorithm, without worrying about the functionality common for multiple algorithms
• Full-block width interface of the CipherCore
PreProcessor and PostProcessor for High-Speed Implementations (2)
25
ManualDesign
HDLCode
Automated OptimizationFPGATools[+ATHENa]
PostP&RNetlist
PostPlace&Route
Results
Functional Verification
GMU Testbench
Timing Verification
GMU Testbench
InformalSpecification TestVectors
Proposed Development & Benchmarking Flow
1. Xilinx ISE + ATHENa2. Vivado + Default Strategies
AETVgenCcode
ImplementationParameters
26
• Universal Testbench supporting any authenticated cipher core following GMU AEAD API
• Change of cipher requires only changing test vector file• A Python script, AETVgen, created to automatically
generate test vector files representing multiple test cases• Encryption and Decryption• Empty Associated Data and/or Empty Message/Ciphertext• Various, randomly selected sizes of AD and Message/Ciphertext• Valid tag and invalid tag cases
• All source codes made available at GMU ATHENa website
Universal Testbench & Automated Test Vector Generation
27
• Generation of results possible for• CipherCore – full block width interface, incomplete functionality• AEAD Core - recommended• AEAD – difficulty with setting BRAM usage to 0 (if desired)
• Use of wrappers:• Out-of-context (OOC) mode available in Xilinx Vivado (no pin limit)• Generic wrappers available in case the number of port bits exceeds
the total number of user pins, when using Xilinx ISE• GMU Wrappers: 5 ports only (clk, rst, sin, sout, piso_mux_sel)
• Recommended Automated Optimization Procedure• ATHENa for Xilinx ISE and Altera Quartus II• 25 default optimization strategies for Xilinx Vivado
Result Generation
28
1. AES-COPA2. Ascon3. CLOC4. Deoxys5. ICEPOLE6. Joltik7. Keyak
Authenticated Ciphers Implemented Using Our Hardware API
8. Minalpher9. OCB10. PAEQ11. POET12. PRIMATEs-HANUMAN13. PRIMATES-GIBBON14. SCREAM
15. AES-GCM
CAESAR Candidates:
Current Standard:
32
AEAD Core vs. CipherCore Area Overhead for Virtex-6
clk, rst, sin, sout, piso_mux_selclk, rst, sin, sout, piso_mux_sel
Overhead = LUT(AEAD_Core)-LUT(CipherCore)
LUT(AEAD_Core)× 100%
33
Conclusions
clk, rst, sin, sout, piso_mux_selclk, rst, sin, sout, piso_mux_sel
• Complete Hardware API for authenticated ciphers• The Detailed Specification - ePrint 2015/669
• GMU hardware API with supporting materials• Available at: https://cryptography.gmu.edu/athena• Universal testbench and Automated Test Vector Generation• PreProcessor and PostProcessor Units for high-speed
implementations• Universal wrappers and scripts for generating results• Ease of recording and comparing results using ATHENa database
• GMU proposal open for discussion and possible improvements through• Better specification• Better implementation of supporting codes
34
• Support for two-pass algorithms• Pipelining of multiple streams of data • Detection and reporting of input formatting errors• Accepting inputs with padding done in software
Future Work