Upload
aleesha-beatrix-owen
View
218
Download
0
Embed Size (px)
Citation preview
Gaj 1 P230/MAPLD 2004
Elliptic Curve Cryptography over GF(2m) on a Reconfigurable Computer:
Polynomial Basis vs. Optimal Normal Basis Representation
Comparative Study
Kris Gaj, Sashisu Bajracharya, Nghi Nguyen, Deapesh Misra
Tarek El-GhazawiThe George Washington University
George Mason University
Gaj 2 P230/MAPLD 2004
Interface
P memory
P memory
. . .
P P . . .
I/O Interface
FPGA memory
FPGA memory
. . .
FPGA FPGA . . .
I/O
Microprocessor systemReconfigurable processor
system
What is a reconfigurable computer?
P230/MAPLD 2004
Why cryptography is a good application for reconfigurable
computers?
• computationally intensive arithmetic operations
• unconventionally long operand sizes (160-2048 bits)
• multiple algorithms, parameters, key sizes, and architectures = need for reconfiguration
Gaj 4 P230/MAPLD 2004
SRC Hardware & Software
Gaj 5 P230/MAPLD 2004
SRC-6E from SRC Computers, Inc.
Gaj 6 P230/MAPLD 2004
SNAP
ComputerMemory(1.5 GB)
P3(1 GHz)
P3(1 GHz)
/ /8000MB/s
MIOC
L2L2
800 MB/s
// 800 MB/s528 MB/s
DDRInterface
PCI-X
ControlFPGA
XC2V6000
800 MB/s
On-Board Memory(24 MB)
/4800 MB/s(6x64 bits)
FPGA 1XC2V6000
FPGA 2XC2V6000
/
4800 MB/s(6x 64 bits)
/
4800 MB/s(6x 64 bits)
2400 MB/s(192 bits)
/
/ /
(108 bits)
ChainPorts 2400 MB/s
(108 bits)
/
528 MB/s
½ MAPBoard
PBoard
8000MB/s
/ /
8bits
flags
64bitsdata
SRC Hardware Architecture
Gaj 7 P230/MAPLD 2004
SRC Programming
HLL (C)
HDL (VHDL)
SRCP system
FPGA system
ApplicationProgrammer
LibraryDeveloper
Gaj 8 P230/MAPLD 2004
SRC Compilation Process
Objectfiles
Application sources Macro sources
MAP CompilerP Compiler
Logic synthesis
Place & Route
Linker
.v files
.bin files
.ngo files
.o files .o files
Applicationexecutable
Configurationbitstreams
HDLsources
Netlists
.c or .f files .vhd or .v files
Objectfiles
Application sources Macro sources
MAP CompilerP Compiler
Logic synthesis
Place & Route
Linker
.v files
.bin files
.ngo files
.o files .o files
Applicationexecutable
Configurationbitstreams
HDLsources
Netlists
.c or .f files .vhd or .v files
Gaj 9 P230/MAPLD 2004
Elliptic Curve Cryptosystems
Gaj 10 P230/MAPLD 2004
Elliptic Curve Cryptosystems
public key (asymmetric) cryptosystems first true alternative for RSA
several times shorter keys
fast and compact implementations,
in particular in hardware
a family of cryptosystems, instead of a single
cryptosystem
Gaj 11 P230/MAPLD 2004
Three Classes of Elliptic Curves
Elliptic curves built over
K = GF(p) K = GF(2m)
Polynomial basisrepresentation
Normal basisrepresentation
Fast in hardware
Arithmetic operations
presentin many libraries
Compact in hardware
m=155 .. 512
Secure m
Our m
m=233
Gaj 12 P230/MAPLD 2004
Basic operations of ECC
Basic operations in Galois Field GF(2m)
Basic operations on points of an Elliptic Curve
• addition and subtraction (xor): x+y, x-y (XOR)
• addition of points: P + Q• doubling a point: 2 P• projective to affine coordinate: P2A
• multiplication, squaring: x y, x2
• inversion: x-1
Complex operations on points of an Elliptic Curve
• scalar multiplication: k P = P + P + …+P
k times
Gaj 13 P230/MAPLD 2004
ECC hierarchy of functions
kP
P+Q 2P projective_to_affine (P2A)
MUL
INV
High level
Medium level
Low level 2
SQRXORLow
level 1
independent of the GF representation
specific to the givenGF representation
Gaj 14 P230/MAPLD 2004
Investigated Partitioning Schemes
Gaj 15 P230/MAPLD 2004
C function for P
C function for MAP
VHDLmacro
SRC Program Partitioning
P system
FPGA system
HLL
HDL
Gaj 16 P230/MAPLD 2004
kPC function for P
C function for MAP
VHDLmacro
H
0
0
H00 Partitioning (μP Software Only)
specific to the givenGF representation
Gaj 17 P230/MAPLD 2004
kP
0
0
H
00H Partitioning (VHDL only)
C function for P
C function for MAP
VHDLmacro
specific to the givenGF representation
Gaj 18 P230/MAPLD 2004
MUL4
C functionfor MAP
VHDL macros
ROTSQR XOR
C function for µP
0
H
L1V_ROTPOW
kP
INV
P2A
P+Q 2P
kP
INV
P2A
P+QP+Q 2P2P
MUL2MUL2 MULMUL
0HL1 Partitioning
independent of the GF representation
specific to the givenGF representation
Gaj 19 P230/MAPLD 2004
MUL
P+Q 2P
INV
P2A
kP
MUL MUL MUL
MUL4
MUL2 MUL POW
FPGA Contents (0HL1)
Gaj 20 P230/MAPLD 2004
MUL4
C functionfor MAP
VHDL macros
ROTSQR XOR
C function for µP
0
H
L2INVINV
kP
P2AP+Q 2P
kP
P2AP+QP+Q 2P2P
MUL2MUL2
0HL2 Partitioning
independent of the GF representation
specific to the givenGF representation
Gaj 21 P230/MAPLD 2004
0HM Partitioning
C function for MAP
VHDL macros
C functionfor µP
0
H
MP+QP+Q 2P2P P2AP2A
kPkP
independent of the GF representation
specific to the givenGF representation
Gaj 22 P230/MAPLD 2004
Results
Gaj 23 P230/MAPLD 2004
Timing Measurements
MAPAlloc.
MAP
FreeDMA
DataOut
DMA
Data In
FPGA
Computation
.c file
.mc file
End-to-End time (SW)
MAPfunction
MAP function
FPGA
Configure
Configuration time
MAP
Allocation
time
MAP
Release
Time
End-to-End time (HW)
Gaj 24 P230/MAPLD 2004
Results for Optimal Normal Basis (Latency)
0
100
200
300
400
500
600
700
800
900
us
0HL1 0HL2 0HM 00H(VHDL)
Different Architectures
End to End Time
FPGA ComputationTime
0HL1 866 37 472 14 394 893 1.460HL2 863 37 469 14 394 895 1.450HM 592 37 201 12 391 1305 100H 592 39 201 17 391 1305 1
(Software)H00
FPGA Computat-ion Time (μs)
Data Transfer
In Time(μs)
End-to-End Time (μs)
772,519
Total Overhead
(μs)
Data Transfer
Out Time (μs)
Speedup vs.
Software
Slow-down vs. VHDL macro
System Level Architecture
Gaj 25 P230/MAPLD 2004
Results for Polynomial Basis (Latency)
0HL2 943 66 560 7 383 33
(Software)H00
FPGA Computat-ion Time
(μs)
Data Transfer
In Time(μs)
End-to-End Time (μs)
31,050
Total Overhead
(μs)
Data Transfer
Out Time (μs)
Speedup vs.
SoftwareSystem Level Architecture
Gaj 26 P230/MAPLD 2004
0HL1 99 1.68 57 1.3 68 2.610HL2 92 1.56 52 1.18 62 2.380HM 75 1.27 48 1.09 39 1.500H 59 1 44 1 26 1
LUT increase vs. pure VHDL
% of FFs
(out of 67,584)
FF count increase vs. pure VHDL
System Level Architecture
% of CLB slices
(out of 33792)
CLB increase vs. pure VHDL
% of LUTs(out of 67,584)
0
10
20
30
40
50
60
70
80
90
100
%
0HL1 0HL2 0HM 00H
Different Architectures
CLB slices
LUT
FF
Results for Optimal Normal Basis (Area)
Gaj 27 P230/MAPLD 2004
0HL2 93 N/A 48 N/A 69 N/A
System Level Architecture
% of CLB slices
(out of 33792)
CLB increase vs. pure VHDL
% of LUTs(out of 67,584)
LUT increase vs. pure VHDL
% of FFs
(out of 67,584)
FF count increase vs. pure VHDL
Results for the Polynomial Basis (Area)
Gaj 28 P230/MAPLD 2004
Algorithm Partitioning
Scheme
VHDL
PB
VHDL
ONB
Macro Wrapper
MAP C Main C
0HL1 N/A 1007 260 371 153
0HL2 714 1291 230 349 153
0HM N/A 1744 160 185 153
00H N/A 1960 36 78 153
Number of lines of code
Gaj 29 P230/MAPLD 2004
ConclusionsAssuming focus on:
Timing Resources
Ease of programming
Gaj 30 P230/MAPLD 2004
Best implementation approaches
Large speedup vs. softwareEase of implementation
Flexibility
MUL4
C functionfor MAP
VHDL macros
ROTSQR XOR
C function for µP
0
H
L1V_ROTPOW
kP
INV
P2A
P+Q 2P
kP
INV
P2A
P+QP+Q 2P2P
MUL2MUL2 MULMULMUL4
C functionfor MAP
VHDL macros
ROTSQR XOR
C function for µP
0
H
L1V_ROTPOW
kP
INV
P2A
P+QP+Q 2P2P
kP
INV
P2A
P+QP+Q 2P2P
MUL2MUL2 MULMULMULMUL MUL4
C functionfor MAP
VHDL macros
ROTSQR XOR
C function for µP
0
H
L2INVINV
kP
P2AP+QP+Q 2P2P
kP
P2AP+QP+Q 2P2P
MUL2MUL2MUL2MUL2
Optimal Normal Basis
OHL1 scheme
Polynomial Basis
OHL2 scheme
Gaj 31 P230/MAPLD 2004
Conclusions
• Elliptic Curve Cryptosystem implementation challenging for reconfigurable computers because of
• optimization for latency rather than throughput• limited amount of parallelism
• Absolute latency and resource utilization similar for Optimal Normal Basis and Polynomial Basis
• Number of lines of VHDL code smaller for the polynomial basis representation
Gaj 32 P230/MAPLD 2004
Conclusions – cont.
because of polynomial basis operations more efficient in software
27 x greater for Optimal Normal Basis compared to Polynomial Basis Representation
From 893 to 1305 times for Optimal Normal BasisFrom 33 times for Polynomial Basis
Speed-up over Intel P3 microprocessor implementation