Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 1 P230/MAPLD 2004

Elliptic Curve Cryptography over GF(2m) on a Reconfigurable Computer:

Polynomial Basis vs. Optimal Normal Basis Representation

Comparative Study

Kris Gaj, Sashisu Bajracharya, Nghi Nguyen, Deapesh Misra

Tarek El-GhazawiThe George Washington University

George Mason University


Interface

P memory

P memory

. . .

P P . . .

I/O Interface

FPGA memory

FPGA memory

. . .

FPGA FPGA . . .

I/O

Microprocessor systemReconfigurable processor

system

What is a reconfigurable computer?

P230/MAPLD 2004

Why cryptography is a good application for reconfigurable

computers?

• computationally intensive arithmetic operations

• unconventionally long operand sizes (160-2048 bits)

• multiple algorithms, parameters, key sizes, and architectures = need for reconfiguration


SRC Hardware & Software


SRC-6E from SRC Computers, Inc.


SNAP

ComputerMemory(1.5 GB)

P3(1 GHz)

P3(1 GHz)

/ /8000MB/s

MIOC

L2L2

800 MB/s

// 800 MB/s528 MB/s

DDRInterface

PCI-X

ControlFPGA

XC2V6000

800 MB/s

On-Board Memory(24 MB)

/4800 MB/s(6x64 bits)

FPGA 1XC2V6000

FPGA 2XC2V6000

/

4800 MB/s(6x 64 bits)

/

4800 MB/s(6x 64 bits)

2400 MB/s(192 bits)

/

/ /

(108 bits)

ChainPorts 2400 MB/s

(108 bits)

/

528 MB/s

½ MAPBoard

PBoard

8000MB/s

/ /

8bits

flags

64bitsdata

SRC Hardware Architecture


SRC Programming

HLL (C)

HDL (VHDL)

SRCP system

FPGA system

ApplicationProgrammer

LibraryDeveloper


SRC Compilation Process

Objectfiles

Application sources Macro sources

MAP CompilerP Compiler

Logic synthesis

Place & Route

Linker

.v files

.bin files

.ngo files

.o files .o files

Applicationexecutable

Configurationbitstreams

HDLsources

Netlists

.c or .f files .vhd or .v files

Objectfiles

Application sources Macro sources

MAP CompilerP Compiler

Logic synthesis

Place & Route

Linker

.v files

.bin files

.ngo files

.o files .o files

Applicationexecutable

Configurationbitstreams

HDLsources

Netlists

.c or .f files .vhd or .v files


Elliptic Curve Cryptosystems


Elliptic Curve Cryptosystems

public key (asymmetric) cryptosystems first true alternative for RSA

several times shorter keys

fast and compact implementations,

in particular in hardware

a family of cryptosystems, instead of a single

cryptosystem


Three Classes of Elliptic Curves

Elliptic curves built over

K = GF(p) K = GF(2m)

Polynomial basisrepresentation

Normal basisrepresentation

Fast in hardware

Arithmetic operations

presentin many libraries

Compact in hardware

m=155 .. 512

Secure m

Our m

m=233


Basic operations of ECC

Basic operations in Galois Field GF(2m)

Basic operations on points of an Elliptic Curve

• addition and subtraction (xor): x+y, x-y (XOR)

• addition of points: P + Q• doubling a point: 2 P• projective to affine coordinate: P2A

• multiplication, squaring: x y, x2

• inversion: x-1

Complex operations on points of an Elliptic Curve

• scalar multiplication: k P = P + P + …+P

k times


ECC hierarchy of functions

kP

P+Q 2P projective_to_affine (P2A)

MUL

INV

High level

Medium level

Low level 2

SQRXORLow

level 1

independent of the GF representation

specific to the givenGF representation


Investigated Partitioning Schemes


C function for P

C function for MAP

VHDLmacro

SRC Program Partitioning

P system

FPGA system

HLL

HDL


kPC function for P

C function for MAP

VHDLmacro

H

0

0

H00 Partitioning (μP Software Only)



kP

0

0

H

00H Partitioning (VHDL only)

C function for P

C function for MAP

VHDLmacro



MUL4

C functionfor MAP

VHDL macros

ROTSQR XOR

C function for µP

0

H

L1V_ROTPOW

kP

INV

P2A

P+Q 2P

kP

INV

P2A

P+QP+Q 2P2P

MUL2MUL2 MULMUL

0HL1 Partitioning




MUL

P+Q 2P

INV

P2A

kP

MUL MUL MUL

MUL4

MUL2 MUL POW

FPGA Contents (0HL1)


MUL4

C functionfor MAP

VHDL macros

ROTSQR XOR

C function for µP

0

H

L2INVINV

kP

P2AP+Q 2P

kP

P2AP+QP+Q 2P2P

MUL2MUL2

0HL2 Partitioning




0HM Partitioning

C function for MAP

VHDL macros

C functionfor µP

0

H

MP+QP+Q 2P2P P2AP2A

kPkP




Results


Timing Measurements

MAPAlloc.

MAP

FreeDMA

DataOut

DMA

Data In

FPGA

Computation

.c file

.mc file

End-to-End time (SW)

MAPfunction

MAP function

FPGA

Configure

Configuration time

MAP

Allocation

time

MAP

Release

Time

End-to-End time (HW)


Results for Optimal Normal Basis (Latency)

0

100

200

300

400

500

600

700

800

900

us

0HL1 0HL2 0HM 00H(VHDL)

Different Architectures

End to End Time

FPGA ComputationTime

0HL1 866 37 472 14 394 893 1.460HL2 863 37 469 14 394 895 1.450HM 592 37 201 12 391 1305 100H 592 39 201 17 391 1305 1

(Software)H00

FPGA Computat-ion Time (μs)

Data Transfer

In Time(μs)

End-to-End Time (μs)

772,519

Total Overhead

(μs)

Data Transfer

Out Time (μs)

Speedup vs.

Software

Slow-down vs. VHDL macro

System Level Architecture


Results for Polynomial Basis (Latency)

0HL2 943 66 560 7 383 33

(Software)H00

FPGA Computat-ion Time

(μs)

Data Transfer

In Time(μs)

End-to-End Time (μs)

31,050

Total Overhead

(μs)

Data Transfer

Out Time (μs)

Speedup vs.

SoftwareSystem Level Architecture


0HL1 99 1.68 57 1.3 68 2.610HL2 92 1.56 52 1.18 62 2.380HM 75 1.27 48 1.09 39 1.500H 59 1 44 1 26 1

LUT increase vs. pure VHDL

% of FFs

(out of 67,584)

FF count increase vs. pure VHDL


% of CLB slices

(out of 33792)

CLB increase vs. pure VHDL

% of LUTs(out of 67,584)

0

10

20

30

40

50

60

70

80

90

100

%

0HL1 0HL2 0HM 00H

Different Architectures

CLB slices

LUT

FF

Results for Optimal Normal Basis (Area)


0HL2 93 N/A 48 N/A 69 N/A


% of CLB slices

(out of 33792)

CLB increase vs. pure VHDL

% of LUTs(out of 67,584)

LUT increase vs. pure VHDL

% of FFs

(out of 67,584)

FF count increase vs. pure VHDL

Results for the Polynomial Basis (Area)


Algorithm Partitioning

Scheme

VHDL

PB

VHDL

ONB

Macro Wrapper

MAP C Main C

0HL1 N/A 1007 260 371 153

0HL2 714 1291 230 349 153

0HM N/A 1744 160 185 153

00H N/A 1960 36 78 153

Number of lines of code


ConclusionsAssuming focus on:

Timing Resources

Ease of programming


Best implementation approaches

Large speedup vs. softwareEase of implementation

Flexibility

MUL4

C functionfor MAP

VHDL macros

ROTSQR XOR

C function for µP

0

H

L1V_ROTPOW

kP

INV

P2A

P+Q 2P

kP

INV

P2A

P+QP+Q 2P2P

MUL2MUL2 MULMULMUL4

C functionfor MAP

VHDL macros

ROTSQR XOR

C function for µP

0

H

L1V_ROTPOW

kP

INV

P2A

P+QP+Q 2P2P

kP

INV

P2A

P+QP+Q 2P2P

MUL2MUL2 MULMULMULMUL MUL4

C functionfor MAP

VHDL macros

ROTSQR XOR

C function for µP

0

H

L2INVINV

kP

P2AP+QP+Q 2P2P

kP

P2AP+QP+Q 2P2P

MUL2MUL2MUL2MUL2

Optimal Normal Basis

OHL1 scheme

Polynomial Basis

OHL2 scheme


Conclusions

• Elliptic Curve Cryptosystem implementation challenging for reconfigurable computers because of

• optimization for latency rather than throughput• limited amount of parallelism

• Absolute latency and resource utilization similar for Optimal Normal Basis and Polynomial Basis

• Number of lines of VHDL code smaller for the polynomial basis representation


Conclusions – cont.

because of polynomial basis operations more efficient in software

27 x greater for Optimal Normal Basis compared to Polynomial Basis Representation

From 893 to 1305 times for Optimal Normal BasisFrom 33 times for Polynomial Basis

Speed-up over Intel P3 microprocessor implementation

Documents

Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative