32
Gaj 1 P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative Study Kris Gaj , Sashisu Bajracharya, Nghi Nguyen, Deapesh Misra Tarek El-Ghazawi The George Washington University George Mason University

Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Embed Size (px)

Citation preview

Page 1: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 1 P230/MAPLD 2004

Elliptic Curve Cryptography over GF(2m) on a Reconfigurable Computer:

Polynomial Basis vs. Optimal Normal Basis Representation

Comparative Study

Kris Gaj, Sashisu Bajracharya, Nghi Nguyen, Deapesh Misra

Tarek El-GhazawiThe George Washington University

George Mason University

Page 2: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 2 P230/MAPLD 2004

Interface

P memory

P memory

. . .

P P . . .

I/O Interface

FPGA memory

FPGA memory

. . .

FPGA FPGA . . .

I/O

Microprocessor systemReconfigurable processor

system

What is a reconfigurable computer?

Page 3: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

P230/MAPLD 2004

Why cryptography is a good application for reconfigurable

computers?

• computationally intensive arithmetic operations

• unconventionally long operand sizes (160-2048 bits)

• multiple algorithms, parameters, key sizes, and architectures = need for reconfiguration

Page 4: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 4 P230/MAPLD 2004

SRC Hardware & Software

Page 5: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 5 P230/MAPLD 2004

SRC-6E from SRC Computers, Inc.

Page 6: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 6 P230/MAPLD 2004

SNAP

ComputerMemory(1.5 GB)

P3(1 GHz)

P3(1 GHz)

/ /8000MB/s

MIOC

L2L2

800 MB/s

// 800 MB/s528 MB/s

DDRInterface

PCI-X

ControlFPGA

XC2V6000

800 MB/s

On-Board Memory(24 MB)

/4800 MB/s(6x64 bits)

FPGA 1XC2V6000

FPGA 2XC2V6000

/

4800 MB/s(6x 64 bits)

/

4800 MB/s(6x 64 bits)

2400 MB/s(192 bits)

/

/ /

(108 bits)

ChainPorts 2400 MB/s

(108 bits)

/

528 MB/s

½ MAPBoard

PBoard

8000MB/s

/ /

8bits

flags

64bitsdata

SRC Hardware Architecture

Page 7: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 7 P230/MAPLD 2004

SRC Programming

HLL (C)

HDL (VHDL)

SRCP system

FPGA system

ApplicationProgrammer

LibraryDeveloper

Page 8: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 8 P230/MAPLD 2004

SRC Compilation Process

Objectfiles

Application sources Macro sources

MAP CompilerP Compiler

Logic synthesis

Place & Route

Linker

.v files

.bin files

.ngo files

.o files .o files

Applicationexecutable

Configurationbitstreams

HDLsources

Netlists

.c or .f files .vhd or .v files

Objectfiles

Application sources Macro sources

MAP CompilerP Compiler

Logic synthesis

Place & Route

Linker

.v files

.bin files

.ngo files

.o files .o files

Applicationexecutable

Configurationbitstreams

HDLsources

Netlists

.c or .f files .vhd or .v files

Page 9: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 9 P230/MAPLD 2004

Elliptic Curve Cryptosystems

Page 10: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 10 P230/MAPLD 2004

Elliptic Curve Cryptosystems

public key (asymmetric) cryptosystems first true alternative for RSA

several times shorter keys

fast and compact implementations,

in particular in hardware

a family of cryptosystems, instead of a single

cryptosystem

Page 11: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 11 P230/MAPLD 2004

Three Classes of Elliptic Curves

Elliptic curves built over

K = GF(p) K = GF(2m)

Polynomial basisrepresentation

Normal basisrepresentation

Fast in hardware

Arithmetic operations

presentin many libraries

Compact in hardware

m=155 .. 512

Secure m

Our m

m=233

Page 12: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 12 P230/MAPLD 2004

Basic operations of ECC

Basic operations in Galois Field GF(2m)

Basic operations on points of an Elliptic Curve

• addition and subtraction (xor): x+y, x-y (XOR)

• addition of points: P + Q• doubling a point: 2 P• projective to affine coordinate: P2A

• multiplication, squaring: x y, x2

• inversion: x-1

Complex operations on points of an Elliptic Curve

• scalar multiplication: k P = P + P + …+P

k times

Page 13: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 13 P230/MAPLD 2004

ECC hierarchy of functions

kP

P+Q 2P projective_to_affine (P2A)

MUL

INV

High level

Medium level

Low level 2

SQRXORLow

level 1

independent of the GF representation

specific to the givenGF representation

Page 14: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 14 P230/MAPLD 2004

Investigated Partitioning Schemes

Page 15: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 15 P230/MAPLD 2004

C function for P

C function for MAP

VHDLmacro

SRC Program Partitioning

P system

FPGA system

HLL

HDL

Page 16: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 16 P230/MAPLD 2004

kPC function for P

C function for MAP

VHDLmacro

H

0

0

H00 Partitioning (μP Software Only)

specific to the givenGF representation

Page 17: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 17 P230/MAPLD 2004

kP

0

0

H

00H Partitioning (VHDL only)

C function for P

C function for MAP

VHDLmacro

specific to the givenGF representation

Page 18: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 18 P230/MAPLD 2004

MUL4

C functionfor MAP

VHDL macros

ROTSQR XOR

C function for µP

0

H

L1V_ROTPOW

kP

INV

P2A

P+Q 2P

kP

INV

P2A

P+QP+Q 2P2P

MUL2MUL2 MULMUL

0HL1 Partitioning

independent of the GF representation

specific to the givenGF representation

Page 19: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 19 P230/MAPLD 2004

MUL

P+Q 2P

INV

P2A

kP

MUL MUL MUL

MUL4

MUL2 MUL POW

FPGA Contents (0HL1)

Page 20: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 20 P230/MAPLD 2004

MUL4

C functionfor MAP

VHDL macros

ROTSQR XOR

C function for µP

0

H

L2INVINV

kP

P2AP+Q 2P

kP

P2AP+QP+Q 2P2P

MUL2MUL2

0HL2 Partitioning

independent of the GF representation

specific to the givenGF representation

Page 21: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 21 P230/MAPLD 2004

0HM Partitioning

C function for MAP

VHDL macros

C functionfor µP

0

H

MP+QP+Q 2P2P P2AP2A

kPkP

independent of the GF representation

specific to the givenGF representation

Page 22: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 22 P230/MAPLD 2004

Results

Page 23: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 23 P230/MAPLD 2004

Timing Measurements

MAPAlloc.

MAP

FreeDMA

DataOut

DMA

Data In

FPGA

Computation

.c file

.mc file

End-to-End time (SW)

MAPfunction

MAP function

FPGA

Configure

Configuration time

MAP

Allocation

time

MAP

Release

Time

End-to-End time (HW)

Page 24: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 24 P230/MAPLD 2004

Results for Optimal Normal Basis (Latency)

0

100

200

300

400

500

600

700

800

900

us

0HL1 0HL2 0HM 00H(VHDL)

Different Architectures

End to End Time

FPGA ComputationTime

0HL1 866 37 472 14 394 893 1.460HL2 863 37 469 14 394 895 1.450HM 592 37 201 12 391 1305 100H 592 39 201 17 391 1305 1

(Software)H00

FPGA Computat-ion Time (μs)

Data Transfer

In Time(μs)

End-to-End Time (μs)

772,519

Total Overhead

(μs)

Data Transfer

Out Time (μs)

Speedup vs.

Software

Slow-down vs. VHDL macro

System Level Architecture

Page 25: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 25 P230/MAPLD 2004

Results for Polynomial Basis (Latency)

0HL2 943 66 560 7 383 33

(Software)H00

FPGA Computat-ion Time

(μs)

Data Transfer

In Time(μs)

End-to-End Time (μs)

31,050

Total Overhead

(μs)

Data Transfer

Out Time (μs)

Speedup vs.

SoftwareSystem Level Architecture

Page 26: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 26 P230/MAPLD 2004

0HL1 99 1.68 57 1.3 68 2.610HL2 92 1.56 52 1.18 62 2.380HM 75 1.27 48 1.09 39 1.500H 59 1 44 1 26 1

LUT increase vs. pure VHDL

% of FFs

(out of 67,584)

FF count increase vs. pure VHDL

System Level Architecture

% of CLB slices

(out of 33792)

CLB increase vs. pure VHDL

% of LUTs(out of 67,584)

0

10

20

30

40

50

60

70

80

90

100

%

0HL1 0HL2 0HM 00H

Different Architectures

CLB slices

LUT

FF

Results for Optimal Normal Basis (Area)

Page 27: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 27 P230/MAPLD 2004

0HL2 93 N/A 48 N/A 69 N/A

System Level Architecture

% of CLB slices

(out of 33792)

CLB increase vs. pure VHDL

% of LUTs(out of 67,584)

LUT increase vs. pure VHDL

% of FFs

(out of 67,584)

FF count increase vs. pure VHDL

Results for the Polynomial Basis (Area)

Page 28: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 28 P230/MAPLD 2004

Algorithm Partitioning

Scheme

VHDL

PB

VHDL

ONB

Macro Wrapper

MAP C Main C

0HL1 N/A 1007 260 371 153

0HL2 714 1291 230 349 153

0HM N/A 1744 160 185 153

00H N/A 1960 36 78 153

Number of lines of code

Page 29: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 29 P230/MAPLD 2004

ConclusionsAssuming focus on:

Timing Resources

Ease of programming

Page 30: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 30 P230/MAPLD 2004

Best implementation approaches

Large speedup vs. softwareEase of implementation

Flexibility

MUL4

C functionfor MAP

VHDL macros

ROTSQR XOR

C function for µP

0

H

L1V_ROTPOW

kP

INV

P2A

P+Q 2P

kP

INV

P2A

P+QP+Q 2P2P

MUL2MUL2 MULMULMUL4

C functionfor MAP

VHDL macros

ROTSQR XOR

C function for µP

0

H

L1V_ROTPOW

kP

INV

P2A

P+QP+Q 2P2P

kP

INV

P2A

P+QP+Q 2P2P

MUL2MUL2 MULMULMULMUL MUL4

C functionfor MAP

VHDL macros

ROTSQR XOR

C function for µP

0

H

L2INVINV

kP

P2AP+QP+Q 2P2P

kP

P2AP+QP+Q 2P2P

MUL2MUL2MUL2MUL2

Optimal Normal Basis

OHL1 scheme

Polynomial Basis

OHL2 scheme

Page 31: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 31 P230/MAPLD 2004

Conclusions

• Elliptic Curve Cryptosystem implementation challenging for reconfigurable computers because of

• optimization for latency rather than throughput• limited amount of parallelism

• Absolute latency and resource utilization similar for Optimal Normal Basis and Polynomial Basis

• Number of lines of VHDL code smaller for the polynomial basis representation

Page 32: Gaj1P230/MAPLD 2004 Elliptic Curve Cryptography over GF(2 m ) on a Reconfigurable Computer: Polynomial Basis vs. Optimal Normal Basis Representation Comparative

Gaj 32 P230/MAPLD 2004

Conclusions – cont.

because of polynomial basis operations more efficient in software

27 x greater for Optimal Normal Basis compared to Polynomial Basis Representation

From 893 to 1305 times for Optimal Normal BasisFrom 33 times for Polynomial Basis

Speed-up over Intel P3 microprocessor implementation