Implementation of ECM Using FPGA devices
ECE646 – Dr. Kris Gaj
Mohammed Khaleeluddin, Hoang Le, Ramakrishna Bachimanchi
ECM Architecture Implementation 2
Introduction
Why factor numbers?
  Security of RSA relies on the difficulty of factoring large composites
  n = p·q: n is known, what are p and q? (in practice: n ~ 1024 bit)
In cryptanalysis:
  "Find efficient method for factoring (large) integers."
Introduction (cont.)
Different algorithms for different purposes
Best known method for factoring large integers: GNFS
Methods suited for factoring numbers of 100-200 bit, e.g.,
  MPQS
  ECM (small factors)
  Trial division (very, very small factors)
Introduction (cont.)
In GNFS, smoothness tests of “medium-size” integers are required.
Why ECM?
  Factors integers with relatively small factors (up to 200 bit)
  Almost ideal for hardware implementation:
    Allows for low I/O
    Requires little memory
    Easy to parallelize
  Closely related to Elliptic Curve Cryptography (ECC)
Elliptic Curve
An elliptic curve is a plane curve defined by an equation of the form
$y^2 = x^3 + ax + b$
a and b determine the shape of the curve
0 is the point at infinity (a point which, when added to the real number line, yields a closed curve called the real projective line)
Elliptic Curve Method
Algorithm proposed by [H.W. Lenstra 1985]
Principle based on Pollard's (p-1)-method:
  Step 1. Choose an integer k that is the product of primes to small powers.
  Step 2. Choose an integer a such that 1 < a < n.
  Step 3. Calculate GCD(a, n). If this is nontrivial, then we have a divisor d of n, so terminate.
  Step 4. Calculate d = GCD(a^k − 1, n). If d = 1 then go back to Step 1 and choose a different k. If d = n then go back to Step 2 and choose a different a. Otherwise we have a divisor of n, so terminate.
Advantage over Pollard's (p-1)-method:
  If no factor is found, simply choose another curve
  Easy to parallelize
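The four steps of Pollard's (p-1)-method listed above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names, the `bound` parameter, and the choice of k as the product of the largest prime powers not exceeding `bound` are assumptions made for the example.

```python
from math import gcd


def primes_up_to(bound):
    """Sieve of Eratosthenes: all primes <= bound."""
    sieve = [True] * (bound + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(bound ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def smooth_exponent(bound):
    """Step 1 (assumed form): k = product of the largest prime powers <= bound."""
    k = 1
    for p in primes_up_to(bound):
        power = p
        while power * p <= bound:
            power *= p
        k *= power
    return k


def pollard_p_minus_1(n, bound, a=2):
    """Return a nontrivial divisor of n, or None if this choice of k and a fails."""
    k = smooth_exponent(bound)            # Step 1
    d = gcd(a, n)                         # Steps 2-3: a is given, test gcd(a, n)
    if 1 < d < n:
        return d
    d = gcd(pow(a, k, n) - 1, n)          # Step 4: d = gcd(a^k - 1, n)
    return d if 1 < d < n else None       # d = 1 or d = n: caller retries


print(pollard_p_minus_1(299, 5))          # 299 = 13 * 23, and 13 - 1 = 12 divides k = 60
```

The method succeeds only when p - 1 happens to be smooth for some prime factor p of n. ECM replaces the fixed group of order p - 1 by the group of an elliptic curve, so a failed attempt can be retried with a different curve.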
Elliptic Curve Method (Cont.)
Phase I
Compute Q = k·P, where
$k = \prod_{p \le B_1} p^{e_p}$ and $e_p = \lfloor \log_p B_1 \rfloor$
Scalar Multiplication Algorithm
  k = (k_{L-1}, k_{L-2}, ..., k_1, k_0); P_1 = zero (point at infinity); P_2 = (C_1, C_2)
  for (i = L-1 downto 0) {
    if (k_i = 1) { P_1 = P_1 + P_2; P_2 = 2·P_2; }
    else         { P_2 = P_1 + P_2; P_1 = 2·P_1; }
  }
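The scalar multiplication loop above is a Montgomery ladder: every bit of k costs one point addition and one doubling, and the two running points always differ by P. The sketch below is written against an abstract group so that it stays short. The function names and the use of plain integers as a stand-in group are assumptions for illustration; in ECM the `add` and `double` callbacks would be the elliptic-curve point operations.

```python
def ladder(k, P, add, double, zero):
    """Compute k*P with one addition and one doubling per bit of k."""
    P1, P2 = zero, P                          # invariant: P2 - P1 = P
    for i in range(k.bit_length() - 1, -1, -1):
        if (k >> i) & 1:
            P1, P2 = add(P1, P2), double(P2)  # bit 1: P1 <- P1 + P2, P2 <- 2*P2
        else:
            P1, P2 = double(P1), add(P1, P2)  # bit 0: P1 <- 2*P1, P2 <- P1 + P2
    return P1


# Stand-in group: integers under addition, so k*P is ordinary multiplication.
def int_add(a, b):
    return a + b


def int_double(a):
    return 2 * a


# k = 2520 = 2^3 * 3^2 * 5 * 7 is the Phase I scalar for B1 = 10.
result = ladder(2520, 7, int_add, int_double, 0)
print(result == 2520 * 7)
```

The tuple assignments update both points from their old values at once, which is why the order of the two statements inside each branch does not matter here.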
Elliptic Curve Method (Cont.)
Phase II
Compute $p_i \cdot Q$ for all primes $B_1 \le p_i \le B_2$ and check if $\gcd(z_{p_i Q}, N) > 1$
Precompute a small table T of multiples k·Q
Represent p in the form p = m·D + k, where $k \in [1, D/2]$ and $D \approx \sqrt{B_2}$
Fact: $\gcd(z_{pQ}, N) > 1$ iff $\gcd(x_{mDQ} z_{kQ} - x_{kQ} z_{mDQ}, N) > 1$
Compute $\prod (x_{mDQ} z_{kQ} - x_{kQ} z_{mDQ})$ for all primes and compute the final gcd with N
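To make the representation of the primes concrete, the sketch below splits each prime into a multiple of D and a small table index k. It assumes the signed form p = m·D ± k, which is what allows k to stay within [1, D/2]; the slide itself writes p = m·D + k. The function names and the value D = 30 are illustrative only.

```python
def decompose(p, D):
    """Write p = m*D + sign*k with k <= D/2; returns (m, k, sign)."""
    m = (p + D // 2) // D            # index of the nearest multiple of D
    r = p - m * D                    # signed remainder, |r| <= D/2
    return m, abs(r), (1 if r >= 0 else -1)


def group_by_m(primes, D):
    """Map each m to the table indices k that are paired with m*D*Q."""
    groups = {}
    for p in primes:
        m, k, _ = decompose(p, D)
        groups.setdefault(m, []).append(k)
    return groups


print(decompose(101, 30))            # 101 = 3*30 + 11
print(decompose(113, 30))            # 113 = 4*30 - 7
print(group_by_m([101, 103, 107, 109, 113], 30))
```

Grouping by m means that each multiple m·D·Q is computed once and then combined with every table entry k·Q that belongs to it.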
Elliptic Curve Method (Cont.)
Elliptic curves and point arithmetic:
Use curves in Montgomery form:
$B y^2 z = x^3 + A x^2 z + x z^2$
Point Addition:
$x_{P+Q} = z_{P-Q} [(x_P - z_P)(x_Q + z_Q) + (x_P + z_P)(x_Q - z_Q)]^2$
$z_{P+Q} = x_{P-Q} [(x_P - z_P)(x_Q + z_Q) - (x_P + z_P)(x_Q - z_Q)]^2$
Point Duplication:
$4 x_P z_P = (x_P + z_P)^2 - (x_P - z_P)^2$
$x_{2P} = (x_P + z_P)^2 (x_P - z_P)^2$
$z_{2P} = 4 x_P z_P [(x_P - z_P)^2 + 4 x_P z_P (A + 2)/4]$
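The addition and duplication formulas use only the x and z coordinates, so they translate directly into modular arithmetic. The following sketch evaluates them modulo n and checks that 4P comes out the same whether it is reached as 2(2P) or as 3P + P. The modulus, the curve parameter A = 6 (so that (A + 2)/4 = 2) and the starting point are arbitrary values chosen for the example, and the function names are hypothetical.

```python
def x_add(P, Q, diff, n):
    """(x, z) of P+Q from P, Q and diff = P-Q, all modulo n."""
    (xp, zp), (xq, zq), (xd, zd) = P, Q, diff
    u = (xp - zp) * (xq + zq) % n
    v = (xp + zp) * (xq - zq) % n
    return zd * (u + v) ** 2 % n, xd * (u - v) ** 2 % n


def x_double(P, a24, n):
    """(x, z) of 2P; a24 stands for (A + 2)/4 modulo n."""
    xp, zp = P
    plus = (xp + zp) ** 2 % n
    minus = (xp - zp) ** 2 % n
    four_xz = (plus - minus) % n               # equals 4 * x_P * z_P
    return plus * minus % n, four_xz * (minus + a24 * four_xz) % n


n = 2**31 - 1                      # illustrative modulus, not from the slides
a24 = 2                            # corresponds to A = 6
P = (9, 1)
P2 = x_double(P, a24, n)
P3 = x_add(P2, P, P, n)            # 3P = 2P + P, difference P
P4_dbl = x_double(P2, a24, n)      # 4P = 2(2P)
P4_add = x_add(P3, P, P2, n)       # 4P = 3P + P, difference 2P

# Projective points are equal when their cross products agree.
print(P4_dbl[0] * P4_add[1] % n == P4_add[0] * P4_dbl[1] % n)
```

Because the coordinates are projective, no modular inversion is needed until the final gcd, which is part of what makes this form attractive for hardware.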
ECM Architecture (operation table)
ADD           SUB           MUL-I               MUL-II
a1 = xP+zP    s1 = xP−zP    NOP                 NOP
a2 = xQ+zQ    s2 = xQ−zQ    m1 = (xP−zP)^2      m2 = (xP+zP)^2
NOP           s3 = m2−m1    m3 = s1 * a2        m4 = s2 * a1
a3 = m3+m4    s4 = m3−m4    m5 = m1 * m2        m6 = s3 * c3
a4 = m1+m6    NOP           m7 = a3^2           m8 = s4^2
NOP           NOP           m9 = s3 * a4        m10 = s4^2 * c1
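The six rows of the operation table can be replayed in software to see how one doubling and one addition share the two multipliers. The sketch below rests on several assumptions that the slide does not spell out: c1 holds x_{P−Q} with z_{P−Q} = 1, c3 holds (A + 2)/4, m8 squares s4 = m3 − m4 as the point-addition formula requires, and m10 multiplies that square by c1. The function name and test values are hypothetical.

```python
def phase1_step(xP, zP, xQ, zQ, c1, c3, n):
    """Replay the schedule; returns (2P, P+Q) as (x, z) pairs modulo n."""
    a1, s1 = (xP + zP) % n, (xP - zP) % n            # row 1: adder/subtractor only
    a2, s2 = (xQ + zQ) % n, (xQ - zQ) % n            # row 2
    m1, m2 = s1 * s1 % n, a1 * a1 % n
    s3 = (m2 - m1) % n                               # row 3: s3 = 4 * xP * zP
    m3, m4 = s1 * a2 % n, s2 * a1 % n
    a3, s4 = (m3 + m4) % n, (m3 - m4) % n            # row 4
    m5, m6 = m1 * m2 % n, s3 * c3 % n                # m5 = x of 2P
    a4 = (m1 + m6) % n                               # row 5
    m7, m8 = a3 * a3 % n, s4 * s4 % n                # m7 = x of P+Q
    m9, m10 = s3 * a4 % n, m8 * c1 % n               # row 6: z of 2P, z of P+Q
    return (m5, m9), (m7, m10)


n = 2**31 - 1                                        # illustrative modulus
doubled, added = phase1_step(9, 1, 6400, 4896, 9, 2, n)
print(doubled, added)
```

Ten multiplications fit into five multiplier slots because MUL-I and MUL-II run in parallel, while the adder/subtractor prepares the operands for the next row.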
ECM Architecture (Global View)
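One unit per curve
One control unit for all 20 curves
2 multipliers, 1 adder/subtractor, and 1 local memory per unit
[Figure: global view. Units 1 through 20, each containing an adder/subtractor (A/S), two multipliers (M1, M2) and a local memory, connected to a single control unit and a global memory]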
Montgomery Multiplication
An efficient technique for multiplying two integers modulo M.
Replaces the modulus M by another divisor R for which the division step may be faster.
Iterative process of additions and shifts without any division by M (if R is a power of 2).
Conversions to and from the Montgomery domain are required; they are also performed using Montgomery multiplication.
Montgomery Multiplication (Cont.)
The algorithm in radix-2
  S[0] = 0;
  for i = 0 to n-1 do
    q_i = (S[i]_0 + A_i * B_0) mod 2;            (1)
    S[i+1] = (S[i] + A_i * B + q_i * M) div 2;   (2)
  end for;
  return S[n];
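The radix-2 loop above maps almost line for line onto Python integers. The sketch below is a software model, not the hardware design: the final conditional subtraction and the two conversion helpers are additions for the example, and all function names are hypothetical. It assumes M is odd and A, B < M < 2^n.

```python
def mont_mul(A, B, M, n):
    """Radix-2 Montgomery product A*B*2^(-n) mod M (M odd, A, B < M < 2^n)."""
    S = 0
    for i in range(n):
        a_i = (A >> i) & 1
        q_i = ((S & 1) + a_i * (B & 1)) % 2      # (1): chosen so the sum below is even
        S = (S + a_i * B + q_i * M) // 2         # (2): exact division by 2
    return S - M if S >= M else S                # final correction, added for the example


def to_mont(x, M, n):
    """Enter the Montgomery domain: x * 2^n mod M."""
    r2 = pow(2, 2 * n, M)                        # R^2 mod M with R = 2^n
    return mont_mul(x, r2, M, n)


def from_mont(x, M, n):
    """Leave the Montgomery domain: x * 2^(-n) mod M."""
    return mont_mul(x, 1, M, n)


M, n = 239, 8
a, b = 100, 200
product = from_mont(mont_mul(to_mont(a, M, n), to_mont(b, M, n), M, n), M, n)
print(product == a * b % M)
```

Both conversions are themselves Montgomery multiplications, so a chain of modular operations pays the conversion cost only once on entry and once on exit.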
Montgomery Multiplication (Cont.)
The critical delay of the algorithm above occurs in
  S[i+1] = (S[i] + A_i * B + q_i * M) div 2
Reduce propagation delay
CPA vs. CSA
[Figure: a carry-save adder (CSA), built from independent full adders (FA) that take bits X, Y, Z and produce a carry C and a sum S per position, shown next to a carry-propagate adder (CPA), built from chained full adders that take X, Y and Cin and produce S and Cout]
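The difference between the two adder types can be modeled with bitwise operations. In the sketch below, one carry-save level reduces three operands to a sum word and a carry word without any carry travelling between bit positions; only the last step needs an ordinary carry-propagate addition. The function names are hypothetical and the model ignores word widths.

```python
def carry_save_add(x, y, z):
    """One CSA level: three operands in, (sum, carry) out, no carry chain."""
    s = x ^ y ^ z                                   # sum bit of each full adder
    c = ((x & y) | (x & z) | (y & z)) << 1          # carry bit, weighted by 2
    return s, c


def add_many(operands):
    """Accumulate in carry-save form, then resolve carries once at the end."""
    s, c = 0, 0
    for x in operands:
        s, c = carry_save_add(s, c, x)
    return s + c                                    # the single carry-propagate addition


s, c = carry_save_add(5, 3, 6)
print(s + c == 5 + 3 + 6)
print(add_many([5, 3, 6, 100]))
```

Keeping S in this redundant (sum, carry) form inside the loop makes the delay of each iteration independent of the operand width.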
ECM Multiplier Unit (Block Diagram)
[Figure: multiplier block diagram. Ports: 32-bit inputs A_M and B, 32-bit output C, control signals write, A_M_Choice, start, read, clk and reset. Internal blocks: shift register A (shifted right by one bit, supplying Ai), register B, carry-save block CSR42 with sum and carry outputs, registers SS1 and SS2 holding S1 and S2, and logic with an AND gate that uses S1out(0), S2out(0), Bout(0) and Ai to form qi]
ECM Adder/Subtractor Unit
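[Figure: adder/subtractor block diagram. Ports: 32-bit inputs A_M and B, 32-bit output C, control signals write, A_M_Choice, add_sub, read, clk and reset. Internal blocks: an adder with Cin and Cout, a LUT32X32 memory with addr1, addr2 and WEL, operand selectors OP1 and OP2, a comparator, registers C1 and C2 with enables EC1 and EC2, and intermediate results sum1 and sum2]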
ECM Memories (Global, Local)
[Figure: memory organization. Global memory: 32-bit words at addresses 0 to 1023, holding M, C1, C2, C3, I2, I3, n and K, with signals Data_in, Rwrite_in, Raddr, Kaddr, R_K, Kout, GREi and Data_out. Local memory: 32-bit words at addresses 0 to 511, holding M, C1, C2, C3, I2, I3, I0 and I1, with two ports (Aaddr, Baddr, WEA, WEB, Ain, Bin, Aout, Bout) feeding A_M, B and C]
Implemented using B-RAM
  2 blocks for global Mem
  1 block for local Mem
ECM Instruction ROM
Total of 24 instructions
32-bit wide
Implemented using LUT32x32 ROM
[Figure: instruction memory (ROM) with 5-bit input Instr_addr and 32-bit output Instr. Entries 0 to 23, each 32-bit word divided into ADD, SUB, MUL1 and MUL2 fields]
ECM Control Unit (Phase I)
[Figure: control unit. Inputs: Instr (32-bit) and Kout. Outputs: Instr_addr (5-bit), Kaddr, add_sub, read_add_sub, read_MUL1, read_MUL2, start_MUL1, start_MUL2, write_MUL1, write_MUL2, write_add_sub, A_M_Choice, Rstart, Rwrite_out, GREi, WEBi, WEA, Aaddr and Baddr]
ECM Phase I Result
Operation                   Our Implementation   Previous work
Modular Addition            0.34 µs              2.00 µs
Modular Subtraction         0.34 µs              1.68 µs
Modular Multiplication      2.72 µs              64.5 µs
Modular Squaring            2.72 µs              64.5 µs
Point Addition (Phase-I)    14.28 µs             333 µs
Point Doubling (Phase-I)    14.28 µs             330 µs
Phase-I                     20 ms                912 ms
ECM Phase I Result (Cont.)
[Figure: bar chart of modular addition, subtraction, multiplication and squaring times (axis 0 to 70), our implementation vs. previous work]
ECM Phase I Result (Cont.)
[Figure: bar chart of Phase-I point addition and point doubling times (axis 0 to 350), our implementation vs. previous work]
ECM Phase I Result (Cont.)
[Figure: bar chart of total Phase-I time (axis 0 to 1000), our implementation vs. previous work]
ECM Phase I Result Analysis
Architecture of our multiplier
  272 clock cycles vs. 1612 in their case
Faster implementation in adder and subtractor unit
  34 cycles vs. 50 cycles in their case
Faster system clock frequency
  100 MHz vs. 25 MHz
Two multipliers running in parallel
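The cycle counts and clock frequencies quoted above are enough to reproduce the reported operation times. The short calculation below does that. The helper name is hypothetical, and the breakdown of a point operation into five multiplier slots plus two adder/subtractor slots is an assumption that happens to match the reported 14.28 µs.

```python
def op_time_us(cycles, clock_mhz):
    """Latency in microseconds of an operation that takes `cycles` clock cycles."""
    return cycles / clock_mhz


ours_mul = op_time_us(272, 100)      # multiplication, this design
prev_mul = op_time_us(1612, 25)      # multiplication, previous work
ours_add = op_time_us(34, 100)       # addition/subtraction, this design
prev_add = op_time_us(50, 25)        # addition, previous work

# Assumed breakdown: 5 multiplier slots and 2 non-overlapped add/sub slots.
point_op = 5 * ours_mul + 2 * ours_add

print(ours_mul, prev_mul, ours_add, prev_add, round(point_op, 2))
print("multiplication speedup:", round(prev_mul / ours_mul, 1))
```

The multiplication speedup combines two independent effects: fewer cycles per operation and a four times faster clock.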
ECM Phase II - Proposal
Initialization
  Pre-compute and load table of primes and k
Pre-compute
  Compute k·Q for all k
  Compute D·Q
Compute
  Compute m_min·D·Q
  Compute $\prod (x_{mDQ} z_{kQ} - x_{kQ} z_{mDQ})$ for all primes and compute the final gcd with N
  Compute m_next·D·Q = m_prev·D·Q + D·Q
[Figure: main control flow with the blocks Phase 1 initialization, Phase 1 (k·P), Phase 2 initialization, Phase 2 pre-compute and Phase 2 compute; the Phase 2 blocks reuse k·P and point addition]
Conclusion
Better implementation in terms of time
Cost of area
Scalable implementation
Future work
  Complete Phase II
  Implement on ASIC and SRC-6
Questions?
THANK YOU