Upload
anis-winifred-booth
View
213
Download
0
Tags:
Embed Size (px)
Citation preview
Lab Session 2
Design of Elliptic Curve Cryptosystem
Debdeep Mukhopadhyay Chester Rebeiro
Dept. of Computer Science and Engineering
Indian Institute of Technology Kharagpur
INDIA
The Processor Overview
Register Bank: regbank.v (stores temporary data)ROM: stores the curve constant ‘b’ and the base points.Arithmetic Unit: ec_alu.v (performs the underlying field
computations)Control Unit: smul.v (sequences the field computations
for performing the point operations)
Register Bank•Heart of the register file is 8 registers of size 233 bits.• Organized as 3 banks:RA,
RB, RC• Dual port Distributed RAMs
of the Xilinx FPGAs• Asynchronous Read• Synchronous Write• we: write enable signal• Inputs: C0, C1, Qout
• Outputs: A0, A1, A2, A3 and Qin
Module regbank.v
module regbank(clk, cwh, c0, c1, a0, a1, a2, a3);input wire clk; input wire [22:0] cwh; /* control word */input wire [232:0] c0, c1; /* Inputs to regbank from
ALU */output wire [232:0] a0, a1, a2, a3; /* Output from
regbank to ALU */
wire rb1_we;wire [3:0] rb1_addr1, rb2_addr1, rb3_addr1;wire [3:0] rb1_addr2, rb2_addr2, rb3_addr2;wire [232:0] rb1_din, rb2_din, rb3_din;wire [232:0] rb1_dout1, rb2_dout1, rb3_dout1;wire [232:0] rb1_dout2, rb2_dout2, rb3_dout2;wire [232:0] qin, qout;
/* Instances of distributed memory */XC3S_RAM16X233 regbank1(rb1_din, rb1_addr1,
rb1_addr2, rb1_we, clk, rb1_dout1, rb1_dout2);XC3S_RAM16X233_regbank2(rb2_din, rb2_addr1,
rb2_addr2, rb2_we, clk, rb2_dout1, rb2_dout2);XC3S_RAM16X233_D regbank3(rb3_din, rb3_addr1,
rb3_addr2, rb3_we, clk, rb3_dout1, rb3_dout2);
/* Quadblock instance */bquadblk bqb(cwh[20], qin, cwh[19:16], qout);
assign qin = (cwh[21] == 1'b1) ? a1 : a2;
assign rb1_addr1 = {3'b0, cwh[0]};assign rb1_addr2 = {3'b0, cwh[1]};assign rb2_addr1 = {2'b0, cwh[4:3]};assign rb2_addr2 = {2'b0, cwh[6:5]};assign rb3_addr1 = {3'b0, cwh[8]};assign rb3_addr2 = {3'b0, cwh[9]};
/* a0 to a3 are fed to the ALU */assign a0 = (cwh[11] == 1'b0) ? rb1_dout1 : rb2_dout2;assign a2 = (cwh[12] == 1'b0) ? rb2_dout1 : rb1_dout2;assign a1 = (cwh[13] == 1'b0) ? rb3_dout1 : rb2_dout2;assign a3 = rb3_dout2;
/* Select what get written into RAM */assign rb1_we = cwh[2];assign rb1_din = (cwh[22] == 1'b1) ? c0: ((cwh[14] == 1'b0) ? c0 : c1);
assign rb2_we = cwh[7];assign rb2_din = (cwh[22] == 1'b1) ? c1: ((cwh[15] == 1'b0) ? c0 : c1);
assign rb3_we = cwh[10];assign rb3_din = (cwh[22] == 1'b1) ? 233'h1
: (cwh[20] == 1'b1) ? qout: c0;endmodule
The ALU for the ECC Processor
The AU performs:• point operations (doubling and
addition) in Lopez Dahab Projective co-ordinates efficiently.
• Inversion (from projective co-ordinates) to affine co-ordinates.
•The AU has 5 inputs (the outputs of the regbank) and 3 outputs (the inputs of the regbank).
•The computation of the AU has 2 phases:• Point addition and doubling• Inversion
•The AU consists of:• Hybrid Karatsuba Multiplier (used
at all time steps)• Quad block (used in phase 2 for
the final invesion)•The Quadblock has 14 cascade steps (as described before) and computes the quading operation repeatedly (as per the value of c[29…26]
The verilog code for ALU
module ec_alu(cw, a0, a1, a2, a3, c0, c1);input wire [232:0] a0, a1, a2, a3; /* the inputs to
the alu */input wire [9:0] cw; /* the control word */output wire [232:0] c0, c1; /* the alu outputs
*/
/* Temporary results */wire [232:0] a0sq, a0qu;wire [232:0] a1sq, a1qu;wire [232:0] a2sq, a2qu;wire [232:0] sa2, sa4, sa5, sa7, sa8, sa8_1;wire [232:0] sc1;wire [232:0] sd2, sd2_1;
/* Multiplier inputs and output */wire [232:0] minA, minB, mout;
multiplier mul(minA, minB, mout);squarer sq1_p0(a0, a0sq);squarer sq_p1(a1, a1sq);squarer sq_p2(a2, a2sq);
squarer sq2_p2(a2sq, a2qu);squarer sq2_p1(a1sq, a1qu);squarer sq2_p3(a0sq, a0qu);
/* Choose the inputs to the Multiplier */mux8 muxA(a0, a0sq, a2, sa7, sd2, a1, a1qu, 233'd0, cw[2:0], minA);mux8 muxB(a1, a1sq, sa4, sa8, sd2_1, a3, a2qu,a1qu, cw[5:3], minB);
/* Choose the outputs of the ALU */mux4 muxC(mout, sa2, a1sq, sc1, cw[7:6], c0); mux4 muxD(sa8_1, sa5, a1qu, sd2, cw[9:8], c1);
assign sa2 = mout ^ a2;assign sa4 = a1sq ^ a2;assign sa5 = mout ^ a2sq ^ a0;assign sa7 = a0 ^ a2;assign sa8 = a1 ^ a3;assign sa8_1 = mout ^ a0;
assign sc1 = mout ^ a3;
assign sd2 = a0qu ^ a1;assign sd2_1 = a2sq ^ a3 ^ a1;
endmodule
Control Unit
The CU is hardwired.It generates 33 control signals, which
determine the flow of data.c[0…9]: controls the input to the
multiplier and the output C0 and C1 of the AU
c[26…29]: select line of the multiplexers inside the quad block
Remaining control lines are for read and write of the registers in the register file
Projective Point Arithmetic
Point Doubling:◦ Input : (X1,Y1,Z1),
Output: (X4, Y4, Z4)
◦ Constraint: One multiplier Can perform one
multiplication per clock cycle
Hence needs four time steps
Projective Point Doubling Sequencing
Projective Point Addition
• Input : P=(X1,Y1,Z1), Q=(x2,y2)
• Output: (X3, Y3, Z3)
◦ Constraint: One multiplier Can perform one
multiplication per clock cycle
Hence needs eight time steps
Projective Point Addition Sequencing
The State Machine for the CU
Design of the CU
The verilog code for the CU/* Output Logic */always @(state) begin
case(state)6'd0: begin cwl <= 10'h000; /* Init L2R Step 1 */cwh <= 23'h4x8484;end6'd1: begincwl <= 10'h000;cwh <= 23'h4x808D; /* Init L2R Step 2 */end6'd2: begincwl <= 10'hx; /* Init L2R Step 3 */cwh <= 23'h4xx098;end/* The Doubling*/6'd3: begin /* Double Step 1 */cwl <= 10'h209;cwh <= 23'h0x8490;end6'd4: begin /* Double Step 1a */cwl <= 10'h002;cwh <= 23'h0x20F0;end6'd5: begin /* Double Step 2 */cwl <= 10'h324;cwh <= 23'h0x6544;end6'd6: begin /* Double Step 3 */cwl <= 10'hxC0;cwh <= 23'h0x0ac0;end
/* The Addition States */6'd7: begin /* Addition Step 1 */
cwl <= 10'h048;cwh <= 23'h0x08a0;
end6'd8: begin /* Addition Step 2 */
cwl <= 10'h002;cwh <= 23'h0x5006;
end6'd9: begin /* Addition Step 3 */
cwl <= 10'h028;cwh <= 23'h0x0090;
end6'd10: begin /* Addition Step 4 */
cwl <= 10'h011;cwh <= 23'h0x0214;
end6'd11: begin /* Addition Step 5 */
cwl <= 10'h102;cwh <= 23'h0x6544;
end6'd12: begin /* Addition Step 6 */
cwl <= 10'h08A;cwh <= 23'h0xB4D2;
end6'd13: begin /* Addition Step 7 */
cwl <= 10'h00B;cwh <= 23'h0x18A2;
end6'd14: begin /* Addition Step 8 */
cwl <= 10'h058;cwh <= 23'h0x0ac0;
end
The final Inversion Step
The verilog snippet/* The final Inverse : Starting the Itoh Tsujii
here*/6'd15: begin /* Inv 1 */cwl <= 10'hx0D; cwh <= 23'h0x04x0; /* The first a=a^3 */end6'd16: begin /* Inv 2 */ cwl <= 10'hx06;cwh <= 23'h0x0090;end6'd17: begin /* Inv 3 */cwl <= 10'hx35;cwh <= 23'h0x0090;end6'd18: begin /* Inv 4-1 */ cwl <= 10'hx;cwh <= 23'h130510;end6'd19: begin /* Inv 4-2 */cwl <= 10'hx02;cwh <= 23'h0x0190;end6'd20: begin /* Inv 5 */cwl <= 10'hx35;cwh <= 23'h0x0090;endend
6'd21: begin /* Inv 6-1 */cwl <= 10'hx;cwh <= 23'h170510;
end6'd22: begin /* Inv 6-2 */
cwl <= 10'hx02;cwh <= 23'h0x0190;
end6'd23: begin /* Inv 7-1 */
cwl <= 10'hx;cwh <= 23'h1E0510;
6'd24: begin /* Inv 7-2 */cwl <= 10'hx02;cwh <= 23'h0x0190;
end6'd25: begin /* Inv 8 */
cwl <= 10'hx35;cwh <= 23'h0x0090;
end6'd26: begin /* Inv 9-1 */
cwl <= 10'hx;cwh <= 23'h1E0510;
end6'd27: begin /* Inv 9-2 */
cwl <= 10'hx;cwh <= 23'h3E0500;
end6'd28: begin
cwl <= 10'hx3A; /* Inv 9-3 */
cwh <= 23'h0x0190;end
endmodule
The scalar multiplier
The verilog snippet
/* the next state logic */always @(state) begin
case(state)/* Init states */6'd0: nextstate <= 6'd1;6'd1: nextstate <= 6'd2;6'd2: nextstate <= 6'd3;/* Double States */6'd3: nextstate <= 6'd4;6'd4: nextstate <= 6'd5;6'd5: nextstate <= 6'd6;6'd6: begin
if(k[`KEYMSB] == 1'b1)nextstate <= 6'd7; /* Do Addition and doubling if K0 is
1 */else if (ef == 1'b0)nextstate <= 6'd15; /* k[0]=0 and we are in the last
iteration, goto end */elsenextstate <= 6'd3; /* Skip addition and do next
doubling */end
/* Addition States */6'd7: nextstate <= 6'd8;6'd8: nextstate <= 6'd9;6'd9: nextstate <= 6'd10;6'd10: nextstate <= 6'd11;6'd11: nextstate <= 6'd12;6'd12: nextstate <= 6'd13;6'd13: nextstate <= 6'd14;6'd14: begin
if(ef == 1'b1)nextstate <= 6'd3;
elsenextstate <= 6'd15;
end/* The Itoh Tsujii States */6'd15: nextstate <= 6'd16;6'd16: nextstate <= 6'd17;6'd17: nextstate <= 6'd18;6'd18: nextstate <= 6'd19;6'd19: nextstate <= 6'd20;6'd20: nextstate <= 6'd21;6'd21: nextstate <= 6'd22;6'd22: nextstate <= 6'd23;6'd23: nextstate <= 6'd24;6'd24: nextstate <= 6'd25;6'd25: nextstate <= 6'd26;6'd26: nextstate <= 6'd27;6'd27: nextstate <= 6'd28;6'd28: nextstate <= 6'd29;6'd29: nextstate <= 6'd30;6'd30: nextstate <= 6'd31;6'd31: nextstate <= 6'd32;6'd32: nextstate <= 6'd33;6'd33: nextstate <= 6'd34;6'd34: nextstate <= 6'd35;6'd35: nextstate <= 6'd36;6'd36: nextstate <= 6'd37;6'd37: nextstate <= 6'd38;6'd38: nextstate <= 6'd38;default: nextstate <= 6'bx;endcase
end
IO Interface of the scalar multiplier
/* The input to regbank is either constants or the results from ALU */
assign c0r = (cwh[22] == 1'b0) ? c0a : `BASEPOINT_X;
assign c1r = (cwh[22] == 1'b0) ? c1a : yconstants;
assign yconstants = (state != 6'd2) ? `BASEPOINT_Y : `CURVECONSTANT_B;
Output
/* Store the results after converting back into affine coordinates */
assign sx = (key != 233'b1) ? a0 : `BASEPOINT_X;
assign sy = (key != 233'b1) ? a2 : `BASEPOINT_Y;
assign done = (state == 6'd38) ? 1 : 0; /* Set done to 1 if multiplication is complete */
The Module counter
`define KEYSIZE 32`define KEYMSB 31 module counter (clk, nrst, e);input wire clk; /* clock used for the counter */input wire nrst; /* active low reset */output wire e; /* set to 0 if count = 0 */
reg [7:0] count; /* ...and the register which actually decrements */
/* activate e when count reaches 0 */assign e = |count;
always @(posedge clk or negedge nrst) beginif(nrst == 1'b0)
count <= 8'd`KEYMSB;else
count <= count - 1'b1;end
endmodule
This module computes the number of key bits which are processed.• Before the
leading 1 is detected, it just shifts the key (at a fast clock)
• After, the leading 1 is detected, it updates at the start of doubling.
The clocks in the design
/* Shift register for the key. The current bit is always the MSB */
always @(posedge clk) beginif (nrst == 1'b0)k <= key;else if (start == 1'b0) /* if start=0, shift every clock cycle */k[`KEYMSB:0] <= {k[(`KEYMSB-1):0], 1'b0};else if (state == 6'd4) /* if start=1, shift once very iteration of multiplier */k[`KEYMSB:0] <= {k[(`KEYMSB-1):0], 1'b0};else /* else, don't shift */k[`KEYMSB:0] <= k[`KEYMSB:0];
end
/* Detect the first 1 */assign tstart = start;
/* The counter clock is either a fast clock (clk) used during
* leading 1 detection, or a slow clock used during multiplication
*/assign mainclk = (start == 1'b0) ? clk : cck;
always @(posedge clk) beginif (nrst == 1'b0)
start <= key[`KEYMSB];
elsestart <= tstart |
k[(`KEYMSB - 1)];end
/* Generate clock signal for counter */always @(posedge clk) begin
if (nrst == 1'b0) begincck <= 1'b0;
endelse begin
case(state) 6'd3: cck <= ~cck;6'd4: cck <= ~cck;default: cck <= cck;endcase
endend
Testing of the design
Top level Interface of the design:◦ module ecsmul(clk, nrst, key, sx, sy, done);
Test-bench:◦ module tb();
reg clk;reg nrst;wire done;reg [31:0] kmem [7:0];reg [255:0] k; wire [232:0] sx, sy;reg [232:0] res[1:0];integer i;
A testbench
initial begin$readmemh("../../scratch/tv.txt", kmem);
#50for(i=0; i<8; i = i+1) begin
$display("%d %h\n", i, kmem[i]);endk = {kmem[0], kmem[1], kmem[2], kmem[3], kmem[4], kmem[5], kmem[6], kmem[7]};
// k = `KEYSIZE'hC7;$display("%h\n", k[`KEYSIZE - 1:0]);clk = 1'b0;nrst = 1'b1;
#10nrst = 1'b0;#110 nrst = 1'b1;#100000
$display("%h\n%h", sx, sy);res[0] = sx;res[1] = sy;
#1$writememh("../../scratch/res_verilog.txt", res);$finish;
end
initial begin
$dumpfile ("dump.vcd");$dumpvars;$dumpon;
#100000 $dumpoff;end
ecsmul ec_mul(clk, nrst, k[`KEYMSB:0], sx, sy, done);
always begin#100 clk =~clk;end
endmodule
Verification of Correctness
We use the elliptic curve code by M. Rosing in the book Implementation of Elliptic Curve Cryptography
Go to directory ld_rosing/polymath/basics/
In file elliptic_poly.cLines 1232 to 1239 contain the scalar t1.e[0] is
most significant word and t1.e[7] is the least significant Word.
Save file and exitOn a linux machine run ‘make’ to create the
executableExecute ‘elliptic_poly’