78
Reconfigurable Computing Finite State Machines School of Electrical and Information Engineering http://www.ee.usyd.edu.au/~phwl Philip Leong ([email protected]) Tortise: You don't need to give an infinite number of monkeys an infinite amount of time to write Hamlet. A finite number of monkeys and a finite amount of time will do just fine. And like my father used to say, never use an infinite number of monkeys when a finite number will do. - Mike Schiraldi, in a sci.math posting

Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong ([email protected])Published

  • Upload
    dothuan

  • View
    217

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Reconfigurable Computing Finite State Machines

School of Electrical and Information Engineering

http://www.ee.usyd.edu.au/~phwl

Philip Leong ([email protected])

Tortise: You don't need to give an infinite number of monkeys an infinite amount of time to write Hamlet. A finite number of monkeys and a finite amount of time will do just fine. And like my father used to say, never use an infinite number of monkeys when a finite number will do.

- Mike Schiraldi, in a sci.math posting

Page 2: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Lecture Recordings

http://www.ee.usyd.edu.au/people/philip.leong/hit-rc14.html Slides often change just prior to lecture!

Video recording of lecture

Page 3: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Introduction

› Most practical digital designs built from datapath+control -  control implemented as a finite state machine

› Describe how to implement finite state machines -  how to construct a FSM

-  how to optimize for time and area

3

Page 4: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Simple Example: A memory controller

4

Page 5: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

State Transition Table

5

Current State Input Condition

Next State Output

Idle ready Decision OE=0,WE=0 Decision ~rw

rw Write Read

OE=0,WE=0

Write ready Idle OE=0,WE=1 Read ready Idle OE=1,WE=0

•  Note that output is only a function of the state (Moore machine) •  If input condition not met, state does not change

Next State Logic Output LogicState Registersinputs next_state current_state outputs

Page 6: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

State Assignment

6

Current State Input Condition

Next State Output

Idle ready Decision OE=0,WE=0 Decision ~rw

rw Write Read

OE=0,WE=0

Write ready Idle OE=0,WE=1 Read ready Idle OE=1,WE=0

(s1,s0) (n1,n0) Output 00 01 OE=0,WE=0 01 10

11 OE=0,WE=0

10 00 OE=0,WE=1 11 00 OE=1,WE=0

•  n1 = ~s1.~s0.ready •  n0 = ~s0.~s0.ready+ ~s1.s0.rw •  OE = s1.s0 •  WE = s1.~s0

Page 7: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Implementation

7

•  n1 = ~s1.~s0.ready •  n0 = ~s0.~s0.ready + ~s1.s0.rw •  OE = s1.s0 •  WE = s1.~s0

Page 8: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Speed

8

Next State Logic Output LogicState Registersinputs next_state current_state outputs

› Clock period constraint due to setup time -  tclk ≥ tlogic + tsu+ tcq

-  where tlogic is the next state logic delay

-  tsu is the register setup time (time before the clock edge that data must be stable)

-  tcq is the clock to output delay (time after clock edge q might be unstable)

› Clock period constraint due to hold time -  tH ≤ tlogic + tcq ≤ tclk

-  This is most extreme in the case of a shift register

tcq tco tSU

Page 9: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Optimization for speed/area

› Often there is no need -  FPGA usually fast enough so almost any description is acceptable

-  in this case use the most straightforward implementation (easiest to write/read/debug)

-  otherwise giving timing constraints often does the job

› Optimization -  FSM dependent

-  most efficient FSM depends on # states, complexity of logic etc

-  architecture dependent -  different for FPGA/CPLD/ASIC

-  speed max freq?, smallest clock to output delay? -  interrelated but we can optimize one in favor of another

9

Page 10: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Other FSM architectures

› We will look at 4 different implementations -  outputs decoded from state bits combinatorially

-  output decoded in parallel output registers

-  outputs encoded within state bits

-  one hot encoding

10

Page 11: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Outputs decoded from state bits combinatorially

›  This is the technique described earlier

› Output logic is a combinatorial function of the state -  suffers from an additional delay from output of state bits to the output signals

Next State Logic Output LogicState Registersinputs next_state current_state outputs

11

Page 12: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Glitching

› Consider a state change from 1111 -> 0000 -  The 4 bits could change at different speeds e.g. 1111 -> 1011 -> 0011 -> 0000

-  Output logic could produce a number of spurious values during current_state transient

-  Wastes power and could affect correct operation

12

Next State Logic Output LogicState Registersinputs next_state current_state outputs

Page 13: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Output decoded in parallel output registers

› Decode the outputs before the state bits are registered i.e. at the output of the nextstate

› Outputs are generated earlier than the previous version this is because the output combinatorial logic delay is removed

›  Area larger because we have additional registers for the output ›  Speed we have two combinatorial delays to the output so maximum speed

may be affected ›  Avoids glitching of outputs

Next State Logic State Registersinputs next_state current_state

outputsOutput Logic Output Registers

13

Page 14: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Outputs encoded within state bits

›  An example is a counter - the output is also the state

›  Find a state encoding which directly generates the output -  design more difficult

-  we need to find such an encoding

-  code more complicated and difficult to read

-  can resolve clashes by adding outputs to differentiate similar bits

-  e.g. for the memory controller can add ST0 to differentiate the IDLE and DECISION states

›  Efficiency depends on the particular circuit -  may be more/less efficient in area/speed, best way is to try it for a critical design

Next State Logic State Registersinputs next_state

current_state

outputs

14

Page 15: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Gray code

› Gray code is arranged so that only one output signal changes (no glitching)

› Can code FSMs so that the state transitions follow Gray codes

› Conversion simple g[i] = b[i] xor b[i+1] (b[4] = 0)

15

Page 16: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

One hot encoding

› Use n flip-flops to represent an n-state FSM › Significantly reduces the logic for outputs and nextstate

-  why?

-  also reduces logic to generate outputs

-  at the expense of more registers

› FPGAs -  rich in registers

-  for max speed and small to medium FSMs often the best choice

16

Page 17: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Mealy machines

› All FSMs we have discussed are Moore machines since outputs are functions only of the current state -  for Mealy FSMs, outputs functions of current state and inputs

-  output can usually change 1 cycle earlier than Moore

-  can have fewer states than Moore

Next State Logic Output LogicState Registersinputs current_state outputsnext_state

17

Page 18: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Fault tolerance

› What happens if for whatever reason the FSM gets into an undefined state -  fault tolerant designs can recover

-  alternatively could enter an error state that somehow reports

-  explicit “don’t care”

18

Page 19: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Fault tolerance for one-hot designs

›  There are many more possible illegal states for 1-hot designs -  could detect with logic

-  this is quite expensive, affects performance and area

-  could be pipelined

19

Page 20: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

BlockRAM Applications, State Machine

Store next address in the ROM

Conditional jump info is being entered as

additional address inputs

One RAM can be split in two: e.g. One 18K dual-port = two 9K BlockRAMs

with independent everything:

( R/W, clock, address, data content, aspect ratio )

High speed, independent of complexity

Slide courtesy of Peter Alfke 20

Page 21: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Fast State Machine in one BlockRAM

256 states, 4-way branch (or 128 states, 8-way) 36 optional parallel outputs from the second port

High speed operation, independent of code

8 bits 8 + 1 bitsBranchControl 2 bits

Output

BlockROM

1K x 9

256 x 368 bits 36 bits

Slide courtesy of Peter Alfke 21

Page 22: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Metastability

› Recall all registers have setup (tSU) and hold (tH) times. Violating these can lead to metastability in which the specified clock-to-output delay (tCO) is violated.

22 Source: Altera

Page 23: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Mechanism

23

›  The mechanism is analagous to a ball dropped on a hill. If setup and hold times are met, the ball is dropped near a stable state.

›  Due to noise, it is eventually resolved but this takes an indeterminate amount of time.

Source: Altera

Page 24: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Modeling Metastability

24

›  The mean time between failure can be modeled as

-  where C1 and C2 are process constants

-  fCLK, fDATA are clock frequency of receiving signal and toggle rate of asynchronous input

-  tMET is timing slack available beyond the register’s tCO, for a potentially metastable signal to resolve to a known value

-  tMET/C2 has largest effect and increases in transistor speed improve metastability MTBF

Page 25: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Synchronization Chains

25

›  Metastability can be virtually eliminated (with an increase in latency) via synchronization chains. tMET is sum of the output timing slacks for each register in the chain -  e.g. if tMET is 50 ps, increasing to 1 ns improves MTBF by e20=5×108 times

›  Signals on different clock domains can be handled with a dual-clock FIFO which contains synchronizers

›  Never connect asynchronous input to multiple flip-flops as they can resolve at different speeds

Page 26: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Clock skew

›  The clock signal can arrive at different times, this is called clock skew

› Clock subject to variation, this is called clock jitter

› While quite well controlled, skew and jitter reduce maximum clock frequency -  tclk ≥ tlogic + tsu+ tcq + tskew + tjitter

26

Page 27: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Conclusions

›  Studied many different ways to implement FSMs

› Mostly we use -  outputs decoded combinatorially from state registers

-  one-hot (particularly good for high speed small-medium FSMs on FPGAs)

-  RAM/ROM

› Consider metastability when dealing with asynchronous inputs

27

Page 28: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

References

›  An excellent tutorial on HDL, particularly the “Technology Independent Coding Style” chapter http://www.actel.com/documents/hdlcode_ug.pdf

›  Altera White Paper, Understanding Metastability in FPGAs http://www.altera.com/literature/wp/wp-01082-quartus-ii-metastability.pdf

28

Page 29: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Reconfigurable Computing Shift Registers and Random Number Generators

School of Electrical and Information Engineering

http://www.ee.usyd.edu.au/~phwl

Philip Leong ([email protected])

Page 30: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Introduction

› Efficient FPGA design is often nonobvious -  with practice, you can learn when to use the special features

› This lecture -  Shift registers -  Linear feedback shift register (LFSR) -  Using RAM for shift registers -  True random number generator

Page 31: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Shift register

›  Basic building block in digital circuit design

›  Features -  high speed due to simple logic & interconnection

-  low interconnection costs

Page 32: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Shift registers

›  Applications -  delay line

-  serial-parallel conversion

-  parallel-serial conversion

-  boundary scan

-  serial buses

-  bit serial signal processing (more about this in future lectures)

Page 33: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Straightforward Implementation

module shift_1x64 (clk, shift, sr_in, sr_out,); input clk, shift; input sr_in; output sr_out; reg [63:0] sr; always@(posedge clk) begin if (shift == 1'b1) begin sr[63:1] <= sr[62:0]; sr[0] <= sr_in; end end assign sr_out = sr[63]; endmodule How many LEs for an N bit SR?

Page 34: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Straightforward Implementation

VHDL

if (clk’event and clk = ‘1’) then

sr <= srin & sr(sr’high downto (sr’low+1));

end if;

How many LEs for an N bit SR?

VERILOG module shift_1x64 (clk, shift, sr_in, sr_out,); input clk, shift; input sr_in; output sr_out; reg [63:0] sr; always@(posedge clk) begin if (shift == 1'b1) begin sr[63:1] <= sr[62:0]; sr[0] <= sr_in; end end assign sr_out = sr[63]; endmodule

Page 35: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

RAM

› Remember -  Xilinx LUT as a memory has better density than a LUT

-  Shift registers only need serial access to the memories

-  i.e. do not need all of the outputs in parallel, only need the first input and the last output

Page 36: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Shift register using RAM (4-LUT example)

Page 37: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Address generation: counters

› Ripple carry counter

›  Synchronous counter

› Gray code counter

›  LFSR

› What are the advantages and disadvantages of each?

Page 38: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Counters

Page 39: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Gray code

Page 40: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

LFSR

›  Linear feedback shift register -  XNOR of certain outputs are fed back to the input

-  it is maximum-length if it has a period of 2^N-1

-  lets look at a length 4 maximum length LFSR

Page 41: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

4 bit Maximal Length LFSR

Page 42: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

4 bit LFSR

› How does it count?

Page 43: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

4 bit LFSR

›  0, 1, 3, 7, E, D, B, 6, C, 9, 2, 5, A, 4, 8, 0

›  period is 2^4-1 = 15

›  can be used as pseudorandom number generators

›  note it only has a 2 input XOR gate as the critical path! -  Compare with an adder’s carry chain

Page 44: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Larger maximal length LFSRs (see XAPP052 for more)

› N=4 (4,3)

› N=8 (8,6,5,4)

› N=16 (16,15,13,4)

› N=32 (32,22,2,1)

› N=64 (64,63,61,60)

› N=128 (128,126,101,99)

› N=168 (168,166,153,151)

Page 45: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Primitive polynomials

› A polynomial P of degree m over a finite field Fp is primitive, iff P is monic, P(0)≠0 and ord(P(x))=pm-1. -  Monic means the coefficient of the term with the highest power is 1 -  If P(0)≠0, ord(P) is smallest integer e for which P(x) divides xe-1 -  E.g. P(x)=x4+x3+1 is primitive

› The feedback values for an XOR based maximum length LFSR correspond to the primitive polynomial

› Programs which compute primitive polynomials -  Web based but only supports small n

http://wims.unice.fr/wims/wims.cgi?session=VX756ADCFA.2&+lang=en&+module=tool%2Falgebra%2Fprimpoly.en

Page 46: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published
Page 47: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

LFSR counters

› Maximal LFSR period is 2^N-1 -  sometimes we want the period < 2^N-1

-  Arbitrary period counter with shorter period can be made by adding a synchronous reset circuit

Page 48: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Virtex SRL

›  Virtex has a macro called “Shift Register Lookup Table” (SRL) for implementing SRs

Page 49: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

LUT as a mux

Page 50: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

LUT as an addressable SR

•  Length of SR can be controlled using the D output

Page 51: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Virtex 5

› Virtex 5-7 series slice has 4x 6-input LUTs, each LUT can be a 64x1 or 32x2 RAM

› SRL16x2 and SRL32 primitives are available per LUT (in SLICEM)

Page 52: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

SRLC32E

Page 53: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

True Random Number Generator

›  Introduction

› Ring Oscillator based noise source

›  Alternating Sequence Generator

› Results

› Conclusion

Page 54: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Introduction

› Random number generators (RNGs) are in widespread use, main applications are in -  Simulation

-  Cryptography

›  Propose random number generators for these applications -  Physical RNG (PRNG) based on oscillator phase noise

Page 55: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

What is a random number generator?

›  A cryptographically secure random bit generator (CSRBG) is one which produces sequences for which there is no polynomial time algorithm which, on input of the first l bits of the output sequences, can predict the (l+1)st bit of s with a probability significantly greater than 0.5

Page 56: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Applications

› Designed for embedded FPGA applications -  Small

-  Fast

›  For cryptographic applications -  Secure

Page 57: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Physical random number generators

› Oscillator sampling

› Direct amplification of transistor/resistor noise

› Discrete time chaos

Page 58: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Oscillator phase noise source

•  One method to produce a real random number source in an FPGA is to sample a high frequency signal with a low quality low frequency clock

•  If the phase noise of the low frequency oscillator is of the same order as the period of the high frequency clock, output is quite random

Page 59: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Ring Oscillator based noise source

Page 60: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Increased Randomness via Clock Doubling

Page 61: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Poker Test (ONS only)

›  Checks that each m-bit string occurs the correct number of times ›  Passed if the result is between 1.03 and 57.4

From Menezes, “Handbook of Applied Cryptography”

Page 62: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Poker Test (ONS only)

Page 63: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

PRNG

›  At this point, have a mediocre physical RNG that doesn’t pass Poker test › Can turn into a good RNG by reducing sampling clock rate and passing through

a parity filter to deskew non-uniform distribution (Tsoi, Leung and Leong, FCCM03) -  slow

› How can we make a fast one?

Page 64: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Alternating Step Generator

•  Selected among many stream ciphers because of its compact implementation

•  Recent research has shown weaknesses in this cipher (but any other can be used)

Page 65: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Modified Alternating Step Generator

Page 66: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

LFSRs

› 127 and 129 bit LFSRs based on primitive polynomials with approx equal 1 and 0 coefficients -  LFSR1(x) = x127 + x102 + x61 + x59 + x58 + x57 + x56 + x55 + x54 + x53 +

x52 + x51 + x50 + x49 +x48 + x47 + x46 + x45 + x44 + x43 + x42 + x41 + x40 + x39 + x38 + x37 + x36 + x35 +x34 + x33 + x32 + x31 + x30 + x29 + x28 + x27 + x26 + x25 + x24 + x23 + x22 + x21 +x20 + x19 + x18 + x17 + x16 + x15 + x14 + x13 + x12 + x11 + x10 + x9 + x8 + x7 +x6 + x5 + x4 + x3 + x2 + x + 1

-  LFSR2(x) = x129 + x119 + x62 + x61 + x60 + x59 + x58 + x57 + x56 + x55 + x54 + x53 + x52 + x51 +x50 + x49 + x48 + x47 + x46 + x45 + x44 + x43 + x42 + x41 + x40 + x39 + x38 + x37 +x36 + x35 + x34 + x33 + x32 + x31 + x30 + x29 + x28 + x27 + x26 + x25 + x24 + x23 +x22 + x21 + x20 + x19 + x18 + x17 + x16 + x15 + x14 + x13 + x12 + x11 + x10 + x9+x8 + x7 + x6 + x5 + x4 + x3 + x2 + x + 1

Page 67: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Implementation

› Results for Virtex XCV300E-8 (including computer interface) - Period 7.482ns, 129 slices (4%), 4 BRAM (12%) - High frequency clock can be > 400 MHz (133 MHz used) - Ring oscillator freq > 800 MHz

› Reported results are for no clock doubler but doubled version passes tests as well

› Passes Crush, NIST and Diehard tests

Page 68: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

NIST and Diehard Tests

Page 69: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

TRNG Summary

› Combines weak RNG with high speed stream cipher to produce a physical noise source

› Clock doubler increases oscillator phase noise › Small area, high output rate, good statistical properties › ASG makes a very low cost implementation, other stream ciphers can be used

Page 70: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Generalized LFSRs

›  A variation useful for cryptographic applications -  Feedback taps are programmable (can be updated after synthesis)

-  Polynomial has high weight so it has many feedback taps requiring high fan-in XNOR gate

› Question: how do we design such a circuit?

Page 71: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Block diagram

Q

QSET

CLR

D

Q

QSET

CLR

D

Q

QSET

CLR

D

Q

QSET

CLR

D

Q

QSET

CLR

D

...

PROGRAMMABLE LUT implementing required XNOR

CLK

CE

DIN

Q1 Q3Q2 Q15

F(Q1,Q2,…,Q15)

Page 72: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Implementing the LUT

LUT 1 LUT 2 LUT 3 LUT 4

LUT 1

Q1 Q4Q3Q2 Q13 Q16Q15Q14Q9 Q12Q11Q10Q5 Q8Q7Q6

F(Q1,Q2,…,Q16)

•  SR used to program the LUTs to implement the required feedback arrangement WITHOUT CHANGING THE BITSTREAM

•  Can be reduced from 2 levels of logic to 1 by pipelining

Page 73: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Generalized LFSR Floorplan

•  Serpentine arrangement reduces wire lengths

LFSR Cell LFSR CellLFSR CellLFSR Cell

LFSR Cell LFSR CellLFSR CellLFSR Cell

LFSR Cell LFSR CellLFSR CellLFSR Cell

LFSR Cell LFSR CellLFSR CellLFSR Cell

Stage 1 pipeline

Page 74: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

LFSR cell

Q

QSET

CLR

DSR input SR output

Q

QSET

CLR

D

From SouthConnections

x3

u1x2

x1

f(x1...xn)

To left handtop cell

Leftmost cell hasone input from

each of theflip-flops on the

first row

x3

x4

u1 x2

x1

f(x1...xn)Row 1 Leftmost cell only

Row 1 Non-leftmost cell only

All cells•  RAMs used to customize LFSR polynomial (can be loaded at any time)

•  Leftmost stage has no pipelining

Page 75: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Pipelining Logic

Q

QSET

CLR

D

Q

QSET

CLR

D

Q

QSET

CLR

D

Q

QSET

CLR

D

•  Must compensate for the extra latency introduced by pipelining

Page 76: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Conclusions

›  If we can use Xilinx distributed RAM it gives higher better density -  can be used for shift registers, counters & PRNG

-  Virtex has a SRL macro for even better efficiency

›  LFSR -  generate random numbers -  high speed nonbinary counters and dividers -  Can construct rrue random number generator using oscillator phase noise and the ASG

Page 77: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

References

›  XAPP052 “Efficient Shift Registers, LFSR Counters and Long PseudoRandom Sequence Generators” and XAPP210 “LFSRs in Virtex Devices” (http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf)

›  K.H. Tsoi, Ka Ho Leung, and Philip H.W. Leong. A high performance physical random number generator. IEE Proc. Computers & Digital Techniques, 1(4):349–352, July 2007. (http://www.cse.cuhk.edu.hk/~phwl/mt/public/archives/papers/prng_cdt07.pdf)

Page 78: Reconfigurable Computing - University of Sydney Computing Finite State Machines School of Electrical and Information Engineering phwl Philip Leong (philip.leong@sydney.edu.au)Published

Review Questions

› Make sure you understand how the SRL is implemented in an FPGA

› Make sure you can implement an LFSR in VHDL/Verilog

› When is an LFSR more efficient than a counter for counting applications?