35
Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Embed Size (px)

Citation preview

Page 1: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Advanced Topics on FPGA ApplicationsScreen A

Wu, Jinyuan

Fermilab

IEEE NSS 2007 Refresher Course

Supplemental Materials

Oct, 2007

Page 2: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

2

Doublet Matching,Hash Sorter

Page 3: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

3

Hit MatchingSoftware FPGA

Typical

FPGA Resource Saving Approaches

O(n2)for(){

for(){…}

}

O(n)*O(N)Comparator

Array

Hash Sorter

O(n)*O(N): in RAM

O(n3)for(){

for(){

for(){…}

}

}

O(n)*O(N2)CAM,

Hugh Trans.

Tiny Triplet Finder

O(n)*O(N*logN)

O(n4)for(){ for(){

for(){ for()

{…}

}}}

Page 4: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

4

Hash Sorter

K

K

D

K

D

Pass 1: Data in Group 1 are

stored in the hash sorter bins based on key number K.

Pass 2: Data in Group 2 are

fetched though and paired up with corresponding Group 1 data with same key number K.

Group 1

Group 2

Page 5: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

5

Hash Sorter

K

Page 6: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

6

Hash Sorter Implementation

Pipelined structure: Single clock cycle push or pop

Single clock cycle fast reset

Page 7: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

7

An Example of Track Recognition: Event

We explain the track recognition process using this 20-track example.

Page 8: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

8

Tangent Angle Measurements

There are various techniques to measure the tangent angle of the track segment (or “doublet”, or “cluster”).

Sometimes extra “ghost” segments may exist. The ghost segments may be resolved in track recognition

process later.

Page 9: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

9

A Large Curvature Track

A soft track hits large region. A global algorithm is better suited.

The “high-pT” approximation is not valid globally. Exact track equation is needed.R

r

0

)sin(2 0 Rr

)sin(2

20

R

r

)(2 0 Rr

Parameter:Initial angle

Measure the tangent angle..

Parameter:Radius of curvature

Page 10: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

10

An Example of Track Recognition: Clustering

0

c0

clustering

The doublets in clusters are grouped together.

The “ghost” doublets are gone.

The 9-bin scheme The 4-bin schemeFor doublets on the seeding super layer in this bin…

search for coincident in these 9 bins.

For doublets on the seeding super layer in this bin…

search for coincident in these 4 bins.

Page 11: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

11

FPGA Block Diagram

Hash sorters for 0

Hash sorters for c0

)sin(5025

2

0

0

r

cm

R

cmc

Page 12: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

12

Without Full Track Recognition

Two track parameters can be calculated for each doublet.

Useful trigger primitives can be found without full track recognition.

For example…

)sin(5025

2

0

0

r

cm

R

cmc

Page 13: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

13

Triplet Finding,Tiny Triplet Finder

Page 14: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

14

Hit MatchingSoftware FPGA

Typical

FPGA Resource Saving Approaches

O(n2)for(){

for(){…}

}

O(n)*O(N)Comparator

Array

Hash Sorter

O(n)*O(N): in RAM

O(n3)for(){

for(){

for(){…}

}

}

O(n)*O(N2)AM, CAM,

Hugh Trans.

Tiny Triplet Finder

O(n)*O(N*logN)

O(n4)for(){ for(){

for(){ for()

{…}

}}}

Page 15: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

15

Hits, Hit Data & Triplets

• Hit data come out of the detector planes in random order.

• Hit data from 3 planes generated by same particle tracks are organized together to form triplets.

Page 16: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

16

TTF OperationsPhase I: Filling Bit Arrays

Note: Flipped Bit Order

Physical Planes

Bit Array/Shifters

For any hit… Fill a corresponding logic cell.

• xA+ xC = 2 xB

• xA= - xC + constant

Page 17: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

17

TTF Operations Phase II: Making Match

For any center plane hit…

Logically shift the

bit array.

Perform bit-wise AND in this range.

Triplet is found.

Physical Planes

Bit Array/Shifters

Page 18: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

18

Tiny Triplet FinderReuse Coincident Logic via Shifting Hit Patterns

C1

C2

C3

One set of coincident logic is implemented.

For an arbitrary hit on C3, rotate, i.e., shift the hit patterns for C1 and C2 to search for coincidence.

Page 19: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

19

Tiny Triplet Finder for Circular Tracks

*R1/R3

*R2/R3Triplet Map Output To Decoder

Bit

Arr

ay

Shifter

Bit

Arr

ay

ShifterBit-wise Coincident Logic

0

16

32

48

64

80

96

112

128

0 16 32 48 64 80 96 112 128

1. Fill the C1 and C2 bit arrays. (n1 clock cycles)

2. Loop over C3 hits, shift bit arrays and check for coincidence. (n3 clock cycles)

Also works with more than 3 layers

Page 20: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

20

Tiny? Yes, Tiny! – Logic Cell Usage:

AM, CAM, Hough Transform

etc., O(N2)

Tiny Triplet FinderO(N*logN)

Page 21: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

21

Complex Triplet Fining Problems

Page 22: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

22

Options of Sequence Control

Page 23: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

23

Micro-computing vs. Reconfigurable Computing

In microprocessor, the users specify program on fixed logic circuits. In FPGA, the users specify logic circuits (as well as program). The FPGA computing needs not to follow microprocessor architectures. (But useful

experiences can be borrowed.) The usefulness of FPGA reconfigurable computing is still to be fully appreciated.

(100+3-4)*5+7 =?

100

34

57Control:

Data: 100,3,4,5,7

LD (-) (+)(*)(+)

CPUFPGAData

ProgramConfiguration

DataProgram

Page 24: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

24

ELMS– Enclosed Loop Micro-Sequencer

Loop & Return Logic + Stack

Conditional Branch Logic

ProgramCounter

ROM128x

36bits

AReset

CLK Con

trol

Sig

nals

PC Control Signals Opration00 000000000000000 01 001000100011010 LD R1, #n02 000010001000000 LD R2, #addr_a03 000000000000100 LD R3, #addr_X04 000000010001000 LD R7, #005 000000000100001 BckA1 LD R4, (R2)06 000100000010000 INC R207 000001000100000 LD R5, (R3)08 000100010000001 INC R309 001001000100000 MUL R6, R4, R50a 000000010001000 EndA1 ADD R7, R7, R60b 000010000010000 DEC R10c 000000100000100 BRNZ BckA1

Special in ELMSSupports FOR loops at machine code level

PC+ROM is a good sequencer in FPGA.

Adding Conditional Branch Logic allows the program to loop back.

Loop & Return Logic + Stack is a special feature in ELMS that supports FOR loops at machine code level.

Allows jump back as in microprocessors

Page 25: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

25

Software: Using Spread Sheet as Compiler

Page 26: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

26

What’s Good about ELMSNo ALU => Small Resource Usage

ProgramDATA

Memory

PrincetonArchitecture

HarvardArchitecture

FermilabArchitecture(?)

ProgramControl

ALU

ProgramMemory

ProgramControl

ALUDATAMemory

ProgramMemory

Sequencer(ELMS)

Data Processor

DATAMemory

The Princeton Architecture is more suitable at system level while Harvard Architecture is better suited at micro-structure level.

Regular microprocessors cannot run looped program without an ALU.

The ALU takes large amount of resource while may not be efficiently utilized for data processing tasks in FPGA.

The ELMS can run nested loop program without an ALU.

Further separation of Program and data is therefore possible.

The ELMS is kept small.

Page 27: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

27

Recursive Structure

Page 28: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

28

The Digitizer Card for the Fermilab Beam Loss Monitor System

Beam loss input signals from ion chambers are integrated and digitized.

Sliding sums are accumulated and compared with pre-loaded thresholds.

Over threshold in several places causes beam abort based on pre-defined setting.

Beam loss signals are filtered and “de-rippled” for display purposes.

Sequence is controlled by “Seq128” block.

ADC21s/sample

RAM

FastSliding Sum

A>B

SlowSliding Sum

Very SlowSliding Sum

ImmediateSliding Sum Threshold I

AbortLogic

A>BThreshold F

A>BThreshold S

A>BThreshold V

CICSums

De-rippleProcess

Ion ChamberInput

Seq128

Page 29: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

29

Filter Functions

SlidingSum

Cascaded IntegratorComb (CIC) Sum of 2nd Order

)1(

][1][Km

mk

kxms

)12(

][][][Kn

nk

kxkhny

The CIC sum is a sliding sum of sliding sums.

The frequency response of CIC sum is a sinc2(x) function that has 2nd order zeros and better stop band suppression. -0.4

-0.2

0

0.2

0.4

0.6

0.8

1

0 5 10 15 20

x

sinc(x)

sinc 2̂(x)

First Zero @ 360 Hz

Frequency

21s/sample124 samples

Page 30: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

30

Filter Implementation

RecursiveImplementation

Recursive != IIR

Non-RecursiveImplementation

Finite Impulse Respond (FIR)

Infinite Impulse Respond (IIR)

Possible

Yes

Yes

NO

ResourceFriendly

x[n]

s[n]

+s[n]

-x[n-K]

x[n]

The non-recursive implementation needs: 124 memory fetches, 124 additions and more ops for longer sum lengths.

The recursive implementation needs: 1 memory fetch, 2 add/sub operations regardless sum length.

SlidingSum

Page 31: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

31

BLM DC Process Sequencing

The processes of calculating sliding sums and CIC sums are fully sequenced. The de-ripple processor is flat for the process path. But it operates sequentially for 4

channels.

+ SlidingSum 1

(-)

+u[n]

-2x[n-K]

x[n]

+y[n]

x[n-2K] +u[n-L]

-2x[n-L-K]

+y[n-L]

x[n-L-2K]

x[n-L]

If |y[n]-y[n-L]|>MaxDY for entire period, then PG++.WF

PG=0WF

PG=1

PG

---

WF-WM DR=y[n]-(WF-WM)

MaxDYDecimation

Counter

+ SlidingSum 4

(-)SlidingSum 2

SlidingSum 3

Fully Sequencing

PartiallyFlat

Page 32: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

32

The EndThanks

Page 33: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

33

Resource Saving Tricks

Loop Reduction Tricks:The number of computations in a given task is reduced by (1) using fewer iterations in loops or/and (2) using fewer operations in each iteration.

Non-Loop Reduction Tricks:The number of computations in a given task is unchanged. The FPGA resource is saved by (1) reusing the resources multiple times via sequencing or/and (2) using transistor-saving resources such as RAM.

Page 34: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

34

Resource Saving TricksLoop-Reduction

Multiplier-less (ML) Approaches

Recursive Implementation of FIR Filter

FFT: O(n)*O(log(N))

Tiny Triplet Finder: O(n)*O(N*log(N))

+

s[n]

-x[n-K]

x[n]

+y[n]

-s[n-K]

x[n]

y[n]

*h1*h2

*h[K]

X

<<

+/-

*R1/R3

*R2/R3

Bit

Arr

ay

Shifter

Bit

Arr

ay

ShifterBit-wise Coincident Logic

Page 35: Advanced Topics on FPGA Applications Screen A Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007

Oct. 2007, Wu Jinyuan, Fermilab

IEEE NSS Refresher Course, Supplemental Materials

35

Resource Saving TricksNon-Loop-Reduction

Sequencing: Using RAM: Hash Sorter/Histogram

OP1

Initialization

OP2 OP3 OP4

OP1 OP2 OP3 OP4

OP1 OP2 OP3 OP4

OP1 OP2 OP3 OP4

Initialization 1Initialization 2Initialization 3

OP1OP2OP3OP4

OP1OP2OP3OP4

OP1OP2OP3OP4

OP1OP2OP3OP4