FPGA Intra-cluster Routing Crossbar Design

Dr. Philip Brisk

Department of Computer Science and Engineering

University of California, Riverside

CS 223

Generating Highly Routable Sparse Crossbars for PLDs

Guy Lemieux, Paul Leventis, David Lewis

International Symposium on FPGAs, 2000

Basic Notation

Fully Populated Crossbar

• Full capacity – can connect as many signals as the number of outputs

• Flexibility – Can connect any input to any output

Full-capacity Minimal Crossbars

• Full capacity
• Reduced flexibility: you lose the ability to connect any input to any output
• p = m(n – m + 1) switches, for n inputs and m outputs

Full-capacity Minimal Crossbars

• Area savings are minimal if n >> m: only m(m – 1) of the nm switches in a fully populated crossbar are removed
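
A rough way to see the effect is to count switches for both crossbar styles. The sketch below assumes n inputs and m outputs, n·m switches for the fully populated crossbar, and m(n – m + 1) for the full-capacity minimal one; the function names are illustrative only.

```python
def full_switches(n, m):
    """Switch count of a fully populated n-input, m-output crossbar."""
    return n * m


def minimal_switches(n, m):
    """Switch count of a full-capacity minimal crossbar (requires n >= m)."""
    return m * (n - m + 1)


def relative_savings(n, m):
    """Fraction of switches removed by going from full to minimal."""
    full = full_switches(n, m)
    return (full - minimal_switches(n, m)) / full


if __name__ == "__main__":
    # 6x4 matches the small example figure; 168x24 matches the test crossbar size.
    for n, m in [(6, 4), (168, 24)]:
        print(n, m, full_switches(n, m), minimal_switches(n, m),
              round(relative_savings(n, m), 3))
```

The number of switches removed is m(m – 1) regardless of n, so the relative savings shrink as n grows.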

Perfect and Sparse Crossbars

• Perfect crossbars
  – Can disjointly route any m-sized subset of the n inputs to the m outputs
  – Both full and full-capacity minimal crossbars are perfect
• Sparse crossbars
  – Have p < m(n – m + 1) switches
  – Cannot be perfect

Bipartite Graph Representation

[Figure: a crossbar with inputs I1–I6 and outputs O1–O4, shown next to its bipartite graph with one vertex per input and one per output]

Evaluation Challenge

• How “routable” is a given crossbar?
  – Build an FPGA, map 20+ applications, observe results
    • Slow, highly subject to the application mix
  – Monte Carlo test
    • Generate random test vectors
    • Route each test vector on the crossbar (network flow)
    • Report number of successes as a percentage
    • A highly routable sparse crossbar has a >= 95% success rate
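
The Monte Carlo test can be sketched in a few lines. This is a minimal illustration, not the authors' tool: it assumes each test vector is a random set of m distinct inputs (m = number of outputs), represents the crossbar as a hypothetical list `adj` where `adj[i]` is the set of outputs input i has a switch to, and uses augmenting-path bipartite matching in place of a general network-flow solver.

```python
import random


def can_route(adj, signals):
    """True if every input in `signals` can be given its own distinct output."""
    match = {}  # output -> input currently assigned to it

    def augment(i, seen):
        # Try to give input i an output, re-routing earlier inputs if needed.
        for o in adj[i]:
            if o in seen:
                continue
            seen.add(o)
            if o not in match or augment(match[o], seen):
                match[o] = i
                return True
        return False

    return all(augment(i, set()) for i in signals)


def routability(adj, num_outputs, trials=10_000, seed=0):
    """Fraction of random test vectors that route successfully."""
    rng = random.Random(seed)
    inputs = range(len(adj))
    successes = sum(
        can_route(adj, rng.sample(inputs, num_outputs)) for _ in range(trials)
    )
    return successes / trials


if __name__ == "__main__":
    full = [set(range(4)) for _ in range(6)]  # fully populated 6x4 crossbar
    print(routability(full, 4, trials=1000))
```

Under the 95% criterion above, a sparse crossbar would count as highly routable when `routability(...)` returns at least 0.95.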

Hall’s Theorem

• Given a bipartite graph G = (V, E)
  – X, Y are the bipartite independent sets of G
• G has a matching of X into Y (every vertex of X is matched) if and only if $|N(S)| \ge |S|$ for every subset $S \subseteq X$
  – N(v) is the set of neighbors of vertex v
  – N(S) is the set of neighbors of all vertices in S
• Leverage Hall’s Theorem to generate routable sparse crossbars!
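
Hall's condition can be checked directly on a tiny crossbar by brute force. The sketch below is for illustration only: `adj[i]` is a hypothetical set of outputs reachable from input i, and the enumeration is exponential in the number of inputs, so it is usable only for very small examples.

```python
from itertools import combinations


def neighbors(adj, subset):
    """N(S): union of the output sets of all inputs in `subset`."""
    out = set()
    for v in subset:
        out |= adj[v]
    return out


def hall_violations(adj, vertices):
    """Return every subset S of `vertices` with |N(S)| < |S|."""
    bad = []
    for k in range(1, len(vertices) + 1):
        for s in combinations(vertices, k):
            if len(neighbors(adj, s)) < k:
                bad.append(s)
    return bad


if __name__ == "__main__":
    adj = [{0, 1}, {0}, {1}]  # 3 inputs, 2 outputs
    print(hall_violations(adj, [0, 1]))     # inputs 0 and 1 can be routed together
    print(hall_violations(adj, [0, 1, 2]))  # three inputs cannot share two outputs
```

An empty result means the chosen inputs can all be routed at once; any returned subset is a witness that they cannot.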

Practical Issues

• Cannot enumerate all subsets of m inputs
• |N(x)| should be approximately equal for all input vertices x in X
  – Otherwise, any subset containing a large number of low-degree vertices is unlikely to be routable
• |N(y)| should be approximately equal for all output vertices y in Y
  – Symmetric argument

Hamming Distance and Coding Theory

• Represent N(v) as a bit vector b_v
  – b_v[i] = 1 if v fans out to Oi
• Hamming distance
  – d(b_v1, b_v2): the number of bit positions in which b_v1 and b_v2 differ
• Strategy
  – Maximize d(b_vi, b_vj) for every pair of distinct vertices vi and vj
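
As a small illustration of the bit-vector view, the sketch below builds the vectors for two inputs and computes their Hamming distance. The helper names and the 4-output example are assumptions made for the example.

```python
def bitvector(outputs, num_outputs):
    """Bit i is 1 if the input has a switch to output i."""
    return [1 if o in outputs else 0 for o in range(num_outputs)]


def hamming(a, b):
    """Number of positions in which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    b1 = bitvector({0, 1}, 4)  # input fans out to O1 and O2 (0-indexed here)
    b2 = bitvector({1, 2}, 4)  # input fans out to O2 and O3
    print(b1, b2, hamming(b1, b2))
```

Two inputs with identical fan-out have distance 0 and compete for exactly the same outputs, which is the situation the strategy tries to avoid.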

Switch Placement Optimizer

• Start with initial switch placement
• Generate random swap of switch positions
  – Accept the swap if there is an improvement
  – Otherwise, reject the swap
• Stop after a fixed number of swap candidates (e.g., 10K) fail to find an improvement
• Objective is to minimize: [equation not preserved]
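
The loop above can be sketched as a simple greedy search. Several details here are assumptions, not taken from the slides: the cost is taken to be the sum of 1/d² over all pairs of inputs (with d clamped to 0.5 so identical rows get a large finite penalty), a "swap" moves one switch of a random input to an output that input does not yet reach, and the stop rule counts consecutive failed candidates.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Hamming distance of two fan-out sets (size of the symmetric difference)."""
    return len(a ^ b)


def cost(adj):
    """Assumed objective: sum of 1/d^2 over all pairs of inputs."""
    return sum(1.0 / max(hamming(a, b), 0.5) ** 2 for a, b in combinations(adj, 2))


def optimize(adj, num_outputs, patience=10_000, seed=0):
    """Greedy random-swap search; returns (new placement, final cost)."""
    rng = random.Random(seed)
    adj = [set(row) for row in adj]
    best, fails = cost(adj), 0
    while fails < patience:
        i = rng.randrange(len(adj))
        free = [o for o in range(num_outputs) if o not in adj[i]]
        if not adj[i] or not free:
            fails += 1
            continue
        old, new = rng.choice(sorted(adj[i])), rng.choice(free)
        adj[i].remove(old)
        adj[i].add(new)
        candidate = cost(adj)
        if candidate < best:       # accept only strict improvements
            best, fails = candidate, 0
        else:                      # reject: undo the move
            adj[i].remove(new)
            adj[i].add(old)
            fails += 1
    return adj, best


if __name__ == "__main__":
    start = [{0, 1} for _ in range(4)]  # four inputs with identical fan-out
    placed, final = optimize(start, num_outputs=4, patience=500)
    print(cost(start), final, placed)
```

Each move keeps the number of switches per input fixed, so the search changes where switches sit without changing the total switch count.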

Example

Identical Hamming costs before and after the swap

Before: cannot route {1, 2, 3}

After: reduces Hamming costs

168x24 Crossbar, 10K Test Vectors

Altera Flex 8000, HP Plasma Hextant

# Switches vs. Routability

Using Sparse Crossbars within LUT Clusters

Guy Lemieux, David Lewis

International Symposium on FPGAs, 2001

Five Questions

1. Will depopulation save area, require greater routing area, or create unroutable architectures?
2. Will depopulation reduce or increase routing delays?
3. What amount of depopulation is reasonable?
4. How much area or delay reduction can be attained, if any?
5. What are the other effects of depopulating the cluster?

Architecture and Parameters

Results

Designing Efficient Input Interconnect Blocks for LUT Clusters Using Counting and Entropy

Wenyi Feng and Sinan Kaptanoglu

ACM Transactions on Reconfigurable Technology and Systems (TRETS), 1(1): article #6, March 2008

Note: Paper is from Actel (now Microsemi)

Count Configurations (Details Omitted)

784 Configurations 312 Configurations 256 Configurations

Routing Requirement Vector (RRV)

• An ordered list of N subsets, each containing K distinct signals
• The ith subset is the K distinct signals to route to the ith K-LUT
• Total number of RRVs for the crossbar: $\binom{M}{K}^N$

[Figure: crossbar with M inputs and KN outputs]

Entropy of an Intra-cluster Routing Crossbar

• H = lg(# routable RRVs)
  – Accounts for equivalence of LUT inputs
• Why entropy?
  – # routable RRVs is huge
  – Minimum number of configuration bits to program the crossbar
  – Inversely correlated with usage of global routing muxes (details omitted)
    • If we reduce the routability of the crossbar, we will end up programming more global routing muxes to compensate for the entropy loss
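
To get a feel for the magnitudes, the sketch below counts RRVs and the corresponding entropy for a crossbar that can route every RRV. It assumes each of the N subsets is chosen independently as K distinct signals out of M inputs, so there are C(M, K)^N RRVs; the sizes M = 8, K = 4, N = 2 are hypothetical.

```python
from math import comb, log2


def total_rrvs(M, K, N):
    """Number of RRVs: N ordered subsets, each K distinct signals out of M."""
    return comb(M, K) ** N


def max_entropy(M, K, N):
    """Entropy in bits when every RRV is routable: lg(C(M, K)^N) = N * lg C(M, K)."""
    return N * log2(comb(M, K))


if __name__ == "__main__":
    M, K, N = 8, 4, 2  # hypothetical cluster: 8 inputs, two 4-LUTs
    print(total_rrvs(M, K, N), round(max_entropy(M, K, N), 2))
```

Any depopulated crossbar routes only a subset of these RRVs, so its entropy is at most this value.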

Conceptual Idea

intra-cluster crossbar

global routing

Theorem

• Let P and L be the number of muxes and switches, respectively, in a crossbar
  – The entropy is at most P lg(L/P)
  – The entropy per switch is at most lg(L/P) / (L/P)
  – These bounds are achieved only when each mux has size L/P and each configuration realizes a unique RRV

• Proof omitted because I DO NOT HATE YOU!
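
The two bounds are easy to evaluate numerically. The sketch below simply tabulates them for a few average mux sizes L/P; the function names and the chosen sizes are illustrative assumptions.

```python
from math import log2


def entropy_bound(P, L):
    """Upper bound on crossbar entropy: P * lg(L / P)."""
    return P * log2(L / P)


def entropy_per_switch_bound(P, L):
    """Upper bound on entropy per switch: lg(L / P) / (L / P)."""
    size = L / P  # average mux size
    return log2(size) / size


if __name__ == "__main__":
    P = 16  # hypothetical number of muxes
    for size in (2, 3, 4, 8, 16):
        L = P * size
        print(size, entropy_bound(P, L), round(entropy_per_switch_bound(P, L), 3))
```

Among integer mux sizes the per-switch bound peaks at size 3 and falls off for larger muxes, which is one way to read the argument for keeping muxes small.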

What are we doing here?

• Lemieux and Lewis
  – Routability: Monte Carlo simulations
  – Area: Count switches
• Feng and Kaptanoglu
  – Routability: Crossbar entropy
  – Area: Entropy per switch
  – Caveat: Focus only on crossbars where we can count routable, non-redundant RRVs!

Type-1 Crossbar

• 1-level
  – L2 muxes are driven directly by crossbar input signals
  – # routable RRVs depends on L2 crossbar topology
• Not area-efficient due to big L2 muxes
• Xilinx Virtex-style

Type-2 Crossbar

• 2-level
  – L1 is sparsely populated
  – L2 is fully populated
• Fully populated L2 reduces area efficiency
• VPR
  – Fc,in determines L1 population density

Type-3 Crossbar

• 2-level, partitioned
  – L1 partition Pi only drives L2 partition Oi
  – From input m to LUT input n, all paths go through muxes in Pi and Oi exclusively
  – # routable RRVs is the product of # routable RRVs for each disjoint sub-crossbar

Proposed Type-3 Crossbar and Generation Algorithm

• Each sub-crossbar is Type-2
• Can count # routable RRVs (details omitted)

Entropy vs. # Switches

Entropy vs. Global Routing Mux Usage

The Bottom Line…

• Who cares…
  – Theoretical properties are cute
  – Actel/Microsemi did not use these crossbars in their FPGAs
• Practical observation…
  – The cheaper you make the intra-cluster routing crossbar, the more expensive the global routing…

A 65nm flash-based FPGA fabric optimized for low cost and power

Jonathan W. Greene, et al.

International Symposium on FPGAs, 2011

Note: Paper is from Microsemi (Feng and Kaptanoglu are co-authors)

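Corporate Secrets Divulged

• They used a Clos Network
  – Three parameters: m (number of middle-stage switches), n (inputs per ingress switch), r (number of ingress and egress switches)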

Clos Network Properties

• Used when the physical circuit switching needs to exceed the capacity of the largest feasible single crossbar

• Much cheaper than a fully populated nxn crossbar

Strict-sense Nonblocking Clos Network (m ≥ 2n – 1)

• An unused input on an ingress switch can always be connected to an unused output on an egress switch, without reconfiguration!

Rearrangeably Nonblocking Clos Network (m ≥ n)

• An unused input on an ingress switch can always be connected to an unused output on an egress switch, but reconfiguration may be necessary!
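
The two conditions and the cost argument can be checked with a short calculation. The sketch below assumes a symmetric three-stage Clos network: r ingress switches of size n×m, m middle switches of size r×r, and r egress switches of size m×n, compared against one fully populated crossbar with n·r ports per side. The function names and the 64-port example are illustrative.

```python
def clos_crosspoints(m, n, r):
    """Crosspoints in a symmetric 3-stage Clos network."""
    ingress = r * n * m   # r switches, each n x m
    middle = m * r * r    # m switches, each r x r
    egress = r * m * n    # r switches, each m x n
    return ingress + middle + egress


def full_crosspoints(n, r):
    """Crosspoints in a single fully populated crossbar with n*r ports per side."""
    ports = n * r
    return ports * ports


def classify(m, n):
    """Blocking class from the standard Clos conditions."""
    if m >= 2 * n - 1:
        return "strict-sense nonblocking"
    if m >= n:
        return "rearrangeably nonblocking"
    return "blocking"


if __name__ == "__main__":
    n, r = 8, 8  # 64 ports per side
    for m in (7, 8, 15):
        print(m, classify(m, n), clos_crosspoints(m, n, r), full_crosspoints(n, r))
```

For this size, both the rearrangeable (m = 8) and the strict-sense (m = 15) networks need fewer crosspoints than the single 64×64 crossbar; for very small networks the strict-sense variant can cost more than the full crossbar.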

Recursive Clos Network Design

• Scalable to any ODD number of stages
  – Replace center crossbar with a 3-stage Clos Network