
1. NATURE: Non-Volatile Nanotube RAM based Field-Programmable Gate Arrays Wei Zhang†, Niraj K. Jha† and Li Shang ‡ †Dept. of Electrical Engineering Princeton


NATURE: Non-Volatile Nanotube RAM based Field-Programmable Gate Arrays

Wei Zhang†, Niraj K. Jha† and Li Shang‡

†Dept. of Electrical Engineering, Princeton University

‡Dept. of Electrical and Computer Engineering, Queen’s University


A Hybrid CMOS/NAnoTUbe REconfigurable Architecture


Motivation

Background on CNT and NRAM

Architecture of NATURE

Logic Folding

Experimental Results

Conclusions

Motivation

Moore’s Law: What’s Next?
  Carbon nanotubes (CNTs)
  Nanowires
  Single electron devices
  ...

Challenges in nano-circuits/architectures
  Lack of a mature fabrication process
  Defects and run-time failures

Reconfigurable architectures, such as an FPGA, favored
  Regular structures ease fabrication
  Fault tolerance through reconfiguration

Motivation (Contd.)

Problems of existing reconfigurable architectures
  High reconfiguration time overhead
  Low area efficiency

Some recent works on programmable nanofabrics
  Molecular logic array (Goldstein et al. [ICCAD 2002])
  Nanowire PLA (Dehon et al. [FPGA 2004])
  CMOS/nanowire hybrid architecture CMOL (Strukov et al. [Nanotechnology 2005])

Fabrication problem not yet solved

Advantages of NATURE

[Figure: NATURE, with labels: CMOS fabrication compatible, NRAM-based, run-time reconfiguration, temporal logic folding, design flexibility, logic density]

Hybrid design leverages beneficial aspects of both CMOS and CNT technologies

NRAMs are distributed in NATURE to store multi-context reconfiguration bits

Fine-grain reconfiguration (even cycle-by-cycle)

Enables temporal logic folding

Flexibility to perform area-performance trade-offs

One-to-two orders of magnitude increase in logic density

Background

Carbon nanotube (CNT)
  Metallic or semiconducting
  Single-wall or multi-wall
  Diameter: 1-100nm
  Length: up to millimeters
  Ballistic transport
  Excellent thermal conductivity
  Very high current density
  High chemical stability
  Robust to environment

Source: Euronanotrade

Background (Contd.)

Non-volatile nanotube random-access memory (NRAM)
  Mechanically bent or not: determines bistable on/off states
  Fully CMOS-compatible manufacturing process
  Prototype chip: 10 Gbit NRAM
  Will be ready for the market in the near future

Source: Nantero

NRAMs

Properties of NRAMs
  Non-volatile
  Similar speed to SRAM
  Similar density to DRAM
  Chemically and mechanically stable

NATURE not tied to NRAMs
  Phase change RAM
  Magnetoresistive RAM
  Ferroelectric RAM

Architecture of NATURE

[Figure: array of logic blocks (LBs), each with an SMB and a switch matrix, joined through connection blocks and switch blocks by length-1 wires, length-4 wires, long wires and direct links. S1: switch box between length-1 wires. S2: switch box between length-4 wires. Switch matrix: local routing network.]

Island-style logic blocks (LBs) connected by various levels of interconnects

An LB contains a super macroblock (SMB) and a local switch matrix

Architecture of a Super Macroblock (SMB)

n1 macroblocks (MBs) comprise an SMB, here n1 = 4

[Figure: SMB containing four MBs. Labels: 48-to-16 crossbar (one per MB), NRAM, SRAM bits, reconfiguration bits, CLK and global signals, inputs from switch matrix, 32 outputs of SMB. Bus widths marked on the figure: 8, 16 and 144.]

Architecture of a Macroblock (MB)

n2 logic elements (LEs) comprise an MB, here n2 = 4

[Figure: MB containing four LEs. Labels: 12-to-4 crossbar (one per LE), NRAM, 48 SRAM bits (per crossbar), reconfiguration bits, CLK and global signals, inputs to MB, 8 outputs of MB. Bus widths marked on the figure: 2, 4, 17 and 48.]
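The SMB/MB/LE hierarchy can be tallied with a short sketch. It assumes the instance described in the experimental setup (n1 = 4, n2 = 4, 4-input LUTs) and infers two outputs per LE from the "8 outputs of MB" label; the class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NatureInstance:
    """Hypothetical container for the SMB/MB/LE parameters on the slides."""
    mbs_per_smb: int = 4     # n1
    les_per_mb: int = 4      # n2
    lut_inputs: int = 4      # m, value taken from the experimental setup
    outputs_per_le: int = 2  # assumption: 8 MB outputs / 4 LEs

    @property
    def les_per_smb(self) -> int:
        return self.mbs_per_smb * self.les_per_mb

    @property
    def mb_inputs(self) -> int:
        # Signals an MB-level crossbar must deliver: one per LUT input.
        return self.les_per_mb * self.lut_inputs

    @property
    def mb_outputs(self) -> int:
        return self.les_per_mb * self.outputs_per_le

    @property
    def smb_outputs(self) -> int:
        return self.mbs_per_smb * self.mb_outputs


inst = NatureInstance()
print(inst.les_per_smb, inst.mb_inputs, inst.mb_outputs, inst.smb_outputs)
```

Under these assumptions the counts agree with the figure labels: 16 inputs per MB (the output side of the 48-to-16 crossbar), 8 outputs per MB and 32 outputs per SMB.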

Logic Element and Interconnect

An LE implements a computation and contains:
  An m-input look-up table (LUT)
  A flip-flop
  A pass transistor

Interconnect
  Mixed wire segment scheme
  25%, 50% and 25% distribution for length-1, length-4 and long wires
  Direct links from one LB to its 4 neighbors

[Figure: LE with an m-input LUT, DFF, CLK and SRAM cells, next to an SMB with four MBs and an NRAM. Track labels: length-1, 64 tracks; length-4, 128 tracks; long wire, 64 tracks; direct link, 128 tracks.]
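As a quick check of the mixed wire segment scheme, the sketch below splits a channel by the 25%/50%/25% shares. The total of 256 segmented tracks is an assumption, obtained by adding the 64, 128 and 64 track labels in the figure; direct links are counted separately. The function name is hypothetical.

```python
def split_tracks(total_tracks: int, shares: dict[str, float]) -> dict[str, int]:
    """Split a routing channel into per-segment-type track counts."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {name: round(total_tracks * share) for name, share in shares.items()}


# Shares from the slide; the 256-track total is inferred from the figure labels.
shares = {"length-1": 0.25, "length-4": 0.50, "long": 0.25}
tracks = split_tracks(256, shares)
print(tracks)
```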

Support for Reconfiguration

Reconfiguration time short: 160ps

Area overhead of NRAMs
  k: no. of reconfiguration sets per NRAM, assume k = 16
  Area overhead: 20.5% per LB, assuming 100nm technology for CMOS logic and nanotube length
  Logic density = k (conf. copies) x area per configuration = 16 x (1 - 0.205) ≈ 12.7

Appropriate value for k obtained through design space exploration
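The logic density expression on this slide is simple enough to evaluate directly. The sketch below is only an illustration: the function name is made up, and the 20.5% overhead is the figure quoted for k = 16, so reusing it for other values of k would be an assumption.

```python
def relative_logic_density(k: int, nram_area_overhead: float) -> float:
    """Slide formula: k configuration copies times (1 - NRAM area overhead)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= nram_area_overhead < 1.0:
        raise ValueError("overhead must be a fraction in [0, 1)")
    return k * (1.0 - nram_area_overhead)


# Values from the slide: k = 16 reconfiguration sets, 20.5% area overhead per LB.
density = relative_logic_density(k=16, nram_area_overhead=0.205)
print(f"{density:.2f}")  # 12.72
```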

[Figure: NRAM structure, with labels: word line decoder, bit line decoder, read voltage, electrode, pulldown resistor, SRAM cell]

Temporal Logic Folding

Basic idea: one can use NRAM-enabled run-time reconfiguration to realize different Boolean functions in the same logic element (LE) every few cycles

[Figure: a circuit of three LUTs (LUT1, LUT2, LUT3) with inputs a to h and output OUT, mapped onto a single LE that is reconfigured from NRAM in each cycle. Cycle 1: i = abc’. Cycle 2: l = (i’+e’+f’)h’. Cycle 3: OUT = d’g’+l.]
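The three-cycle folding on this slide can be mimicked in a few lines. This is a minimal sketch, not the authors' implementation: the `LogicElement` class, the tuple standing in for the NRAM and the 4-input LUT size are assumptions made for illustration. It reuses one LE for the three functions i, l and OUT and compares the result with a direct evaluation.

```python
from itertools import product


def make_lut(func, n_inputs=4):
    """Truth table of an n-input Boolean function, inputs ordered MSB first."""
    return tuple(int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs))


class LogicElement:
    """Hypothetical model of one LE: a LUT whose contents come from a local NRAM."""

    def __init__(self, nram_configs):
        self.nram = tuple(nram_configs)  # one truth table per configuration set
        self.sram = None                 # currently active LUT contents

    def reconfigure(self, index):
        self.sram = self.nram[index]

    def evaluate(self, bits):
        address = 0
        for bit in bits:
            address = (address << 1) | bit
        return self.sram[address]


# Functions from the slide; the fourth LUT input is unused in cycles 1 and 3.
CONFIGS = (
    make_lut(lambda a, b, c, _: a and b and not c),                    # i = abc'
    make_lut(lambda i, e, f, h: (not i or not e or not f) and not h),  # l = (i'+e'+f')h'
    make_lut(lambda d, g, l, _: (not d and not g) or l),               # OUT = d'g'+l
)


def folded_out(a, b, c, d, e, f, g, h):
    """Evaluate OUT on one LE, reconfiguring it before each of three cycles."""
    le = LogicElement(CONFIGS)
    le.reconfigure(0)
    i = le.evaluate((a, b, c, 0))     # cycle 1
    le.reconfigure(1)
    l = le.evaluate((i, e, f, h))     # cycle 2
    le.reconfigure(2)
    return le.evaluate((d, g, l, 0))  # cycle 3


def flat_out(a, b, c, d, e, f, g, h):
    """Reference: the same three functions evaluated directly."""
    i = a & b & (1 - c)
    l = ((1 - i) | (1 - e) | (1 - f)) & (1 - h)
    return ((1 - d) & (1 - g)) | l


mismatches = sum(folded_out(*v) != flat_out(*v) for v in product((0, 1), repeat=8))
print("mismatches:", mismatches)
```

The intermediate values i and l are passed along as plain variables here; in the architecture they would be held in the LE's flip-flop between cycles.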

Example

Without logic folding
  Num of LEs = 6
  Delay = 4 LE delays + interconnect delay

With logic folding
  Num of LEs = 2
  Delay = 4 * clock_period
  Clock period = LE delay + reconfiguration + interconnect delay

[Figure: without folding, six LEs (LE1 to LE6) in four levels compute Out from inputs x0-x3 and y0-y3 through intermediate signals a0, b0 and c0. With folding, LE1 and LE2 are reused across four cycles, with a reconfiguration between cycles.]
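The two delay expressions in this example can be compared numerically. In the sketch below only the 160 ps reconfiguration time comes from the slides; the LE delay and both interconnect delays are made-up placeholder values, and the function names are hypothetical. Different interconnect delays are used for the two cases on the assumption that folding keeps routing local.

```python
RECONFIG_PS = 160  # reconfiguration time quoted on the slides


def unfolded_delay_ps(depth: int, le_ps: float, wire_ps: float) -> float:
    """Delay = depth LE delays + interconnect delay (no folding)."""
    return depth * le_ps + wire_ps


def folded_delay_ps(cycles: int, le_ps: float, wire_ps: float,
                    reconfig_ps: float = RECONFIG_PS) -> float:
    """Delay = cycles * clock period, clock = LE + reconfiguration + interconnect."""
    clock_ps = le_ps + reconfig_ps + wire_ps
    return cycles * clock_ps


# Placeholder timing values, chosen only to make the arithmetic concrete.
LE_PS = 200
flat_delay = unfolded_delay_ps(depth=4, le_ps=LE_PS, wire_ps=400)
fold_delay = folded_delay_ps(cycles=4, le_ps=LE_PS, wire_ps=100)

# Area-time product uses the LE counts from the example: 6 versus 2.
flat_at = 6 * flat_delay
fold_at = 2 * fold_delay
print(flat_delay, fold_delay, flat_at, fold_at)
```

With these placeholder numbers folding is slower in absolute delay but uses a third of the LEs, so its area-time product is lower; the actual balance depends on the real timing values.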

Folding Levels

Logic folding can be performed at different levels of granularity, providing flexibility to perform area-performance trade-offs

A level-p folding implies reconfiguration of the LE after the execution of p LUT computations

[Figure: (a) level-1 folding and (b) level-2 folding of a network of LUT nodes. In (a) the LUT levels are mapped one at a time onto Macroblock1, with a reconfiguration between levels; in (b) two LUT levels are mapped at a time onto Macroblock1 and Macroblock2.]
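One way to see the effect of the folding level is to group the LUT levels of a circuit into folding stages of p levels each. The sketch below is a simplified model, not the authors' mapping algorithm: it assumes every LUT in a stage needs its own LE and ignores the storage of intermediate results. The function name and the per-level LUT counts (taken from the six-LE example, 2, 1, 2, 1) are illustrative.

```python
def fold(luts_per_level: list[int], p: int) -> tuple[int, int]:
    """Return (number of folding stages, LEs needed) for level-p folding."""
    if p < 1:
        raise ValueError("folding level must be at least 1")
    # Each stage holds p consecutive LUT levels; the last stage may be shorter.
    stages = [luts_per_level[s:s + p] for s in range(0, len(luts_per_level), p)]
    # LEs are reused across stages, so the largest stage sets the requirement.
    les_needed = max(sum(stage) for stage in stages)
    return len(stages), les_needed


levels = [2, 1, 2, 1]  # LUTs per logic level in the six-LE example
for p in (1, 2, 4):
    print(p, fold(levels, p))
```

For this small circuit, level-1 folding needs 4 stages and 2 LEs, level-2 folding needs 2 stages and 3 LEs, and level-4 folding degenerates to the unfolded case with 6 LEs.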

Choosing the Folding Level

Advantages of logic folding
  Significant flexibility for performing area-performance trade-offs
  Ability to map much larger circuits using the same number of LEs
  Significant improvement in the area/circuit delay product
  Reduction in the need for global routing

As the folding level increases:
  Clock period increases
  Routing delay increases
  Number of clock cycles decreases
  Reconfiguration time decreases
  Total delay typically decreases
  Number of LEs increases
  Area increases

Experimental Setup

Instance of architecture: 4 MBs in an SMB, 4 LEs in an MB, and LEs contain a 4-input LUT

Number of reconfiguration copies k varied in order to compare implementations corresponding to selected folding levels: level-1, level-2, level-4 and no logic folding

Results based on 100nm CMOS technology parameters

Experimental Results

[Figure: delay (ns) for different folding levels (level-1, level-2, level-4, no-folding), normalized to level-1. Benchmarks: pm1, sct, cm163a, z4ml, cc, pcler8, cordic, lal, ldd, 9symml, alu2.]

[Figure: #LEs * delay for different folding levels (level-1, level-2, level-4, no-folding), normalized to level-1, for the same benchmarks.]

Average area-time product advantage = 2X

Maximum area-time product advantage = 3X

Experimental Results (Contd.)

[Figure: delay (ns) for different folding levels (level-1, level-2, level-4, no-folding), normalized to level-1. Benchmarks: 16-RCA, 32-RCA, 64-RCA, 16-CLA, 32-CLA, 64-CLA, 16-CSA, 32-CSA, 64-CSA, 8-MUL, 16-MUL, 32-MUL.]

[Figure: #LEs * delay for different folding levels (level-1, level-2, level-4, no-folding), normalized to level-1, for the same benchmarks.]

16-RCA: 16-bit ripple carry adder
16-CLA: 16-bit carry lookahead adder
16-CSA: 16-bit carry select adder
8-MUL: 8-bit multiplier

Average area-time product advantage = 13X

Maximum area-time product advantage = 35X

Experimental Results (Contd.)

Flexibility in performing area-performance trade-off

For the area-time (AT) product, the larger the circuit depth, the greater the advantage of level-1 folding relative to no folding

For the 64-bit ripple-carry adder, this advantage is about 35X

LE utilization and logic density very high, with a reduced need for a deep interconnect hierarchy

Conclusions

NATURE: A novel high-performance run-time reconfigurable architecture

Introduction of NRAMs into the architecture enables cycle-by-cycle reconfiguration and logic folding

Choice of different folding levels allows the flexibility of performing area-performance trade-offs

Logic density and area-time product improved significantly

Can be very useful for cost-conscious embedded systems and future FPGA improvement