8
mrFPGA: A Novel FPGA Architecture with Memristor-Based Reconfiguration Jason Cong Bingjun Xiao Department of Computer Science University of California, Los Angeles {cong, xiao}@cs.ucla.edu AbstractIn this paper, we introduce a novel FPGA architecture with memristor-based reconfiguration (mrFPGA). The proposed architecture is based on the existing CMOS-compatible memristor fabrication process. The programmable interconnects of mrFPGA use only memristors and metal wires so that the interconnects can be fabricated over logic blocks, resulting in significant reduction of overall area and interconnect delay but without using a 3D die- stacking process. Using memristors to build up the interconnects can also provide capacitance shielding from unused routing paths and reduce interconnect delay further. Moreover we propose an improved architecture that allows adaptive buffer insertion in interconnects to achieve more speedup. Compared to the fixed buffer pattern in conventional FPGAs, the positions of inserted buffers in mrFPGA are optimized on demand. A complete CAD flow is provided for mrFPGA, with an advanced P&R tool named mrVPR that was developed for mrFPGA. The tool can deal with the novel routing structure of mrFPGA, the memristor shielding effect, and the algorithm for optimal buffer insertion. We evaluate the area, perfor- mance and power consumption of mrFPGA based on the 20 largest MCNC benchmark circuits. Results show that mrFPGA achieves 5.18x area savings, 2.28x speedup and 1.63x power savings. Further improvement is expected with combination of 3D technologies and mrFPGA. Keywords-FPGA; memristor; ASIC; reconfiguration; I. I NTRODUCTION The performance/power efficiency of an application implemented in an application specific integrated circuit (ASIC) can be as much as six orders of magnitude higher than its counterpart coded in CPU [1]. However the rapid increase of non-recurring engineering cost and design cycle of an ASIC in nanometer technologies makes it impractical to implement most applications in ASIC. This makes the Field Programmable Gate Array (FPGA) increasingly more popular. FPGA is a desirable type of integrated circuit that can be reconfigured to realize a large range of arbitrary functions according to customer demands. The programmability makes FPGA a quick and reusable hardware implementation platform for specific applications. Compared to ASIC, FPGA has sacrificed its performance to some extent for programmability [2] but FPGA can still have orders of magnitude improvement in performance/power efficiency compared to CPU [1, 3]. The flexibility and performance of FPGA compared to ASIC and CPU makes FPGA an important component in the realm of customizable heterogeneous computing platforms [3]. It is well known that the programmable routing structure in FPGA is the principal source of the performance inferiority of FPGA when compared to ASIC [4–7]. It is reported that the programmable interconnects in FPGA can account for up to 90% of the total area [4], up to 80% of the total delay [5, 6] and up to 85% of the total power consumption [7]. The study in [2] measures the gap between FPGA and ASIC. It shows that the area, delay and power consumption of FPGA are 21 times, 4 times and 12 times as much as those of ASIC respectively. We see that if FPGA routing structure (the dominant part in FPGA) gains some improvement, the gap between FPGAs and ASICs will be significantly reduced. Conventional FPGA suffers a lot from its programmable intercon- nects partly due to extensive use of SRAM-based programming bits, multiplexers (or pass transistors), and buffers in the interconnects. One SRAM cell contains as many as six transistors, but can store only one-bit data. The low density of SRAM-based storage increases the area overhead of FPGA programmability and consequently, leads to longer routing paths and larger interconnect delay. In addition SRAM is a type of volatile memory – this means that it contributes to excessive power consumption during stand-by. Emerging technologies, especially emerging non-volatile memory (NVM) technologies, lead to opportunities of circuit improvement. Emerging NVMs typically have a smaller cell size than that of SRAMs. They also have the desirable property of non-volatility, which means that they can be turned off during stand-by to save power. Among them, memristor, also often recognized as resistive random-access memory (RRAM), is considered to be the most promising one. Since HP Labs presented the first experimental realization of memristor in 2008 [8], rapid progress on the fabrication of high-quality memristor has been achieved in the past few years. The current leading memristor technology for the time being has outperformed many other emerging NVMs in some important aspects and performs similar to them in the other aspects. It has been demonstrated that memristor is scalable below 30nm [9], can be fabricated with device size as small as F 2 (F is the feature size) with full compatibility of CMOS [10], and can be programmed within 5ns at 180nm technology node [11]. Due to the advantages of its device property over SRAM and other emerging NVMs, numerous research efforts have been made to adopt memristors for system- level integration [12–15]. However almost all of these studies use memristors as a new type of memory. This paper presents a novel FPGA architecture with memristor- based reconfiguration (mrFPGA). With the novel use of memristors in mrFPGA interconnects rather than memory cells, the proposed architecture reduces the gap between FPGAs and ASICs in area, delay and power. This paper is organized as follows. Section II introduces the conventional FPGA architecture and review recent research work on FPGAs with emerging technologies. Section III presents the unique structure of memristor and its electrical characteristics and circuit model. Section IV describes the architecture of mrFPGA from high-level overview to detailed design. Section V provides a complete CAD flow and evaluation method for mrFPGA. Section VI presents detailed evaluation results of mrFPGA using the largest twenty MCNC benchmarks. Finally we draw some conclusions in Section VII. 1 978-1-4577-0994-4/11/$26.00 c 2011 IEEE

mrFPGA: A Novel FPGA Architecture with Memristor-Based ...cong/papers/nano11.pdfmrFPGA: A Novel FPGA Architecture with Memristor-Based Reconfiguration Jason Cong Bingjun Xiao Department

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

  • mrFPGA: A Novel FPGA Architecture with Memristor-Based

    Reconfiguration

    Jason Cong Bingjun Xiao

    Department of Computer Science

    University of California, Los Angeles

    {cong, xiao}@cs.ucla.edu

    Abstract— In this paper, we introduce a novel FPGA architecturewith memristor-based reconfiguration (mrFPGA). The proposedarchitecture is based on the existing CMOS-compatible memristorfabrication process. The programmable interconnects of mrFPGAuse only memristors and metal wires so that the interconnects canbe fabricated over logic blocks, resulting in significant reductionof overall area and interconnect delay but without using a 3D die-stacking process. Using memristors to build up the interconnects canalso provide capacitance shielding from unused routing paths andreduce interconnect delay further. Moreover we propose an improvedarchitecture that allows adaptive buffer insertion in interconnectsto achieve more speedup. Compared to the fixed buffer pattern inconventional FPGAs, the positions of inserted buffers in mrFPGAare optimized on demand. A complete CAD flow is provided formrFPGA, with an advanced P&R tool named mrVPR that wasdeveloped for mrFPGA. The tool can deal with the novel routingstructure of mrFPGA, the memristor shielding effect, and thealgorithm for optimal buffer insertion. We evaluate the area, perfor-mance and power consumption of mrFPGA based on the 20 largestMCNC benchmark circuits. Results show that mrFPGA achieves5.18x area savings, 2.28x speedup and 1.63x power savings. Furtherimprovement is expected with combination of 3D technologies andmrFPGA.

    Keywords-FPGA; memristor; ASIC; reconfiguration;

    I. INTRODUCTION

    The performance/power efficiency of an application implemented in

    an application specific integrated circuit (ASIC) can be as much as

    six orders of magnitude higher than its counterpart coded in CPU

    [1]. However the rapid increase of non-recurring engineering cost

    and design cycle of an ASIC in nanometer technologies makes it

    impractical to implement most applications in ASIC. This makes

    the Field Programmable Gate Array (FPGA) increasingly more

    popular. FPGA is a desirable type of integrated circuit that can be

    reconfigured to realize a large range of arbitrary functions according

    to customer demands. The programmability makes FPGA a quick and

    reusable hardware implementation platform for specific applications.

    Compared to ASIC, FPGA has sacrificed its performance to some

    extent for programmability [2] but FPGA can still have orders of

    magnitude improvement in performance/power efficiency compared

    to CPU [1, 3]. The flexibility and performance of FPGA compared to

    ASIC and CPU makes FPGA an important component in the realm

    of customizable heterogeneous computing platforms [3].

    It is well known that the programmable routing structure in FPGA

    is the principal source of the performance inferiority of FPGA

    when compared to ASIC [4–7]. It is reported that the programmable

    interconnects in FPGA can account for up to 90% of the total area [4],up to 80% of the total delay [5, 6] and up to 85% of the total powerconsumption [7]. The study in [2] measures the gap between FPGA

    and ASIC. It shows that the area, delay and power consumption of

    FPGA are ∼21 times, ∼4 times and ∼12 times as much as those

    of ASIC respectively. We see that if FPGA routing structure (the

    dominant part in FPGA) gains some improvement, the gap between

    FPGAs and ASICs will be significantly reduced.

    Conventional FPGA suffers a lot from its programmable intercon-

    nects partly due to extensive use of SRAM-based programming bits,

    multiplexers (or pass transistors), and buffers in the interconnects.

    One SRAM cell contains as many as six transistors, but can store

    only one-bit data. The low density of SRAM-based storage increases

    the area overhead of FPGA programmability and consequently, leads

    to longer routing paths and larger interconnect delay. In addition

    SRAM is a type of volatile memory – this means that it contributes

    to excessive power consumption during stand-by.

    Emerging technologies, especially emerging non-volatile memory

    (NVM) technologies, lead to opportunities of circuit improvement.

    Emerging NVMs typically have a smaller cell size than that of

    SRAMs. They also have the desirable property of non-volatility,

    which means that they can be turned off during stand-by to save

    power. Among them, memristor, also often recognized as resistive

    random-access memory (RRAM), is considered to be the most

    promising one. Since HP Labs presented the first experimental

    realization of memristor in 2008 [8], rapid progress on the fabrication

    of high-quality memristor has been achieved in the past few years.

    The current leading memristor technology for the time being has

    outperformed many other emerging NVMs in some important aspects

    and performs similar to them in the other aspects. It has been

    demonstrated that memristor is scalable below 30nm [9], can be

    fabricated with device size as small as F2 (F is the feature size) with

    full compatibility of CMOS [10], and can be programmed within

    5ns at 180nm technology node [11]. Due to the advantages of its

    device property over SRAM and other emerging NVMs, numerous

    research efforts have been made to adopt memristors for system-

    level integration [12–15]. However almost all of these studies use

    memristors as a new type of memory.

    This paper presents a novel FPGA architecture with memristor-

    based reconfiguration (mrFPGA). With the novel use of memristors

    in mrFPGA interconnects rather than memory cells, the proposed

    architecture reduces the gap between FPGAs and ASICs in area,

    delay and power.

    This paper is organized as follows. Section II introduces the

    conventional FPGA architecture and review recent research work

    on FPGAs with emerging technologies. Section III presents the

    unique structure of memristor and its electrical characteristics and

    circuit model. Section IV describes the architecture of mrFPGA

    from high-level overview to detailed design. Section V provides a

    complete CAD flow and evaluation method for mrFPGA. Section VI

    presents detailed evaluation results of mrFPGA using the largest

    twenty MCNC benchmarks. Finally we draw some conclusions in

    Section VII.

    1978-1-4577-0994-4/11/$26.00 c©2011 IEEE

  • II. BACKGROUND AND RELATED WORK

    A. Conventional FPGA

    Fig. 1 shows a conventional FPGA architecture. It is made up of

    Fig. 1: Conventional FPGA architecture.

    an array of tiles, and each tile consists of one logic block (LB), two

    connection blocks (CB) and one switch block (SB). Each logic block

    contains a cluster of basic logic elements (BLEs), typically look-

    up tables (LUTs), to provide customizable logic functions. LBs are

    connected to the routing channels through CBs and the segmented

    routing channels are connected with each other through SBs. Fig. 1

    also depicts two typical circuits designs of CBs and SBs based on

    multiplexers (MUXs) and buffers described in [16]. The selector pins

    of each multiplexer are connected to a group of 6-transistor SRAM

    cells to have their connectivity determined. The circuits in Fig. 1

    have copies of up to the number of pins per side of LBs in a single

    CB, and up to the number of tracks per channel in a single SB.

    We see that CBs and SBs make up of the interconnects of FPGAs

    with much larger area and higher complexity compared to the direct

    interconnects of ASIC.

    B. Recent Work on FPGAs using Emerging Technologies

    With the recent development of emerging technologies, a number

    of novel FPGA architectures based on these technologies have been

    proposed in the past few years. A 3D FPGA architecture was pro-

    posed in [17]. It partitions the transistors in the routing structure and

    logic structure of conventional FPGA into multiple active layers and

    stacks the layers via monolithic stacking, a 3D integration technology

    the author proposes to use as shown in Fig. 2a. It provides 1.7x

    performance gain due to the smaller tile area and shorter interconnect

    distance between tiles. However 3D stacking without proper thermal

    solution, will result in the excessive increase of heat density and the

    corresponding degradation of performance [19]. Also applications

    of monolithic stacking are currently limited due to the reason that

    high temperature required by the fabrication of transistors in an

    upper layer is likely to destroy the metals wires and transistors

    which have been fabricated in the layers below [18]. In [14] and

    [18] two FPGA architectures are proposed using emerging NVMs

    and through-silicon-via (TSV) based 3D integration. Similar to the

    monolithic stacking in [17], they also move all the transistors in the

    routing structure of FPGA to the top die over the logic structure

    die with TSV connections between them as shown in Fig. 2b. In

    addition they replace SRAM in FPGA with emerging NVMs, such

    as phase-change RAM (PCRAM) [18] or memristor [14], to save the

    area usage of storing one programming bit and stand-by power as

    shown in Fig. 2c. 1.1x and 2x overall performance gains before and

    after 3D integration respectively are reported in these two works.

    SB SBCB

    SB SBCB

    CB CB

    LB LB LB

    LB LB LB

    LB LB LB

    Memory Layer

    Switch Layer

    CMOS Layer

    3D-FPGA

    (a) Monolithic stacking [17].

    ���

    ���

    �����

    ��

    ������

    ��

    � �� ����

    ��

    �� ��

    ��

    ��

    ��

    ��

    �� ��

    ���

    �����

    �� ��

    ��

    ��

    ��

    ��

    ��

    ��

    � �� ����

    ��

    �� ��

    ��

    ��

    ��

    ��

    �� ��

    ��

    ��

    ��

    ��

    ��

    ��

    � ��

    � ��

    ���

    ���

    ���

    ���

    ���

    ���

    ��

    �� ��

    (b) 3D integration with TSVs [18].

    (c) Replace SRAMs with emerg-ing NVMs [14].

    (d) Nanowire crossbar [19].

    Fig. 2: Recent work on FPGA using emerging technologies.

    However one potential problem is that the maximum density of

    TSVs currently achievable might fail to satisfy the dense connections

    required between the routing structure die and logic structure die.

    In [19], a 3D CMOS/Nanomaterial hybrid FPGA was proposed.

    Again, it separates the transistors of interconnects from those of logic

    blocks and redistributes them into different dies as shown in Fig. 2d.

    The difference is that it uses nanowire crossbars and face-to-face

    3D integration technology to provide connections between the dies.

    It provides a 2.6x performance gain compared to the conventional

    FPGA architecture with certain power overhead brought by the large

    capacitance of crossbar array. In addition the crossbar structure may

    consist of a certain percentage of defective components [19]. To sum-

    marize all, these works try to stack interconnects over logic blocks for

    the sake of reduction of area and interconnect delay. However they

    all suffer from a common problem. They still require transistors in

    interconnects. To achieve the stacked architectures, these works have

    to reorganize the interconnects and logic blocks into separate dies

    and introduce 3D die-stacking (or less mature monolithic stacking).

    Otherwise the high temperature during fabrication of the transistors

    in an upper layer would ruin the existing transistors and metal wires

    below in the same die. Die-stacking introduces undesirable problems

    in terms of manufacturability, reliability and limit of the vertical

    integration density. Some researchers replace the pass transistors

    in the routing structure of FPGA with emerging memory elements

    integrated in metal layers with back-end-of-line (BEOL) compatible

    fabrication [20, 21]. But the improvement is limited due to the small

    portion of pass transistors in the routing structure. There are also

    studies on new reconfigurable circuit structures, such as CMOL [22]

    and CNTFET-based fine-grain cell matrices [23], with the aim of

    taking the place of FPGA. Lots of efforts are still needed to realize

    these revolutionary ideas in products.

    III. MEMRISTOR TECHNOLOGY

    In this work, we propose to use memristors and metal wires but no

    transistors to build up the routing structure in mrFPGA. We first intro-

    2 2011 IEEE/ACM International Symposium on Nanoscale Architectures

  • duce the memristor technology in this section. CMOS-compatibility

    is key property to integrate memristor into the mrFPGA architecture

    and several works have demonstrated CMOS-compatible memristor

    fabrication processes [10, 24, 25]. For example, Fig. 3 is a schematic

    view and cross-sectional transmission electron microscopy (TEM)

    image of memristor fabricated by a CMOS-compatible process [24]

    for example. As shown in Fig. 3a memristor is a two-terminal

    (a) Schematic view. (b) TEM image.

    Fig. 3: Fabrication structure of memristor integrated with CMOS in

    the same die [24].

    resistance-like device which can be fabricated between the top metal

    layer and the other metal layers on top of the transistor layer in the

    same die. As explained in [24, 25], since no high temperature takes

    place during their memristor fabrication processes, the processes can

    be embedded after back-end-of-line process and will not ruin the

    transistors and metal wires which have been fabricated below the

    memristors, as shown in Fig. 3b. We also see from Fig. 3b that

    the structure of memristor is quite simple, resulting in the easy

    achievement of small cell size. In spite of the simple structure of

    memristor, it achieves the storage of one-bit data successfully. Mem-

    ristor can be programmed by applying specific programming voltage

    or current at its two terminals to switch its resistance value at normal

    operation between low state and high state. The resistance values at

    the two states are usually more than two orders of magnitude apart

    [24], or even up to six orders of magnitude apart [25]. In addition

    memristor can keep the resistance value set before being powered off

    without supply voltage. With this programmable property, memristor

    is used as a promising type of NVM with two different states: low

    resistance state (LRS) and high resistance state (HRS). In our work

    the important thing is that memristor acts like a programmable switch

    which can be reconfigured to determine whether its two terminals are

    connected or not. If the two terminals need to be connected to each

    other, we just program the memristor to LRS. Otherwise, we program

    the memristor to HRS. This special mechanism is quite different from

    those of conventional memories and provides a unique opportunity

    to build up a new FPGA routing structure with the novel use of

    memristors.

    IV. mrFPGA STRUCTURE

    A. Basic Structure

    As discussed in the previous section, there is opportunity to build

    up an FPGA routing structure based on memristors only in addition

    to metal wires. Also memristor-based routing structure of FPGA

    will be placed over the transistor layer in the same die, just like

    the routing structure of ASICs. The basic schematic view of this

    efficient FPGA architecture with memristor-based reconfiguration

    (mrFPGA) is shown in Fig. 4. With the organization of logic blocks

    and programmable interconnects in Fig. 4, the overall mrFPGA

    architecture will become a highly compact array as shown in Fig. 5a.

    From the figure we see that the area of mrFPGA will be reduced

    Fig. 4: mrFPGA architecture: The programmable routing structure

    is built up with metal wires and memristor which are fabricated on

    top of logic blocks in the same die according to existing memristor

    fabrication structures.

    to the total area of the logic blocks only, which takes only 10%to 20% of the conventional FPGA area [4]. Fig. 5b is a detaileddesign of the connection blocks and switch blocks in mrFPGA using

    memristors and metal wires only. In this design we use five metal

    layers M5 to M9 without resource conflict. The circuits of logicblocks in mrFPGA will use the metal layers below them, i.e., M1to M4, which are sufficient for practical FPGAs, e.g. Virtex-6 [26].The memristor layer is designed to locate between the M9 and M8

    layers. We see that the positions of the layers are in the same order as

    the memristor fabrication structure reported in [24] so as to guarantee

    the feasibility of mrFPGA fabrication. Though memristors are located

    close to the top layer in this design, electrical signals do not always

    have to go through all metal layers to reach memristors, since switch

    blocks are in M5 ∼ M9. The signals need to reach M1 or M2 from

    the memristor layer only when they access the pins of logic blocks.

    Moreover, as stated in Section I, the size of a memristor cell can

    be close to or even smaller than the width of a metal wire [9, 10].

    So memristors can easily fit into the design in Fig. 5b without extra

    area overhead. The memristor array can be programmed efficiently

    using V/2 biasing scheme and two-step write operation as proposedin [15].

    B. Interconnects with Adaptive Buffer Insertion

    One potential problem of the basic mrFPGA routing structure is

    that there are no buffers in the structure. It leads to a quadratic

    increase of interconnect delay with the increase of the interconnect

    distance. This problem would undermine the benefit brought by the

    significant reduction of the overall area and interconnect delay in

    the case of large FPGAs. To address this problem, we propose an

    improved FPGA architecture which allows adaptive buffer insertion

    in the interconnects as shown in Fig. 6. A limited number of buffers

    are prefabricated in routing channels and can be connected to the

    tracks in channels via memristors. Whether to insert the prefabricated

    buffers or not and which track uses a buffer depend on the placement

    result of the circuit to implement on mrFPGA. For a routing path

    long enough to exhibit the quadratic increase of RC delay, we will

    let some buffers be connected to the path at proper positions. To get

    the optimal solution of insertion choice for each buffer in the routing

    network, we borrow an idea from the buffer placement algorithm in

    ASIC [27]. This algorithm can decide the best positions of buffers in

    the interconnects of an ASIC using dynamic programming to achieve

    the minimum interconnect delay. It constructs delay/capacitance pairs

    which correspond to different buffer options in a routing tree of an

    ASIC via a depth-first search and produces the optimal option in

    time complexity of O(B2) where B is the number of legal bufferpositions. For mrFPGA applications, we let the buffer at certain

    position be inserted to the routing track if the algorithm decides to

    2011 IEEE/ACM International Symposium on Nanoscale Architectures 3

  • (a) (b)

    Fig. 5: The proposed mrFPGA architecture. (a) Overview of mrFPGA where the FPGA area is contributed by logic blocks only, and (b) A

    detailed design of the connection blocks and switch blocks in mrFPGA using the memristors and metal wire resources in existing memristor

    fabrication structures.

    Fig. 6: An improved mrFPGA architecture with adaptive buffer

    insertion in the programmable interconnects.

    place a buffer at this position. Then whether or not to insert one

    buffer at one routing node in mrFPGA is optimized on demand. In

    contrast in conventional FPGA the connections between buffers and

    routing tracks are predetermined during fabrication. A case study in

    Fig. 7 shows the benefit brought by the adaptive buffer insertion in

    mrFPGA when compared to the fixed buffer pattern in conventional

    FPGA. To simplify the problem, we assume in this case that the RC

    delay of a wire with the length of one block is 0.5RwireCwire = 1ns,and that the buffers in interconnects are considered as ideal buffers

    with infinite drive, no input/output capacitance, and a fixed intrinsic

    delay of 9ns. The optimal length of a wire between two adjacent

    buffers would be

    k =

    rTbuffer

    0.5RwireCwire= 3

    In conventional FPGA, the pattern of pre-fabricated buffers in inter-

    connects will follow this result to distribute buffers evenly with a

    distance of three blocks (as shown in Fig. 7). The problem is that

    it does not know the starting point of each net (the offset can be

    0, 1 or 2). So it just staggers the patterns among different tracks

    Fig. 7: Comparison between fixed buffer patten in conventional FPGA

    and adaptive buffer insertion in mrFPGA.

    in the same channel as shown in Fig. 7. When the output pin of

    a logic block happens to be connected to a wire segment that is

    buffered right after the connection node, just like the block “start”

    in Fig. 7 connected to track3 in the upper channel, the routing result

    will deviate from the optimal. The problem can be even worse during

    the switch from one track to another. A timing-driven design, such

    as [28], has to always buffer the switches between tracks to avoid the

    potential large RC delay caused by unbuffered connection between

    the two longest unbuffered track segments, in this case as large as

    36ns if the two track1s in the channels are connected without a buffer

    for example. It drives the routing result farther from the optimal. The

    table in Fig. 7 lists all the possible delays from the logic block “start”

    to the logic block “end” according to the settings in a conventional

    FPGA discussed above. The first column is the track ID in the upper

    channel, and the first row is the track ID in the right channel. The

    buffers at the output pin of the “start” block and the input pin of

    the “end” block are not counted in the delay calculation. As shown

    4 2011 IEEE/ACM International Symposium on Nanoscale Architectures

  • in Fig. 7 the worst case can be 22% slower than the best case in aconventional FPGA. On the contrary, the adaptive buffer insertion in

    mrFPGA will not suffer from the deviation from the optimal buffer

    placement in interconnects as does the conventional FPGA. We see

    from Fig. 7 that in mrFPGA buffers are inserted just exactly one per

    three blocks, resulting in a total delay of only 45ns. It is even better

    than the best case in a conventional FPGA.

    V. CAD FLOW AND EVALUATION OF mrFPGA

    A. CAD Flow

    To implement a circuit design in FPGA, a complete CAD flow is

    needed. The input of the flow is the design to be implemented and the

    output of the flow includes the placement and routing (P&R) result

    used for programming the design into FPGA as well as an estimation

    of area, delay and power consumption for the implementation of the

    design in FPGA. We propose to use a timing-driven CAD flow for our

    mrFPGA as shown in Fig. 8. A circuit design first goes through the

    Fig. 8: CAD flow for mrFPGA.

    design tool ABC [29] to perform logic optimization and technology

    mapping. A mapped netlist consisting of K-LUTs will be obtained.Then the mapped netlist is fed into the design tool T-VPACK [30]

    to pack LUTs to logic blocks and then into the design tool mrVPR

    developed by us to accomplish P&R. At last we use the generated

    basic-cell (BC) netlist with delay and capacitance information to run

    the power estimation tool fpgaEVA LP2 [7]. Since we try to reuse

    most parts of the existing CAD flow for conventional FPGAs, the

    development of mrFPGA CAD flow is quite easy. Only the P&R

    tool is specifically developed for mrFPGA. We will introduce the

    features of this advanced tool in the last part of this section.

    B. Equivalent Circuit Model for Interconnect

    To perform the timing and power analysis for mrFPGA, we first

    develop an equivalent circuit model for the routing structure of

    mrFPGA. Fig. 9 shows the equivalent circuit of a representative

    routing path from the output pin of a logic block to the input pin

    of another logic block in mrFPGA and makes a comparison with

    that of a conventional FPGA. In the circuit model, the MUXs in

    connection blocks and switch blocks are replaced by the memristors

    at low resistance state (LRS). The wire segments are still modeled

    as distributed RC lines. Notice that the overall resistance value

    and capacitance value of wire segments have changed due to the

    reduction of the tile area in mrFPGA. Also, according to the mrFPGA

    architecture in Fig. 5a, the length of a wire segment is one tile shorter

    than its counterpart in a conventional FPGA. The easiest way to see

    Fig. 9: Extracted equivalent circuit model of the routing structure of

    mrFPGA and comparison with conventional FPGA.

    it is to think about the wire segment with the length of one tile.

    In mrFPGA the wire segment with the length of one tile is just the

    short segment between two logic blocks, of which the length is close

    to zero compared to the tile side. However the length of one tile

    does not disappear. It is counted in the switch block according to the

    mrFPGA architecture shown in Fig. 5b.

    C. Shielding Effect of Memristor

    During the development of the circuit models for the routing structure

    of mrFPGA, we find out that memristor has a shielding effect to

    help reduce the interconnect delay of mrFPGA further. The shielding

    effect of memristors at a high resistance state (HRS) can remove the

    downstream capacitance of unused paths from the total capacitance

    at one node and then reduce the RC delay to reach this node. Fig. 10

    shows an example of the effect in a switch block when compared

    to a conventional FPGA. In the conventional FPGA, in spite of the

    (a) Conventional FPGA. (b) mrFPGA.

    Fig. 10: Memristors provide capacitance shielding from unused paths.

    fact that the lower two paths are not used as shown in Fig. 10a, the

    downstream capacitances of these two paths are still added to the total

    capacitance at node A. In contrast, in mrFPGA only the downstream

    capacitance of a path in use will be added to the total capacitance at

    node A. The huge resistance value of memristors at HRS provides

    capacitance shielding from unused paths. We use HSPICE simulation

    to verify whether the resistance of memristor at HRS is high enough

    to be qualified as an ideal open switch for the shielding effect. In the

    mrFPGA circuit shown in Fig. 10b, we set R as 1×103Ω, and C andCload as 1×10

    −12F. For the actual memristor resistance value, we set

    Rlrs as 1×103Ω according to [24, 25] and sweep Rhrs from 1×10

    3Ω,the same value as Rlrs, to 1× 10

    6Ω. At this starting point of sweep,the three paths are equivalent and the three downstream capacitances

    2011 IEEE/ACM International Symposium on Nanoscale Architectures 5

  • will be all added to node A in the path from node “start” to node

    “end” just like a conventional FPGA. We want to see how large Rhrsneeds to be to make the path delay from node “start” to node “end”

    close enough to that in the case where Rhrs are replaced by idealopen switches. The result obtained by the HSPICE simulations is

    shown in Fig. 11. We see that with the increase of Rhrs, the delay

    103

    104

    105

    106

    3

    3.5

    4

    4.5

    5

    5.5

    6

    Rhrs

    (Ω)

    dela

    y (n

    s)

    shielded by open switch

    shielded by memristor

    Actual Memristor Region

    Fig. 11: The impact of the memristor shielding effect on the path

    delay.

    approaches to the value with Rhrs replaced by ideal open switches.The resistance value of an actual memristor at HRS is more than two

    orders of magnitude higher than that at LRS as reported in [24, 25].

    We mark the actual memristor region according to this criterion and

    see that the delay within this region is very close to the ideal situation.

    This proves that the shielding effect of memristor can help reduce

    path delay further in mrFPGA.

    D. mrVPR: an advanced P&R Tool for mrFPGA

    To deal with the novel routing structure of mrFPGA in the P&R step

    of CAD flow, we develop an advanced tool named mrVPR (VPR for

    memristor-based reconfiguration) on the base of the commonly-used

    FPGA P&R tool VPR [30]. The main contributions of this tool are

    (a) VPR for conventional FPGA (b) Our mrVPR for mrFPGA

    Fig. 12: Overview of routing resource in the two tools VPR and

    mrVPR.

    as follows:

    1) mrVPR can deal with the routing graph of mrFPGA as shown

    in Fig. 12b. In Fig. 12b the gray blocks are I/O blocks and

    logic blocks of FPGA and the diagonal wires are the routing

    paths available in switch blocks. We see that the switch blocks

    and connection blocks in mrVPR are placed over logic blocks

    and provide connections for logic blocks nearby in a different

    way from that of a conventional FPGA shown in Fig. 12a.

    2) mrVPR can reflect the memristor shielding effect. In VPR all

    the device capacitances with prefabricated connections to a

    node will be added to the capacitance of that node regardless

    of whether the paths which these devices belong to will be

    included in the routing result or not. In mrVPR, only when

    a routing path is used, the capacitances on that path will be

    added to the corresponding routing nodes.

    3) mrVPR is integrated with the algorithm to generate the optimal

    option for adaptive buffer insertion. We have implemented the

    algorithm in mrVPR and it shows where buffers are needed for

    insertion in the final routing result as shown in Fig. 13. We

    see that the positions for buffers to insert are marked with the

    black delta shape. We also see that the interconnect view of

    the routing result in Fig. 13b is quite similar to that of ASIC.

    We expect to see the good performance of mrFPGA in the next

    section.

    (a) Routing resource view. (b) Routing result view

    Fig. 13: View of the optimal option for buffer insertion in mrVPR.

    VI. EXPERIMENTAL RESULTS

    A. Settings

    We choose Xilinx Virtex-6 FPGA platform as our conventional FPGA

    model baseline in experiments and conduct evaluation of mrFPGA

    at the same technology node. The architecture logic specification

    of Virtex-6 needed by the CAD flow in Fig. 8 are collected from

    Virtex-6 FPGA data sheets (Configurable Logic Block part) [26].

    As for the architecture routing specification of Virtex-6, such as the

    channel width and the distribution of wire segment lengths (Single,

    Double, Quad, Long and Global), are extracted with the help of

    FPGA Editor in Xilinx ISE environment. The RC delay values of

    each type of wire segments are calculated from the unit resistance

    and capacitance values in the lookup tables of ITRS2010 [31]. The

    timing parameters of buffers inserted in mrFPGA routing structure,

    including driving resistance, input capacitance and intrinsic delay, are

    obtained via HSPICE simulation using PTM device models [32, 33].

    The memristor model is extracted from the measurement results of

    the memristors fabricated by the process in [24], the same process on

    which mrFPGA is based. The 20 largest MCNC benchmark circuits

    are used as the input of the CAD flow for conventional FPGA and

    mrFPGA respectively and comparisons are made on the output of the

    CAD flow from the aspects of area, delay and power.

    B. Evaluation Results

    Fig. 14 shows the comparison of the tile area in three FPGA

    architectures: the baseline Virtex-6 FPGA, its mrFPGA counterpart

    6 2011 IEEE/ACM International Symposium on Nanoscale Architectures

  • without buffer insertion (a fictitious case to show the impact of

    buffer insertion) and mrFPGA with buffer insertion respectively. As

    (a) Virtex-6 baseline. Total area = 13993μm2.

    (b) mrFPGA unbuffered. (c) mrFPGA. Total area = 2547μm2.

    Fig. 14: Area comparison of a single tile in three architectures (unit:

    μm2). (LB: logic block; CB: connection block; SB: switch block;Buf: pre-fabricated buffers to insert on demand in mrFPGA).

    discussed in the previous sections, in the case of mrFPGA unbuffered,

    the area contribution of the interconnects is reduced to zero since the

    interconnects are purely made up of memristors and metal layers

    located on top of logic blocks. For mrFPGA with buffer insertion,

    we have to count the area of pre-fabricated buffers in the channels

    according to the architecture in Fig. 6. The area contributed by buffer

    insertion in mrFPGA is pessimistically calculated as the case of a

    uniform distribution of pre-fabricated buffers over channels, with the

    number of buffers per channel set as the maximum number of buffers

    required by mrVPR to insert in one channel of mrFPGA over the 20

    benchmarks. We see in Fig. 14 that the area overhead brought by

    pre-fabricated buffers is low compared to the interconnect area of

    conventional FPGA. That is mainly due to the on-demand property

    of buffer solution in mrFPGA. Results in Fig. 14 shows more than

    5.5x saving of total area for mrFPGA. Table I shows the performance

    comparison between Virtex-6 baseline and mrFPGA. It shows a small

    performance degradation before buffer insertion in mrFPGA and a

    2.28x speedup after buffer insertion. The total speedup primarily

    stems from the reduction of interconnect delay. In the routing

    structure of mrFPGA, the reduction of tile area by ∼5.5x results in

    a shortening of wire segments in the programmable interconnects by

    ∼2.35 times. Then both the resistance and capacitance values of wire

    segments decrease by ∼2.35 times, contributing to the total reduction

    of RC delays of the wire segments by ∼5.5 times. The shielding

    effect of memristor also helps to improve performance. The quadratic

    increase of the RC delay of a pure RC network cancels out the benefit

    in mrFPGA without buffer insertion. We see in Table I that for some

    benchmarks with long interconnect paths between logic blocks, the

    pure RC delay without buffers is kind of large. However in mrFPGA

    with buffer insertion the interconnects perform much better than

    Virtex-6 baseline and mrFPGA unbuffered. The good performance

    also stems from the adaptive buffer insertion in mrFPGA. Table II

    shows the power consumption by Virtex-6 basline and mrFPGA

    respectively. An average of 40% power savings is achieved. Thepower savings primarily come from the replacement of transistors in

    the programmable interconnects, such as MUXs and SRAMs, with

    the simple structure of memristors. Both dynamic power and static

    power are reduced due to less capacitance on the routing paths and

    the non-volatility of memristor. Notice that no die-stacking is needed

    to achieve this degree of area reduction and speedup. If 3D integration

    technology is introduced in the future to stack several mrFPGAs

    together, at least two more times of improvements in density and

    speedup, i.e. 10x and 4.5x respectively, are both expected, according

    to the experimental results on 3D architecture in [17].

    VII. CONCLUSIONS

    This work presents a novel FPGA architecture with memristor-based

    reconfiguration named mrFPGA. The programmable interconnects of

    mrFPGA use memristors and metal wires only. The routing structure

    is then able to be placed over logic blocks in the same die according to

    the existing memristor fabrication structure. An improved architecture

    with adaptive buffer insertion is proposed to further reduce the

    interconnect delay. A complete CAD flow is provided for mrFPGA

    with an advanced P&R tool named mrVPR developed for mrFPGA.

    The tool can deal with the novel routing structure of mrFPGA,

    the memristor shielding effect, and the algorithm of optimal buffer

    insertion. An evaluation of mrFPGA is done on the 20 largest MCNC

    benchmark circuits. Results show that mrFPGA achieves a 5.18x area

    savings, a 2.28x speedup and a 1.63x power savings.

    REFERENCES

    [1] P. Schaumont and I. Verbauwhede, “Domain-specific codesign for em-bedded security,” Computer, vol. 36, no. 4, pp. 68–74, Apr. 2003.

    [2] I. Kuon and J. Rose, “Measuring the Gap Between FPGAs and ASICs,”IEEE Transactions on Computer-Aided Design of Integrated Circuits

    and Systems, vol. 26, no. 2, pp. 203–215, Feb. 2007.

    [3] J. Cong, V. Sarkar, G. Reinman, and A. Bui, “Customizable Domain-Specific Computing,” IEEE Design and Test of Computers, vol. 28, no. 2,pp. 6–15, Mar. 2011.

    [4] V. George, “Low Energy Field-Programmable Gate Array,” Ph.D. dis-sertation, UC Berkeley, 2000.

    [5] A. DeHon, “Reconfigurable Architectures for General-Purpose Comput-ing,” Ph.D. dissertation, MIT, Dec. 1996.

    [6] E. Ahmed and J. Rose, “The Effect of LUT and ClusterSize on Deep-Submicron FPGA Performance and Density,” IEEE Transactions onVLSI Systems, vol. 12, no. 3, pp. 288–298, Mar. 2004.

    [7] F. Li et al., “Power modeling and characteristics of field programmablegate arrays,” IEEE Transactions on Computer-Aided Design of IntegratedCircuits and Systems, vol. 24, no. 11, pp. 1712–1724, Nov. 2005.

    [8] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, “Themissing memristor found.” Nature, vol. 453, no. 7191, pp. 80–3, May2008.

    [9] Y. S. Chen et al., “Highly scalable hafnium oxide memory withimprovements of resistive distribution and read disturb immunity,” inIEDM Technical Digest, Dec. 2009, pp. 1–4.

    [10] C.-h. Wang et al., “Three-Dimensional 4F2 ReRAM Cell with CMOSLogic Compatible Process,” in IEDM Technical Digest, 2010, pp. 664–667.

    [11] S.-s. Sheu et al., “A 5ns Fast Write Multi-Level Non-Volatile 1 Kbits RRAM Memory with Advance Write Scheme,” in VLSI Circuits,Symposium on, 2009, pp. 82–83.

    [12] M. Mahvash and A. Parker, “A memristor SPICE model for designingmemristor circuits,” in MWSCAS, no. 2, 2010, pp. 989–992.

    [13] S. Yu, J. Liang, Y. Wu, and H.-S. P. Wong, “Read/write schemes analysisfor novel complementary resistive switches in passive crossbar memoryarrays.” Nanotechnology, vol. 21, no. 46, p. 465202, Oct. 2010.

    [14] M. Liu and W. Wang, “rFPGA: CMOS-nano hybrid FPGA using RRAMcomponents,” in NanoArch, Jun. 2008, pp. 93–98.

    [15] C. Xu, X. Dong, N. P. Jouppi, and Y. Xie, “Design Implications ofMemristor-Based RRAM Cross-Point Structures,” in DATE, 2011.

    [16] G. Lemieux and D. Lewis, “Circuit design of routing switches,” Inter-national Symposium on FPGAs, p. 19, 2002.

    [17] M. Lin, A. El Gamal, Y.-C. Lu, and S. Wong, “Performance Benefits ofMonolithically Stacked 3-D FPGA,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, pp.681–229, Feb. 2007.

    [18] Y. Chen, J. Zhao, and Y. Xie, “3D-nonFAR: Three-Dimensional Non-Volatile FPGA ARchitecture Using Phase Change Memory,” in ISLPED,2010, p. 55.

    [19] C. Dong, D. Chen, S. Haruehanroengra, and W. Wang, “3-D nFPGA:A Reconfigurable Architecture for 3-D CMOS/Nanomaterial HybridDigital Circuits,” IEEE Transactions on Circuits and Systems I: RegularPapers, vol. 54, no. 11, pp. 2489–2501, Nov. 2007.

    [20] C. Chen et al., “Efficient FPGAs using nanoelectromechanical relays,”International Symposium on FPGAs, p. 273, 2010.

    [21] P.-e. Gaillardon et al., “Emerging memory technologies for reconfig-urable routing in FPGA architecture,” in ICECS. IEEE, Dec. 2010, pp.62–65.

    2011 IEEE/ACM International Symposium on Nanoscale Architectures 7

  • TABLE I: Critical Path Delay and Comparison (unit: ns)Virtex-6 Baseline mrFPGA unbuffered mrFPGA

    Interconnect Delay Total Delay Interconnect Delay Total Delay Interconnect Delay Total Delay

    alu4 6.62 9.54 5.53 (1.19x) 8.76 (1.08x) 1.52 (4.35x) 4.75 (2.00x)

    apex2 7.84 10.4 6.96 (1.12x) 10.2 (1.01x) 1.52 (5.15x) 5.12 (2.04x)

    apex4 6.79 9.71 7.76 (0.87x) 10.3 (0.94x) 1.35 (4.99x) 4.58 (2.11x)

    bigkey 10.9 13.3 37.7 (0.29x) 40.1 (0.33x) 3.98 (2.75x) 4.97 (2.68x)

    clma 12.0 14.6 22.7 (0.53x) 24.3 (0.60x) 2.80 (4.31x) 5.27 (2.78x)

    des 15.1 17.9 40.8 (0.37x) 43.6 (0.41x) 5.59 (2.71x) 8.02 (2.23x)

    diffeq 4.11 5.37 2.72 (1.50x) 5.09 (1.05x) 0.34 (11.7x) 3.27 (1.63x)

    dsip 9.61 10.9 27.2 (0.35x) 28.5 (0.38x) 2.52 (3.80x) 4.89 (2.22x)

    elliptic 5.93 8.36 10.9 (0.54x) 13.0 (0.64x) 1.72 (3.43x) 4.09 (2.04x)

    ex1010 14.2 16.8 22.1 (0.64x) 25.0 (0.67x) 3.63 (3.90x) 6.92 (2.42x)

    ex5p 7.23 10.1 5.42 (1.33x) 8.34 (1.21x) 0.76 (9.43x) 4.30 (2.35x)

    frisc 8.97 9.92 11.9 (0.75x) 13.2 (0.74x) 0.71 (12.5x) 4.87 (2.03x)

    misex3 6.20 9.06 5.73 (1.08x) 8.65 (1.04x) 1.04 (5.91x) 4.58 (1.97x)

    pdc 11.1 14.4 18.6 (0.59x) 22.3 (0.64x) 2.25 (4.93x) 6.22 (2.32x)

    s298 8.39 11.1 8.29 (1.01x) 10.1 (1.09x) 0.85 (9.86x) 4.27 (2.59x)

    s38417 8.39 11.1 8.29 (1.01x) 10.1 (1.09x) 0.85 (9.86x) 4.27 (2.59x)

    s38584.1 8.39 11.1 8.29 (1.01x) 10.1 (1.09x) 0.85 (9.86x) 4.27 (2.59x)

    seq 7.45 10.6 7.84 (0.95x) 10.3 (1.03x) 1.40 (5.30x) 4.63 (2.30x)

    spla 10.8 13.8 11.9 (0.91x) 15.1 (0.91x) 1.88 (5.77x) 5.48 (2.52x)

    tseng 4.90 7.27 7.62 (0.64x) 9.99 (0.72x) 0.94 (5.16x) 3.31 (2.19x)

    average - - - (0.83x) - (0.83x) - (6.29x) - (2.28x)

    TABLE II: Power Consumption and Comparison (unit: mW)Virtex-6 Baseline mrFPGA unbuffered mrFPGA

    Interconnect Power Total Power Interconnect Power Total Power Interconnect Power Total Power

    alu4 34.4 52.0 6.57 (5.24x) 24.1 (2.15x) 10.8 (3.18x) 28.4 (1.83x)

    apex2 40.1 58.5 7.36 (5.44x) 25.8 (2.26x) 11.4 (3.50x) 29.9 (1.95x)

    apex4 26.5 42.8 4.37 (6.07x) 20.6 (2.07x) 6.77 (3.92x) 23.0 (1.85x)

    bigkey 49.6 375. 6.34 (7.83x) 332. (1.13x) 12.9 (3.82x) 338. (1.10x)

    clma 75.0 139. 9.24 (8.12x) 73.2 (1.89x) 16.1 (4.65x) 80.1 (1.73x)

    des 112. 558. 19.3 (5.79x) 465. (1.19x) 31.5 (3.56x) 477. (1.16x)

    diffeq 15.7 32.5 2.16 (7.29x) 18.9 (1.71x) 3.81 (4.13x) 20.6 (1.57x)

    dsip 79.9 409. 11.4 (6.97x) 341. (1.20x) 20.9 (3.82x) 350. (1.16x)

    elliptic 52.7 157. 7.28 (7.23x) 111. (1.40x) 12.9 (4.07x) 117. (1.33x)

    ex1010 61.3 109. 7.92 (7.74x) 56.3 (1.94x) 13.6 (4.48x) 62.1 (1.76x)

    ex5p 20.6 34.7 3.40 (6.06x) 17.4 (1.98x) 5.47 (3.77x) 19.5 (1.77x)

    frisc 28.1 60.5 2.87 (9.76x) 35.2 (1.71x) 5.25 (5.34x) 37.6 (1.60x)

    misex3 33.4 50.4 6.31 (5.29x) 23.2 (2.16x) 10.0 (3.33x) 26.9 (1.86x)

    pdc 57.3 97.1 8.48 (6.75x) 48.2 (2.01x) 13.8 (4.14x) 53.6 (1.81x)

    s298 16.0 28.6 2.30 (6.95x) 14.9 (1.92x) 3.81 (4.20x) 16.4 (1.74x)

    s38417 16.0 28.6 2.30 (6.95x) 14.9 (1.92x) 3.81 (4.20x) 16.4 (1.74x)

    s38584.1 16.0 28.6 2.30 (6.95x) 14.9 (1.92x) 3.81 (4.20x) 16.4 (1.74x)

    seq 37.1 55.6 6.75 (5.49x) 25.2 (2.20x) 10.9 (3.39x) 29.4 (1.88x)

    spla 51.1 87.1 8.28 (6.17x) 44.2 (1.96x) 13.0 (3.90x) 49.0 (1.77x)

    tseng 20.4 70.8 2.50 (8.19x) 52.8 (1.34x) 4.93 (4.14x) 55.3 (1.28x)

    average - - - (6.81x) - (1.80x) - (3.99x) - (1.63x)

    [22] D. B. Strukov and K. K. Likharev, “CMOL FPGA: a reconfigurablearchitecture for hybrid digital circuits with two-terminal nanodevices,”Nanotechnology, vol. 16, no. 6, pp. 888–900, Jun. 2005.

    [23] P.-E. Gaillardon, F. Clermidy, I. O’Connor, and J. Liu, “Interconnectionscheme and associated mapping method of reconfigurable cell matricesbased on nanoscale devices,” in NanoArch, Jul. 2009, pp. 69–74.

    [24] K. Tsunoda et al., “Low Power and High Speed Switching of Ti-dopedNiO ReRAM under the Unipolar Voltage Source of less than 3 V,” inIEDM Technical Digest, Dec. 2007, pp. 767–770.

    [25] R. Huang et al., “Resistive switching of silicon-rich-oxide featuring highcompatibility with CMOS technology for 3D stackable and embeddedapplications,” Applied Physics A, pp. 927–931, Mar. 2011.

    [26] Xilinx, “Virtex-6 FPGA data sheets.” [Online]. Available:http://www.xilinx.com/support/documentation/virtex-6.htm

    [27] L. van Ginneken, “Buffer placement in distributed RC-tree networks forminimal Elmore delay,” in ISCAS, 1990, pp. 865–868.

    [28] I. Kuon and J. Rose, “Area and delay trade-offs in the circuit andarchitecture design of FPGAs,” in International Symposium on FPGAs,

    2008, p. 149.

    [29] “Berkeley Logic Synthesis and Verification Group, ABC: A Systemfor Sequential Synthesis and Verification, Release 61225.” [Online].Available: http://www.eecs.berkeley.edu/∼alanmi/abc/

    [30] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs. Norwell: MA:Kluwer, 1999.

    [31] “International Technology Roadmap for Semiconduc-tors,” Tech. Rep., 2010. [Online]. Available:http://www.itrs.net/Links/2010ITRS/Home2010.htm

    [32] W. Zhao and Y. Cao, “Predictive Technology Model (PTM) website.”[Online]. Available: http://ptm.asu.edu/

    [33] W. Zhao and Y. Cao, “New Generation of Predictive Technology Modelfor Sub-45nm Design Exploration,” in ISQED, 2006, pp. 585–590.

    8 2011 IEEE/ACM International Symposium on Nanoscale Architectures

    /ColorImageDict > /JPEG2000ColorACSImageDict > /JPEG2000ColorImageDict > /AntiAliasGrayImages false /CropGrayImages true /GrayImageMinResolution 150 /GrayImageMinResolutionPolicy /OK /DownsampleGrayImages true /GrayImageDownsampleType /Bicubic /GrayImageResolution 300 /GrayImageDepth -1 /GrayImageMinDownsampleDepth 2 /GrayImageDownsampleThreshold 1.50000 /EncodeGrayImages true /GrayImageFilter /DCTEncode /AutoFilterGrayImages false /GrayImageAutoFilterStrategy /JPEG /GrayACSImageDict > /GrayImageDict > /JPEG2000GrayACSImageDict > /JPEG2000GrayImageDict > /AntiAliasMonoImages false /CropMonoImages true /MonoImageMinResolution 1200 /MonoImageMinResolutionPolicy /OK /DownsampleMonoImages true /MonoImageDownsampleType /Bicubic /MonoImageResolution 600 /MonoImageDepth -1 /MonoImageDownsampleThreshold 1.50000 /EncodeMonoImages true /MonoImageFilter /CCITTFaxEncode /MonoImageDict > /AllowPSXObjects false /CheckCompliance [ /None ] /PDFX1aCheck false /PDFX3Check false /PDFXCompliantPDFOnly false /PDFXNoTrimBoxError true /PDFXTrimBoxToMediaBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXSetBleedBoxToMediaBox true /PDFXBleedBoxToTrimBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXOutputIntentProfile (None) /PDFXOutputConditionIdentifier () /PDFXOutputCondition () /PDFXRegistryName () /PDFXTrapped /False

    /Description >>> setdistillerparams> setpagedevice