www.xilinx.com/xcell/ SOLUTIONS FOR A PROGRAMMABLE WORLD Xcell journal Xcell journal ISSUE 80, THIRD QUARTER 2012 FPGA-Based Instrumentation Withstands the Chill of Deep Space Partial Dynamic Reconfiguration Fuels FSK Demodulator Design Demystifying FPGA Mathematics Ins and Outs of ADCs and DACs Xilinx Rolls World’s First Heterogeneous 3D FPGA Virtex-7 H580T Device Enables 2x100G Transponder-on-Chip for CFP2 Optical Nets Artix-7 FPGA Brings High-End Value to Low-Cost Market page 14

Xcell Journal issue 80


DESCRIPTION

The summer 2012 edition of Xcell Journal magazine features in-depth looks at Xilinx’s Virtex®-7 H580T, the world’s first heterogeneous 3D FPGA, and the Artix™-7 A100T, the first device shipping from Xilinx’s feature-rich, low-power, low-cost 28nm generation of All Programmable devices. The issue also includes several informative how-to and design methodology articles from the Xilinx user community.

Text of Xcell Journal issue 80

  • www.xilinx.com/xcell/

    SOLUTIONS FOR A PROGRAMMABLE WORLD

    Xcell Journal

    ISSUE 80, THIRD QUARTER 2012

    FPGA-Based Instrumentation Withstands the Chill of Deep Space

    Partial Dynamic Reconfiguration Fuels FSK Demodulator Design

    Demystifying FPGA Mathematics

    Ins and Outs of ADCs and DACs

    Xilinx Rolls World's First Heterogeneous 3D FPGA: Virtex-7 H580T Device Enables 2x100G Transponder-on-Chip for CFP2 Optical Nets

    Artix-7 FPGA Brings High-End Value to Low-Cost Market

  • New stacked silicon architecture from Xilinx makes your big design much easier to prototype. Partitioning woes are forgotten, and designs run at near final chip speed. The DINI Group DNV7F1 board puts this new technology in your hands with a board that gets you to market easier, faster and more confident of your design's functionality running at high speed. DINI Group engineers put the features you need most right on the board:

    10GbE

    USB 2

    PCIe, Gen 1, 2, and 3

    240 pin UDIMM for DDR3

    There is a Marvell processor for any custom interfaces you might need and plenty of power and cooling for high speed logic emulation. Software and firmware developers will appreciate the productivity gains that come with this low cost, stand-alone development platform.

    Prototyping just got a lot easier. Call DINI today and get your chip up to speed.

    www.dinigroup.com 7469 Draper Avenue La Jolla, CA 92037 (858) 454-3419 e-mail: [email protected]

  • LETTER FROM THE PUBLISHER

    Xilinx, Inc.
    2100 Logic Drive
    San Jose, CA 95124-3400
    Phone: 408-559-7778
    FAX: 408-879-4780
    www.xilinx.com/xcell/

    © 2012 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.

    The articles, information, and other materials included in this issue are provided solely for the convenience of our readers. Xilinx makes no warranties, express, implied, statutory, or otherwise, and accepts no liability with respect to any such articles, information, or other materials or their use, and any use thereof is solely at the risk of the user. Any person or entity using such information in any way releases and waives any claim it might have against Xilinx for any loss, damage, or expense caused thereby.

    PUBLISHER Mike [email protected]

    EDITOR Jacqueline Damian

    ART DIRECTOR Scott Blair

    DESIGN/PRODUCTION Teie, Gelwicks & Associates 1-800-493-5551

    ADVERTISING SALES Dan [email protected]

    INTERNATIONAL Melissa Zhang, Asia [email protected]

    Christelle Moraga, Europe/Middle East/[email protected]

    Miyuki Takegoshi, [email protected]

    REPRINT ORDERS 1-800-493-5551


    Hardware-Assisted Prototyping: Make vs. Buy?

    As long as there are ASICs, ASSPs, microprocessors and other types of digital ICs following the silicon process technology curve, there will be a need to prototype those devices in real hardware before they go to production. But is it easier to make or buy a prototyping system? That's the question panelists attacked in a recent Pavilion Panel session at the 49th Design Automation Conference in San Francisco.

    Gabe Moretti, EDA veteran and owner of the popular website GabeonEDA.com, moderated the panel, entitled "Hardware-Assisted Prototyping and Verification: Make vs. Buy?" Gabe's panelists were Qualcomm engineering director Albert Camilleri; Austin Lesea of Xilinx Research Labs, co-author of the book The FPGA Prototyping Methodology Manual; and Mike Dini, CEO of the hardware-assisted verification company the Dini Group.

    Panelists agreed that with SoCs now reaching well over 100 million gates, it's imperative to emulate the chip you are designing before it goes to manufacturing so as to reduce corner-case bugs and minimize respins. Likewise, as software becomes a greater part of the overall system development, hardware-assisted prototyping systems, such as those offered by the Dini Group and Synopsys' HAPS group, are becoming an imperative.

    "I look at the size and complexity of IC designs people are doing today, and I'm not sure I'd want to build a prototype for something this large," said Moretti. "The commercial systems are pretty expensive, but I'm not sure if it makes more sense today to buy one rather than build one. So, should I build one or buy one?" Moretti asked panelists.

    "I think a lot of people start out thinking there are some good reasons to build their own, but quickly realize it probably would have been easier to buy one," said Lesea. "In the modern SoC world, about 80 percent of the project is software. The sooner you can get something for the software team to work on, the sooner your project is not stuck. The time you save in time-to-market pretty much pays for the commercial prototyping system and more."

    "I've long argued that unless software guys have real hardware working at some reasonable frequency, they can't really do effective development," added Dini. "Unless they can get a real blue screen of death, if they can't run an interrupt into the weeds and smoke a piece of hardware, generally development slows or stalls; pure software-only simulation can't cover enough corner cases."

    Camilleri said he has both built and purchased prototyping systems depending on what a given design required and what commercial offerings were available. "If there is already a commercial offering that will do what you need, then why build one?" Camilleri noted, however, that sometimes a design's performance requirements will force design groups to build a custom system. But he said that in general it's only a good idea if you are building it for a specific project in which everyone on the design team is intimately aware of all the nuances. "If you are building a prototyping system for your division, for example, that can be a huge undertaking," he said. "It's easy to underestimate the time it takes and the quality of EDA software it takes to develop a commercial-class prototyping system."

    With a few exceptions in the emulation world, the vast majority of emulation and prototyping systems are FPGA based.

    Panelists were especially encouraged by FPGA vendors' embrace of 3D IC technology to exceed the doubling of capacity for the next generation of FPGAs that one typically expects with each new silicon process technology. "The Virtex-7 2000T is a game changer," said Dini. Panelists said that the fewer FPGAs there are in an emulation system, the easier it is to partition a design and stabilize it in the prototyping system. An FPGA with massive capacity allows ever-larger designs, with hundreds of millions of ASIC gates, to get to market much sooner.

    Mike Santarini
    Publisher

  • www.aldec.com

    THE DESIGN VERIFICATION COMPANY

    Your Global Verification Solutions Provider

    VHDL, Verilog, SystemVerilog
    OVM/UVM, VMM
    Code and Functional Coverage
    DSP Co-Simulation (MATLAB & Simulink)
    Emulation (Hardware-Assisted Verification)

    Headquarters-US: 2260 Corporate Circle, Henderson, NV 89074, USA
    Phone: +1.702.990.4400 E-mail: [email protected]

    Europe: Mercia House, 51 The Green, South Bar, Banbury, OX16 9AB, United Kingdom
    Phone: +44.1295.20.1240 Email: [email protected]

    Japan: Shinjyuku Estate Bldg. 9F, 1-34-15, Shinjyuku, Shinjyuku-ku, Tokyo 160-0022, Japan
    Phone: +81.3.5312.1791 Email: [email protected]

    China: Suite 2004, BaoAn Building, #800 DongFang Road, PuDong District, Shanghai City, 200122, P.R. China
    Phone: +86.21.6875.2030 Email: [email protected]

    Israel: 1 Haofe Street, Kadima 60920, Israel
    Phone: +972.072.2288316 E-mail: [email protected]

    Taiwan: No. 37, Section 2, Liujia 5th Road, Zhubei City, Hsinchu County 302, Taiwan
    Phone: +886.3.6587712 E-mail: [email protected]

    India: #2145, 17th Main, 2nd Cross, HAL 2nd Stage, Indiranagar, Bangalore, 560008, India
    Phone: +91.80.3255.1030 Email: [email protected]

    2012 Aldec and Xilinx are trademarks of Aldec, Inc. and Xilinx, Inc. respectively. All other trademarks or registered trademarks are property of their respective owners.

  • CONTENTS

    VIEWPOINTS

    Letter From the Publisher
    Hardware-Assisted Prototyping: Make vs. Buy?

    Xpert Opinion
    FPGAs Head for the Cloud

    Cover Story
    Xilinx Introduces First Heterogeneous 3D FPGA: Virtex-7 H580T

    Product Feature
    Artix-7 FPGA Brings High-End Value to Low-Cost Market

    XCELLENCE BY DESIGN APPLICATION FEATURES

    Xcellence in Aerospace & Defense
    FPGA-Based Instrumentation Withstands the Chill of Deep Space

    Xcellence in Green Technology
    Using Spartan Technology to Support Green Energy Development

    Xcellence in Solid-State Storage
    Designing a 19-nm Flash PCIe SSD with Kintex-7 FPGAs

  • THIRD QUARTER 2012, ISSUE 80

    THE XILINX XPERIENCE FEATURES

    Xperts Corner
    How Partial Dynamic Reconfiguration Helped Build an FSK Demodulator

    Xplanation: FPGA 101
    The Basics of FPGA Mathematics

    Xplanation: FPGA 101
    The FPGA Engineer's Guide to Using ADCs and DACs

    XTRA READING

    Profiles of Xcellence: The sky's the limit for SSD enterprise storage startup Skyera

    Tools of Xcellence: Xilinx FPGAs improve beamforming system design

    Xtra, Xtra: Your questions answered on the new Vivado Design Suite

    Xclamations! Share your wit and wisdom by supplying a caption for our techy cartoon. Three chances to win an Avnet Spartan-6 LX9 MicroBoard!

    Excellence in Magazine & Journal Writing: 2010, 2011

    Excellence in Magazine & Journal Design and Layout: 2010, 2011, 2012

  • COVER STORY

    Xilinx Introduces First Heterogeneous 3D FPGA: Virtex-7 H580T

  • by Mike Santarini
    Publisher, Xcell Journal
    Xilinx, [email protected]

    Built with Xilinx 3D SSI technology, this device enables designers to create a 2x100G OTN transponder-on-a-chip.

    Hot on the heels of breaking capacity and transistor-count records with the release of its 28-nanometer Virtex-7 2000T (the industry's first 28-nm FPGA implemented in 3D stacked-silicon interconnect technology), Xilinx in May released a device using SSI technology that breaks the record for FPGA bandwidth. The new Virtex-7 H580T device is the world's first heterogeneous 3D FPGA, integrating a dedicated eight-channel 28-Gbps transceiver slice (or die) alongside two transceiver-rich FPGA dice on a single silicon interposer. All told, the new product gives wired communications companies a device with up to forty-eight 13.1-Gbps transceivers as well as the eight 28-Gbps transceivers and 580,480 logic cells, making the Virtex-7 H580T FPGA the only single-chip solution for addressing key 2x100G applications and functions (Figure 1). Product details are spelled out at http://www.xilinx.com/publications/prod_mktg/Virtex7-Product-Table.pdf.

    "Combined with Xilinx's 100-Gbps gearbox, Ethernet MAC, OTN and Interlaken IP, the Virtex-7 HT devices provide customers with the kind of system integration they need to meet space, power and cost challenges as they transition to 100-Gbps low-power optical modules in the new CFP2 form factor," said Ephrem Wu, senior director of advanced communications at Xilinx. "The 28-Gbps transceivers are independent from the 13.1-Gbps transceivers. Customers can use all available 28-Gbps transceivers without having to give up any 13.1-Gbps transceivers."

    The Virtex-7 H580T FPGA is the first of three heterogeneous 3D devices Xilinx will field in its 28-nm family. The Virtex-7 H870T, due in the near future, comprises two eight-channel transceiver dice alongside three FPGA logic dice on a single device, yielding a total of sixteen 28-Gbps transceivers, seventy-two 13.1-Gbps transceivers and 876,160 logic cells on one chip. A third heterogeneous device, the Virtex-7 H290T, places one eight-channel transceiver die alongside one FPGA logic slice on a single device, yielding twenty-four 13.1-Gbps transceivers, eight 28-Gbps transceivers and 284,000 logic cells on one chip.

    "Our 3D SSI technology enables Xilinx to jump ahead of the technology curve and offer All Programmable devices that enable the highest levels of integration, system performance, power reduction, BOM cost reduction and productivity," said Wu. "With the Virtex-7 2000T, we used 3D SSI technology to stack four logic slices side-by-side on a silicon interposer to create a device with 6.8 billion transistors and 1,954,560 logic cells, double the capacity of the largest competing 28-nm FPGA and well beyond the expected doubling of transistor counts dictated by Moore's Law. Now, with the Virtex-7 HT devices, we have leveraged our 3D SSI technology to stack 28-Gbps transceiver slices alongside 28-nm FPGA slices on a silicon interposer, all in a single chip."

    The SSI technology, Wu said, enables Xilinx to field a device today that will allow customers to offer distinctly compelling value to their customers of 100-Gbps optics-enabled equipment and allow the wired communications industry to accelerate the development of next-generation 400G equipment.

    THE INSATIABLE BANDWIDTH REQUIREMENT

    Alex Goldhammer, senior product line manager at Xilinx for Virtex-7 FPGAs, notes that as more and more systems connect to the Internet and private networks, the demand spirals for more bandwidth to transfer ever-larger files and stream higher-quality video and audio across the globe. To accommodate this demand, service providers want higher-bandwidth wired communications equipment at a lower cost per bit. The wired communications sector in particular is currently building equipment to conform to recently formalized 100-Gbps communications optical transceiver standards, most notably CFP2 optics, OIF CEI-28-VSR and IEEE 802.3ba.

    At the heart of the 100-Gbps infrastructure buildout are optical transport network (OTN) transponders and muxponders, and 100G Ethernet cards. Network companies place these OTN cards at the center or core of the optical network (the fastest sections) to ensure the integrity and proper routing of data as it flies across the globe on fiber-optic cables.

    Goldhammer said companies already have equipment with first-generation 100-Gbps OTN transponder cards today, each typically composed of a series of one or two ASSPs and one FPGA. These first-generation 100-Gbps OTN cards transmit and receive input from fiber optics via a CFP optical module. (The acronym stands for C Form-factor Pluggable.) An ASSP then takes 10x11.1G OTL 4.10 or CAUI (100-Gbps attachment unit interface) from the CFP and performs 100-Gbps forward error correction (GFEC), OTU-4 framing and 100GE mapping before sending the data to the FPGA via a CAUI. The FPGA is commonly used to translate the protocol to the required form for the backplane to route the data to the next point in the network and, ultimately, its destination.

    "The CFP optical modules, relatively bulky and relatively expensive, are the main stumbling blocks in these first-generation 100-Gbps OTN transport cards," said Goldhammer. To address this problem, the industry recently created the CFP2 form factor, defining an optical module for 100-Gbps line cards that is half the width (pitch) of CFP with slightly less depth, in the same power envelope. The advent of CFP2 means equipment companies can swap out existing CFP-based line cards for new cards that have two CFP2 channels per unit area, thus doubling the bandwidth of each card slot and potentially doubling the bandwidth of data centers (see Figure 2).

    Figure 1: Xilinx 3D SSI technology forms the foundation for the Virtex-7 H580T, the world's first heterogeneous FPGA. It marries 28-nm FPGA logic slices and a dedicated 28-Gbps transceiver die on a silicon interposer.

    Figure 2: The CFP2 form factor makes it possible to double the bandwidth of 100-Gbps OTN cards. CFP2 cards are half the width of a CFP and use half the power, enabling drastically lower system costs. [Diagram labels: 100G CFP optics, CAUI, 10x10G, 4 CFPs, 400 Gbps, 60 Watts; CFP2 optics, CAUI4, 4x25G, 8 CFP2s, 800 Gbps, 60 Watts.]

    But Goldhammer said that CFP2 comes with new technical challenges. CFP2 requires 25- to 28-Gbps transceivers, PCB channel modeling using IBIS-AMI models and high-speed serial modeling software tools. And each card must maintain the same power budget as the CFP card it is replacing. "Although the move from CFP to CFP2 delivers twice the bandwidth per watt, simply doubling the amount of chips on each card to handle the bandwidth is not going to be viable, especially when it comes to power budgets," he said. "CFP2 requires more sophisticated silicon devices with greater degrees of integration."
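    The "twice the bandwidth per watt" claim can be checked against the labels in Figure 2 (4 CFPs, 400 Gbps, 60 Watts versus 8 CFP2s, 800 Gbps, 60 Watts). The following is only a minimal sketch of that arithmetic; the `CardSlot` class and its field names are hypothetical and introduced here purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardSlot:
    """Hypothetical container for the numbers printed in Figure 2."""
    name: str
    modules: int          # optical modules shown per slot in Figure 2
    gbps_per_module: int  # each CFP or CFP2 module carries one 100-Gbps channel
    power_watts: int      # power figure printed in Figure 2

    @property
    def total_gbps(self) -> int:
        return self.modules * self.gbps_per_module

    @property
    def gbps_per_watt(self) -> float:
        return self.total_gbps / self.power_watts

    @property
    def watts_per_module(self) -> float:
        return self.power_watts / self.modules


cfp = CardSlot("CFP", modules=4, gbps_per_module=100, power_watts=60)
cfp2 = CardSlot("CFP2", modules=8, gbps_per_module=100, power_watts=60)

for slot in (cfp, cfp2):
    print(f"{slot.name}: {slot.total_gbps} Gbps, "
          f"{slot.gbps_per_watt:.2f} Gbps/W, "
          f"{slot.watts_per_module:.1f} W per module")

# Ratio of bandwidth per watt, CFP2 relative to CFP.
improvement = cfp2.gbps_per_watt / cfp.gbps_per_watt
print(f"CFP2 vs. CFP bandwidth per watt: {improvement:.1f}x")
```

    With the Figure 2 numbers, the same 60 watts carries twice the traffic, which is why each CFP2 module has to fit in half the power of a CFP module.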

    One architecture that equipment manufacturers are currently contemplating for CFP2 cards consists of five devices, Goldhammer said: four ASSPs and one FPGA. Each card will have two CFP2 optical modules, each of which would use a 4x27G OTL 4.4 interface to a gearbox ASSP. The gearbox in its turn demultiplexes the 4x27G OTL 4.4 signal to 10x11.1G OTL 4.10. Then, another ASSP performs 100-Gbps GFEC, OTU-4 framing and 100GE mapping, and transports the data on a CAUI interface to the FPGA. Next, each of the two CFP2 channels sends the data to the one FPGA on the board, which serves as a CAUI-to-Interlaken bridge for the backplane to in turn send data to the next point in the network, and ultimately its destination (Figure 3).

    "That configuration will typically require four ASSPs and one FPGA," said Goldhammer. "The biggest problems with that configuration are power, complexity and cost. Simply doubling the ASSPs will blow the power budgets."

    While a CFP2 card will enable twice the bandwidth of CFP, each CFP2 module (with two 100-Gbps CFP2 ports) must maintain the power budget allotted to a single-port CFP module so as to stay within the same power budgets allocated for the entire card. Goldhammer said that the operational expense of this equipment is a significant concern for carriers, as there are many of these systems in the carrier's plant and they must maintain strict power caps. "They have to stay in these power budgets, but they want that 2x increase in bandwidth, so much of that burden falls to the semiconductor vendors to also lower power," he said.

    With the new Virtex-7 H580T FPGA and Xilinx IP, Goldhammer said, 100-Gbps OTN line card makers can further maximize the value of their CFP2-based OTN cards by using just one Virtex-7 H580T to do the job of what would otherwise take all five chips. "The Virtex-7 H580T FPGA is a groundbreaking device timed perfectly to the CFP2 100-Gbps OTN transponder card market requirements," said Goldhammer.

    With the Virtex-7 H580T FPGA and Xilinx IP, companies can implement an architecture for their CFP2-based cards in which two CFP2 channels on a card feed into one Virtex-7 H580T FPGA. The FPGA integrates gearbox, 100-Gbps GFEC, OTU-4 framing, 100GE mapping and Interlaken bridging in one device (as again shown in Figure 3).

    Figure 3: The Virtex-7 H580T FPGA and Xilinx IP will enable customers to quickly create single-chip, CFP2-based 100-Gbps OTN transponder cards instead of using five-chip cards. [Diagram: the five-chip ASSP solution for an OTN 2x100G transponder (per CFP2 channel, a gearbox ASSP and an ASSP with 100G GFEC, OTU-4 framer and 100GE mapper, linked by CAUI to one FPGA acting as MAC-to-Interlaken bridge to the backplane interface) compared with the single-chip Xilinx Virtex-7 H580T OTN 2x100G transponder, which integrates the gearbox, 100G GFEC, OTU-4 framer, 100GE mapper and Interlaken functions for both CFP2 channels.]

    "This is a single-chip solution that is not only much lower power than a multiple-chip ASSP or ASIC configuration, but faster, more reliable and of course much less expensive to produce," said Goldhammer. "It eliminates the need for multiple chips and their related power and cooling circuitry. With the Virtex-7 H580T FPGA, we offer customers more value in terms of integration, BOM cost reduction and improved system performance, without exceeding the power cap requirements for CFP2-based OTN transport cards."

    What's more, Xilinx also has the right IP to enable communication equipment companies to accelerate their design productivity and get their single-chip 100G optics-based cards to market faster. Through internal development and a series of strategic acquisitions, Xilinx offers the whole package: 100-Gbps gearbox, Ethernet MACs, OTN and Interlaken IP. "We've optimized all these cores for designing into the 28-nm Virtex-7 FPGA logic cell slices on the device," said Goldhammer. "Xilinx manufactured the slices in TSMC's 28-nm high-performance, low-power [HPL] technology, which greatly reduces leakage, for the optimal mix of high performance and low power."

    SSI TECHNOLOGY AND 28-GBPS TRANSCEIVERS

    One of the biggest challenges of high-speed communications equipment today is ensuring the proper function of transceivers so that they maintain good signal integrity. "Transceivers are analog circuits and that means they can be affected by a number of factors, especially noise," said Goldhammer. "In most mixed-signal devices, transceivers are usually placed in isolation on the edges of devices to shield them from the digital circuitry in the middle of the device. Digital circuitry tends to be noisy, so that's why they usually isolate it from analog."

    Over the last decade, to increase bandwidth into the gigabits-per-second range, the industry has turned to high-speed analog transceivers to quickly send and receive fast-traveling signals. Traditionally, the rule of thumb has been that the higher the bandwidth of the transceiver, the harder it is to ensure solid signal integrity.

    Goldhammer said that because the Virtex-7 H580T FPGA is a highly integrated one-chip SSI technology solution, CFP2-based line cards built around it will achieve much improved performance. "Moving to 4x25G interfaces greatly reduces the complexity of routing 10x10G interfaces," he said. "Although some are concerned about 25G-to-28G transceivers, with Xilinx SSI technology, Xilinx is able to substantially reduce this complexity. The 28G transceivers, which have sensitive analog circuitry, are physically separated from the digital logic. This architecture ensures good isolation from the digital transceiver-rich dice."

    The 28G transceivers are manufactured on a high-speed process technology, Goldhammer said, ensuring they are best in class. The FPGA slices, by contrast, are manufactured in 28-nm HPL to ensure the lowest total power. The result, he said, is stellar 28-Gbps transceiver performance and signal integrity on the Virtex-7 H580T FPGA. To see these transceivers in action, check out this video on YouTube: http://www.youtube.com/watch?v=FFZVwSjRC4c&feature=player_profilepage.

    Goldhammer said the physical isolation afforded by the SSI architecture enabled Xilinx to give the Virtex-7 H580T FPGA eight 28-Gbps transceivers, twice the number found in the largest device fielded by the competition.

    Video: A Virtex-7 H580T device demonstrates its ability to deliver the eye and jitter characteristics necessary to reach the performance required to interface to CFP2 optic modules.

    What's more impressive is that the Virtex-7 H580T FPGA is not even the highest-transceiver device Xilinx will offer in its 28-nm family. The company will soon roll out the Virtex-7 H870T device, which will have sixteen 28-Gbps transceivers, seventy-two 13.1-Gbps transceivers and 876,160 logic cells. Goldhammer said that if a customer used all the transceiver capabilities of the H870T device, they could conceivably have a design with a serial connectivity totaling 2.78 terabits per second.
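    The 2.78-Tbps figure can be checked against the transceiver counts quoted in this article. The following is a minimal sketch, assuming that the total counts both the transmit and receive directions of every transceiver (the article does not say how the figure was derived); the dictionary and function names are hypothetical. Under that assumption, the H870T counts (seventy-two 13.1-Gbps and sixteen 28-Gbps transceivers) reproduce the number.

```python
# Transceiver counts as quoted in the article:
# device -> (number of 13.1-Gbps transceivers, number of 28-Gbps transceivers)
DEVICES = {
    "Virtex-7 H290T": (24, 8),
    "Virtex-7 H580T": (48, 8),
    "Virtex-7 H870T": (72, 16),
}

RATE_13G_GBPS = 13.1
RATE_28G_GBPS = 28.0


def one_way_gbps(n_13g: int, n_28g: int) -> float:
    """Sum of line rates in one direction, in Gbps."""
    return n_13g * RATE_13G_GBPS + n_28g * RATE_28G_GBPS


def full_duplex_tbps(n_13g: int, n_28g: int) -> float:
    """Assumption: total serial connectivity counts TX plus RX, in Tbps."""
    return 2 * one_way_gbps(n_13g, n_28g) / 1000


for name, (n_13g, n_28g) in DEVICES.items():
    print(f"{name}: {one_way_gbps(n_13g, n_28g):.1f} Gbps one way, "
          f"{full_duplex_tbps(n_13g, n_28g):.2f} Tbps full duplex")
```

    For the H870T this gives 1,391.2 Gbps in each direction, or about 2.78 Tbps when both directions are counted.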

    "It was impractical and cost-prohibitive to place that many 28-Gbps transceivers on a monolithic FPGA," he said. "Fortunately, SSI technology enabled us to create a scalable FPGA family today that has eight to sixteen 28-Gbps transceivers. ASSP suppliers and other FPGA vendors have at most four 28G transceivers. This seems indicative of the challenges in doing the job monolithically in 40- and 28-nm processes."

    The Virtex-7 H870T device is targeted at the next generation in wired communications: the 400G market, Goldhammer said. "The 400G market is a ways off, and if anything, companies are just starting to look at it in their labs and the standards bodies haven't gotten to it yet," he said. "What's beautiful is that we already have a device that's capable of doing it. We can help them speed up development of 400G, speed up the pace of innovation."

    In addition to the Virtex-7 H580T and H870T FPGAs, Xilinx will also release as part of the 28-nm family the Virtex-7 H290T. By leveraging Xilinx's 3D SSI technology, the H290T will offer twenty-four 13.1-Gbps transceivers, eight 28-Gbps transceivers and 284,000 logic cells. Goldhammer said the Virtex-7 H290T is particularly well suited for the 2x100G gearbox market.

    First silicon of the Virtex-7 H580T FPGAs is shipping to key customers today, with development tool support available in the recently announced Vivado Design Suite. Customers interested in using the Virtex-7 H580T device can contact their local Xilinx representative for further pricing and availability details. You can also find new white papers and videos on Xilinx's 28-Gbps Serial Transceiver Technology page: http://www.xilinx.com/products/technology/transceivers/index.htm.


  • 14 Xcell Journal Third Quarter 2012

    PRODUCT FEATURE

    Xilinx Artix-7 FPGA Ships High-End Value to Low-Cost Market by Mike SantariniPublisher, Xcell JournalXilinx, [email protected]

  • With an eye on helpingits customers offergreater value to theircustomers, Xilinx

    in July announced itis shipping its Artix-7 A100T FPGA,the first of three parts in a feature-richline of low-cost, low-power, All Prog-rammable devices. The larger Artix-7A200T and A350T FPGAs will follow inthe coming months.

    The first Artix-7 device shipment tocustomers represents another majormilestone for Xilinx. It means the com-pany is now shipping FPGAs from allof the families in its 28-nanometer AllProgrammable device rollout. Xilinxearlier released the worlds first 3D ICFPGAs, the Kintex-7 line, and thenbroke new ground with the Zynq-7000 Extensible Processing Platform,which marries an ARM processor andFPGA logic on the same die.

    Ehab Mohsen, product marketingmanager at Xilinx, predicts the Artix-7lineup will prove to be a smash hitwith customers and will set a newstandard for feature-set sophistica-tion, power consumption and ulti-mately value in what the press has tra-ditionally called the low end of theFPGA market. FPGA vendors refer tothis sector as the value-based, high-volume or cost-sensitive market.

    If you look at the feature set of theArtix-7 family, its hard to call it low-end. Its certainly the highest-end andhighest-value FPGA line in that marketto date, said Mohsen. Where thebiggest Spartan-6 FPGA was 150klogic cells, the Artix-7 family starts at100k logic cells and runs all the way upto 350k logic cells. Beyond logic cellcount, he said, these FPGAs have eightto sixteen 6.6-Gbps transceivers, up to18,540 kbits of block RAM and as manyas 1,040 DSP48E1 slices.

    The Artix-7 family offers twice theperformance of the Spartan-6 familyand half the power. Thats a pretty high-end, low-end FPGA, added Maureen

    Third Quarter 2012 Xcell Journal 15

    P R O D U C T F E A T U R E

    Xilinx is now shippingthe first device in itsAll Programmable Artix-7 FPGA series,setting new powerand performance standards for cost-sensitive applications.

  • ing the traditional FPGA label, whichhas long tied FPGA advancementsmerely to a doubling of logic cells every22 months in keeping with MooresLaw. Even the Artix-7 familywhich isin fact Xilinxs smallest 28-nm deviceis loaded with programmable systemfeatures well beyond logic cells.

    Mohsen said that with a blockRAM-to-logic ratio of up to 18.5 Mbitswithin 360k logic cells and 1,040DSP48E1 slices for the same capacity,the Artix-7 rivals the logic density ofcompeting midrange products whilestill benefiting from lower power andlower cost. The DSP resources pro-

    vide up to 1,306 GMACs of DSP per-formancethree times that of thecompetition. This signal-processingclout is useful for imaging and com-munications applications requiringextensive processing capacity.

    In addition, the Artix-7 family sup-ports up to 16 configurable 6.6-Gbpstransceivers that Xilinx has optimizedfor low power, giving the Artix-7 thefastest line rates for the cost-sensitivemarket. These transceivers supportpre-emphasis and continuous-time lin-ear equalization (CTLE) to compen-sate for signal distortion across trans-mission channels. With 211 Gbps of

    Smerdon, strategic marketing manag-er at Xilinx. In fact, you would have togo to our competitors midrange line,a more expensive device family, to finda comparable feature set, and eventhen, the Xilinx Artix-7 family still hasadvantages.

    LEVERAGING HPL AND 7 SERIESSCALABLE ARCHITECTUREReducing power consumption was atop priority for Xilinxs 28-nm genera-tion of devices (see cover story, XcellJournal issue 76). In fact, Xilinxworked very closely with TSMC to for-mulate TSMCs HPL (high-perform-

    ance, low-power) 28-nm silicon manu-facturing process to a sweet spot forFPGA production. As a result, theentire Xilinx 28-nm line halved totalpower consumption compared withthe previous generation of FPGAs.

    Across all product families, cus-tomers had been asking for lowerpower consumption, but especially soin the cost-sensitive market, saidMohsen. These devices go into a widerange of applications where lowerpower is needed for multiple reasonsranging from longer battery life tolower energy costs, better power dissi-pation, lower BOM (not requiring

    16 Xcell Journal Third Quarter 2012

    P R O D U C T F E A T U R E

    extra shielding and power circuitry)and smaller end-product form factor.

    As such, Mohsen said, the Artix-7family line takes full advantage ofthe 50 percent power savings whiledelivering the needed performancefor its target markets. The 50 per-cent power reduction provides head-room for additional performance,logic density, I/O bandwidth and sig-nal processing, said Mohsen. Thatgives designers the flexibility toeither lower power by 50 percent ortake advantage of greater perform-ance and capacity at previous powerbudgets, he said.

    Mohsen noted that all of Xilinx's 28-nm All Programmable devices use the same logic architecture. The Artix-7 FPGA's slice architecture is based closely on that of the Xilinx Virtex-6 and Spartan-6 FPGA families, using the same LUT structure, control logic and outputs. "This scalable architecture provides users with an easy migration path when moving their designs between Spartan-6 and Artix-7 FPGAs," said Mohsen.

    MOORE THAN LOGIC CELLS

    The Artix-7 is a prime example of how all Xilinx devices are quickly outgrow-

    Figure 1: The Artix-7 FPGA's DSP performance and I/O count can be leveraged for 128-channel portable ultrasound equipment. [Block diagram labels: 128-channel transducer; ADC, 128 channels; deserializer; RX 128-channel beamformer and RX beamformer control in the Artix-7 FPGA; data high-speed I/O; control high-speed I/O; 547 pins (LVDS); 46 pins.]


    total throughput, the Artix-7 is a low-cost alternative for bandwidth-sensitive applications that would otherwise require midrange solutions, said Mohsen.
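The 211-Gbps figure appears consistent with simple arithmetic on the transceiver count and line rate quoted above, if one assumes that "total throughput" counts both the transmit and receive directions on all 16 lanes. That full-duplex reading is an assumption made here, not something the article states. A minimal Python sketch of the calculation:

```python
# Back-of-the-envelope check of the aggregate transceiver throughput.
# Assumption (not from the article): "total throughput" counts both the
# transmit and the receive direction of every transceiver.

NUM_TRANSCEIVERS = 16   # from the article
LINE_RATE_GBPS = 6.6    # from the article
DIRECTIONS = 2          # assumed: TX + RX


def aggregate_throughput_gbps(lanes: int, rate_gbps: float, directions: int) -> float:
    """Raw aggregate line rate, ignoring line-coding and protocol overhead."""
    return lanes * rate_gbps * directions


total_gbps = aggregate_throughput_gbps(NUM_TRANSCEIVERS, LINE_RATE_GBPS, DIRECTIONS)
print(f"Aggregate raw throughput: {total_gbps:.1f} Gbps")
```

The result is a raw line-rate total; usable payload bandwidth would be lower once line coding and protocol overhead are taken into account.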

    What's more, Mohsen said that because memory read/write bandwidth can affect overall system performance, the Artix-7 family offers DDR3 data rates of up to 1,066 Mbps, the highest in the industry for FPGAs in its class. The memory solution consists of a flexible controller and physical layer (PHY) for interfacing designs and AMBA AXI4 slave interfaces to DDR3 and DDR2 SDRAM devices. The controller supports an array of external memories for flexible system design, such as for streamlined access to video and data storage.

    As such, the Artix-7 A100T devices are ideally suited for a number of applications that will allow customers to innovate, offer a rich new set of features to their customers and expand their markets. To illustrate, Mohsen described three markets (portable medical equipment, handheld radios and small cellular basestations) that will greatly benefit from the Artix-7 FPGA family's feature set.

    PREMIUM VALUE FOR PORTABLE MEDICAL

    Mohsen said that companies creating devices for the medical electronics field are eager to expand their product portfolios beyond million-dollar, large-form-factor, hospital-class equipment. They are striving to also offer lower-cost, portable electronic equipment lines to smaller doctors' offices, hospital departments and even individual practitioners.

    "Portable ultrasound equipment is a prime example of a market that can greatly benefit from the feature set of the Artix-7 FPGAs," said Mohsen. "Instead of having to wheel a patient into a special room to be tested with a very large ultrasound system, these portable systems are much smaller and can be on a cart or even handheld and brought to the patient. Paramedics can use them in ambulances, and doctors who still perform house calls can use them, too. What's amazing is that with the Artix-7 FPGA line, companies can give their next generation of portable ultrasound equipment many of the advanced features found previously only in high-end systems."

    "That's not to say that these new classes of equipment will replace those bigger systems," Mohsen added, "because those systems also continue to add incredible new features thanks in part to the rich feature set of our larger Kintex-7 and Virtex-7 FPGA families."

    Mohsen said that because the Artix-7 family offers 65 percent lower static

    Figure 2: The system integration and DSP processing capacity in the Artix-7 FPGA is critical for software-radio design. [Block diagram labels: 300-MHz to 2-GHz antenna; Tx/Rx switch; traditional RF section with low-noise amplifier section, RF tuning and SAW filter; A/D; Artix-7 FPGA with digital wideband front end, software-defined radio processing engine, software control processing engine, encryption processing engine and data formatting I/O; human interface.]

    The new device family supports up to 16 configurable 6.6-Gbps transceivers that Xilinx has optimized for low power, giving the Artix-7 FPGAs the fastest line rates for the cost-sensitive market.

  • challenging to support all these waveforms, but they must also be completely secure and able to operate in rugged conditions where radio frequency is difficult. So the military is always looking for better, lighter systems that can run longer and more securely.

    All these requirements make the Artix-7 family ideal for SDR systems. Indeed, the new device is particularly well suited for SDR modem management. Mohsen explained that the modem in an SDR system performs baseband signal preprocessing and RF signal improvements, which require enormous amounts of parallel processing and reconfigurability. "FPGAs are a natural fit for this application and most systems today do indeed use FPGAs, but the Artix-7 offers a vast performance improvement," he said. With up to 1,040 DSP slices, the Artix-7 can provide up to 1,306 GMACs of DSP performance, three times the performance of competing FPGAs and far greater than any standalone DSP or GPU.
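As a rough sanity check on these figures, dividing the quoted peak by the slice count gives the implied rate per DSP slice. The interpretation in the second half of the sketch (two multiply-accumulates per slice per clock, as in a symmetric filter that uses a pre-adder) is an assumption introduced here for illustration; the article does not say how the peak number was derived.

```python
# Implied per-slice DSP rate from the two figures quoted in the article.

DSP_SLICES = 1040      # from the article
PEAK_GMACS = 1306.0    # from the article

# Giga multiply-accumulates per second contributed by each slice.
per_slice_gmacs = PEAK_GMACS / DSP_SLICES

# Assumed interpretation (NOT from the article): each slice performs two
# MACs per clock cycle, so the implied clock is half the per-slice rate.
ASSUMED_MACS_PER_CYCLE = 2
implied_clock_mhz = per_slice_gmacs / ASSUMED_MACS_PER_CYCLE * 1000.0

print(f"Per-slice rate: {per_slice_gmacs:.3f} GMAC/s")
print(f"Implied clock at {ASSUMED_MACS_PER_CYCLE} MACs/cycle: {implied_clock_mhz:.0f} MHz")
```

Under a one-MAC-per-cycle assumption the implied clock would be twice as high, so the peak figure is best read as a best-case number for a specific filter structure rather than a general-purpose rate.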

    and 50 percent lower dynamic power consumption than Xilinx's Spartan-6 devices, while delivering up to sixteen 6.6-Gbps transceivers, designers of portable ultrasound equipment can achieve the highest image quality for meeting JESD204B high-speed serial interface standards. At the same time, they can extend battery life and meet safety standards while implementing a 128-channel beamformer at 41 percent less power than alternative FPGA implementations.

    Figure 1 shows an example of the All Programmable advantage the Artix-7 FPGA affords to the portable ultrasound market.

    REDUCING BOM, WEIGHT AND COST FOR MILITARY SDR

    Another example of a market that will greatly benefit from the Artix-7 FPGA's rich feature set is military software-defined radio (SDR), Mohsen said. Over the last decade, the U.S. military has been diligently constructing a


    highly sophisticated worldwide communications network called the Global Information Grid (see cover story, Xcell Journal issue 69) that allows U.S. forces and allies to communicate globally and run intelligence and military operations more precisely. While Xilinx's larger Virtex-7 and Kintex-7 FPGAs play an increasing role in the larger communications equipment in the GIG, from networking equipment to aircraft and UAVs, the military is seeking better ways to connect all its assets, even individual soldiers, to the grid more efficiently.

    Many of the portable SDR systems in deployment today suffer from increased power and short battery life, said Mohsen. They are also too big and heavy as well as expensive. They are fairly complex, too. These systems require extensive DSP processing capabilities to support a variety of radio protocols or waveforms for voice, data and video communications across the globe. Not only is it

    Figure 3: Designers can integrate multichip functionality for microwave mobile backhaul using the Artix-7 FPGA. [Block diagram labels: Ethernet; transceivers; Ethernet switch; control plane processor; traffic management, packet processing; time synchronization; modem, channel 1 and channel 2; transceivers; ADC; DAC; radio; Artix-7 FPGA.]


    Offering 101,440 logic cells, the Artix-7 is available in a 15 x 15-mm package, which makes it the industry's smallest device at that capacity level. The mix of increased capacity and smaller size allows design teams to create smaller and lighter systems.

    Figure 2 shows an example of the All Programmable advantage the Artix-7 FPGA affords in the portable SDR systems market.

    WIRELESS BACKHAUL BUILDOUT

    Wireless backhaul is another example of an application that will greatly benefit from the Artix-7 family. Mohsen said that the vast majority of the growth in cellular traffic today is occurring in urban and suburban areas. To address this trend, operators plan to boost the capacity of their networks by deploying small cell basestations on lamp posts, traffic lights and even the walls of adjoining buildings. They need to connect all these small cells in clusters and to the nearest aggregation points, so operators must deploy low-power, low-cost backhaul units whose microwave radio links can span up to tens of miles, said Mohsen.

    Whereas a traditional mobile backhaul unit typically supports several Ethernet links, a wireless mobile backhaul forwards the traffic between Ethernet links and radio channels using an internal Ethernet switch. "Both ends of the unit require high-speed transceivers, so that's where the Artix-7 FPGA makes sense as an ideal low-cost alternative to bigger, more expensive devices," said Mohsen. The Artix-7 family delivers maximal bandwidth with its sixteen 6.6-Gbps transceivers for both Ethernet and RF links using JEDEC JESD204B connectivity to data converters.

    Artix-7 devices also allow wireless-equipment providers to achieve greater system integration and reduce BOM costs. Half of the backhaul unit contains packet-processing, traffic-management and timing-synchronization functions, Mohsen explained. Meanwhile, the other half of the unit supports modem channels for signal processing. The key requirements for the modem are adequate high-performance DSP processing and high-speed transceivers to interconnect with the data converters for high data throughput.

    "The Artix-7 family fits nicely for these functions because it has the right mix of logic density, intellectual-property support and DSP resources," said Mohsen. The Artix-7 A200T, which will follow the Artix-7 A100T release later this year, has 215,360 logic cells, allowing wireless-equipment companies to create a backhaul solution that integrates all the needed packet-processing, traffic-management, timing and synchronization blocks as well as a single high-speed radio channel into one chip. Likewise, the third member of the family, the Artix-7 A350T device, will allow vendors of wireless-network equipment to integrate two high-speed radio channels on a single chip.

    Mohsen also noted that equipment vendors make a concerted effort to ensure the units have a low visual impact and don't appear to clutter the urban and suburban landscapes. This design requirement typically means the units must be very small, which can make it challenging for designers to ensure that each unit effectively dissipates the heat it generates. The Artix-7 family allows equipment vendors to keep power in check while further reducing the overall unit size of their systems.

    Figure 3 shows an example of the All Programmable advantage the Artix-7 FPGA affords to the small cell wireless backhaul systems.

    The first Artix-7 A100T FPGAs are available today, with production qualification scheduled for the first quarter of 2013. Designers can begin their Artix-7 family designs today using Xilinx design tools. For more information, please visit www.xilinx.com/artix.

  • The last decade has seen the emergence of a new global market for cloud computing. This new paradigm, which delivers computing as a service over the Internet, represents a fundamental shift in the way computers are used. The cloud offers enterprises the means to shift tasks from their local IT infrastructure into remote, optimized computing clusters and thus into the hands of the operator providing the cloud service. For consumers, the cloud delivers storage, video, messaging, social networking, gaming, Web search and many other services coherently across diverse computing devices anywhere in the world.


    XPERT OPINION

    FPGAs Head for the Cloud

    by Michaela Blott
    Senior Research Engineer
    Xilinx, [email protected]

    Tom English
    Research Scientist
    Xilinx, [email protected]

    Emilio Billi
    CTO
    EB [email protected]


    At the heart of the cloud computing revolution is the data center, which integrates the compute power, storage and interconnect required to service a global user base. Data centers are experiencing phenomenal growth, which is translating directly into massive investment. According to Synergy Research Group, data center network infrastructure sales grew 22 percent in 2010 alone. Companies such as Google and Facebook are at the forefront of the cloud computing revolution and correctly anticipated the need for colossal data center infrastructure to address a massive global user base.

    THE FPGA ADVANTAGE

    FPGA technology can bring numerous advantages for computing, storage and networking as data centers strive to become faster, larger, cheaper and greener. Within the networking infrastructure, FPGAs can address the ever-increasing throughput and processing requirements while remaining highly power-efficient. Furthermore, the inherent flexibility of the FPGA is a crucial benefit in this landscape, given the continual arrival of new communications protocols.

    On a basic level, FPGAs offer the right physical interfaces and provide the required support and bandwidth for high-speed memory interfaces. They offer sufficient device complexity to implement packet-processing pipelines greater than 100G. Their flexibility allows for the implementation of perfectly optimized custom circuits that operate at maximum efficiency. Major improvements in high-level synthesis, as for example offered with AutoESL, are helping to overcome the greatest disadvantage of FPGAs in this space, namely, the low abstraction level of the FPGA programming flow. Finally, a basic FPGA IP portfolio exists that covers the fundamental networking functions. However, more data-center-specific solutions around data center bridging (DCB), VXLAN, virtual switching and other specialized technologies have yet to be developed.

    Within servers, FPGAs are an attractive implementation on network interface cards (NICs). Although a plethora of controllers from Intel, Broadcom and others are available for implementing standard adapters for Ethernet and Fibre Channel, FPGAs are ideal when additional processing functions are also integrated on the data path between the network and the CPU. Examples of additional processing functions include encryption, high-frequency trading and TCP offload engines (TOE).

    FPGAs are also attractive when either the network interface or the processing function needs to be customized in any way. In these scenarios the FPGA offers high-speed serial transceivers, memory interfaces, PCIe endpoints and a sufficiently large fabric to accommodate high-throughput data stream processing along with the basic IP blocks. A more sophisticated IP-and-solutions portfolio addressing the specific needs of this market could make FPGAs more competitive in an environment where the end user is accustomed to deploying fully integrated platforms. For example, a more sophisticated TOE IP block capable of thousands of simultaneous sessions (and accompanied by a full Linux driver and TCP/IP stack) would open up a range of new FPGA data center applications.

    A special case of such a network adapter is a QuickPath Interconnect (QPI) network adapter. QPI is Intel's proprietary high-bandwidth, low-latency CPU interconnect. Xilinx has developed IP that allows FPGAs to directly attach to the CPU via QPI, significantly reducing latency on the host interface as well as providing higher bandwidth between CPU and network interface. Such a network adapter could be highly attractive within data centers, since latency is rapidly becoming the key performance bottleneck in applications that are already heavily parallelized. A QPI NIC has as much as four times the bidirectional peak bandwidth to the host as a typical PCIe Gen2 server NIC. QPI's higher transfer rates, direct FPGA-to-CPU transfers and shorter headers make it possible to transfer small messages with much


    Robust growth in the data center opens new opportunities for existing and advanced FPGA devices.

  • lower latency than in PCIe. As latency becomes the key performance bottleneck in applications that are already heavily parallelized, an ultralow-latency, high-bandwidth QPI NIC becomes a very attractive proposition.

    ON THE MOTHERBOARD

    We see further opportunities for FPGAs on the motherboard itself. Some common applications in the data center such as in-memory caching are currently implemented on x86-based servers, despite the fact that the x86 is not particularly well-suited to these types of applications. FPGAs could offer a dramatic improvement in performance, power and latency. The trend is to move compute from a distributed number of cores to a more pipelined style of data processing. This approach is beneficial for FPGA architectures. The volume of the silicon fits well with the FPGA opportunities as well. However, the low abstraction level of FPGA programming tools will have to be addressed in order to compete with C compilers on x86-based servers for end users such as Facebook.

    On a more speculative note, there is a subclass of servers in data centers that are casually referred to as "wimpy nodes." Xilinx already has many of the key technologies needed to accommodate the new server and SoC architectures emerging for this space, such as ARM processor cores, PCIe interface blocks, memory interfaces and programmable logic. The current Zynq-7000 Extensible Processing Platform is not yet equipped to compete with ARM-based server SoCs, such as Applied Micro's X-Gene, for this market, but using the technology blocks already available, a future Zynq device could conceivably power a data center server.

    Finally, increasing processing demands, especially at the high-performance computing end of the spectrum, can greatly benefit from hybrid computing solutions that combine both FPGAs and CPUs. Existing solutions from Convey and Maxeler showcase the tremendous performance and power benefits of such an approach. For example, Maxeler's implementation of a credit derivative pricing system for a financial customer was up to 37 times faster than software on an Intel E5430 server and reduced energy consumption by more than 97 percent. The QPI technology has the potential to increase these advantages even further, as hardware accelerators can become more tightly coupled with the CPU over a low-latency, high-bandwidth, cache-coherent interface.
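The speedup and energy figures quoted for the Maxeler system can be related through the identity energy = average power x run time. The sketch below is illustrative only: the function name and the equal-power default are assumptions made here, not data from the article. It suggests that a 37x speedup at unchanged average power would by itself account for an energy saving of roughly 97 percent per job.

```python
# Relating speedup to energy per job via energy = average power * run time.
# The function and its default power ratio are illustrative assumptions.


def energy_reduction(speedup: float, power_ratio: float = 1.0) -> float:
    """Fractional energy saving per job.

    speedup     -- baseline run time divided by accelerated run time
    power_ratio -- accelerated average power divided by baseline average
                   power (assumed 1.0, i.e. equal power, by default)
    """
    if speedup <= 0 or power_ratio < 0:
        raise ValueError("speedup must be positive and power_ratio non-negative")
    return 1.0 - power_ratio / speedup


saving_equal_power = energy_reduction(37.0)
print(f"Energy saving at equal power: {saving_equal_power:.1%}")
```

If the accelerated system drew less power than the baseline server, the saving would be larger still; if it drew more, the saving would shrink accordingly.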

    DATA STORAGE, WAREHOUSING AND ANALYTICS

    Similar to the server and networking scenarios, existing FPGAs can provide competitive implementations within storage, data warehousing and data analytics in three very different ways. First, recent trends integrate flash-based storage closer with the host. A new crop of PCIe SSD controllers allows for direct attachment of flash to PCIe. FPGAs already compete in this space, offering the key functionality and the basic IP building blocks. A further key advantage is the FPGA's flexibility. Currently, the flash-based interface lacks an industry standard, although new standardization efforts such as the Open NAND Flash Interface (ONFi) are on their way.

    Then too, FPGAs can assist in acceleration of query processing, handling filtering, decompression and execution of some of the relational operators. This capability could be vital in future storage appliances that need to provide more intelligence in order to cope with throughput bottlenecks. Finally, in so-called "super-storage" applications, FPGAs can play a major role, accelerating file system operations that would otherwise consume considerable CPU cycles. These currently run on separate servers co-located with the storage servers in the storage-area network. FPGA acceleration would allow for a reduction of the control-to-storage server ratio, increasing available storage and performance.

    Existing FPGA technology can service these requirements. In particular, the embedded ARM processor in the current Zynq architecture can already tackle OS functionality. Again, a more sophisticated IP-and-solutions portfolio could of course further increase the potential of FPGAs and help accelerate new design developments.


    Figure 1: Data center opportunities in three broad categories await FPGA implementation. [Diagram labels: Group 1, Group 2, Group 3; hybrid computing, desktop virtualization, OPST, optical backhaul, custom NICs, smart analytics, super storage, smart NICs, flash controllers, QPI NIC, QPI I/O and memory expansion, application-specific servers, Cloud RAN, smart networking, wimpy nodes, optical interconnects.]

  • THREE KINDS OF OPPORTUNITIES

    As shown in Figure 1, we categorize these varied opportunities in three groups. The first group contains applications that require no additional development effort. Silicon features, IP portfolio, related software and programmability are adequate to address these markets, some of which already employ FPGAs. For example, Intune Networks uses FPGAs to implement optical packet switch and transport (OPST) solutions, claiming up to 300 percent cost reduction in power consumption. Maxeler and Convey Computers offer hybrid computing solutions on an FPGA basis. Smart analytics are based on FPGAs within IBM/Netezza's products. BlueArc demonstrates how super storage can be significantly improved through FPGAs, while FusionIO uses FPGAs for flash controllers. Napatech and Nallatech are among the many vendors offering FPGA-based smart or custom NICs.

    The second group addresses opportunities that require some advanced development effort. Most notable in this category are the QPI-related opportunities: QPI NIC and memory and I/O expansion.

    The final group includes longer-term opportunities that will involve significant research-and-development effort on silicon features or programming environments. For example, to address "wimpy nodes" through an FPGA device would take a new generation of Zynq devices that offers more integration of 64-bit ARM processors, as well as wider and faster memory interfaces.

    Application-specific servers and C-RAN, or Cloud RAN, lie on the border between groups 2 and 3. Both of them necessitate advanced development efforts to provide necessary infrastructure and platforms, and would benefit from new programming tools with a higher abstraction level. However, the fact that a traditional RTL-based design flow might be adequate to some of the end users moves these opportunities to the second group.

    HIGHLY DYNAMIC MARKET

    Data centers are a highly dynamic market in which interface standards and protocols are changing rapidly. Particularly within networking equipment, but also for compute functions, this environment offers great opportunities for the deployment of FPGA-based high-speed processing systems. These types of applications are well suited for FPGA implementation with current Xilinx technology, such as the Kintex and Virtex families of devices, leveraging high-speed serial I/O and corresponding IP.

    The opportunities expand with an additional focus on the needs of the data center market. In particular, memory access (access bandwidth and density), as well as hash and search function support in the form of silicon or IP, are extremely important features within data centers, given that most applications center around large amounts of data that must be searched and sorted. Future FPGA devices based on Xilinx's stacked-silicon interconnect (SSI) technology can potentially play a key role here.

    Finally, we believe that these opportunities, in particular those within the server space, will be contingent on improvements in FPGA programmability. FPGA programming must be abstracted to a level that is acceptable to programmers of a data center.


    The opportunities for FPGAs in this environment, in particular those within the server space, will be contingent on improvements in FPGA programmability.


    XCELLENCE IN AEROSPACE & DEFENSE

    The Xilinx Virtex-5 proves able to endure, operate and survive cryogenic temperatures. No wonder NASA is onboard.

    FPGA-Based Instrumentation Withstands the Chill of Deep Space


    Current and future NASA robotic-flight missions to outer planets and asteroids require avionics systems, computers, controllers and data-processing units capable of enduring the extreme low-temperature environments of deep space and lunar and Martian surfaces. With recent technological advances in FPGAs, it has become possible to architect a complete system-on-a-chip (SoC) using a single FPGA. Large FPGAs that are radiation-hardened by design (RHBD) have increased the number of gates per square inch, reduced power consumption per gate and included microprocessors, soft and hard IP, arithmetic modules, sizable onboard memory and analog-to-digital converters.

    B&A Engineering (BAENG) conducted studies with the Xilinx Virtex-5 mixed-signal RHBD FPGA to address NASA's need for protected, reliable data-acquisition controllers and computer electronics able to operate in cryogenic temperatures. This RHBD FPGA will be the workhorse of future NASA computer and data-handling systems targeted for outer-planet landing, orbiting and sample-retrieval missions.

    To conduct the experiment, BAENG designed and built a test board based on a commercial Xilinx XC5VLX30 FPGA and support circuitry (resistors, capacitors and oscillator), as shown in Figure 1. What's remarkable is that we found that the chip works at temperatures well below spec for a commercial part and even well below spec for a space-grade Xilinx FPGA.

    The FPGA included circuits using internal phase-locked loops (PLLs), as well as a ring oscillator and a number of basic circuits built from LUTs. During the test, we monitored both circuit functionality and FPGA power consumption.

    We disabled all the regulators, switches, resets and configuration-mode pins on the board with the exception of the 100-MHz oscillator. We simulated FPGA and external flash memory voltages and currents, switches, resets and configuration-mode pins and monitored them using external test equipment and power supplies.

    The FPGA was reconfigured at every 10-degree decrement of temperature, from room temperature down to -150°C, from both the JTAG interface (Xilinx IMPACT) and flash (XCF08). We erased and reprogrammed onboard flash during each temperature measurement. Additionally, using Xilinx's ChipScope Pro and the internal FPGA system monitor, we monitored Xilinx die temperature, along with 2.5V auxiliary and 1.0V internal voltages. This data provided additional reference


    Figure 1: The Xilinx Virtex-5 XC5VLX30 test board

    by Alireza Bakhshi
    Principal
    B&A Engineering Systems [email protected]

  • points to the chamber and test equipment monitoring. The internal system monitor can be accessed preconfiguration through the JTAG interface.

    TEST RESULTS

    Cryogenic testing was done using liquid nitrogen. We started the testing at room temperature of 24°C, having programmed the test chamber to proceed in steps to 10°C, 0°C, -10°C, all the way down to -150°C. Figure 2 charts Xilinx voltage currents vs. the temperature.
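The chamber schedule described above can be written down as a short list of setpoints. The following Python sketch is a hypothetical reconstruction for illustration: the function name and its parameters are assumptions, and it reflects only the sequence given in the text (24°C, then 10°C, 0°C, -10°C and so on down to -150°C), not the actual chamber program.

```python
# Hypothetical reconstruction of the chamber setpoint schedule described
# in the text. Names and defaults are illustrative assumptions.


def temperature_setpoints(start_c=24, first_step_c=10, stop_c=-150, step_c=10):
    """Return setpoints in degrees C: the starting room temperature,
    followed by first_step_c down to stop_c in decrements of step_c."""
    setpoints = [start_c]
    setpoints.extend(range(first_step_c, stop_c - 1, -step_c))
    return setpoints


setpoints = temperature_setpoints()
print(f"{len(setpoints)} setpoints: {setpoints}")
```

At each setpoint the procedure described earlier would then repeat the same actions: reconfigure the FPGA over JTAG and from flash, erase and reprogram the flash, and log the supply currents.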

    In the course of our testing, we made some interesting observations. For starters, we found that the 2.5V/2.5V auxiliary voltage currents remained stable over the test temperature range. Both of these voltages are used for system monitor and JTAG communication. All of the I/Os are tied to 3.3V.

    Second, the internal 1.0V current was significantly reduced from 140 mA at +20°C to 81 mA at -150°C. This was no surprise, since a reduction in power is expected at low temperatures.
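To put the two quoted current readings in terms of power, the sketch below multiplies them by the nominal rail voltage. It assumes the internal rail stayed at exactly 1.0V across the temperature range, which is a simplification made here; the variable names are illustrative.

```python
# Core power implied by the two current readings quoted in the text.
# Assumption: the internal rail is held at its nominal 1.0 V throughout.

V_INT = 1.0            # volts, nominal internal rail
I_WARM_MA = 140.0      # mA at +20 degrees C (from the article)
I_COLD_MA = 81.0       # mA at -150 degrees C (from the article)

p_warm_mw = V_INT * I_WARM_MA   # milliwatts at +20 degrees C
p_cold_mw = V_INT * I_COLD_MA   # milliwatts at -150 degrees C

cold_to_warm_ratio = p_cold_mw / p_warm_mw
reduction = 1.0 - cold_to_warm_ratio

print(f"Core power: {p_warm_mw:.0f} mW -> {p_cold_mw:.0f} mW")
print(f"Cold/warm ratio: {cold_to_warm_ratio:.1%}, reduction: {reduction:.1%}")
```

This ratio is derived only from the two readings quoted in this paragraph; the 66 percent figure given in the closing summary may rest on a different baseline or measurement.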

    Finally, we found that the flash memory's 1.8V current remained almost zero down to -50°C and then changed to 10 mA from -50 to -90°C. It dropped to zero from the -100 to -120°C temperature range, and went back to 10 mA from -130 to -150°C. We are not sure what to make of this finding; it could be due to test measurement errors.

    Importantly, both clocks remained stable over the test temperature range (see Figure 3). We used a PLL to both divide and multiply the master onboard 100-MHz oscillator so as to generate the 50- and 150-MHz clocks.
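The article does not give the PLL settings used. As an illustration of how both clocks can come from one PLL, the sketch below searches for a feedback multiplier M, an input divider D and per-output dividers O such that f_vco = f_in x M / D and f_out = f_vco / O. The VCO range and the limits on M, D and O are placeholder assumptions that should be checked against the device data sheet, and the function name is hypothetical.

```python
# Search for PLL settings that derive several output clocks from one input.
# Model: f_vco = f_in * M / D, and each output is f_vco / O.
# All range limits below are placeholder assumptions, not data-sheet values.
from fractions import Fraction


def find_pll_settings(f_in_mhz, targets_mhz, vco_min_mhz=400, vco_max_mhz=1000,
                      max_mult=64, max_div=52, max_out_div=128):
    """Return (M, D, {target: O}) for the first valid combination, or None."""
    for d in range(1, max_div + 1):
        for m in range(1, max_mult + 1):
            vco = Fraction(f_in_mhz * m, d)
            if not (vco_min_mhz <= vco <= vco_max_mhz):
                continue
            out_divs = {}
            for target in targets_mhz:
                o = vco / target
                # Each output needs an integer divider within range.
                if o.denominator != 1 or not (1 <= o <= max_out_div):
                    break
                out_divs[target] = int(o)
            else:
                return m, d, out_divs
    return None


settings = find_pll_settings(100, [50, 150])
print(settings)
```

Under these assumed limits the search settles on a 600-MHz VCO (M = 6, D = 1) divided by 12 and by 4; other combinations would work equally well, and the one actually used on the test board is not known.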

    XILINX LOGIC

    All of the logic circuits remained functional through the test temperature range. However, the same was not true of the flash memory. During testing, we found that the onboard flash memory became unstable at -110°C. Starting at that temperature, it took a couple of attempts to program the flash memory from JTAG. Nevertheless, we were able to do so after two tries. In addition, the onboard flash memory became nonfunctional at -140°C. We weren't able to program the flash from JTAG thereafter.

    The internal 1.0V current increased significantly (384 mA) when we tried to configure the FPGA from nonfunctional flash memory. This result makes sense, since we are not sure what the state of FPGA I/Os would be once the FPGA is configured from nonfunctional flash memory. The 1.0V internal current became normal once the FPGA was configured through the JTAG port. Meanwhile, JTAG communication through IMPACT was functional throughout the test temperature range.


    Figure 2: Xilinx FPGA voltage currents vs. temperature. [Chart: vertical axis ticks from 20 to 160; horizontal axis from -200°C to 50°C; series: Xilinx I/O 3V3 voltage current (mA), flash 1.8V current (mA), Xilinx internal 1.0V current (mA), Xilinx 2.5V/2.5V auxiliary current (mA).]

    Figure 3: Xilinx 50/150-MHz clock vs. temperature. [Chart: vertical axis ticks from 20 to 160; horizontal axis from -200°C to 50°C; series: Xilinx 50 MHz, Xilinx 150 MHz.]

  • Additionally, ChipScope Pro was functional throughout the test temperature range and with it we were able to monitor the die temperature, 1.0V internal and 2.5V auxiliary voltage. They all tracked very closely with the external LabVIEW and temperature sensor measurements. This is important since it tells us that the Xilinx system monitor including the internal A/D was functional throughout the test.

    At the end, we brought the temperature back to ambient, in increments, and left the unit under test to stabilize for 48 hours. In performing our end-to-end test, we were able to configure the Xilinx FPGA from JTAG only once. JTAG communication including the ChipScope Pro stopped thereafter, even though we were able to initialize the JTAG chain through IMPACT. We weren't able to program the flash or configure the FPGA from either JTAG or flash memory. After removing the flash and rewiring the TDI/TDO chain, we were able to configure the FPGA through the JTAG once again. This is important, since it shows that the FPGA wasn't damaged.

    It is important to note that the JTAG chain is serial. The TDO of the IMPACT is connected to TDI of the flash memory, and then the TDO of the flash memory is linked to the TDI of the Xilinx FPGA. This means that the flash memory is sitting between the IMPACT and the Xilinx FPGA.
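A small model helps show why a faulty device in a serial chain can block access to a healthy one behind it. The Python sketch below is a deliberately simplified illustration: the class and function names are hypothetical, every device is treated as sitting in BYPASS (a single one-bit register between its TDI and TDO), and the TAP state machine and shared TCK/TMS signals are ignored.

```python
# Simplified model of a serial JTAG data path with every device in BYPASS.
# All names here are hypothetical; real boundary scan also involves the
# TAP state machine, which this sketch leaves out.
from dataclasses import dataclass


@dataclass
class JtagDevice:
    name: str
    passes_data: bool = True   # False models a dead TDI-to-TDO path


def shift_through(chain, tdi_bits):
    """Shift bits through the chain and return what appears at the final
    TDO, or None if any device in the chain blocks the data path."""
    if not all(dev.passes_data for dev in chain):
        return None
    if not chain:
        return list(tdi_bits)
    bypass = [0] * len(chain)          # one bypass bit per device
    tdo_bits = []
    for bit in tdi_bits:
        tdo_bits.append(bypass[-1])    # last device drives TDO
        bypass = [bit] + bypass[:-1]   # every register shifts one place
    return tdo_bits


pattern = [1, 0, 1, 1, 0, 0]
healthy = shift_through([JtagDevice("flash"), JtagDevice("fpga")], pattern)
blocked = shift_through([JtagDevice("flash", passes_data=False), JtagDevice("fpga")], pattern)
fpga_only = shift_through([JtagDevice("fpga")], pattern)
print(healthy, blocked, fpga_only)
```

In this model the pattern returns delayed by one clock per device when the chain is intact, nothing returns when the flash blocks the path, and removing the flash from the chain restores access to the FPGA, which mirrors the rewiring step described above.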

    PROMISING RESULTS

    Xilinx FPGA testing using commercial parts showed some promising results as far as reconfiguration at very low temperatures. Reconfiguration through the JTAG interface continued to work down to -150°C. However, due to what appears to be a failure of the flash memory chip at -130°C, we were not able to reconfigure the Virtex-5 chip from flash memory below that temperature.

    As expected, the current on the internal 1.0V supply (and with it internal power consumption) declined steadily, ending at 66 percent of its room-temperature value. Basic circuits, ring oscillator, shift registers and PLL outputs continued to function normally with hardly any detectable changes.


    XCELLENCE IN GREEN TECHNOLOGY

    Using Spartan Technology to Support Development of Green Energy

    The Xilinx Spartan-3A FPGA augments control algorithm implementation for a multiterminal DRI power inverter.

    by Phillip Southard
    Senior Design Engineer
    PDS Consulting, [email protected]


    Product development for industrial applications involves extensive research and preparation in an environment of rolling deadlines and ever-evolving product specifications. While time-to-market for this sector may not be as short as it is for consumer electronics, products must ship quickly and with as many essential functions, features and potential hooks for the next generation as possible. Companies vie to be industry leaders in their respective competitive arenas, especially in new markets such as green power, which in their infancy and without defined leaders require pioneers to design, develop and deliver new products. Success depends not only on an inspired, dedicated team of engineers, advanced computing technology and new materials, but also on angel investors or government agencies to provide grants for promising approaches to improved energy generation, distribution, monitoring, metering and consumption.

    In the fall of 2011, engineers from Princeton Power Systems (PPS), a New Jersey-based manufacturer of advanced power-conversion products and alternative-energy systems, demonstrated their latest green power product. This demand response inverter (DRI) was the result of a three-year collaboration among PPS, the United States Department of Energy and Sandia National Laboratories' Solar Energy Grid Integration Systems (SEGIS).

    The resulting multiterminal DRI (Figure 1) is uniquely flexible, built to be more reliable, more efficient and more cost-effective than currently available inverters. Equipped with multiple AC and DC terminals, the DRI can route power to the grid, a microgrid, DC energy storage or dynamic loads. Programmable power curves and charge profiles enhance control for generators, loads and batteries, ensuring greater efficiency. And the use of advanced high-capacity, long-lifespan switches maximizes reliability.

    Princeton Power Systems showcased features of the DRI that improve electrical-grid interconnectivity and efficiency, enhance the performance of renewable energy systems and allow for better integration of electric vehicles and distributed power generation. The DRI was part of the company's "An Island in the Sun" microgrid demonstration (Figure 2), which detailed key advancements in clean technology and manufacturing, including a 200-kilowatt solar array and lithium-ion battery system.

    A microgrid can operate independently of a major utility grid to supply reliable, low-carbon-emission energy. PPS' DRI is compatible with AC generators such as diesel or gas, and with photovoltaic (PV) or wind inputs. A small community using a DRI is less dependent on the grid and can reduce its carbon footprint and utility costs. The DRI can also provide grid services, PV with storage and charging for electric vehicles.

    XILINX SPARTAN TECHNOLOGY

    To meet the demands of industrial product design, companies like Princeton Power Systems leverage flexible development vehicles such as Xilinx's Targeted Design Platforms (TDPs), with their rich ecosystem of design services support. In this case, however, the engineering team faced an initial challenge of determining how to expand the inputs and outputs of the DRI system's digital signal processor, and how to implement control and communication interfaces that functioned in parallel. PDS Consulting offers design services in programmable digital systems for a variety of markets including aerospace and defense, broadcast, industrial, scientific and medical. The firm supported work on the project as a member of Xilinx's Alliance Program.

    Figure 1: The flexibility in Princeton Power Systems' demand response inverter comes from FPGAs.

    The PDS Consulting team provided on-site, hands-on system debug and PCB bring-up, as well as off-site RTL and IP design services. We also advised Princeton Power Systems' developers on how to implement the system control interface for their green power control algorithm. In the end, engineers chose a Xilinx Spartan XC3SD3400A FPGA married to a DSP as a prime system control component (Figure 3).

    The Spartan-3A FPGA, with its extensive SelectIO capabilities, offered flexibility in implementation, particularly for trigger signals and ADC input channels. Xilinx's Spartan-3A family is a superior alternative to mask-programmed ASICs because these FPGAs permit design upgrades in the field and avoid the high initial cost, lengthy development cycles and inherent inflexibility of conventional ASICs. The integrated technology afforded by the Spartan-3A made the implementation of Princeton Power Systems' patented control algorithm for green power conversion a possibility.

    It took more than 300 I/Os to implement the DRI system interface, which enabled access to 8 Mbytes of flash, a 256-Mbit SDRAM and USB/RS-232 at more than 900 kbits/second. In addition, the team utilized the generous amount of fast, distributed 32-bit dual-port RAM inherent to the Spartan architecture. The configurable logic block (CLB) lookup tables used as dual-port RAMs enabled the efficient local storage of new energy waveform samples that the ADCs supplied, while the DSP read the previous samples and a PicoBlaze embedded processor concurrently analyzed new values from the second port.
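As a software analogy for this arrangement, the sketch below keeps two banks of samples: one is filled with new ADC values while the other holds the previous, complete set for a reader. The bank-swapping scheme, the class name and the buffer depth are assumptions made for illustration; the article does not describe how the PPS design organizes its dual-port RAM.

```python
class PingPongBuffer:
    """Two banks: a writer fills one while readers see the other (assumed scheme)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.banks = [[0] * depth, [0] * depth]
        self.write_bank = 0   # bank currently being filled by the ADC side
        self.write_index = 0

    def write_sample(self, value: int) -> bool:
        """Store one ADC sample; return True when a bank has just been completed."""
        self.banks[self.write_bank][self.write_index] = value
        self.write_index += 1
        if self.write_index == self.depth:
            self.write_index = 0
            self.write_bank ^= 1  # swap: the full bank becomes the readable one
            return True
        return False

    def read_previous(self) -> list:
        """Return a copy of the most recently completed bank (the reader's view)."""
        return list(self.banks[self.write_bank ^ 1])


buf = PingPongBuffer(depth=4)
for sample in [10, 11, 12, 13]:
    buf.write_sample(sample)
buf.write_sample(20)        # the next waveform starts filling the other bank
print(buf.read_previous())  # [10, 11, 12, 13]
```

In hardware the two ports operate truly in parallel; the point of the sketch is only that the reader never sees a half-written set of samples.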

    THE BENEFITS OF XILINX FPGAS

    Princeton Power Systems' algorithms required extensive calculations that can only be accomplished by floating-point DSPs, which traditionally do not have the same features as FPGAs. Some of the features of Xilinx FPGAs that particularly suited the PPS project included multivoltage, multistandard SelectIO I/O pins; configurable logic blocks; block RAM; and memory interfaces that can implement a large number of programmable trigger signals. These signals generate and execute pulse trains that trigger power electronic switches like IGBTs and control a large number of fast ADC channels to read important system measurements on every pulse or custom high-speed serial interfaces.

    FPGAs not only allowed Princeton Power Systems to design and implement custom peripherals that matched its specific requirements, but also provided additional computational resources for the processing of input values, which otherwise would have to be done by the DSP. The Spartan-3 FPGA-based design completes several processes: It performs system error checking using the values read from ADCs connected to the DSP. It implements timer-driven activities like reading ADCs precisely when necessary. And it averages ADC values.
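The averaging and error-checking duties can be pictured with a small software model. This is a hedged sketch only: the window length, the limits and the `AdcChannelMonitor` name are hypothetical, and the actual design performs these steps in FPGA logic rather than in code like this.

```python
from collections import deque


class AdcChannelMonitor:
    """Running average plus range check for one ADC channel (hypothetical)."""

    def __init__(self, window: int, low: int, high: int) -> None:
        self.samples = deque(maxlen=window)  # oldest sample drops out automatically
        self.low = low
        self.high = high

    def add_sample(self, value: int) -> tuple:
        """Record one reading; return (running average, in_range flag)."""
        self.samples.append(value)
        average = sum(self.samples) / len(self.samples)
        in_range = self.low <= value <= self.high
        return average, in_range


monitor = AdcChannelMonitor(window=4, low=0, high=4095)
for reading in [100, 102, 98, 100]:
    average, ok = monitor.add_sample(reading)
print(average, ok)  # 100.0 True
```

Offloading this kind of per-sample bookkeeping is what frees the DSP for the control algorithm itself.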

    Without the FPGA, some of these functional requirements would have been impossible to implement. Other functionalities would have required more components on the DRI's control board or a significantly more complex software architecture. The PPS team knew it was crucial to avoid the latter, since the control board acts as the heart of the DRI system.

    Figure 2: Princeton Power's flexible, multiterminal DRI is here configured for an electrical microgrid. [diagram labels: PV Connection, DC Energy Storage, AC Grid Connection, Motor/Generator]

    "While an increasing number of DSPs now offer peripherals that were previously absent, the importance of having an FPGA still remains," said Frank Hoffmann, the R&D manager at Princeton Power Systems. "With each new generation, the amount of computational resources inside the FPGA increases (for example, from a Spartan-3 to a Spartan-6), and it has now become possible to outsource more computational work to the FPGA. And this could mean running our complex control algorithms faster and therefore improving the quality of a generated output like the one in the DRI."

    THE BOTTOM LINE

    While the technical benefits of using an FPGA are clear (quick prototyping, flexible architecture, advanced support tools like Xilinx's ChipScope Integrated Logic Analyzer for quick in-system debug), the decision has also affected Princeton Power Systems' bottom line.

    "Using an FPGA has made development much faster, reducing R&D expenses and time-to-market for new and innovative alternative-energy systems," said executive vice president Darren Hammell. "The programming environment was easy to use and enabled us to rapidly develop and test our innovative software. This enabled us to complete the prototype for the demonstration much quicker than otherwise would have been possible." The product is now shipping, and PPS has added two new customers: BMW and SuperPlug have included a DRI in new power system designs.

    In fields like green power technology, engineers face new challenges, including determining how to optimize algorithm implementation while retaining necessary functionality. With the right tools, technology and team, enhancements in this field lie just within reach.

    For more information on Princeton Power's multiterminal DRI, please visit http://www.princetonpower.com/prod_demand.shtml.

    You can reach PDS Consulting at [email protected]

    Figure 3: Engineers chose a Spartan-3A FPGA, with its extensive SelectIO capabilities, as the main system peripheral. [block diagram labels: PV, Battery, Grid, Load Port, Power Electronics (DC/DC buck/boost bridges, AC-to-DC bridges), Sensors, Triggers, FPGA, DSP, Internet, Remote Logging & Monitoring, Operator/User]


    XCELLENCE IN SOLID-STATE DISKS

    Designing a 19-nm Flash PCIe SSD with Kintex-7 FPGAs

    A solid-state disk design based on PCI Express gains speed and performance thanks to Xilinx 7 series devices.

    by Yilei Wang
    Senior Hardware Engineer
    Memblaze China [email protected]

    Xiangfeng Lu
    CTO
    Memblaze China [email protected]

  • Solid-state disk (SSD) technology based on NAND flash memory provides higher throughput and lower power consumption than traditional mechanical-drive-based storage systems. For that reason, SSD usage has mushroomed over the last decade, moving from handheld devices to laptop and desktop computers and, now, making incursions into the enterprise storage market. The rapid rate of expansion has been further aided by the enterprise storage industry's adoption of SSDs based on the Serial Advanced Technology Attachment (SATA) standard.

    However, as SSD manufacturers look toward next-generation systems that achieve new performance and density highs by using flash memory that is implemented in 19-nanometer process technology, SATA hasn't kept up. Even with the latest revision (SATA 3.0), the 6-Gbps physical interface hardly meets the highest throughput of the SSD NAND flash arrays, and thus leaves extra performance on the table.

    To break the interface bottleneck, SSDs based on PCI Express are making a huge impact on the market. PCIe is an industry-standard local bus with higher performance and scalability than SATA. It is based on multilane high-speed serial links that support one to 16 lanes, each operating at a raw line rate of up to 8 Gbps (2.5 Gbps for Gen1, 5 Gbps for Gen2, 8 Gbps for Gen3). The PCIe interface for SSDs supports gigabyte-per-second throughput and better margins for the foreseeable future as NAND flash technology evolves.
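As a rough worked example of these figures, the snippet below multiplies the per-lane rate by the lane count and then applies the line-code efficiency. The encoding efficiencies (8b/10b for Gen1 and Gen2, 128b/130b for Gen3) come from the PCIe specifications rather than from this article, and the function name is hypothetical. Packet and protocol overhead is ignored, so real throughput is lower than what this prints.

```python
# Per-lane raw line rate in Gbps, as quoted in the text.
LINE_RATE_GBPS = {1: 2.5, 2: 5.0, 3: 8.0}
# Line-code efficiency (not from the article): 8b/10b for Gen1/Gen2, 128b/130b for Gen3.
ENCODING_EFFICIENCY = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130}


def pcie_bandwidth(gen: int, lanes: int) -> tuple:
    """Return (raw Gbps, payload GB/s) for one direction of a PCIe link."""
    raw_gbps = LINE_RATE_GBPS[gen] * lanes
    payload_gbytes = raw_gbps * ENCODING_EFFICIENCY[gen] / 8  # bits -> bytes
    return raw_gbps, payload_gbytes


raw, payload = pcie_bandwidth(gen=2, lanes=8)
print(f"Gen2 x8: {raw:.0f} Gbps raw, about {payload:.1f} GB/s after line coding")
# Gen2 x8: 40 Gbps raw, about 4.0 GB/s after line coding
```

By the same arithmetic, SATA 3.0's 6-Gbps link, which also uses 8b/10b encoding, carries about 0.6 GB/s.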

    However, creating a PCIe-based SSD system using 19-nm flash has its share of challenges. The PCIe interface requires more high-speed serial links and more-complex interconnect than SATA. The throughput demands require the PCIe direct memory access (DMA) to operate at a gigabyte bandwidth level. In addition, at the 19-nm process node, flash reliability, or specifically the metric known as wear (the number of times a NAND cell can be read or written before encountering an error), is a growing issue. At 19 nm, companies must perform wear leveling and error correction faster than ever before.

    Xilinx Kintex-7 FPGAs establish a new benchmark for FPGA high-end performance at less than half the price of previous-generation FPGAs. The Kintex-7 family is one of four product lines Xilinx built using TSMC's HPL (high-performance, low-power) 28-nm process, designed for maximum power efficiency and delivering a twofold price/performance improvement while consuming 50 percent less power than previous generations. Kintex-7 FPGAs offer high-density logic, high-performance transceivers, memory and DSP, plus Agile Mixed Signal, all to enable higher system-level performance and the next level of integration. These capabilities allow for continued innovation and differentiation in designs at volume price points. As such, Xilinx's Kintex-7 series FPGAs are ideally suited for use as 19-nm flash PCIe SSD controllers.

    Figure 1: The Kintex-7 SoC solution for a PCIe 19-nm NAND flash SSD consists of three subsystems: CPU, storage and PCIe SG-DMA. [block diagram of the Kintex-7 325T design: the PCIe SG-DMA subsystem (7 series PCIe core, Gen2 x8), the CPU subsystem (MicroBlaze 0 and 1) and the storage subsystem (19-nm flash controller and flash arrays, wear leveling, ECC), connected by AXI4 and AXI4-Lite buses, with DDR3 memory]

    Figure 1 shows the Memblaze SSD controller architecture, featuring three subsystems interconnected with a high-speed AXI4 bus. The PCIe SG-DMA subsystem, which includes the Kintex FPGA hard core, scatters and gathers data between the host computer and the SSD data buffer (the "SG" stands for "scatter and gather"). The CPU subsystem manages peripherals and executes SSD access commands, while the storage subsystem manages the SSD sector data processing with a multichannel NAND controller, error-correcting code (ECC) block and wear-leveling block. These three subsystems share a 2-Gbyte DDR3 SDRAM with ECC function. It's easy to generate an ECC DDR3 SDRAM controller with Xilinx Memory Interface Generator (MIG) tools.

    In our design, the 7 series PCIe hard core implements the physical-to-TLP layer and allows the design to function as a high-performance PCIe endpoint with minimal latency. The new embedded MicroBlaze core with ARM AXI4 interconnect completely removes the bottlenecks of the on-chip bus. The DDR3 hard core provides a 51.2-Gbps ECC solution for the disk cache. Meanwhile, the low-power logic resources make it easy to achieve high performance of wear leveling and intelligent ECC algorithm execution. In addition, abundant high-performance I/O resources provide an easy way to interconnect to the 19-nm NAND flash arrays.

    PCI EXPRESS SG-DMA

    Our design's PCIe interface required a fast DMA controller to implement high-speed communications between the host and the local AXI4 bus. The throughput of the SSD flash arrays can reach up to 2.5 GBps. To simplify the PCIe interface design and get greater margin as flash chips evolve, we chose to use an eight-lane PCIe Gen2/Gen3 architecture.

    The PCIe endpoint has many complex protocols to process in the physical, data link and transaction layers. Luckily, designing the PCIe SG-DMA controller in the Xilinx 7 series FPGAs was quick and easy. The PCIe hard core, which Xilinx had implemented in the device's fabric, handled all PCIe operations. This allowed our design team to focus on the functions of the SG-DMA operation itself. The integrated block for the PCIe solution supports one-lane, two-lane, four-lane and eight-lane endpoint configurations at speeds up to 5 Gbps per lane (Gen2), compliant with the PCIe Base Specification, rev. 2.1. Table 1 shows the integrated block for PCIe configurations across the 7 series FPGAs. The core can be configured as Gen1/Gen2 with up to eight lanes (x8), providing up to 40-Gbps bandwidth.

    We used CORE Generator tools to configure and generate the PCIe endpoint IP, which includes the user guide, source code, simulation code and example design, all of which helped us get up to speed quickly using the core. Figure 2 shows the PCIe hard core's top-level functional blocks and interfaces.

    Figure 2: Top-level functional blocks and interfaces in the PCI Express hard core [block diagram of the LogiCORE IP 7 Series FPGAs Integrated Block for PCI Express (PCIE_2_1): user logic, host interface, physical layer control and status, TX and RX block RAM, transceivers, clock and reset, PCI Express fabric; interfaces: AXI4-Stream, Physical (PL), Configuration (CFG), Optional Debug (DRP), System (SYS), PCI Express (PCI_EXP)]

    The main function of the SG-DMA core is to process TLP packets from the host and respond. SG-DMA operates as a PCIe master that accesses the host memory, moving data between the host and local memory. The host sends commands to the DMA controller to control the DMA access. The command code is embedded in the data of a specific host TLP register write. The SG-DMA controller initiates the SG-DMA write request to move data from the local memory to the host memory in response to the host's read commands. Similarly, for host write commands, the SG-DMA controller initiates a DMA read request to move data from the host memory to local memory. Figure 3 illustrates the flow.
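The decision flow that Figure 3 depicts can be sketched in a few lines of code. This is a minimal model under stated assumptions, not the Memblaze implementation: the register address, the command codes, the `RegisterTlp` type and the action strings are all hypothetical, and a real controller would issue TLPs rather than return descriptions of them.

```python
from dataclasses import dataclass
from typing import Dict, List

DMA_COMMAND_ADDR = 0x00   # hypothetical address of the DMA command register
CMD_HOST_READ = 1         # hypothetical command codes carried in the write data
CMD_HOST_WRITE = 2


@dataclass
class RegisterTlp:
    """A host register access carried in a TLP (simplified, hypothetical)."""
    is_write: bool
    addr: int
    data: int = 0


def handle_register_tlp(tlp: RegisterTlp, regs: Dict[int, int]) -> List[str]:
    """Return the actions the controller would take for one register TLP."""
    if not tlp.is_write:
        # Register read: answer with a completion carrying the register value.
        return [f"send completion with value {regs.get(tlp.addr, 0)}"]
    if tlp.addr != DMA_COMMAND_ADDR:
        # Ordinary register write: just store the value.
        regs[tlp.addr] = tlp.data
        return ["set register"]
    if tlp.data == CMD_HOST_READ:
        # Host read -> DMA write: push local data into host memory.
        return ["send TLP write request with data to host"]
    if tlp.data == CMD_HOST_WRITE:
        # Host write -> DMA read: request host data, then wait for completions.
        return ["send TLP read request", "await completion with data from host"]
    return ["ignore unknown command"]


regs: Dict[int, int] = {}
print(handle_register_tlp(RegisterTlp(True, DMA_COMMAND_ADDR, CMD_HOST_READ), regs))
print(handle_register_tlp(RegisterTlp(True, DMA_COMMAND_ADDR, CMD_HOST_WRITE), regs))
```

Note the inversion the text describes: a host read is served by a DMA write toward the host, and a host write by a DMA read from the host.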

    AXI4 INTERCONNECT

    The AXI interconnect IP connects one or more AXI memory-mapped master devices to one or more memory-mapped slave devices. The AXI interfaces conform to the AMBA AXI version 4 specifications from ARM, including the AXI4-Lite control register interface subset. The interconnect IP is intended for memory-mapped transfers only; AXI4-Stream transfers are not applicable. The AXI interconnect IP can be used as a pCORE from Xilinx's Embedded Development Kit (EDK) or as a

    Figure 3: Operation of the SG-DMA controller [flowchart steps: Receive TLP register access; TLP write?; DMA command?; DMA write?; Send TLP write request with write data to host; Send TLP read request; Await TLP DMA completion with read data from host; Set register with TLP value; Send TLP register complete with register value]

    Table 1: 7 Series FPGA integrated blocks for PCI Express

                                  Artix-7   Kintex-7   Virtex-7 T   Virtex-7 XT   Virtex-7 HT
    PCIe Gen (integrated block)*  Gen2      Gen2       Gen2         Gen3          Gen3
    Width                         x4        x8         x8           x8            x8
    Number of Blocks              1         1          3-4          2-4           1-3
    Serial Data Rate (Gbps)       5         5          8            8             8

    *Based on symmetric filter implementation

  • every write to a previously written block requires the block to be read, erased, modified and rewritten to the same location. This is very time-consuming, and highly written locations will wear out quickly, even when other locations on the flash are completely unused. Once a few blocks reach their end of life, the drive is no longer operable.

    The first type of wear leveling is called dynamic wear leveling. It uses a map to link logical block addresses (LBAs) from the OS to the physical flash memory. Each time the OS writes replacement data, the map is updated to mark the original physical block as invalid data, and a new block is linked to that map entry. Each time a block of data is rewritten to the flash memory, it is written to a new location. However, blocks that never get replacement data sit with no additional wear on the flash memory. The drive may last longer than one with no wear leveling, but some blocks, while still remaining active, will go unused.
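A toy model makes the behavior of dynamic wear leveling easy to see. The sketch below is an illustration under our own assumptions, not the controller described in this article: the class name is hypothetical, the policy of picking the least-erased free block is one common choice among several, and the model assumes spare free blocks always exist.

```python
class DynamicWearLeveler:
    """Toy LBA-to-physical map; every rewrite goes to a fresh block."""

    def __init__(self, num_blocks: int) -> None:
        self.erase_counts = [0] * num_blocks
        self.free_blocks = list(range(num_blocks))
        self.lba_map = {}  # logical block address -> physical block

    def write(self, lba: int) -> int:
        """Write (or rewrite) one logical block; return the physical block used."""
        # Pick the least-erased free block so rewrites spread across the flash.
        new_block = min(self.free_blocks, key=lambda b: self.erase_counts[b])
        self.free_blocks.remove(new_block)
        old_block = self.lba_map.get(lba)
        self.lba_map[lba] = new_block
        if old_block is not None:
            # The old copy is now invalid; erase it and return it to the pool.
            self.erase_counts[old_block] += 1
            self.free_blocks.append(old_block)
        return new_block


ftl = DynamicWearLeveler(num_blocks=4)
ftl.write(lba=0)        # static data, written once
for _ in range(9):
    ftl.write(lba=1)    # hot data, rewritten repeatedly
# Block 0 keeps the static data and is never erased;
# the hot data cycles over the remaining three blocks.
print(ftl.erase_counts)
```

The untouched block holding the static data is exactly the limitation the paragraph above points out.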

    Another technique, called static wear leveling, also uses a map to link the LBA to a physical memory address. Static wear leveling works the same way as dynamic wear leveling, except that the static blocks that do not change are periodically moved so that other data can use these low-usage cells. This rotational effect enables the SSD to operate until most of the blocks are near their end of life.
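The periodic relocation step can be added to a toy model in a few lines. As before, this is a sketch under assumptions rather than the design described here: the class name, the `max_gap` trigger and the choice to park cold data on the most-worn free block are all hypothetical, and the model assumes spare free blocks always exist.

```python
class StaticWearLeveler:
    """Toy model: dynamic remapping plus periodic relocation of cold blocks."""

    def __init__(self, num_blocks: int, max_gap: int = 2) -> None:
        self.erase_counts = [0] * num_blocks
        self.free_blocks = list(range(num_blocks))
        self.lba_map = {}       # logical block address -> physical block
        self.max_gap = max_gap  # assumed trigger: allowed spread in erase counts

    def _take_free(self, most_worn: bool = False) -> int:
        pick = max if most_worn else min
        block = pick(self.free_blocks, key=lambda b: self.erase_counts[b])
        self.free_blocks.remove(block)
        return block

    def _release(self, block: int) -> None:
        self.erase_counts[block] += 1   # the block is erased before reuse
        self.free_blocks.append(block)

    def write(self, lba: int) -> None:
        old = self.lba_map.get(lba)
        self.lba_map[lba] = self._take_free()
        if old is not None:
            self._release(old)
        self._relocate_cold_data()

    def _relocate_cold_data(self) -> None:
        """Move data off the least-worn mapped block so other data can use it."""
        if not self.free_blocks or not self.lba_map:
            return
        cold_lba = min(self.lba_map, key=lambda l: self.erase_counts[self.lba_map[l]])
        cold_block = self.lba_map[cold_lba]
        if max(self.erase_counts) - self.erase_counts[cold_block] > self.max_gap:
            # Park the cold data on the most-worn free block.
            self.lba_map[cold_lba] = self._take_free(most_worn=True)
            self._release(cold_block)


ftl = StaticWearLeveler(num_blocks=4)
ftl.write(0)            # static data, written once
for _ in range(30):
    ftl.write(1)        # hot data, rewritten repeatedly
print(ftl.erase_counts) # every block now shares in the wear
```

Compared with a purely dynamic scheme, the block that first held the static data is no longer left at zero erases, at the cost of the extra copy performed during each relocation.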

    Figure 4 shows flash pages with and without wear leveling after a long write/erase operation. The one without wear leveling, with black pages, is broken and can no longer record any data, while the one with wear leveling still functions with all pages.

    INTELLIGENT ECC ALGORITHM

    Another key component of SSD system design is error correction. There are a number of anomalies that can cause bit errors, which in turn can affect data integrity and even the proper operation of the system itself. To deal with these errors, our design team employs complex ECC algorithms that get even more

    standalone core from Xilinx's CORE Generator IP catalog.

    The designer can select from two modes of operation that the Xilinx AXI4 IP supports. The performance-optimized crossbar mode has a shared-address, multiple-data (SAMD) crossbar architecture with parallel pathways for write and read data channels. The area-optimized shared-access mode features shared write data, shared read data and single shared address pathways. Both of these modes support burst lengths up to 256 for incremental (INCR) bursts, and variable data width from 32 up to 1,024 bits. Propagated USER signals are also supported on each channel, if any; an independent USER signal width per channel is optional.

    The AXI4 interconnect provides high performance between the PCIe SG-DMA and the DDR3 memory. We found that the AXI4-Lite shared bus is also a perfect solution for the low-speed on-chip interconnect, requiring minimal consumption of logic resources.

    WEAR-LEVELING TECHNOLOGY

    Wear leveling is a design technique that storage-media companies employ to prolong the service life of various kinds of erasable computer storage, such as the flash memory used in solid-state drives. There are a few wear-leveling mechanisms used in flash memory systems, each with varying levels of longevity enhancement.

    A flash memory storage system without wear leveling will not last very long if it is writing data to the flash. Without wear leveling, the flash controller must permanently assign the logical addresses from the operating system (OS) to the physical addresses of the flash memory. This means that

    Figure 4: Flash pages with and without wear leveling [panels: without wear leveling; with wear leveling]

    elaborate when we use new, smaller-geometry flash in these systems.

    One ECC algorithm we use for 19-nm NAND flash memory is called an anti-random data error record. The algorithm addresses bit errors caused by temperature changes, noise and the reliability of the storage cell. In addition, a NAND flash storage cell normally tolerates only a limited number of erase/program cycles. The bit error rate (BER) increases as erase/program operations accumulate, until that limited lifetime runs out. The SSD ECC function therefore also requires the algorithm to detect the BER of each cell and to track its remaining lifetime. Designers set a certain BER threshold to indicate that a lifetime has been reached and to identify a replacement block. However, optimizing this threshold