of 68/68
R Issue 62 Fourth Quarter 2007 Xcell journal Xcell journal SOLUTIONS FOR A PROGRAMMABLE WORLD SOLUTIONS FOR A PROGRAMMABLE WORLD COVER Processing Signals from Outer Space with BEE2 INSIDE Easy FPGA Development Prototyping Image Processing Applications Boosting Wireless Subsystem Performance with FPGA Co-Processing Integrating HDL Design and Verification with System Generator Audio Sample Rate Conversion in FPGAs COVER Processing Signals from Outer Space with BEE2 INSIDE Easy FPGA Development Prototyping Image Processing Applications Boosting Wireless Subsystem Performance with FPGA Co-Processing Integrating HDL Design and Verification with System Generator Audio Sample Rate Conversion in FPGAs www.xilinx.com/xcell/ XtremeDSP Solutions: The Sky’s the Limit XtremeDSP Solutions: The Sky’s the Limit HIGH-PERFORMANCE DSP EDITION

Xcell - Xilinx · Xcell journal Fourth Quarter 2007 SOLUTIONS FOR A PROGRAMMABLE WORLD COVER Processing Signals from Outer Space with BEE2 INSIDE Easy FPGA Development Prototyping

  • View
    6

  • Download
    0

Embed Size (px)

Text of Xcell - Xilinx · Xcell journal Fourth Quarter 2007 SOLUTIONS FOR A PROGRAMMABLE WORLD COVER...

  • R

    Issue 62Fourth Quarter 2007

    Xcell journalXcell journalS O L U T I O N S F O R A P R O G R A M M A B L E W O R L DS O L U T I O N S F O R A P R O G R A M M A B L E W O R L D

    COVER

    Processing Signalsfrom Outer Spacewith BEE2

    INSIDE

    Easy FPGA Development

    Prototyping Image Processing Applications

    Boosting Wireless Subsystem Performance with FPGA Co-Processing

    Integrating HDL Design and Verification with System Generator

    Audio Sample RateConversion in FPGAs

    COVER

    Processing Signalsfrom Outer Spacewith BEE2

    INSIDE

    Easy FPGA Development

    Prototyping Image Processing Applications

    Boosting Wireless Subsystem Performance with FPGA Co-Processing

    Integrating HDL Design and Verification with System Generator

    Audio Sample RateConversion in FPGAs

    www.xilinx.com/xcell/

    XtremeDSP Solutions:The Sky’s the LimitXtremeDSP Solutions:The Sky’s the Limit

    H I G H - P E R F O R M A N C E D S P E D I T I O N

  • Support Across The Board.™

    Add Functionality to Your Prototype Platform

    Available EXP Modules

    • EXP to P160 Adapter

    • EXP Prototype

    • Analog Devices EXP Adapter Module

    • Video EXP Module

    • High-Speed ADC EXP Module

    • High-Speed DAC EXP Module

    • Interface EXP Module

    EXP Baseboards

    • Xilinx Spartan-3A DSP Starter Kit

    • Xilinx Spartan-3 PCI Express Starter Kit

    • Xilinx Virtex-4 FX60 PCI Express Board

    • Xilinx Virtex-5 LX Development Kit

    • Xilinx Virtex-5 LXT PCI Express Board

    • Xilinx Virtex-5 SXT PCI Express Board

    Avnet Electronics Marketing introduces the EXP specification –

    a versatile expansion interface for FPGA development boards

    that allows designers to add application specific daughter

    cards and easily connect to FPGA I/Os.

    Key EXP Features

    • Royalty free, public domain specification

    • Full and half card options

    • 168 (full module) and 84 (half module) user I/O

    • Single-ended and differential pair signaling

    • High-performance: > 100 MHz single-ended and

    > 300 MHz differential

    Learn more about EXP at www.em.avnet.com/exp

    Enabling success from the center of technology™

    800-332-8638www.em.avnet.com

    © Avnet, Inc. 2007. Al l r ights reserved. AVNET is a registered trademark of Avnet, Inc.Al l other brand or product names are trademarks of their respect ive owners.

  • InnovationLeadership above all…

    Xilinx brings your biggest ideas to reality: Virtex™-5 FPGAs — Highest Performance. Leading the industry inperformance and density, the Virtex-5 family offers multiple plat-forms optimized for logic, serial connectivity, DSP and embeddedprocessing. Plus our Virtex™-5 EasyPath™ FPGAs give you a conversion-free, cost-reduction path for volume production.

    Spartan™-3 Generation FPGAs — Lowest Cost. A unique balanceof features and price for high-volume applications. Multipleplatforms allow you to choose the lowest cost device to fit yourspecific needs.

    CoolRunner™-II CPLDs — Lowest Power. Unbeatable for low-powerand handheld applications, CoolRunner-II CPLDs deliver more forthe money than any other competitive device.

    ISE™ Software — Ease-of-Design. With SmartCompile™ technology,users can achieve up to 6X faster runtimes, while preserving timing and implementation. Highest performance in the fastesttime — the #1 choice of designers worldwide.

    Visit our website today, and find out why Xilinx products areworld renowned for leadership . . . above all.

    www.xilinx.com

    At the Heart of Innovation©2007 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.

    Get started quickly with easy-to-use kits

    Order online at www.xilinx.com/getstarted

  • IL E T T E R F R O M T H E P U B L I S H E R

    Xilinx, Inc.2100 Logic DriveSan Jose, CA 95124-3400Phone: 408-559-7778FAX: 408-879-4780www.xilinx.com/xcell/

    © 2007 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands includedherein are trademarks of Xilinx, Inc. All other trade-marks are the property of their respective owners.

    The articles, information, and other materials includedin this issue are provided solely for the convenience ofour readers. Xilinx makes no warranties, express,implied, statutory, or otherwise, and accepts no liabilitywith respect to any such articles, information, or othermaterials or their use, and any use thereof is solely atthe risk of the user. Any person or entity using suchinformation in any way releases and waives any claim itmight have against Xilinx for any loss, damage, orexpense caused thereby. Forrest Couch

    Publisher

    PUBLISHER Forrest [email protected]

    EDITOR Charmaine Cooper Hussain

    ART DIRECTOR Scott Blair

    DESIGN/PRODUCTION Teie, Gelwicks & Associates1-800-493-5551

    ADVERTISING SALES Dan Teie1-800-493-5551

    TECHNICAL COORDINATOR Larry Caputo

    INTERNATIONAL Piera Or, Asia [email protected]

    Christelle Moraga, Europe/Middle East/[email protected]

    Yumi Homura, [email protected]

    SUBSCRIPTIONS All Inquirieswww.xcellpublications.com

    REPRINT ORDERS 1-800-493-5551

    Xcell journal

    www.xilinx.com/xcell/

    A New Era of Signal ProcessingIt is no accident that Xilinx® FPGAs serve an increasingly vital role in the design and developmentof today’s most demanding digital signal processing (DSP) systems. Superior performance, system-level cost, power efficiency, faster time to market, and unrivaled flexibility are the hallmarks ofFPGA-based DSP designs – value propositions that have found increasingly appreciative receptionamong leaders in markets like the communications industry.

    Driven by the global demand for higher quality, higher bandwidth, and inexpensive wired andwireless communications of voice, computer, and video data, the number and complexity of newcommunications standards has grown exponentially. This is in large part because of the need forinteroperability and data exchange across myriad layers of legacy and next-generation networks.Keeping pace with these standards and the extremely critical price/performance/power ratios theypose has been anything but trivial for system vendors. Nonetheless, the flux continues to produceopportunities for industrious innovators willing to tackle the challenge.

    In the dynamic markets served by high-performance DSP solutions, the inherent flexibility of theFPGA equates to:

    • Faster time to market with leading-edge algorithmic solutions and standards implementations

    • Lowered operational costs with easy-to-perform remote adaptation to unforeseen environmentaland functional changes

    • Reduced capital expenditures due to extended life cycles for existing designs

    • Easy migration to keep pace with changing customer demands and market requirements

    In This IssueThis high-performance DSP edition of Xcell Journal offers several interesting – and in some casesout of this world – applications of high-performance FPGA-based DSPs. For example, the coverstory shows how several UC Berkeley astronomers programmed the BEE2 platform using aprogramming environment that leverages Linux, Xilinx System Generator, and Xilinx EDK foradvanced radio telescope applications. Other articles discuss the growing ease of programmingan FPGA-based DSP design, interesting system-design challenges using development platformsfor FPGA-based image processing applications, analyzing different hardware/software partitioningschemes to boost wireless subsystem performance, designing and verifying the CABAC functionalityof an H.264 video encoder with System Generator for DSP, and an efficient implementation ofaudio algorithms in programmable logic.

    As Jeff Bier, president of BDTI, notes, “Today, FPGAs play an increasing role in a wide range ofDSP applications. We expect this trend to continue over the next several years.” Xilinx sees an ever-broadening horizon for high-performance DSP processing solutions in the demanding digital com-munications, multimedia video and imaging, and defense systems markets. And XilinxXtremeDSP™ solutions offer support every step of the way, with advanced DSP-optimized FPGAdevices, hardware and software development tools, development kits, and evaluation platforms, aswell as comprehensive training and support to ensure your success.

    We hope you enjoy this issue.

  • O N T H E C O V E R

    3131F E A T U R E

    3838

    F E A T U R E

    99V I E W P O I N T

    Easy FPGA DevelopmentIf current trends continue, FPGAs may become easier to program than DSPs.

    4444F E A T U R E

    Integrating HDL Design and Verification with System GeneratorDesigning and verifying the CABAC functionality of an H.264 video encoder is easy with System Generator for DSP.

    5050F E A T U R E

    Audio Sample Rate Conversion in FPGAsAn efficient implementation of audio algorithms in programmable logic.

    Prototyping Image Processing ApplicationsDevelopment platforms for FPGA-based image processing applica-tions present some interesting system-design challenges.

    Boosting Wireless Subsystem Performance with FPGA Co-ProcessingAnalyzing different hardware/software partitioning schemes.

    Processing Signals from Outer Space with BEE2 UC Berkeley astronomers programmedthe BEE2 platform using a programmingenvironment that leverages Linux, XilinxSystem Generator, and Xilinx EDK.

    1313 Cover

  • F O U R T H Q U A R T E R 2 0 0 7, I S S U E 6 2 Xcell journalXcell journalVIEWPOINTS

    Letter from the Publisher. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    Vin Ratford: Savoring the Highs and Lows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

    Kenton Williston: Easy FPGA Develpoment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    Jeff Bier: DSP Performance of FPGAs Revealed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

    Ivo Bolsens: Virtual Worlds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

    FEATURES

    Processing Signals from Outer Space with BEE2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

    Prototyping Applications with the Spartan-3A DSP Starter Platform . . . . . . . . . . . . . . . . . . . . . 18

    Future-Proofing Military Applications Using FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    Using MATLAB to Create IP for System Generator for DSP . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

    Automatic IP Block Selection with IP-Explorer Technology. . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

    Prototyping Image Processing Applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    The Virtex-5 SXT Option for High-Performance Digital Signal Processing. . . . . . . . . . . . . . . . . . 34

    Boosting Wireless Subsystem Performance with FPGA Co-Processing . . . . . . . . . . . . . . . . . . . . 38

    Selecting the Right Memory Controller for Real-Time Applications . . . . . . . . . . . . . . . . . . . . . . 41

    Integrating HDL Design and Verification with System Generator . . . . . . . . . . . . . . . . . . . . . . . 44

    Accelerating System Development Cycles with the Radar Blockset Library . . . . . . . . . . . . . . . . 47

    Audio Sample Rate Conversion in FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

    RESOURCES

    Technical Papers and Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

    Development Boards and Kits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

    Education and Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

    Software Tools and IP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

    Technical Support. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

  • by Vin RatfordVice President and GeneralManager, PSG DivisionXilinx, [email protected]inx.com

    At around the time ofour last Xilinx DSPMagazine release, I

    took a much needed (and planned) sabbat-ical. As I departed, Xilinx was preparing tolaunch the first family of devices targeted atlower performance signal processing applica-tions. The Spartan™ DSP family debutedwith the Spartan-3A DSP 3400 device andsoon followed with the Spartan-3A DSP1800. While watching from sidelines, therollout looked great and sampling wasunderway for both devices upon their April2007 introduction.

    However, what really excited me wasnews that we beat the production releasedates for both devices, delivering them amonth ahead of schedule. In addition, thecrisp execution of the follow-up Spartan-DSP low-power version delivered on a keyvalue that is and will continue to drive high-performance signal processing decisions.

    The LP version delivers a power-efficientspecification that is an increasingly impor-tant value when meeting performanceneeds. DSP power efficiency refers to theamount of power consumed in performingsignal processing calculations. DSP powerefficiency measurements can be applied tosystems, functions, building blocks, and

    common operations. The Spartan3A-DSPLP delivers 4.06 GMACs/mW at a maxi-mum speed of 250 MHz when analyzingthe common multiply-and-accumulateoperation. Overall, the device delivers a50% static power savings and a 70% sav-ings while in suspend mode compared tonon-low-power devices. This complementsthe dynamic power advantage inherent inthe Spartan-3A DSP series given the inte-gration of dedicated DSP circuitry(DSP48A slices).

    With that execution theme and powerefficiency idea in mind, in this, the high-performance DSP edition of Xcell Journal,we dig further into the uses of theXtremeDSP™ solution.

    Xcell Journal is honored to again haveDSP industry icon Jeff Bier from BDTI

    expand on why FPGAs excel in signal pro-cessing, using independent benchmarkresults described in his company’s report.Also, we welcome DSP DesignLine EditorKenton Williston, who shares his views onwhat the future may hold for FPGAs anddigital signal processing. In addition, wethank all of our contributors in this issuefor their outstanding articles and insightin the use of our XtremeDSP solutions inthe areas of digital communications,video, and other application areas.

    I am excited to be back at work in theProcessing Solutions Group and look for-ward to delivering a host of new solutionsto meet your demands. I encourage you toexplore the latest advances in XtremeDSPsolutions and solicit your input in how wecan put this solution to work for you.

    8 Xcell Journal Fourth Quarter 2007

    Savoring the Highs and LowsHigh performance and low power distinguish our signal processing solutions.

    Viewfrom the top

  • by Kenton WillistonSite EditorDSP [email protected]

    The evolution ofFPGAs has paral-leled that of DSPs inmany ways. In the

    early days of DSP, programs were writtenentirely in assembly language. Today, DSPprogrammers typically start with a high-level language such as C and then opti-mize the “hot spots” using intrinsics orhand-coded assembly.

    Similarly, FPGA designs were oncecrafted in Verilog or VHDL. Today,FPGA designers often start with a high-level description in an ESL tool, optimiz-ing hot spots using IP blocks orhand-coded HDL.

    This transition is a good thing. Fiveyears ago, you couldn’t build much ofanything in an FPGA without hard-coreFPGA skills. This kept FPGAs out ofreach for most DSP programmers.Thanks to high-level tools, the averageDSP programmer can now make the leapto FPGA design with a reasonableamount of effort – and produce reason-ably optimal designs.

    Good examples of this trend are theAccelDSP™ synthesis tool and SystemGenerator for DSP tool, both from Xilinx,which integrate into the ubiquitous Simulinkenvironment. These tools allow DSP devel-opers to go straight from algorithm design toFPGA implementation in a familiar tool flow.

    FPGAs are also following the path ofDSPs in the area of intellectual property.Not long ago, DSP programmers had towrite all of their own code. Today, DSP

    developers can turn to outside sources for“commodity” software. FPGA design hasfollowed a similar path – designers whoonce had to develop applications on theirown now have access to IP blocks rangingfrom FFTs to video encoders. These IPblocks make development faster and easier,giving DSP designers a way to translate

    their algorithm knowledge into FPGAdesigns. For example, in SystemGenerator, designers can use the FFT IPblock to try out different FFT variants andexplore different trade-offs between per-formance and resource utilization.

    Finally, FPGA families are growing –much in the same way as DSP processorfamilies have grown – to include productsthat span a broad range of price and per-formance points. This makes FPGAs suit-able for a wider range of applications. Forexample, Xilinx® products range from theVirtex™-5 FPGA, which is suitable forhigh-performance DSP applications, tothe low-cost Spartan™-3A DSP device.

    The growing product lines also let design-ers migrate from one FPGA to another. Forexample, a design that starts out on a smallerFPGA can migrate to a larger FPGA if thedesign needs more functionality.

    Although the evolution of FPGAs hasmimicked that of DSP in many regards, thereis one area in which FPGAs are making a dra-

    matic departure. As processors move tosmaller process nodes, FPGA designers getmore gates to play with. This makes it easierand easier to produce a design with thedesired performance.

    In contrast, process scaling has reached apoint of diminishing returns for traditionalsingle-core processor architectures, forcingvendors to turn to multi-core architectures.Although multi-core processors are very pow-erful, they are much more difficult to programthan single-core architectures. Thus, DSPs aregenerally becoming harder to program whileFPGAs are becoming easier to program. If thistrend continues, we may see the day whenFPGAs are easier to program than DSPs.

    Fourth Quarter 2007 Xcell Journal 9

    Easy FPGA DevelopmentIf current trends continue, FPGAs may become easier to program than DSPs.

    V I E W P O I N T

  • by Jeff BierPresidentBDTI – Berkeley Design Technology, [email protected]

    FPGAs are increasingly considered for useas processing engines in high-performanceDSP applications such as wireless base sta-tions. In these applications, they may com-pete with DSP processors or workside-by-side with them.

    With more choices, system designersneed a clear picture of the signal processingperformance of high-end FPGAs, both rel-ative to each other and to high-end DSPprocessors. Unfortunately, the most com-monly used performance figures are unreli-able, confusing, and often contradictory.

    For example, because DSP applicationsoften rely heavily on multiply accumulate(MAC) operations, DSP processor andFPGA vendors sometimes use peak MACsper second as a simple metric for compar-ing digital signal processing performance.But MAC throughput is a lousy predictorof performance for FPGAs and DSPs alike.Let’s examine a few reasons why.

    Simple Metrics Fall ShortThe MAC performance numbers forFPGAs often assume that the hard-wiredDSP elements are operating at their highestpossible clock rate. In practice, typicalFPGA designs will operate at lower speeds.Plus, using hard-wired elements is not theonly way to implement MACs on FPGAs;you can achieve additional MAC through-put using programmable logic resourcesand distributed arithmetic. This approachcan yield higher MAC throughput thanusing hard-wired elements alone.

    Yet another consideration is that typicalDSP applications rely on many operationsother than MACs. Viterbi decoding, forexample, is a key DSP algorithm used intelecommunications applications thatmakes no use of MACs at all.

    Another approach for assessing signalprocessing performance is to use commonDSP functions like FIR filters. But thisapproach can have drawbacks too. Oneproblem is that each vendor typically usesa different implementation of these func-tions – perhaps using different datawidths, a different algorithm, or differentimplementation parameters such as laten-

    cy. This means that results from differentvendors are generally not comparable.

    Furthermore, small kernel functions typi-cally aren’t effective for FPGA benchmarkingbecause the way you would implement afunction within a full FPGA application isoften quite different from the way you wouldimplement the function alone. (For proces-sors, in contrast, these little benchmarks areusually pretty good at predicting overall DSPapplication performance.) Benchmarksimplemented by processor or FPGA vendorsoften lack independent verification, makingit difficult for engineers to make confidentcomparisons between devices.

    10 Xcell Journal Fourth Quarter 2007

    DSP Performance of FPGAs RevealedPopular report provides independent DSP benchmark results for FPGAs.

    V I E W P O I N T

  • Independent Benchmarks Fill the GapBDTI is the most respected source for sig-nal processing benchmarks. Our bench-marks are used by dozens of semiconductorvendors and thousands of chip users toevaluate, compare, and select signal pro-cessing engines. BDTI has benchmarkedthe signal processing performance ofprocessors for nearly 15 years, and hasexpanded its benchmarking activities toinclude FPGAs, multi-core chips, andother technologies.

    Several years ago BDTI recognized theneed for independent, accurate, apples-to-apples performance comparisonsbetween FPGAs and processors targetingDSP applications. To address this need,BDTI developed a new application-ori-ented benchmark, the BDTICommunications Benchmark (OFDM),based on an orthogonal frequency divi-sion multiplexing (OFDM) receiver. Thisbenchmark is designed to be representa-tive of the “baseband” signal processingworkloads increasingly found in commu-nications equipment for applications suchas DSL, cable modems, and wireless sys-tems. It is suitable for implementation onFPGAs, DSP processors, multi-core

    have suspected that FPGAs could providebetter chip-level cost/performance thanDSPs in some applications but were notsure how much better. In the report, BDTIalso compares cost/performance results forFPGAs from competing vendors.

    Of course, benchmark results alone arenot enough to answer the question ofwhether to use an FPGA in a new systemdesign, or which FPGA to choose.Designers need to understand how theirchoice of processing engine will affectdevelopment flow, implementation effort,and system design. For this reason,BDTI’s report explores the qualitative fac-tors that influence the decision of whetherto use an FPGA, a DSP, or both, and pro-vides guidance on how to make aninformed choice. The report also high-lights the key questions that will affect thelong-term success of FPGAs in high-endDSP applications, such as FPGA energyefficiency and the effectiveness of newhigh-level synthesis tools for FPGAs.

    ConclusionBDTI plans to conduct further analysis inthese areas and we will continue to evaluatethe signal processing capabilities of newFPGAs, DSPs, and massively parallel proces-sors. The competition among signal process-ing engines is heating up, and BDTI willcontinue to provide the data and analysisneeded to make confident choices.

    chips, and many other signal processingengines, yielding apples-to-apples bench-mark results.

    Last year, BDTI used the BDTICommunications Benchmark (OFDM) toevaluate several new high-performanceFPGAs and DSP processors. The full resultsof this analysis are published in our report,“FPGAs for DSP: Second Edition,” whichhas attracted considerable attention amongDSP system designers. The report includestwo sets of benchmark results: high-capacityresults (optimized to support the maxi-mum number of channels per chip) andlow-cost results (optimized for lowestcost per channel). Figure 1 shows nor-malized, low-cost results for a Xilinx®

    Virtex™-4 SX25 FPGA and a typicalhigh-performance DSP processor.

    As Figure 1 demonstrates, BDTI’sbenchmark results provide a dramaticdemonstration of the potential cost advan-tages of using FPGAs for high-performanceDSP applications. The Virtex-4 SX25device is more cost-effective by more thanan order of magnitude over a typical high-performance DSP processor on this bench-mark. This information is extremelyvaluable for system designers, who may

    Fourth Quarter 2007 Xcell Journal 11

    XilinxVirtex-4

    SX25 FPGA

    Typical High-Performance

    DSP Processor

    0

    10

    20

    30

    40

    BD

    TI C

    hann

    el/$

    (N

    orm

    aliz

    ed)

    Hig

    her

    is B

    ette

    r

    Figure 1 – BDTI Communications Benchmark (OFDM) BDTI-certified cost-/performance-optimized results

    BDTI’s report, “FPGAs for DSP,Second Edition,” is the authoritative

    source for independentFPGA signal processingbenchmark results. Xcell Journal readers canrequest detailed reportexcerpts via e-mail; send your request to

    [email protected] For more information about

    “FPGAs for DSP, Second Edition”(including an order form), visitwww.BDTI.com/fpgas2006. For more information about BDTI, visit www.BDTI.com.

    V I E W P O I N T

  • Why use FPGAs for signal processing?

    Why use FPGAs for signal processing?

    Explore the many benefits of usingXilinx high-performance DSPs in your digital communications,multimedia video and imaging,and defense systems applications.

    Xilinx offers a complete DSP processing solution, including devices, hardware and software development tools, reference designs, boards and kits,parameterizable algorithms (IP cores), training, and support.

    Download our XtremeDSP™ Solutions Selection Guide now and see how quickly you can get started.

    www.xilinx.com/publications/prod_mktg/pn0010944-3.pdf

    Download our XtremeDSP™ Solutions Selection Guide now and see how quickly you can get started.

    www.xilinx.com/publications/prod_mktg/pn0010944-3.pdf

  • by Chen Chang CTOBEEcube [email protected]

    Bob BrodersenProfessor EmeritusUC [email protected]

    John WawrzynekProfessorUC [email protected]

    Dan [email protected] Chief ScientistDirector, Center for Astronomy Signal ProcessingUC [email protected]

    Kees VissersPrincipal EngineerXilinx, [email protected]

    Modern radio telescopes look at the sky infrequency bands ranging from near zero to11 GHz. After processing, these kinds oftelescopes can provide information aboutvery interesting phenomena, like collidingblack holes, as shown in Figure 1.

    These telescopes were historically con-structed out of gigantic single-dish antennassuch as the Arecibo telescope. However,because of the prohibitive cost of steel forconstructing telescopes, Arecibo hasremained the single largest collecting area tel-escope in the world for more than 50 years.

    Fourth Quarter 2007 Xcell Journal 13

    Processing Signals from Outer Space with BEE2 Processing Signals from Outer Space with BEE2 UC Berkeley astronomers programmed the BEE2 platform using a programming environment that leverages Linux, Xilinx System Generator for DSP, and Xilinx EDK.UC Berkeley astronomers programmed the BEE2 platform using a programming environment that leverages Linux, Xilinx System Generator for DSP, and Xilinx EDK.

  • Since the invention of digital signal pro-cessing, radio telescope design using a largecollecting area has moved to large arrays(hundreds to thousands) of small-diameter(6-12 m) antennas, where the steel cost canbe balanced with the electronics cost. Theantennas are physically spread across largegeographic areas, providing extremely longand various length baselines and betterangular resolution. The Allen TelescopeArray near Hat Creek in NorthernCaliforinia is an example of this modernkind of telescope. In this article, we’ll focuson the signal processing with FPGAsemployed in these telescopes by the SETIInstitute and the Radio AstronomyLaboratory at UC Berkeley.

    Large-N Small-D Antenna Array CorrelatorAntenna arrays pose significant signal pro-cessing challenges. To form proper images,all signals from the antennas must be cor-related with each other, thus requiring acomputing complexity of O (N 2 ). Toachieve a 1-square-kilometer collecting arearequires more than 8,000 antennas, each12 meters in diameter.

    To correlate the whole 11 GHz band-width, the computational requirement ison the order of 1,000 peta-operations persecond, which is more than 3,000 times thecapability of the current fastest supercom-puter. Clearly, only a novel computingapproach can achieve this computationalthroughput at a reasonable cost.

    Furthermore, radio telescopes are typi-cally designed to operate for more than 30years. It is not necessary to invest in all ofthe electronics up-front to meet the band-

    ning at 200 MHz (400 DDR) with a 72-bitdata interface. Therefore, the peak aggregatememory bandwidth is 12.8 Gbps perFPGA. Each module uses four FPGAs forcomputation and one for control. The con-trol FPGA runs Linux OS on the embed-ded PowerPC 405 to manage the computeprocesses on each user FPGA, as well asmonitoring the overall module operation.

    The four user FPGAs are directly con-nected on a 2D grid, with each link pro-viding more than 40 Gbps of datathroughput, as shown in Figure 2. Thefour down links from the control FPGAto each of the computing FPGAs provideas much as 20 Gbps per link. All off-module connections use the MGTs onthe FPGA, four channel-bonded into 10Gigabit Base-CX4 Ethernet interfaces.There are a total of 18 such CX4 inter-faces on each BEE2 module – two fromthe control FPGA and four from each ofthe four compute FPGAs – for a total 180Gbps full duplex bandwidth. For applica-tions that require high bi-section band-width for random communication amongmany compute modules, the BEE2 sys-tem can be directly connected to com-mercial 10 Gigabit Ethernet switches. Inaddition, a 10 or 100 Base-T Ethernet

    width requirements; rather, the electronicscan be upgraded every several years, gradu-ally increasing the total observable band-width. This approach can achieve the bestprice/performance ratio overall by leverag-ing the exponentially decreasing cost ofsemiconductor technology.

    Today’s practice often involves special-ized electronics with ad-hoc software. Inthis article, we’ll explore the opportunity toleverage BEE2 systems and programmingenvironments that comprise racks of reli-able hardware and commodity routers. Theback-end computers and the display part ofthe system can be commodity computers.Modern FPGA-based systems can cover thesignal processing needs between the giga-hertz range analog-to-digital converters(ADCs) and the back-end computers.These FPGA systems can be very reliableand cost-effective.

    BEE2 SystemEach compute module in the BEE2 systemcomprises five Xilinx Virtex™-II Pro 70FPGA chips directly connected to fourDDR2 240-pin DRAM DIMMs, with amaximum capacity of 4 GB per FPGA. Thedesign organizes the four DIMMs into fourindependent DRAM channels, each run-

    138 bits 300 MHz DDR 41.4 Gbps

    64 bits

    at 300

    DDR

    100BTEthernet

    Five FPGAsXC2VP70FF1704

    4 GB DDR2 DRAM12.8 GB/s (400 DDR)

    MG

    T

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    MG

    TM

    GT

    MG

    T

    MemoryController

    FPGAFabric

    FPGAFabric

    FPGAFabric

    FPGAFabric

    MemoryController

    MemoryController

    MemoryController

    MGT

    IB4X/CX440 Gbps

    IB4X/CX440 Gbps

    IB4X/CX440 Gbps

    IB4X/CX420 Gbps

    IB4X/CX440 Gbps

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    DR

    AM

    MemoryController

    FPGAFabric

    Figure 1 – Information about black holes gathered from a radio telescope image.

    Figure 2 – Compute node connectivity

    14 Xcell Journal Fourth Quarter 2007

  • Fourth Quarter 2007 Xcell Journal 15

    interface on the control FPGA providesan out-of-band communication networkfor the user interface, low-speed systemcontrol, monitoring, and data archiving.

    BEE2 DSP Programming EnvironmentIn the past decade, block diagram-basedalgorithm description methods have beengaining popularity, especially in the DSPdomain. Software simulation environments,such as The MathWorks Simulink, provideparallel data flow execution models thatmatch the nature of DSP data flow process-ing. With a rich set of domain-specific,high-level block libraries, algorithm design-ers can quickly construct complex DSP andcommunication systems. Xilinx® SystemGenerator for DSP tools extend the model-ing and simulation capability of Simulink toa direct mapping of core DSP algorithms toFPGA implementations.

    However, most real-world DSPdesigns demand much more thanjust the core algorithms. With thelatest generations of Xilinx FPGAs,many system-level components cannow be integrated directly on a sin-gle FPGA, such as network inter-faces, embedded microprocessors,memory, and I/O devices. Alongwith these hardware subsystems, anFPGA design is no longer justhardware design but rather a designthat contains hardware and soft-ware. The software includes com-plex software OS and applications.

    The BEE2 DSP programmingenvironment, called BEE PlatformStudio (BPS) and shown in Figure3, provides an integrated design environ-ment in Simulink that abstracts away thecomplexity of advanced FPGAhardware/software co-design. This environ-ment was specifically created for BEE2 plat-forms. It enables algorithm designers tofocus on core DSP algorithms, while it auto-matically generates the code and bit files forthe complicated hardware and software sub-systems. It is built on top of existing Xilinxtool flows. Where Xilinx System Generatorprovides an excellent block set for mappingDSP algorithms, the Xilinx EmbeddedDevelopers Kit (EDK) provides micro-

    processor and system integration, andISE™ software provides the back end forlogic synthesis, place and route, and bit gen-eration for the hardware.

    A typical design in the BPS design envi-ronment starts with the core algorithmdesign in Simulink with Xilinx SystemGenerator for DSP. From the end-user per-spective, Simulink designs only exist in anidealized sandbox with the synchronousdata flow execution model; all connectionsoutside the core algorithm are virtuallymapped through BPS interface block sets.A typical design is shown in Figure 4. The

    BPS block sets are created by FPGA expertsto replace generic XSG “gateways” as inter-faces between the core algorithm design inSimulink and the system-level devices.

    A processor core, either in the form of ahard core (PowerPC 405) or soft core(MicroBlaze™ processor), is implicitlyincluded in all BPS designs. The processorcore can communicate with the user XSGdesign through software registers, FIFO, orshared memory. Users can specify thedesired communication method by select-ing the corresponding BPS blocks inSimulink. All external network, I/O, andmemory devices are abstracted intoSimulink data sources or sinks, with a sim-ple FIFO abstraction.

    For each of the supported FPGA boardplatforms, BPS framework provides a basesystem package as a complete XilinxPlatform Studio (XPS) project, and the cor-

    responding Simulink BPS block setfor all the external devices. Eachbase-system package includes theessential system device IP cores, ini-tial hardware system configuration,and available software packages.The back-end implementation filesfor the user-selected externaldevices are dynamically generatedby the BPS tools on top of the basepackage. These are then combinedand linked with all necessary hard-ware connections and softwaredevice drivers.

    Algorithm OverviewRadio astronomy typically observesphenomena that happened long

    ago and far away; hence the radio wavesreaching Earth are essentially plain waves.When observing the same phenomenonwith two or more radio telescopes that arephysically separate from each other, eachantenna receives the same wave fronts butat different times, because of the varyingincision angles at each physical antennalocation. The basic idea of correlation-based radio astronomy imaging is toreconstruct the wave front of interest bycorrelating received radio signals at differ-ent locations, and therefore at differentdelay values.

    Simulink

    XSG

    BPS

    EDK

    ISE Software

    FPGA

    Experts

    Algorithm

    Designer

    HDL

    Linux

    BEE2 Hardware

    Figure 3 – BEE2 Platform Studio design abstractions

    Figure 4 – BPS design example

  • For a large number of antennas, the FXcorrelator scheme is typically used becauseof its computational efficiency. Each anten-na signal is first transformed into the fre-quency domain through an FFT (fastFourier transform). Next, the signals of sev-eral antennas are multiplied and accumu-lated (MAC) for each of the frequencychannels independently. This basic correla-tor system is shown in Figure 5. The num-ber of MAC computations is one per inputsample – independent of the total numberof frequency channels. The FFT computa-tion in the FX correlator grows as N log2(N ) with respect to the total number ofantennas, while the MAC calculationsgrow as 1/2 N (N ± 1).

    The first step of the correlation algo-rithm digitizes the analog signals from theantenna, then frequency transforms the dig-ital signals into the frequency domain,which is also referred to as the F-engine.After I/Q digitization at 1 GSPS, a digitaldown conversion (DDC) block is used totune each digital signal input to the band ofinterest, followed by a poly-phase filter bank(PFB) for pre-filtering and FFT. Because thesignal-to-noise ratio (SNR) of the input sig-nal is below unity, the output of the PFB isquantized into lower precision fix-pointnumbers (4-bit real and 4-bit imaginary inthis case) such that both the overall correla-tion computation and network bandwidthutilization are more efficient. Finally, thepacket formatter collects a number of fre-quency data for each bin into a single pack-et, with a time stamp to be transmittedthrough the 10 Gigabit Ethernet interface.

    One of the most efficient X-engine imple-mentation schemes uses a tiled structure tocross-correlate data packets from each anten-na on a linear delay chain, as shown in Figure6. In an N-antenna system, each X-enginecontains exactly N delay elements; each is mdata samples deep, corresponding to the mdata samples in each packet. N/2 number ofMAC units compute the cross-correlations,with one dedicated MAC for auto-correla-tions. A multiplexer on each MAC unitchooses one of its inputs between the firstand last data packet on the delay chain.

    The data packets leave the packetreceive buffer in sequential antenna order,shifted along the delay chain and correlat-ed on the MAC units. The results are shift-ed left to the DRAM controller forlong-term accumulations.

    All 1/2 N (N ± 1) correlations are com-puted in exactly N time steps (each timestep is m clock cycles) for the particularfrequency channel in the data packets.The tile-based X-engine design makes iteasy to partition a single X-engine acrossmultiple physical FPGA chips. Multipleadjacent tiles can be grouped into eachFPGA, and the data stream can simplyflow from one FPGA to another. Whencorrelating dual polarization antennas,each MAC unit computes full Stokesparameters in parallel; hence it requiresfour complex multiplies per clock cycle,or equivalently 16 real multiplies andeight additions. Finally, the four complexterms (eight real terms) are accumulatedm times before shifting out to DRAM forlong-term accumulation.

    ScalabilityTraditional implementation of FX correla-tors use direct wired backplanes and cablesto construct the crossbar connections fromthe F-engine to the X-engine, whichrequires the whole hardware system to beglobally synchronous, limiting the scalabil-ity of the system. In this new implementa-tion, the time stamp on each frequencydata packet effectively decouples theabsolute sample time from the computinghardware clocks, and all X-engine compu-tations can be processed when the relevantpackets have arrived. Therefore, commer-

    Crossbar

    Antenna #1

    Ba

    ck-E

    nd

    Co

    mp

    ute

    r S

    erv

    ersFFT

    Antenna #N FFT

    XMAC (f=0)

    Antenna #2 FFT

    XMAC (f=1)

    XMAC (f=2)

    XMAC (f=3)

    XMAC (f=k)

    Z-m

    MAC

    Z-m

    Z-m

    Z-m

    Z-m

    Z-m

    Z-m

    Z-m

    selsel sel sel

    MAC MAC MAC

    PacketReceiveBuffer

    X-tile #0 X-tile #1 X-tile #2 X-tile #3 X-tile #N/2

    DRAMAccumulator

    MAC

    Reg Reg Reg Reg Reg

    Figure 5 – N-antenna frequency division FX correlator overview

    Figure 6 – Linear tile-based X-engine (simplified view)

    16 Xcell Journal Fourth Quarter 2007

  • Fourth Quarter 2007 Xcell Journal 17

    cial network switches with varying packetlatency can be used as virtual crossbarinstead of hardwired backplanes.

    An LTX implementation can fit a 256-antenna X-engine design in a single BEE2FPGA running at 250 MHz. The totallogic slice utilization including globalcontrol and memory interfaces is kept ataround 80% of the maximum slices toallow sufficient room for high-clock raterouting. A 1,024-antenna X-engine canbe implemented on a single BEE2 boardby chaining the four user FPGA chips,each implementing one-fourth of theLTX tiles. The input buffers need to beimplemented on the center control FPGAusing external DRAM. At 256 KB perdata packets per frequency bin, 4,096 fre-quency bins worth of data can be storedon a single 1 GB DIMM, allowing a max-imum of more than 2 seconds delay vari-ation from F-engines through the 10Gigabit Ethernet switch.

    The 10 Gigabit Ethernet switch band-width grows linearly with the number ofantennas in the system. Because the num-ber of X-engines and F-engines is thesame, the crossbar switch essentiallyredistributes the output data packets fromF-engines frequency bin-wise to each ofthe corresponding X-engines. In a giventime step, F-engine output packets can beindividually transmitted to unique X-engines on a one-to-one basis. Round-robin rotation of frequency bins fromeach antenna ensure against long-termcongestions on the crossbar switch. As allF-engines are intrinsically synchronizedto the same sampling clock used on theADC boards, even temporary packet con-gestions should be rare.

    ConclusionSo far, we have deployed several correlatorsusing this solution, including a 200 MHzbandwidth, 32-anternna correlator at theAllan Telescope Array using four BEE2modules. Several scientists have used a 16-antenna version in Green Bank, WestVirginia, and in other radio astronomyprojects throughout the world.

    Because we designed the whole correla-tor using the BEE Platform Studio envi-

    ronment, the user effort of porting thedesign to future FPGA hardware plat-forms, such as the upcoming BEE3 systemusing Xilinx Virtex-5 FPGAs, is minimizedto simply recompiling the new hardwareplatform from the Simulink design. TheVirtex-5 FPGA can achieve more than fourtimes the compute throughput at half theprice, so rapid design migration is criticalto achieving the goal of reaching a collect-ing area of 1 square kilometer with morethan 8,000 antennas.

    After more than six years of research atUC Berkeley, the BEE2-related softwareand hardware development has beencommercialized via a startup company –BEECube Inc. – to further support awider range of applications, from high-performance DSP to other emergingbioinformatics applications. For moreinformation on the programming envi-ronment and future hardware systemdevelopment, please contact Chen Changat [email protected]

    AcknowledgementsThe BEE2 project’s radio astronomy applica-tion development is in collaboration with [email protected] and Serendip (Search forExtraterrestrial Radio Emissions from NearbyDeveloped Intelligent Populations) projects atthe UC Berkeley Space Science Laboratory(Dan Werthimer, Aaron Parsons, HenryChen), as well as the UC Berkeley RadioAstronomy Laboratory (Melvyn Wright, DaveMacMahon, Matt Dexter, Don Backer).Xilinx has generously donated FPGAs, soft-ware tools, and engineering assistance.

    Many thanks to all the hard work by peerstudents and staff members of the BEE2team: Pierre-Yves Droz, Greg Gibeling,Nan Zhou, Yury Markovskiy, Zohair Hyder,Adam Megacz, Alexander Krasnov, HaydenSo, Kevin Camera, Brian Richards, DanBurke, Ken Lutz, and Susan Mellers. TheBEE2 project is funded by the GSRC andC2S2 Focus centers, part of the FCRP, aSemiconductor Research Corporation pro-gram, National Science Foundation grantnumbers CNS-0551739 and CNS-0403427, and the BWRC and its sponsorcompanies.

  • by Jim BenekeVice President, Global Technical [email protected]

    Have you ever had a great idea for the nextkiller FPGA application but struggled tofind the right prototyping hardware thatwould allow you to prove out your con-cept? You’re not alone. Finding the perfectdevelopment platform is not easy. Boardsare often designed more for demonstra-tion than development, leaving designerswith little flexibility and limited access toFPGA I/O pins.

    With the recent introduction of theXilinx® Spartan™-3A DSP FPGA family,Xilinx has created a unique, low-cost, pro-totype-friendly starter platform to easeyour application development. Throughthe EXP expansion interface included onthe board, you can add application-specificdaughtercards to customize the board’s fea-ture set for your prototype needs.

    In this article, I’ll review the EXP stan-dard, showing you why it’s FPGA-friendlyand able to meet your most demandingexpansion needs. We’ll look at several of theEXP modules currently available and seehow they can be used to easily create video,embedded, and communications processingapplications around the Spartan-3A DSP.You’ll see how the EXP specification, theSpartan-3A DSP Starter Platform, and theEXP add-on modules can combine toquickly and cost-effectively provide youwith a useful prototype system.

    The EXP Expansion StandardThe new Spartan-3A DSP StarterPlatform (www.xilinx.com/s3adspstarter)includes a standard set of features asshown in Figure 1. Parallel flash and SPImemory are provided for configuration,while DDR2 memory is available forhigh-performance mass storage. AGigabit Ethernet PHY and serial portsupport standard communications links,while clocks, switches, LEDs, and somegeneral-purpose user I/O round out theboard interfaces. The remaining FPGAuser I/O pins – 168 in total – route totwo connectors. These connectors areconfigured to meet the EXP expansionslot standard, which was designed specif-ically for FPGA development boards.

    Most industry-standard buses, such asPCI, PMC, PCMCIA, or PC-104, imple-ment an address and data bus structure,which is ideal for processor-based systemsbut limiting for the wide range of FPGA

    applications. Instead, FPGA developmentboards require a more generic, universal I/Ostructure where you can define I/Os as theyare needed. Therefore, the EXP specificationprovides a great deal of flexibility by limitingthe number of fixed signal definitions,allowing a more free-form I/O assignmentas determined by your end application.

    The EXP specification defines a 120-pinconnector, with 84 user I/Os and a mix ofpower and grounds. The standard EXP con-figuration uses two connectors in a full EXPmodule configuration for a total of 168 userI/Os. Half EXP module formats are alsoavailable, using just a single EXP connector.Two half EXP modules can be connected toa full EXP-configured baseboard.

    The Spartan-3A DSP FPGA SelectIO™interface supports many popular single-ended and differential standards. Table 1shows the number of pins available at eachEXP connector, along with a breakdown ofthe single-ended and differential pairs sup-

    18 Xcell Journal Fourth Quarter 2007

    Prototyping Applications with theSpartan-3A DSP Starter PlatformThe new Spartan-3A DSP Starter Platform includes expansion capabilities for low-cost and easy-to-use application development.

  • ported. Each EXP connector can support upto 84 single-ended I/O signals or a mix ofsingle-ended and differential pairs.Providing as many as 24 differential pairsper EXP connector, 48 pairs total, is essen-tial for certain high-bandwidth LVDS inter-faces used in video and communicationsapplications. The Spartan-3A DSP StarterPlatform board provides a user-selectableI/O voltage jumper for each EXP connector.This allows each EXP connector to be con-figured for either 2.5V or 3.3V signaling.

    As you can see, the EXP slot included onthe Spartan-3A DSP Starter Platform opensthe door to a wide range of add-on applica-tion modules and custom user interfaces.Avnet has created a set of off-the-shelf EXPmodules that are interchangeable betweenany EXP-enabled baseboard, including theSpartan-3A DSP Starter Platform. Let’sexplore some of these modules to see howthey customize the starter platform and cre-ate powerful prototype systems that leverageSpartan-3A DSP features.

    Video ApplicationsThe Spartan-3A DSP FPGA is ideal for cost-sensitive DSP algorithmic and co-processingapplications. Video and image processing,especially the video surveillance and videosecurity markets, are good fits for this FPGAfamily. To help jump-start applications inthis area, Avnet has created the Video EXP

    and larger enhanced block RAM enable theDSP-intensive image processing pipe.Functions such as the Bayer filter, color-space conversion, chroma sub-sampling,and MPEG-4 video compression can all beimplemented and run in real time insidethe FPGA. A 32-bit MicroBlaze™ proces-sor helps manage the processing pipe aswell as the Ethernet interface.

    Wireless Communications SystemsThe increasing demand for Internet accesshas lead to the explosive growth of wirelesscommunication systems that provide a con-tinuous Internet connection. WiMAX is onewireless access technology that is gaining inpopularity and has significant market poten-tial. Based on the IEEE 802.16e-2005 stan-dard, WiMAX supports high-speed Internet

    module. The Video EXP offers a full compli-ment of professional/consumer (or “pro-sumer”)-level front-end video functions.With support for DVI, component, com-posite, S-video, image sensor, and VGAinputs, plus DVI, VGA, and LCD panelvideo outputs, the Video EXP addresses abroad range of video processing applications.

    Figure 2 shows an example video overEthernet application that can be built fromthe combined Video EXP and Spartan-3ADSP baseboard. The Spartan-3A DSPSelectIO interface can handle the LVDSinterfaces to the image sensors and LCDflat panel. The 250 MHz DSP48A slices

    Fourth Quarter 2007 Xcell Journal 19

    RS232

    User I/O

    EXP

    Switches

    Spartan-3A DSP

    XC3SD1800A

    Clocks

    10/100/1000

    PHY

    User LEDs

    VGA Port

    DDR2

    (32M x 32)

    System ACE

    Connector

    Parallel Flash

    (16M x 8)

    SPI Flash

    (64 Mb)

    Image Sensor

    Input

    Image

    Sensor

    LCD Panel

    Microphone

    Headset

    Image

    Sensor

    Video EXP

    Module

    Spartan-3 DSP

    Starter Platform Baseboard

    Audio

    Codec

    EMACRAM

    Image Sensor

    Input

    EXP

    Interface

    Image Sensor

    Interface

    MicroBlaze

    MPEG-4

    Encoder

    Image

    Processing

    LCD

    Controller

    Memory

    Controller

    DDR2

    Memory

    Ethernet

    Interface

    Ethernet

    PHY

    Audio

    Interface

    LCD Panel

    Spartan-3 DSP FPGA

    Signal CategoryPins Per Pins Per

    Connector Dual EXP Slot

    Single-Ended I/O 34 68Single-Ended Clocks 2 4Differential I/O Pairs (22) 44 88Differential Clock Input Pair (1) 2 4Differential Clock Output Pair (1) 2 42.5V pins (333 mA per pin) 12 243.3V pins (333 mA per pin) 12 24Grounds 12 24Total 120 240

    Figure 1 – Spartan-3A DSP Starter Platform block diagram

    Table 1 – EXP connector I/O assignments

    Figure 2 – Video over Ethernet application example

  • access using advanced signal processingschemes such as orthogonal frequency-divi-sion multiple access (OFDMA) and multipleinput multiple output (MIMO) technology.The Spartan-3A DSP is a perfect solution forsupporting these high-bandwidth, process-ing-intensive applications that require flexi-bility and fast time to market.

    To support wireless communicationsprototyping, Avnet and Texas Instrumentshave developed two EXP modules thatcomplement the digital IF processingcapabilities of the Spartan-3A DSP StarterPlatform. A High Speed ADC EXP mod-ule and a High Speed DAC EXP modulesupport 12-bit, 500 MSPS analog-to-digital conversion and 16-bit, 1 GSPSdigital-to-analog conversion. As halfmodules, each EXP requires just oneEXP connector; thus any combination oftwo half EXP modules is supported by

    the starter platform baseboard.Figure 3 is an example direct conver-

    sion wireless communications system thatyou can prototype with the Spartan-3ADSP Starter Platform, the High-SpeedDAC EXP and the High-Speed ADCEXP. The enhanced DSP48A slices insidethe Spartan-3A DSP make it ideal forimplementing digital front-end process-ing. Digital up conversion (DUC), com-prising two polyphase interpolating FIRfilters, a CIC filter, and a crest factorreduction block, can be implementedinside the Spartan-3A DSP FPGA. Theoutput drives the high-speed interpolat-ing DACs on the EXP module, which inturn drives the RF power amplifier fortransmission. On the receive side, the IFsignal can be directly sampled with the500 MSPS ADC and passed to the FPGAfor digital down conversion (DDC).

    Embedded Processing PlatformsAlthough the Spartan-3A DSPFPGA family has features thatmake it well suited for high-per-formance DSP applications, it isalso a capable choice for embed-ded processing solutions. Theenhanced on-chip block RAMmakes the Spartan-3A DSP anespecially good fit for the soft-coreMicroBlaze embedded processor.To better address MicroBlazeprocessor-based embedded appli-cations, you can add the Interface

    EXP half module as defined in Figure 4.The Interface EXP supports many of thecommon embedded processing interfacesfound in processor-based systems.Because the Interface EXP is a half-mod-ule format, you still have the option ofadding a second half EXP module. Whencombined with soft cores inside theSpartan-3A DSP FPGA, you have ulti-mate flexibility and control over yourdesign. Xilinx provides IP cores for all ofthe interfaces, including USB, CAN,Ethernet, SPI, I2C, and UART.

    ConclusionThe Xilinx Spartan-3A DSP StarterPlatform is a great way to explore applica-tions around the Spartan-3A DSP FPGAfamily. The EXP slot included on theboard sets this kit apart, enabling cus-tomization that addresses video process-ing, communications, embedded systems,and a wide range of other applications.The available EXP add-on modules allowyou to quickly build up real-world proto-types and turn your concepts into reality.

    Should a specific function not exist in anoff-the-shelf form, you can easily create yourown with a custom-designed EXP card.Avnet also provides a prototype EXPmodule that is great for accessing all ofthe I/O signals on standard .1” headers.

    Avnet will continue to introduce newEXP modules that will further the reachand capability of this starter platform aswell as other EXP-enabled baseboards. Fora complete listing of EXP modules and theEXP specification (available for download-ing), visit www.em.avnet.com/exp.

    20 Xcell Journal Fourth Quarter 2007

    Power

    Amp

    EXP RF

    Receiver

    Interpolating

    FIR FiltersCIC

    Mixer

    DDS

    Filter

    Spartan-3 DSP Starter Platform High-Speed ADC EXP Module

    High-Speed DAC EXP Module

    Crest

    Factor

    Reduction

    DAC

    I/F

    ADC

    I/F

    FractionalRe-sampler

    Spartan-3 DSP

    FPGA

    Decimating

    FIR Filters

    Baseband

    Processing

    User Clock(SMA)

    RS232/485(Selectable)

    10/100Ethernet PHY

    USB PHY(ULPI)

    CANTransceiver

    CANTransceiver

    User I/O(8)

    1/2 EXP

    Slot

    DIPSwitches

    RotaryEncoder

    LED

    I2C Port

    SPI Port

    Figure 3 – Example wireless communication system

    Figure 4 – Interface EXP module block diagram

  • by Mark Littlefield Product Marketing ManagerCurtiss-Wright Controls Embedded Computing [email protected]

    Manuel UhmSenior Marketing Manager, DSPXilinx, [email protected]

    Over the past decade, the concept of “tech-nology insertion” (the incrementalupgrade of key technology components ina deployed or soon-to-be deployed system)has become a sort of mantra for the aero-space and defense (A&D) community,providing fielded systems with the latest,most advanced technology and the leastdelay. The rapid pace of technologicalinnovation – and subsequent obsolescence– has made technology insertion criticalfor the success of multiyear A&D develop-ment, test, and deployment projects.

    Unmanned aerial vehicles (UAVs),radar, and signals intelligence (SIGINT)are examples of sophisticated platformsthat have benefited from technologyinsertion. To fully understand the chal-

    lenges of technology insertion, includingcomplexity and cost, it’s useful to con-sider what is involved in successfullyusing this approach for upgrading legacysystems. Using FPGAs in a reconfig-urable computing application provides agood example of the benefits and chal-lenges entailed in making technologyinsertion work. This example alsoenables us to examine some commonissues associated with technology inser-tion, how COTS vendors can best helptheir customers address those problems,and how the overall technology insertionproblem can be significantly eased byproper planning.

    Using FPGAs in Technology InsertionThe basic problem domains involved intechnology insertion can be roughly cate-gorized as:

    • How the equipment physically connects (hardware)

    • How the operating environment, sys-tem libraries and utilities, drivers, andmiddleware provide an infrastructurefor applications (system software)

    Using FPGA technology in an embed-ded multicomputing system provides aninteresting case example because it strad-dles both the hardware and system softwaredomains. FPGAs fall into the hardwaredomain because they are physically inte-grated to hardware elements; developersmust work very closely with these elementswhen designing an application.

    FPGAs present a software challenge aswell, because they must be programmedand often incorporate vendor or third-party supplied blocks or IP. In addition, ageneral-purpose processor using systemlibrary calls typically configures theFPGAs and issues the commands they useto communicate with other processorsand devices in the system. Such systemsare often complex and heterogeneous, asshown in Figure 1.

    For technology insertion, the ideal sce-nario is a plug replacement module thatrequires no hardware or software changes.Although this is uncommon for FPGA-based computing products, this scenariocan also often be undesirable, as state-of-the-art computing platforms have new,more advanced features that system inte-

    Fourth Quarter 2007 Xcell Journal 21

    Future-Proofing Military Applications Using FPGAsFuture-Proofing Military Applications Using FPGAsTechnology insertion using COTS FPGA-basedproducts promises to deliver significant benefits but requires adequate planning.

    Technology insertion using COTS FPGA-basedproducts promises to deliver significant benefits but requires adequate planning.

    This article first appeared in Military Embedded Systemsmagazine, July 2007 (www.mil-embedded.com). Copyright OpenSystems Publishing. Used with permission.

  • grators can use to their advantage. Thus,the goal for system integrators must be tomaximize the benefits of technology inser-tion while minimizing impact on the exist-ing system. The degree of flexibility in thehardware and software domains can bedirectly affected by early design choices andproducts offered by COTS vendors.

    The simplest approach for a COTSvendor to ease technology insertion hard-ware problems is to follow a commonhardware model from platform generationto platform generation and to minimizeconnector pin changes. Industry boardand module standards such as VME,VPX-REDI (VITA 46/48), PMC (VITA32), and XMC (VITA 42) address a largepart of this problem by specifying fixedform factors and pin definitions for boardI/O such as buses or switched fabric inter-connects. Vendors can extend this modelby maintaining pin footprints acrossproduct generations for such commoninterfaces as serial ports and Ethernet. Thesame also holds true for FPGAs. A com-mon, or at least similar, I/O footprintminimizes the need for radical redesignsduring a technology insertion project.

    The Importance of SoftwareAlthough FPGA development is very tight-ly linked to the target component and plat-form hardware design, there is a lack of

    project is the integration of the FPGA-basedapplication into a larger multicomputer sys-tem. General-purpose computing elementsare often closely tied to the system’s FPGAsby performing various command and con-trol functions such as DMA engine control.Subsequently, the ease of technology inser-tion can be directly affected by how well avendor can maintain APIs from one productgeneration to the next, which in turn isoften linked to the level of abstraction in theAPI. The more abstraction, the less likely itis that the API will change between productgenerations. One of the key tasks of anexternal processor is the command and con-trol of data movement, both within theFPGA and between the FPGA and othergeneral-purpose processing nodes.

    Making Technology Insertion RealSeveral COTS vendors offer commonhardware strategies to aid technology inser-tion. One example is Curtiss-Wright’s latestgeneration of VPX-based board products.The three primary computing platforms inthis product line – a dual 8641-based SBC,a quad 8641-based DSP engine, and thedual Xilinx® Virtex™-5 FPGA-basedCHAMP-FX2 board – all share a set ofcommon hardware design elements andwere specifically designed with commonpin footprints for common I/O elements.(See Figures 2 and 3.) For instance, the

    standardized or industry-accepted toolsand frameworks to abstract this linkage.The result is that software can typicallyhave an even greater effect on a technologyinsertion program than hardware.

    The software challenges can be some-what mitigated on general-purpose proces-sors by using off-the-shelf operating systemssuch as VxWorks or Linux, supported withfull-featured BSPs, communications mid-dleware, and application frameworks. ForCOTS vendors of FPGA products, thechallenge is to provide development tools tointegrators to ease technology insertionwhile maintaining the performancedemanded by application developers.

    A common approach is for COTS ven-dors to develop a standard “wrapper” orgasket that essentially represents the staticinfrastructure, such as interfaces to ADCs,memories, Ethernet, and so on. The blocksand infrastructure should be designed tosupport a set of commonly implementeduse cases in the most efficient manner pos-sible so as to minimize the size of the wrap-per. Ideally, such an infrastructure wouldrepresent only 5 to 10 percent of the totaldie size of the FPGA.

    Software SupportNo less important than the ease of integrat-ing existing application code into a newFPGA platform for a technology insertion

    ReconfigurableComputing

    Multi-PurposeComputing SBCs System I/O

    I/O Solutions

    Ethernet Switching

    PPCNode

    SRIO

    SRIO

    • Signal processing• Classification

    • Sensor data processing• Filters, DDC, FFT, etc.

    • System control• Command/control bus• Human machine interface

    • High-speed interfaces• Sensor interfaces• ADC, DAC

    • GbE• VITA 66.20

    High-speed Sensor I/O

    PCIe

    PCIePCIe/PCI PCIe/PCI

    GbE GbE

    MUX

    MIL-STD-1553B

    Graphics

    Analog I/O

    FPDP

    PPCNode

    PPCNode

    PPCNode

    PPCNodeFPGANode

    FPGANode

    PPCNode

    Serial RapidIO over VPX

    PCI Express over VPX

    GbE over VPX

    Figure 1 – High-performance embedded signal and image processing systems are often made up of numerous heterogeneous computer and I/O components connected by multiple communications fabrics.

    22 Xcell Journal Fourth Quarter 2007

  • Fourth Quarter 2007 Xcell Journal 23

    FPGA board not only shares thePowerPC node design elements with thetwo other VPX boards, but closely resem-bles its preceding VME version in its I/Oand memory configuration.

    The design of the VPX FPGA boardand its VME predecessor took a balancedapproach between the twin goals of maxi-mum technology insertion value and mini-mum insertion effort. Curtiss-Wright’sContinuum FXtools design kit provides ahighly optimized set of IP blocks focusedon peripheral I/O and memories, with onlya lightweight, scalable switching block anda few additional utility blocks to ease appli-cation integration. By providing a minimalbut highly optimized infrastructure,FXtools abstracts those common I/O andmemory objects without sacrificing per-formance. Designing with these IP blocksand adopting basic use cases can help easefuture technology insertion.

    To ease both the integration and technol-ogy insertion of FPGA-based computingelements, the board’s Continuum IPC com-munications middleware was extended todirectly control DMA-based data-transfersinvolving the FPGA node. The IPC soft-ware enables application developers to cre-ate named buffer and data transfer objectsthat are globally visible to the system.Thus, any processor in the IPC system cancreate named buffers, attach to already cre-ated named buffers, and create data trans-fer objects between buffers without havingto resort to code-manipulating, complexmemory map translations or DMA com-mand packet creation.

    Avoiding Technology Insertion HeadachesAlthough hardware and system softwarecommonality across product generations isan important element in planning a suc-cessful technology insertion project, theapplication developer and system integra-tor can themselves play an important rolein defining how difficult a future technolo-gy insertion project may be. For example, ifthe application developer bypasses vendoror OS APIs to directly manipulate hard-ware, the resulting code will be significant-ly less portable to future hardwareplatforms. However, application developersor system integrators can structure theirapplication/system in a number of ways tominimize technology insertion headaches.

    The first rule of designing for latertechnology insertion should be, “Don’tfight the system software or OS.” Most

    software frameworks, such as a BSP, utili-ty library, or communications middlewarelike Continuum IPC are designed withone or more basic use cases or design pat-terns in mind. One of the easiest ways fordevelopers to introduce problems in alater technology insertion project is tobend the software framework to matchtheir favorite design pattern. Doing so vir-tually ensures that later developers willhave to bend things in a different waysimply to make it work.

    The same holds true for FPGA IP devel-opers. Although implementation detailsand interfaces may shift from one productgeneration to the next, the basic design pat-terns usually do not. Using vendor-suppliedIP blocks and data flows in the manner thatthe vendor originally intended helps mini-mize the amount of recoding needed for atechnology insertion project.

    Another beneficial technique that can beemployed both for software and FPGA IP isto analyze the vendor-supplied APIs andblock interfaces, identifying those interfacesthat are likely to shift in future productsand creating an abstraction layer or “shim”between the interfaces and the applicationcode. A helpful clue is that if an API orblock interface closely mirrors the underly-ing hardware, it is likely to shift betweenproduct generations.

    Although such layers can add unnecessarycode and performance delays to a system, theperformance impact is often minimal, whilethe benefits for a later technology insertionproject can be enormous. Changing a shim ismuch easier than changing multiple instancesof an API call or IP block instantiation. It’simportant not to overuse this approach, how-ever, as adding shims or abstraction layerscomplicates designs and can sometimes makedebugging more difficult.

    ConclusionTechnology insertion can live up to itspromise of future-proofing, but only if it isadequately planned for up front. MostCOTS vendors have developed successivegenerations of products with this in mind.To capture this value, system integratorsneed to work closely with such vendorsearly in the design cycle.

    Figure 2 – High-performance FPGA families, such as the Xilinx Virtex-5 family, have common elements for logic, DSP, memory, and I/O, with a

    common package strategy enabling technology insertion at the component level.

    Figure 3 – Curtiss-Wright Controls CHAMP-FX2 Virtex-5-based VPX computing module

  • by Tim VanevenhovenDSP Tools Product ManagerXilinx, [email protected]

    DSP systems are best described byusing a combination of both graphical-and language-based methods. TheMathWorks, an industry leader in DSPmodeling software, caters to thisdichotomy by providing a cycle-accu-rate graphical design environmentcalled Simulink and a mathematicalmodeling language called MATLAB.

    Simulink is well suited for the “sys-tem” aspects of DSP design, includingcontrol and synchronization of dataflow to and from interfaces and memo-ries. Simulink also provides a rich set ofpre-defined DSP algorithms in the formof block sets that you can use to con-struct DSP systems. Simulink is notalways the most effective environmentfor modeling proprietary algorithms,however. It unnecessarily burdensdesigners with cycle-accurate considera-tions and forces low-level arithmeticoperations and array accesses to be con-structed from graphical block sets ratherthan concise textual expressions.

    Some DSP algorithm developershave found that the MATLAB languagebest meets their preferred style of devel-opment. With more than 1,000 built-in functions as well as toolboxextensions for signal processing, com-munications, and wavelet processing,MATLAB offers a rich and easy-to-useenvironment for the development anddebugging of sophisticated algorithms.

    24 Xcell Journal Fourth Quarter 2007

    Using MATLAB to Create IP for System Generator for DSP You can use MATLAB M-files with the AccelChip synthesis tool to optimize FPGA implementations.

  • Simulink unifies both modeling environ-ments with an embedded MATLAB block,which allows MATLAB models to simulatewithin Simulink and compile into C codethrough Real-Time Workshop for processor-based DSP hardware implementations.

    Xilinx® System Generator for DSP iswell established as a productive tool for cre-ating DSP designs in FPGAs. With a graph-ical environment based on Simulink and apre-defined block set of Xilinx DSP cores,System Generator for DSP meets the needsof both system architects who need to inte-grate the components of a complete designand hardware designers who need to opti-mize implementations. System Generatorfor DSP, however, lacks support for a designflow based on MATLAB.

    The Xilinx AccelDSP™ synthesis toolwas developed specifically for algorithmdevelopers and DSP architects who haveembraced language-based DSP algorithmmodeling. With the AccelDSP synthesistool, algorithm developers can use theirfloating-point MATLAB M-files to per-form stimulus creation, algorithm evalua-tion, and results post-processing.

    DSP Hardware SystemsSystem Generator for DSP is well suited formodeling the DSP system, which compris-es not only the core DSP algorithm butsynchronous interfaces to external buses,memory read/write accesses, system datasynchronization, and overall system con-trol. System Generator for DSP providescontrol-oriented blocks such as theMicroBlaze™ processor as well as individ-

    mented at a later point in time as Simulinkor System Generator for DSP diagrams.Calculating the fast Fourier transform(FFT) of a 4 x 1,024 matrix of data is pos-sible with a simple MATLAB statement,without concern for radix, scaling, buffer-ing, or synchronization of valid signals, asshown here:

    y = fft(data,1024);

    Second, when modeling an algorithm, agiven set of outputs corresponds to a givenset of inputs; therefore, synchronizationissues do not need to be addressed in thegenerated hardware. This makes an algo-rithm inherently “schedule-able” through atool like the AccelDSP synthesis tool.Hardware requirements may dictate theneed to use multiple clock cycles to com-pute an output, as in the case of a resource-shared MAC FIR filter, but this operationlends itself nicely to the AccelDSP synthe-sis tool’s automated flow. The introductionof a simple hardware handshaking interfaceenables easy integration into the overall sys-tem, as shown in Figure 2.

    ual register, delay, and memory blocks forimplementing the synchronous compo-nents of a DSP system (Figure 1).

    Custom DSP AlgorithmsThe heart of any DSP system is the algo-rithm. Algorithms distinguish themselvesfrom systems in that a resulting output is afunction of a given set of inputs without asense of clock or hardware. This isdescribed by the simple equation:

    y = f(x)

    You can execute an algorithm defined inMATLAB on an FPGA, DSP processor, orsoftware; each will have a different sense ofcycle accuracy.

    The unique characteristics of an algo-rithm offer two distinct advantages. First,algorithm developers are completely unen-cumbered by hardware implementationdetails and can focus solely on functionalbehavior. This is exactly why an estimated90% of the algorithms used today in DSPoriginate as MATLAB models, even when adesign flow dictates that they be re-imple-

    Fourth Quarter 2007 Xcell Journal 25

    z-16

    addrdatawe

    xn yn2 tap

    Inputs

    Outputs

    CurrentState

    b

    aa+b fpt dbl

    dblsimin

    FromWorkspace

    RandomNumber

    Sine Wave Gateway In2

    Gateway In1

    Gateway In

    Moore State Machine

    Single Port RAM FIR

    Add SubGateway Out Scope

    Delay

    fpt

    dbl fpt

    dbl fpt

    Data In

    Clock

    Output data available to downstream system

    New Data Valid

    Data In

    Done

    Data Out

    Data Out

    DoneNew Data Valid

    Clock

    Figure 1 – Xilinx System Generator diagram showing system control and synchronization logic

    Figure 2 – AccelDSP handshaking interface

  • Using the AccelDSP and System Generator Tools TogetherThe AcceDSP synthesis tool enablesSystem Generator for DSP to support bothDSP system and algorithm modelingmethods by generating System GeneratorIP blocks based on floating-point MAT-LAB models. This results in a design flowfor FPGAs that is similar to the functional-ity obtained through the use of the embed-ded MATLAB block (Figure 3). You cancapture the system aspects of the designusing the Xilinx DSP block set and capturethe algorithm using floating-point MAT-LAB. The System Generator IP blocks cre-ated by the AccelDSP synthesis tool arefixed-point and cycle-accurate.

    Kalman Filter ExampleLet’s take an advanced algorithm written inMATLAB, use the AccelDSP synthesis toolto synthesize the design, and then integratethe algorithm into a System Generatormodel. A Kalman filter is a special class ofrecursive, adaptive filters that is well suitedto combining multiple noisy signals into aclearer signal. Kalman filters start with amathematical model of an object – such asa commercial aircraft being tracked byground-based radar – and use the model topredict future behavior. The filter then usesmeasured signals (such as the signature ofthe aircraft returned to the radar receiver)to periodically correct the prediction.

    The following is a MATLAB M-file fora Kalman filter. The algorithm defines

    matrices R and I that describe the statisticsof the measured signal and the predictedbehavior. The last nine lines of the algo-rithm are the code that predicts forwardand the code that corrects itself.

    function [S] = simple_kalman(A)

    DIM = size(A,2);persistent p P_capif isempty(P_cap)

    P_cap = [8 0 0; 0 8 0; 0 0 8];p = ones(DIM,1)/2;

    end;

    I = eye(DIM);R = [128 0 0;0 128 0; 0 0 128];

    % estimate step:P_cap_est = P_cap+I;

    % correction step:K = P_cap_est * inv(P_cap_est+R);p = p + K * ( A’ – p );P_cap = (I – K)*P_cap_est;S = p’;

    Common operators like addition andsubtraction operate on arrays like A orP_cap without the need to write loops,which would be required in languages likeC. Two-dimensional arrays are automati-cally multiplied as matrices without anyspecial annotation.

    MATLAB operators such as matrixtransposition allow the MATLAB code tobe compact and easily readable. And com-

    plex operations like matrix inversion areperformed using MATLAB’s extensive lin-ear algebra capabilities. Although such analgorithm can be constructed as a block dia-gram, doing so will obscure the algorithmstructure so readily apparent in MATLAB.

    With the AccelDSP synthesis tool, youcan take complex MATLAB toolbox andbuilt-in functions (such as the matrixinverse used in the Kalman filter example)directly to hardware using the AccelWareIP tool kits. These tool kits offer multiplematrix inverse methods. Core selection willdepend on the size, structure, and values ofthe matrix. Returning to the Kalman filterexample, the most suitable approach is touse the AccelWare QR matrix inverse core.AccelWare cores are generated based onMATLAB syntax and are available in a vari-ety of implementations that let users opti-mize designs for speed, area, power, ornoise. This small code edit would berequired to use the AccelWare function:

    % K = P_cap_est * inv(P_cap_est+R);

    K = P_cap_est * accel_qr_inverse (P_cap_est+R);

    Synthesizing MATLAB with the AccelDSP Synthesis ToolWith the AccelDSP synthesis tool, you canperform a floating-point simulation toestablish a baseline. The design is then con-verted into a fixed-point representation sothat the fixed-point effects can be simulat-ed. An array of features helps analyze theseeffects and fixed-point design options likesaturation and rounding modes. Bit growthis automatically propagated through thedesign in a manner that allows for user-controlled overrides. This algorithmicdesign exploration process helps you obtainthe ideal quantization that minimizes bitwidths while managing overflows/under-flows, allowing early trade-offs of siliconarea versus performance metrics.

    26 Xcell Journal Fourth Quarter 2007

    Simulink

    Memory

    ProcessorInterface

    y = f(x)

    ControlLogic

    InterfaceLogic

    MATLABSimulink

    Figure 3 – DSP system block diagram

    MATLAB operators such as matrix transposition allow the MATLAB code to be compact and easily readable. And complex operations like matrix

    inversion are done using MATLAB’s extensive linear algebra capabilities.

  • Fourth Quarter 2007 Xcell Journal 27

    Once suitable quantizations have beendetermined, the next step with theAccelDSP synthesis tool is to generate RTLfor the target Xilinx device. You can dictatehardware intent through the use of the syn-thesis directives listed in Table 1.

    Using these directives constitutes hard-ware-based design exploration that allowsthe design team to further improve qualityof results. In synthesizing the RTL, theAccelDSP synthesis tool evaluates andschedules the entire algorithm, performingboundary optimization when possible.

    Throughout this flow, the AccelDSPtool maintains a uniform verification envi-ronment through the use of a self-checkingtest bench: the input/output vectors thatwere generated when verifying the fixed-point MATLAB design are used to verify

    the generated RTL. The AccelDSP synthe-sis tool will also report the throughput andlatency of the Kalman filter, which is essen-tial to gauge whether the design meetsspecifications and to produce a cycle-accu-rate System Generator model.

    Generating a System Generator ModelUpon successful completion of RTL veri-fication, the design is now ready to gener-

    ate a System Generator model by clickingon the “Generate System Generator” icon,as shown in Figure 4. The AccelDSP toolgenerates a System Generator IP blockthat supports both simulation and RTLcode generation.

    At this point, the design flow transitionsto System Generator for DSP, where thenew block for the Kalman filter is availablein the Simulink library browser. You onlyneed to select the Kalman filter block anddrag it into the destination model to incor-porate the AccelDSP-generated Kalman fil-ter into a System Generator design,illustrated in Figure 5.

    ConclusionCustom DSP algorithms are best modeledmathematically using MATLAB, while com-plete systems are best modeled as cycle-accu-rate using Simulink. The marriage of thesetwo modeling domains provides an efficientmeans to design DSP systems into FPGAs byallowing the use of the rich MATLAB lan-guage, including the built-in and toolboxfunctions, to create System Generator IPblocks of complex DSP algorithms.

    DSP Synthesis Directive Effect on Generated Hardware

    Rolling/Unrolling of For Loops Improves input sampling rate by reducing throughput

    Expansion of Vector and Matrix Improves input sampling rate by reducing throughputAdditions and Multiplications

    RAM/ROM Memory Mapping of Arrays Improves FPGA utilization by mapping arrays into dedicated Xilinx block RAM resources

    Pipeline Insertion Improves input sampling rate by improving clock frequency performance

    Shift Register Mapping Improves FPGA utilization by mapping shift register logic into SRL16s

    • See an online demo of the System Generator and AccelDSP tools.• Evaluate or purchase:

    • The AccelDSP synthesis tool.• System Generator for DSP.

    TAKE THE NEXT STEP (Digital Edition: www.xcellpublications.com/subscribe/)

    Figure 4 – “Generate System Generator” command in the AccelDSP synthesis tool

    Figure 5 – Kalman filter added to a System Generator design

    Table 1 – Synthesis directives to control the generated hardware

  • by Tom HillSystem Generator Product ManagerXilinx, [email protected]

    Developing DSP algorithms for siliconrequires careful selection of IP blocks andtheir macro-architecture specifications inthe context of the target application. Forexample, three methods are commonlyused to implement basic trigonometricfunctions in hardware:

    1. Bipartite tables – a good choice wheninput values require relatively smallword lengths.

    2. Linearly interpolated look-up tables(LUTs) – best for slightly larger inputword lengths.

    3. CORDIC – ideal for applications thatrequire large word lengths when per-formance is not an issue. If perform-ance is important, you can implementthe CORDIC algorithm using paral-lelism and pipeline stages.

    The graph in Figure 1 compares the areaof these methods over a range of inputword lengths.

    Knowing how to build and when touse each of these cores is a complex task,but necessary to achieve an optimal hard-ware solution.

    Introducing IP-ExplorerThe Xilinx® AccelDSP™ synthesis toolwith IP-Explorer technology eliminates thetrial-and-error process when using IP blocksby allowing the tool to select from various

    macro-architectures. The AccelWare™ DSPIP tool kit provides a set of IP generators thatproduce architecture-specific, synthesizableMATLAB for common built-in functionssuch as sine, cosine, log, and divide. Thesehardware architectures, developed by a teamof DSP hardware design experts, are charac-terized by area and performance for eachsupported FPGA technology using morethan 6,000 designs.

    28 Xcell Journal Fourth Quarter 2007

    Automatic IP Block Selection with IP-Explorer Technology

    You can use the AccelDSP synthesis tool to select from various macro-architectures

    based on your system requirements.

    Bipartite Tables

    LIL is best choice for

    "medium" size inputs –

    assuming the achievable

    throughput is acceptable

    (and assuming multipliers

    are available).

    BT - No Block ROM

    CORDIC

    LIL

    RS-CORDIC

    BT are always the best

    choice for -12 bits and

    below, but size grows

    faster than the other

    architectures.

    Heuristic Core Inference

    Eq

    uiv

    ale

    nt

    Ga

    tes

    IBITS

    107

    106

    105

    104

    103

    102

    2 4 6 8 10 12 14 16 18 20 22 24 26 28 30

    Figure 1 – Area versus input word length

  • During the DSP synthesis process, IP-Explorer analyzes each use of these functionsagainst the overall system area and perform-ance requirements to determine the mostoptimal hardware solution. Once deter-mined, the tool automatically inserts theappropriate IP core into the design (Figure 2).

    Designing at a Higher LevelIP-Explorer raises the abstraction level ofDSP hardware design significantly closer tothat of DSP algorithm development. Basic in Figure 4: a Euler-to-quaternion angle

    conversion algorithm that performs a sineand cosine operation on three inputangles. In this example, each input has aword length of 12 bits.

    Let’s compare a typical approach –designing a single implementation of thetrigonometric function based on theCORDIC method and instantiated multi-ple times – versus IP-Explorer. Table 1shows the area and performance results forboth approaches.

    IP-Explorer selects a bipartite tableapproach because the input bit widths arereasonably small. A secondary benefit isthat this approach leverages the built-inDSP blocks of the targeted FPGA device.This would not have been the optimalchoice for an ASIC. Not only has the taskof designing hardware for the sine andcosine functions been eliminated, but thisapproach offers a 33% area savings and a12x performance advantage.

    ConclusionIP-Explorer represents a truly uniqueadvancement in the development of DSPhardware. By providing a seamless path tohardware for key DSP building blocks,hardware design has moved significantlycloser to the abstraction level enjoyed byalgorithm developers using MATLAB.

    MATLAB language primitives form thefoundation of this abstraction pyramid(Figure 3) by providing the ability to modelany function or algorithm at a low level. IP-Explorer elevates this abstraction level sig-nificantly by targeting MATLAB’s standardlibrary of pre-configured DSP buildingblocks and functions directly into hardware.

    Design ExampleTo illustrate the effect that IP-Explorer hason a design, consider the example shown

    Fourth Quarter 2007 Xcell Journal 29

    Sine/Cosine Architecture Area (LUTs) Performance

    Six COR