Issue 72 Third Quarter 2010 Xcell journal Xcell journal SOLUTIONS FOR A PROGRAMMABLE WORLD SOLUTIONS FOR A PROGRAMMABLE WORLD INSIDE Xilinx Rad-Hard FPGA Reaches for the Stars Biometrics App Rides Reconfigurable Hardware How to Maintain Repeatable Results in Xilinx FPGA Designs Timing Constraints Tutorial ISE Design Suite 12.2 Tips 4th-Generation Partial Reconfiguration INSIDE Xilinx Rad-Hard FPGA Reaches for the Stars Biometrics App Rides Reconfigurable Hardware How to Maintain Repeatable Results in Xilinx FPGA Designs Timing Constraints Tutorial ISE Design Suite 12.2 Tips 4th-Generation Partial Reconfiguration www.xilinx.com/xcell/ Xilinx 7 Series: FPGAs Make Play for Logic IC Dominance Xilinx 7 Series: FPGAs Make Play for Logic IC Dominance

Xcell Journal issue 72

Embed Size (px)


Xcell Journal is the premier magazine for those interested in Field Programmable Gata Arrays (FPGAs) and FPGA-based innovation in electronics.

Citation preview

Page 1: Xcell Journal issue 72

Issue 72Third Quarter 2010

Xcell journalXcell journalS O L U T I O N S F O R A P R O G R A M M A B L E W O R L DS O L U T I O N S F O R A P R O G R A M M A B L E W O R L D


Xilinx Rad-Hard FPGA Reaches for the Stars

Biometrics App RidesReconfigurable Hardware

How to Maintain RepeatableResults in Xilinx FPGA Designs

Timing Constraints Tutorial

ISE Design Suite 12.2 Tips 4th-Generation Partial Reconfiguration


Xilinx Rad-Hard FPGA Reaches for the Stars

Biometrics App RidesReconfigurable Hardware

How to Maintain RepeatableResults in Xilinx FPGA Designs

Timing Constraints Tutorial

ISE Design Suite 12.2 Tips 4th-Generation Partial Reconfiguration


Xilinx 7 Series: FPGAs Make Play for Logic IC Dominance

Xilinx 7 Series: FPGAs Make Play for Logic IC Dominance

Page 2: Xcell Journal issue 72

Learn more about the new Spartan-6 and Virtex-6 FPGA baseboards and FMC modules designed by Avnet at www.em.avnet.com/drc

Avnet Electronics Marketing introduces three new development kits

based on the Xilinx Targeted Design Platform (TDP) methodology.

Designers now have access to the silicon, software tools and

reference designs needed to quickly ramp up new designs. This

approach accelerates time-to-market and allows you to focus on

creating truly differentiated products.

Critical to the TDP methodology is the FPGA Mezzanine Card

(FMC) from the VITA standards body. Avnet has collaborated with

several industry-leading semiconductor manufacturers to create a

host of FMC modules that add functionality and interfaces to the

new baseboards, allowing for easy customization to meet design-

specific requirements.

©Avnet, Inc. 2010. All rights reserved. AVNET is a registered trademark of Avnet, Inc.

New baseboards for Spartan®-6and Virtex®-6 FPGAs» Spartan-6 LX16 Evaluation Kit» Spartan-6 LX150T Development Kit» Virtex-6 LX130T Development Kit

New FMC Modules for Baseboards

» Dual Image Sensor FMC


» Industrial Ethernet FMC

More are soon to be released!

Development kits help ramp up new Spartan®-6 or Virtex®-6 FPGA designs

1 800 332 8638www.em.avnet.com

Page 3: Xcell Journal issue 72


Xilinx, Inc.2100 Logic DriveSan Jose, CA 95124-3400Phone: 408-559-7778FAX: 408-879-4780www.xilinx.com/xcell/

© 2010 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands includedherein are trademarks of Xilinx, Inc. All other trade-marks are the property of their respective owners.

The articles, information, and other materials includedin this issue are provided solely for the convenience ofour readers. Xilinx makes no warranties, express,implied, statutory, or otherwise, and accepts no liabilitywith respect to any such articles, information, or othermaterials or their use, and any use thereof is solely atthe risk of the user. Any person or entity using suchinformation in any way releases and waives any claim itmight have against Xilinx for any loss, damage, orexpense caused thereby.

PUBLISHER Mike [email protected]

EDITOR Jacqueline Damian


DESIGN/PRODUCTION Teie, Gelwicks & Associates1-800-493-5551

ADVERTISING SALES Dan [email protected]

INTERNATIONAL Melissa Zhang, Asia [email protected]

Christelle Moraga, Europe/Middle East/[email protected]

Miyuki Takegoshi, [email protected]

REPRINT ORDERS 1-800-493-5551

Xcell journal


Teens to Technologists: Thanks for a Great Childhood’ve seen many things over my years covering the electronics industry, but at the DesignAutomation Conference this past June, I witnessed a first: a group of EEs and EDA folks, eyesabrim with tears of pride. What could cause such a reaction in what would otherwise seem a

prosaic event? No, it wasn’t a leak from one of the foundry demos, nor an announcement that allEDA tools will henceforth be government subsidized. In fact, the moment came in the closing sec-onds of an event called “High School Panel: You Don’t Know Jack,” when four very bright teen pan-elists looked over the crowd and then did the unexpected: thanked them. “Thank all of you for thechips and technologies you create—thank you for making my childhood so great,” said one of them,without a hint of irony. The gesture prompted engineers from Broadcom, Qualcom, Nvidia,Cadence and Synopsys alike…and yours truly…to all tear up.

“High School Panel: You Don’t Know Jack” is becoming a regular highlight of the DesignAutomation Conference’s Pavilion Panel series. As in years past, this year’s panel featured JasperDesign Automation CEO Kathryn Kranen interviewing four teenagers (two girls and two boys)about their technology usage, what products are in, what products are out and what features theywould like to see in future gadgets. The panel is meant to give attendees a glimpse into the tech-nology usage of this finicky yet vitally important set of consumers and purchasing influencers. Thisyear’s foursome was exceptionally impressive and surprisingly gracious. Over years of exposure totechnology and social media, these kids have become master multitaskers while still maintainingstellar GPAs (three are off to prominent colleges, while the fourth has one year of high school left).

If you are the parent of a teen, you probably won’t find it too shocking that all four panelistsdescribed how, from the moment school lets out, they immediately connect to the Internet, mostlyvia laptops. “I have to stay connected,” said one boy. A fellow panelist boots her laptop and down-loads the photos she took that day from her cell phone or camera. She opens the photo files in AdobePhotoshop to airbrush any skin blemishes and then downloads those modified pictures onto herFacebook page. All four panelists said they have tens of photo albums on Facebook and have friendswith hundreds. Increasingly, they are adding video to their Facebook profiles or launching their ownYouTube channels. Facebook is the hub of their social lives because it allows them, as one panelistsummarized, “to see what my friends are doing and what outings I was not invited to attend.”

The kids gave Facebook, Twitter, YouTube and Hulu big thumbs-up, while giving thumbs-downto the once popular MySpace, which panelists said has turned into a site to merely hear band demos.Panelists liked the iPhone, even though none of them own one because of the relatively high price ofthe phone and the access plan. But they did not like the iPad—“it’s just a big iPod Touch that I can’tfit into my pocket,” said one panelist. Panelists also said they prefer laptops over desktops but notethat desktops are more reliable and are upgradable, which is good for power gaming. They had mixedfeelings about TV, indicating they almost never watch TV on the flat-screen anymore but insteadcatch their favorite shows at a time of their choosing on the Web, typically via Hulu or YouTube.

The technology improvements these young people would most like to see are largely in line withthe top IC and system design challenges of the day. Longer battery life topped the list. A close sec-ond was devices and applications that better facilitate multitasking. “I have six different IM/chatsthat I use regularly and it’s hard to use all those at once,” said one panelist. “I’d like to have them allin one place.”

What all these data points mean, I’ll leave you to interpret. But certainly one thing is clear. Thetechnology you create is having a remarkable effect on our youth and, seemingly, the future they willmarch into. And if these DAC panelists are any indication, it’s a future that we can all be proud of.


Mike SantariniPublisher

Page 4: Xcell Journal issue 72




Letter from the Publisher Teen Panelists at DAC Say Thanks for the Memories…3

Xcellence in A&DNew Xilinx Rad-Hard FPGA Reaches for the Stars…12

Xcellence in Automotive Multiple MicroBlazes Ease Integration in Real-Time System…18

Xcellence in ISM Biometrics May Be Killer App for Dynamic Partial Reconfiguration…24

Xcellence in CommunicationsA Better Crypto Engine, the Programmable Way…32

Xcellence in Wireless CommunicationsLTE Simulator Rides Xilinx Virtex-5 FPGAs…36

Cover StoryXilinx Redefines State of the Art with New 7 Series FPGAs



Page 5: Xcell Journal issue 72

T H I R D Q U A R T E R 2 0 1 0 , I S S U E 7 2

Xperts Corner Maintaining Repeatable Results in Xilinx FPGA Designs…40

Xplanation: FPGA 101A Tutorial on Timing Constraints…46

Xplanation: FPGA 101Simplifying Metastability with IDDR…52



Xtra, Xtra The latest Xilinx tool updates and patches, as of July 2010…55

Are You Xperienced? Training at the cusp of the programmable imperative…58

Xamples… A mix of new and popular application notes…60

Tools of Xcellence A bit of news about our partners and their latest offerings…62

Xclamations! Share your wit and wisdom by supplying a caption for our wild and wacky artwork…66



Xcell Journal recently received 2010 APEX Awards of Excellence in the

categories “Magazine & Journal Writing” and “Magazine & Journal Design and Layout”.

Page 6: Xcell Journal issue 72

6 Xcell Journal Third Quarter 2010

Xilinx Redefines State of the Art With New 7 Series FPGAs Xilinx Redefines State of the Art With New 7 Series FPGAs


Page 7: Xcell Journal issue 72

The new 7 series is the first XilinxFPGA family created entirely under thewatch of Gavrielov, who joined the compa-ny in late 2007 after serving as a CEO ofVerisity, a design tool provider. Before that,he worked for many years in managementat ASIC house LSI Logic. Gavrielov has setXilinx on an aggressive path to growth,with the main driver being an industry-leading line of FPGAs, which culminates inthe 7 series, and the Targeted DesignPlatform strategy (see cover story, XcellJournal No. 68; http://www.xilinx.com/publications/archives/xcell/Xcell68.pdf ).

To enable this growth, the 7 series boastsseveral significant refinements, including anew unified and scalable architecture, a pri-mary emphasis on power reduction andmassive capacity, enabling better overall sys-tem performance (Figure 1).

Starts With a Unified ArchitectureUp until the introduction of the 7 series,Xilinx’s FPGA landscape has primarilycentered on the high-performance Virtexfamily and the high-volume Spartan®

family. When Xilinx originally intro-duced these two lines in the late 1990s,the Virtex and Spartan devices used radi-cally different architectures. From a userperspective, the two families had notabledifferences, and so did the IP for eachdevice and the design experience in work-ing with them. If you wanted to increasethe size of your end product from aSpartan design to a Virtex design or viceversa, the differences in architecture, IPand pin counts became apparent.

But with the unified architecture of the7 series, those variances disappear. Withthe 7 series, Xilinx will not be introducinga new device under the Spartan name andhas instead developed a complete lineup ofFPGAs—primarily in three families, fromlowest cost to highest performance—allbased on the familiar Virtex FPGA archi-tecture (Figure 2).

Virtex remains the moniker for the 7series’ highest-end FPGAs. This newVirtex-7 family delivers breakthroughcapacity with up to 2 million logic cells andbetter than twofold system performanceimprovement over previous generations.

by Mike SantariniPublisher, Xcell JournalXilinx, [email protected]

FPGAs have advanced remarkably ever sincethey first hit the market in the mid-1980s as1,500-ASIC-gate-equivalent devices. Twodecades later, with the launch of Xilinx’s new7 series, the FPGA stands poised to fulfill itshistoric promise of one day displacing ASICsas the electronics industry’s mainstream logicIC. With the introduction of the 7 seriesFPGAs, Xilinx® is transitioning from beingjust a PLD maker to a premier supplier oflogic ICs by providing lower total cost ofownership for low- to medium-volumeapplications and equivalent total cost ofownership for higher-volume applicationstraditionally addressed by ASICs and ASSPs.What’s more, this total-cost-of-ownershipbenefit combines with the traditional FPGAadvantages of faster time-to-market and riskreduction. Together, all of these factors meanthat FPGAs are emerging as the de factologic IC solution for most applications.

As part of the 7 series release, Xilinx willbring to market an unprecedented 2 mil-lion-logic-cell FPGA, which is 2.5x thecapacity of the largest Virtex®-6 device.Depending on whom you ask, how youdesign and what application you are target-ing, that means the largest 7 series FPGAdelivers the clout of anywhere from 15 mil-lion to 40 million equivalent ASIC gates.Thus, in the last 10 years, Xilinx hasincreased the capacity of its FPGAs by morethan 30x at equivalent price points of thedevices it produced 10 years ago.

But a huge capacity increase is onlythe beginning of the 7 series story. Thesebeefy FPGAs run faster than the previ-ous-generation Virtex-6, but at half thepower consumption.

“ASICs are not dead, nor will they dieentirely, but they really are only viable fora very small number of applications thathave the very highest volumes,” saidMoshe Gavrielov, Xilinx’s chief executiveofficer. “Where once you had to ask whywould you go with an FPGA, today youhave to seriously ask yourself why wouldn’twe use an FPGA?”

Third Quarter 2010 Xcell Journal 7

Three families of 28-nm devices attack the mainstream and high-end ASIC and ASSP markets.


Page 8: Xcell Journal issue 72

For a smooth transition from Spartan-6FPGAs in the low-cost market, the newArtix™-7 family leads the industry inprice, power and small form factor for cost-sensitive, low-power applications.

The final member of the trio smoothlyfills the space between the Virtex-7 on thehigh end and the Artix-7 line at the mass-market level. The Kintex™-7 introduces anew price/performance advantage andgives Xilinx a platform for displacing main-stream ASICs and ASSPs.

Victor Peng, senior vice president forprogrammable platforms development at

Xilinx, predicts that having a solidmidrange product in the form of Kintex-7will allow Xilinx to offer a comprehensiveFPGA lineup that is more highly applica-tion targeted.

“In previous generations, Xilinx wouldfill that middle ground by creating a high-er-performance, higher-capacity version ofSpartan and at the same time a lower-cost,lower-capacity and lower-performance ver-sion of Virtex,” said Peng. “But theSpartan and Virtex architectures, IP andpin counts were very different. With theArtix, Kintex and Virtex families all built

in a unified 7 series architecture, cus-tomers will find it much easier to migratetheir designs across the families, enablingbetter leverage of their developmentinvestments in IP.”

They will also be able to migrate designblocks in the 7 series to the logic portion ofthe upcoming Extensible ProcessingPlatform (see cover story, Xcell JournalNo. 71; http://www.xilinx.com/publications/archives/xcell/Xcell71.pdf ), because the EPPand 7 series devices all use the sameVirtex logic architecture structure. Further,the common logic architecture supports

the ARM AXI4 (Advanced ExtensibleInterface) protocol. This means that Xilinx’sinternal IP developers and hundreds of IPpartners can more easily target and imple-ment AXI-compliant IP on Xilinx FPGAs.Chances are many customers have theirown IP already built to comply with AXI,further facilitating moving designs fromASIC or ASSP to 7 series FPGAs.

Peng notes that in addition to offeringgreat benefits for customers and IP part-ners, the unified architecture allowsXilinx to become more focused andaligned in all product development efforts

going forward. “It means our organizationcan focus on doing things once ratherthan twice,” said Peng.

28-nm HPL: Right Mix of Power, Capacity and Performance With the new 7 series family, Xilinx hasalso modified its manufacturing strategy tobetter align with the realities of modern ICdesign by choosing to implement itsdevices on a newly refined high-k metalgate (HKMG) high-performance, low-power (HPL) process with Taiwanesefoundry TSMC.

Traditionally, FPGA vendors have imple-mented their designs on the highest-per-formance variation of each new siliconprocess as fast as foundries could make theprocesses available. However, starting with90-nm process technologies, leakage startedto become a big problem. It only got worseat 65 nm and 40 nm. At the 28-nm processnode, if unaddressed, leakage current canaccount for well over 50 percent of a device’spower consumption. In addition to usingpower when a device isn’t running, duringoperation leakage creates extra heat, whichin turn increases the leakage. Especially incontinual-use, high-performance applica-tions, this vicious cycle can lead to shorteneddevice lifetimes and catastrophic IC failures.This greatly impacts the viability of using anFPGA in a given application as well as thereliability of a system.

The foundries have made remarkablestrides to stem the leakage in their high-performance processes at 28 nm. Xilinxworked with its new foundry partner,TSMC, to refine the foundry’s newHKMG HPL process for the 7 seriesFPGA, emphasizing low power combinedwith the usual gains in capacity and systemperformance when shrinking the geometry.

Peng said that by going with the HPLrather than the HP process, Xilinx willreduce static power by 50 percent with lessthan a 3 percent impact on performance.The use of the HPL process combined withthe comprehensive power-savings enhance-ments implemented in the 7 series results ina 50 percent reduction in total power com-pared with devices at the same densities inthe last generation of products.

8 Xcell Journal Third Quarter 2010


2x Price / Performance


2x PowerReduction

2x System Performance




7 Series(28nm)

Figure 1 – The new Xilinx 7 series pushes the boundaries for FPGAs.

Page 9: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 9

The 50 percent lower-power benefitgives design teams two options, said Peng.“You either run a similar-size Virtex-6 orSpartan-6 design at half the power in the 7series, or you can double the size of the logicfunctions in your [new] design and remainat your previous power budget,” he said. “Bygoing with the HPL process, we have givencustomers much more usable performanceas well as more logic gates to implementmore functions in their designs.”

Xilinx CEO Gavrielov notes that bychoosing the higher-capacity but lower-power variant of the 28-nm process, Xilinxis leading the FPGA industry in aligningwith the microprocessor industry. Almost adecade ago, MPU makers realized thatcranking up clock rates in these newerprocess geometries would only createextremely leaky, thermally challengeddevices that would fail.

“We learned from the processor side ofthe semiconductor business that given therealities of processes today, the best way to

achieve performance is through higherintegration and efficiency, as opposed tosimply making things move faster,” saidGavrielov. “With today’s processes, if youjust make things faster, you drain morepower and create thermal problems—which degrades power and performance.We need to pay a lot of attention to end-customer applications and ensure we strikethe right balance between meeting low-power needs while simultaneously meeting

the system performance needs of the appli-cation. We believe we will deliver a greatvalue proposition with the 7 series FPGAsthat will delight customers.”

Peng notes that had Xilinx gone withan HP process for an incremental clockspeedup, the significant increase in powerfor the fairly insignificant increase in per-formance would have burdened users withpaying extra attention to designing aroundpower and thermal issues. They mighthave incurred extra systems costs associat-ed with adding elaborate heat spreading or

even fan or liquid cooling and relatedpower circuitry to the end system.

HPL is just one of about a dozen tech-nologies Xilinx employs to reduce power inthe 7 series, Gavrielov said. For example,Xilinx reduced configuration logic voltagefrom 2.5 to 1.8 V, and optimized each of thehard blocks—DSP, Block RAM, SelectIO™and others—using HVT, RVT and LVT tran-sistors to reduce static power while optimiz-ing performance and area. As a result, each

DSP slice consumes 1/12 the power of theequivalent logic implementation. By optimiz-ing the ratio of these tightly integrated hardblocks throughout the FPGA fabric, Xilinxwas able to achieve the greatest performanceand lowest power while preserving flexibility.

Customers can also use the intelligentclock-gating feature introduced in the ISE®

Design Suite 12 to give their 7 series designsan additional 20 percent reduction indynamic power consumption. And finally,users can get a dramatic savings in powerconsumption by leveraging Xilinx’s fourth-


Unpublished Work © Copyright 2009 Xilinx

3 New Families Based on a Unified Architecture

Industry’s Best Price / Performance

Compared to Virtex-6

✔ Comparable performance

✔ 50% lower cost

✔ 50% less power

Industry’s HighestSystem Performance

and Capacity

Compared to Virtex-6

✔ 2.5x larger

✔ Up to 2M logic cells

✔ 1.9Tbps serial bandwidth

✔ Up to 28Gbps line rate

✔ EasyPath cost reduction

Lowest Power and Cost

Compared to Spartan-6

✔ 30% more performance

✔ 35% lower cost

✔ 50% less power

✔ 50% smaller footprint

All Optimized for Power & Improved Price/Performance

Common Logic Cells, BRAMs, Interfaces

Easy Design Scalability

Figure 2 – The three new families in the 7 series unified architecture offer users a smooth path from lowest cost to highest volume.

Page 10: Xcell Journal issue 72

generation partial-reconfiguration method-ology to effectively “turn off ” portions ofthe design when they are not in use.

The upshot? By going with an HPLprocess, taking other power reductionmeasures and rolling out its new devices ina unified architecture, Xilinx now offers acomprehensive line of FPGAs, from thehigh-volume low-power lines to thoseboasting the highest system performanceand capacity the industry has seen to date.

The Virtex-7, Kintex-7 and Artix-7 FamiliesPatrick Dorsey, senior director of market-ing at Xilinx, said the three new families inthe 7 series will allow Xilinx to capture aneven greater share of the ASIC and ASSPmarket and penetrate even more deeplyinto a broader number of vertical markets,from low-powered medical devices to thehighest-performing wired and wireless net-working equipment.

At the entry level, “The new Artix-7family provides the lowest absolute powerand cost, with small-form-factor packag-ing,” said Dorsey. Densities range from20,000 to 355,000 logic cells. The devicesare 30 percent faster and consume 50 per-cent less power than Spartan-6 FPGAs at35 percent lower price points. When mov-ing from Spartan-6 FPGAs to Artix-7devices, designers can expect up to 85 per-cent lower static power and up to 35 per-cent lower dynamic power consumption.

GTP serial transceivers support linerates up to 3.75 Gbits/second. Other keyfeatures include 3.3-volt-capable I/O forinterfacing to legacy components and wire-bond packaging for the lowest cost, withoptional chip-scale packaging for the small-est form factor and 1.0-mm ball spacing forlow-cost PCB manufacturing.

Dorsey said that because the new familyis based upon the Virtex architecture, it alsonow includes many of the advanced featuresof the Virtex family that were not availablein the Spartan line. For example, the Artix-7includes an enhanced System Monitor ana-

log function, now called XADC (analogcapability), to allow users to monitor thefunctionality, temperature, touch sensor,motion control and other real-world analogactivities in the system. The integratedXADC technology will enable a whole newclass of mixed-signal applications.

Further, with these refined specs, theArtix-7 FPGAs better target the low-powerperformance requirements for applicationssuch as ultrasound equipment. The devicesalso now address the small-form-factor, low-power requirements of lens control modulesfor high-end commercial digital cameras aswell as next-generation automotive infotain-ment systems driven by 12 V. Artix-7 devicesalso meet the strict SWAP-C (size, weight,power and cost) requirements of militaryavionics and communications applications.

Kintex-7 FPGA FamilyDorsey said that with the new midrangefamily, Kintex-7, Xilinx now providesFPGAs with the best price-for-perform-ance on the market. “With the Kintex-7family, we offer devices that are less thanhalf the price and power consumption ofVirtex-6 FPGAs but equal in performanceand functionality,” said Dorsey.

The Kintex-7 devices will be especiallywelcomed in applications that require cost-effective signal processing, he said. That’sbecause they offer abundant DSP slices(from 120 to 1,540), up to 5,663 kbits ofdistributed static RAM and 28,620 kbits ofinternal block static RAM, and between fourand sixteen 10.3-Gbps GTX serial trans-ceivers. Dorsey said Kintex-7 devices will beequally attractive to Virtex users seeking alower-cost alternative as well as to customerswho have traditionally used Spartan FPGAsbut are scaling their designs to the next levelof system performance. Indeed, with logicdensities ranging from 30,000 to 400,000gates and with 40 percent higher perform-ance than Artix-7 FPGAs, Kintex-7 devicesequal the performance of Virtex-6 and aresignificantly faster than Spartan-6 FPGAs.

Dorsey said the Kintex-7 parts are idealfor implementing Long Term Evolution(LTE) wireless radio and baseband subsys-tems. And thanks to the recent release ofXilinx’s fourth-generation partial-reconfigu-ration technology, 7 series customers can fur-ther reduce power and cost, for widedeployment in femto, pico and mainstreambase stations. The serial connectivity, memo-ry and logic performance of these devices is agood fit for high-volume wired communica-tions as well, Dorsey added, citing equip-ment such as 10G passive optical network(PON) optical line terminal (OLT) line cardsthat bring high-speed networking to theneighborhood and home as one example.

In addition, Kintex-7 FPGAs are alsosuited for use in high-definition 3D flat-panel displays in consumer electronics mar-kets; video-over-Internet Protocol bridgesthat enable next-generation broadcast video-on-demand systems; and high-performanceimage processing required for militaryavionics and ultrasound equipment that cansupport up to 128 high-resolution channels.

Virtex-7 FPGA FamilyFor its part, the high-end Virtex-7 familytakes the industry’s most successful FPGAarchitecture to new heights by deliveringmore than a doubling in capacity and 30percent faster system performance alongwith 50 percent lower power than theVirtex-6 FPGA predecessors.

Dorsey said the Virtex-7 FPGAs are wellsuited for communications systems requir-ing the highest performance, capacity andbandwidth. With its Virtex-7T and Virtex-7XT variants, this FPGA line boasts ultra-high-end devices that push the limits ofFPGA technologies in terms of the numberand performance of embedded serial trans-ceivers, DSP slices, memory blocks andhigh-speed I/O to establish new bench-marks for the industry.

Virtex-7 devices offer up to 36 GTX10.3-Gbps serial transceivers, ultrahigh-endlogic capacity with as many as 2 million

10 Xcell Journal Third Quarter 2010


T h e Vi r t e x - 7 f a m i l y t a k e s t h e i n d u s t r y ’ s m o s t s u c c e s s f u l F P G Aa r c h i t e c t u r e t o n e w h e i g h t s b y d o u b l i n g t h e c a p a c i t y a n d b o o s t i n g s y s t e m p e r f o r m a n c e a t 5 0 p e r c e n t l o w e r p o w e r.

Page 11: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 11

logic cells and the highest parallel I/Obandwidth in the industry, with up to1,200 SelectIO™ pins. This I/O configura-tion enables the greatest number of parallelbanks of 72-bit DDR3 memory availableon the market, supporting 2,133 Mbps.

Meanwhile, the new Virtex-7XTdevices also provide the highest serialbandwidth in a single FPGA, with up to72 GTH transceivers at 13.1 Gbps, or 80GTH and GTX transceivers (24 runningat 13.1 Gbps and 56 at 10.3 Gbps, respec-tively). In addition, the devices featurehigher DSP-to-logic ratios for greaterthroughput, with up to 3,960 DSP slicesat 600 MHz delivering 4.7 TMACs. Also,the 7XT FPGAs have higher on-chipBRAM-to-logic ratios with up to 65Mbits, for low-latency data buffering.Dorsey said that Xilinx will eventually adddevices with 28-Gbps transceivers to thisfamily; the release details are forthcoming.

The new Virtex-7 FPGAs target the high-est-performance wireless, wired and broad-cast infrastructure subsystems, said Dorsey.The teraMACC signal-processing capabili-ties of Virtex-7 FPGAs enable advancedradar and high-performance computing sys-tems. Product developers can replace ASICsand multichip-set ASSP solutions with sin-gle-FPGA implementations of 100GE linecards to increase bandwidth, while simulta-neously lowering the power and cost. Otherapplications include 100-Gbit OpticalTransport Network (OTN) muxponders forintegrated multiplexer/transponder applica-tions, 300G Interlaken bridges and 400Goptical network cards.

In addition, these ultrahigh-end devicesprovide the logic density, performance andI/O bandwidth needed to build next-gen-eration test and measurement equipment.For systems where ASIC production is jus-tified, Virtex-7 FPGAs enable designers touse fewer devices during prototyping andemulation in order to lower cost andreduce interconnect/design complexity.

EasyPath—a Further Cost AlternativeDorsey said the company’s EasyPath™ pro-gram extends the value of Xilinx 7 seriesFPGAs to provide the lowest total cost ofownership for medium- to higher-volume

applications on the order of 100,000 units.This total cost of ownership requires thatcustomers assume only the development andunit cost. In addition, they receive the fulladvantages of time-to-market and risk reduc-tion that FPGAs offer. This further bolstersXilinx’s value as a strategic logic IC supplier.

EasyPath provides a cost reduction bycoupling Xilinx’s FPGA manufacturingprocess to the customer’s design. This resultsin the same silicon with the same features,but only guaranteed to work with a givendesign. Dorsey said that EasyPath-7 takes sixweeks to complete from design freeze andoffers a guaranteed 35 percent cost reductionand no minimum-order quantity, with nocustomer engineering effort required—all fora $300,000 nonrecurring engineering cost

“Now you have the peace of mind thatonce you design the FPGA, you can get tolower cost by targeting either Kintex-7 orArtix-7 and, if further cost reductions arenecessary to support higher volumes, bygoing to EasyPath-7,” said Dorsey. “What’smore, if you’ve already completed yourFPGA design and want to go with EasyPath,you can let your purchasing department han-dle the rest, since no further customer engi-neering resources are required.”

Next-Generation Targeted Design PlatformsAlong with announcing the new family,Xilinx is also launching a second generationof Targeted Design Platforms, application-specific design aids which the company firstrolled out in 2009 in tandem with therelease of the Virtex-6 and Spartan-6FPGAs. Xilinx’s Targeted Design Platformstrategy gives system designers access tosimpler, smarter design methodologies forcreating FPGA-based solutions through theintegration of five key elements: FPGAdevices, design tools, IP, development kitsand targeted reference designs.

Early-access ISE Design Suite softwaresupporting the new FPGA families hasshipped to a limited number of early-adopter customers and partners. First ship-ments of the devices will begin in the firstquarter of 2011.

For more information on the 7 seriesFPGAs, visit http://www.xilinx.com/technology/roadmap/7 series-fpgas.htm.


Page 12: Xcell Journal issue 72

12 Xcell Journal Third Quarter 2010

New Xilinx Rad-Hard FPGA Reaches for the StarsNew Xilinx Rad-Hard FPGA Reaches for the Stars


Page 13: Xcell Journal issue 72

by Maury WrightPresidentWDC Marketing

Electronic systems designs headed to spacenaturally require high reliability, but thedesign task is further complicated by expo-sure to radiation that can cause sporadic cir-cuit failures. From a functional perspective,FPGAs with inherent reconfigurable attrib-utes are a perfect match for space. FPGAsenable a single system to perform multipletasks and let mission teams remotely recon-figure a system, either fixing a bug oradding new functionality. Now Xilinx® hasan FPGA—the Virtex®-5QV—that is rad-hard and can deliver the full benefits ofprogrammability to space programs. Thedesign teams get an off-the-shelf solutionwith all the advantages of a 65-nanome-ter commercial SRAM-based FPGA,including ready access to developmentand prototyping tools.

It’s hard to underestimate the value anFPGA can offer in an application such asspace-bound systems. Once a system, satel-lite, rocket or spacecraft is deployed, thereis little or no ability to make hands-onchanges to it, so the reprogrammability ofan FPGA is a huge benefit. To be sure,microprocessors and microcontrollers canalso be reprogrammed. But FPGAs excel indata-flow applications where functionssuch as packet inspection or signal-process-ing algorithms implemented in hardwarelogic offer far more processing throughputthan do traditional microprocessors. Andthe FPGA hardware can be easily reconfig-ured to support new algorithms.

Given the advances in circuit densityand the mix of hardwired IP blocks andconfigurable logic, the latest FPGA tech-nologies can capture the bulk of a sys-tem’s functionality. For example, theVirtex-5QV includes Ethernet MACfunctions and high-speed transceivers togo along with DSP slices and config-urable logic (for details on the FPGAcapabilities, see sidebar, next page).

A rad-hard IC that's derived from acommercial FPGA family also offers signif-icant benefits in the development process.Design teams can do development work

Third Quarter 2010 Xcell Journal 13

Virtex-5QV device offers a flexible,cost-effective alternative for design teams working on advanced, reconfigurable space applications.

Virtex-5QV device offers a flexible,cost-effective alternative for design teams working on advanced, reconfigurable space applications.


Page 14: Xcell Journal issue 72

with readily available commercial devicesand development tools and then seamlesslymove the design to the rad-hard targetsystem platform at any point in the devel-opment process.

Space Presents Reliability ChallengesTo deploy FPGAs in space applications,however, designers have to understand theenvironment and learn how to mitigateissues that affect reliability. For example, anumber of radiation-induced effects havebeen identified as a problem area for space-based designs. The list includes single-event upsets, single-event functionalinterrupts, single-event latchups, single-

event transients and total ionizing doseeffects. (See the second sidebar for moreinformation on these effects.)

Designers working on space applica-tions haven’t traditionally had the free-dom to use ICs such as FPGAs withoutcarefully considering ways to mitigateradiation effects. Specialty ASIC houseshave radiation-hardened IC manufactur-ing processes. But ASIC design cycles arelengthy and expensive, and the quantityof devices the application will actuallyneed simply doesn’t justify the time andeffort, given viable alternatives.

The radiation-hardened ASIC processesare also many generations behind state-of-

the-art commercial IC processes. Forexample, the rad-hard ASICs are still in the150-nm or less-dense process nodes.Indeed, modern FPGAs offer performanceand circuit density that match those ofradiation-hardened ASICs, along withmuch faster development cycles.

Radiation Tolerance and TMRIn the past, designers who wanted to useFPGAs have had to combine radiation-tol-erant ICs with techniques that further mit-igate single-event upset (SEU) effects.Xilinx has long addressed the need for radi-ation resistance in space-targeted designs.Radiation-tolerant FPGAs such as the

14 Xcell Journal Third Quarter 2010


Rad-Hard FPGA Delivers State-of-the-Art Benefits

The Virtex-5QV offers a unique value proposition. This FPGA is rad-hard out of the box and also offers state-of-the-artreprogrammable-logic density and hardwired IP blocks. Design teams working on space applications get ASIC-likecircuit density without the ASIC NRE costs.

The FPGA includes more than 130,000 logic cells for large, complex designs. The architecture is based on six-input LUTsand the IC employs a diagonal interconnect structure that ultimately packs designs more efficiently in terms of silicon utiliza-tion and results in better performance and lower power consumption.

The design is based on the second generation of Xilinx’s Advanced Silicon Modular Block (ASMBL™) column-based archi-tecture. ASMBL has allowed Xilinx to produce mixes of configurable logic and hardwired IP that are optimized for specificapplications.

The Virtex-5QV includes 320 Enhanced DSP slices to complement the programmable logic. Each slice includes a 25 x 18-bit multiplier, an adder and an accumulator. Designers can cascade the IC’s 36-kbit Block RAM elements to produce large, gen-eral-purpose memory arrays. The device includes 298 such blocks. Each block can also be configured as two 18-kbit blocks, sothere is little wasted silicon for applications requiring smaller RAM arrays.

For networking and I/O operations, the Virtex-5QV includes a number of hardwired IP blocks. Six Ethernet media-accesscontroller (MAC) functions can operate in 10-, 100 and 1,000-Mbps modes. Eighteen RocketIO™ transceivers support datatransfers at rates ranging from 150 Mbps to 3.125 Gbps. The MACs can use some of the RocketI/O transceivers for physical-layer (PHY) connections or link to external PHYs via a soft Media Independent Interface implemented in programmable logic.

The IC also includes three PCI Express® blocks compatible with the PCI Express Base Specification version 1.1. Designs canimplement x1-, x4- or x8-lane channels with each of the three blocks. The RocketIO transceivers are also available for PCIExpress I/O.

The device features a number of other functions important in high-performance system designs. Six clock management tiles(CMTs) can each generate clocks that operate up to 450 MHz. Each CMT includes dual digital clock managers (DCMs) anda phase-locked loop (PLL). The DCMs enable zero-delay buffering, frequency synthesis and clock-phase shifting. The PLLs addsupport for input jitter filtering and phase-matched clock division.

Xilinx will manufacture the IC in a 65-nm copper CMOS process with a 1-V core voltage. A ceramic flip-chip column grid arraypackage will ensure signal integrity. And Xilinx will guarantee operation over the full military temperature range of -55ºC to +125ºC.

– Maury Wright

Page 15: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 15

Virtex-4QV have immunity to single-eventlatchup (SEL), and can withstand a totalionizing dose (TID) up to 300 krads(Si).Xilinx combines these radiation-tolerant ICswith system-level techniques such as triple-modular redundancy (TMR) to ensure relia-bility. In a TMR design, three separateinstantiations of a system perform the sametask. A voting circuit compares the resultsand considers it correct if at least two sys-tems produce the same result.

Xilinx has developed a tool that cangreatly simplify the implementation of aTMR methodology. The TMRTool acceler-ates the design cycle by allowing the designteam to focus on design and debug ratherthan TMR. The tool works seamlessly withany HDL and synthesis tool to automati-cally build TMR into a design.

The TMRTool also goes beyond base-line TMR functionality. It triplicates allclocks and throughput logic to protectagainst single-event transients (SETs). Italso triplicates feedback logic and insertsmajority voters on all feedback paths. Andthe tool triplicates all outputs and usesminority voters to detect and disable incor-rect output paths.

By inserting voters on all feedback paths,the TMRTool overcomes a problem with thetechnology in designs with finite statemachines. In most TMR-based state-machine designs, an SEU that causes an errorin one of the three state machines ultimatelyrequires that the state machines be reset forsynchronization. But the voters in feedbackpaths ensure that the state machines remaincontinuously synchronized and can operatecontinuously through SEUs.

Rad-Hard by DesignWhile earlier FPGAs such as the Virtex-4QV have been successfully deployed inspace applications and have been radiation-tolerant, the new Virtex-5QV was designed

from the ground up with a rad-hard-by-design (RHBD) methodology. The result-ing FPGA is truly a rad-hard space-gradeIC. Where the Virtex-4QV requires thatdesigners add mitigation, the Virtex 5QVis rad-hard out of the box.

The Virtex-5QV design team’s specificgoal was to provide intrinsic hardness from

SEU, SET and other effects to critical circuitelements in the device. As with all SRAM-based FPGAs, configuration memory con-trols all aspects of device operation and istherefore critical to reliable operation. TheVirtex-5QV design utilizes dual-node latchesthat control write operations to memorycells. Writes occur only when both latches areenabled synchronously. The implementationoffers 1,000 times the hardness to SEUs rel-ative to memory latches in commercial ver-

sions of the FPGA. Moreover, the latch is vir-tually impervious to proton interaction.

Upset Hardening in HardwareEffectively obviating the need for TMR atthe application design level, the Virtex-5QV design team used a variety of tech-niques in implementing the underlying

FPGA memory and circuit elements in thedevice. Xilinx took special care in harden-ing the 35 million configuration cells andthe 81,920 user flip-flops. Both incorpo-rate a clever self-redundant storage circuitthat has double the normal number oftransistors. The result is a very low suscep-tibility to SEU. Xilinx optimized the layoutof this important structure by fabricatingmany variants and subjecting them to in-beam irradiation studies.


Figure 1 – Xilinx built its latest space-grade FPGA, the Virtex-5QV, to be rad-hard.

The Vir tex-5QV design team aimed to provide intr insic hardness fromsingle-event upsets and transients to cr i t ical e lements in the device.

A novel la tch implementat ion of fers almost 1,000 t imes the hardness to SEUs of commercial versions of the FPGA.

Page 16: Xcell Journal issue 72

In addition, Xilinx made sure that all theclock, data and asynchronous inputs to theflip-flops are protected. These are capable ofsuppressing single-event transients and pre-vent them from turning into upsets.

Xilinx employs proprietary methods toprotect against SEUs during the critical start-

up period when the FPGA is configured.One result is that designers don't have totake extra steps to ensure the elimination ofsingle-event functional interrupts (SEFIs),which traditionally require an intrusiveFPGA restart and reconfiguration. GarySwift, senior staff engineer for space products

development and radiation testing at Xilinx,said the implementation reduces GEO (geo-stationary earth orbit) environment SEFI ratesby more than two orders of magnitude (rela-tive to the already low once per century for therad-tolerant Virtex-4QV family) to about thanone SEFI fault in 10,000 years. “Interestingly,


Understanding Radiation-Induced Effects

Going back to the 1950s, engineers have documented the adverse effect that radiation can have on electronic circuits.With the advent of ICs and the constant move to finer process geometries, the potential for radiation-induced errorsgrew. The impact ranges from soft errors that are easy to detect and correct to actual device failures.

Design engineers working on space applications must prepare for a number of problems.

• A single-event upset (SEU) is a change of state in an IC, such as a change in the value of a memory bit caused by a radi-ation strike. SEUs are also called soft errors because the instance of an SEU has no long-term effect on the IC and infact, the soft error can often be found and fixed.

• A single-event functional interrupt (SEFI) is similar to an SEU in that it is typically the result of a single ion strike. ButSEFIs result in a temporary instance of some element of the IC not functioning properly. In some cases, an SEFI-induced fault remains until power cycles, and in other cases the condition is truly temporary.

• A single-event latchup (SEL) is a potentially more damaging event typically caused by ions or protons generated by cos-mic rays or solar flares. The radiation induces a high-current state that results in a full or partial loss of IC functionality.In some cases, power-cycling an IC eliminates the SEL condition. In other cases the device may be permanently flawed.

• The single-event transient (SET) encompasses the concept of an SEU, but includes more-complex errors induced by aradiation strike. An SET, for instance, might affect a clock and propagate multiple errors throughout memory or logic.

• Total ionizing dose (TID) effects lead to the failure of an IC based on the aggregate exposure to radiation over time.Typically, performance parameters decline in an IC as the TID, measured in rads, increases over time. Radiation createselectron-hole pairs in the oxide layer of an IC—slowly changing the threshold voltage of transistors.

Xilinx and its partners have gone to great lengths to understand the radiation effects and to test and characterize mitiga-tion techniques in FPGAs. Back in 2002, Xilinx and the Jet Propulsion Laboratory founded the Xilinx Radiation TestConsortium (XRTC—originally referred to as the Single-Event Effects Consortium, or SEEC). The consortium currentlyhas 14 members, including universities, research laboratories, and defense and space contractors.

The partners are generally organizations that need to use FPGAs in space applications and have a vested interest in accu-rately assessing radiation-induced effects and radiation-tolerant and –hardened designs. The consortium website includes acomprehensive library of research papers (http://www.xilinx.com/esp/aero_def/radiation_effects.htm) that have validated thereliable use of SRAM-based FPGAs in space applications.

Memory—including SRAM—has long been considered among the most susceptible circuits to radiation-induced effectsand specifically to SEUs. That fact led some to question the suitability of SRAM-based FPGAs in space applications.

SRAM-based FPGAs, however, offer tremendous flexibility relative to programmable devices based on nonvolatile mem-ory. For starters, SRAM-based FPGAs, and specifically those from Xilinx, have consistently offered the highest level of inte-gration available in programmable ICs. Moreover, SRAM-based FPGAs are easily reconfigured, allowing a system to servemultiple applications and allowing teams to remotely reconfigure a system to fix a flaw in the system implementations thatisn’t revealed until after deployment. The capacity and performance of the most advanced SRAM-based FPGAs far exceedthose of the most advanced programmable nonvolatile devices.

— Maury Wright

16 Xcell Journal Third Quarter 2010

Page 17: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 17

the upset mitigation is so effective with vir-tually no susceptibility to protons that GEOis the worst-case orbit for SEFIs and SEUs,exactly the opposite of what space radiationexperts have learned to expect,” said Swift.

In addition, Xilinx’s Digitally ControlledImpedance (DCI) allows adjustment outputimpedance and input termination valueswithout external components. Finally,Xilinx ensured its Block RAMs are pro-tected from outputting erroneous data inspite of upsets via an error-detection-and-correction circuit to eradicate upsets.

Virtex-5QV Delivers ResultsThe Virtex-5QV delivers results unmatchedby previous FPGAs. The ICs are fully char-acterized for space radiation effects in heavyion and proton environments. TheseFPGAs will withstand a TID of 700krads(Si), based on method 1019 as definedin MIL-STD-833.

Immunity to single-event latchup isdefined by a threshold to linear-energytransfer (LET)—the amount of energytransferred to material as an ionizing particletravels through that material. The Virtex-5QV meets the MIL-STD-833 requirementthat LET is greater than 100 MeV/mg-cm2

(mega electronvolts per milligram per cen-timeter squared). In short, the device isessentially immune to SEL effects.

SEU immunity in the configurationmemory and control logic is defined in termsof deployment in a GEO environment rela-tive to a space platform that travels 36,000km per day. Based on 35 Mbits on an IC thatcould be subject to an SEU, the IC will suf-fer 3.8 x 10-10 errors per bit per day.

With the availability of the Virtex-5QVFPGA, space teams will have access to astate-of-the-art reprogrammable platformbuilt on a 65-nm copper process technology.The teams can prototype their work withwidely available commercial FPGAs and eas-ily accessible development tools, and thendeploy systems that the Xilinx Radiation TestConsortium has proven to be reliable in theharsh radiation environment in space.

The Virtex-5QV device will sample in thecurrent quarter, with general productionavailability planned for first half of 2011.For more information, visit http://www.xilinx.com/products/virtex5qv/index.htm.

Maury Wright is an electronics engineer turnedtechnology journalist and industry consultant,with broad experience in technology areas rangingfrom microprocessors to digital media, wirelessand power management. Wright worked at EDNMagazine for 22 years, serving as editor-in-chiefand editorial director for five years. Wright alsoserved as editor of the EE Times Digital Homeand Power Management websites.


Rad TolerantFPGAs

Rad TolerantFPGAs














e P




2004 2006 2008 2010


Is your marketingmessage reachingthe right people?

Hit your target by advertising your product or service in the Xilinx

Xcell Journal, you’ll reach thousands of qualified engineers, designers, and

engineering managers worldwide.

The Xilinx Xcell Journal is an award-winning publication, dedicated specifically to helping programmable

logic users – and it works.

We offer affordable advertising rates and a variety

of advertisement sizes to meet any budget!

Call today: (800) 493-5551 or e-mail us at

[email protected]

Figure 2 – The Virtex-5QV rad-hard FPGA is a crowning achievement in Xilinx’s long, rich history of providing leading-edge IC technology for space applications.

Page 18: Xcell Journal issue 72

18 Xcell Journal Third Quarter 2010

Multiple MicroBlazes Ease Integrationin Real-Time Automotive SystemMultiple MicroBlazes Ease Integrationin Real-Time Automotive System


Page 19: Xcell Journal issue 72

by Martin ThompsonPrincipal Product EngineerTRW [email protected]

It is a commonly held view that it is harderto develop software for multiple-processorsystems than single-processor systems. Butin fact, this is not always the case. Ourdesign team at TRW Conekt, the consul-tancy arm of TRW Automotive, recentlyundertook a project that demonstrates howpartitioning the hardware to match theproblem at hand allows development of veryefficient systems using many processors.

Our team was tasked with providingembedded processing electronics to run incars for a project known as “Foot-LITE”(led by MIRA Ltd. and sponsored by theU.K. government-backed TechnologyStrategy Board, the Department forTransport and the Engineering andPhysical Sciences Research Council). Thisproject provides feedback to drivers abouttheir driving habits from the perspectiveof both safety and fuel economy.

The system gives the feedback to thedriver in two ways. First, a dashboard-mounted smartphone display system(designed by Brunel University and devel-oped by HW Communications Ltd.) pro-vides real-time communication with thedriver about events that require immediateattention. In addition, the system collectscontinuous journey data, including videostreams of particular “events,” and thenuploads them to an Internet-based serverfor users to view at their leisure. The deci-sion about which events to flag to the useris made by an algorithm developed bypartner Ricardo UK, based on drivingadvice from another partner, the Instituteof Advanced Motorists.

The project will fit this system to a smallfleet of 30 vehicles (see Figure 1). Test driverswill be members of the public employed byproject partner Hampshire County Council.

The project has progressively incorpo-rated the research results obtained by thecollaboration of 12 industrial, govern-mental and academic partners. Thismeans that we’ve needed a very flexiblesolution to our processing challenges.

Third Quarter 2010 Xcell Journal 19

When facing a project with ever-changing requirements and software contributions from multiple locations, multiprocessors can actually help.

When facing a project with ever-changing requirements and software contributions from multiple locations, multiprocessors can actually help.


Page 20: Xcell Journal issue 72

Base SystemWe had available to us a processing sys-tem, already under development in anoth-er project, to perform image-processingtasks (Figure 2).

This system is based around a singleXilinx® Spartan®-3A XC3SD3400A deviceconnected to four independent blocks ofDDR memory, an architecture thatallows users to implement many different

processor/logic configurations. For exam-ple, you could use the whole FPGA fabricas pure logic resources crafted entirely inHDL. Alternatively, you could use high-er-level tools such as the Xilinx EDK toimplement four (or more) soft-coremicrocontrollers. Each of the four wouldhave access to its own private DDR mem-ory device, protecting data from interfer-ence by the other microcontrollers. Forother simple tasks, you could include

additional processors by making use ofembedded BRAM blocks.

In addition, I/O to the outside world isconfigurable using small daughterboards,which allows for quick turnaround of cus-tomized I/O sets for different projects.

The project partners decided very earlyon that a USB interface would be desirable,as it allows you to add a wide variety ofperipherals to the system. This necessitated

some form of USB stack—we obtained oneusing the Petalinux implementation ofucLinux—and a daughterboard with aUSB host device.

The use of Linux also gives us a simpleway to manage the SPI flash devices thatthe system provides for FPGA bitstreamand application code storage. Weinstalled a simple JFFS2 file system toallow in-field application updates, eitherover Ethernet (using FTP) or by booting

with a USB Memory Stick that contains ascript to upload new code to the internalflash. In a traditional embedded system,all this would require the software team towrite low-level application code.However, with Linux available to us, wecan easily write simple Bash scripts tocontrol these processes.

Foot-LITE AlgorithmsRicardo developed the core algorithmsthat assess the driver’s actions and imple-mented them on its rCube rapid prototyp-ing system (http://www.ricardo.com/en-gb/Engineer ing-Consul t ing/Automotive-E x p e r t i s e / C o n t r o l s - - E l e c t r o n i c s /Embedded-Software/rCube/). We used thisapproach for initial simulator trials andthree test vehicles. In the test vehicles, anembedded vision system (based around anexisting TRW product—coincidentallyalso containing an FPGA) measured dis-tance to the vehicle in front and assessedthe vehicle’s position within the lane. Aradar system provided an alternativesource of range information in the testvehicles. As a step toward a productionimplementation, we eliminated the radarsystem for the larger-scale trials, as thevision system provided sufficient informa-tion for the application

We fitted the vehicle with a forward-facing video camera and processing subsys-tem, which are combined into a small unitthat fits near the rear-view mirror.Embedded algorithms in this subsystemprocess the video images to measure thedistance between the car and the edge ofthe lane. In addition, a parallel algorithmdetects vehicles in front of the Foot-LITEvehicle and provides a measure of the head-way distance. This subsystem transmits itsdata to the Foot-LITE system unit usingthe automotive-standard controller-areanetwork (CAN) bus.

20 Xcell Journal Third Quarter 2010

Home PC


BackOffice DB


Camera &





On Car

Figure 1 – Foot-LITE system


At the beginning, we envis ioned provid ing a s ingle-processor system. However, i t was soon apparent that

a dedicated processor would ease the task o f in tegrat ion for each i terat ion o f a lgor i thmic development .

Page 21: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 21

We integrated a three-axis accelerometerand yaw-rate sensing package into theFoot-LITE unit, potentially providing theFoot-LITE algorithms with access to high-rate, low-latency vehicle dynamics infor-mation when required.

The Foot-LITE algorithms fuse all thisdata to provide a set of simple outputs tothe driver relating to his or her driving style.

Algorithm ImplementationAt the beginning, we envisioned providinga single-processor system. However, it wassoon apparent that a dedicated processorwould ease the task of integration for eachiteration of algorithmic development. Weisolated the main host processor and theFoot-LITE algorithm processor by imple-

menting communications using theMicroBlaze™ Fast Simplex Link (FSL) bussystem. This allows a complete isolation ofthe processors’ memories (unlike the popu-lar shared-memory methodologies), whichgreatly eases the integration task, since bugscannot “migrate” from one processor toanother via memory corruptions.

In addition, there is no competition forprocessor cycles, which means our partnerscan be confident that any changes we make

to the host application will not affect theirapplication’s performance.

We developed a collection of wrapperfunctions that allow us to “drop in” thecode-generated C from the Simulink®

compiler without having to make majorchanges to the interface. We provide asmall amount of nonvolatile memoryonboard via an I2C bus, which is requiredfor storing various tune parameters withinthe Foot-LITE algorithms. This necessitat-ed a simple wrapper to provide the algo-rithms with easy access from the Simulinkenvironment to read this memory at start-up and write it back at shutdown.

The system needed to measure accelera-tions and yaw rate, as well as communicat-ing over the CAN bus with the lane- and

vehicle-detection system. As we already hadlow-level CAN drivers and were concernedas to the timeliness of a Linux applicationmeasuring the vehicle dynamics informa-tion at 40-millisecond rates, we decided toinsert a third MicroBlaze into the system.This saved porting CAN drivers to Linux,and allowed deterministic performance viaanother isolated processing node—criticalto the algorithms—which made use of thedynamics measures. In addition, this

approach allowed us to split the task ofwriting the software to allow parallel devel-opment. Once again, we used FSL as theinterface between the dynamics processorand the Foot-LITE algorithm processor.

Video Capture and CompressionThe initial conception of the system pro-vided simple measures of lane width andoffset, distance to the vehicle in front andso on from the vision system, transmittedover CAN to the Foot-LITE algorithmunit. The project partners decided toenhance this setup by capturing videoframes for transmission to the server, toprovide off-line contextual assistance forinterpreting the advice the system gave.Given that the requirement was only for“Internet-quality” video (300 x 200 pixelsat 5 Hz), we felt we could easily assign afourth MicroBlaze to the task of compress-ing the video stream to a simple set ofJPEG images in real time. The image com-ing from the camera was a wide-VGA (720x 480 at 30 Hz) video stream. Clearly,downsampling the image was a task to beperformed in hardware.

We designed a simple peripheral to han-dle the downsampling operation by simplydropping alternate pixels and lines to pro-duce a 360 x 240 image. This peripheralalso drops four in five frames to producethe required frame rate. Nothing morecomplex is needed to produce visiblyacceptable results, since the JPEG processrenders aliasing artifacts invisible. We usedSystem Generator to develop this peripher-al, as it makes export to EDK very straight-forward, and we already have experience ofusing System Generator for more-compleximage-processing tasks.

The downsampling peripheral busmasters the data into the SDRAM con-nected to the JPEG processor, which thencompresses each frame, as it arrives, into acircular buffer until the Foot-LITE algo-rithm sends a flag. The JPEG processorsends the compressed video frames (againover FSL) to the host MicroBlaze. Weused a code library from the IndependentJPEG Group and found that it neededvery little optimization to operate at 5 Hz.Again, having an isolated processor

Figure 2 – Processing module is Spartan-based.


Page 22: Xcell Journal issue 72

enabled another software engineer (basedat a different site) to work on this aspectof the system in parallel.

Bluetooth to Smartphone and OBDEase of installation was a critical factor forthe project. Minimizing the number of

wires that the system requires was also animportant consideration. We choseBluetooth as the interface to the smart-phone. Drivers for standard USBBluetooth dongles are standard in theucLinux kernel, although we had to buildthe user-space tools ourselves. These have a

number of dependencies on other items ofcode, which are also cross-compiled withthe Petalinux tool set and added to theucLinux file system.

Once we had decided on Bluetooth forthe smartphone interface, it was a naturalchoice to use Bluetooth for the interface to

22 Xcell Journal Third Quarter 2010



Linux MicroBlaze60MHzFPU,

32KB I-Cache,32KB D-Cache












8KB I-Cache,2KB D-Cache








8KB I-Cache,2KB D-Cache





JPEG MicroBlaze60MHzFPU,

8KB I-Cache,2KB D-Cache









Bus Key:










Figure 3 – FPGA block diagram showing major external elements


Page 23: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 23

the On-Board Diagnostic System. Wemade use of a standard off-the-shelfBluetooth-OBD interface module, remov-ing another wired link from the system.

Easier DebuggingDebugging a system with multiple, parallelthreads of execution is always challenging.But splitting the system across multipleprocessors actually makes things easier.There is no requirement for a multithread-aware debugger (as might be needed whentrying to debug multiple processors withinthe Linux environment). The Xilinxdebugger (XMD) can connect to multipleprocessors, and by using TCL (the ToolCommand Language, which XMD under-stands), we can automate the setup anddownload of the code under test to multi-ple processors. Of course, the commonembedded-system debug approach usingprintf statements was also available, sinceeach processor has its own serial port.

Another tool of great value when debug-ging the interprocessor communicationswas ChipScope™ Pro. This embeddedlogic analyzer built into the FPGA fabricallowed us to capture the data passing overthe FSL links and narrow down subtle bugsto either the sender or the receiver, andfrom there to the offending line of code.

The isolation provided by using fourprocessors means that once a particular ele-ment is debugged it will (to a large extent)not need to be looked at again. There arenone of the weird interactions that alwayscause problems when integrating disparatecode into a large, monolithic application,or when running multiple processes on asingle processor.

FPGA ImplementationIn this project, there is almost no HDL—simply a top-level wrapper integrating theEDK-based design with a tiny piece ofwatchdog code to guarantee the systemshuts down after the driver has turned theignition off. EDK generates the vastmajority of the FPGA (the MHS file ismore than 1,300 lines long!), with SystemGenerator producing the video downsam-pler. We configured all four microcon-trollers with caches and floating-point

units. With four processors, four DDRmemory interfaces and a collection ofperipherals (including Ethernet, SPI, IIC,CAN, UARTs, timers, GPIOs), around70 percent of the device’s lookup tablesare occupied (around 28,000 LUTs). As isusual with microcontroller-based FPGAs,the block memories are very highly uti-lized at better than 90 percent, or 119BRAMs, but the DSP blocks are relative-ly lightly used: Only the floating-pointunits in each processor (eight in each, fora total of 32) need them.

Bringing it All TogetherThe host microprocessor boots the Linuxkernel from internal flash and then mountsits local file systems. The slave processorseach have an FSL-based boot loader, whichaccepts a standard S-record, parses it,copies it into local memory and executes it.The Linux processor simply streams the S-record from the file system direct to theFSL pseudo-file (using the built-in dd util-ity). As described already, all interprocessorcommunication takes place over a fullyconnected mesh of FSL links. These are all32 bits wide and operate at 60 MHz, pro-viding plenty of low-latency communica-tion bandwidth. Although avoiding sharedmemory may seem limiting, the upside isthat this system provides the isolationbenefits already discussed. The hardwarearchitecture matches the applicationrequirement subdivision well, which cre-ates an intuitive software partition.

The Foot-LITE algorithm microproces-sor sends triggers to the JPEG compressorwhen required and communicates with thesmartphone display. The Linux processorintermediates between the Bluetooth com-munications and the rest of the system(Figure 3). In addition to the immediatesignals to the driver, it sends a continuousstream of information about the state ofthe vehicle and occasional streams of videofor onward transmission to a central servervia the smartphone.

At the end of a journey, when the driv-er switches off the ignition, the mainprocessor informs the slave processors,which can then perform their own shut-down procedures (such as writing updated

parameters to the nonvolatile tune storage)before informing the main processor thatthey are in a safe state for shutdown. At thispoint, the host processor signals to thepower supply and the system enters a verylow-power sleep mode, awaiting the nextturn of the ignition. In the unlikely casethat the software does not send the shut-down signal within a couple of minutes ofthe ignition turnoff, a hardware timer inthe FPGA fabric pulls the power, to avoidflattening the vehicle battery.

In the final stages of the project, twoacademic members of the consortium(Newcastle University and the Universityof Southampton) will analyze the datastreamed out of the vehicle in actual high-way use to evaluate the effectiveness of thesystem in altering driver behavior.

The FPGA AdvantageFPGAs provide huge flexibility, whichmeets the needs of evolving projectsmuch more easily than a fixed hardwareplatform. The ability to mix in customhardware for intensive applications (forexample, video) is also beneficial. If youuse Linux to gain the huge benefits ofready-made high-level access to periph-erals such as Ethernet, you don’t need tocompromise on real-time performance,as you can push those critical tasks intotheir own microprocessor. Finally, if alarge, geographically distributed team isdeveloping the software, having a hard-ware architecture that matches the func-tional split provides benefits indevelopment and integration.

For more information, contact the authorat [email protected]. You can readmore about the Foot-LITE project athttp://www.foot-lite.net. MIRA Ltd. is theproject lead. The other industrial projectpartners are Auto-txt Ltd., HampshireCounty Council, HW CommunicationsLtd., The Institute of Advanced MotoristsLtd., Ricardo UK Ltd., Transport forLondon, TRW Conekt and Zettlex PrintedTechnologies Ltd. The academic partners areBrunel University, Newcastle UniversityTransport Operations Research Group andUniversity of Southampton TransportationResearch Group.


Page 24: Xcell Journal issue 72

24 Xcell Journal Third Quarter 2010

Making Biometrics the Killer App of FPGA Dynamic Partial Reconfiguration

Making Biometrics the Killer App of FPGA Dynamic Partial ReconfigurationRun-time reconfigurable hardware technologybrings key advantages in the design of automatic personal recognition systems.

Run-time reconfigurable hardware technologybrings key advantages in the design of automatic personal recognition systems.


by Francisco FonsPhD CandidateUniversity Rovira i Virgili, Tarragona, [email protected]

Mariano FonsPhD CandidateUniversity Rovira i Virgili, Tarragona, [email protected]

Page 25: Xcell Journal issue 72

In the current era of communications andinformation technologies, automatic bio-metric personal recognition systems repre-sent the state of the art in high-performancesignal- and image-processing applications. Infact, it is not difficult to find in our daily livessystems requesting our personal authentica-tion/identification before allowing us to usethem; electronic tellers, computers, mobilephones and even cars require such authoriza-tion. Many end-user applications thatdemand better levels of security than PINs,passwords or ID cards use personal recogni-tion algorithms based on biometric (physio-logical or behavioral) characteristics, usuallydelivering them as a kernel.

As a proof of concept, we developed anautomatic fingerprint authentication system(AFAS) on the second smallest Xilinx®

FPGA device in the Virtex®-4 LX family,making use of the Xilinx Early Access PartialReconfiguration design flow and tools. Theexperimental results demonstrate it is possi-ble to embed a full, highly demanding bio-metric recognition algorithm in such a smallFPGA at an extremely low cost, processing itin real-time while preserving data accuracyand precision in its physical implementationby multiplexing functionality on the fly overa reduced set of resources placed in a partial-ly reconfigurable region (PRR) of the device.These promising results, together with theproven maturity of the technology we used,encourage us to move this solution fromresearch to industry, in an attempt to makepartial reconfiguration (PR) available to theconsumer world in the way of secure com-mercial products.

Basics of BiometricsComputationally complex applicationsprocessed in real time, driven at low rates ofpower consumption and synthesized at lowcost are unavoidable requirements today inthe design and development of embeddedsystems, particularly when addressed tomass-production niches. In this context,dynamic partial self-reconfiguration of sin-gle-context FPGAs arises as a firm techno-logical alternative, able to deliver a highfunctional density of resources to efficient-ly balance all those demands for time-,power- and cost-sensitive applications.

Finally, cost-effectiveness is probably themost important reason for biometrics tomake use of partial reconfigurability. Inaggressive markets like consumer electronicsor automotive, vendors must market theirsystems at a competitive cost. Customersdemand products with the highest level ofsecurity at the lowest possible price point.

The way to improve security and reliabil-ity is by increasing the computational powerof the biometric recognition algorithm. Thisincrement of computation usually involves alike increment in execution time and also incost (resources). However, the cost is hardlyaffected in those scenarios where the design isbased on dynamic-partial-reconfigurationtechnology. Using PR, designers can parti-tion that new computation and schedule it as

new processing stages added to the currentsequential execution flow of the application.Thus, cost often can be held invariant tofunctional changes of the algorithm.

Designers can partition the biometricrecognition algorithm into a series of mutu-ally exclusive stages that are processedsequentially, where the outputs or results ofone stage become the input data for the next.This sequential order means designers canmultiplex hardware resources in time andcustomize them to execute a different task orrole at each moment, increasing their func-tional density and thus keeping constantthe total number of resources needed toprocess the entire algorithm. Moreover, thereconfiguration overhead is short enough

Software-defined radio, aerospace mis-sions and cryptography are some of theknown applications that exploit the bene-fits of dynamic partial reconfiguration ofprogrammable logic devices today. In thiscontext, our group is applying PR to anapplication space that hasn’t traditionallyleveraged it: biometrics. As security hasbecome a major issue in today’s digitalinformation environment, especially forapplication fields like e-commerce, e-health, e-passports, e-banking or e-voting,among others, we believe the use of PR inbiometrics holds great promise.

However, biometrics is complex. Itrequires stringent and computationallyintensive image/signal processing in realtime, along with a great deal of flexibility.

In addition, personal recognition algo-rithms are in continuous evolution. As theresearch community expends major effortin this field, error rates like false acceptanceand false rejection are improving. As a con-sequence, consumers are growing moreconfident about biometric systems, andacceptance is increasing. Given thatprogress in biometrics technology is expect-ed to continue in the future, biometricproducts already in the market will have toadmit upgrades in the field just to avoidgetting obsolete, and for this they requireopen system architectures. In this regard,the flexible hardware found in run-timereconfigurable FPGA devices enables theversatility and scalability needed.

Third Quarter 2010 Xcell Journal 25


Given that progress in b iometr icstechnology is expected to cont inue in the fu ture , b iometr ic productsalready in the market wi l l have

to admit upgrades in the f ie ld jus t to avoid ge t t ing obsole te ,

and for th is they require open system archi tec tures.

Page 26: Xcell Journal issue 72

so as not to eclipse the benefitsgained by hardware acceleration.

Furthermore, reconfiguring oneset of resources on the fly will notinterrupt the rest of the resourcesavailable in the FPGA. In this way,the resources that are not reconfig-ured continue to operate and guar-antee the link with the exteriorworld for the entire life cycle of theapplication.

Our challenge in this work con-sisted in demonstrating that PR fitswell in the development of complexpersonal recognition algorithmsbased on biometric characteristics,making use of a two-dimensionaldesign abstraction level throughwhich the functionality is managednot only in space but also in time.We describe this target step by stepin the next sections.

Automatic Fingerprint Authentication SystemFingerprint verification is one ofthe most popular and reliable bio-metric techniques used in automat-ic personal recognition. Essentially,the technique splits the AFASapplication into two processes orstages carried out at different timesand in different conditions: enroll-ment and recognition.

Enrollment is the system config-uration process through which theuser gets registered. Generally, theuser exposes his or her fingerprint tothe system, which submits it to a setof computationally intensive image-processing phases aimed at extract-ing all relevant, permanent and distinctiveinformation that will permit the system tounequivocally recognize the fingerprint’sgenuine owner. This set of characteristicsbecomes the user ID, which the system storesin its database. This process is normally con-ducted off-line, in a secure environment andunder the guidance of expert staff.

Once the user is registered, the next timehis or her fingerprint is exposed to the systemin the recognition stage, the system willcheck to see if it corresponds with any

authorized member within the database. Allthe processing tasks performed in the enroll-ment are repeated now to again extract thosedistinctive characteristics from the live fin-gerprint sample. The system then comparesthese characteristics with the informationstored as user templates in the database toconclude whether the live scan matches anyof the registered templates. Recognitioncomes in two modalities depending on thesize of the database: authentication, when aone-to-one (or one-to-few) matching is

processed; and identification, when thematching is one-to-many due to the factthat many users are registered in the sys-tem. Recognition is normally performedonline in a less-secure environment andunder real-time constraints.

Each of these stages is, in its turn,partitioned into a series of mutuallyexclusive tasks designed to extract fromthe fingerprint image such informationas will distinguish one user from theothers. With that object in view, thesystem carries out specific computa-tions, such as image processing (2Dconvolution, morphologic operations),trigonometrics (sin, cos, atan, sqrt) [1]or statistics (average value, variance).

Thus, the biometric application isorganized in a set of tasks that areprocessed following a sequential flow.A task cannot start unless the previoustask has finished, since the output dataof a given task is the input data for thenext one in the chain. Moreover, mostof these tasks are repeated in bothenrollment and recognition stages.

Figure 1 enumerates the tasks thattake place in the presented algorithm.The first task is the image acquisition.Depending on the size of the sensor, asystem may acquire the whole image atone touch (complete image sensor) orin slices (sweeping sensor). In the sec-ond scenario—which was the case weused—an additional image reconstruc-tion phase is necessary. The full finger-print image gets composed by the setof consecutive and partially overlappedslices acquired [2].

Once we have the whole recon-structed image, the next task consists of

segmenting it in the foreground (that is, theregion of interest, based on the ridges andvalleys of the fingertip skin) from the back-ground. We perform this process by con-volving the image, pixel by pixel, withdirectional filters made up of Sobel masks ofkernel 5x5. Afterwards, we normalize theimage at a specific mean and variance.

Next, we enhance this normalized imagethrough an isotropic filtering, whichretrieves relevant image information fromsome potential regions of the captured

26 Xcell Journal Third Quarter 2010


Figure 1 – Spatial partitioning and floorplanning of the AFAStake place in one static region and one reconfigurable region of the Virtex-4. Temporal partitioning of the application in

sequential stages occurs in the reconfigurable region.

Page 27: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 27

image initially lost or disturbed by the noisein the acquisition phase, making use of akernel 13x13 [3]. Once this step hasimproved the quality of the image, the nexttask is to compute the field orientationmap, which determines the dominant direc-tion of ridges and valleys in each localregion of the image foreground. The result-ant field orientation is then submitted to anew filtering stage (kernel 5x5) to obtain arefined field orientation map.

Until this point, the image has beenworked at 8-bit gray scale. Now, in thebinarization process, Gabor directionalfilters of kernel 7x7 convolve the gray-scale image to improve the definition ofthe ridges and valleys and convert each ofthe gray-scale pixels to a 1-bit binary(black or white) dot. The image is thensubmitted to a new loop to smooth andredraw the shapes of the resultant ridgesand valleys. Later, the thinning or skele-tonization task converts the black-and-white image to one with black ridges onepixel wide. From that image it is not diffi-cult to extract the fingerprint characteris-

tic points or minutiae, that is, the ridgeendings and bifurcations.

Finally, with the minutiae and the fieldorientation data already obtained, the fin-gerprint template and sample can bealigned. The first way of accomplishing thisis through a brute-force algorithm thatmoves one image over the other—takinginto consideration both translation androtation movements as well as some admis-sible tolerances due to the image distortioncoming from the skin elasticity in theacquisition phase—to find the best align-ment between them [4]. The next step is tomatch the sample and template to obtain alevel of similarity between them, which theautomatic system will use to decide if bothimages correspond to the same person [5].

All this processing, illustrated in Figure4, is performed on fingerprint images of500-dpi resolution, 8-bit gray scale and upto 280 x 512 pixels, acquired throughsweeping technology via the thermal fin-gerprint sensor FingerChip from AtmelCorp. and computed in the Xilinx Virtex-4XC4VLX25 FPGA device.

System ArchitectureThe Virtex-4 FPGA device becomes thecomputational unit of the AFAS platform.Flash memory plays the role of system data-base, storing nonvolatile information likebitstreams as well as specific application datasuch as user fingerprint templates or config-uration settings of the biometric algorithm.The system also uses DDR-SDRAM mem-ory to temporarily store intermediate data orimages obtained in each processing stage.We implemented a serial communicationlink, in our case an RS-232 transceiver con-nected to a UART controller—the lattersynthesized in the resources of the FPGA—to use for debugging purposes, just to trans-fer the resulting image of each stage to a PCin order to visualize the fingerprint imagesor results of each step. Finally, a sweepingfingerprint sensor, addressed to capture thebiometric characteristic of the user, acts asinput of the recognition algorithm, asdepicted in Figure 2.

Regarding the computation unit, theFPGA is detached in two regions, as shownin Figure 3: a static region occupied by a






































Figure 2 – System architecture and functional components breakdown of the suggested AFAS.

Page 28: Xcell Journal issue 72

whole multiprocessor CoreConnect bus sys-tem; and a partially reconfigurable regionthat is used to place—on demand and mul-tiplexed in time as long as the processingadvances—the custom biometric coproces-sors or IP responsible for the different

sequential tasks of the recognition algorithm.The multiprocessor CoreConnect bus sys-tem mainly comprises a MicroBlaze™processor and other standard peripheralsalong with a custom reconfiguration con-troller, this one linked to the ICAP port.

All the processing tasks are enumeratedfrom 0 (static) to B in Figure 1, accordingto sequential execution order. Customhardware coprocessors implement all thetasks in the PRR, with the exception of thefingerprint acquisition process, which theMicroBlaze performs in software.

The reason behind this specific hard-ware/software partitioning is that thesweeping sensor needs an integration timeof 5 milliseconds to acquire consecutiveslices. That’s enough time for it to performthe image reconstruction on the fly directlyin software under MicroBlaze control.Therefore, it is not necessary to implementthis image reconstruction with a customhardware coprocessor.

The image acquisition consists of cap-turing 100 slices at a rate of 5 ms per slice,with each slice consisting of 280 x 8 pix-els. Software handles the reconstruction inreal time by detecting the overlapping ofrows of pixels between each two consecu-tives image slices.

We implemented the rest of the tasks,however, as custom hardware coprocessorsin the PRR of the FPGA simply because ofreal-time constraints. Once the processingof each particular task is finished, the recon-figuration controller, located on the staticregion of the device and instructed by theMicroBlaze processor, replaces the coproces-sor currently instantiated in the PRR by theone corresponding to the next stage of thebiometric algorithm. The reconfigurationcontroller does this job by simply down-loading the new partial bitstream into thePRR and transferring this data directly fromDDR-SDRAM to the internal FPGA con-figuration memory via the ICAP interface.

It is important to note that we used astandard interface based on FIFO memo-ries and flip-flop registers between the stat-ic and the reconfigurable regions. Thisallows us to develop standard biometriccoprocessors or IP placed in the PRR thatare totally independent of the multiproces-sor bus the system uses, be it AMBA®,CoreConnect, Wishbone or some other, asdepicted in Figure 2. This point is funda-mental in order to guarantee standardiza-tion and portability of the biometricalgorithm to different platforms.

28 Xcell Journal Third Quarter 2010


Figure 3 – Composition of the full bitstream placed in the FPGA at a given time along the application execution flow. The static region (far left) and one of the dynamically

reconfigurable biometric coprocessors instantiated and shared in time in the PRR (center) make up the full bitstream (far right).

Figure 4 – Resultant images obtained in each of the sequential stages of the biometric recognition algorithm (fingerprint template processing on the left

and fingerprint sample processing on the right).

Page 29: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 29

Reconfiguration ControllerThe design of an efficient reconfigurationcontroller is key to success in the deploy-ment of PR systems oriented to single-con-text FPGAs. Although the nonreconfiguredarea of the FPGA remains in operationwhile the PRR is reconfigured, the PRRresources are not operative at that time, soit is desirable to speed up the reconfigura-tion process as much as possible so as tominimize this overhead. The reconfigura-tion time depends on three factors: databus width, reconfiguration frequency andbitstream size—the first two relate tointerface aspects, while the last is closelytied to the PRR size and the design com-plexity of the partially reconfigurablemodule (PRM) located there.

Our work implements a reconfigura-tion controller that is able to transfer par-tial bitstreams from external memory tothe FPGA’s on-chip configuration memo-ry at run-time with a high bandwidth. Itis possible to reach the maximum recon-figuration bandwidth of Virtex-4 tech-nology with no constraints in the partialbitstream size and with the external mem-ory as a shared resource that differentprocessors can access concurrently fromthe system buses.

In the system initialization, the partialbitstreams to be downloaded at run-timeinto the FPGA configuration memorymove from the external nonvolatile memo-ry (flash) to the external DDR-SDRAM.This memory is connected to a multiportmemory controller (MPMC), so it becomesa shared resource accessible by any masteror slave processor in the system. Differentbuses can be connected to the MPMC, forinstance the CoreConnect PLBv46 bus,used as general-purpose system bus, or eventhe Xilinx CacheLink (XCL) bus, orientedto fast instruction and data caches of theCPU. The system CPU (MicroBlaze) is, infact, connected to these two buses.

Our reconfiguration solution, however,is based on a new bus, the Native PortInterface (NPI), which is specificallydesigned to establish a fast link between theexternal DDR-SDRAM repository and theICAP primitive. As part of our reconfigura-tion controller, we have designed a master

memory-management unit (MMU) thathandles the NPI protocol. The link betweenexternal DDR-SDRAM (partial bitstreams)and the ICAP primitive goes through aninternal FIFO memory. In this way, we canimplement two different made-to-measureinterfaces, with independent data bus sizeand speed—one coupled to the NPI proto-col and the other to the ICAP protocol.

The write port of the FIFO is connect-ed to the NPI and uses a 64-bit data bus.The read port of the FIFO, joined to theICAP, uses a data width of 32 bits—themaximum data width of ICAP in Virtex-4devices. Regarding frequency, both readand write ports of the FIFO (on the NPIand ICAP sides) run at 100 MHz,although the NPI side could work at ahigher rate if necessary. To keep the trans-fer latency to a minimum, the masterMMU performs the bitstream reconfigura-tion in 64-word (32-bit) burst transfers tothe internal FIFO. This is the maximumlength of burst accepted, so all the partial

bitstream transactions are done at the low-est burst latency. On the other side, thereconfiguration controller reads the storedFIFO data and transfers it in 32-bit formatto the ICAP primitive, as long as the FIFOis not empty. The reconfiguration con-troller (just the master MMU) is handlingthe direct memory access (DMA) to hugeDDR-SDRAM memory. We set up thispart with several configuration registersimplemented in another custom slaveMMU controller connected to the PLBv46bus and directly managed by the CPU.

In this way, the CPU only needs to dotwo things: configure the initial addressand size of the partial bitstream to bedownloaded in the PRR, and then give thego-ahead command to the MMU master tostart the reconfiguration process. At thatpoint, the MMU master starts the bit-stream DMA transfer to the internal FIFOand from this to the ICAP primitive. Oncethe transfer is finished, the reconfigurationcontroller notifies the CPU.


Acquisition 500.000 500.000 500.000

Segmentation 2.810 232.046 0.672

Normalization 0.470 33.087 1.691

Enhancement 7.030 512.171 3.608

Field Orientation 2.500 337.419 1.694

Filtered Orientation 0.620 22.178 1.465

Binarization 15.940 774.750 3.572

Smoothing 14.220 287.507 1.492

Thinning 1.410 417.350 1.794

Features Extraction 0.630 32.497 8.549

Alignment 3224.530 139935.838 158.716

Matching 4.220 108.608 21.772

TOTAL 3774.380 143193.451 705.025




1.83 GHzSW MicroBlaze

Virtex-4 100 MHzPR-HW & SW

Virtex-4 50/100 MHz


Table 1 – Processing time breakdown (in milliseconds) of the different tasks executed in different AFASplatforms: a software-only approach on a personal computer platform, embedded-software approach on a

Xilinx Virtex-4 XC4VLX25 FPGA and HW/SW co-design based on partial reconfiguration.


Page 30: Xcell Journal issue 72

As a result, we achieve the transfer of thepartial bitstream at maximum throughputeven if the DDR-SDRAM is accessed by theCPU via XCL or PLBv46 buses at the sametime. That’s because, in the end, the CPUruns the program flow in internal BRAMcache, freeing the access to the externalDDR-SDRAM to the reconfiguration con-troller. It is important to note here that thisDDR-SDRAM memory where both partialbitstreams and software application are allo-cated is not a dedicated resource but a sharedresource. Even so, this scheme significantlyimproves upon other existing reconfigura-tion controller approaches, since it reachesthe maximum reconfiguration throughput ofVirtex-4 technology (transfer of the partialbitstream to the ICAP through a 32-bit databus at a rate of 100 MHz, or 3.2 Gbps).

Experimental ResultsThe embedded automatic fingerprintauthentication system described here isessentially a high-performance image-

processing application, since it exhibits agreat deal of parallelism and demands areal-time authentication response. From anergonomic standpoint, that could mean,for instance, not to exceed 2 or 3 seconds inthe authentication process of any user.

The design flow entails several devel-opment loops. Initially, we fully devel-oped the algorithm in software inMATLAB® on a PC platform. Afterward,we ported this software code to embeddedsoftware in the C programming languageand executed it first in the same PC, justto confirm that we would obtain the sameresults, and then on an embedded micro-processor like the MicroBlaze synthesizedin our FPGA device.

In this approach, the Virtex-4 deviceimplements a software-only solution basedon MicroBlaze, without any custom hard-ware coprocessor in use and without reach-ing real-time performance. To improve thetime, and based on the resultant tasks pro-filing we obtained, our next step consisted

of switching to a HW/SW co-design solu-tion by introducing the PRR, where welocated the different custom biometriccoprocessors. At this point, we have fullydeveloped the system in both the C pro-gramming language and VHDL hardwaredescription language.

We have conducted some recognitiontests with 8-bit gray-scale fingerprintimages of 268 x 460 pixels. We deployedthe same tests in two platforms: in our PRsystem based on Virtex-4 and also in a per-sonal computer based on an Intel Core 2Duo T5600 processor running at 1.83GHz. We then ran the same algorithm,either implemented purely in software orby combining software with flexible hard-ware, just to compare the performance inboth enrollment and recognition stages.

We obtained identical recognitionresults in both platforms, as expected.However, the processing time spent ineach case differed dramatically. Table 1shows the time needed when the algo-rithm is deployed on different platformsand architectures: a software approach onthe Intel Core 2 Duo PC platform;embedded-software approach on anML401 platform powered by a Virtex-4XC4VLX25 FPGA based on a MicroBlazeprocessor at 100 MHz; and HW/SW co-design approach on an identical ML401platform equipped with dedicated bio-metric coprocessors running at either 50or 100 MHz, instantiated in the PRR andreconfigured on demand.

Without considering the acquisitiontask, which is fixed at 500 ms due to thesweeping-sensor restrictions (100 slicescaptured with an integration time of 5 msand image reconstructed from them on thefly), the PR approach reduces latency dueto the rest of the processing tasks to 205ms. That compares with latency of 3,274ms in the pure-software approach on thePC, which means a speedup of 16x in favorof the PR solution.

Thus, Table 1 makes it evident that real-time authentication is feasible with HW/SWco-design that exploits parallelism andpipeline techniques, along with PR technol-ogy, thanks to its low reconfiguration laten-cy. Furthermore, in the PR approach, each

30 Xcell Journal Third Quarter 2010


Application Flow (static) — — 7005 8888 41 4

Acquisition — 500.000 — — — —

Segmentation — 0.672 4978 4612 8 20

Normalization 0.841 0.850 371 334 0 8

Enhancement 1.045 2.563 5275 5831 5 28

Field Orientation 1.025 0.669 3339 3166 5 8

Filtered Orientation 1.046 0.419 2857 2983 7 0

Binarization 1.107 2.465 5462 4166 17 29

Smoothing 1.045 0.447 4892 3265 8 0

Thinning 0.974 0.820 1013 2821 13 0

Features Extraction 0.943 7.606 487 3379 3 0

Alignment 1.045 157.671 2632 8943 21 0

Matching 1.035 20.737 642 4379 14 5

TOTAL 10.106 694.919 38953 52767 142 102



PERFORMANCE PR-HW & SW APPROACHTime (ms) Hardware Resources

RECONF.(100 MHz)

PROCESS.(50/100 MHz)





Table 2 – Time and resources breakdown of the different tasks executed by the AFAS driven by partial reconfiguration technology on a Virtex-4 XC4VLX25 FPGA composed of

21,504 flip-flops, 21,504 four-input LUTs, 72 RAMB16 blocks and 48 DSP48 blocks.

Page 31: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 31

task can run at a specific frequency; this fre-quency is established each time we reconfig-ure the PRR to download a new modulewith new specific characteristics. Ourapproach ran all the tasks performed inhardware at either 50 or 100 MHz.

Furthermore, the reconfiguration processwas always performed at 100 MHz, trans-ferring 32-bit words per clock, a fact thatguarantees the lowest reconfiguration latencyon the Virtex-4. Each reconfigurationprocess took between 0.8 ms (for example,normalization) and 1.1 ms (e.g., binariza-tion), depending on the bitstream com-plexity of each PRR hardware context. Thisreconfiguration time is negligible in com-parison to the total processing time of thebiometric recognition application, asdepicted in Table 2.

But we not only addressed time in thisPR design. We also carefully consideredcost-effectiveness by means of the time-sharing of the resources involved. TheXC4VLX25 FPGA device contains 21,504slice flip-flops, 21,504 four-input LUTs,72 18-kbit RAMB16 blocks and 48DSP48 blocks. Regarding the partitioningof resources in both static and reconfig-urable regions, the reconfigurable regiontakes 11,264 slice flip-flops, 11,264 four-input LUTs, 22 18-kbit RAMB16 blocksand 44 DSP48 blocks, while the rest of theresources of the device keep static for theentire life cycle of the application.

The PRR is in charge of the executionof up to 11 different sequential tasks of therecognition algorithm. As shown in Table2, the same application synthesized on afully static design would not fit fully on theXC4VLX25 FPGA; therefore, that wouldtypically force designers to choose a biggerand more expensive device with the properamount of resources. However, using PReliminates this issue. Table 2 definitelydemonstrates that automatic personalauthentication can be performed atextremely low cost today with the reuse oflogic resources thanks to PR technology.

The set of tools we used, available in theXilinx Early Access Partial ReconfigurationTools Lounge, are ISE® 9.02.04i togetherwith the PR_12 patch, EDK 9.02.02i, andPlanAhead™ 9.2.7. Finally, we validated

the system on real fingerprint imagesacquired by the system as well as other fin-gerprint images that exist in public data-bases based on the same fingerprintsweeping sensor (Fingerprint VerificationCompetition databases).

Now that we have successfully complet-ed the proof of concept, we plan to portthis prototype to the coming next genera-tion of low-end Xilinx 28-nanometerFPGA devices provided with PR capabilityin the Artix™-7 family, and the new PRdesign flow based on partitions that Xilinxrecently released. Our goal is to design asystem able to embed high-performanceand real biometric security in any con-sumer electronics product at the lowestpossible cost.

The time for run-time reconfigurablecomputing in biometric applications is def-initely now. For further information aboutthis project, you can contact the authors at{francisco.fons, mariano.fons}@estudiants.urv.cat.


[1] F. Fons et al., “Trigonometric ComputingEmbedded in a Dynamically Reconfigurable CORDICSystem-on-Chip,” Reconfigurable Computing:Architectures and Applications, Lecture Notes inComputer Science, Vol. 3985, pp. 122-127, ISSN0302-9743, Springer, 2006.

[2] M. Fons et al., “Hardware-Software Co-design of anAutomatic Fingerprint Acquisition System,” IEEEInternational Symposium on Industrial Electronics, ISIE2005 Conference Proceedings, pp. 1123-1128,Dubrovnik, Croatia, June 2005.

[3] F. Fons et al., “Approaching Fingerprint ImageEnhancement through Reconfigurable HardwareAccelerators,” IEEE International Symposium onIntelligent Signal Processing, WISP 2007 ConferenceProceedings, pp. 457-462, Alcalá de Henares, Spain,October 2007.

[4] M. Fons et al., “Design of a Hardware Accelerator forFingerprint Alignment,” IEEE International Conferenceon Field Programmable Logic and Applications, FPL2007 Conference Proceedings, pp. 485-488,Amsterdam, The Netherlands, August 2007.

[5] M. Fons et al., “Hardware-Software Co-design of aFingerprint Matcher on Card,” IEEE InternationalConference on Electro/Information Technology, EIT2006 Conference Proceedings, East Lansing,Michigan, USA, May 2006.


Page 32: Xcell Journal issue 72

by Endric Schubert, PhDManaging DirectorMissing Link Electronics, [email protected]

Leo SantakMember of the Technical StaffMissing Link Electronics, Inc.

Every now and then designers face the needto extend the lifespan of an existing embed-ded system by adding more compute poweror additional inputs (or both). This is a jobfor which having a programmable systemplatform really helps.

In our case, we wanted to upgrade a net-worked programmable system with secureInternet connectivity. Secure Internet con-nectivity requires encryption to run proto-cols such as Secure Shell (SSH), TransportLayer Security (TLS), Secure Sockets Layer(SSL) or virtual private network (VPN).This need for security is growing in pacewith the demand to connect all manner ofsystems to the Internet to enable remoteadministration and distributed control sys-tems, for example.

32 Xcell Journal Third Quarter 2010

Building a Better Crypto Engine the Programmable Way Building a Better Crypto Engine the Programmable Way Hardware acceleration based on Xilinx FPGA delivers a speedy system.Hardware acceleration based on Xilinx FPGA delivers a speedy system.


Page 33: Xcell Journal issue 72

Because this field is still evolving andstandards are not yet set, costs are domi-nated by nonrecurring-engineering fees.Therefore, FPGA technology offers thebest value for implementations.

Our system was built on top of theMissing Link Electronics (MLE) “Soft”Hardware Platform, where the flexibleI/Os of the FPGA enable connection to awide range of sensors and actuators. Thisplatform uses the programmable logic toimplement a system-on-chip with eitherthe MicroBlaze™ CPU or the PowerPC®

CPU at its heart. The CPU runs the MLELinux software stack for the operating sys-tem and the user-space application soft-ware. With the MicroBlaze or thePowerPC as the main CPU, the systemwas obviously not suitable for deliveringthe required compute performance whenrunning embedded Linux plus strongencryption on top. And changing thephysical hardware was not an option.

Instead, we utilized the power of pro-grammable systems to migrate computa-tions from the software domain to thehardware side for system acceleration.

Coprocessing Hardware A programmable system is basically acombination of one or more CPUs—run-ning an operating system and applicationsoftware—plus an FPGA. The FPGA isthere as a flexible interface “adapter” andas coprocessing hardware. You can makeprogrammable systems from separatecompanion chips or integrate everythinginto one single device. Depending on howthe FPGA device and the CPU are com-municating with each other, you have dif-ferent options in adjusting the system forperformance and functionality.

One possibility is to add a peer proces-sor, which synchronizes with the CPU viamemory-mapped status and control regis-ters. Because running all communicationover the same system bus may quickly suf-focate performance, you really want toseparate the data stream of the CPU fromthe peer processor. This is easy to do byusing system-on-chip components such asthe Xilinx Central DMA or the MultiportMemory Controller (MPMC).

(FCM) readily supports that. The advantagehere is to free up the memory-to-system busby using a dedicated communication chan-nel between the CPU and the coprocessor.For the PowerPC this is the AuxiliaryProcessing Unit (APU) and for MicroBlaze,the Fast Simplex Link (FSL).

Alternatively, you can add a coprocessor,in which case you effectively extend theinstruction set of the CPU by adding cus-tom instructions (also called compiler-known functions). This is, for example, thecase for floating-point units, and the Xilinxtechnology of Fabric Coprocessor Modules

Third Quarter 2010 Xcell Journal 33



1 x



5744 x



8587 x



5720 x



7164 x

2856 x



8537 x





7152x 8538x





28121 x 12787 x

AES_encrypt MD5_Update


5508 x



Figure 1 – In an SCP transfer using the Valgrind tool, the AES encryption occupies two-thirds of the computations.


Page 34: Xcell Journal issue 72

AES: the Gold StandardBut how do you really accelerate encryp-tion without a major system redesign?

For encryption, the AdvancedEncryption Standard (AES) is really thede facto standard. With AES encryption,the computations are irreducible by defi-nition, bringing an embedded system

quickly to its performance limits. This isclearly illustrated in Figure 1, whichshows the profiling results of a file trans-fer with SCP (SSH session) using theValgrind analysis tool. In this case, theAES encryption takes up two-thirds ofthe computations.

AES-128, with a key and block length of128 bits, utilizes many concurrent 8-byteoperations. AES is a block cipher and oper-ates on fixed block sizes organized as a 4 x 4

array of bytes. We used a 128-bit block size,which withstands all known attacks and iseven supposed to be more secure than the192-bit and 256-bit versions.

With 128-bit AES, it takes 12 rounds,each with several steps, to perform theencryption and decryption. The first task isto compute the round keys from the secret

key by means of the so-called key expan-sion process. In every round, the plain textis bit-wise XOR-ed with its own round key.Then sub-byte, row-shifting and column-mixing operations follow, and the roundkey gets XOR-ed once again.

The final round slightly differs, omittingsome steps. The encryption process per-forms substitution using a so-called S-box,which provides nonlinearity. We can arrangeit in a 16 x 16 x 8-bit matrix so that it gen-

erously fits into the common Xilinx BRAMprimitives. Several S-box instances speed upthe IP core and supply the core in place withthe data needed, without waiting on long-lasting bus accesses to main memory. Thedecryption occurs in a similar fashion, usingthe same secret key, but in the oppositedirection and with a different S-box.

12 Times FasterIn encryption and decryption, most of theoperations are performed on either the rowsor the columns, leaving four operations thatcan be calculated in parallel—a job wellsuited for hardware. Thus, various hardwareimplementations of AES are available fromdifferent sources. To accelerate our system,we took an AES core from the great andfast-growing OpenCores.org repository(http://opencores.org/project,avs_aes). Weremoved the original bus interface, whichwas targeted for another FPGA architecture,and added an interface for the APU to con-nect the AES core as an FCM coprocessorto a PowerPC. We used a total of eight so-called UDI commands to transfer databetween the PowerPC and the AES FCM.

The result of that work was very satisfy-ing (see Figure 2). The hardware-acceleratedsystem ran 12 times faster than the originalimplementation. It took 17.8 microsec-onds to encrypt one single block using astandalone PowerPC running at 300 MHz,but only 1.5 μs to do this with an AESFCM running at 150 MHz. For those whoare tempted to just switch to a faster CPUfor a speedup, our hardware-acceleratedspeed of 1.5 μs outperformed a pure-soft-ware implementation on an Intel Atom1.6-GHz CPU, which took 2.7 μs.

These results demonstrate the out-standing potential of hardware accelera-tion using FPGA technology. For detailsof the analysis and exemplary code, con-tact our applications team at http://www.missinglinkelectronics.com.

34 Xcell Journal Third Quarter 2010

0 2 4 6 8 10 12 14 16 18





Intel Atom N270 (1.6 GHz)

PowerPC (300 MHz) + coprocessor (150 MHz)


PowerPC (300 MHz)

time t / μs

Figure 2 – The hardware-accelerated system (green bar, center) ran faster than a standalone PowerPC or an Atom processor.

In encryption and decryption, most of the operations are performed on either the rows or the columns, leaving four operations that can

be calculated in parallel—a job well suited for hardware.


Page 35: Xcell Journal issue 72

Six powerful Vir tex®-6 FPGAs, up to 24 Million ASIC gates, clock speeds to710 Mhz: this new board races ahead of last generation solutions. The Dini Group

has implemented new Xilinx V6 technology in an easy to use PCIe hosted or standalone board that features:

• 4 DDR3 SODIMMs, up to 4GB per socket

• Hosted in a 4-lane PCIe, Gen 1 slot

• 4 Serial-ATA ports for high speed data transfer

• Easy configuration via PCIe, USB, or GbE

• Three independent low-skew global clock networks

The higher gate count FPGAs, with 700 MHz LVDS chip to chip interconnects, provideeasier logic partitioning. The on-board Marvell Dual Sheeva processor provides multiplehigh speed interfaces optimized for data throughput. Both CPUs are capable of2 GFLOPS and can be dedicated to customer applications.

Order this board stuffed with 6 SX475Ts—that’s 12,096 multipliers and more than 21million ASIC logic gates—an ideal platform for your DSP based algorithmic accelerationand HPC applications.

Don’t spin your wheels with last year’s FPGAs, call Dini Group today and run yourbigger designs even faster.

www.dinigroup.com • 7469 Draper Avenue • La Jolla, CA 92037 • (858) 454-3419 • e-mail: [email protected]


Page 36: Xcell Journal issue 72

by Catello Antonio De RosaSenior Design Engineer and FPGA SpecialistPrisma [email protected]

LTE (Long Term Evolution), the new3GPP standard for broadband mobility,disrupts the existing paradigms of cellularnetworks. In addition to high-spectral-efficiency radio techniques, LTE boasts avery simplified architecture in compari-son to the prior-generation UMTS andGSM standards. Evolved Node-B’s, theradio-access part of the LTE system, arethe edge between the radio and all-Internet Protocol core networks. Thisarchitecture makes it impossible to moni-tor and test the equivalent of intermedi-ate links in UMTS. An effective testing ofLTE network elements must involve theradio interface.

This is exactly the challenge addressedby our design team in Prisma Engineering’sLine Server Unit (LSU) UeSIM LTE. Thesimulator is a complete solution for all LTEtesting needs, allowing network equipmentdesigners to stress and monitor both the airinterface and the core network. This singlehardware platform can simulate up to1,024 pieces of user equipment per sector.Load-and-stress and functional testing overthe radio interface encompass completeLTE protocol stacks and their applications.A radio front end handles bandwidths of 5,10, 15 and 20 MHz in a native multiple-input, multiple-output (MIMO) design.

36 Xcell Journal Third Quarter 2010

LTE Simulator Relies on Xilinx Virtex-5 FPGAsLTE Simulator Relies on Xilinx Virtex-5 FPGAsPowerful programmable logic platform enablesPrisma Engineering to provide reconfigurableradio test equipment for all cellular networks.

Powerful programmable logic platform enablesPrisma Engineering to provide reconfigurableradio test equipment for all cellular networks.


Page 37: Xcell Journal issue 72

Three Xilinx® Virtex®-5 FPGAs(XC5VSX50T) reside at the heart of thisadvanced simulator, enabling a high level ofsoftware-defined radio reconfiguration. Ourteam at Prisma Engineering, which is head-quartered in Milan, Italy, quickly realized weneeded a powerful and reprogrammablearchitecture in order to gain the flexibility toaddress a multitude of radio access standardsusing the same board. Our main goal was, asour CEO, Enrico Bendinelli, put it, “to createthe industry’s most flexible and easy-to-usemanagement software.”

Two user test tools—the LTE TestManager (primarily for LTE equipment ven-dors) and the Quick GUI (primarily for LTEnetwork operators)—are available. TheQuick GUI provides pass/fail-type testingscenarios while the Test Manager allows formore complex analysis.

LSU UeSIM LTE ArchitectureThe LSU UeSIM LTE Simulator is based ona CompactPCI standard architecture compris-ing a protocol-processing unit (PPU) board, asoftware-defined radio (SDR) board and tworadio modules for MIMO operations.

Based on Intel technology, the PPUboard, which is the main processor card, isable to manage multiple SDR boards inorder to improve the load-and-stress capaci-ty. The software-defined radio board isdesigned to extend the operation of our pre-vious LSU systems on the radio interfaces.The CompactPCI radio mezzanine cardsprovide radio-frequency (RF) transmis-sion/reception capability at different radiostandard bandwidths: GSM (850 and 900MHz; 1.8 and 1.9 GHz), LTE (700 MHz,2.1, 2.3, 2.5 and 2.6 GHz) and WiMAX(2.4, 3.5 and 5 GHz).

SDR Card ArchitectureThe SDR card is a high-performance plat-form integrated within the LSUhardware/software environment to extendthe connectivity of the system with the base-band (CPRI/OBSAI), the radio interface orboth. The card supports different wirelessstandards such as GSM/EDGE, UMTS,HSPA, WiMAX and LTE using differentexternal radio modules operating in the spe-cific frequency bands.

LTE Elaboration DatapathPrisma divided the LTE elaboration data-path into two sections: the radio front end,which we implemented in an FPGA, andthe physical-resource allocation and data-and control-channel termination, whichwe implemented in a DSP.

In the uplink direction, one DSP han-dles MAC-layer to physical-layer exchangeand some functions of the physical layer. It

We completed the design of the Xilinx-based SDR card with three 1-GHz TexasInstruments DSPs (we chose theTMS320C6455 device) and two pairs ofAnalog Devices analog-to-digital (AD9640)and digital-to-analog (AD9779) converters.The clocking network, based on an AnalogDevices AD9549, provides a very high, flex-ible timing base for the conversion and digi-tal signal-processing devices (FPGAs, DSPs).

Third Quarter 2010 Xcell Journal 37





LSUv3UeSIM + S1/X2

Figure 1 – In an LTE test scenario, the simulator either replaces a radio sector or provides a test interface for the core network.

Figure 2 – Xilinx Virtex-5 FPGAs reside on the LSU’s software-defined radio card, along with TI DSPs.


Page 38: Xcell Journal issue 72

provides coding, interleaving, scrambling,symbol mapping and subcarrier allocationwith reference signal (pilots), source dataand control channels. Discrete Fouriertransform (DFT) functions transform datafrom different terminals according to theSC-FDMA standard. The system transfersevery OFDM symbol to the uplink FPGAusing an EMIF interface.

This FPGA changes the data rate from125 MHz (DSP EMIF interface clock) to245.76 MHz (the FPGA elaborationrate). Then the FPGA performs a numberof other operations: a 2,048-point inversefast Fourier transform, a cyclic prefixinsertion, a PRACH data channel inser-tion, a half-shift function that translatesthe OFDM symbol spectrum at 7.5 kHz,a shaping and interpolation filtering andan intermediate-frequency (IF) conver-sion at 24 MHz. The device sends IF datato the DAC at a clock rate of 122.88MHz. The radio card, meanwhile, con-verts the analog signal to RF and sends itto the transmitter amplifier.

In the downlink direction, after theLNA amplification, programmable-gainand conversion stages, the radio card willsend IF received data to the SDR card (140MHz). The ADCs subsample the analogdata at 122.88 MHz and the FPGA han-dles the final 17.12-MHz frequency con-version to baseband. This data can berelated either to two single-input, single-output channels or to one MIMO channel.

The IF data enters into the downlinkFPGA, which converts it to baseband andthen filters it. Polyphase decimation filtersimplement Nyquist filtering, spectrumimage rejection and data-rate reduction at asymbol rate of 30.72 MHz, even thoughthe chip rate remains at 245.76 MHz.

The FPGA incoming data flow lookslike a stream of data instead of a series ofOFDM symbols. The synchronizationfunction slices the data stream properly todelineate the OFDM symbols. (To achievethis result, the synchronization circuit mustdetect Zadoff-Chu primary synchroniza-tion signals using multiple correlators on

deeply decimated input data. Afterwards, itwill be possible to obtain OFDM symbols.)Finally, FFT transformation follows theremoval of the cyclic prefix and the result-ing data passes to another DSP using theEMIF interface.

The downlink flow involves two DSPsmutually connected by means of a serialRapidIO interface. These DSPs performfrequency correction, channel estimation,equalization and MIMO decoding. Thenthey do data- and control-channel extrac-tion, Viterbi and turbo decoding, deinter-leaving and descrambling prior toMAC-layer interworking.

On the uplink side, the third FPGAhandles the loopback test between uplinkand downlink FPGAs and ensures theSDR board’s conformance to theCPRI/OBSAI standards.

Our design team extensively used XilinxCORE Generator™ IP to produce filters,DDS, FFTs, Block RAMs, FIFOs andMACC functions, using DSP48E andDCMs for the clocking deskew section of

38 Xcell Journal Third Quarter 2010




2048 x 32


2048 x 32

125 MHz 245.76 MHz

2048 samples

512 x 32





7.5 kHz


Complex Multiplier

Complex Multiplier

Complex Multiplier

Complex Multiplier

Complex Multiplier

Cyclic PrefixInsertion




Shaping andInterpolation Filters











Tx On/Off


Demodulator RDY




0.96 - 1.92 - 2.88 - 3.84 - 4.8 -5.76 - 6.72 - 7.68 - 8.64 MHz


Ctrl Block



8 FIRTransit MEM









Figure 3 – The front-end “uplink FPGA” implements inverse FFT, cyclic prefix insertion, filtering, IF upconversion and other operations for time-division duplex and PRACH handling. The system sends the same signal into two DACs for redundancy.


Page 39: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 39

the design. This massive instantiationmethodology produced a compact designin a reduced development time.

FPGA Design StrategyBecause this project had a very aggressivetime-to-market deadline, our team made acareful analysis of functions partitioningamong FPGAs and DSPs. It’s worth notingthat the FPGAs would have accommodatedeven more LTE functions, but one of ourdesign goals was to find a balance betweenthe FPGA and DSP sections of the system.

The FPGA clock rate was one of thetougher challenges in this design. Using aclock rate of 245.76 MHz for a large design

like a modulation system is not a trivialmatter. Our design team had many issuesto consider, such as power consumption,design constraints, placement and routingefforts. Nevertheless, thanks to the ISE®

Design Suite, which produced stable andquality results over the various design itera-tions, an oversampled factor of eight(FPGA clock rate/OFDM symbol rate)kept design items like filters and FFTtransforms as small as possible with respectto the required LTE functionality. The ISEsoftware also helped us achieve a reason-able synchronization circuit area.

Key to our design was devising a radiocard architecture that in uplink, instead of

using a direct-conversion methodologywith I/Q unbalance drawback, receivedthe FPGA data from an intermediate fre-quency. Using Xilinx Direct DigitalSynthesizers, an 18-bit sine/cosine waveperformed a perfect signal carrier to thecomplex modulation, as confirmed by theerror vector magnitude measured on thetransmitted radio signal.

Thanks to the use of Xilinx Virtex-5FPGA and TI DSP technologies, the LSUUeSIM LTE Simulator has become theleading-edge test equipment for load-and-stress solutions in the cellular world. It pro-vides a powerful, flexible and scalablesolution for SDR systems.

(2x2048x32) Antialias andDecimation Filters

Zadoff-ChuDecimation Filters

LPB data125 MHz 245.76 MHz








18 14


4 FIRRateAdaptMEM


3 CorrelatorsSynchronizer 128 FIR



2048 samples

2048 x 322048 x 32




Ctrl Block


(2x2048x32) Antialias andDecimation Filters

Cyclic PrefixRemoval



18 14


4 FIRRateAdaptMEM


2048 samples

2048 x 322048 x 32



Complex Multiplier

Complex Multiplier

Figure 4 – The front-end “downlink FPGA” implements IF downconversion, polyphase decimation filtering, synchronization, cyclic prefix removal and direct FFT. The system uses two chains to support MIMO operations for TDD and FDD modes.

Because this project had a very aggressive time-to-market deadline, we made a careful analysis of functions partitioning. The FPGAs would have

accommodated even more LTE functions, but one of our design goals was to find a balance between the system's FPGA and DSP sections.


Page 40: Xcell Journal issue 72

40 Xcell Journal Third Quarter 2010

Maintaining Repeatable Results in Xilinx FPGA DesignsMaintaining Repeatable Results in Xilinx FPGA DesignsHere are some tips and tricks to use during the HDL design, synthesis and implementation phases to sustain the required timing for your design.Here are some tips and tricks to use during the HDL design, synthesis and implementation phases to sustain the required timing for your design.


Page 41: Xcell Journal issue 72

by Kate Kelley Staff Product Marketing EngineerXilinx, [email protected]

Meeting the timing requirements in adesign can be difficult in itself, but pro-ducing a design whose timing is 100 per-cent repeatable can sometimes seemnearly impossible. Fortunately, designershave access to design flow concepts thatcan help to maintain repeatable timingresults. The four areas that have the mostimpact are HDL design practice, synthesisoptimizations, floorplanning and imple-mentation options.

Designs with very high resource utiliza-tion and frequency (QoR) requirements arethe most challenging in terms of obtainingrepeatable results. They are also the designsthat need a repeatable-results flow themost. The first step in getting repeatableresults is to use good design practices dur-ing the HDL design phase. Following goodhierarchical-boundary practices helps tokeep logic together, which helps to main-tain repeatable results when making designchanges. One good rule is to put logic thatneeds to be optimized, implemented andverified together in the same hierarchy.Also, register the inputs and outputs ofmodules. This keeps the timing paths con-tained within a module, so that changes toone module are less likely to affect anothermodule. Finally, keep all logic that needs tobe packed into larger FPGA resources(Block RAM or DSP, for example) in thesame level of hierarchy.

Logic LevelsRepeatable results are very difficult toobtain from designs that have too manylookup-table (LUT) logic levels for therequired QoR results. Often, it is not theLUT delay that is the issue, but rather therouting delay between the LUTs. This isextremely important in the high-perform-ance areas of the design.

Some common sources of too many logiclevels include large if/else constructs andlarge case statements. When appropriate, use“full_case” and “parallel_case” Verilog direc-tives to optimize the case statement with less

Think Local, Not Global” (http://www.xilinx.com/support/documentation/white_papers/wp272.pdf). For more informationon control sets, see WP309, “Targeting andRetargeting Guide for Spartan®-6 FPGAs”(http://www.xilinx.com/support/documentation/white_papers/wp309.pdf ).While this whitepaper is specific to Spartan-6 devices, itcontains good general information applica-ble to all FPGAs.

Understanding FPGA ResourcesIt is important to understand what FPGAresources are available and when it is bestto use them. Often, there are synthesisdirectives to define which resources touse. For instance, Block RAM is best fordeep-memory requirements, while dis-tributed RAM works well for wide buses,especially where regional clocks are clock-ing high-speed data. Both Block RAMand distributed RAM can have issueswith large fanout on control signals.Duplicating control signals and usingfloorplanning techniques to keep blockswith the same signals together can helpmaintain repeatable results.

Shift registers can reduce the utiliza-tion of a design, which helps repeatabili-ty. There are several performance issues tokeep in mind. The clock-to-out of anSRL is slower than clock-to-out of a flip-flop; therefore, it is best to use a flip-flopas the last stage of a shift register. Mostsynthesis tools do this automatically, butif there is an issue with a path involvingshift registers, it is good to confirm thatthe last stage is a register.

Similar issues are associated with theinitial register. Having a flip-flop in frontof an SRL gives the placer more options tomeet timing, therefore maintaining results.Again, most synthesis tools do this auto-matically, but if there is an issue with a pathinvolving shift registers, it is good to con-firm that the first stage is a register.

FPGAs have many registers, makingpipelining a useful technique to improveperformance. It is important to disable SRLinference on multiple pipelined flip-flops.

The white paper cited above on HDLcoding practices (WP231) offers moreinformation on Block RAM. For more

logic, a technique that often results in fewerlogic levels. Large multiplexers or decoderscan create routing congestion, resulting inunrepeatable results. A multistage registeredmultiplexer/decoder path can help with thisissue. For adders, using a registered adderchain instead of a registered adder tree canimprove performance. The chain will intro-duce more latency than the tree if all theadders are registered.

For more information on best practicesfor coding, see the Xilinx® white paper“HDL Coding Practices to AccelerateDesign Performance” (WP231) athttp://www.xilinx.com/support/documentation/white_papers/wp231.pdf.

Resets and Other Control SignalsThe choice of resets affects the perform-ance, area and power of a design. A globalreset is not necessary for circuit initializa-tion on power-up, but it can have a majoreffect on the type of resources you can usein a design. Shift registers (SRLs) cannot beinferred if there is a global reset in theHDL. One shift register produces more-repeatable results than 10 registers.

Also, the DSP and Block RAM registerscontain only synchronous resets. If you putan asynchronous reset in the code, theseregisters cannot be used, forcing the designto use configurable logic block (CLB) reg-isters instead. The same results are easier tomaintain if the registers are packed into theDSP, Block RAM or both.

Using synchronous resets on generallogic might reduce the levels of logic. Theslice registers can have asynchronous orsynchronous resets. If the design uses thesynchronous reset, then the synchronousset is available for use by the combinator-ial logic. This could reduce the logic levelsby one LUT.

A control set consists of a unique group-ing of clock, clock enable, set, reset and, inthe case of distributed RAM, write-enablesignals. Control set information is impor-tant because registers must share the samecontrol set to be packed in the same slice.This can affect packing and utilization, cre-ating repeatable-result issues.

For more information on resets, seeXilinx WP272, “Get Smart About Reset:

Third Quarter 2010 Xcell Journal 41


Page 42: Xcell Journal issue 72

information on shift registers, see WP271,“Saving Costs with the SRL16E”(http://www.xilinx.com/support/documentation/white_papers/wp271.pdf ).

Clock Domain IssuesDesigners must take care to correctly con-strain paths that cross unrelated clockdomains. The tools automatically relateclocks from the same source clock (forexample, a DCM). The PERIOD con-straint can also relate external clocks.Unrelated clocks that are not created inter-nal to the device take special consideration.By default, the tools will not constrainthese clocks. If there are special timing con-siderations, designers must use theFROM:TO constraint to correctly con-strain the path. The DATAPATHONLYkeyword tells the tools not to include clockskew in the equations.

For more information, see the“Asynchronous Clock Domains” section inUG625, Xilinx Constraints Guide(http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/cgd.pdf), or WP257,“What Are PERIOD Constraints?”(http://www.xilinx.com/support/documentation/white_papers/wp257.pdf).

It is also important to ensure that raceconditions do not occur. FIFOs can helpwhen crossing from one domain to anoth-er. Otherwise, designers should double-synchronize one—and only one—controlsignal, and use it in the receiving clockdomain to receive other signals.

High-Fanout SignalsOften, high-fanout signals can be the gat-ing factor in a design. Even though mostsynthesis tools have fanout control, it’s agood idea to duplicate these signals in theHDL to get more-repeatable results.

Designers should combine this tactic withdirectives to ensure that the synthesistools do not remove the duplicates. If ahigh-fanout signal is in the top-levellogic, one method is to duplicate the sig-nal and then drive each top-level modulewith a separate signal.

If the synthesis tool fanout control isnot giving the desired results and modify-ing the HDL is not a palatable option,then using the Register Duplication con-straint within the MAP logic of theBRAM, along with the max-fanout con-straint, often makes better register dupli-cation choices than synthesis. For moreinformation, see MAX_FANOUT in theConstraints Guide (UG625).

As a general debug issue, meanwhile, itis easier to trace a problem path if a signalname is kept constant when crossing hier-archies. If the name constantly changes, itis difficult to follow in the timing reportsand other debug output. It is also helpful toput the signal direction on the port defini-tions for all modules or entities.

Synthesis OptimizationsSynthesis has a big effect on repeatableresults. If the output netlist from synthesisis not optimal, then it is impossible to haveideal conditions in the implementationtools. Designers can use several synthesistechniques to help improve implementa-tion results.

It is important to use timing constraintswhen running synthesis. Often, users over-constrain during synthesis, then relax thetiming constraints in the Xilinx implemen-tation tools. This makes the synthesis toolwork harder, relieving the burden on theimplementation tools.

Use the timing report from the synthe-sis tools. If a path is failing timing in syn-

thesis and implementation, modify theHDL or synthesis options to meet timingafter synthesis. This saves time duringimplementation runs.

The best way to have repeatable resultsin the implementation tools is to haverepeatable results during synthesis. Mostsynthesis tools support a bottom-up flow,with separate synthesis projects for the toplevel of the design and each of the lower-level modules. The user is in control ofwhich netlist is updated, based upon HDLchanges. Most commercially available syn-thesis tools have an incremental flow.

Importance of FloorplanningFloorplanning locks placement of compo-nents to a specific location in the design orto a range. This reduces the variability ofplacement, increasing the repeatability of adesign. You can almost always obtain betterperformance by floorplanning or by usinglocation constraints, or both.

That said, a bad floorplan or poorlocation-constraint choices can make itimpossible to meet timing. Floorplanningis somewhat of an art and requiresadvanced knowledge of the tools and thedesign. You can use implementationresults that meet timing as a guide to cre-ating a good floorplan.

If board requirements are the mainmeans for selecting pinouts, FPGA imple-mentation tools might have a difficult timemaintaining repeatable results. But design-ers have access to several techniques thatcan help achieve repeatability.

First, be aware of the data flow. Forexample, data can go from the center I/Osto the side I/Os. Keep all of the pins associ-ated with the bus in the same area of theFPGA to limit the routing distance on con-trol signals. Place I/O bus control signals

42 Xcell Journal Third Quarter 2010


It is important to use timing constraints when running synthesis.

Often, users overconstrain during synthesis, t hen relax the timing

constraints in the Xilinx implementation tools. This makes the synthesis

tool work harder, relieving the burden on the implementation tools.

Page 43: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 43

near the address and data buses. Signalsthat are to be optimized together need tobe placed together. If board routing is abigger concern, pipelining registers on theI/Os can help FPGA routing with less-than-ideal pinouts.

Area Group FloorplanningArea group floorplanning is a high-levelfloorplanning technique that defines wheremodules are located within the FPGA. It isvery easy to do, but it is often misused,delivering poor floorplans that create moreissues than they resolve. There are generalguidelines for good floorplans that willhelp you avoid these pitfalls.

Keep utilization similar across all areagroups. For example, do not have one at60 percent and another at 99 percent. Donot overlap area groups. The one excep-tion is if two different area groups havesome logic elements that need to be

placed together, it is acceptable to overlapby one or two rows or columns of CLBs.The user is then responsible for makingsure there are enough resources for botharea group constraints.

If two different logical portions of thedesign need to be in the same physical loca-tion, put both of them into the same areagroup. One level of nesting, a child areagroup within a parent area group, is typi-cally acceptable. This can be necessary if asmall portion of a larger area group needsto be located in a tight region.

It is important to floorplan only thecritical portion of the design and let thetools determine the placement of the non-critical logic. Logic connected to fixedresources (for example, I/O, transceiver orprocessor blocks) might benefit fromfloorplanning. Use the results of a goodimplementation run as a guideline toidentify placement or timing issues. Tools

like Xilinx’s PlanAhead™ software(Figure 1) and Timing Analyzer can helpvisualize the issues.

It is often helpful to minimize thenumber of regions used for each globalclock and the number of clocks (regionaland global) in each region. Do not over-constrain and plan accordingly if you aregoing to add more logic to a clock region.When all clocks in a clock region are used,it can be difficult to find a valid place-ment. The PlanAhead software’s ability tosnap to a clock region will make this floor-planning easier. For Virtex® FPGAdesigns with more than 10 global clocks,the clock regions used in the currentimplementation are in the .map reportfile, along with UCF constraints.

For more information on area groupfloorplanning, see UG632, PlanAhead User Guide (http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/


Figure 1 – PlanAhead shows connections between modules, providing guidance when creating an area group floorplan.

Page 44: Xcell Journal issue 72

PlanAhead_UserGuide.pdf), and UG633,Floorplanning Methodology Guide (http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_1/Floorplanning_Methodology_Guide.pdf ).

Locating Blocks, Module, PathOften, locating the core components, suchas the Block RAMs, FIFOs, DSPs, DCMsand global clocking resources, can help

achieve repeatability. This is best done bylooking at a good placement and using thedesign knowledge to verify these are goodlocations. You need to take control-signaland data flow (bus alignment) into accountwhen locating these BRAM, FIFO and DSPcomponents. Constraints to locate the clockregions of an existing design are found in the.map report file. Keeping the same clockregions prevents the placer from makingchanges to the clock region partitioning,which could change the floorplanning of thedesign. Use reportgen – clock_regionsdesign.ncd to create the report.

PlanAhead software has the ability tolock down all the placement informationon critical modules. On the next run, theplacement is the same, but the routing

information is not saved. More informa-tion on location constraints in thePlanAhead software can be found in the“Floorplanning the Design” chapter inUG632, PlanAhead User Guide; inUG633, Floorplanning MethodologyGuide, and in the PlanAhead tutorial.

If locking down an entire module isoverkill, it is possible within PlanAheadsoftware to lock down a critical path. But

you should use this technique only in avery limited manner. If there is a specificpath that is causing the majority of prob-lems, it is better to fix the timing issue bychanging the HDL. If this is not possible,limited use of locating specific timingpaths can be helpful.

Implementation OptionsSeveral options in the implementationtools improve repeatability. Designpreservation using partitions is the bestmethodology to preserve implementa-tion, but it is not a good fit for all designsand it does have HDL design require-ments. Xilinx SmartGuide™ technologyis another option for maintaining repeat-able results. This is best for designs that

are not pushing the absolute maximumQoR or utilization. If neither designpreservation nor SmartGuide technologyis appropriate for a design, then useSmartXplorer or PlanAhead softwarestrategies to maintain timing.

For designs with high QoR requirements,there are advanced implementation optionsto help maintain timing. Often, managingutilization is key to maintaining repeatable

results. As designs increase in size it is moredifficult to maintain results. Staying with thesame software release for the entire designphase helps achieve repeatable results.

Design PreservationThe design preservation flow in PlanAheadmakes use of partitions; this is the onlymethod that guarantees repeatable results.The main goal of design preservation is toenable consistent module performance toreduce the amount of time spent in thetiming-closure phase. It also requires thegreatest commitment from the user to fol-lowing good design practices.

Partitions preserve unchanged portionsof the design that have been previouslyimplemented. If a partition’s netlist is

44 Xcell Journal Third Quarter 2010


Figure 2 –The design preservation flow locks down unchanged portions of the design and implements the rest.

Page 45: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 45

unchanged, the implementation tools use acopy-and-paste process to guarantee thatthe implementation data for that partitionis preserved. By preserving implementationresults, partitions let you implement themodified portions of the design withoutaffecting the preserved portion. In Figure2, the red module has been changed and istherefore implemented, while the rest ofthe modules are locked in place.

In version 12.1 and future releases, thePlanAhead software and command-linetools support design preservation. For moreinformation, see WP362, “RepeatableResults with Design Preservation”(http://www.xilinx.com/support/documentation/white_papers/wp362.pdf ), and UG748,Hierarchical Design Methodology Guide(http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_1/Hierarchical_Design_Methodology_Guide.pdf).

SmartGuide TechnologySmartGuide technology uses the previousimplementation results as a starting pointwhen running an implementation. Themain goal is to reduce run-time. Guidedplacement, routing or both can be movedin order to route the design or meet timing.SmartGuide technology works best fordesigns that are not trying to push the limiton QoR or utilization.

Previous versions of the tools featured anexact guide and a leveraged guide. Often,the exact-guide methodology resulted inunroutable designs. If exact preservation isrequired, then the suggested flow is designpreservation. SmartGuide technology is thereplacement for the leveraged guide.

Designers often ask whether to useSmartGuide technology or partitions. Theanswer depends upon where you are in the

design flow. SmartGuide technologyworks best at the end of the design cyclewhen you are making small designchanges. Using this flow, it is easy to see ifthe proposed changes work with thedesign. Partitions require a greater com-mitment up front to following gooddesign hierarchy rules. You should decidewhether to adopt the design preservationflow with partitions when starting toorganize the HDL. An exception to thisrule is when the design already follows thehierarchical rules for partitions.

For more information, refer to UG748,the Hierarchical Design Methodology Guide(http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_1/Hierarchical_Design_Methodology_Guide.pdf ).

SmartXplorerSmartXplorer and PlanAhead softwarestrategies are similar tools that help achievetiming closure. They run different sets ofimplementation options to find the best fitfor the design. You can then use these resultsto see what placements tend to have bettertiming results and to create good area groupfloorplanning. The different results can alsopoint to a design issue. If the same path isfailing across all runs, it is beneficial tochange the HDL to remove the timing issue.

In the beginning of the design, it is bestto use the default effort levels for MAP andPAR. Using too many advanced options inthe beginning can hide a timing issue thatmight be best solved by modifying theHDL. When the device utilization increas-es, it becomes harder and harder for thetools to converge on a solution that meetstiming. If you use the default options, thenthe higher-effort options are available toget the last few picoseconds of timing later

in the design flow, allowing timing resultsto be maintained.

Designs with low utilization ofLUTS/FFS (<25 percent) or high utiliza-tion of LUTS/FFS (>75 percent) can bedifficult to place and route with consis-tency. For designs with high utilization,look at slice control sets, resets (FPGAsoften do not require synchronousresets/sets), modules with higher-than-expected logic usage (easily done inPlanAhead) or SRL/DSP48 inference.

The flip side of high utilization is low uti-lization. With designs that have 25 percentutilization or less of all component types, thelow-utilization algorithm takes effect andkeeps components tightly placed. However,if I/O utilization exceeds 25 percent, thenthe implementation tools could spread outthe design in order to keep logic near theI/Os. Careful placement of I/Os and use ofarea groups can minimize this issue.

Software ReleasesTry to use the same major software releaseduring the timing-closure phase. Becausealgorithms change from one release toanother, a technique that worked in onemight not work the next time. Also, meth-ods that rely on the previous results (parti-tions and SmartGuide technology) mightnot work across major releases.

The best way to influence design repeata-bility is to follow good design methodologyin the HDL and fix any timing issues bychanging the HDL. If that is not possible,synthesis, floorplanning and implementationtechniques can help. Design preservationusing partitions is the flow that guaranteesinstance performance. SmartGuide technol-ogy is another solution that uses previousimplementation results.


The best way to in f luence design repeatabi l i ty is to fo l low

good design methodology in the HDL and f ix any t iming issues

by changing the HDL. I f that is not possible, synthesis,

f loorplanning and implementat ion techniques can help .

Page 46: Xcell Journal issue 72

46 Xcell Journal Third Quarter 2010

A Tutorial on Timing Constraints for Xilinx FPGA DesignsTiming constraints can be a designer’s best friend, and help you get your designs out the door quickly.


Page 47: Xcell Journal issue 72

constraint, the lower its priority.Conversely, the more specific a constraint,the higher its priority. For example, aPERIOD constraint on a clock net or net-work is very general and will be overruledby a higher-priority FROM:TO constraint ona specific net or network.

The specific constraint for theFROM:TO (or FROM:THRU:TO) isdeemed more important than the moregeneral constraint for any net within aclock domain.

To help you prioritize constraints, youcan run the Xilinx® Timing Analyzer (astatic timing-analysis tool in the ISE®

Design Suite) and have it generate a tim-ing-specification interaction report, or as itis commonly called, a .tsi report. Thisreport will let you see how the constraintsare interacting and what priorities the toolhas set them to by default.

You can override the assumed prioritiesand manually set the priority of any tim-ing constraint by using the PRIORITYconstraint keyword. This is especially use-ful in situations where there is a conflictbetween two or more timing constraintsthat cover the same path. Priority heremeans which of any number of timingconstraints will be applied if two or moreconstraints cover the path. The other,lower constraints are ignored. You can setpriority from -10 to +10. The lower thePRIORITY value, the higher the priority.Note that this value does not affect whichpaths are placed and routed first. It onlyaffects which constraint controls the pathwhen two constraints of equal prioritycover the same path.

Let’s take a closer look at the followingexample in which PERIOD only coversnets from synchronous elements to syn-chronous elements, like FFS to FFS (con-straints are in blue, below):

NET "clk20" TNM_NET = “tnm_clk20";


“tnm_clk20" 20 ns HIGH 50 %;

A TIMEGRP (timing group) is createdcalled tnm_clk20 and contains all of thedownstream synchronous components thatnet clk20 drives. All of the paths between

by Austin LeseaPrincipal EngineerXilinx, [email protected]

As someone who regularly participates inXilinx’s user forums (see http://forums.xilinx.com), I’ve noticed that new usersoften find timing closure, and the use oftiming constraints to achieve it, a mystery.To help those who are new to FPGA designachieve timing closure, let’s take an in-depth look at timing constraints and howyou can leverage them to get optimalresults in your FPGA design projects.

What Are Timing Constraints?To guarantee your design will be successful,you have to ensure that it will perform thetasks it was designed to do in a specifictime frame. To make sure this happens, weapply timing constraints to the nets—thepath or paths taken from one FPGA ele-ment to the inputs of subsequent elementsin the FPGA or on the PCB in which theFPGA resides.

In FPGAs, there are mainly four typesof timing constraints: PERIOD, OFFSETIN, OFFSET OUT and FROM:TO (mul-ticycle) constraints.

PERIOD Constraint and GroupingEvery synchronous design will have at leastone PERIOD constraint (Clock PeriodSpecification), the most basic type of con-straint, which specifies the clock and itsduty cycle. If there is more than one clockin your design, each clock will have its ownPERIOD constraint. The PERIOD con-straint will dictate how we must route netsto meet the timing requirements a designneeds to operate properly.

To simplify the process of applying tim-ing constraints, you’ll often group nets thathave similar attributes as, for example, abus or a control group. Doing so will alsohelp you perform the critical step of prop-erly prioritizing design constraints.

Prioritize Design ConstraintsWhen you have a design with multipleconstraints, you need to prioritize thoseconstraints. Typically, the more general the

Third Quarter 2010 Xcell Journal 47


Page 48: Xcell Journal issue 72

these synchronous elements are then con-strained with the timing specification“TS_clk20: 20 ns”—a 20-nanosecondrequirement from synchronous element tosynchronous element. “HIGH 50%”means that clk20 has a 50/50 duty cycle.

In a second example, we use FROM:TOconstraints to define a requirement for pathsthat go between two groups, as shown:

TIMESPEC TS_my_fromto = FROM

my_from_grp TO my_to_grp 40 ns;

That command tells the tools to ensurethat data makes it from the componentsin the timing group “my_from_grp” to“my_to_grp” in 40 ns. Timing Analyzerwill still calculate the clock skew fromsource group to destination group but ata lower priority if the clocks are related.You can also use predefined groups suchas the following:


TO FFS 40 ns;

If you need to leave out the time unit(nanoseconds, picoseconds, etc.), then thetools automatically assume everything is innanoseconds. For example, you could writea constraint as



You could also leave FROM or TO off theconstraint and make it more generic:


As previously stated, the tools will auto-matically assume all these FROM:TO con-straints in the examples above are higherpriority than the PERIOD constraintunless you specify otherwise.

Closer Look at a .tsi ReportIn addition to helping you observe timingconstraint interactions, the .tsi report will alsomake suggestions on ways to improve con-straints in the universal constraints file(UCF). The report will also notify you if anypaths are constrained by multiple clockdomains. Here is an example of a constraintinteraction report:

Constraint interactions for

TS_clk0_1 = PERIOD TIMEGRP "clk0_1"

TS_clk HIGH 50%;

1 paths removed by TS_my_fromto =


TIMEGRP "FFS" 40 ns;

In this example, the higher-priorityFROM:TO constraint (just one) wasapplied ahead of the PERIOD constraint.

Setup and HoldIn a practical synchronous digital system,the data must arrive before the clock edgethat samples it. The minimum amount oftime it takes for this to happen is called the“setup time.”

As well as arriving before the clock edge,the data must persist for some finite amountof time at the clock edge, a period called “holdtime.” A hold time may be negative, in whichcase the data goes away before the clock edge;zero, which means the data persists until theclock edge samples it; or positive, whichmeans the data persists for some time after theclock edge has completed sampling it.

By design, in the FPGA fabric, for allspeed grades, hold times are not positive(they are either zero or negative). This sim-plifies the placement and routing, since thedata only needs to arrive before the clockedge and is allowed to change immediatelyafter a clock edge sampling takes place.

The value by which the data exceeds theminimum setup time is known as slack.

Slack should always be positive. If a reportshows a negative slack, then the setup tim-ing has not been met adequately—the dataarrived too late.

The clock path itself has delay, or skew.Thus, to analyze the timing, the tools willcalculate the arrival time of the data and theclock at the flip-flop of interest.

Easy Remedies for Constraint Violations To recap for a moment: The PERIOD con-straint defines the clock period for synchro-nous elements of interest, such as flip-flops.You can use the timing analyzer to verifythat all paths between synchronous ele-ments meet the setup-and-hold timing foryour design. A violation of this PERIODconstraint will appear in the timing reportand have a negative slack value, identifiedas violating either the setup requirement orthe hold requirement.

So what happens if the report shows thatthe setup has indeed been violated? Youknow that you will either have to find afaster path between the two synchronouselements in question, or at least a way toensure the data arrives at a proper time andsticks around long enough so that the clockedge registers it. If the place-and-route soft-ware cannot find a faster path, you have theoption of placing the path manually in theFPGA Editor tool.

But this is a tool of last resort. Do notuse it to solve problems before you havelearned how to solve the problems withoutit. Only use FPGA Editor to “see under thehood” and learn what the tools are doingwith your design in order to fit it into theFPGA device. Try first to rearchitect the cir-cuit to meet your design’s timing require-ment. One of the simpler ways to do this isto place a flip-flop earlier in the path. Thistechnique, known as pipelining, will addlatency to the signal, but it will also allowthe value to be captured properly.

48 Xcell Journal Third Quarter 2010

Only use FPGA Editor to ‘see under the hood’ and learn what the tools are doing with your design in order to fit it into the FPGA device.

Try first to rearchitect the circuit to meet your design’s timing requirement.


Page 49: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 49

If the hold has been violated (the datawent away before the clock edge arrived),then this is often an indication that youhave a design problem (a bad architecture).Values should only change on the clockedge, and not before. If an external value ischanging before the clock edge, you needto delay the clock edge (using a DCM orPLL) so that the data is now registeredproperly by the new, delayed clock.

An alternative is to use the IDELAY ele-ment in the input/output block to movethe data to where the clock is valid.

Data-Valid Window and MetastabilityThe time from before the clock edge(setup) plus the time after the edge (hold)is known as the “data-valid window,” or thetime the data must be stable to be properlyregistered. If the data is not valid for at leastthis amount of time, then the results areindeterminate, or unknown.

But just because the data was not validfor as long as required does not mean thatthe output of the flip-flop is metastable.Metastable is different from indetermi-nate. An output could be 0 or 1, seeming-ly at random, if the timing is not met.Metastability means the edge was “almost”capable of capturing the state, and theflip-flop output is in some intermediatestate (not 1, not 0) for some time after theclock edge. Metastability cannot be pre-vented, as it is a fact of the physics of thecircuits if the clock edge and the data arealmost perfectly “missed.”

In a properly designed synchronous sys-tem, there are no problems with metasta-bility. Metastability is a problem whensomething is asynchronous (like pressing akey on a keyboard) or when two synchro-nous clocks are asynchronous to each other.In general, when something is asynchro-nous, it needs to be synchronized.

If you would like to learn how to dealwith metastability, here is a link to a fabu-lous presentation on the subject: http://www.stanford.edu/class/ee183/handouts_spr2003/synchronization_pres.pdf. (Formore on metastability, see the secondFPGA101 article in this issue.)

Propagation Time and OFFSET ConstraintsThe time it takes to get a signal from pointA to point B is called the propagationtime. It is based on the speed of light, inthe medium it is in. For example, a traceon a printed-circuit board carries signals ataround 6 to 7 ps per millimeter. You canfind this number in a variety of ways,including running simulations or solvingequations when you know the dielectricconstant for the material and the geome-try of the wiring traces. Inside the silicondevice, the signals behave in much thesame way, but also may be delayed by

Now that we’ve gone through the basics of timing con-straints, let’s look at how we can put them to use, in this

case with double-data-rate (DDR) memory.DDR interfacing uses both the rising and falling edges of

the clock in a source-synchronous interface to capture or trans-fer twice as much data per clock cycle.

To properly constrain data arriving at the device, youmust first constrain the clock you are using to capture thedata. At the same time, you must also constrain the arrival ofthe data for both the rising and falling edges of the clock.

For this example, the complete OFFSET_IN specificationwith associated PERIOD constraint would look like this:

NET "SysCLk" TNM_NET = "SysClk";

TIMESPEC "TS_SysClk" = PERIOD "SysClk" 5 ns HIGH


OFFSET = IN 1.25 ns VALID 2.5 ns BEFORE "SysClk"


OFFSET = IN 1.25 ns VALID 2.5 ns BEFORE "SysClk"


where “VALID” and “BEFORE” are reserved words definingthe timing relationship in the constraint.

This global constraint covers both of the data bits of thebus, since in each clock period two bits are captured, namely?data1 and ?data2.

In much the same way as specifying when the data arrives,you also need to specify the output of the DDR data.

For this example, the complete OFFSET_OUT specifica-tion for both the rising and falling clock edges is based on theclock supplied to the DDR register:

NET “CLkIn” TNM_NET = “ClkIn”;





Note here that in the complete constraint format, OFF-SET=OUT <value>, determines the maximum time from therising clock edge at the input clock port until the data firstbecomes valid at the data output port of the FPGA device.

When you omit <value> from the OFFSET_OUT con-straint (as in the example above), the constraint becomes areport-only specification that reports the skew of the outputbus. The REFERENCE_PIN keyword defines the regenerat-ed output clock as the reference point against which the skewof the output data pins is reported.

Of course, do not forget that the output clock also needs aPERIOD constraint. It was not needed for the specificationof the output timing, but is required for getting the data tothe DDR output register.

– Austin Lesea

Putting Constraints to Use with DDR Memory


Page 50: Xcell Journal issue 72

going through active circuits such asbuffers, inverters, logic and interconnect.

You can also measure propagationtimes, often with the help of an oscillo-scope. Propagation times generally do notvary much at all when the path has noactive elements. If the path is in silicon, thestrength of the transistors will cause thepath delay to vary with both a maximumvalue and a minimum value. A designneeds to meet timing for both.

In order to tell the tools when dataarrives at a particular location, you needto use another type of constraint calledOFFSET_IN. An OFFSET_IN con-straint defines the relationship of a clockand data as they enter the device. Take, forexample, the following constraint:

OFFSET = IN 2 ns VALID 16 ns

BEFORE “clk20";

This constraint tells the tools that datawill be set up at PADs 2 ns before the clk20rising edge. It tells the tools that data willremain valid for 16 ns after it arrives. Thisconstraint applies only to PADs that go toregisters that are clocked by clk20 or aderivative (that is, a derived constraint).

OFFSET requires a PERIOD con-straint on clk20, so that it understands theclocking structure. This is also acceptable:

OFFSET = IN 2 ns BEFORE “clk20";

However, this constraint will not checkthe hold time, because we don’t know whenthe data goes away at the pin of the FPGA.If the data won’t be set up until 2 ns afterthe clock edge, then we use the following:

OFFSET = IN -2 ns VALID 16 ns

BEFORE “clk20"; #

While the OFFSET_IN pertains tothe clock and data entering the device,

another common constraint called OFF-SET_OUT defines the amount time ittakes for data to make it out of the deviceafter a clock transition at the input to theFPGA. Here is a common use of OFF-SET_OUT:

OFFSET = OUT 3 ns AFTER “clk20";

This constraint tells the tools that you needto ensure data is at the output pin of theFPGA 3 ns after a clock transition at theinput of the specified clock to the FPGA.This constraint applies only to PADs thatare driven by registers that are clocked byclk20 or a derivative (a derived constraint).OFFSET requires a PERIOD constrainton clk20, so that it understands the clock-ing structure. Hold times are not con-strained for OFFSET_OUT.

If we need the data 2 ns before the clockedge, then we use this:

OFFSET = OUT -2 ns AFTER “clk20";

Groups and Group NamesA time group is a way to identify a con-straint for a collection of paths or netsbetween synchronous elements. To addcomponents to a time group, you woulduse TNM, TNM_NET or TIMEGRP.

Paths are constrained by defininggroups and then giving requirementsbetween those groups. A few constraintsdo not require time groups, such as NETMAXDELAY. The maximum delay(MAXDELAY) attribute defines the max-imum allowable delay on a net.

Timing NamesTo add a component to a user-definedgroup, you can do the following:

[NET|INST|PIN] object_name TNM =

predefined_group identifier;

where “TNM” is a reserved word definingthe name for a timing group.

In this case, object_name is the name ofthe element or signal to be grouped, pre-defined_group is an optional keyword andidentifier can be any combination of let-ters, numbers or underscores.

Do not use reserved words such as FFS,LATCHES or RAMS. This variable is case-sensitive (TNM=abc is not the same asTNM=ABC). You can apply TNM to anynet, element pin, primitive or macro.

Components can be part of more thanone group. For example, my_ffs_groupTNM can have the my_ff component in it.Likewise, my_ffs_group2 TNM can alsohave the my_ff component in it.

To create a group

NET CLOCK TNM=clk_group;

you can make any keyword element into agroup for timing purposes. In this example,the NET CLOCK is traced forward to theflip-flops (FFS). These flip-flops are timing-named (TNM) with the name clk_group.Asa result, clk_group can now be referenced bythis TNM in TIMESPECs.

You can also create a group using aninstance, such as:



All LATCHES in the macro calledmacro1 will be in a group called latchgroup.Likewise, in the constraint INST mymacTNM = RAMS memories; all RAMS in themacro called mymac will be in a groupcalled memories. And in the constraint

INST tester TNM = coverall;

all PADS, LATCHES, RAMS and FFS inthe macro called tester will be in a groupcalled coverall. The applicable

50 Xcell Journal Third Quarter 2010

In genera l , t he f ewer cons t ra in t s t he be t t e r. Comp lex cons t ra in t s can o f t en cause more p rob l ems

than they so lve . You may w ish t o dec la re t ha t no cons t ra in t s shou ld be app l i ed t o c e r t a in noncr i t i ca l pa ths o r ne t s .


Page 51: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 51

Constraints Guide will contain a completelisting of the predefined groups.

Less Is More In general, the fewer constraints, the better.Complex constraints can often cause moreproblems than they solve. In addition,some paths or nets may be noncritical, andyou may wish to declare that no constraintsshould be applied to these nets.

TIG (timing-ignore) constraints areused to remove things we don’t care aboutor to remove constraints from a false path.Here is a common TIG:

NET "rst" TIG;

This tells the tools that you do not needto constrain this path. It is important to

spell this out so that the tools do not workto meet timing on paths you do not careabout. Setting timing to ignore such pathswill also reduce tool run-times and mayimprove the quality of the timing on thepaths you do care about.

You can also use TIG with FROM:TOconstraints, as in the following :

TIMESPEC TS_my_fromto = FROM

"my_to_grp" TO "FFS" TIG;

Xilinx has a number of great resourceson timing constraints, the most notableof which I’ve cited in the referencesbelow. Please feel free to contact me ifyou have any further questions. I inviteyou all to participate in Xilinx’s commu-nity forums, which offer a plethora of

insights and answers to some of FPGAdesign’s most vexing questions.


Constraints Guide: Constraint syntax for UCF,PCF, HDL, http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/cgd.pdf

Timing Constraints User Guide: Conceptualinformation on how to constrain a design,http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/ug612.pdf

Timing Analyzer Help: General information onhow to use Xilinx Timing Analyzer,http://www.xilinx.com/support/documentation/sw_manuals/help/iseguide/mergedProjects/timingan/timingan.htm


Page 52: Xcell Journal issue 72

52 Xcell Journal Third Quarter 2010

Simplifying Metastability with IDDRUse the flip-flop chain that’s part of the ILOGIC block in Xilinx FPGAs to limit metastability events in your designs.


Page 53: Xcell Journal issue 72

by Primitivo Matas Sanz Technology ExpertTelefonica I+D, Madrid, [email protected]

If you’ve ever used an FPGA in an asyn-chronous system with multiple clocks, orin one that uses a clock with a frequency orphase that differs from the one your FPGAuses, your design can encounter metastabil-ity problems. Unfortunately, if your designfalls into one of these system scenarios,there’s no way to completely eliminatemetastability, but there are several methodsyou can employ to reduce the likelihoodyour system will encounter it.

Let’s take a closer look at what causesmetastability and then examine somemethods we can employ to attack it.

What is Metastability? In synchronous-logic digital devices such asFPGAs, each device’s register cell has pre-defined signal-timing requirements thatallow the device to correctly capture dataand in turn generate a reliable output sig-nal. When another device sends data to theFPGA, the FPGA’s input register must bestable for a minimum setup time before theclock edge and also for a minimum holdtime after the clock edge to receive the sig-nal properly and in its entirety.

After the specified delay, the registeroutput can then send the signal to the restof the FPGA. But if a signal transition vio-lates the times specified, the output registermay go into the so-called metastable state,in which the register output will hover at avalue between the high and low states foran indeterminate period. The result is todelay the stable output state beyond thetime specified for the register, a conditionthat can cause a slight delay in performanceor a logical-behavior side effect.

Addressing the IssueUsually, to connect an FPGA to anotherdigital device that has a different clockdomain, we need to add a synchronizationstage to the input section of the FPGAand make the first register in the FPGAclock domain act as a synchronization reg-ister. To do this, we can use either a

XAPP094, at http://www.xilinx.com/support/documentation/application_notes/xapp094.pdf.

We can calculate the MTBF for one reg-ister with the following formula:

In this instance, C1 and C2 are constantsrelated to the register technology and tMET

is the metastability settling time.We can determine the overall MTBF by

looking at the MTBF of each register. Thefailure rate for a synchronizer is 1/MTBF,and we can calculate the failure rate for theentire design by adding the failure rates foreach synchronizer, as follows:

Given this formula, it’s clear that thereare ways to get a better MTBF. We can, forexample, improve the architecture of ourregister cells; optimize the design toincrease the tMET in the synchronizationregisters; or even increase the number ofregisters in the chain.

High-Level Code and Placement ResultWhen we find an input signal with apotential metastability problem, we canaddress the issue by simply creating a reg-

failure_ratedesign =number of chains







C1 • fCLK • fDATA

sequence of registers or a synchronizationregister chain in the FPGA device’s inputstage. This chain allows additional timefor a potentially metastable signal toresolve before the input registers pass thesignal to other regions of the FPGA. Themetastable settling time is typically muchless than a clock cycle, so a delay of evenhalf the clock period may reduce the prob-ability of a metastable value by manyorders of magnitude.

To reduce the chances of encountering ametastability problem, the sequence of reg-isters (wired as shift registers) that weimplement in a design must meet the fol-lowing criteria:

• All registers must be clocked by the sameclock or by the same phase-related clocks.

• Each register in the chain must fan outonly to the next register.

Because we cannot completely eliminatemetastability problems, we must still accountfor them. To do this, the design communityuses the term mean time between failures(MTBF) to estimate the average timebetween instances when the problem couldcause a failure. A higher MTBF indicates amore robust design. A “failure,” in this case,is a failure to resolve metastability and not anactual system failure per se.

To see how metastability is measured,read the Xilinx Application Note

Third Quarter 2010 Xcell Journal 53
















Figure 1 – Synchronizer chain showing placement by default


Page 54: Xcell Journal issue 72

ister chain with the same phase-relatedclock. When we do this, we will come upwith a circuit that resembles the one weshow in Figure 1.

In this figure, we placed the registerchain into two cells: the first is an ILOGICcell while the other two registers are withina SLICE cell (we select a chain with threeregisters and the same clock). This is onequick and fairly simple way to mitigatemetastability issues, but there are othersthat also optimize performance.

IDDR Method Using Xilinx ILOGIC Blocks In the Virtex®-4 and Virtex-5 FPGAs,Xilinx® places its ILOGIC blocks directlybehind the I/O drivers and receivers. Theblocks include four storage-element registersand a programmable absolute-delay element.

The Virtex-4 and Virtex-5 devices usethose four registers to implement inputdouble-data-rate (IDDR) registers, a fea-ture designers can access only by instantiat-ing the IDDR primitive. We can use this toour advantage.

One of the modes of operation for thisprimitive is called SAME_EDGE_PIPELINED. Figure 2 shows the inputDDR registers and the signals involved inusing this mode. The rectangle in greenshows a perfect sequence of registers wecan use to resolve the metastability prob-lem. What’s more, using this IDDRmethod has an additional advantage—namely, we can use two or three times asmany main clocks without introducingany latency problems into the design.

A Bit of Code Is All it TakesIn the Virtex-4 User Guide, pages 328-329,you can find examples that illustrate theinstantiation of the IDDR primitive inVHDL and Verilog. Here is a typical exam-ple in Verilog:



defparam IDDR_INT2.INIT_Q1 = 1'b1;

defparam IDDR_INT2.INIT_Q2 = 1'b1;

defparam IDDR_INT2.SRTYPE = "SYNC";

IDDR IDDR_INT2( .Q1(sync_data),

.Q2(signal_noload), .C(CLK_2X),

.CE(1'b1), .D(async_data),.R(), .S());

In Figure 3, we can see the new place-ment results. Using this methodology,we’ve placed the register chain into twocells: the first two registers are containedwithin an ILOGIC cell and the other is ina SLICE cell (here, we select a chain withthree registers and two different clocks, onethat is twice as fast as the other).

Overall, metastability issues can be aninconvenience in your design, but byemploying a few quick and easy fixes,including using the IDDR primitive in anew way, you can drastically reduce thechances your design will encountermetastability issues. By making use ofthese methods as you are creating thedesign, rather than afterwards, you cancraft metastability-resilient architecturesoptimized upfront for area, performanceand cost.

54 Xcell Journal Third Quarter 2010



















D Q1





















Figure 3 – Synchronizer chain showing placement with IDDR

Figure 2 – Input DDR in SAME_EDGE_PIPELINED mode


Page 55: Xcell Journal issue 72

ISE Design Suite: Logic Edition

Ultimate productivity for FPGA logic designLatest version number: 12.2Date of latest release: July 2010Previous release: 12.1URL to download the latest release: www.xilinx.com/download

Revision highlights: This latest release of the ISE Design Suiteexpands support for both the Spartan®-6(XA and XQ devices) and Virtex®-6 (XQdevices) FPGA families. In addition, theISE Design Suite: Logic Edition 12 boasts2X faster run-times for Xilinx SynthesisTechnology (XST) and a 1.3X speedup forimplantation of large designs, along with15 to 20 percent faster implementationrun-times using multithreading.

Project Navigator: Xilinx has madeimprovements to design hierarchy parsing,such as upfront HDL syntax error check-ing, user control for enabling or disablinghierarchy reparsing and full support for,and automatic detection of, “include” files,along with process dependency and sourcemanagement. The tool also supportsnetlist sources (EDIF, NGC/NGO) assubmodules in the design hierarchy,including process dependency and source

management. Project Navigator also nowincludes “find” support in the design hier-archy view, which enables you to searchfor sources in the design hierarchy basedon file name, module name or instancename, and to search for missing modules.Process status improvements include anew process monitor, improved statusindicator behavior and the ability to rundownstream processes based on the pres-ence of necessary files rather than the errorstatus of previous steps. Finally, theDesign Summary has a new SystemSettings report that displays environmentsettings and process properties used dur-ing design implementation.

Power optimization: Intelligent clock gat-ing, available for Virtex-6 (version 12.1)and Spartan-6 (12.2) devices, minimizeslogic toggling to reduce dynamic powerconsumption.

Partial reconfiguration: PR enables dynamicdesign modification of a configured FPGAfor Virtex-6. The ISE software (12.2) usespartition technology to define and imple-ment static and reconfigurable regions ofthe device. Note: This software featurerequires an additional license code.

FPGA Editor: This tool, which has approxi-mately 40 percent smaller memory foot-

print for large devices compared with the11.1 release, boasts faster loading of devicegraphics. The List Window has copy, pasteand cut keyboard shortcuts that can beused in the Name Filter.

ChipScope™: This version adds support forcontinuous trigger with multiple ILAcores. Analyzer adds a repetitive-run triggeroption to monitor repetitive events withouthaving to manually rearm the trigger.

Device programming: ChipScope Pro andiMPACT now support JTAG cables soldby third-party partners ByteTools andDigilent; new flash device support has alsobeen added to iMPACT. The followingthird-party devices may be programmed:Numonyx N25Q, Numonyx P30 (now upto 1Gb), Winbond W25Q, SpansionS25FLP and Spansion S29GLP.

ISim: Improved integration and interoper-ability deliver the ability to simulateembedded designs, with integration intoXPS and Project Navigator. The tool offerscomplete OS support, including nativesupport for 64-bit Windows, and ensuresease of use for batch-mode user (you canprogrammatically configure the waveformvia Tcl). Memory viewing and debug areeasier thanks to a new memory editor andviewers. The tool automatically parses thedesign and identifies memory elements.Waveform enhancements include the abili-ty to adjust the time scale automatically foroptimal viewing, and to override HDLstimulus with user-defined values.

PlanAhead™: A new, simpler and intuitiveRTL-to-bitstream pushbutton task-basedflow takes you through three main steps:synthesis, implementation, program anddebugging. Design preservation and partialreconfiguration are supported for the com-mand-line tools and the standalone versionof the PlanAhead software.

Third Quarter 2010 Xcell Journal 55

Xilinx Tool & IP UpdatesXTRA, XTRA

Xilinx continues to improve products and IP in the ISE® Design Suite. Here is thelatest quarterly release for Xilinx® design products and IP as of July 2010. Quarterlyreleases, which offer significant enhancements and new features to the ISE DesignSuite, are now full installations, designed to coexist with previous quarterly or majorXilinx releases. By installing the latest Xilinx release, you are taking advantage of aneasy way to ensure the best results for your design.

The latest release of ISE Design Suite is always available from the Xilinx DownloadCenter at www.xilinx.com/download. For more information or to download a free 30-day evaluation of the ISE Design Suite, visit www.xilinx.com/ise. Also, look fornew Xilinx tools and IP as well as new IP, tools and development boards from Xilinxpartners in the Tools of Xcellence section of this issue.

Page 56: Xcell Journal issue 72

SmartXplorer: SmartXplorer now supportssynthesis when using the command line.The new custom file format that deliversthis feature will also let you specify syn-thesis and implementation strategiessimultaneously. You can display areainformation in the SmartXplorer reporttable by using the –area_report option,and run the power analyzer and displaypower information in the report table byusing the –pwo option (command-linemode only). Also, you can use power as anadditional best-strategy selection criterionif you are using the –pwo option, andcontrol TRCE by using the –to option.This option lets you generate verboseTRCE reports during SmartXplorer runs.

XPower Analyzer: This version has reor-ganized several views and consolidatedrelated data in an organization similar tothat of XPE. The hierarchy view addsresource utilization for each hierarchylevel (LUTs, FFs), while additional statis-tical data appears in the tool-tip window.In clock domain view, you can edit thefrequency for all clocks and see addeddetails regarding clock tree topology. Itallows you to specify custom off-chip I/Otermination to calculate off-chip power.An added Confidence Level view assistsyou in obtaining realistic power data.

XPower Estimator: Readability is signifi-cantly improved with a new color schemeand improved presentation of data. Thetool provides power distribution byresource type on a summary sheet andreports total power supplied to I/O ter-mination (off-chip power). It allows youto specify custom off-chip I/O termina-tion to calculate off-chip power.

XST: This version adds inference supportfor asymmetric-port Block RAM. Poweroptimization dedicated to BRAM opti-mization is now implemented for Virtex-6 and Spartan-6 devices. For moreinformation see the –power option andRAM style (RAM_STYLE) constraint. Anew automax value for the Use DSPBlock (USE_DSP48) constraint instructsXST to maximize utilization of DSP

software design support. Simplified flows forcreating the initial C application will get youthrough your software design flow faster.

MicroBlaze™ Soft Processor: A new configura-tion wizard, with multiple startingMicroBlaze preconfiguration options, guidesyou through processor setup. The branch tar-get buffer stores recently taken branch loca-tions, speeding up future execution, whilebranch prediction keeps the instructionpipeline full, maximizing processor perform-ance. Victim cache stores recently flushedcache lines locally for faster processing.

Embedded IP: Spartan-6 and Virtex-6 produc-tion support (except the MPMC Virtex-6,which remains preproduction) means you candesign to the newest Xilinx FPGA devices.PLBv34 and OPB-based cores have beenremoved, providing faster download and asmaller IP installation footprint.

ISE Design Suite: DSP Edition

Flows and IP tailored to the needs of algorithm, system and hardware developersLatest version number: 12.2Date of latest release: July 2010Previous release: 12.1URL to download the latest patch: www.xilinx.com/download

Revision highlights: All ISE Design Suite Editions include theenhancements listed above for the ISEDesign Suite: Logic Edition. The followingis the list of enhancements specific to theDSP Edition.

64-bit OS support: Native support for the 64-bit version of Windows XP and WindowsVista 64-bit SPI allows System Generator toaccess the additional memory possible withthese operating systems.

Hardware co-simulation improvements: SystemGenerator now supports Ethernet point-to-point hardware co-simulation for theSpartan-6 FPGA SP601 and SP605 plat-forms. System Generator provides supportfor more than one JTAG cable to the com-puter, so that customers can have more thanone hardware co-simulation token in a

resources within the limits of availableresources on the selected device and lets youimplement more logic on DSP blocks thancan typically be achieved with the auto value.This can be particularly useful when a tight-ly packed device is your primary concern. Anew Shift Register Minimum Size (SHREG_MIN_SIZE) option allows you to controlthe minimum size of shift registers that areinferred and implemented using SRL-typeresources. While the default minimal size is2, you may need to raise that threshold formore efficient resource placement and cir-cuit performance.

ISE Design Suite: Embedded Edition

An integrated software solution for designingembedded processing systemsLatest version number: 12.2Date of latest release: July 2010Previous release: 12.1URL to download the latest patch: www.xilinx.com/download

Revision highlights: All ISE Design Suite Editions include theenhancements listed above for the ISEDesign Suite: Logic Edition. The followingis the list of enhancements specific to theEmbedded Edition.

Xilinx Platform Studio: XPS now supports theISim Simulator, to simulate embeddeddesigns using the included ISE HDL simula-tor. Native support for 64-bit Windows NTis included. XPS has the ability to “ExportHardware Design to SDK” from ISE ProjectNavigator—a fast, transparent setup of hard-ware definitions for software design.Simplified Project Options Dialog deliverseasier startup of embedded projects, while anew Cygwin version allows multiple installa-tions of Cygwin to exist on one system.

Software Design Kit: Xilinx has upgraded theSDK to the latest Eclipse version (v3.5.1),delivering all the most recent Eclipse stan-dards. Also, the SDK is updated to the latestC/C++ Development Tools release (CDTv6.0.1), to utilize the industry’s latest C/C++


56 Xcell Journal Third Quarter 2010

Page 57: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 57


design. For increased flexibility, it also sup-ports custom JTAG cables, like the cableused to connect ChipScope.

Enhanced integration with ChipScope: A JTAGhardware co-simulation design caninclude a ChipScope block to provideadditional visibility during debug. Youcan import repetitive-trigger data withthe xlLoadChipScopeData utility func-tion. Some situations require the repeti-tive-trigger mode to capture the necessarydata during debug.

Xilinx IP Updates

Name of IP: ISE IP Update 12.2Type of IP: All

Targeted application: Xilinx develops IP cores and partners withthird-party IP providers to decrease cus-tomer time-to-market. The powerful com-bination of Xilinx FPGAs with IP coresprovides functionality and performancesimilar to ASSPs, but with flexibility notpossible with ASSPs.

Latest version number: 12.2Date of latest release: July, 2010URL to access the latest version: www.xilinx.com/downloadInformational URL:www.xilinx.com/ipcenter

New Cores in ISE Design Suite 12

Video and image processing:• Image Characterization v1.0 —

Calculates important statistical data forvideo input streams. Characterization isan important processing block for manyapplications, including facial recognitionand object detection.

Wired communications:• SGMII over LVDS — Provides design-

ers with a GMII-to-SGMII bridge func-tion using LVDS (SelectIO™) insteadof transceivers for chip-to-chip applica-tions on Virtex-6 family devices. Thisnew feature is an addition to the

Ethernet 1000BASE-X PCS/PMA orSGMII IP LogiCORE™ IP v10.5. Thecore is included at no additional chargewith the ISE Design Suite software.

Wireless communications:

• 3GPP LTE RACH Detector v1.0 —Provides designers with an LTE RACHdetecting block, which decodes P-RACH data encoded according the3GPP TS 36.211 v8.6.0, PhysicalChannels and Modulation specification.

• DUC/DDC Compiler — Implementshigh-performance, optimized digitalup- and downconverter modules foruse in wireless base stations and othersuitable applications. In addition to awide range of parameter options,resource trade-off options enable you totailor the core to specific designrequirements. The core is included atno additional charge with the ISEDesign Suite software.

• 3GPP LTE Channel Estimator —Implements channel estimation to sup-port decoding of the Physical UplinkShared Channel (PUSCH) in 3GPP LTEeNodeB applications as defined in the3GPP TS 36.211 specification. Includessupport for single-input, single-output(SISO); single-input, multiple-output(SIMO); and multiuser multiple-input,multiple-output (MIMO) communica-tion modes.

• 3GPP LTE FFT — Implements alltransform lengths required by the3GPP LTE specification, including the1,536-point transform for 15-MHzbandwidth support.

New CORE Generator™ features:

• Enhanced assistance with design migra-tion, including improved messaging forcores and core versions that have beenremoved from 12.1. A new warningmessage advises users about newer IPversions that are available along withinformation on core versions in the cur-rent project that can be automatically

upgraded. The project IP panel nowdisplays missing IP as “upgradable,”“removed” or “unavailable.”

• IP catalog enhancements mean the IPcore now displays the life cycle status ofpreproduction and production deviceson a per-family basis.

• Xilinx has added the capability for auto-mated core upgrade to the latest versionfor the following IP cores: BlockMemory Generator v4.1, CICCompiler v2.0, Clocking Wizard v1.5,Fast Fourier Transform v7.1, FIFOGenerator v6.1, SelectIO InterfaceWizard v1.3 and System MonitorWizard v2.0

IP Updates – Highlights

• Memory IP: Block Memory Generatorv4.2 — Designers can use a new “writefirst” mode for single-dual-port (SDP)memory type (Virtex-6 only) instead of“read first” for SDP BRAM when theread and write ports are clocked by dif-ferent clocks. This reduces BRAM uti-lization in SDP mode. Block MemoryGenerator also now supports soft ham-ming error correction for SDP BRAMconfigurations for data widths < 64 bits(Virtex-6 and Spartan-6 only).

• FIFO Generator v6.2: This IP now sup-ports “write first” mode for SDP BRAM-based FIFO configurations for reducedBRAM utilization in SDP mode.

• SelectIO Wizard v1.4: Added supportfor Virtex-6 FPGAs

• Video IP: The Defective PixelCorrection, Gamma Correction, ColorCorrection Matrix and Color FilterArray Interpolation cores have increasedmaximum supported sensor resolutionto 4K x 4K. They now support Spartan-6 and Virtex-6 devices as well as 32- and64-bit Linux.

A comprehensive listing of cores that havebeen updated in this release is available atwww.xilinx.com/ipcenter/coregen/12_2_datasheets.htm.

Page 58: Xcell Journal issue 72

Over the past two years, Xilinx has strivento take full advantage of what Xilinx

CEO Moshe Gavrielov calls “the programma-ble imperative”—the confluence of marketforces driving hardware engineers away fromASICs and ASSPs and toward FPGAs.

As process nodes shrink, the cost of design-ing and developing ASIC and ASSP devicesincreases exponentially to the point where, atthe 40/45- and 28-nanometer nodes, thereturn on investment becomes untenable. Onthe other hand, FPGAs have taken advantageof these smaller process nodes to incorporatemore functions, making the programmabledevices an attractive alternative. And with theFPGA’s inherent flexibility and reduced NREcosts, engineers are finding FPGAs more com-pelling than ever before.

As the market for FPGAs has grown, therehas been increasing pressure to help engineersmake the transition from ASIC and ASSPdesign to FPGA design. In response, Xilinx®

has re-engineered its customer training pro-gram to make it more accessible to a widerand more diverse audience.

In the past, Xilinx customer training laggednew silicon or software releases by six months.This generally aligned to the time it took fortraditional FPGA users to adopt new products.Over the last two years, however, we havefound increasing interest in new Xilinx prod-ucts, especially among engineers new to FPGAdesign, coincident with product launch.

To address this new group, Xilinx has accel-erated training updates so that our classesreflect the latest and greatest technology on themarket today. We update classes at productlaunch or within 30 days of launch. All coursesare now updated to the ISE® 12.1 design tools.

Finding the Right Mix Learning how to design with an FPGA is likelearning to fly. Success takes a mix of theoreti-cal knowledge and hands-on skill. A pilot needsto understand how weather can affect the flight

dynamics of an airplane, but must also gainexperience controlling the aircraft in thoseweather conditions. Similarly, an engineerimplementing an FPGA design needs tounderstand the different tools, IP and designtechniques to steer a successful FPGA designthrough the ever-changing weather of evolvingstandards and system change requirements.

For the expanding market of FPGA engi-neers, Xilinx had to decide how to strike abalance between the practical and theoreti-cal training approaches. Because engineerstend to learn by doing, Xilinx has imple-mented a number of changes to its trainingcurriculum, including:

• Focusing classroom learning on hands-onlabs. On average, engineers spend morethan 50 percent of all class time in labs.

• Modularizing content so instructors canshape the training to meet each class’needs. Some groups may need more theo-retical training, others more hands-onexperience.

• Providing labs in multiple formats toaccommodate each learner’s uniquestyle. This includes labs that providehigh-level instructions for engineers whowant to explore the tools on their own,as well as step-by-step instructions forthose who want to learn the bestmethod for solving problems.

Expanding the Reach of Training To reach out to this ever-expanding pool ofFPGA engineers, Xilinx has stepped up itsefforts to connect learners with expert train-ers regardless of where they reside. Xilinx haspartnered with nearly 30 AuthorizedTraining Providers, expert FPGA engineerswho enjoy sharing their knowledge andexpertise with others. Many of these trainingproviders are also consultants in digital sig-nal processing, embedded processing andconnectivity designs.

Over the past two years, Xilinx has focusedon enhancing the depth and breadth of itsAuthorized Training Providers roster. InNorth America alone, Xilinx has recentlyadded five new training providers to the team.

The Response? The response has been overwhelming.Learners, on average, rate Xilinx customertraining classes at 8.6 out of 10, with amedian score of 9. In addition, learners tellus their FPGA knowledge doubled as aresult of the training.

Xilinx FAEs who engage regularly withcustomers clearly see the results. Says FAEDon Schaeffer, “Engineers that have takenclasses are more likely to make use ofadvanced features, IP and design tech-niques, which dramatically improves thequality of their products and significantlyreduces their time-to-market.”

Adds Mike Cole, “Customers reallyenjoyed the Embedded Systems Designclass and stated that they feel the class willsave them at least four weeks of develop-ment time. The instructor was outstandingand the course content gave them confi-dence in creating custom peripherals andutilizing advanced cores in EDK. Throughthe training the customer was able to askhigher-level questions, utilize advancedtechniques and ultimately have a betterproduct and get to market more quickly.”

Additionally, Xilinx instructor Bill Kafighas this to say: “My favorite quote from astudent taking the Essentials of FPGADesign and Designing for Performanceclasses—‘I learned in three days what itwould have taken me three years to learn!’ ”

Space is limited, so sign up for a coursetoday by visiting http://www.xilinx.com/training/worldwide-schedule.htm. For moreinformation on the Authorized TrainingProviders, see http://www.xilinx.com/training/.

At the Cusp of the Programmable Imperative


Xilinx has altered its training strategy to keep pace with engineers’ interest in FPGA design.

58 Xcell Journal Third Quarter 2010

Page 59: Xcell Journal issue 72

Third Quarter 2010 Xcell Journal 59


Xilinx Training Worldwide www.xilinx.com/training Worldwide

AMERICAS [email protected]

Anacom Eletrônica www.anacom.com.br Brazil

Bottom Line Technologies www.bltinc.com Delaware, District of Columbia, Maryland, New Jersey, New York, Eastern Pennsylvania, Virginia

Doulos www.doulos.com/xilinxNC Northern California

Faster Technology www.fastertechnology.com Arkansas, Colorado, Louisiana, Montana, Oklahoma, Texas, Utah, Wyoming

Hardent www.hardent.com Alabama, Connecticut, Eastern Canada, Florida, Georgia, Maine, Massachusetts, Mississippi, New Hampshire, North Carolina, RhodeIsland, South Carolina, Tennessee, Vermont

North Pole Engineering www.npe-inc.com Illinois, Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, SouthDakota, Wisconsin

Technically Speaking www.technically-speaking.com Arizona, British Columbia, Southern California, Idaho, New Mexico, Nevada,Oregon, Washington

Vai Logic www.vailogic.com Indiana, Kentucky, Michigan, Ohio, Western Pennsylvania, West Virginia

EUROPE, MIDDLE EAST & AFRICA (EMEA) [email protected]

Arcobel Embedded Solutions www.arcobel.nl The Netherlands, Belgium, Luxembourg

Bitsim AB www.bitsim.com/education Sweden, Norway, Denmark, Finland, Lithuania, Latvia, Estonia

Doulos Ltd. www.doulos.com/xilinx United Kingdom, Ireland

Inline Group www.plis.ru Moscow Region

Logtel Computer Communications www.logtel.com Israel, Turkey

Magnetic Digital Systems www.magneticgroup.ru Urals Region

Mindway www.mindway-design.com Italy

Multi Video Designs (MVD) www.mvd-fpga.com France, Spain, Portugal, Switzerland, Mexico, Brazil, Argentina

Pulsar Ltd. pulsar.co.ua/en/index Ukraine

Programmable Logic Competence Center (PLC2) www.plc2.de Germany, Switzerland, Poland, Hungary, Czech Republic, Slovakia, Slovenia,Greece, Cyprus, Turkey, Russia

SO-Logic Consulting www.so-logic.co.at Austria, Brazil, Czech Republic, Hungary, Slovakia, Slovenia

ASIA PACIFIC [email protected]

Active Media Innovation www.activemedia.com.sg Malaysia, Singapore, Thailand

Black Box Consulting www.blackboxconsulting.com.au Australia, New Zealand

E-elements www.e-elements.com China, Hong Kong, Taiwan

Libertron www.libertron.com Korea

OE-Galaxy [email protected] Vietnam

Sandeepani Programmable Solutions www.sandeepani-vlsi.com India

Symmid www.symmid.com Malaysia

WeDu Solution www.wedusolution.com Korea

JAPAN [email protected]

Avnet Japan www.jp.avnet.com Japan

Paltek www.paltek.co.jp Japan

Shinko Shoji xilinx.shinko-sj.co.jp Japan

Tokyo Electron Device ppg.teldevice.co.jp Japan


Page 60: Xcell Journal issue 72

XAPP459: Eliminating I/O Coupling Effects when Interfacing Large-Swing Single-Ended Signals to User I/O Pins on Spartan-3 Familieshttp://www.xilinx.com/support/documentation/application_notes/xapp459.pdf

Xilinx® Spartan®-3, Spartan-3E and Extended Spartan-3A devicessupport an exceptionally robust and flexible I/O feature set that eas-ily meets the signaling requirements of most applications. It is pos-sible to program user I/O pins of these families to handle manysingle-ended signal standards.

The standard single-ended signaling voltage levels are 1.2 V, 1.5V, 1.8 V, 2.5 V and 3.3 V. But in a number of applications, it isdesirable to receive signals with a greater voltage swing than user I/Opins ordinarily permit. The most common use case involves receiv-ing 5-V signals on user I/O pins that are powered for use with oneof the standard single-ended signaling levels.

This application note by Eric Crabill describes ways to receivethe resulting “large-swing signals” by design. In one solution (and inthe general case of severe positive or negative overshoot), parasiticleakage current between user I/O in differential-pin pairs mightoccur, even though the user I/O pins are configured with single-ended I/O standards. The application note addresses the parasiticleakage current behavior.

XAPP1075: Implementing Triple-Rate SDI with Virtex-6 FPGA GTX Transceivers http://www.xilinx.com/support/documentation/application_notes/xapp1075_V6GTX_TripleRateSDI.pdf

Professional broadcast video equipment makes wide use of the triple-rate serial digital interface (SDI) supporting the SMPTE SD-SDI,HD-SDI and 3G-SDI standards. In broadcast studios and video pro-duction centers, SDI interfaces carry uncompressed digital videoalong with embedded ancillary data, such as multiple audio channels.

Xilinx Virtex®-6 FPGA GTX transceivers are well-suited for imple-menting triple-rate SDI receivers and transmitters, providing a high

degree of performance and reliability while occupying a relatively smallamount of FPGA logic resources. In this application note, John Snowdescribes how to implement these triple-rate SDI interfaces.

Doing so requires only two reference clock frequencies to sup-port all SDI modes: 148.5 MHz for SD-SDI at 270 Mbps, HD-SDI at 1.485 Gbps and 3G-SDI at 2.97 Gbps; and 148.5/1.001MHz for HD-SDI at 1.485/1.001 Gbps and 3G-SDI at 2.97/1.001Gbps. The transceiver transmits preformatted dual-link HD-SDIstreams via either dual-link HD-SDI or 3G-SDI Level B formats.With the addition of a 3G-SDI Level A mapping module, it sup-ports all 3G-SDI Level A-compatible video formats. In addition, thetransceiver directly supports transmission of two independent HD-SDI streams in the 3G-SDI Level B mode. Only a single globalclock is required for the transmitter. No mixed-mode clock man-agers (MMCMs) are required.

XAPP496: Creating Wider Memory Interfaces Using Multiple Spartan-6FPGA Memory Controller Blockshttp://www.xilinx.com/support/documentation/application_notes/xapp496.pdf

The Memory Controller Block (MCB) is a dedicated embeddedmultiport memory controller that greatly simplifies the task of inter-facing Spartan-6 devices to DDR3, DDR2, DDR and LPDDRmemories. Spartan-6 devices contain two to four MCBs, each ofwhich can implement a single-component interface to a 4-bit, 8-bitor 16-bit memory. However, some applications with higher memo-ry bandwidth or density requirements benefit from using memoryinterfaces wider than 16 bits. This application note by Derek Curddescribes how to merge the operation of two or more MCBs toimplement effective 32-bit or wider memory interfaces.

Both MCBs must be in a single-port configuration mode. EachMCB still operates at full performance (up to 800 Mbits/second),allowing the user application to realize the full benefit of using thesededicated embedded memory controllers for wider interfaces. The

60 Xcell Journal Third Quarter 2010

Application NotesApplication NotesIf you want to do a bit more reading about how our FPGAs lend themselves to a broad number of applications, we recommend these notes. If you want to do a bit more reading about how our FPGAs lend themselvesto a broad number of applications, we recommend these notes.


Page 61: Xcell Journal issue 72

author has verified the associated reference design in hardware andanalyzed it for both performance and device utilization. However,the design does not support merging MCBs that are configured inthe multiport configuration mode.

XAPP1073: NSEU Mitigation in Avionics Applicationshttp://www.xilinx.com/support/documentation/application_notes/xapp1073_NSEU_Mitigation_Avionics.pdf

Neutron-induced single-event upset (NSEU) is a known phenomenonin the memory structures of modern ICs used in terrestrial applica-tions. With current and next-generation aircraft operating at altitudesof 40,000 feet and higher, the increased atmospheric neutron flux rais-es the likelihood of this phenomenon by several orders of magnitude,with the potential to affect flight safety. The decreasing feature size ofmemory structures combined with the growth in memory size meansthat systems are becoming ever more susceptible to NSEUs.

SRAM-based FPGAs pose a unique challenge for avionics man-ufacturers, because an FPGA’s functionality depends on the integri-ty of its configuration memory. It is vital for FPGA designers toachieve NSEU hardness for critical avionics systems through a com-bination of soft and hard mitigation techniques.

This application note by Ching Hu and Suhail Zain providesbackground on NSEUs in SRAM-based FPGAs, mitigation tech-niques (with a focus on configuration memory) suggested by Xilinxand an overview of how to calculate projected failures-in-time (FIT)rates at altitude.

While SRAM-based FPGAs have an additional susceptibilityover other programmable technologies due to the volatility of theirconfiguration memories, Xilinx has developed a number of mitiga-tion techniques and structures. They include SRAM cells built tologic-design rules (not to memory-design rules), built-in ECC struc-tures in Block RAM, and SEU detection and correction structuresbuilt into hardware. All memory elements are susceptible to NSEUs,but with proper mitigation techniques, SRAM-based FPGAs pro-vide avionics designers with a wide range of possible solutions.

XAPP879: PLL Dynamic Reconfigurationhttp://www.xilinx.com/support/documentation/application_notes/xapp879.pdf

This application note by Karl Kurbjun and Carl Ribbing provides amethod to dynamically change the clock output frequency, phaseshift and duty cycle of the Spartan-6 FPGA phase-locked loop (PLL)through its Dynamic Reconfiguration Port (DRP). After explainingthe behavior of the internal DRP control registers, the authors pro-vide a reference design that uses a state machine to drive the DRP soas to ensure that the registers are controlled in the correct sequence.

The PLL used in conjunction with the DRP interface is recom-mended for advanced users when the basic PLL functionality is notsufficient. The DCM_CLKGEN primitive can be a useful alternative.

The reference design, which supports two reconfiguration stateaddresses, can be extended to support additional states. Each statedoes a full reconfiguration of the PLL so that most parameters can

be changed. Its modular nature means you can use the design as afull solution for DRP or easily extend it to support additionalreconfiguration states. The design uses minimal Spartan-6 FPGAresources, consuming only 25 slices.

However, if designers need postconfiguration cyclic redundancycheck (CRC) functionality in their design, they cannot use the PLLDRP port to dynamically reconfigure the PLL. Doing so breaks thefunctionality of postconfiguration CRC.

XAPP1146: Embedded Platform Software and Hardware In-the-Field Upgrade Using Linuxhttp://www.xilinx.com/support/documentation/application_notes/xapp1146.pdf

New features and bug fixes often necessitate upgrading flash imagesto replace the existing FPGA bitstream, boot loader, Linux kernelor file system. This application note describes an in-the-fieldupgrade of the Spartan-6 FPGA bitstream, Linux kernel and loaderflash images, using the presently running Linux kernel. Upgradefiles are obtained from a CompactFlash storage device or over thenetwork from an FTP server. Author Brian Hill includes one refer-ence design built for the Xilinx SP605 Rev C board.

XAPP498: Source Control and Team-Based Design in System Generatorhttp://www.xilinx.com/support/documentation/application_notes/xapp498.pdf

This application note by Douang Phanthavong provides anoverview of how to perform source version control and team-baseddesign using the System Generator tool. Designers can accomplishthese tasks using the version control features native to the MAT-LAB® Simulink® software environment, or with an external sourcecontrol system. While this application note focuses on Subversion,a well-known, free, open-source control system, other version con-trol software such as CVS, MS Source Safe and Clear Case can alsobe used—depending on the design environment.

Collaborative development allows developers who are physicallydispersed to concurrently and collaboratively design, test, debugand document the same design. Team-based design in MAT-LAB/Simulink requires coordination of modeling activitiesbetween team members. If properly managed, dozens of geograph-ically dispersed developers can effectively share their work in a safe,secure and productive design environment. However, if not man-aged well, dealing with many design versions and their dependen-cies can lead to severe loss in productivity and reduced confidencein product quality. Version control is the key to managing an orga-nization’s MATLAB/Simulink designs.

This application note provides the basic knowledge required tomanage model versions using Simulink’s native features. It alsoshows how to use source control systems such as Subversion inter-nally and externally to the MATLAB/Simulink software environ-ment. Users find out how to graphically compare and mergemodels using the SimDiff and SimMerge tools and how to use thesetools with source control systems such as TortoiseSVN.

Third Quarter 2010 Xcell Journal 61


Page 62: Xcell Journal issue 72

62 Xcell Journal Third Quarter 2010

Tools of XcellenceTools of XcellenceNews and the latest products from Xilinx partnersNews and the latest products from Xilinx partners


by Mike SantariniPublisher, Xcell JournalXilinx, [email protected]

by Mike SantariniPublisher, Xcell JournalXilinx, [email protected]

Synopsys Inc. recently released the sixth generation of its HAPSASIC and ASSP prototyping system, based on the Xilinx®

Virtex®-6 FPGAs. The HAPS-60 series has more than double thecapacity and up to 30 percent better performance than the prior-generation product, said Neil Songcuan, product marketing manag-er for FPGA-based prototyping solutions at Synopsys. In addition,the product offers new IP support via Synopsys’ DesignWare IPlibrary, along with advanced verification functionality to facilitateASIC design and even end-product debug.

Chip companies typically use the HAPS systems to verify andvalidate the functionality of their ASIC designs in real hardware.They use Synopsys’ Simplify synthesis in concert with the Certifypartitioning tool to program their ASIC design into the FPGAsrunning on the HAPS system of choice. Once they have the hard-ware validated, they can use this HAPS version to get an earlyjump on developing and validating firmware, drivers andapplication software running on their design in thecontext of the entire end product.

Synopsys’ new HAPS-60 line consists ofthree systems. The HAPS-61, which usesone Virtex-6 FPGA, supports up to4.5 million ASIC gates. The HAPS-62, equipped with two Virtex-6FPGAs, supports up to 9 millionASIC gates. The biggest system inthe lineup, the HAPS-64, uses fourVirtex-6 FPGAs and supports up to18 million ASIC gates.

Songcuan said that all these sys-tems achieve clock frequencies of upto 200 MHz (25 to 70 MHz typi-cal). That performance allows theHAPS-60 series to support applica-tions requiring real-time interfacessuch as video, cellular data or livenetwork traffic.

In addition to improved capacity and performance, Synopsys isalso offering pretested intellectual property from its DesignWare IPlibrary to facilitate implementing soft-IP blocks from an ASIC designinto the FPGAs on the HAPS-60 system. “Customers are using inHAPS the exact same RTL they are using in their SoC (system-on-chip),” said Songcuan. “It allows you to make sure you are verifyingexactly what is going to be in your SoC and focus on debugging yourdesign rather than your prototyping system.” The IP includesSynopsys’ SuperSpeed USB 3.0, PCI Express and HDMI cores.

Contact your local sales representative for more information onavailability and pricing of the HAPS-60 series of rapid prototypingsystems. A list of Synopsys sales offices can be found athttp://www.synopsys.com/apps/company/locations.html

Synopsys Debuts Latest HAPS Prototyping System

Synopsys’ sixth-generation HAPS prototyping system, built on the Virtex-6, features higher capacity and performance and advanced verification capabilities.

Page 63: Xcell Journal issue 72

Mentor Graphics Corp. (Wilsonville, Ore.) recently released itsnew vendor-independent Precision Rad-Tolerant FPGA

design solution for aerospace and high-reliability applications.Developed with the guidance of NASA, the tool provides synthesis-based radiation-effects mitigation. It includes all the features of thelatest version of the company’s traditional Precision synthesis tool,along with specialized features and flows for mil-aero and safety-critical applications.

“Precision Rad-Tolerant is the first synthesis-based radiation-effects mitigation solution to reduce the risk of functionality prob-lems including soft errors caused by single-event upset (SEU) andsingle-event transient (SET) disruptions,” said Daniel Platzker,FPGA synthesis product-line director at Mentor Graphics’ DesignCreation and Synthesis Division. “It primarily targets aerospace-and-defense applications that use FPGAs, but there are an increasingamount of commercial applications that require high reliability andare also susceptible to SEU and SET disruptions.”

Platzker cited communications and high-performance computingas two applications that demand ever-more-sophisticated radiation-effect mitigation technologies.

Platzker pointed to an automated multimode triple-modularredundancy (TMR) feature as one of the most advanced features inthe new tool. In a TMR design, a voting circuitcompares the results of three separate instantia-tions of a system that perform the same task. Ifat least two of them produce the same result, thevoter deems it to be correct.

In a Xilinx FPGA environment, this newPrecision synthesis feature is another approach toXilinx’s internally developed and well-maturedTMRTool. Platzker said that where the XilinxTMRTool implements redundancy post-synthe-sis (taking in a netlist and outputting a newnetlist), the Precision Rad-Tolerant TMR toolperforms triple-modular redundancy a step earli-er in the design process—during logic synthesis.“We can decide if it is more optimal to infer onememory over another, or infer one kind of DSPover another, and decide at an earlier stage of thedesign cycle where to insert a voter,” saidPlatzker. “You typically aren’t making these deci-sions after you’ve designed the circuit.”

Further, users can choose what types ofTMR they would like the tool to implement.“It gives new levels of granularity in what youwant to triplicate,” he said. Designers can, forexample, opt to implement redundancy ofsequential or combinatorial logic. “The toolhas a Local TMR mode,” said Platzker. “Youcan choose to only triplicate flops, for exam-ple. You can also choose to triplicate a certainflop or feature.”

The tool also has a Distributed TMR mode. “If you are alsoconcerned with SET, with distributed TMR, you can not only trip-licate the selected flops, but also the logic on the drive of the flopsand the voters,” said Platzker. “It’s a great complement to Xilinx’smitigation solutions.”

In the third and final mode, the Global TMR mode, the toolnot only triples all sequential elements, combinatorial logic andmajority voters, but also global buffers.

Another feature of the Precision Rad-Tolerant product is syn-thesis-based insertion of fault-tolerant finite state machines(FSMs). “We have a mode called ‘detection and recovery,’ ” saidPlatzker. “Essentially, we add two parity bits to the logic andregardless of the state size and encoding, if your design takes anSEU hit, it will return the state machine into its default state. Atthat time, the designer can decide what next action the designshould take.”

Another FSM mode is called “fault-tolerant FSM.” Here, “Weuse Hamming-distance-3 error correction to allow the FSM toabsorb the SEUs without disruption,” said Platzker.

Platzker said the new Precision Rad-Tolerant product is availablenow and is priced higher than the standard version of the tool. Formore information, visit www.mentor.com/precision-radtolerant.


Third Quarter 2010 Xcell Journal 63

Mentor Graphics Fields Precision Rad-Tolerant Synthesis Tool with Versatile TMR Capabilities

The Distributed TMR and Global TMR modes triple sequential elements, combinatorial logic and majority voters.

Page 64: Xcell Journal issue 72

GateRocket Inc. (Bedford, Mass) has released a new versionof its RocketDrive FPGA verification and debug environ-

ment for the Xilinx Virtex-6 family of high-performance pro-grammable devices.

The RocketDrives are unique in that the devices, which arethe size of a hard drive, actually include the Virtex-6 FPGAs youare targeting. In GateRocket’s Device Native methodology, as youare simulating your design, you offload blocks of it onto theRocketDrive to speed up the performance of debug and verifica-tion. It also gives you a way to verify the design running on thesame FPGA device on which you plan to implement your design.

“Today’s FPGAs are simply too big to use the traditional‘blow and go’ methodology, where you program your designinto an FPGA, find and fix a problem and then repeat theprocess exhaustively untilyou hopefully find all bugs,”said Dave Orecchio, thepresident and CEO ofGateRocket. “That’s whymost designers now use sim-ulation, but simulationalone tends to be slow anddoesn’t give you the truehardware functionality.”

In contrast, GateRocket’sDevice Native methodologyallows designers to rundesign blocks in the FPGAthey are targeting, bringingmore order to the debugprocess and speeding up ver-ification and accuracy.“Designers typically seetheir FPGA verification anddebug times cut in halfcompared to the traditionalsoftware-only verificationcycle,” said Orecchio.

The GateRocket solutionallows designers using Virtex-6 devices to move effortlesslybetween RTL and the specific FPGA being targeted, combiningactual FPGA hardware and RTL simulation models together ina single verification run, without changes in the design flow ormethodology. The company calls this technique “soft patch,”and it gives engineers the ability to make a change to one ormore RTL blocks and rerun them along with the hardwareimplementations of the other blocks. That sidesteps the need torebuild the device for each fix and enables multiple design-change-debug iterations in a single day.

The new Virtex-6 RocketDrives use the largest LX and SXdevices for advanced logic and DSP applications respectively.

GateRocket also offers a cost-effective midrange device configu-ration targeted at users who do not require the largest FPGAdevice in the family.

By using devices optimized for specific needs, GateRocket saysit can deliver cost savings for an even greater customer return oninvestment. Each RocketDrive configuration offers the sameenhanced verification performance and debug efficiency, andmaintains complete compatibility with popular EDA logic simu-lators from Cadence, Mentor and Synopsys.

Along with the new Virtex-6 configurations, GateRocket alsosells versions of it RocketDrive supporting Xilinx Virtex-4 andVirtex-5 FPGAs. Pricing for the Virtex-6 version starts at$25,000. For more information about GateRocket, go towww.gaterocket.com.


64 Xcell Journal Third Quarter 2010

GateRocket Offers Virtex-6 FPGA Version of RocketDrive Debug Environment

GateRocket’s RocketDrive, a unique verification tool that’s the size of a hard drive, now comes in a Virtex-6 version.

Page 65: Xcell Journal issue 72


Third Quarter 2010 Xcell Journal 65

FPGA design services, module and IP vendor Enclustra(Zurich, Switzerland) has just released an FPGA module

equipped with two Fast Ethernet PHYs and a fast DDR2SDRAM. As such, the Mars MX1 module is ideal for system-on-programmable-chip designs that pair a soft-core processor andreal-time Ethernet functionality, said Oliver Brundler, develop-ment engineer at Enclustra. A reference MicroBlaze™ systemusing Xilinx Platform Studio is available and allows access to allon-module peripherals.

“The Mars MX1 is designed for applications such as industrialautomation, where dual Ethernet PHYs are used for real-timeEthernet,” said Brundler. “However, since Enclustra optimizedthe module for low cost, it can also be used for applicationsrequiring low production volume or for rapid prototyping. Usingthe MX1 module, the hardware design effort can be reduced fromover 100 components to just one.”

Enclustra built the new Mars MX1 around Xilinx’s mostrecent low-cost FPGA, the Spartan®-6 LX. The module comes intwo standard configurations, carrying either the XC6SLX16

(14,579 LUT4 equivalents) or the XC6SLX45 (43,661 LUT4equivalents) FPGA. Both variants are fitted with a 128-MbyteDDR2 SDRAM, 16-Mbyte SPI Flash and real-time clock. Themodules operate from a single 3.3-volt supply and provide 108user I/Os, which you can also configure as 54 differential pairs.Since the form factor of the module is SO-DIMM (68 x 30 mm),you can use space- and cost-saving standard connectors to easilyintegrate the MX1 into your targeted system, the company said.Custom configuration options and suitable carrier boards areavailable upon request.

Enclustra is a company built around the FPGA technology. Itoffers not only FPGA modules but also IP and design servicesfor various applications such as software-defined radio, drivecontrol and single-chip systems. Along with the Mars MX1module, the company also fields the MX2, a module equippedwith PCI Express. Enclustra has a second module line namedSaturn which is DSP-optimized and fitted with a Spartan-3ADSP FPGA.

For more information, visit www.enclustra.com/marsmx1.

Enclustra Introduces Compact FPGA Module with Fast Ethernet

Enclustra’s Mars MX1 is very compact module designed for developing systems that combine Ethernet and system-on-chip processing.

Page 66: Xcell Journal issue 72

66 Xcell Journal Third Quarter 2010

Xpress Yourself in Our Caption Contest


NO PURCHASE NECESSARY. You must be 18 or older and a resident of the fifty United States, the District of Columbia or Canada (excluding Quebec) to enter. Entries must be entirely original and must be received by 5:00 pm Pacific Time (PT) on September 15, 2010. Official rules are available online at www.xilinx.com/xcellcontest. Sponsored by Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124.

There’s something fishygoing on here, andwe’re counting on

you to tell us what it is. If youhave a yen to Xercise yourfunny bone, step up to ourverbal challenge and submit anengineering- or technology-related caption for this imageof robotic fish developed at theMassachusetts Institute ofTechnology. MIT scientists saythe critters could potentiallybe used to detect underwaterpollutants and inspect sub-merged boats or structures.The group seen here mightinspire a caption like “Herbieangled to debug the soleremaining catch that was mak-ing his design flounder.”

Send your entries to [email protected]. Include your name, job title, company affiliationand location, and indicate that you have read the contest rules at www.xilinx.com/xcellcontest. After due deliberation, we will print the submissions we like the best in thenext issue of Xcell Journal and award the winner the new Xilinx® SP601 Evaluation Kit,our entry-level development environment for evaluating the Spartan®-6 family of FPGAs(approximate retail value, $295; see http://www.xilinx.com/sp601). Runners-up will gainnotoriety, fame and a cool, Xilinx-branded gift from our SWAG closet.

The deadline for submitting entries is 5:00 pm Pacific Time (PT) on Sept. 15, 2010.So, cast your nets and get writing!

MARCO RIVERO,an electrical engineer at the Anderson

Research Group at Harvard University, has won an SP601 Evaluation Kit with his caption for the photograph of the

man arguing with a robot in Issue 71 of Xcell Journal.

Congratulations as well to our two runners-up:

Playing soccer—check. Understanding what a bribe or taking a dive is—fail.

– Matthew Hicks, PhD Candidate, University of Illinois

“AI Divorce Court 2020!! – New This Fall! –Check local listings in your area.”

– David Santoro, PhD Candidate, Electrical Engineering,

City College of New York

“C'mon Ref! He’s clearly using his ‘acting’ function!”






Page 67: Xcell Journal issue 72

The Best Prototyping System Just Got Better

� Highest Performance

� Integrated Software Flow

� Pre-tested DesignWare® IP

� Advanced Verification Functionality

� High-speed TDM

HAPS™ High-performance ASIC Prototyping Systems™

For more information visit www.synopsys.com/fpga-based-prototyping

Page 68: Xcell Journal issue 72

© Copyright 2010 Xilinx, Inc. All rights reserved. Xilinx and the Xilinx logo are registered trademarks of Xilinx

in the United States and other countries. All other trademarks are property of their respective holders.

Unleash the full potential of your product design with Xilinx® Virtex®-6 and Spartan®-6 FPGA families — the

programmable foundation for Targeted Design Platforms.

• Reduce system costs by up to 60%

• Lower power by 65%

• Shrink development time by 50%

Realize your potential. Visit www.xilinx.com/6.

Potential. Realized.

• High bandwidth serial connectivity with up to 72

low-power transceivers supporting up to 11.18Gbps

• Ultra high-performance DSP using up to 2016

low-power, performance-optimized DSP slices

• Integrated high-performance ExpressFabric™

technology running at 600 MHz clocking and

performance-tuned IP blocks

• Proven cost-reduction with EasyPath™-6 FPGAs

• Easy-to-use, low power, serial transceivers support

up to 3.125Gbps to enable industry standards such

as PCIe®

• Low voltage option reduces total power consumption

by 65% over previous generations

• Integrated DSP, memory controllers, and clocking

technology simplifies designs