Xcell Journal: Issue 68 - Xilinx - All Programmable · · 2018-01-09a formulation that Cal Tech professor Carver Mead famously dubbed “Moore’s Law.” ... The exorbitant cost,

Issue 68Third Quarter 2009

Xcell journalXcell journalS O L U T I O N S F O R A P R O G R A M M A B L E W O R L DS O L U T I O N S F O R A P R O G R A M M A B L E W O R L D

INSIDE

Novel PC Architecture DitchesMicroprocessor for an FPGA

European UltrawidebandProject Taps Virtex-5 FPGAs

Choosing the Right FPGAGigabit Transceiver is a Matter of Protocol

Interpolated Lookup TablesOffer Easy Path to DSP Functions

INSIDE

Novel PC Architecture DitchesMicroprocessor for an FPGA

European UltrawidebandProject Taps Virtex-5 FPGAs

Choosing the Right FPGAGigabit Transceiver is a Matter of Protocol

Interpolated Lookup TablesOffer Easy Path to DSP Functions

www.xilinx.com/xcell/

Accelerate Innovation withTargeted Design Platforms

Accelerate Innovation withTargeted Design Platforms

1 . 8 0 0 . 3 3 2 . 8 6 3 8www.em.avnet.com

Copyright© 2008, Avnet, Inc. All rights reserved. AVNET and the AV logo are registered trademarks of Avnet, Inc. All other brands are the property of their respective owners.

Prices and kit configurations shown are subject to change.

Xilinx®Spartan

®

-3A Evaluation Kit

The Xilinx® Spartan®-3A Evaluation Kit provides an

easy-to-use, low-cost platform for experimenting

and prototyping applications based on the Xilinx

Spartan-3A FPGA family. Designed as an entry-level

kit, first-time FPGA designers will find the board’s

functionality to be straightforward and practical,

while advanced users will appreciate the board’s

unique features.

Get Behind the Wheel of the Xilinx Spartan-3A Evaluation Kit and take

a quick video tour to see the kit in action (Run time: 7 minutes).

Ordering Information

Part Number Hardware Resale

AES-SP3A-EVAL400-G Xilinx Spartan-3A Evaluation Kit $39.00* USD (*Limit 5 per customer)

Take the quick video tour or purchase this kit at: www.em.avnet.com/spartan3a-evl

Target ApplicationsGeneral FPGA prototyping»MicroBlaze» ™ systems

Configuration development»USB-powered controller »Cypress» ® PSoC® evaluation

Key FeaturesXilinx XC3S400A-4FTG256C »Spartan-3A FPGA

Four LEDs »Four CapSense switches »I» 2C temperature sensor

Two 6-pin expansion headers »20 x 2, 0.1-inch user I/O header »32 Mb Spansion» ® MirrorBit®

NOR GL Parallel Flash

128 Mb Spansion MirrorBit »SPI FL Serial Flash

USB-UART bridge »I» 2C port

SPI and BPI configuration »Xilinx JTAG interface »FPGA configuration via PSoC» ®

Kit IncludesXilinx Spartan-3A evaluation board »ISE» ® WebPACK™ 10.1 DVD

USB cable»Windows» ® programming application

Cypress MiniProg Programming Unit»Downloadable documentation »and reference designs

FASTER THAN THE

The path to true innovation is never a straight line. Only Xilinx programmable silicon,

software, IP and 3rd party support gives you the agility to stay ahead of the competition and adapt

to changing market requirements, without slowing down. So you have the freedom to innovate

without risk. Find out more about Xilinx Targeted Design Platforms at www.xilinx.com.

SPEED OF CHANGE

© Copyright 2009 Xilinx, Inc. All rights reserved. Xilinx and the Xilinx logo are registered trademarks of Xilinx in the United States and other countries. All other trademarks are property of their respective holders.

L E T T E R F R O M T H E P U B L I S H E R

Xilinx, Inc.2100 Logic DriveSan Jose, CA 95124-3400Phone: 408-559-7778FAX: 408-879-4780www.xilinx.com/xcell/

© 2009 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands includedherein are trademarks of Xilinx, Inc. All other trade-marks are the property of their respective owners.

The articles, information, and other materials includedin this issue are provided solely for the convenience ofour readers. Xilinx makes no warranties, express,implied, statutory, or otherwise, and accepts no liabilitywith respect to any such articles, information, or othermaterials or their use, and any use thereof is solely atthe risk of the user. Any person or entity using suchinformation in any way releases and waives any claim itmight have against Xilinx for any loss, damage, orexpense caused thereby.

PUBLISHER Mike [email protected]

EDITOR Jacqueline Damian

ART DIRECTOR Scott Blair

DESIGN/PRODUCTION Teie, Gelwicks & Associates1-800-493-5551

ADVERTISING SALES Dan [email protected]

INTERNATIONAL Melissa Zhang, Asia [email protected]

Christelle Moraga, Europe/Middle East/[email protected]

Yumi Homura, [email protected]

SUBSCRIPTIONS All Inquirieswww.xcellpublications.com

REPRINT ORDERS 1-800-493-5551

Xcell journal


Golden Age of SemiconductorsPoints to Platinum Era Ahead

any years ago, one of my former colleagues over at EDN magazine mailed me a copyof Gordon Moore’s landmark article “Cramming More Components onto IntegratedCircuits.” It was a reprint of a piece that first appeared in the now-defunct magazineElectronics on April 19, 1965, and I’ve kept it at my desk ever since. If you’ve never

read the article, do so. Moore opens with the sentence, “The future of integrated electronics isthe future of electronics itself,” and then goes on to make a slew of brilliant predictions, fore-seeing the inventions of the personal computer, the Internet and the handheld market—fruits ofthe ever-progressing IC.

Of course, the article is considered a landmark chiefly because Moore goes on to write that thenumber of transistors on an IC will double roughly every year (later amended to every two years),a formulation that Cal Tech professor Carver Mead famously dubbed “Moore’s Law.” The semi-conductor and, in turn, electronics industries have soared for the last 44 years paced indisputablyby Moore’s Law.

Back in 2005, I had the pleasure of covering a Computer History Museum event for EDNwhere Carver Mead actually interviewed Moore about his law. (You can view the video of their chat at http://www.youtube.com/watch?v=MH6jUSjpr-Q&feature=PlayList&p=6B12A0FACFA35D1F&index=7.) More recently, I witnessed another event of historic importancefor the industry: the induction ceremony of 15 semiconductor giants into the National InventorsHall of Fame (read my pldesignline.com blog coverage at http://www.dspdesignline.com/news/217400639).

Moore and Mead were among the 15 inductees, but I was really there to honor Xilinx co-founderRoss Freeman, who was also inducted (posthumously) into the Hall of Fame, and to learn more

about him from his colleagues and family. For those of you who may notbe familiar with the name, Freeman invented the FPGA in 1984 (PatentNo. 4,870,302), only five years before he died prematurely at the age of45. At this fascinating event, I learned that the FPGA probably would nothave been commercialized or grown as successful as it has had Freeman,his co-founders and, ultimately, Xilinx’s early investors not wholeheart-edly embraced Moore’s Law.

Essentially, Freeman’s early FPGA circuit wasn’t considered the mostefficient use of transistors, but it was inventive and unique. In thosedays, transistors were very expensive to produce. Freeman’s inventiontraded off transistor minimization and utilization for flexibility, fastturnaround and the convenience of outsourced manufacturing. Back in

M‘Only the beginning’ of a great new wave of invention built on the IC.

Xilinx co-founder Ross Freeman

Mike SantariniPublisher

1984, Freeman predicted that as time and Moore’s Law pro-gressed, FPGAs would keep pace, at least doubling in logic-cellcounts, and that the cost per transistor would decline dramatical-ly, making programmable devices more attractive to a growingnumber of users.

Just as Moore identified a trend in the cycle of IC manufactur-ing, Freeman—along with fellow co-founders Bernie Vonderschmittand Jim Barnett—pinpointed a related economic trend. Namely, ifmanufacturing kept pace with Moore’s Law by introducing a newprocess technology every two years, the associated costs of build-ing a fab and outfitting it with the latest process equipment would,over time, make owning a fab less feasible for a growing numberof chip companies.

So, for Xilinx’s first production FPGA, the XC2064,Vonderschmitt masterminded a revolutionary deal to have Seikomanufacture the parts. That marked the birth of the fabless model.Today, the vast majority of semiconductor and system companiesno longer manufacture their own ICs, but contract with foundries because, as Ross and Berniepredicted, manufacturing costs have spiraled with each new process introduction.

The exorbitant cost, which foundries pass on to customers, has now made even fablessproduction of ASICs and ASSPs economically out of reach for an ever-growing number ofdesigners. These engineers are increasingly turning to FPGAs.

Today, Freeman’s predictions about the increasing value of FPGAs resonate more stronglythan ever, thanks in part to Xilinx’s employees and customers, who have collaborated over theselast two decades to advance FPGA technology in extraordinary ways, helping Xilinx innovateusing the world’s most advanced process technologies. As a result of this ongoing innovation,today’s FPGAs contain hundreds of thousand of logic cells and include microprocessorcores, DSP slices and high-speed I/O. FPGAs are gaining broader use every day as designersincreasingly choose them as the means to quickly turn their inventions into reality.

The spirit of innovation honored at the 2009 Inventors Hall of Fame ceremony wasn’t just acelebration of the golden age of semiconductors. It was also a celebration of myriad inventions youdesigners will continue to enable in the future. The last 50 years may have been golden, but thenext 50 could surely be platinum. As Cal Tech’s Mead graciously put it in his brief acceptancespeech, “this is only the beginning…”

Ross Freeman’s mother, Ethel, holds

the Xilinx co-founder’s award at

National Inventors Hall of Fame

ceremony. Ross’ brother Fred

Freeman (right) accepted the

award on the family’s behalf.

SUSA

N C

ART

ER

C O N T E N T S

VIEWPOINTS XCELLENCE BY DESIGN APPLICATION FEATURES 1414

2626

Letter from the Publisher Golden Age of SemiconductorsPoints to Platinum Era Ahead…4

Xpectations Targeted Design Platforms Take FPGA Innovation to New Heights…66

88

Xcellence in Wired CommunicationsBlade Management Controller Rides FPGA Processor…14

Xcellence in Wireless CommunicationsVirtex-5 Propels Ultrawideband Comms and Ranging…22

Xcellence in Industrial Control FPGA-Powered Platform Controls IndustrialMotors for Maximum Efficiency…26

Xcellence in New ApplicationsEVE Taps Xilinx for Multiple Generations of Emulators…32

2222

Cover StoryTargeted Design Platforms Put Innovation on Fast Track

T H I R D Q U A R T E R 2 0 0 9 , I S S U E 6 8

Xperts Corner Understanding Timing Parametersin Xilinx System Generator…36

Xplanation: FPGA 101 The Right FPGA GigabitTransceiver Makes All the Difference…41

Xperiment: FPGA Anchors Innovative General-Purpose PC…48

Ask FAE-X Interpolated Lookup Tables: SimpleWay to Implement a DSP Function…54

Profiles of Xcellence Dick Corey Has theEngineering Bug…64

XTRA READING

THE XILINX XPERIENCE FEATURES

4141

4848

5454

Are You Xperienced? Sign up today for X-fest…60

Xamples… A mix of new and popular application notes…62

by Mike SantariniPublisher, Xcell JournalXilinx, [email protected]

When Moshe Gavrielov joined Xilinx in 2008as the company’s new president and CEO, hebrought with him decades of knowledge as asemiconductor executive and designer of ICs,and the aggregate perspective that many eco-nomic realities are converging to make pro-grammability an imperative for a greaternumber of electronic end products.

Prior to joining Xilinx, Gavrielov managedLSI Logic’s ASIC group for many years beforemoving on to serve as the CEO for EDA firmVerisity. During his years in the business, he haswitnessed firsthand the rise in both IC designcomplexity and cost of manufacturing, in tan-dem with ever-tighter development budgets ascompanies strive to rapidly address new mar-kets and evolving standards.

8 Xcell Journal Third Quarter 2009

Targeted Design Platforms Put Innovation on Fast TrackTargeted Design Platforms Put Innovation on Fast TrackXilinx pioneers cutting-edge infrastructure for its latest FPGAs, the Spartan-6 and Virtex-6.Xilinx pioneers cutting-edge infrastructure for its latest FPGAs, the Spartan-6 and Virtex-6.

COVER STORY

Gavrielov concluded that these converg-ing factors would present Xilinx with amajor opportunity for growth. That’sbecause FPGAs don’t burden the users withmanufacturing costs. Also, the devices arereprogrammable and, thus, inherently flex-ible and forgiving of design errors.

However, to really seize the opportunity,Xilinx not only has to provide customerswith the best FPGAs in the industry, butmust complement that silicon with world-class tools, intellectual property (IP) that’s

easy to integrate, evaluation kits, services,helpful documentation and targeted refer-ence designs. Each part of the puzzle is nec-essary to facilitate customers getting theirinnovations to market quickly.

To turn Gavrielov’s vision into a reality,the hard-working employees of Xilinx andits partners put their heads together toengineer the Xilinx® Targeted DesignPlatform approach, which streamlines theFPGA design process for all users, fromnovice to expert.

leverage to create innovations for an ever-growing number of applications,” saidGavrielov. “All told, the opportunity is ripefor Xilinx to more rapidly take marketshare away from ASICs and ASSPs andassume a more central role at the heart ofnext-generation electronic innovations.Targeted Design Platforms will facilitatethe FPGA development process for ourgrowing user base, so designers can get themost out of our FPGAs and can get theirinnovations to market faster.”

At the foundation layer of the pyramidis the Base Targeted Design Platform,which Xilinx made available in June of thisyear, following the February 2009 launchof the Virtex-6 (http://www.xilinx.com/products/virtex6/index.htm) and Spartan-6(http://www.xilinx.com/products/spartan6/index.htm) FPGA families. In April, thecompany released its ISE® Design SuiteEdition 11 in support of the Targeted DesignPlatforms (http://www.xilinx.com/support/documentation/white_papers/wp307.pdf).

The Xilinx Targeted Design PlatformTo communicate this approach, Xilinxrepresents the strategy as a pyramid con-sisting of four layers: the Base TargetedDesign Platform, the Domain-SpecificPlatform and the Market-SpecificPlatform, topped off with the capstone ofuser differentiation (Figure 1). Users canautomate a greater percentage of themore-basic parts of their designs byadding, at their discretion, upcomingXilinx domain-specific and market-spe-

cific platform offerings to the BaseTargeted Design Platform. The approachwill enable them to get products to mar-ket fast, and focus the majority of theirdesign cycles on the elements that willreally differentiate those products.

“Today, our high-performance Virtex®-6 and high-volume Spartan®-6 FPGAsboast several hundreds of thousands of pro-grammable logic cells, up to 11.2-Gbit/sec-ond transceivers, 38 Mbits of block RAMand 2,000 DSP slices that designers can

Third Quarter 2009 Xcell Journal 9

‘Targeted Design Platforms will facilitate the FPGA development process for ourgrowing user base,’ says Xilinx CEO Moshe Gavrielov, ‘so designers can get the

most out of our FPGAs and can get their innovations to market faster.’

COVER STORY

Base Platform

Domain-Specific

Market-Specific

CustomerDesign

20-25%

75-80%

40-50%

20-25%

Communication • Video • AVBMarket-Specific IP, Custom Tools, Custom Boards

Focus on Differentiation

Embedded • DSP • ConnectivityDomain IP, Domain Tools, FMC Daughter Cards

Virtex • SpartanBase IP, ISE Program, Base Boards

Figure 1 – The Xilinx Targeted Design Platform squarely aims at increased user productivity.

The June launch announced first deliveryof the of the Base Targeted Design Platformfor Virtex-6 and Spartan-6 FPGAs in theform of evaluation kits that combine all theelements customers need to realize the ben-efits this new platform approach promises.These evaluation kits combine the latest ISEDesign Suite 11.2 with Virtex-6 LX240Tand Spartan-6 LX16 FPGA evaluationboards, preverified base IP, base platformreference designs and a complete set of doc-umentation that guides customers throughthe entire process, from setting up the hard-ware to installing software and beginningtheir design. (See www.xilinx.com/products/targeted_design_platforms.htm.)

“The release of the Base TargetedDesign Platform means that Virtex-6 andSpartan-6 are now open for business,” saidBrent Przybus, director of product market-ing. “Customers can now download ISEDesign Suite 11.2, order base evaluationkits for Spartan-6 and Virtex-6 FPGAs,and access comprehensive documentationto begin developing designs.”

The Targeted Design Platform MethodologyWith the Targeted Design Platformmethodology, Xilinx users will start theirprojects with the Base Targeted DesignPlatform, gaining access to basic functionscommon in almost all applications. Then,depending on the design domain in whichthey typically work—logic designer, DSP,embedded-software programmer or sys-

tems engineer—they can add XilinxDomain-Specific kits of their choice to theBase Targeted Design Platform.

These kits will provide domain-specifictools (available in ISE Design Suite 11Editions), along with libraries of preverifieddomain-specific IP from Xilinx and key IPpartners as well as domain-specific daughtercards. Designers can plug the cards intotheir Base Targeted Design platform evalua-tion boards by means of an FPGAMezzanine Connector (FMC) on the baseplatform evaluation boards to more quicklyimplement their designs (see sidebar, page12). Each Domain-Specific kit will alsoinclude targeted reference designs to helpusers quickly implement a greater numberof functions in their designs.

In addition to these Domain-Specifickits, Xilinx and its partners are also develop-ing Market-Specific Platforms, which willallow users to attach daughter cards gearedfor specific market segments to the base plat-form via the FMC connection, or to accessmarket-specific boards. Each of the individ-ual market-specific kits will likewise includepreverified IP and reference designs to helpusers quickly add functions to their designs.

Przybus said that the more elements ofthe Targeted Design Platform users add, thefaster they will complete their designs. Bysimply leveraging all that’s available in thebase platform, he said, users could conceiv-ably implement up to 25 percent of theiroverall project, saving valuable time in

which to develop the rest of their design.And if they leverage the domain-specific kitstoo, they should be able to quickly imple-ment as much as 50 percent of their design.Going a step further and using the market-specific kits, designers could quickly imple-ment up to 75 percent of their designs,Przybus said, freeing them to focus on the 25percent of the design that will bring thegreatest differentiation to the end products.

Przybus notes that some users may wantto design their entire FPGA implementationfrom scratch, and they still can do so if theywish. But he believes a vast majority will wel-come the great benefits of leveraging all theelements of the Targeted Design Platform.

Targeting Your Needs Przybus explains that Xilinx created TargetedDesign Platforms to help every category ofcustomer, from FPGA design experts whohave used programmable devices religiouslyfor many years; to ramping designers whomay have traditionally focused on ASIC andASSP designs but have started to transitionthe bulk of their efforts to FPGAs; all theway to novices who have never used FPGAsbefore but have traditionally targeted ASICs,or perhaps only have experience program-ming standalone processors.

With Targeted Design Platforms,Przybus said, Xilinx is giving the novices notonly the FPGAs, IP and tools they will needto quickly start designing, but also referencedesigns to help them build their products.


14

6

53

FMCPower

Board Power

USB Download

42

7

1 9

15

8

10

12 11

13

# Feature

1 DDR2

2 SPI x4, x1, Ext. x4 Configuration

3 SPI Header

4 Parallel Flash

5 10/100/1000 Ethernet

6 USB UART

7 IIC

8 Clock, Socket, SMA

9 FMC LPC connector

10 LED

11 DIP Switch

12 Pushbutton

13 USB JTAG

14 12-pin (8 I/O) Header

15 VCCint Voltage Selection Header

COVER STORY

Figure 2 – The Spartan-6 SP601 Evaluation Kit’s board features a rich set of system functions.


“They can take these reference designs andadd their own content to create differentiat-ed products,” said Przybus. Targeted DesignPlatforms will help novices quickly becomefamiliar with FPGA design, and as theygrow more experienced they can add evengreater amounts of their own differentiation.

For those already knowledgeable aboutFPGA design and who are escalating theirFPGA design efforts, Przybus said theTargeted Design Platforms will help acceler-ate their development time and get to marketfaster. “These groups are typically familiarwith FPGA design, but one of the biggestissues they have is migrating a design, orcomponents of a design, to a new FPGA,” hesaid. “The Targeted Design Platforms willmake that migration very simple, with refer-ence designs that show how the new featuresare used, as well as useful migration tech-niques. They will be able to see how Xilinxdid it, learn from that example and thenaccelerate their own design efforts.”

Expert FPGA designers are also con-cerned with migrating their designs, but theyare typically more focused on getting maxi-mum performance out of their FPGAs ormaximizing power efficiency, Przybus said.“The Targeted Design Platforms will providethese users with examples to help them getthe greatest efficiencies out of their FPGAdesigns, leveraging the new FPGA and toolcapabilities,” he said. “We are providingthem with a design environment that willallow them to closely monitor power con-sumption to optimize their designs for power

efficiency, and also implement and analyzemultigigabit transceivers and DSP slices run-ning at their fastest rates. And for the firsttime, we are giving them a whole system-centric design to evaluate those elements.”

The combination, Przybus said, “willaccelerate their development time.”

Spartan-6 and Virtex-6 Evaluation KitsA central element of the Base TargetedDesign platform is the evaluation kit—specifically, the Spartan-6 SP601Evaluation Kit and the Virtex-6 ML605Evaluation Kit. Customers can order bothof these new products today.

Przybus described the Spartan-6 FPGASP601 Evaluation Kit (Figure 2) as a low-cost, entry-level environment for developingconsumer, infotainment, video and othercost- and power-sensitive applications usingthe Spartan-6 LX16 FPGA. The kit’s sys-tem-level capabilities include DDR2 memo-ry control, flash, Ethernet, general-purposeI/O and UART. The kit include ISE DesignSuite 11.2 WebPACK, reference designs, a“getting started” demo, board design files,documentation, cables and a power supply.

Meanwhile, the Virtex-6 FPGA ML605Evaluation Kit (Figure 3) is a scalable envi-ronment for developing system designswith the Virtex-6 LX240T FPGA. Amongother system-level capabilities, the kitincludes high-speed serial transceivers,PCIe® Gen2 blocks, a soft DDR3 memo-ry controller, Gigabit Ethernet and DVI. Italso includes the ISE Design Suite 11.2

Logic Edition, reference designs, a “gettingstarted” demo, board design files, docu-mentation, cables and a power supply.

Three Easy Steps to Kick-Start Your DesignPrzybus said that Xilinx designed these kitsto get users up and running right out ofthe box in a three-step process.

In step 1, users connect cables from theboard to their PC, power up the board,load the reference design interface softwareon their computer, view the referencedesign demo and begin working.

In step 2, users will evaluate the refer-ence design, which includes alternativedesign implementations using hard andsoft IP to implement common functions.Customers can evaluate features directlyfrom the base reference design and see theresults visually as well as view key per-formance statistics.

In step 3, users will open the ISEDesign Suite design tools, customize thereference design and generate a new designin the software. Next, they will downloadit to the evaluation board and then run thedesign on the FPGA. “It’s really that sim-ple,” said Przybus. “In a matter of minutes,you can be up and running.”

With the release of ISE Design Suite11.2, the Xilinx tool suite now supports theVirtex-6 and Spartan-6 FPGA families,delivering a 2x overall run-time improve-ment, speeding synthesis runs by betterthan 2x using XST and offering place-and-route optimizations that lower overall

COVER STORY

10

18

15

14 114

5

176

2

3

1713

1612 12

88

99

# Feature1 DDR 32 GPIO Dip Switch3 SFP4 Flash5 10/100/1000 Ethernet6 USB UART & USB JTAG7 MGT Clock8 User Clocks9 FMC Connectors10 Pushbuttons11 MGT 12 USB 2.013 Card Reader for System ACE14 DVI Output15 PCI Express16 12V Wall Adapter Power17 12V ATX Power18 Power Regulator Control

LPC HPC

Board Power

16 x 2 Display

Figure 3 – The Virtex-6 ML605 Evaluation Kit offers world-class functionality.

dynamic power consumption by 10 per-cent. In addition, the tools have a 28 per-cent smaller workstation memory footprintthan the prior version of ISE.

ISE 11.2 supports the SecureIP simula-tion model, facilitating compatibility withthird-party simulators from Cadence,Mentor Graphics and Synopsys. Also,Mentor Graphics’ Precision RTL andPrecision RTL Plus products support theBase Targeted Design Platform, as do theSynplify Pro and Synplify Premier toolsfrom Synopsys (Synplicity).

Certainly, another key component ofthe Base Targeted Design Platform and theoverall Targeted Design Platform strategyis IP support. In conjunction with therelease of the Base Targeted DesignPlatform, Xilinx and its IP partners arerolling out numerous soft cores supportingthe Spartan-6 and Virtex-6 FPGA families.

The upcoming domain-specific andmarket-specific offerings will feature some

of these pieces of IP. One, for example, is aPCI Express® DMA Engine fromNorthwest Logic. Xilinx and Northwestwill package the DMA Engine with ademonstration application and drivers toform a complete Connectivity TargetedReference Design built to support the newSpartan-6 and Virtex-6 devices with theXilinx base platform.

Xilinx is selling the Spartan-6 FPGASP601 Evaluation Kit for $295 and theVirtex-6 FPGA ML605 Evaluation Kit for$1,995. The Spartan-6 FPGA SP601Evaluation Kit is available now. TheVirtex-6 FPGA ML605 Evaluation Kitwill be available in late July.

In the third quarter, Xilinx and resellerAvnet will begin rolling out domain-specifickits for the connectivity, embedded andDSP spaces, followed by market-specific kitsfor communications, video and broadcast.

For more information, contact yourlocal Xilinx sales office or distributor.


COVER STORY

The FPGA Mezzanine Card (FMC) connection will facilitate rapid IP and kit development.

Spartan-6/Virtex-6 FPGA Base Board

FMC Modules

FMC Card Expedites Design DevelopmentOne of the key enablers of the Targeted Design Platform approach is the mutual adop-tion by Xilinx and its network of resellers and IP partners of the FPGA Mezzanine Cardfrom the VITA standards body. The FMC will serve as a standard interface to attachDomain-Specific and Market-Specific Kits from Xilinx and its partners to the BaseEvaluation Kits (see figure).

This standards-based approach allows Xilinx and its network of third-party compo-nent and board suppliers, including Avnet Electronics, Curtiss-Wright ControlsEmbedded Computing, Linear Technology and Northwest Logic, to deliver to mutualcustomers their latest and greatest offerings.

The ANSI-approved VITA 57.1 standard incorporates predefined and fixed locationsof parallel and serial I/Os, clocks, JTAG, control signals and power. It uses the well-defined high-performance Samtec SeaRay, supporting Low Pin Count (LPC) 4x40-rowand High Pin Count (HPC) 10x40-row connection. For compatibility with older XilinxFPGA boards, it also includes a voltage-compatible FMC HPC/LPC module.

— Mike Santarini

Toggle among banks of internal signals for incremental real-time internal measurements without:

See how you can save time by downloading our free application note.

www.agilent.com/find/fpga_app_note

Quickly see inside your FPGA

u.s. 1-800-829-4444 canada 1-877-894-4414© Agilent Technologies, Inc. 2009

Also works with all InfiniiVision

and Infiniium MSO models.

by Andy NortonDistinguished Engineer, Office of the CTOCloudShield Technologies, [email protected]

Jeff MullinsPrincipal Engineer, Embedded Systems LeadCloudShield Technologies, [email protected]

Dick MincherEmbedded Systems SpecialistCloudShield Technologies, [email protected]

The trend toward converged networks forboth telecom and enterprise simplifies net-work infrastructure and dramatically low-ers costs while providing scalable, openplatforms. Blade platforms are achievingnetwork convergence using Ethernet, scal-ing from 1G/10G to 40G/100G, withblade-management controllers keepingpace by using standards-based IntelligentPlatform Management Interfaces (IPMI).

To help drive network convergence intothe mainstream market, our design teamhere at CloudShield Technologies architect-ed a flexible blade-management controllerby integrating the PowerPC® 440 found inthe Virtex-®5 with standard and customperipheral cores, creating multiple embed-ded-system configurations with a commonhardware design. This FPGA-based designtook advantage of a unique and flexible fea-ture set that included system reconfigura-tion, powerful embedded processors,intellectual-property (IP) cores and embed-ded Linux firmware for chassis manage-ment services unique to the resident blade.


Blade Management Controller RidesFPGA Embedded ProcessorCloudShield builds powerful, flexible solution around Xilinx PowerPC 440 running embedded Linux.

XCEL LENCE IN WIRED COMMS

Xilinx Platform Studio cores allowed usto quickly instantiate UARTs, I2C buses,memory interfaces, Ethernet MAC/PHYcombinations and a system monitor for volt-age and temperature oversight. We easilycreated custom cores to provide novel sys-tem acceleration, offloading real-time tasks.Our application implemented standards-based IPMI and chassis managementfirmware running over embedded Linux, fora robust and open software infrastructure.

Deep-Packet Processing Our next-generation content-processingplatform is a highly programmable, modu-lar, high-speed hardware system capable ofhandling a wide range of network applica-tions and multiple simultaneous func-tions, as shown in Figure 1. The chassisintegrates deep-packet processor blades,application-services processor blades,redundant chassis management modulesand an array of interface modules that canbe deployed with different network inter-face options and configurations.

Chassis and blade management are underthe control of redundant chassis manage-ment modules using a variant of theIntelligent Chassis Management Bus(ICMB) that extends IPMI intelligent plat-form management to meet highly securerequirements. Each blade or module of thisbus—except for the power supply module—contains an IPMI management controller toshare environmental monitoring and statuswith the chassis management module.

Sharing common requirements, ourfamily of management controllers consistsof chassis management controllers(CMCs), processor management con-trollers (PMCs) on processor blades andinterface management controllers (IMCs)on interface modules, as shown in Figure 2.

The chassis’ extensive networking fea-tures support four control-plane GigabitEthernet links to each blade. The dataplane has 16 high-speed lanes to each bladethat provide four 10-Gbit Ethernet linksusing XAUI (4 x 3.125 Gbits/second).Future bandwidth scalability exists, witheach lane capable of 10GBASE-KR (singlelane, 10.3125 Gbps). We used Xilinx®Virtex-5 FPGAs to implement control-

available power supplies, the active CMCregulates power demand for the entirechassis by allowing the IMCs and PMCsto power on only when capacity remains.The fan tray and power supply modulesuse a chassis I2C bus where the activeCMC ensures sufficient power and cool-ing for the entire chassis.

Once the CMC detects a new blade inthe chassis, it uses a discovery protocol tolocate and identify each blade. The discov-ery protocol exchanges IPMI messages toretrieve serial number, model number andinventory for the blade or module, andthen sends a command with power permis-sions if the blade can be correctly identifiedand power is available. The CMC monitorsevents from each IMC or PMC to detectwarning and critical conditions, collectingand storing them in an event log.

To provide a rich user feature set, ourprimary user interface to the chassis man-agement system is through a window man-agement or command line interface (WMIor CLI) running on an application proces-sor blade. The application processor bladeeither occupies a chassis slot or remotelycommunicates through a control-planenetwork connection to the chassis manage-ment controller. The CMC autonomouslyprovides chassis oversight for health and

plane Gigabit Ethernet functions with hardtrimode Ethernet media-access controllers(TEMACs) as well as data plane 10-GbitEthernet using the Xilinx 10GEMAC IPcore with integrated XAUI (usingRocketIO™ serial transceivers).

Blade Management ServicesOur chassis management solution is a hier-archical structure, with the IMC and PMCentities operating in much the same way asa standard IPMI baseboard managementcontroller (BMC) for their respective mod-ule or blade. For each blade, the controllersuse IPMI commands to monitor voltageand temperature, provide a power controlinterface, enable blade network interfaceconfiguration and allow interrogation ofmanufacturing data.

As the center of chassis management,the CMC shares common IPMI BMCboard-level features with the IMC andPMC, along with additional higher-levelchassis management functionality. Thedual redundant CMCs operate in anactive/standby capacity with automatichealth monitoring and failover. This chas-sis management capability allows theCMC to discover, inventory and providepower permissions for all blades and mod-ules. To prevent oversubscription of the


Service Control• Traffic Prioritization• Bandwidth Usage Control• Network Flow Analysis• L2-L7 Access Analysis• P2P Control & Cache• Tiered Services & QoS

Transport• IPv4 & IPv6 Migration (Transition, Stacks)• Multicasting & Acceleration (IPTV, VoIP, P2P)• Content-Based Routing (User Services)

Security• DDoS Mitigation• DNS Protection• BGP Protection• Botnet Protection• Content Filtering• Native IPv6 Security

Service Control

Security

Transport

CloudShield Deep Packet Inspection

Figure 1 — Next-generation deep-packet inspection processing system enables convergence of service control, security and transport functions into integrated services and capabilities.


environmental monitoring, presenting thisinformation through front-panel LEDs,while application-specific user controlcomes from the application processorblade. This same blade interrogates allblades and modules, including the CMC,to provide the user with environmental aswell as operational health and status. TheCMC contains an RS485 IPMI proxy serv-ice to pass application processor bladecommands on the control plane through tothe RS485 IPMI path to the target blade.

Each CMC has five independent point-to-point RS485 links to each blade. Unlikea bus topology, this architecture providesincreased security from eavesdropping,minimizes contention and increases reliabil-ity, in that a failing blade cannot disruptcommunication to other blades. Ourredundancy implementation requires thestandby CMC to constantly monitor thehealth of the active CMC. Upon detection

of a failure, the standby CMC can take over,with its independent set of radial RS485links, to keep the chassis operational.

Management Controller Hardware ArchitectureSharing common hardware functionality,our controllers include FPGA multibootand fallback capability, PowerPC440processor, SDRAM and NOR flash memo-ry configuration, interrupt controller, serialconsole UART, ICMB RS485 custom core,Gigabit Ethernet links, system monitorsensors and local I2C buses.

Our controller family is uniquely archi-tected to share a common hardware plat-form using a single microprocessorhardware specification file that defines theembedded-processor core instantiations,connectivity and parameterization. Ourmini boot loader resides in internal blockrandom-access memory (BRAM), whichcopies the U-Boot boot loader from flash

to DDR memory and then jumps to the U-Boot entry point.

Taking advantage of the features of theVirtex-5 FX30T FPGA, CloudShield’steam easily assembled an advanced embed-ded system, integrating the PowerPC 440,DDR2 memory controller, multiportmemory controller for flash, EthernetMAC/PHYs, system monitor and otherstandard cores with our custom core.

By leveraging the Virtex-5’s multibootreconfiguration, the system achieves truein-system FPGA upgradability without aprocessor to manage bitstream loading.Our external flash supports both “golden”and “upgrade” bit files located at addressesthe revision select pins determine. TheFPGA’s ability to fall back and reconfigureusing the golden image in the event of anupgrade image failure was essential to ourdesign objective of a configurable high-availability system.


Chassis Management Module (Active)

CMC

Chassis Management Module (Standby)

CMC

IMC

PMC

PMC-A2

Redundant Radial RS 485

IMC

Interface Switch Modules

Deep Packet Processor Blades

Application Processor Blade

Power SuppliesFan Tray

Figure 2 – Secure ICMB chassis management system



Post-configuration, the PowerPC 440core operates at 400 MHz, the crossbarand MPLB bus at 133 MHz, TEMACs at125 MHz and DDR2 at 266 MHz. Ourconstructed hardware system covered thesuperset of functionality needed amongthe CMC, IMC and PMC variants asshown in Figure 3. The flash containsenough space for multiple application pro-grams, including the CMC, IMC andPMC configurations.

We were able to rapidly craft the base sys-tem thanks to the wide variety of standardperipheral cores available in the IP library.We selected cores as complex as GigabitEthernet (with choices of 1000BASEX,SGMII and GMII PHY interfaces) and assimple as traditional 16550 UARTs to meetour system requirements.

Key requirements for a blade controllerinvolved chip- and board-level monitor-ing and alarms for voltages and tempera-tures. We took advantage of an integratedsystem monitor, simply instantiating theSYSMON_ADC pcore from the standardIP library. The system monitor is a hardmacro on Virtex-5 devices that contains a10-bit, 200-kilosample/s analog-to-digitalconverter, supporting both on-chip andoff-chip monitoring of supply voltagesand temperatures.

Off-the-shelf IP and the availability ofan Avnet Virtex-5 FXT evaluation kit thatused the XC5VFX30T FPGA allowed us torapidly prototype our design and kick-startthe software development effort right outof the box.

With the software effort under way, wethen created custom IP cores to satisfy anumber of needs. While off-the-shelfpcores might meet functional require-ments, multiple instances of simple pcoressuch as GPIO can result in excessiveProcessor Local Bus (PLB) loading as wellas higher slice utilization. Rather thanincur the expense of multiple IP interfaces,we built a single super-GPIO module withthe help of the Create Import Peripheral(CIP) wizard integrated into XPS.Amortizing a single IP interface over mul-tiple GPIO registers, we collapsed largenumbers of internal and external registersinto a single custom pcore including revi-

sion, scratch, and internal and externalGPIO registers.

We also used the CIP wizard to buildour special-purpose custom IP. The wizardcreated the necessary files (MPD, PAO,HDL templates) and directory structure,

and offered easy selection of IP interfaceservices and user logic interfaces. It alsogenerated the bus functional model neededfor simulation. Our novel PicoBlaze™-based coprocessor pcore is described laterin greater detail.

CMC5VFX30T

PPCMCDDR2

TEMACGMII

customGPIO

User UART

MPMC

Console UART

PPC440 CPU

RemoteI2C

PMCUART

PMCUART

PMCUART

LocalI2C

SysMonADC

BootBRAM

JTAG PLB46 INTC

IMC5VFX30T

PPCMCDDR2

TEMACGMII

customGPIO

MPMC

Console UART

PPC440 CPU

RemoteI2C

LocalI2C

SysMonADC

BootBRAM

JTAG PLB46 INTC

TEMACSGMII

TEMACSGMII

PMC5VFX30T

PPCMCDDR2

customGPIO

MPMC

Console UART

PPC440 CPU

RemoteI2C

LocalI2C

SysMonADC

BootBRAM

JTAG PLB46 INTC

TEMACSGMII

TEMACSGMII

customCMB Pblaze

customCMB Pblaze

customCMB Pblaze

Figure 3 – Common hardware architecture for CMC (top), IMC (center) and PMC versions.


IPMI Messaging Our chassis management controller withits corresponding system interfaces isshown in Figure 4. We use the customICMB coprocessor pcore for IPMI mes-saging to the processor-management andinterface-management blade controllers.GPIOs monitor and control the frontpanel, backplane, redundancy and inter-locks, as well as internal registers, whileTEMACs are used for control-planeEthernet switch management. Our I2Csread the local module EEPROM, andhandle remote monitoring of chassis fansand power supplies. UARTs take care of

front-panel user connectivity, debug con-soles and proxy UARTs to CPUs onprocessor blades.

The real-time protocol requirements ofthe Intelligent Chassis Management Buspresented us with a problem in that theembedded Linux operating system couldnot natively guarantee interrupt latency.

Our solution uses a custom ICMB pcorewith the PicoBlaze 8-bit sequencer as anICMB data-link and physical-layercoprocessor offloading the PowerPC. Thisunderappreciated resource (see Xcell Journal,issue 67, page 53, “A Hidden Gem:PicoBlaze IP”) is shown in Figure 5, coupled

with BRAM instruction store, registers,communication FIFOs, UARTs and timers.After initialization, the PowerPC holds thePicoBlaze in reset, loads code into thePicoBlaze program BRAM and thenremoves reset, activating the coprocessor.

We wrote our PicoBlaze code in theform of a state machine to monitor thetransmit FIFO and control registers, imple-ment the collision-avoidance and back-offalgorithms, and manage five UARTs.

The TX FIFO passes 8-bit data from thePowerPC to the PicoBlaze for transmissionon the ICMB bus. To transmit a packet, thePowerPC writes the packet into the TX


PMC Slot 3

IMC Slot 4

IMC Slot 5

I2C Control/Monitoringfor fans, power supplies

Front panel buttons /LEDsBackplane monitor

InterlockRedundancy

DDR2 DRAM64MB

FLASHConfigSW app

Upgrades

IPMI Control Messages

FRUEEPROM

Gigabit Ethernetswitch

CMC5VFX30T

PPCMCDDR2

TEMACGMII

customGPIO

User UART

MPMC

customCMB Pblaze

ConsoleUART

PPC440 V5 CPU

RemoteI2C

PMCUART

PMCUART

PMCUART

LocalI2C

SysMonADC

BootBRAM

JTAG PLB46 INTC

PMC Slot 2

PMC Slot 1

BladeVoltage

Temperature

Front PanelRJ-45

Debug

Backplane

Figure 4 – CMC system interfaces

The embedded Linux operat ing system could not nat ively guarantee interrupt la tency. Our solut ion leveraged the

capabi l i t ies of the PicoBlaze 8-bi t sequencer.



FIFO, followed by a write to the controlregister with a “start” bit and channel num-ber. Using a common transmit UART andindividual transmit enables, the PicoBlazeenables a single RS-485 transceiver for datatransmission. Upon completion, thePicoBlaze indicates success or failure bywriting to the status register and generatingan interrupt to the PowerPC.

Packets come in on multiple UARTssimultaneously. The PicoBlaze maintainsindividual state information for each chan-nel. Indications of start/end of packet anddata bytes are placed in the RX FIFO alongwith the channel number, and thePowerPC gets an interrupt when data isavailable. The RX FIFO passes 8-bit data,

5-bit status (start/end of packet, error indi-cations, debug information) and 3-bitchannel number information from thePicoBlaze to the PowerPC. The PowerPCthen demultiplexes packet data and, whenvalid, sends it up to the servicing agent. Weincluded additional trace and state transi-tion capability in the receive channel inorder for the PowerPC to print debug mes-sages to the console.

We used the CIP wizard to create a busfunctional model, enabling us to generatebus stimulus and verify the operation ofthe peripheral core without needing to cre-ate an extensive test bench or full-systemsimulation. The wizard made it easy togenerate, respond to and analyze bus trans-

actions. As a result, we created and modi-fied our simulation for the custom pcoreand got it running in a matter of hours.

We used the output of the PicoBlazeassembler to initialize the program BRAM.Next, using the Bus Functional Languageto describe PLB bus transactions, we gener-ated a PicoBlaze reset, sent data to the TXFIFO and wrote to the control register totransmit data on a channel. Looping backtransmit and receive ports was a simple wayto verify the operation of the receive chan-nels and RX FIFO.

Firmware Architecture There are many embedded-OS choicesavailable for the PowerPC 440 and, specif-ically, the Xilinx hard PPC440 core. All ofthe management controllers required real-time performance, a robust network stackand extensibility for our custom peripher-als. Our choice of embedded Linuxallowed us to take advantage of existingopen-source resources, utilities, optimiza-tions and support.

After reset, the PowerPC bootsequence begins from reset vector spacewithin a small internal BRAM. Mappedinto this area is a small mini boot loader(12 kbytes), which looks for a valid U-Boot image at an “upgrade” or “golden”location within the NOR flash. Since thegolden image is programmed only at thefactory during the manufacturing process,this choice eliminates the possibility of afailure during flash programming of a newU-Boot version. Once it detects and vali-dates the proper U-Boot image, thePowerPC loads it into SDRAM and exe-cutes it. U-Boot then loads the properLinux image files and passes control forthe final time. At this point, a fully con-figured Linux kernel loads and begins ourapplication-specific processes.

We considered several options for theboot sequence during the design process.For example, the mini boot loader couldhave mapped the NOR flash into thereset vector area and eliminated theBRAM. This would have saved someBRAM, but the processor would then bedependent on a valid NOR flash part andimage, which could be corrupted during a

IP InterConnect (IPIC)

TX FIFORX FIFOCtl RegStatus Reg

PB

laze Reset

ProgramBRAM

Kcpsm3 (PicoBlaze)

Bbfifo16x8

Uart Clock

Kcuartrx

Gap Timer

TransmitEnable Control

BackOff Timer

PLB v46

SlaveAttachment

InterruptControl

Register I/F Address Range Decode

IPIF

USER

LOGIC

PICOBLAZE

CUSTOM

Bbfifo16x8

Kcuartrx

Bbfifo16x8

Kcuartrx

Bbfifo16x8

Kcuartrx

Bbfifo16x8

Receive FIFOs /UARTs

Kcuarttx

Receive Timer

StartUp Timer

Kcuartrx

Read

Figure 5 – Custom ICMB PicoBlaze pcore


failed flash attempt. Similarly, a casecould be made for eliminating U-Boot byloading and executing Linux directly.While this could be a much more directboot sequence, we found the features ofU-Boot during the development phase tobe extremely valuable for prototype bring-up and debugging.

Our initial design challenge was tolocate and choose the correct open-sourcetrees. DENX Software Engineering(www.denx.de) has extensively developedthe Das U-Boot boot loader for a varietyof embedded-processor boards. But whileit directly supports the MicroBlaze,PowerPC 405 and PowerPC 440 proces-sor cores, the DENX version does notinclude the latest patches, board supportor drivers for some of the XPS IP coresthat come with the EDK.

Therefore, we chose to use the Xilinxdevelopment branch of the U-Boot tree(xilinx.wikidot.com), which contains thelatest changes and the drivers for the I2C,ENET and LLTEMAC, providing addi-tional functionality without starting fromscratch. Xilinx frequently updates its devel-opment branch from the original DENXmaster tree, submitting changes upstreamfor acceptance into the DENX tree.

Having selected the Xilinx U-Boottree, we looked at the main Linux tree(www.kernel.org) and its support for theXilinx drivers. We similarly concludedthat we obtained more working function-ality when choosing the Xilinx-main-tained Linux development branch (also atxilinx.wikidot.com), which included thelatest changes that may not necessarily bein the main Linux tree.

A potential drawback of using theXilinx-maintained trees is that there is lagtime in terms of synchronization with themain trees, and they are officially devel-opment-quality branches. Nevertheless,we selected the Xilinx trees to support theLLTEMAC cores and newer features inour design.

In order to build U-Boot and Linux,we needed a Linux distribution andmachine to run the scripts and tools thatcross-compile, link and build thePowerPC executable. While this could be

a barrier if only Windows-based machineswere available, there are now a fewWindows virtual machines that runLinux within the Windows environment.The added benefit is that once we set upa build machine, we could back up thevirtual-machine disk image or replicate itfor use on other development machines.We selected Sun xVM VirtualBox as alightweight, reliable and easy-to-use vir-tual machine that will run many differentOS types, including Linux.

While the Linux kernel provided anexcellent foundation, the actual drivers andapplications handled the entire workloadspecific to our application. The Xilinx I2C

and LLTEMAC drivers connect the LinuxIP stack to the hardware, and theCloudShield ICMB driver hooks the cus-tom PicoBlaze IP core into the Linux driv-er subsystem. We used Busybox as astandard Linux application, and addedthree new CloudShield applications forchassis management by the CMC, asshown in Figure 6.

Our RS485 proxy application connectsto a standard IP port and communicatesover the control-plane Ethernet, decodingeach packet. Packets received from ourWMI/CLI on the application processorblade may be addressed to three differenttargets in the system: BMC functions onthe chassis management module board,CMM blade management for the chassisor other blades within the chassis. Allbaseboard management controller func-tionality runs over the I2C buses, handling

CMC and chassis health requests. TheRS485 proxy application translates pack-ets destined for other blades in the chassisinto ICMB packets and sends them overthe RS485 link to the appropriate destina-tion. Blade management packets allow theuser to obtain chassis status, such as powersubscription, as viewed by the CMC.Some additional minor applications forredundancy and front-panel serial consoleinteraction complete the CMC.

Our IMC and PMC were very similar tothe CMC firmware, except that we replacedthe blade-management application with aswitch- or processor-management applica-tion appropriate for that blade.

Architect, Build, Verify, Deliver The key to rapidly architecting, develop-ing and delivering complex embedded-system solutions is “right-sizing” thechoices, efforts and methodology. Thecombination of powerful embeddedprocessors in FPGAs, off-the-shelf IP andaccelerated design and verification ofcustom IP enabled us to create multipleembedded-system configurations with acommon hardware design. The benefitsof using Linux and an integrated devel-opment environment freed us to focus onthe target application while utilizing theavailable open-source services, driversand libraries.

Indeed, the ability to rapidly alter inter-nal FPGA embedded architecture, easilydeploy in-system upgrades and exploit newhardware-software trade-off boundaries istransforming embedded-system design.


Linux kernel

LLTEMAC driver Linux drivers

BusyboxBMC functions

IP stack

Blade mgmtRS485 Proxy

ICMB driverI2C driver

Applications

Environmental Control PlaneEthernet

RS485

Figure 6 – CMC firmware architecture


More gates, more speed, more versatility, and ofcourse, less cost — it’s what you expect from The Dini Group. This new

board features 16 Xilinx Virtex-5 LX 330s (-1 or -2 speed grades). With over 32 Million ASICgates (not counting memories or multipliers) the DN9000K10 is the biggest, fastest, ASICprototyping platform in production.

User-friendly features include:

• 9 clock networks, balanced and distributed to all FPGAs

• 6 DDR2 SODIMM modules with options for FLASH, SSRAM, QDR SSRAM, Mictor(s),DDR3, RLDRAM, and other memories

• USB and multiple RS 232 ports for user interface

• 1500 I/O pins for the most demanding expansion requirements

Software for board operation includes reference designs to get you up and running quickly. Theboard is available “off-the-shelf ” with lead times of 2-3 weeks. For more gates and more speed,call The Dini Group and get your product to market faster.

www.dinigroup.com • 1010 Pearl Street, Suite 6 • La Jolla, CA 92037 • (858) 454-3419 • e-mail: [email protected]

by Guy EschemannResearch Engineer Sennheiser electronic GmbH & Co. [email protected]

Heinz LüdigerProject ManagerIMST [email protected]

Birgit KullSenior ScientistIMST [email protected]

Since the Federal CommunicationsCommission authorized the unlicenseduse of the ultrawideband (UWB) radiotechnique in 2002, most of the technolo-gy’s commercial applications, such aswireless USB, have been based on fre-quency-domain modulation techniques(such as OFDM) for high-data-rate trans-missions. Alongside this establishedapproach, UWB offers the potential foranother kind of data transmission basedon ultrashort pulses with durations onthe order of nanoseconds. So-called


Virtex-5 Propels UltrawidebandComms and Ranging Europe’s PULSERS project

uses Xilinx FPGA with MicroBlaze processor in impulse-radio UWB system.

XCEL LENCE IN WIRELESS COMMS

impulse-radio (IR) systems transmitinformation by modulating one or moreof the pulse parameters, such as positionor amplitude. At the same time, by meas-uring the pulses’ time-of-flight, they canimplement centimeter-accurate rangingcapabilities.[1] This opens up the field toa new range of location-aware applica-tions in such diverse areas as logistics(package tracking), manufacturing,search and rescue (communication withand localization of firefighters, for exam-ple) or smart tour guides.

The European PULSERS Phase IIProject, an industry-led collaboration intoUWB radio technology by 30 key industri-al and academic organizations, has set outto design and implement an IR-UWB com-munication and ranging system [2] featur-ing data transmission capabilities in themegabit/second vicinity and a ranging reso-lution of 4 cm. Our system consists of a setof identical autonomous nodes, each ofwhich can communicate with and deter-mine the range to every other node in thenetwork. A node consists of a custom UWBdaughterboard connected to an off-the-

measuring the time delay between sending aranging request and receiving the answerfrom the remote node (see sidebar). Rangingrequests are always sent in beacon slot 1,while ranging answers are expected to comeback in slot 3. This gives the remote node theduration of a full beacon slot (slot 2, approx-imately 33 microseconds) for processing thereceived ranging request and scheduling theoutgoing ranging answer.

shelf Xilinx® ML506 development board(see Figure 1). The high performance of theVirtex®-5 SXT architecture combined withthe flexibility of a MicroBlaze™ softprocessor allowed us to implement theentire baseband signal chain as well as allthe higher system layers in a single FPGA.

IR-UWB Communication and RangingTo transmit information, our system uses asimple pulse-position modulation with fourpossible time shifts (4-PPM), where eachpulse encodes two data bits. Pulses aregrouped into frames and transmitted in apredefined raster of beacon frames andtime-hopping frames, as illustrated inFigure 2. Each beacon frame consists ofthree identical beacon slots that customerscan use for ranging or communication pur-poses. We initially intended the time-hop-ping frame to carry high-data-ratetransmissions based on time-hopping code,but we’ll use that technique in a later prod-uct. At this time, all the data transmissionsoccur in the beacon frames.

We performed ranging using a methodknown as two-way ranging, which consists of


UWBRX

PAUWBTX

TCXO

ClockGenerator

A

D

CPLD Exp

ansi

on H

eade

r

Exp

ansi

on H

eade

r MicroBlaze

Baseband

AC97Controller

UART

MultiportMemory

Controller

AC97CODEC

RS232

SDRAM

FPGA

RF Front End ML506

SMAconn. Antenna switch

Comparator trigger

COMM

240 MHzClock

ToA

LFS

RLF

SR

ToT

60 MHz clock

SMAConn.

Diff.clock input

PLB

Beacon frame Beacon frameTime-hopping frame Time-hopping frame

...slot #1 slot #2 slot #3 slot #1 slot #2 slot #3

Typ. 100 μs

Typ. 1 ms

Center frequency 7.68 GHz

Baseband bandwidth 750 MHz

Expected range 25 - 30 meters

Actual range 3 m (due to local-oscillator crosstalk)

Ranging resolution 3.9 cm

Communication data rate > 1 Mbit/s

Figure 2 – Periodic beacon frames consisting of three beacon slots are interspersed with time-hopping frames.

Figure 1 – The system consists of an off-the-shelf Xilinx ML506 board connected to a custom UWB daughterboard.

Table 1 – UWB communication and ranging system characteristics


System ArchitectureThe ultrawideband daughterboard carriesboth the impulsive-transmitter and incoher-ent-receiver ASICs, which we designedspecifically for this project using IHP’s 0.25-micron SiGe:C BiCMOS technology. [3, 4]

The transmitter ASIC, which generatesUWB pulses as shown in Figure 3, is capa-ble of modulating both the amplitude andthe position of the generated pulses. Itincludes a 3.84-GHz counter for preciselyscheduling the time of transmit of the out-going pulses and measuring the time ofarrival of the received pulses.

The reception path splits into twobranches inside the receiver ASIC. The firstbranch, which has a relatively narrowbandwidth (120 MHz), serves communi-cation and coarse pulse-timing purposes.Precise pulse timing occurs on a secondreception branch that utilizes the fullimpulse bandwidth (750 MHz). On thisbranch, a high-speed comparator detectsincoming pulses. Its output triggers thereadout of the 3.84-GHz counter runninginside the transmitter ASIC. Thus, thetime of arrival of each received pulse ismeasured with a resolution of 260 ps,which translates into a spatial resolution ofapproximately 8 cm.

The daughterboard feeds the basebandmodule in the Virtex-5 FPGA with twodata buses running at 120 MHz. The com-munication (COMM) bus carries the ADCsamples while the time-of-arrival bus con-veys the high-resolution time stamps asso-

ciated with the received pulses. Both busesgo through an XC95144XV CPLD which,while not strictly required, was a greatdebugging aid. We can set the CPLD tooutput a sequence of pseudo-random num-bers [5] on the buses to the FPGA. Wethen use the CPLD output to both adjustthe FPGA’s input timing and to verify theintegrity of the bus lines—tasks that wouldhave been difficult without a priori knowl-edge of the transmitted data sequence.

Inside the FPGA, the baseband module(see Figure 4) takes care of both the encod-ing of the outgoing pulses and the decod-ing of the received pulses. The transmissionpart of the baseband module is relativelystraightforward, consisting mostly of outer(CRC) and inner (convolutional) coding.The implementation of the reception

counterpart involves, among other things, achannel estimator and a custom Viterbidecoder, and is thus much more resource-intensive. We connected the basebandmodule to the processor system via aProcessor Local Bus (PLB) interface.

While programmable logic is notori-ously more difficult to debug than soft-ware, the ChipScope™ Pro tool, withboth an integrated logic analyzer and abus analyzer configuration, was of greathelp during debugging sessions. The logicanalyzer proved useful for simultaneouslycapturing a burst of COMM and time-of-arrival samples, providing real-world datato our MATLAB® simulator. The busanalyzer helped us debug a few issuesrelated to the PLB interface of our base-band module.


SyncHeader

Correlator

SyncDecider

Channel Estimator

AGC & Sync. Header InsertionSymbol

Generator

LLVCalculator

Convolutional EncoderIncluding Tail Bits

PPMDecider

ViterbiDecoder

CRCCheck

CRCGenerator

Noiseestimate

RX pulseshape

estimate

LLVs ofPPM

positions

LLVs ofcoded bits

RX signal(ADC samples)

OutputPPM symbols

Decodeddata bits

Outputdata bits

Figure 4 – The baseband module’s reception (top) and transmission chains.

Figure 3 – A UWB pulse consists of a 7.68-GHz carrier wave with a Gaussian envelope.



Processor SystemWe generated the processor system using theBase System Builder wizard within the XilinxPlatform Studio (XPS) design tool, which gaveus a perfectly working system to start with.After that, we incrementally modified the basesystem to obtain the system shown in theFPGA section of Figure 1. These modificationsinvolved, among other things, switching to adifferential clock input and connecting thebaseband module to the PLB.

The software application runs on anembedded MicroBlaze processor on top ofXilkernel, which is a minimal real-time oper-ating system perfectly suited for small appli-cations. The application is divided into threethreads that run concurrently:

• The UWB thread manages the configura-tion and operation of the baseband module.

• The application thread is responsiblefor the audio acquisition and playbackactivities when the system is used indata transmission mode.

• The RS232 thread communicates withan external PC running the demonstra-tion graphical user interface.

Since the GNU development chain,which XPS uses, is available on a numberof other platforms, we could easily com-pile and test the hardware-independentmodules of code on a host PC (for exam-ple, using the Cygwin environment)rather than on the embedded target. Thatmade debugging a lot easier. Only thefinal testing had to be done on the embed-ded target, and having a source-leveldebugger like GDB was a real blessing.Xilinx application note XAPP1037 [6]

showed us many useful tricks for debug-ging our software.

A few hardware issues, located in theUWB ASICs, currently limit the nominalrange of our system to 3 meters instead ofthe initially expected 25 to 30 meters.Still, we have been able to demonstrateboth the communication and rangingcapabilities of the system, which makesthe project a great success.

Future work may include redesigningthe UWB ASICs to increase the system’soperating range, as well as implementingmultilateration capabilities to move from aranging system to an actual indoor posi-tioning system.

For more information, visit http://www.imst.de/de/forschung_pul.php or [email protected].

Acknowledgements

We wish to acknowledge Erwin Stenzel and DanielKotzor from EADS; Gunter Fischer from IHP; ThorstenKohl, Jac Romme and Norbert Schmidt from IMST;Maria Dolores Pérez-Guirao from the Leibniz Universityof Hannover; Axel Schmidt from Sennheiser, andDajana Cassioli from RadioLabs for their contributionsto this project. Thanks to Steven Backer from Sennheiserfor reviewing drafts of this article.

This work was co-funded by the European Commissionunder the Information Society Technologies integratedproject PULSERS Phase II.

Citations

[1] S. Gezici, Z. Tian, G. B. Giannakis, H. Kobayashi,A. F. Molisch, H. V. Poor and Z. Sahinoglu,“Localization via Ultra-Wideband Radios,” IEEESignal Processing Magazine, July 2005.

[2] H. Lüdiger, B. Kull, M. D. Perez-Guirao, “An Ultra-Wideband Approach towards Autonomous RadioControl and Positioning Systems in Manufacturing &Logistics Processes,” Proceedings of the 4th Workshopon Positioning, Navigation and Communication,Hannover, Germany, March 2007.

[3] G. Fischer, O. Klymenko, D. Martynenko, “Time-of-Arrival Measurement Extension to a Non-CoherentImpulse Radio UWB Transceiver,” Proceedings of the5th Workshop on Positioning, Navigation andCommunication, Hannover, Germany, March 2008.

[4] O. Klymenko, G. Fischer, D. Martynenko, “A HighBand Non-Coherent Impulse Radio UWB Receiver,”Proceedings of the IEEE International Conference onUltra-Wideband, ICUWB 2008, Hannover, Germany,September 2008.

[5] P. Alfke, “Efficient Shift Registers, LFSR Counters,and Long Pseudo-Random Sequence Generators,”XAPP052 application note.

[6] B. Hill, “Introduction to Software Debugging onXilinx MicroBlaze Embedded Platforms,” XAPP1037.

Two-way ranging demystifiedIn two-way ranging, the technology we employed in our impulse-radio UWB system,the distance between node A and node B is determined using the following tech-nique (see figure):

1. Node A sends a ranging request

to node B and starts its high-

resolution clock (3.84 GHz).

2. Node B receives the ranging

request after the signal propaga-

tion delay , which is

proportional to the distance

between nodes A and B.

3. Node B sends back a ranging answer to node A after a known

processing delay .

4. Upon receipt of the ranging answer, node A stops its clock at time . It can

then compute the one-way signal propagation delay , which,

multiplied by the speed of light, gives the range between A and B.

The 3.84-GHz clock’s time resolution of 260 picoseconds yields a spatial resolu-tion of approximately 8 cm. Since, however, the radio signal traverses the distancebetween the two nodes twice, the range can be determined with a resolution of 4 cm.

Knowing the range between itself and three non-co-linear anchor nodes, amobile node can compute its 2-D position. Using four non-co-planar anchornodes, it can even find its 3-D position.

– Guy Eschemann, Heinz Lüdiger and Birgit Kull

Node A Node B

Node A Node B

Node A Node B

Node A Node B

RR

RR

RA

RA

The two-way ranging process


by Greg CrouchEmbedded Systems Business DirectorNational [email protected]

Facing increased regulation and the need toreduce factory operating costs, machinebuilders are looking for solutions to boostproduct power efficiencies. Along withHVAC systems, the top consumers of elec-tricity in a factory are water heating, light-ing, office equipment and, especially,machinery. More specifically, the motorswithin these factory machines are responsi-ble for approximately two-thirds of thetotal electrical-energy consumption in atypical industrial facility. Motors are every-where—in blowers, pumps, compressors,conveyors, machine tools, mixers, shred-ders and more.

One way to get the maximum efficiencyfrom the motors that control machinery isto employ more efficient and sophisticatedfield-oriented control to optimize their effi-ciency (see sidebar, “Motor Efficiency Leadsto Greener Bottom Line”). To this end, ourteam at National Instruments (NI) usedXilinx® FPGAs as the basis for a commonhardware architecture called reconfigurableI/O (RIO) to create a flexible embeddedcontroller with high computing perform-ance. Our machine-builder customers useRIO as a platform from which to drawfield-oriented control (FOC) techniquesthat improve motor efficiency.


FPGA-Powered PlatformControls Industrial Motorsfor Maximum EfficiencyNational Instruments uses Xilinx devices inCompactRIO design for industrial-equipmentbuilders requiring optimal motor control.

XCEL LENCE IN AUTOMOTIVE & ISM

This RIO architecture is now deployedin many systems, such as those fromEUROelectronics, Srl. The architecturehelped EUROelectronics advance from theprototyping phase to the final machinesetup in only three months (Figure 1).

Reduced Machine Design Time U.S. Department of Energy data informsmachine builders that switching to amotor with a 4 to 6 percent higher effi-ciency rating can pay for itself in just twoyears if that motor is in operation formore than 4,000 hours a year.Unfortunately, many machines hostmotors that are very large and too costlyto replace. So in these cases, the key toreaping savings lies in updated drive-con-trol algorithms and controller hardware.

A second challenge is the integratedcontrol complexities required for brush-less DC and permanent-magnet synchro-nous AC motors (PMSM), bothcommonly grouped together as brushlessDC motors (BLDC). Many machinebuilders lack the software or hardwaredesign expertise required to build anembedded controller that can executereal-time closed-loop control on a widevariety of analog and digital sensor types.

backplanes, combined with PowerPC®

603e-based processors in various perform-ance frequencies (Figure 2).

We built into our RIO framework con-figuration software utilities and dynamicI/O reconfiguration capabilities that savetime in setup and reuse for both the end-

To reduce time to final design forembedded-machine builders, we incorpo-rated one form of our RIO-based architec-ture into a product called CompactRIO.These FPGA-based configurations includesystems built from Xilinx Virtex®-5 LX85to Spartan®-3 and Virtex-II 1M-gate-based



LED/DIP

Disk on Chip

DDR SDRAM

LED/DIP

Fuse

UART Ethernet Phy

Power SupplyInterface

RTC

Battery

PC

I/Lo

calB

us UART Ethernet MAC

PC

I/Loc

alB

us

Mem

ory

GPIO 12C

DMAASIC

Configuration Flash

Onboard ADC,DAC, 24V DIO

5V TolerantDIO

110 DIO

DIO

DIO

NI C Series I/O

Custom I/O

Custom I/O

Processor FPGA

Figure 1 – Embedded-machine builder EUROelectronics reduced power use with FPGA-based field-oriented control.

Figure 2 – Modular RIO architecture-based framework for NI’s single-board RIO and CompactRIO configurations

application programmer and the digitaldesign engineer. Our configuration soft-ware automatically detects custom hard-ware installed in the system. Integrateddiagnostic tests of I/O peripherals ensurethat I/O devices function properly. Thedesigner connects the custom circuitrydirectly to the Xilinx FPGA, and can

design logic with the Xilinx tools or the NILabVIEW FPGA Module.

Planning for dynamic configurationhelps the machine builder design for hard-ware reuse. At the same time, it providesthe ability to begin high-level applicationcoding before the final new I/O circuitrydesign is complete.

It can be problematic if driver softwareand the associated APIs do not executeproperly or return device-specific errorswithout the I/O circuitry installed. To getaround this problem, software developersoften create simulation subroutines thattemporarily replace the I/O circuit codewithin the application. This method makesit difficult to get a start on the applicationdevelopment and virtually impossible to

debug the code. Our RIO middlewaredriver architecture includes functionality tointegrate simulation code directly into thefunctional driver, thus simplifying codereuse and debugging.

For example, Figure 3 describes theembedded middleware software designhierarchy. These middleware drivers and

system services have proven their mettlein thousands of deployed machine-builder applications. Parallel and multi-thread-safe embedded-middleware driversare an integral part of RIO. The machinebuilder can call both multithread-safe andreentrant functions from multiple threadsat the same time, and still operate cor-rectly without blocking—an importantfeature for writing parallel code and opti-mizing performance. Drivers that lackreentrant execution can erode perform-ance or, worse, cause a crash. Your codehas to wait until the other threads aredone using each function before it canaccess them. Reentrancy is an importantconsideration to eliminate any unneces-sary dependencies in your code.

Help from FPGA Control Algorithms Although brushless DC motors and perma-nent-magnet synchronous AC motors areboth considered brushless DC motors, theydiffer in the way their stator is wound.When rotated, the stator of the BLDC iswound in such a way as to produce a trape-zoidally shaped back-EMF voltage, while

the PMSM’s voltage is sinusoidally shaped. Brushless DC motors are more costly

than AC induction motors but provide bet-ter energy efficiency and performance whencontrolled using advanced algorithms.Moreover, they can scale up to serve veryhigh-power and high-speed applications.While AC induction motors still dominatethe market, brushless DC motor sales havequadrupled over the last five years to morethan $1.2 billion, according to the ARCAdvisory Group.

BLDC motors are a type of synchronousmotor. This means the magnetic field thestator generates and the magnetic field offthe rotor rotate at the same frequency.Usually BLDCs are equipped with threephases. The stator of a BLDC motor con-



Figure 3 – With processor middleware drivers, machine builders can focus on custom circuitry design. They can link to the programmable Xilinx FPGAs through standard header connectors.

The machine builder can call both multi thread-safe and reentrant functions from multiple threads at the same t ime. Drivers that lack reentrant execution

can erode performance or, worse, cause a crash.


sists of stacked steel laminations withwindings placed in slots that are axially cutalong the inner periphery. Most BLDCmotors have three stator windings connect-ed in a star fashion. The internal structureis like that of an induction motor contain-ing pairs of permanent magnets on therotor rather than windings.

As the name implies, the brushless DCmotor is designed to operate without brush-es. This means that the commutation thebrushes provide must now be handled elec-tronically. To rotate the BLDC motor, thestator windings are energized in a sequence.

To calculate which winding to energize at atime, it is necessary to know the rotor posi-tion, typically measured by three Hall-effectsensors embedded into the stator. Based onthe triple combination of these sensor sig-nals, the control electronics can determinethe exact sequence of commutation.

Because brushless motors use perma-nent magnets in their rotors rather thanpassive windings, they natively providehigher power than induction motors fortheir size and weight. The key to high-effi-ciency operation, however, lies in theFPGA-based controller.

FPGA-based algorithm control deliversbetter efficiency than microprocessors canachieve. A wide range of control-systemalgorithms are available, including trape-zoidal, sinusoidal and field-oriented.

Trapezoidal, or six-step, control is thesimplest but lowest-performance method.For each of the six commutation steps,the motor drive provides a current pathbetween two windings while leaving thethird motor phase disconnected. However,torque ripple causes vibration, noise,mechanical wear and greatly reducedservo performance.


On today’s factory floor, the motor-driven equipment is estimatedto be responsible for two-thirds of the total electricity-energy con-sumption. This is creating a challenge for equipment manufactur-ers to develop more energy-efficient systems. The effort can bemanifested in many ways, such as selecting a more efficient motor,properly sizing a motor for its designed load or improving motorperformance through better and faster feedback and control.

Check the Efficiency RatingSimply replacing an existing motor with a more energy-efficientmodel justifies the slight premium of its initial cost. For a 500-horsepower motor that runs 8,000 hours a year, switching to amodel with a 5 percent higher efficiency rating could save morethan $12,000 and 170 kilowatt-hours of electricity annually. Tooffset the upfront costs, some utility companies and public agen-cies provide incentives to encourage customers to upgrade toNEMA Premium Motors, certified by the national trade organi-zation for the electrical manufacturing industry.

Size Matters; So Does LoadIn general, motor efficiencies also vary based on the amount ofload on the motor, and can range from 85 percent to 97 percent atfull load, with a peak on average at 70 percent to 80 percent load.Counterintuitively, slightly oversizing a motor (up to 25 percent)can actually increase efficiency. Also, a seemingly minor increase ina motor’s full-load rotational speed of 40 RPM can result in a 3 to6 percent increase in the load placed upon the motor, increasingenergy consumption by 7 percent and offsetting the energy anddollar savings expected from a NEMA Premium Motor.

Field-Oriented Control Squeezes More SavingsThe load to the motor is never constant; therefore, feedback fromthe motor will help to track incremental load changes. While the

Motor Efficiency Leads to Greener Bottom LineBy Wil Florentino Product Marketing Manager, ISM Vertical Market, Xilinx

traditional, scalar control techniques for variable-speed opera-tion of three-phase electric motors offer simple implementa-tion, they limit the performance. Field-oriented control(FOC), also called vector control, is a better method, since itprovides for faster feedback and tighter control of the torqueby controlling the current with every PWM cycle. In this way,FOC ensures that current is inherently limited. An additionalbenefit is the ability to use a smaller motor without sacrificingtorque or speed, while running at higher efficiencies, provid-ing better dynamic response.

Apart from enabling the use of a smaller motor, FOC canlower a motor’s energy consumption while providing betterefficiency. For example, FOC implemented within an ACinduction motor can improve motor efficiencies up to 25 per-cent beyond what’s attainable in a non-field-oriented approach.The U.S. Department of Energy estimated that implementingenergy-saving techniques, such as FOC, can shave operatingmargins by 1 to 5 percent, significant when compared with thetypical plant operating margins of 16 percent.

Science and ArtThere is a definite science and a little bit of art involved indesigning-in the most efficient motor and drive system for yourequipment. Choices can range from selecting the right sizemotor for the expected load to pushing the performance limitsusing a more-complex motor-control algorithm such as FOC.

The benefits of increasing the efficiency of motor-controlledsystems start with lower manufacturing costs. Indeed, no mat-ter whether you are moving fluids, using a drill or controlling arobot arm, motor selection and control techniques will affectyour bottom line. Also, the environmental impact of reducingenergy consumption provides for more efficient use of the lim-ited natural resources available to us today.

Sinusoidal control, also known as volt-age-over-frequency commutation, addressesmany of these issues. A sinusoidal controllerdrives the three motor windings with cur-rents that vary smoothly. This eliminatestorque ripple issues and offers smooth rota-tion. The fundamental weakness of sinu-soidal commutation is that it attempts tocontrol time-varying motor currents using abasic proportional-integral (PI) controlalgorithm, and doesn’t account for interac-tions between the phases. As a result, per-formance suffers at high speeds.

Field-oriented control, also known asvector control, improves upon sinusoidalcontrol by providing high efficiency atfaster motor speeds. It delivers the highesttorque per watt of power input comparedwith the other control techniques, andallows precise and responsive speed controlwhen the load changes. FOC also guaran-

tees optimized efficiency, even during tran-sient operation, by perfectly maintainingthe stator and rotor fluxes.

A Deeper Look at FOC One way to understand how FOC works is toform a mental image of the coordinate refer-ence transformation process. If you picture anAC motor operation from the perspective ofthe stator, you see a sinusoidal input currentapplied to the stator. This time-variant signalgenerates a rotating magnetic flux. The speedof the rotor is a function of the rotating fluxvector. From a stationary perspective, the sta-tor currents and the rotating flux vector looklike AC quantities.

Now, imagine being inside the motor andrunning alongside the spinning rotor at thesame speed as the rotating flux vector that thestator currents generate. Observing the motorfrom this perspective during steady-state con-

ditions, the stator currents look like constantvalues and the rotating flux vector is station-ary. Ultimately, you want to control the sta-tor currents to obtain the desired rotorcurrents. With coordinate reference transfor-mation, you can control stator currents suchas DC values using simple PI-control loops.

Under the hood, the FOC algorithmworks by removing time and speeddependencies and enabling the direct andindependent control of both magnetic fluxand torque. It accomplishes this by math-ematically transforming the electrical stateof the motor into a two-coordinate time-invariant rotating frame using mathemati-cal formulas known as the Clarke and Parktransformations.

An efficient method to control thepower electronics, called space-vector pulse-width modulation (PWM), maximizes theusage of the motor supply voltage and min-



Encoder Hall Effect Sensors Analog Sensors

Velocity Measurement Digital Input SPI ADC Interface

Lowpass Filters

Com

mun

icat

ion

Inte

rfac

e

Par

amet

er R

egis

ters

Field-Oriented Controller

LCD Controller

FPGA

16x2 Character LCD

PWMIGBT Pair

PWMIGBT Pair

PWMIGBT Pair S

afet

y In

terlo

cks

DIO

DIO

DIO

DIO

DIO

DIO

3-P

hase

IGB

TM

otor

Brid

ge

LabVIEW FPGA IP Modules

External Interfaces

Figure 4 – System diagram for FOC implementation using a Xilinx FPGA-based RIO hardware platform

Field-oriented control, also known as vector control, improves upon sinusoidal controlby providing high efficiency at faster motor speeds. It allows precise and responsive

speed control when the load changes and also guarantees optimized efficiency.


imizes harmonic losses. Harmonics can sig-nificantly erode motor efficiency by induc-ing energy-sucking eddy currents in theiron core of the motor.

Best of all, designers can utilize field-oriented control for both AC inductionand brushless DC machines to improveefficiency and performance. They can alsoapply FOC to existing motors by upgrad-ing the control system. In fact, they canemploy vector-control techniques likeFOC with AC induction motors to enableservo-motor-like performance.

FPGAs Tackle FOC ChallengeIt takes powerful computation devices toimplement FOC, and this requirementmakes FPGAs a natural fit for motor con-trol. An FOC system must continuouslyrecompute the vector-control algorithm ata rate of 10 to 100 kHz. In parallel to thecontrol algorithm, additional intellectual-property (IP) blocks such as the high-speedPWM outputs need to execute withoutaffecting the timing of the control algo-rithm. With their inherent parallel execu-tion and hardware reliability, FPGAs areable to perform control algorithms withloop rates up to hundreds of kilohertz, withroom left over to handle communication

and provide the data for user-interfaceapplications on the host microprocessor.Moreover, the reconfigurability of FPGAsallows the customer to adjust the controlalgorithm whenever necessary.

Figure 4 illustrates a system for FOCimplementation using a Xilinx FPGA onthe National Instruments RIO platform.Besides the actual control algorithm, theFPGA executes IP blocks to read the threeHall-effect sensors, an encoder and threeadditional analog sensors as it generatesPWM signals that drive external electronicsto power the motor. For communication toa host processor and a simple user interface,IP blocks execute in parallel.

Figure 5 shows the LabVIEW FPGAimplementation of the FPGA-based FOCalgorithm. The Clarke transformation con-verts the three-axis coordinates shifted by120° (Ia, Ib, Ic) into orthogonal two-axisones (Ia, Ib). In a second step, the Parktransformation converts the fixed (Ia, Ib)coordinates into decoupled two-axis rotat-ing coordinates (Id and Iq), which a simplePI controller can then control. The FOCsystem uses inverse Park and Clarke trans-formation to bring them back into thefixed AC three-phase frame of the statorwindings. NI provides the complete source

IP at our IPNet Web site at ni.com/ipnet. One of our customers, electronic-sys-

tem designer and manufacturer BAESystems Avionics, used our RIO platformwith Xilinx FPGAs to squeeze 15 percentextra performance from its existing motorswhile saving weight in avionics products byreducing motor mass. Thanks to the effi-ciency and tight power control of FOC, theBAE Servo Systems Technology Group inEdinburgh, Scotland, now specifies smallermotors than previously possible.

Ultimately, machine-builder designrequirements insist on improving motoroperating efficiency to meet regulatoryrestrictions and to ensure a rapid returnon investment. When evaluating control-system upgrades, machine designers oftenunderestimate energy costs, calculatingthat they are typically orders of magni-tude higher than hardware costs over thelife cycle of the motor. We at NI focusour efforts on delivering flexible embed-ded controllers with high computing per-formance using commercial off-the-shelfhardware solutions based on FPGA tech-nologies from Xilinx. With this combina-tion, we can meet the most extremecustomer challenge, specifically the per-formance requirements of FOC.


Figure 5 – Field-oriented control algorithm for LabVIEW FPGA

by Ludovic LarzulVice President of [email protected]

Creating a hardware emulation system isno easy task. At a minimum, each gener-ation of emulation system has to accom-modate a growing number of logic gates,memory and DSP blocks to allow ASICand ASSP system-on-chip (SoC) design-ers to debug their extremely complexdevices before sending them off to thefoundry for production. Emulation sys-tems must also be easy to program, reli-able and, what’s more, affordable. Here atEVE, we’ve developed several generationsof emulation systems—leveraging thepower of Xilinx® FPGAs—to emerge as aleader in the market.

Today’s SoCs are exceedingly complexpieces of silicon. They contain one ormore processors that will execute soft-ware. The software code they run is everybit as important a part of the final systemas the silicon itself. The software and thesilicon have to act as a seamless solution;if there’s a problem, it might be the soft-ware, or it might be the silicon.

Designers can only do so much soft-ware testing on a development host. Noreasonable host development system canreflect the true parallelism of the targetSoC. You can really only test out suchissues as synchronization, data integrityand resource contention in situ, and that’sfar too late to identify problems.Simulation isn’t a viable solution; it’s sim-ply too slow to allow the execution of anyrealistic code.

As a result, engineers have been usingemulation systems for well over twodecades to verify the most advanced ICsthe semiconductor industry can build.Most of these earlier-generation emula-tion systems were powered by customICs that the emulator vendors designedthemselves. They would then pass thecost of the custom IC development on totheir customers, making the power ofemulation more cost-prohibitive forcompanies struggling with ever-tighterIC development budgets.


EVE Taps Xilinx for Multiple Generations of Emulators

EVE Taps Xilinx for Multiple Generations of Emulators Engineers can debug complex chip designs on FPGA-powered ZeBu system. Engineers can debug complex chip designs on FPGA-powered ZeBu system.

XCEL LENCE IN NEW APPL ICAT IONS

In 2001, EVE broke with tradition bybasing our innovative emulation system onXilinx FPGAs. The goal was to provide thelowest hardware-assisted verification cost ofownership in the industry, as achievedthrough a combination of high executionspeed, high capacity (which today means upto a billion gates), quick design revision,flexible and powerful debugging capabili-ties, lowest cost per gate and most cycles perdollar. In addition, we wanted to make the

the test environment that we callReconfigurable TestBench (RTB). TheRTB allows for test configuration andcontrol. The DUT will change with eachrev of the design, but the RTB neverchanges unless the test environment does.Having a single mass of FPGAs containinga mix of the RTB and DUT designs wouldhave been messy and required unnecessaryrecompilation of the RTB design, so weseparated them out.

As a result, we have one set of FPGAs forthe DUT and another set for the RTB (Figure2). The number of DUT FPGAs varies bysystem size; bigger and smaller systems areavailable for bigger and smaller designs.

The RTB FPGAs provide communica-tion and control. We can optimize themmuch more aggressively for performanceand capacity, since the design will remainconstant. While no one reconfigures thecontents of this set of FPGAs, the designallows a rich set of configurations of the testsetup under user control.

For the DUT FPGAs, the overwhelmingpriority in terms of required features wasdensity. We needed to be able to put reallybig designs in here in order to provide value.Key among the necessary hardware featureswere multipliers, which later gave way toDSP blocks. In addition, big designs needed

system easy to use for ASIC designers whomight not be familiar with FPGA design.

The result was the ZeBu (for “zero bug”)emulation system (Figure 1). We’ve nowdeveloped six generations of emulators, themost recent of which is ZeBu Server (seesidebar), and we’re still going strong.

Separation of PowersOur approach is to split the device-under-test (DUT) from an interface to



Cycle-Based

Target System

Transaction-Based

SoftwareDebuggers

ZeBuVerificationAccelerator

TXRX

ZeBu

EmbeddedTestbench

HardwareTransactions

DynamicTraces

Triggers

LogicAnalyzer

Clock Trees LogicResources

MemoryResources

ClockServer

MemoryServer

Memory

Memory

Memory

FPGAFPGA

FPGAFPGA

FPGAFPGA

FPGAFPGA

DUTRTB

Figure 1 – EVE bases its ZeBu (for “zero bug”) emulation system around Xilinx FPGAs.

Figure 2 – ZeBu uses one set of FPGAs for the device under test and another for the Reconfigurable TestBench (RTB).

big memory, so the largest possible memoryblocks were a requirement.

Close behind density on our “musthave” list were bandwidth and latency. Thedata transmission between the FPGAs hadto be fast enough to keep from becoming aserious bottleneck. We didn’t plan to useclock-data-recovery (CDR) circuits—infact, they weren’t even available when westarted out—but we did use high-speedLVDS I/Os, layering our own proprietarylow-latency protocol over them.

To provide robust debugging capabili-ties, we needed readback of internal states.We wanted a way of interrogating what wasgoing on inside so that designers couldunderstand the inner workings of theirtechnology—especially when it didn’t seemto be working. We might implement read-back using JTAG or some other means, butit had to be there.

Finally, to keep the challenge of imple-menting a single design across multiple

FPGAs tractable, we wanted to use only asingle technology on the board and acrossdifferent members of the emulator family.So the FPGA family we chose had to havea broad range of densities and pin counts.

Our considerations for the RTB FPGAswere different. Again, one of the primaryrequirements was keeping to one FPGAtechnology per system. This would ensurethat all of the FPGAs were in sync withrespect to version and availability. Ofcourse, we still needed to be able to meetour performance requirements, no easytask given that the RTB FPGAs run attwice the speed of the DUT FPGAs.

Given the sum of requirements, theVirtex®-II family was our clear choice forthe first ZeBu generation. At the time, itled in gate and memory density, providedgreater bandwidth with less latency andexcelled in its readback capability. Our firstsystem used two Virtex-II 8000s for theDUT and one Virtex-II 6000 for the RTB.

Over time we’ve continued to evaluatethe Virtex family against others as the tech-nology has advanced, but to date we havenever found another choice compellingenough to move us away from the Virtex.As such, we have progressed with Virtex-4(up to 64 Virtex-4LX220s for the DUT)and Virtex-5 (as many as 400 Virtex-5LX330s), and are eyeing the new Virtex-6family with eager interest.

Going With a FlowWe design the RTB FPGAs with a standardFPGA flow, using XST for synthesis andthe standard Xilinx ISE® tools for placeand route. The effort we make here is typi-cal of what anyone designing with FPGAsmight do to create a complex design. Thedifferentiation isn’t in the flow; it’s in thehard work and our system knowledge.

When it comes to the DUT flow, how-ever, our challenge is far greater than theone the typical FPGA designer faces. By



Hardware emulation has become a valued component of the hard-ware/software co-design flow, enabling hardware engineers and soft-ware developers to share system and design representations andwork together to debug hardware/software interactions.

Use of this tool as a popular EDA solution has evolved over thepast 20 years, from the early days of standard FPGA-based emula-tors to custom ASIC-based models, back again to emulators basedon standard FPGAs.

CPU and graphics chip engineers were the first to use emulation,because the sheer complexities of their designs crippled traditionalevent-based hardware description language (HDL) simulators. Thetool quickly gained acceptance by the wireless community due tothe extensive use of embedded software in its hardware designs.Today, consumer electronics companies widely use hardware emu-lation for the design of digital TVs, set-top boxes, digital still cam-eras and camcorders, multifunction printers and other products.

The early version of hardware emulation delivered executionspeeds four to five orders of magnitude faster than simulation, mak-ing it ideal for accelerating the time required to develop and validatethe hardware of ASIC or system-on-chip (SoC) designs. However,it did not meet the minimum speed of execution required for effi-cient testing of embedded software—namely, 1 MHz. Moreover,cost of ownership hampered emulation’s popularity and restrictedits adoption to large corporations in limited numbers.

The newer FPGA-based emulators are more reasonably pricedand cost-effective, and have shortened the overall verification cycle

Evolution of Hardware Emulation Culminates in ZeBu-Serverof complex chip and electronic-systems designs. These emula-tors have a smaller footprint than prior emulation tools, arefast, efficient and easy to use.

Setup is straightforward and newer emulators consist offewer FPGAs than older machines. The latest generation ofemulation systems can execute billions of verification cycles, asrequired in embedded designs, in a short period of time. Theyprovide a full view of the design, necessary to debug the hard-ware. These machines also support transaction-level verifica-tion, which is needed for hardware debugging at a high levelof abstraction, via monitors, checkers and assertions.

EVE is an FPGA-based emulation trendsetter. On July 14,it launched the latest incarnation of the ZeBu (for “zero bugs”)product line, called ZeBu-Server, a sixth-generation version ofits emulator based on the Xilinx Virtex LX330.

Providing design capacity of up to 1 billion ASIC-equiva-lent gates, ZeBu-Server can be used across the entire develop-ment cycle for verifying and debugging a new ASIC or SoCdesign. Designers can use it to test the integration betweenhardware and software, and to validate embedded softwarebefore silicon availability.

This new emulator improves on previous-generation offer-ings in terms of capacity, speed, setup time, integration anddebugging capabilities. And last but not the least, it is alsocost-effective.

— Ludovic Larzul


definition, we deliver an unfinished system,in the same way that Xilinx delivers anunfinished chip. Our customers have to fillin the DUT design details. So we must pro-vide them with a way to implement theirdesigns—while assuming that they may notknow or care about how to do FPGAdesign. To help facilitate this, we created theZeBu Compilation User Interface (zCUI).

A key concern is the fact that the DUTdesign, by definition, will change numeroustimes throughout the design cycle.Minimizing the design turn time has always

been a high priority. Because the first sys-tems were relatively small, we didn’t focuson the compiler performance; we simplyworked through the standard Xilinx flow.But as we moved up in density, compiletime—and, in particular, synthesis time—became more critical. A design that willoccupy 400 of Xilinx’s largest FPGAs is nottrivial to synthesize.

We were concerned that standard synthe-sis tools were spending too much time doingtoo good a job. Our priority was not optimalsynthesis—optimal meaning highest per-formance, lowest gate count. Those are goodthings, but for us, compile time was a higherpriority, and we could live with 20 percentworse performance or density—especiallysince we’re executing at less that 30 MHz.

For this reason we developed our ownzFAST synthesis tool, which could takeSystemVerilog, VHDL or Verilog designsand synthesize them as much as 10 timesmore quickly than standard synthesisproducts. We now make zFAST availablealongside the more traditional synthesistools so that our customers can invoke itfor large designs as needed. Once thedesign is synthesized, our own partition-ing tool partitions it at the gate level.

It should be noted that some of ourusers are accomplished FPGA designers,

and we willingly hand over greater con-trol if they want it. The customer is freeto use the standard FPGA flow to try toget more performance out of the DUT,constraining and using any tricks theyknow as necessary. So the system makes iteasy for a non-FPGA designer, but doesnot restrain an expert.

The continuing performance improve-ments of the ISE tools from Xilinx havealso helped our overall compile times. EVEwas one of the first users of ISE 11, allow-ing us to keep our FPGA compile timestypically under two hours per FPGA.Parallel compilation means that we canturn large designs with multiple FPGAs ina reasonable amount of time. Our overallflow is summarized in Figure 3.

Trust, but Verify Verification is a bit tough for us, specifical-ly because we don’t ship a finished design.We have to be confident that, once our cus-tomer compiles a design onto the ZeBuplatform, everything will work as promised.

While the verification of our first set ofRTB FPGAs was a challenge, since thenwe’ve been able to use our own existingZeBu systems to verify the RTB for thenext-generation ZeBu system. We run thenew RTB for hours and days, executing bil-lions of cycles, to confirm that it’s solid.

For the DUT FPGAs, in conjunctionwith our own tools we have a large vault ofdesigns accumulated over years of experi-ence. Each night, we run thousands ofdesigns and test the results using applica-tion-specific test bench suites associatedwith each individual design, as well asthrough cycle-by-cycle simulation compar-isons that include internal state as well asinputs and outputs.

We also check to make sure that ourresults remain deterministic as our toolsand those from Xilinx evolve. Wheneverthere’s a revision change, we run throughthe suite of tests to ensure that the resultswe get with the new version are the same asthose we got on the older version.

Looking back, then, it is clear that EVE’stechnology has been intricately bound upwith Xilinx’s technology. Starting with theoriginal choice of Virtex-II, we have dove-tailed the growing capacity of the Virtexchips and the evolution of the Xilinx toolswith our own system design and tools toprovide a solution that continues to gaintraction in the emulation market. Given theXilinx and EVE road maps, that symbiosisis likely to continue unabated.

Ludovic Larzul, a founder of EVE and its vice president of engineering, has 13 years ofexperience in CAD development. He hasworked on numerous projects, including theintegration between hardware emulators andsoftware simulators, efficient interfaces based on transactors, communication among multipleemulation systems and system-level place androute. Larzul holds a Diplome d’Ingenieur del’Institut de Recherche et d’EnseignementSuperieur aux Techniques de l’Electronique(Ecole Polytechnique de Nantes).


XST(RTB)

Eve-Compiled User-Compiled(zCUI or standard)

XST, Synplify(Small/Med DUT)

Partition

zFast(Large DUT)

ISE ISE

DUTRTB

Bitstream Bitstream

Figure 3 – The ZeBu design flow makes use of Xilinx’s ISE tools.

by Juergen Wassner Lecturer Lucerne University of Applied Sciences and Arts, School of Engineering and [email protected]

Christoph EckLecturerLucerne University of Applied Sciences and Arts, School of Engineering and [email protected]

Model-based design (MBD) has recentlyattracted much attention for its promise ofclosing the gap between abstract mathe-matical modeling and physical realizationof real-time systems. By using the samesource for algorithm analysis, architectureexploration, behavioral simulation andhardware/software design, MBD promisesto shorten the system design cycle.

Several tools for model-based designhave emerged over the past years, many ofwhich center on the MATLAB® andSimulink® environments from TheMathWorks. An elegant way to leverage theprecious stimuli-generation and data-analysis features of MATLAB/Simulink forMBD is by employing the Xilinx SystemGenerator for DSP tool. Despite its name,this tool can be useful for disciplines otherthan classical digital signal processing.Control applications are of particular inter-est, since MATLAB/Simulink is the stan-dard design and simulation environmentfor the control-engineering community.

Xilinx System Generator for DSPenables control engineers to design theirsystem in the familiar Simulink environ-ment and then implement it in an FPGA,without knowledge of a hardware descrip-tion language (HDL). To do so, you must

relate the parameter values in the mathe-matical model of the controlled system(often called the plant)—for example, acontinuous- or discrete-time transferfunction or state-space description—tothe sample rate of the digital controllerand the FPGA system clock frequency.

Digital Controllers in FPGAAs with any signal-processing algorithm,digital controllers can be implemented ineither software or hardware. A combinationof the two—that is, a hardware-acceleratedsoftware implementation—is rarely neces-sary, due to the moderate sample-raterequirements and algorithm complexity ofmost control applications. When choosingFPGAs as implementation technology fordigital controllers, designers can utilize fea-tures not available with MCU-based soft-ware solutions without compromising on


Understanding Timing Parameters in Xilinx System GeneratorUnderstanding Timing Parameters in Xilinx System GeneratorWhen it comes to model-based design of feedback control systems,getting a grip on timing parameters is crucial.When it comes to model-based design of feedback control systems,getting a grip on timing parameters is crucial.

XPERTS CORNER

field-upgradability. The non-interrupt-driv-en, parallel data processing in FPGAs guar-antees absolute timing determinism evenwhen multiple processes with different sam-ple rates need to be controlled. This,together with configurable word widths,makes FPGAs an attractive choice for theimplementation of digital controllers.

Traditionally, for FPGA implementa-tion, the controller designer would use alow-level HDL after first validating thecontrol strategy and parameters with high-level simulations of controller and plantmodels, using for instance Simulink. AnHDL testbench will verify correspondencebetween the HDL controller design andSimulink simulation. And in order to veri-fy the controller design in a closed-loopsystem, this testbench must contain amodel of the plant. Accomplishing all thisis no easy task for designers, including mostcontrol engineers, who lack a sound back-ground in HDL and FPGA technology. Inthis situation, a high-level modeling anddesign environment such as Xilinx SystemGenerator is the ideal solution.

PID Controller in System GeneratorSince many controllers are still based onthe classical proportional–integral–deriv-

Figure 3 shows the overall systemmodel, which contains, besides the con-troller, the plant and the simulation test-bench built from standard Simulinkblocks. Here designers can model the plantby either a continuous- or discrete-timetransfer function, whereas in an HDL test-bench only discrete-time functions wouldbe appropriate. Note that in the System

ative, or PID, structure, we have chosen aPID controller for demonstrating ourpoints. Nevertheless, the approach weoutline can handle other commonly usedcontrol components such as lead-lagcompensators, state-space observers oradaptive controllers equally well. Figure 1shows the PID controller we designedwith blocks from the Xilinx block set.

Instead of using the Xilinx Accumulatorblock, we accomplished integration withthe elementary building blocks Adderand Register. Doing so enables us toinsert anti-windup logic, also shown inFigure 1, to freeze the contents of theaccumulator register whenever the inte-gral part of the controller output reachesthe saturation limits that the actuatorimposes. The anti-windup logic makesthe PID controller a nonlinear systemand has a positive effect on the overallsystem dynamics.

The block parameter menu shown inFigure 2 can configure the controlparameters and word widths of varioussignals. Also, designers can enable or dis-able the anti-windup function here. Thismenu provides for convenient experi-mentation without the need to modifylow-level HDL code.


XPERTS CORNER

Figure 1 – PID controller with anti-windup function built from System Generator blocks

Figure 2 – Customized parameter menufor the PID controller

Generator approach, you can performanything from system modeling throughsimulation and verification, up to imple-mentation, using one and the same high-level model.

Understanding Timing ParametersNow let’s look at the various timing meas-ures you will come across during thisprocess. In what follows, we deliberatelymake use of the same terminology as theXilinx System Generator documentation,even if this terminology does not alwaysseem to be intuitive.

There are several types of timingparameters. Those that refer to absoluteunits such as megahertz or nanosecondsare denoted by upper-case letters. Allother parameters are denoted by lower-case letters. Furthermore, the timingmeasures can be divided into control andanalysis parameters. All these parametersare summarized in Table 1.

Control ParametersThe first of the control parameters is thesimulation time unit TSim. You must notenter this unit explicitly anywhere in yourdesign. It represents your implicit assump-tion on the fundamental unit of time inSimulink simulation. As such, it affectssimulation solely. In the Simulink as wellas System Generator context, the simula-

tion time unit is often assumed to be onesecond. For instance, the display of theSystem Generator WaveScope block usesthis convention. But as we will see below,TSim can be any other time unit that suitsyour needs.

You must set the next parameter, theFPGA clock period TCLK, in the SystemGenerator token in units of nanoseconds. It

represents the period of the main systemclock input to the FPGA from which all otherclock and clock enables are derived. Hence,its setting affects only hardware implementa-tion. For instance, for the popular Spartan®-3E Starter Kit from Xilinx, the FPGA clockperiod would be 20 ns (50 MHz).

The Simulink system period psys, mean-while, represents the global link between


XPERTS CORNER

Name Symbol Unit Effect / Scope Location

Simulation time unit TSim s, ms, _s, ns Simulation only, Implicitly assumedglobal

FPGA clock period TCLK ns HW only, Enter in global SysGen token

Simulink system period psys [TCLK /Tsim] Simulation and Enter inHW, global SysGen token

Sample period psam [Tsim] Simulation and Enter inHW, local Gateway-In block

Sample time tsam [TCLK] None, Displayed byfor analysis only ST block

Sample frequency Fsam MHz None, Optionally displayedfor analysis only for each block port

Contr

olAn

alysis

Figure 3 – Overall system model (top) and plant output in response to command input without control (a), and with PID control and anti-windup disabled (b) and enabled (c).

Table 1 – Timing parameters for System Generator designs


Simulink simulation and hardware imple-mentation. Designers must set this param-eter, which influences both Simulinksimulation and hardware implementation,in the System Generator token. Duringsimulation, this value determines howoften the System Generator blocks of yourmodel are called, but not necessarily updat-ed, relative to the simulation time unit. Forhardware implementation, it specifies theamount of overclocking with respect to thesample rate of the controller. Unlike theSystem Generator documentation, wedefine the Simulink system period as aunitless quantity—that is, the ratio of theFPGA clock period and the assumed simu-lation time unit:

This makes it possible to assume arbitrarysimulation time units, as mentioned above.

The sample period psam of a particularsignal within the System Generator por-tion of a design is either set explicitly (forexample, in the Gateway-In block) or isderived from rate-changing blocks such asUp Sample or Down Sample. When set-

ting it explicitly, you would enter it as anumerical value in units of the assumedsimulation time unit. Its setting has animpact on both Simulink simulation andhardware implementation. During simu-lation, this value determines how manycalls to a block must occur before thisblock actually can change states.Similarly, in the implementation, thisvalue represents the number of clockcycles after which the block logic will beenabled. And since all clock enable signalsin a System Generator design are derivedfrom the main FPGA clock input, eachenable period must be an integer multipleof the FPGA clock period.

Analysis ParametersIn the second category of timing parameters,namely, analysis parameters, the first one toconsider is the sample time (ST) block. Ituses no hardware resources in the imple-mentation of the system, but solely servesanalysis purposes in the Simulink model.The value tsam that the ST block displays isthe clock enable period in units of FPGAclock periods used for the associated signalin the hardware implementation.

When designers select the option forthe next analysis parameter, sample fre-quency, in the Icon Display property boxwithin the System Generator token, eachXilinx block in the model displays thesample frequency Fsam in megahertz,used in the implementation of this block.This sample frequency is related to theother timing parameters as follows:

where TCLKenb is the period of the associatedclock enable in the implementation. Figure 4exemplifies these relations.

From the second equation above itbecomes clear that each sample period psam

must be an integer multiple of theSimulink system period psys, since onlythose clock enable signals can be derivedfrom the FPGA system clock. The thirdequation shows that the value the STblocks display is the clock enable period inunits of FPGA clock periods.

XPERTS CORNER

Figure 4 – Relation of the six timing parameters exemplified for psys = 1/4 and three different sample rates

In the System Generator approach, you can perform system modeling through simulation and verification, up to implementation, using one and the same high-level model.

Step-by-Step Guide to Choosing Timing ParametersThe example control system developed aboveoffers insights into how to select timingparameters. We have broken the processdown into five steps.

1. Identify the plant.

Model the plant by an appropriate transferfunction. In our example, we modeled theplant as a PT2 element with a gain factorK = 2, a time constant T = 20 ms and anattenuation factor d = 0.2. Hence, theplant is an oscillatory element, as can beseen in Figure 3(a).

2. Choose the simulation time unit.

At this point you can choose the funda-mental simulation time unit Tsim, suchthat the plant transfer function has con-venient numerical parameters. In ourworking example we chose Tsim = 10 ms.With the above parameters this yields theplant transfer function

3. Set the Simulink system period.

Given the simulation time unit, we arenext going to set the Simulink system peri-od psys depending on the FPGA clock peri-od TCLK of the hardware platform at hand.In the case of the Spartan-3E Starter Kitwith 50-MHz system clock, we have TCLK= 20 ns and

4. Determine the sample frequency.

A rule of thumb is that the sample rate ofthe digital controller must be at least 20times larger than the cutoff frequency of theplant. Our example plant has a cutoff fre-quency of about 30 Hz and we thus optedfor a sample frequency of Fsam = 1 kHz.

5. Set the sample period.

Finally, we set the sample periodparameter psam in the Gateway-Inblock in front of the controller. In theworking example we have

With these settings, we can simulatethe model, tune the controller parame-ters and synthesize the controller logic.However, sometimes the FPGA clockperiod TCLK is much smaller than thefundamental time unit Tsim, for instancebecause your controller is part of a larg-er design that requires a much higherclock frequency than the controlleritself. The simulation run-time can thenbecome unbearably long due to the highnumber of void clock cycles to be simu-lated before the controller block actuallyprocesses the next data sample. In thissituation you can use different settingsfor psys in simulation and implementa-tion without losing the consistency ofyour model. This is possible because thevalue of psys affects only the SystemGenerator part of your model.

More specifically, you can set psys =psam during simulation of your controlsystem. This ensures that the SystemGenerator blocks are called only whennecessary—namely, when the blocksactually can change states. Before yougenerate the FPGA implementation,you just need to switch back to the orig-inal value of psys.

Model-based design of closed-loopcontrol systems requires that absolutetiming measures of the plant transferfunction be consistently related to thetiming parameters of your design envi-ronment. We have provided a systemat-ic approach to this problem using theXilinx System Generator for DSP tool.

As a next step, we are going to investi-gate the hardware co-simulation featuresof Xilinx System Generator in order toperform real-time analysis of closed-loopsystems with FPGA-based controllers.


XPERTS CORNER

by Carol A. FieldsSenior Staff Product Line ManagerXilinx, [email protected]

At a very high level, gigabit transceivers(GTs) are I/O superhighways that pumpdata from one chip to another at extremelyfast rates. The right GT can unclog bottle-necks and speed up a system, an importantdesign consideration in communicationsand real-time processing in particular. Anynumber of applications are looking toleverage GTs, but a given market segmentmay have many standards or protocols anduse models. Sometimes there may be sever-al standards targeting one application, forc-ing the designer, to figure out which onebest fits the function you want your systemto perform. To select the best gigabit trans-ceiver, therefore, you need to be sure you’reup to date up on the latest protocols.

Industry-standard connectivity proto-cols exist in a wide range of market seg-ments, from wireless communications toconsumer electronics. Many of thesemodern protocols are based on the OpenSystem Interconnection model, in whichinteroperability between network devicesand software is broken down into layers.In the FPGA world, intellectual property(IP) in libraries such as the XilinxLogiCORE™ and AllianceCORE oftenuses higher-level serial connectivity pro-tocols such as PCI Express® in additionto lower-level physical-layer (PHY) pro-tocols like 1000BASE-X.

Right FPGA Gigabit TransceiverMakes All the DifferenceRight FPGA Gigabit TransceiverMakes All the DifferenceKnowledge of the PHY sublayers will help you customize physical layers using Xilinx’s High-speed Serial Transceiver Architecture Wizard. Knowledge of the PHY sublayers will help you customize physical layers using Xilinx’s High-speed Serial Transceiver Architecture Wizard.

XP LANAT ION:FPGA 101


However, determining the correct PHYprotocol template for your given designproject is not always as straightforward asselecting a higher-level protocol. As withmany industries, consolidation and designreuse can create a complex maze that youmust navigate. Understanding the “higher”layer protocol and its relationship to the“lower” layer protocol specifications—andbeing mindful of how each industry definesthe PHY—will help you choose the bestXilinx LogiCORE IP High-Speed SerialTransceiver Architecture Wizard protocoltemplate to achieve your design goals(www.xilinx.com/products/design_resources/conn_central/solution_kits/wizards/index.htm).

Let’s review these protocols and thentake a look at the best methods to help youselect the right one for your design.

OSI: A Template for Connectivity ProtocolsThe Open System Interconnection, orOSI, is an ISO standard for worldwidecommunications (http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=20269) that defines a frame-work for implementing protocols in sevenlayers. Control passes from one layer tothe next, starting at the application layerin one station and proceeding to the bot-tom, physical layer, over the channel tothe next station and back up the hierarchy.

establish, maintain and release physicalconnections between data link entities.

Three PHY Sublayers Many of today’s popular serial connectivityprotocols mimic the OSI layers model anduse similar terminology, for example denot-ing lower to upper layers as Nos. 1 through 7in a hierarchy ranging from physical to appli-cation layers. For the purpose of this article,when referring to a serial connectivity proto-col the term “higher” refers to Layers 2-4 (orthose above the physical layer).

The PHY layer (Layer 1) comprisestwo to three sublayers: the physical cod-ing sublayer (PCS), the physical-medium

attachment (PMA) sublayer and a third,optional, sublayer called physical-medi-um dependent (PMD). When referring tothe PHY layer, we use “higher” in refer-ence to PCS and “lower” for the PMAand PMD layers. Figure 1 depicts the lay-ers in a block diagram.

When transmitting, the packets or datatravel in forward order: media-access con-trol (MAC) layer to PCS, PMA and PMD;they go in reverse order when receiving.How Xilinx implements the PHY sublayersin the high-speed serial transceiver willdepend upon the protocol,

The physical coding sublayer interfacesto the higher-level Layer 2 data link (or

From highest to lowest, the OSI ladderconsists of the application, presentation,session, transport, network, data link andphysical layers.

Protocols of the application layerdirectly serve the end user by distributinginformation services appropriate to anapplication, to its management and tosystem management. The next-highestrung in the OSI model, the presentationlayer, provides a set of services that theapplication layer may select to enable itto interpret the meaning of the dataexchanged. These services are for themanagement of the entry exchange, dis-play and control of structured data.

The session layer assists in supportingthe interactions between cooperating pres-entation entities, while the transport layerprovides a universal transport service inassociation with the underlying servicesthat the lower layers supply. The networklayer, for its part, provides functional andprocedural means to exchange networkservice data units between two transportentities over a network connection.

Finally, the data link layer provides thefunctionality and procedural means toestablish, maintain and release data linksbetween network entities, while the physi-cal layer offers mechanical, electrical, func-tional and procedural characteristics to

RX Serial Clock PMA Parallel Clock(XCLK)

PCS ParallelClock

(RXUSRCLK)

RX InterfaceParallel Clock(RXUSRCLK2)

RXEQ

RXCDR

PMAPLL

Divider

From PMA PLL

RX-PMA RX-PCS

SIPO Over-Sampling Polarity

PRBSCheck

CommaDetect

&Align

Loss of Sync

RX Status Control

RX Pipe Control

10B/

aB

RX ElasticBuffer

FPGALogic

Figure 1 – Block diagram of example Virtex-5 RX physical sublayers PCS, PMA and PMD





MAC) layer. It typically implements 8b/10bencoding/decoding, comma alignment,channel bonding and clock correction.

Higher-layer protocol specifications maydefine the PCS as part of the PHY or referto an industry-standard PCS. For example,the second-generation Serial RapidIO spec-ification defines the PCS but refers designersto use the CEI-6G-SR/LR specification forthe PMA. You can implement the PCS inthe high-speed serial transceiver, in the bodyof the FPGA or in a combination of the two.

The physical-media attachment sublay-er, meanwhile, is often referred to as the“electrical specification.” The PMA imple-ments the appropriate signal integrity ofthe protocol as well as a number of othertypical features. They include current-mode logic (CML) drivers/buffers withconfigurable termination, voltage swingand coupling; programmable transmit pre-emphasis and receive equalization for opti-

mal signal integrity; and line ratesdepending on device and transceiver type(for example, Spartan®-6 GTP, Virtex®-6GTX) with optional oversampling.

Additional functions the PMA handlesinclude fixed latency modes for mini-mized, deterministic datapath latency;out-of-band signaling support (specifical-ly designed to address the requirements ofPCI Express and Serial ATA protocols;and built-in pseudo-random bitstream

generation/checking logic for easier bit-error-rate checking.

Finally, the physical-medium depend-ent sublayer is an optional part of the PHYlayer specifications generally used withEthernet protocols. The PMD is responsi-ble for the transmission and reception ofindividual bits on a physical medium. Itsjobs encompass bit timing, signal encodingand interacting with the physical mediumand the cable or the wire.

In implementing a 1000BASE-XPCS/PMA LogiCORE, for example, the1-Gigabit Ethernet MAC connects to theLogiCORE, which in turn connects to a1000BASE-X PMD optical transceiver.The Xilinx Trimode Ethernet MAC(TEMAC) LogiCORE implements thePCS and PMA functions—the latter in thehigh-speed serial transceiver, sometimescalled a gigabit transceiver (GT). In thePCS layer, the high-speed serial transceiver

implements the encoding and decoding,while the FPGA logic handles autonegoti-ation. The PMD sublayer contains a trans-ceiver for the physical medium.

Confusion Over PHY Usage It’s easy to get confused over the usage ofPHY as a silicon chip. The PHY is a spec-ification layer consisting of sublayers. Youcan implement the PHY, which designersoften refer to as the electrical specification,

in a single or multiple devices. The sublay-er usage depends on the market segmentand protocol. In the case of the popularserial protocol PCI Express, the PHY con-sist of PCS and PMA sublayers.

The PHY layers in communicationprotocols commonly use the PCS, PMAand PMD sublayers. Figure 2 is an exam-ple of the use of the Xilinx TEMAC(10M/100M/1G) LogiCORE in a local-area network application where the 1-Gbit Ethernet MAC communicates to a1000BASE-X PCS/PMA and to a laseroptical-transceiver 1000BASE-X PMD.Here, the PHY is implemented in boththe FPGA and in an optional optical-transceiver device.

Because PHY layers can be confusing, itis best to read the specification so that youcan understand exactly what function thelayers will perform (that is, PCS, PMA,PMD) and where the function is imple-

mented in the silicon (FPGA logic body,high-speed serial transceiver or outside ofthe FPGA). In most cases the PMA isassumed to be part of the PMD.

Of course, there are many connectivityprotocols (for example, PCI) that are notserial (that is, single-ended or differential-signal I/O) and do not use a high-speedserial transceiver. In some cases you mayimplement the serial PHY using theSelectIO™ serializer/deserializer (serdes).

TCP IP FIFOI/F

EthernetMAC

PCS PMA PMD

Physical Sublayers

GMII/MIIRGMII

SGMII (Rocket I/O)

1000 BASE-X(Rocket I/O)

Figure 2 – Example of PHY PCS, PMA and PMD layers in an Ethernet communications application

You can implement the physical layer, which designers often refer to as the electrical specification, in a single or multiple devices.

The PHY sublayer usage will depend on the market segment and protocol.

Hardened or Embedded IP ConsiderationsOften, Xilinx will put popular ubiquitousprotocols like PCI Express and GigabitEthernet right into the FPGA. These“hardened” versions may implement all orpart of the protocol. In these two cases, theLogiCORE wrapper implements the MACand physical layer (PCS and PMA) as partof the LogiCORE product. The wrapperincludes the hardened block and connectsit with the high-speed serial transceiver. Inthe case of the TEMAC, the hardened IP

implements the MAC and portions of thePCS, along with the transaction and datalink layers of the PCI Express LogiCORE.You can use Xilinx’s High-speed SerialTransceiver Wizard to view and modify theGTP/GTX settings. (For details, refer to thedocumentation for the Tri-mode EthernetWrapper for Integrated Block at www.xil-i n x . c o m / p r o d u c t s / d e s i g n _ re s o u r c e s /conn_central/protocols/gigabit_ethernet.htm;and for the Endpoint BlockPlus for PCIExpress LogiCORE at www.xilinx.com/products/design_resources/conn_central/solution_kits/pci/index.htm.)

Wizard Protocol ListTables 1, 2 and 3 provide lists of the serialprotocols the High-speed Serial TransceiverWizards support. Table 1 contains a list ofhigher-layer protocols whose specificationincludes the PCS and PMA, while Table 2consists of higher-layer protocol specificationsthat define the usage of an industry-standardPCS and/or PMA specification. Table 3 lists

physical-layer specifications and the higher-level protocols commonly used with them.

To best help you customize your PHYusing the GT architecture wizard, let’stake a more-detailed look at the PHY lay-ers of the top serial protocols and theircorresponding wizard protocol templates.The most commonly used protocolsinclude 10G Ethernet MAC—XAUI,CPRI v4.0, 3G and 6G OBSAI RP3-01,first- and second-generation PCI Express,Serial RapidIO, HD-SDI and XilinxTEMAC (10M/100M/1G Ethernet).

Before using the GT wizard, I advisenew users to become familiar with theVirtex-6 or Spartan-6 GT datasheet, eachof which contains lists of supported proto-cols. Also, while using the wizard is fairlystraightforward, it isn’t a bad idea to reviewthe “Wizard Getting Started Guides” forboth of those devices, which contain a lotof useful information (see www.xilinx.com/products/design_resources/conn_central/solution_kits/wizards/3).

Aurora 8b/10b, Aurora 64b/66b

DisplayPort

GPON

Interlaken

OC-48

PCI Express Gen1/Gen2

Serial RapidIO Gen1

HD-SDI

Higher protocol PCS/PMA protocol

10G Ethernet (XFI/SFI) 10GBASE-X, XAUI

CPRI 1000BASE-CX, XAUI, LVXAUI

OBSAI CEI-6G-SR/LR and XAUI

Lower PCS/PMA protocol Higher protocol

1000BASE-X Xilinx TrimodeEthernet MAC(TEMAC)

CEI-6G-SR/LR OBSAI RF03-01 LVXAUI

SGMII Xilinx TEMAC

XAUI 10GE

Table 1 – Higher-layer protocol specificationdefining PCS/PMA

Table 2 – Higher-layer protocol defining the useof PCS and PMA specifications

Table 3 – Lower-layer protocol defining both the PCS (or portion of PCS) and electrical

PMA (or portion of PMA)Figure 3 – Virtex-6 GTP Wizard walks users through the high-speed I/O setup.



Table 4 – Protocol templates for Virtex-6 and Spartan-6 gigabit transceiver wizards



GT Wizard Virtex-6 GTX Spartan-6 GTP Description Protocol SpecificationProtocol Template

3G OBSAI � � Open Base Station Architecture Initiative. Open Base Station Architecture Initiative.OBSAI RF03 is a subset of RF3-01. www.obsai.org; Optical Internetworking Forum (OIF),www.xilinx.com/esp/wireless.htm www.oiforum.com; XAUI(IEEE), www.ieee.org

6G OBSAI � Open Base Station Architecture Initiative. Open Base Station Architecture Initiative,OBSAI RF03 is a subset of RF3-01 www.obsai.org; Optical Internetworking Forum (OIF),www.xilinx.com/esp/wireless.htm www.oiforum.com

Aurora 8b/10b � Due soon Lightweight link layer Xilinx, www.xilinx.com/aurora

Aurora 64b/66b � Lightweight link layer Xilinx, www.xilinx.com/aurora

CPRI v4.0 � � www.xilinx.com/esp/wireless.htm Common Public Radio Interface, www.cpri.info

DisplayPort Due soon � Display protocol, www.xilinx.com/ Video Electronic Standards Assoication (VESA),products/design_resources/conn_central/ www.vesa.org, www.displayport.orggrouping/displayport.htm

Fibre Channel � PHY is similar to XAUI’s. www.xilinx.com/ Fibre Channel Industry Association (FCIA),products/design_resources/conn_central/ www.fibrechannel.orggrouping/fibre_channel.htm

GigE (SGMII/1000Base-X) � � Support by Xilinx TEMAC. www.xilinx.com/ IEEE, www.ieee.orgethernet, www.xilinx.com/esp/wired.htm

GPON � Fiber to the home. Gigabit passive optical International Telecommunication Union, www.itu.intnetwork. www.xilinx.com/esp/wired.htm

HD-SDI � � High definition. One rate, 1.485G, used in Society of Motion Picture and Television Engineersbroadcast. www.xilinx.com/esp/broadcast.htm (SMPTE), www.smpte.org/home

Interlaken � Flexible chip-to-chip packet transfers. Works with www.interlakenalliance.commany MACs: OC768 Sonet, 100GE. www.xilinx.com/esp/wired.htm

OTN OTU2 (XFI) Due soon www.xilinx.com/esp/wired.htm XFI electrical interface specification is a part of XFP Multi Source Agreement Specification.www.xfpmsa.com/cgi-bin/msa.cgi

PCI Express Gen1 � � First-generation PCI Express. PCI-SIG, www.pcisig.orgwww.xilinx.com/pciexpress

PCI Express Gen2 � Second-generation PCI Express. PCI-SIG, www.pcisig.orgwww.xilinx.com/pciexpress

Serial RapidIO Gen1 � � First-generation Serial RapidIO. RapidIO, www.rapidio.orgwww.xilinx.com/esp/wireless.htm

TEMAC using SGMII � � Support by Xilinx TEMAC. IEEE, www.ieee.orgwww.xilinx.com/ethernet, www.xilinx.com/esp/wired.htm

XAUI � TBD 10G Ethernet Extended Attachment Unit Interface 10GE & XAUI: IEEE, www.ieee.org(XAUI). Connects to 10GE MAC. www.xilinx.com/ethernet, www.xilinx.com/esp/wired.htm

Figure 3 shows the Virtex-6 GTX wiz-ard GUI, with a pull-down menu for theprotocol templates. Table 4, meanwhile,contains a list of protocol templates thatthe Virtex-6 GTX and Spartan-6 GTP wiz-ard LogiCOREs support.

At the request of customers, Xilinx hasimproved the use model between theLogiCOREs and the GT wizards. ManyXilinx serial LogiCOREs—including thosefor Serial RapidIO, XAUI and Aurora—now use the output of the wizards directly.Customers can select a protocol templatelike Serial RapidIO and make custommodifications. System I/O specialists nolonger need to work with the LogiCOREengineering team to modify the serdes set-ting to create custom protocols.

10G Ethernet - XAUIThe 10-Gigabit Ethernet (also known as10GbE, 10GigE or 10GE) standard is anIEEE specification that defines a nominalrate 10 times as fast as Gigabit Ethernet.The physical layer consists of an interfacefrom the MAC to the PHY, PCS, PMAand PMD. In the case of the XilinxLogiCORE, the 10-Gbit Media-Independent Interface (XGMII) can con-nect to an optical module or to the 10GEthernet Extended Attachment UnitInterface, better known as XAUI. ThePMA and PMD are either an externaldevice, as in the example of an opticaltransceiver, or considered part of theXAUI, as in the case of a chip-to-chip orbackplane application.

The Virtex-6 LXT, SXT and HXTFPGAs do not support a one-lane, 10Gdata transfer rate to interface to a single-lane optical-fiber PMD. XAUI’s four lanesare used to interface to the optical PHYmodule, which is then able to send/receivethe 10GE packets to their destinationthrough a single fiber cable.

Designers supporting applications inmany markets, including networking andtelecommunications, commonly useXAUI to connect board-to-board overbackplanes, or to connect to 10-Gbitoptical modules. Used in a variety ofhigher-level protocols, XAUI can alsoserve as a chip-to-chip MAC/PHY inter-

face, since the 72-pin XGMII is not apractical one to use on a circuit boardheader or off-board interface.

A designer can use the Xilinx XAUILogiCORE to implement the DTEXGXS (another acronym for XAUI, itstands for “10-Gbit extender sublayer”),PHY XGXS and 10G BASE-X PCS usingboth the gigabit transceiver and FPGAlogic. The XAUI interface is considered aPCS layer that can interface to a 10GEthernet PHY on one side and the MACon the other. The FPGA would notimplement the 10G Ethernet PHY, whichcould be optical or copper.

Xilinx customized the GT wizard’s10GE XAUI template for the Xilinx 10GEXAUI LogiCORE. For custom applica-tions or AllianceCORE, refer to the IPdocumentation implementation details.The designer can use the GT wizard todefine the electrical attributes of the gigabittransceiver for a XAUI PCS.

Common Packet Radio Interface v4.0 You can use the Common Packet RadioInterface (CPRI) for connectivity betweenradio equipment controllers or base sta-tions and one or more radio equipmentunits. The specification covers Layers 1 and2 of the OSI stack, with the physical layer(Layer 1) defining both the electrical inter-face used in traditional base stations and anoptical interface for base stations withremote radio equipment. The Xilinx CPRILogiCORE implements the PHY in theGT and the data link (Layer 2) in the logicbody of the FPGA.

CPRI specifies line bit rates of 614.4Mbits/second (E6), 1,228.8 Mbits/s (E12)and 2,457.6 Mbits/s (E24). It specifiesboth high-voltage and low-voltage variantsfor E6 and E12, whereas E24 has only alow-voltage specification. The high-voltagevariants are based on IEEE 802.3-2002,Clause 39 (1000BASE-CX) and the LVvariant on IEEE 802.3ae-2002, Clause 47(XAUI). CPRI can also use the low-voltageXAUI specification, LVXAUI.

You can use the GT wizard to define theelectrical attributes of the GT as per thestandard specifications CPRI v4.0 for theXilinx CPRI v4.0 LogiCORE, which uses a

modified version of 100BASE-CS andXAUI for the electrical. Use the menu item“CPRI v4.0” when viewing or modifyingthe Xilinx CPRI v4.0 LogiCORE.

3G and 6G OBSAI RP3-01The architecture of the Open Base StationArchitecture Initiative (OBSAI) RP3-01cellular base station protocol is divided intothe lower physical layer and the higherapplication, transport and data link layers.The application layer can interface withbaseband or RF cards, while the data linklayer interfaces with the physical layer.Xilinx implements the PHY using a trans-ceiver in the FPGA to handle the electricalportion and connecting it to an externaloptical-transceiver module.

In the case of OBSAI, the 8b/10bencoding and word-alignment blocks, aswell as the synchronization and phase-alignment buffers, are embedded in thetransceiver blocks and do not require theuse of FPGA logic.

The OBSAI RP3-01 electrical parame-ters comply with XAUI electrical (Clause47 of IEEE 802.3ae-2002) up to 3.072Gbits/s and Common Electrical IO (CEI)G6 for both the short-reach (CEI-6G-SR) and long-reach (CEI-6G-LR) 6.25-Gbit/s standards. The GT wizardprovides protocol templates for the Xilinx3G OBSAI RP3-01 and 6G OBSAI RP3-01 LogiCORE, implemented using amodified CEI-6G, XAUI/SRIO electri-cal. The menu items are 3G OBSAI RP3-01 and 6G OBSAI RP3-01.

PCE Express, Generations 1 and 2 The PCI Express protocol is defined inthree layers: physical, data link and trans-action. Thanks to its popularity, newerserial protocols may choose to be electri-cally compatible with or similar to PCIExpress, allowing vendors of ASSPs andother PHY devices to reuse portions oftheir well-tested IP portfolios. Xilinximplements the first- and second-genera-tion PCI Express protocols in an integrat-ed hard-IP block (Endpoint Block PlusWrapper for PCI Express LogiCORE)and as soft IP through Xilinx andAllianceCORE partners.





You can use the GT wizard to generatethe settings for customization to be manu-ally migrated into the source code wrapperof the hard-IP block. The wizard has twoprotocol templates to support the PCIExpress physical layer: PCI Express Gen1and PCI Express Gen2.

Serial RapidIOLike PCI Express, the Serial RapidIO pro-tocol is also defined in three layers, in thiscase, physical, logical and transport. It is thenature of this industry to leverage as muchof existing technical specifications as is rele-vant for the purpose of the protocol. Thus,Serial RapidIO’s physical layer uses theXAUI electrical interface (IEEE 802.3ae-2002, Clause 47) as a guide in defining itsown parameters for the AC electrical speci-fication. Since RapidIO and XAUI havesimilar applications goals, Serial RapidIOdesigners were able to reuse their existingXAUI electrical designs. The GT wizardsupports the Serial RapidIO PHY by meansof the Serial RapidIO template.

Triple-Rate SDI VideoTriple-rate SDI video reference designs arebased upon the Society of Motion Pictureand Television Engineers (SMPTE) stan-dards. Our integrated reference design sup-ports several standard- and high-definitionflavors of this standard, namely SD-SDI,HD-SDI, Dual Link HD-SDI and 3G-SDI.

The physical connection to the High-Speed Serial Transceiver is differentialCML driving an external cable driver (fortransmit) or an external adaptive receiverequalizer. The common serialization pro-tocol between the standards is very specif-ic and is designed in the FPGA fabric. Thisprotocol has long runs of 1’s and 0’s thatrequire very large AC coupling capacitors.

Broadcast equipment is networkedtogether with the Triple-Rate SDI standardsrunning over 75-ohm coaxial cables. TheGT wizard supports SD/HD/3G-SDI ref-erence designs using the HD-SDI protocolwith the template of that name. The tri-mode reference design uses the dynamic-reconfiguration port to reconfigure thegigabit transceiver to support the SD and3G trimodes. In these modes, SD-SDI and

3G-SDI use the phase-locked loop insidethe GT to recover the data at 2.97 Gbits/s.

Xilinx Trimode Ethernet The Trimode Ethernet MAC is a Xilinximplementation of the 10/100/1G Ethernetprotocol. Xilinx offers the TEMACLogiCORE (soft IP) and Trimode EthernetWrapper for Integrated Block (hard IP). Forthe soft IP, the 1000BASE-X PCS/PMA orSGMII LogiCORE can connect seamlessly.SGMII is a serial connectivity standard thatsupports 10/100/1G operation.

The connection between the TEMACand the GTX is always 1 Gbit/s, but theinterface will sample less frequently forthe slower rates. The SGMII option isdifferent in the wizard’s PCS and PMAsettings, and is available as a separateprotocol template.

Applications using copper wires(1000BASE-T) do not have the same divi-sion of logic between the FPGA and the1000BASE-T PHY as in a 1000BASE-X(“X” signifies optical) PMD device. In thecase of a 1000BASE-T device, the PCS, thePHY device generally implements the PMAand PMD PHY sublayers. The designer canimplement the MAC in the FPGA and theGT through a GMII interface; the MACdoes not to implement any of the PHYlayer. Thus, there is no GT wizard protocoltemplate for 1000BASE-T.

The TEMAC wrapper is an HDL wrap-per around the hard TEMAC sub-block andthe GT I/O blocks (1000BASE-X/SGMIIis integrated into the hard TEMACalready). The Ethernet LogiCORE docu-mentation supplies implementation details.

The GT wizard supports the Tri-ModeEthernet protocol using the GigE(SGMII/1000Base-X) template.

In conclusion, industry-standard proto-cols are constantly changing, and it seemsone or two new ones emerge every year. Assuch, the terminology and underlying tech-nologies can be as confusing as tax law. Themore you understand about the details of agiven protocol’s physical-layer scheme, theeasier it will be to determine the best high-speed serial transceiver wizard protocol tem-plate to use as a starting point for yourdesign projects.

by Gregg J. MacdonaldDesign EngineerElectronic Compute Systems, [email protected]

Personal computers are all around us, help-ing us in innumerable ways and enhancingour productivity. In fact, so ubiquitous isthe PC that many people now take it forgranted. But since the debut of the micro-processor in 1968, design engineers collec-tively have yet to compress all of the thingswe want the PC to do into one small, easy-to-manage “can-really-do-anything” box.

For many common tasks these comput-ers perform very well. But for many others,they are too slow or unreliable. In somecases we can’t even trust the computer willturn on or be ready when we are.Sometimes they hang. Other times theyslow to a crawl. There are well-known fixesfor most of these problems, but even a top-of-the-line PC can’t smoothly stream fromthe Web, play music, show a video and TVshow, and process a telephone call simulta-neously, as a spreadsheet runs in the back-ground. The jobs it can do will often havelots of dependencies, a fact that points tothe PC’s Achilles’ heel.


FPGA Anchors Innovative General-Purpose PCA rethinking of 41-year-old microprocessor and software-OS model yields a new architecture that’s fast, reliable and immune to viruses.

XPER IMENT

Like it or not, this time-multiplexedserial fetch-decode-execute architecture hasnot fundamentally changed in 41 years. Aconsequence of that architecture is that thePC performs hardly any tasks concurrently,except in the case of recent multicoremachines. But even in these PCs, anothersoftware operating system must sit on topof the multiple processor cores, using thesame time-multiplexed serial fetch-decode-execute architecture. Piling on more andmore microprocessors per chip createscomplexity. Such an architecture consumeslots of concurrent clock cycles fetching,decoding and executing instructions thatmay simply result in NOP (no operation)or in getting another instruction.

A new approach using a Xilinx® FPGAas the core of the personal computer cansolve these problems, providing a tremen-dous breakthrough for the full spectrum ofold and new embedded applications. Idevised this approach—which I call HUM-BLE, for Hardware Unified MultipleBranch Logic Engine—over the course of10 years’ work as an antidote to the existingPC architecture. I call the latter SUSBLE,for Software Unified Single Branch LogicEngine. (For their part, multicore machinesmight be termed SUMBLEs: SoftwareUnified Multiple Branch Logic Engines.)

SUSBLE arises from the Von Neumann-Harvard fetch-decode-execute architecture,which is further subdivided into RISC andCISC (reduced or complex instruction-setcomputing, respectively). So while bothtypes of SUSBLE machines are instruction-set computers, HUMBLE is not. Its archi-tecture could thus be called NISC. Insteadof spending time fetching instructions,HUMBLE springs into action immediately,simply because its foundation is Xilinx logiccells, slices and cores, which themselves areessentially always ready to “do” things.

Instant OnSUSBLE computers typically require 15seconds to 2 minutes of checking and set-ting things up immediately upon turn-on,with many dependencies. This offers a hintat how long consecutive strings of SUSBLEprocesses really take before you can dosomething useful with the mouse or strike

seconds the light on the hard disk contin-ues to flicker. It doesn’t stop until approxi-mately 135 total seconds have elapsed. Forthe sake of argument, let’s say 120 seconds,because it’s a nice round number andbecause we know the PC is still doingsomething relating to readiness; so 120/1second = 120.

The HUMBLE PC, meanwhile, is run-ning a digital clock manager (DCM) at100 MHz, so that means it’s operating at1/12 the clocking frequency of the XPmachine. If I normalize the clock frequen-cies with respect to the 1/10 claim, I mustmultiply 120 by 1.2, giving us a factor of144 in favor of HUMBLE. In other words,HUMBLE with Xilinx Virtex® onboardachieves my design goal 144 times faster.

Certainly 1.2 GHz is no longer state ofthe art in conventional PCs; newer Celeronprocessors operate at 2 GHz or more (notto mention duals and quads), scaling theperformance while typically still running alot of character-per-second applications. Soas a second example, let’s compare HUM-BLE with a new single-core Vista machine.With the Vista PC, equipped with a 2.4-GHz Celeron and 2 Gbytes of RAM, ittakes 44 seconds before I can do anythingwith the mouse or keyboard, though it hasno light to monitor what activity may or

a key. In contrast, the HUMBLE PC witha Xilinx FPGA as the center-stage coreturns on in approximately 1 second, and itseems reasonable to reduce this further.

By definition, HUMBLE doesn’t needto fetch instructions. When using thiscomputer, there is no need to start amicroprocessor, fetch boot code, set up acache, maintain cache coherency and soon. Xilinx logic cells, slices and cores areborn ready. By nature they are “instanton” (after configuration). Thus, configu-ration time for the HUMBLE PC moth-erboard (also known as the XilinxML401-VLX25) in serial mode at 40MHz works out to approximately 0.2 sec-ond. Parallel mode is certainly faster, butour design goal of 1-second readiness atthe user level is easily achieved.

Faster at 1/10 Clocking SpeedWhereas HUMBLE is ready in approxi-mately 1 second, my XP machine, builtaround a 1.2-GHz Celeron processor with384 Mbytes of RAM, sometimes takes aslong as 2 minutes to turn on. To be fair, Idon’t really know everything XP is doing inthat time. However, I observe that it takesapproximately 60 seconds before I canbegin to click on anything with the mouseand have any effect. For an additional 75


XPER IMENT

Figure 1 – The HUMBLE PC turns on in a second, thanks to a microprocessor-less architecture that has no need to fetch instructions.

may not continue to and from the SATAhard drive. Using that number (44) andnormalizing the 2.4-GHz processor speedto HUMBLE’s 100 MHz gives us a 2.4multiplier with respect to my 1/10 clock-ing-speed claim. So, 44 times 2.4 = ~106.

The new HUMBLE approach installsXilinx logic cells, slices and cores in a par-allel manner to provide concurrency,resulting in many multiples of execute-execute-execute at much lower clock fre-quency than a conventional PCarchitecture and correspondingly lowerpower. To say this another way, there sim-ply is no need for gigahertz clocking toperform typical PC functions, includingcharacter-per-second applications. Forother kinds of programs that need maxi-mum algorithmic acceleration, includingembedded circuits and applications

greater than character per second, some-thing closer to whatever makes sense ispossible within a HUMBLE PC.

The HUMBLE PC uses Xilinx DCMsand phase-matched clock dividers(PMCDs). To support the file system andjust about any kind of anticipated userapplication, it also includes a time, date,calendar and time zone hardware programusing Xilinx logic cells, slices and cores. Forembedded applications, HUMBLE withXilinx DCMs and PMCDs provides up-and down-frequency selectivity and the fol-lowing standard user clocks: 200, 133, 100,66, 50, 33, 25, 16.5 and 12.5 MHz; 1, 4and 100 microseconds; 10 milliseconds;1/4, 1/2, 1 second, 1 minute, 1 hour, 1 day,1 week, 1 month and 1 year. There are alsoseveral spare DCMs and PMCDs for gener-ating any other desired frequency. Clearly,

there is no gigahertz speed going on here,but if an embedded application needs it,Xilinx Virtex parts provide this too.

Lower PowerUsing Xilinx parts in combination with theHUMBLE architecture as the engine, thereis no need to include a cache, cache con-troller, microprocessor or soft microproces-sor to fetch, decode and wait the 10 to 100cycles or more per instruction for execu-tion. This is one very basic way we achievelower power. Further, for the HUMBLEPC motherboard, we chose the XilinxML401. This board allows HUMBLE tointegrate the display adapter (aka graphicsadapter) and motherboard clock chip.Historically, neither of these has been partof the microprocessor. Then we use Xilinxlogic cells, slices and cores again in combi-


XPER IMENT

WWW

Serial

USB

USB

PS2 i/f

PS2 i/f

Serial i/f DRAM i/f

Dual IDE1 i/f

Dual IDE2 i/f

Compact Flash i/f

Floppy i/f

Display Adapter

Video out i/f

SRAM i/f

Flash i/f

IPMoEth i/f

Telephony i/f

Spare i/f

Spare i/f

Video in i/f

Audio in i/f

Audio out i/f

USB1 i/f

USB2 i/f

Parallel i/f

FPGAFabric

HUMBLENISC

Xilinx Virtex

Figure 2 – Block diagram of HUMBLE PC built around Xilinx Virtex FPGA


nation with the HUMBLE architecture todescribe the display adapter, enabling fur-ther power reduction. Other methods forcurbing power consumption are also avail-able with the Xilinx Virtex series, such asturning off the clock or regions of clocks.

By nature, Xilinx FPGAs have alwaysbeen popular because they can solve aplethora of real-time problems. HUMBLEleverages this capability, intending to makethings easier wherever possible. If nothingelse, HUMBLE and Xilinx allow justabout any real-time application to coexiston the same silicon. Hence, HUMBLEisn’t just a breakthrough for PCs, it is abreakthrough for all the embedded appli-cations that today’s microprocessors can’tdo—or can’t do well.

Immune to VirusesIn addition, the HUMBLE architecture isimmune to viruses in ways that existing

microprocessor-based computers cannot be.In SUSBLE PCs, the operating system andprograms are not write-protected, nor canthey be with any benefit thereof. When anyprogram is running, it has access to and con-trol of nearly the entire machine and can dowhatever the author has intended. Beforeany program, including the OS, begins torun, it must start from somewhere (akaboot). Thus, a “boot virus” can do whateverit wants to, including disabling the OS fromblocking any intended malicious actions,thereby allowing control of the entiremachine without exception. A simple virusprogram can destroy or change any part ofthe operating system, application programsor user data, because programs (on disk) allappear the same to a running program asread/writable objects (data). In short, withSUSBLE computers, viruses will remain aperpetual problem because there is no realroot solution possible.

In the case of HUMBLE, we remove theOS from the microprocessor—in fact, wego a step further and remove the wholemicroprocessor. HUMBLE doesn’t load orfetch or architect the OS as a running soft-ware program. Although it loads at turn-on, it does so from nonremovablewrite-protected flash memory as a “hard-ware” program. This protected nonremov-able media is not associated with usermedia and drives or ports. Saying thisanother way, while HUMBLE loads its per-sonality upon turn-on, it doesn't load itfrom a floppy or hard disk or other unpro-tected read/write media. Programs cannotaccess HUMBLE’s fixed protected flashmemory (on the motherboard) during run-time; the memory is not writable by itselfor by other running programs.

Presently, the Xilinx Parallel IV cableprogramming interface is required forchanges. This method allows upgrades tothe operating system while preventingvirus entry altogether into the OS. So, byintentional design it is impossible for auser program (virus) to disable or over-write any part or portion of the on-chipsilicon-based-operating system or silicon-based programs.

Certainly it can be argued that virusentry is feasible through the Xilinx pro-gramming interface. While that is theoreti-cally possible, in fact this interface typicallywill be used once or a small number oftimes, usually at the manufacturer or anaffiliate. So clearly, this is not a wide-openpoint of entry (as is the case with presentmicroprocessor-based computers). A viruswould have to come directly from the man-ufacturer or affiliate, hence reducing theprobability of occurrence. The problembecomes one of quality control on the partof a manufacturer. So while viruses are not100 percent impossible, the chance ofoccurrence is tremendously reduced—wesuggest by as much as 99.9 percent.

Even if a virus were to enter the systemin this way, however, it could not spread to

XPER IMENT

Figure 3 – The ML401 board lets the HUMBLE PC integrate the display adapter (aka graphics adapter) and motherboard clock chip.

We remove the OS from the microprocessor—in fact, we go a step further and remove the whole microprocessor. Instead, HUMBLE loads the OS at turn-on

from write-protected flash memory as a ‘hardware’ program.

other HUMBLE computers; it wouldcompromise only the one affectedmachine. So the virus isn’t contagious, itisn’t virulent. In fact, it is self-limiting—self-quarantined—at least among otherinterconnected HUMBLE computers.Assuming a case of virus entry, the solu-tion is simple: just revert to the previousknown-good version of the operating sys-tem or delete the affected user data.

Extending Legacy ProgramsThe present HUMBLE PC and develop-ment system do not support existing x86programs. This is intentional, to make theHUMBLE innovation contained within theopen architecture of a Xilinx VirtexFPGA—and corresponding open architec-ture of the Xilinx ML401 board—clear toall. Having said this, many options are avail-able to bridge this new architecture back tox86—for example, through the file system,Ethernet connection, display adapter (framebuffer) or other creative means.

The reprogrammability of FPGAs con-jures the possibility of selectively configur-ing HUMBLE or SUSBLE, or somecombination of both, at turn-on within thesame Xilinx motherboard. As anotherexample, HUMBLE could serve as thesupervisory operating system on top of agiven connection of gigahertz (or less)CISC microprocessors—hard or soft,duals, quads and so on. This seems to makesense because of the architecture’s naturalconcurrency without need for an instruc-tion stream. Presently, Electronic ComputeSystems, Inc. has no effort under way tobridge HUMBLE to x86. So this is a wide-open opportunity for third parties.

How HUMBLE Parallels Xilinx Logic Cells The idea of using parallel logic or parallel-ing logic doesn’t make much sense from aconventional software perspective, becauseit usually cannot be performed except incases where multiples of the actual hard-ware elements—components, functions

and so on—exist, and the software compil-er allows access. Even then it can be quiteobtuse, given the nature of software-drivenfetch-decode-execute architectures. Hence,paralleling logic is almost completely for-eign when writing software.

Historically, software engineering or atleast a mind-set toward software is necessarywhen in the driver’s seat as a system archi-tect, designer or product engineer. This lega-cy is wise and makes good sense for mostcompanies, because most systems are micro-processor based. Hence we are presentlyliving in a mostly software-driven (serial-time-multiplexed) world of products.

However, Xilinx Virtex FPGAs nowdeliver millions of instant-on reconfig-urable and parallelable (in almost anycombination) transistors on a chip.Unsurprisingly, they are often still used ina serial-time-multiplexed fashion withoutexploiting their additional feature ofinherent parallelism. For example, thereare approximately 11,000 slices (24,000logic cells) in the Xilinx VLX 25, all onthe same silicon (and Xilinx offers muchbigger devices). They all can be configuredduring the same configuration cycle andthen clocked with the same or differentclocks. But clearly they are all on, they areall available and they may be connected(paralleled). Xilinx provides plenty ofinterconnect, so the challenge becomesone of the designer’s ingenuity.

A HUMBLE approach constructs acomputing engine that exploits parallelismupon a foundation that is fundamentallyparallel. So let’s look at three parts ofHUMBLE that run in parallel.

The first is a portion of HUMBLE’s dualhot-pluggable and hot-swappable PS2interface. Within this part is a COREGenerator™ FIFO for temporary storageof received values. There was no need todesign the FIFO from scratch, becauseXilinx offers a parameterizable core that willwork with less effort. After startingCOREGen and selecting Memories &

Storage Elements – FIFOs (Fifo Generator),I entered the component name and desiredparameters. Then I pressed Finish, andCOREGen delivered a ready-to-use FIFOthat synchronously and elastically stores thenumber of values desired.

Using this same mind-set, I’ve created adevelopment system and opened access tothe HUMBLE intellectual property. TheHUMBLE PC Development System addsto the Xilinx cores by including a basic setof components written by ElectronicCompute Systems, Inc. as well as libraryaccess to HUMBLE’s more-sophisticatedcores. Thus, we’ve made the buildingblocks of HUMBLE accessible from alibrary of basic components. Our Website, www.ecs-pc.com, offers a list of librarycomponents and cores (look underServices & Cores).

For design entry when using cores, allwe need is a component declaration and alink to its library (for the compiler). Here isthe VHDL that describes the instantiationfor this Xilinx CoreGen FIFO within thehot-pluggable and hot-swappable HUM-BLE PS2 interface:

U40: fifo8x256

PORT MAP(

Clk => Clk100,

Sinit => Reset,

Din => RxData(8 DOWNTO 1),

Wr_en => DevWrite, --

Rd_en => PushBDBreg(2),

Dout => DevFifoout,

Full => DevFifoFull,

Empty => DevFifoEmpty,

Data_Count => DevFifoCount

);

Consider that two of these buffers arerunning in parallel. One is used for the key-


XPER IMENT

Xilinx Virtex FPGAs deliver millions of instant-on reconfigurable and parallelable transistors on a chip. The HUMBLE approach constructs

a computing engine that exploits this parallelism.


board and the other for the mouse. Hence,they are running in parallel with respect toeach other and everything else, and areboth “on” immediately after configuration.

A second example is the pipelinedNoBL (no bus latency), also called zero busturnaround (ZBT), static-RAM interface.It uses a Cypress CY1354B SRAM. Thisinterface worked the first time withoutmodification. So let’s look at HUMBLE’scontrol signals for this NoBL SRAM.

The Xilinx ML401 board shares thedata bus between linear flash and NoBLSRAM, so to use both requires care. Thissharing is a design trade-off of the ML401board, not of HUMBLE. As it turns out,it’s a reasonable compromise on Xilinx’spart, given all of the other pin- and func-tional-related requirements vs. the numberof pins available. Here is the VHDL for theNoBL SRAM control signals that synthe-size to logic cells and slices:

CEtmp <= '1' WHEN OptionsReg(0) =

SRAM AND (ReadSRAM = '1' OR

WriteSRAM = '1') ELSE '0';

-- Clock Enable

Bwatmp <= '1' WHEN WriteSRAM = '1'

ELSE '0'; -- byte write selects

Bwbtmp <= '1' WHEN WriteSRAM = '1'

ELSE '0';

BWctmp <= '1' WHEN WriteSRAM = '1'

ELSE '0';

Bwdtmp <= '1' WHEN WriteSRAM = '1'

ELSE '0';

Wetmp <= '1' WHEN WriteSRAM = '1'

ELSE '0'; -- wr enable

Oetmp <= '1' WHEN ReadSRAM = '1'

ELSE '0'; -- output enable

Ce1tmp <= '1' WHEN OptionsReg(0) =

SRAM AND (ReadSRAM = '1' OR

WriteSRAM = '1') ELSE '0';

-- Chip Enable

Ce2tmp <= '1' WHEN ReadSRAM = '1'

OR WriteSRAM = '1' ELSE '0';

Ce3tmp <= '1' WHEN ReadSRAM = '1'

OR WriteSRAM = '1' ELSE '0';

CE <= CEtmp ;

Bwa <= Bwatmp;

Bwb <= Bwbtmp;

BWc <= BWctmp;

Bwd <= Bwdtmp;

We <= Wetmp ;

Oe <= Oetmp ;

Ce1 <= Ce1tmp;

Ce2 <= Ce2tmp;

Ce3 <= Ce3tmp;

This is a synchronous interface to andfrom a synchronous SRAM. All of the sig-nals above are coming from or going to reg-isters. Packing the IOB registers is assumed.Corresponding polarity flips as per thedevice datasheet are also assumed. Noticethat the OptionsReg(0) signal makes theNoBL SRAM bus cycles “chip-enable con-trolled,” steering them toward the SRAMor, conversely, to linear flash. Also noticethat there is no need for byte-wide writeenables thus far, so they actually have acommon driving-signal description. Whileit doesn’t need to be as verbose as shown, wecan rely on synthesis to simplify it, or comeback at some later time and manually do thesame. However, if it should make sense inthe future, byte-wide control is possible andI have retained the hooks for it. The SRAMcontrol signaling is fairly simple and residesin the logic cells and slices.

As the third example within theTCP/IP portion of HUMBLE known asipMoE (Internet Protocol OverEthernet), there is a need to complementa checksum value. Here is the line ofVHDL used within ipMoE that synthe-sizes to logic cells and slices.

U263:OnesCompl16 PORT MAP(CheckSum,

CheckSum1sCompl);

And here is the corresponding VHDLcode for performing the one’s comple-ment.

LIBRARY ieee;

USE ieee.Std_Logic_1164.ALL;

USE ieee.Std_Logic_unsigned.ALL;

ENTITY OnesCompl16 IS

PORT(

Ain :IN

Std_Logic_Vector(15 DOWNTO 0);

Cout :OUT

Std_Logic_Vector(15 DOWNTO 0)

);

END ENTITY OnesCompl16;

ARCHITECTURE ArchOnesCompl16 OF

OnesCompl16 IS

-- Internal nodes

SIGNAL Aintmp :Std_Logic_Vector(15

DOWNTO 0);

BEGIN

-- The following describes comple-

menting each bit

g0:FOR i in 0 to 15 GENERATE

Aintmp(i) <= NOT Ain(i);

END GENERATE g0;

Cout <= Aintmp;

END ArchOnesCompl16;

These three examples span four interfaces,including mouse, keyboard, NoBL SRAMand Ethernet. They all operate in parallel,which makes sense for overall operation,because they reside individually in Xilinx logiccells, slices or cores, which themselves are fun-damentally all simultaneously on. Hence,using Xilinx logic cells with a set of HUM-BLE basic components and more-sophisticat-ed cores makes it possible to exploitparallelism, unleashing things previouslythought too difficult or impossible in a PC.

Certainly these examples do not reveal allthe inner workings of the embedded HUM-BLE PC. This is why we offer library accessto these basic components and more-sophis-ticated cores. I hope these examples togetherwith my explanations will begin to give asense of the innovation and its potential.

We are actively seeking embedded and PCpartners with an eye toward making theHUMBLE PC full-featured. If you are inter-ested in knowing more, consider signing anondisclosure agreement with us. Also, feelfree to contact us about your embeddedapplication. To see a demonstration of theHUMBLE PC, visit www.ecs-pc.com. Toview the HUMBLE PC in action, seewww.youtube.com/watch?v=i_v6IaD6-Ts.

XPER IMENT

by Daniele Bagni DSP Specialist FAEXilinx, [email protected]

As a Xilinx FAE, I’m often asked if we offer a DSP core that hasfunctionality particularly suited for all of a customer’s designrequirements. Sometimes a core may be too large, too small or notfast enough. Sometimes, we’ll have a core in development that willexactly fit their needs and soon be available in CORE Generator™.But even in that scenario, customers usually want a particular set ofDSP functionality now, and don’t want to wait. In these cases I oftensuggest they use the interpolated lookup tables in our devices to cus-tomize their DSP functionality.

A lookup table (LUT) is a memory element that “looks up” whatthe outputs should be for any given combination of input states, toensure there are definite outputs for every input. Using LUTs to per-form DSP functions offers some major advantages:

• You can change the LUT content in a high-abstraction-levelprogramming language such as MATLAB® or Simulink®.

• You can design a DSP function to perform math functionsthat would be extremely difficult to emulate with discretelogic operations, for example: y=log(x), y=exp(x), y=1/x,y=sin(x) and so on.

• LUTs are also a simple way to implement a complicatedmathematical function that could require too many FPGAresources in terms of configurable logic block (CLB) slicesand embedded multiply or DSP48 programmable multiply-and-accumulate (MAC) units.

But of course, using LUTs in this manner has some disadvantages,too. When you implement DSP functionality in LUTs you have touse block-RAM (BRAM) elements. Implementing the y=sqrt(x)function for 16-bit input and 18-bit output variables, respectively xand y, requires roughly 64 BRAM units of 18 Kbits each. Giving upthat amount of BRAM may be too costly from a system architecturepoint of view if, for example, you are targeting a small Spartan®

device or if you have too many operations to implement and can’tspare 64 BRAM units for each one.

The interpolated LUT approach delivers the advantages of aLUT-based function implementation without using too muchBRAM. In this technique, you use contiguous outputs from a small-er LUT—for example, one that is 1K words deep—and linearlyinterpolate it to emulate a larger LUT. In doing so, you can achievea finer numerical resolution than is possible with a 1K-word LUT.This makes the cost of using a LUT much more reasonable, as theimplementation only takes one BRAM and one embedded multi-plier (or DSP48) plus a few CLB slices for the control logic. In addi-tion, the numerical accuracy will be more than satisfactory from thepoint of view of signal-to-noise ratio (SNR).

Of course, applying the interpolated LUT (ILUT) technique takesa bit of know-how. As an example, using the approach on they=sqrt(x) function will show clearly how the ILUT performs in termsof area occupation, timing and numerical accuracy. After walkingthrough that exercise, I will review some practical examples where Ihave applied this approach for customers with completely differentdemands—specifically, in the linearization of a sensor with a nonlin-ear transfer function and in an implementation of an adaptive finiteimpulse response (FIR) filter to remove speckle noise from syntheticaperture radar (SAR) images.


Interpolated Lookup Tables: SimpleWay to Implement a DSP FunctionInterpolated Lookup Tables: SimpleWay to Implement a DSP FunctionIf a digital signal processor core doesn’t have precisely the functionality you need, use an ILUT to bridge the gap. If a digital signal processor core doesn’t have precisely the functionality you need, use an ILUT to bridge the gap.

ASK FAE -X

Designing in System Generator for DSP To implement a DSP algorithm on a Xilinx FPGA, I use the SystemGenerator for DSP design and synthesis tool, which works withinthe Simulink model-based methodology from The MathWorks.System Generator benefits from the Xilinx DSP blockset forSimulink and will automatically invoke CORE Generator to pro-duce highly optimized netlists for the DSP building blocks.Simulink is a double-precision floating-point design tool, whereasSystem Generator is a fixed-point arithmetic tool. Nevertheless,using these tools together, you can define the total number of bitsand the binary position of every signal, and can therefore manipu-late fractional numbers in fixed-point arithmetic. The simulationresults are cycle-accurate and bit-true, so you can easily comparethem to floating-point reference results generated with either MAT-LAB scripts or Simulink blocks to check for quantization errors.

Figure 1 shows the System Generator top level of the ILUT scheme.To make the approach as general as possible, the input variable x ofnx=16 bits is supposed to be normalized between 0 (included) and 1(excluded); therefore, its format is “unsigned 16 bits in total with 16bits to the right of the binary point,” and is called the Ufix_16_16 for-mat. The most significant bits (MSB) and least significant bits (LSB)

blocks respectively extract nb=10 MSB and nx-nb=6 LSB from theinput data. These signals are called x0 and dx. The output y=sqrt(x) isrepresented with ny=17 bits, normalized into the Ufix_17_17 format.

Figure 2 illustrates the stage with the small LUT with a depth of1K words that I’ve implemented via a dual-port RAM block. Sincesuch a block is applied as a read-only memory (ROM), the Booleanconstant block, We_const, forces the write enables to zero. The sig-nals x0 and x0+1 are used as two subsequent addresses to the ROMtable. The zero constant of the Data_const block defines the size ofany ROM word (ny, in our case).

The following formula illustrates how to linearly interpolate apoint at coordinates (x, y) between two known points (x0, y0) and(x1, y1), with x0 the MSB of x:

Note that x1 and x0 are two contiguous addresses to the smallLUT and they are one LSB apart from each other. Since the smallLUT has nb bits of addressing space, such LSB has the value 2-nb.


ASK FAE -X

Latency = 3 CLK Latency = 6 CLK

y

1

SMALL _LUT

z -3 X0

Y0

Y1 MSB

[a:b]

LSB

[a:b] Interpolation

z -6

Y0

Y1

dx

sqrt(x)

Delay 1

z-3

Delay

z-1

System

Generator

x

1

UFix _10 _0

UFix _17 _17

UFix _6_ 0

UFix _17 _17

UFix _16 _ 16

UFix _17 _ 17

UFix _ 16 _16 UFix _6_ 0

Y1

2

Y0

1

We_const

0

Offset_const

1

Dual Port RAM

addra

dina

wea

addrb

dinb

web

A

B

Data_const

0

Add

a

b

a + b

X0

1

UFix _17 _17

UFix _17 _17

Bool

UFix _1_0

UFix _10 _0

UFix _17 _17

UFix _10 _0

Figure 1 – Top-level scheme of interpolated lookup table in System Generator for DSP

Figure 2 – Scheme of the smaller LUT in System Generator for DSP

The interpolation stage is shown in Figure 3. The Reinterpret blockchanges the dx=x-x0 signal without altering the binary representa-tion; it relocates the binary point (from UFix_6_0 to UFix_6_6 for-mat) and outputs a fractionary number of nx-nb bits, thuscomputing the value (x-x0)/ 2-nb.

In hardware, this block costs nothing. Generally speaking (anddepending on what function we’re applying using an ILUT approach),if y1=0 and y0=0, we force y1-y0=1 so that we get 1/2-nb instead of zero.We perform this action with the Mux, Relational, Constant andConstant1 blocks. The remaining Mult, Add and Sub blocks executethe linear interpolation formula. In this case, I forced the output signalfrom the Mult block to have 17-bits resolution and not 23 as theoreti-cally required, because the overall numerical accuracy is good enoughfor the experiment. Furthermore, all the results are unsigned, since they=sqrt(x) function is monotonically increasing. In other words, differ-ent functions may require different tuning of the data types, but won’tdeviate dramatically from the principles of the scheme in Figure 3.

Assuming the Spartan-3E 1200 (fg320-4) is our target device andwe’re using the ISE® Design Suite and System Generator for DSP10.1SP3 release tools, the FPGA resources occupied after place androute are summarized here:

Design Summary using target part “3s1200efg320-4”

Logic utilization:

Number of slice flip-flops: 198 out of 17344 1%

Number of four-input LUTs: 086 out of 17344 1%

Logic distribution:

Number of occupied slices: 111 out of 08672 1%

Number of MULT18X18: 001 out of 00028 3%

Number of BRAMs: 001 out of 00028 3%

The design is fully pipelined and can deliver a new output result atany clock cycle. The latency is 10 clock cycles and the maximum datarate achievable is 194.70 MSPS (millions of samples per second). Interms of numerical accuracy, the ratio between the power of referencefloating-point results and the quantization error of the SystemGenerator for DSP fixed-point output yields an SNR of 71.94 dB or77.95 dB, for an ILUT with a depth size of 1K or 2K words respectively.

Alternatively to the ILUT, we can apply the CORDIC SQRTBlock from the Xilinx Reference Math Blockset in SystemGenerator for DSP. In this case the overall latency is 37 clockcycles, max data rate is 115.18 MSPS and area resource occupa-tion is 940 slice flip-flops. There are 885 four-input LUTs, 560occupied slices and two MULT18x18 embedded multipliers. TheSNR is 40.64 dB. These results show that CORDIC is a fantasticway to implement math operations in fixed point. But the ILUTapproach is even better in many cases.

Linearizing Nonlinear Sensors Nowadays a lot of companies use “smart sensors” in industrialcontrol systems requiring low area, low power consumption andhigh performance, together with the lowest possible cost andreduced development time. A generic smart sensor can be seen asone functional unit composed of a sensor and its signal-condi-tioning circuitry, an analog-to-digital converter (ADC) and anassociated DSP subsystem (with or without an additional embed-ded processor)—all integrated in the same device, as shown inFigure 4.

The smart sensor has to convert physical quantities—forexample, currents in an electric motor—into digital signals thatdigital electronic circuits can process. Certain features of the com-ponents and the technology used to construct these sensors com-monly cause errors such as offset, gain and nonlinearity. Sucherrors produce an overall transfer function that is often not linear.

Frequently, customers correct these errors within the DSP sub-system running in their products. If y=f(x) is the digital outputsignal from the sensor and ADC cascade, the DSP subsystem hasto compensate for a nonlinear function by performing its inverseg(y)=f-1(y), so that the overall output z becomes

which is the equation of a straight line with slope m and verticalintercept b.


ASK FAE -X

Out 1

1

Sub

a

b

a - bz-1

Relational

a

b

a = b

z-0

Reinterpret

reinterpret

Mux

sel

d 0

d 1

z-1

Mult

a

b

(ab )z-3

Delay _ dx

z-2

Delay _ Y 0

z-1

Delay

z-4

Constant 1

1

Constant

0

Add

a

b

a + bz-1

System

Generator

dx

3

Y 1

2

Y 0

1

UFix _17 _17

UFix _17 _17

Y 1

UFix _17 _17Bool

sel

UFix _18 _17

out _mux

UFix _17 _17

UFix _17 _17

UFix _17 _17

UFix _6_0

Fix _23 _23

UFix _6_6

UFix _17 _17

UFix _6_6

UFix _18 _17

Figure 3 – Linear interpolationscheme in System Generator for DSP


The simplest linearization method is the LUT approach, with thesensor calibration points stored in a ROM. However, for a 16-bitADC, the ROM size would be too large, requiring 64 BRAM units.This is where an interpolated LUT is a good solution.

Let’s assume, as an example, that the nonlinear transfer func-tion is a parabola. The next MATLAB fragment code explainshow to generate the m and b parameters of the final straight lineand how to compute g(y), the inverse function of f(x). Figure 5illustrates the three curves in different colors. Note that theinversion process of f(x) can cause missing values in g(y). That’sbecause there can be several points with the same y value corre-sponding to distinct, different x points. Therefore, g(y) has to besmoothed to fill all the possible holes. (For reasons of concise-ness, I haven’t included this part of the operation in the MAT-LAB fragment.)

ny = 18; % 18 bits per word of the ILUT applied for sen-

sor linearization

nx = 16; % number of bits of the sensor ADC output codes

% emulation of sensor nonlinear transfer function

x = -1.5 : 3 / 2^nx : 1.5 -3/2^nx;

y = -0.196 + x + 0.1 * (x.^2);

% parabola y=f(x)

min_x = min(x); max_x = max(x); min_y = min(y); max_y =

max(y);

% histogram stretching of y curve to use all the 2^N

available ADC bits

yy = ((y-min_y) .* (2^nx-1))/(max_y-min_y);

yq = round(yy); % to have a purely integer transfer

function

figure; plot(x, yq, 'k', 'LineWidth',2); title 'parabola

y=f(x)';

% linear regression of y = f(x) to determine the slope and

y-intercept of

% the equation y = m*x + b of a straight line; at the end

we will get:

% m = 2.184494117400677e+004 and b = 2.952424998204837e+004

% with a correlation factor r = 0.997036366735360 (values

close to 1 indicate excellent

% reliability of the linear regression)

%

p = polyfit(x, yq, 1); m = p(1); b = p(2);

% let us plot the line approximating the y curve: z = b +

m * x;

z2 = b + m .* x; % generic straight line defined by

b and m

zq = z2 + abs(z2(1)); % vertically translated to have same

intercept of parabola f(x)

zq = round(zq);

hold on; plot(x, zq, 'g', 'LineWidth',2); grid;

title 'parabola y=f(x) and line y=m*x+b'; hold off;

x_of_y = -1 * ones(2^nx, 1); % memory allocation for g(y)

addr = uint32(yq(1:end));

% this is g(y) reverse function of the non linear sensor

curve y=f(x)

x_of_y(addr(2:end)) = ( (x(2:end)-min_x).*2^ny)/(max_x-

min_x);

x_of_y(1)=0; % addr(1)=0 is forbidden in MATLAB

% x_of_y has to be smoothed before being used (not shown

here)

figure; plot(yq, (x_of_y), 'b', 'LineWidth',2); title

'original x=g(y)'; grid

% reference LUT with values to linearize the sensor non

linear transfer function

refLUT = round( m .* (min_x+ smoothed_x_of_y*(max_x-

min_x)/2^ny) +b +abs(z2(1)) );

ASK FAE -X

x

Smart Sensor

NonlinearSensor

ConditioningCircuitry

ADCDSP Subsystem

z=g(y)

y=f(x) z=mx + b

Figure 4 – Block diagram of a smart sensor

Figure 5 – The parabola emulating the nonlinear sensor transfer functionf(x) is shown in black. The straight green line represents the final sensor

obtained by the linearization DSP subsystem, which applies the inverse g(y) function shown in blue.

Running the fixed-point cycle-based simula-tion in System Generator for DSP, I obtain a92.48-dB SNR on the whole output range ofthe nonlinear sensor, with a design pretty simi-lar to the one reported in Figures 1-3.

Speckle Noise ReductionTracking a target object from a high-speedmoving system—namely, a missile—is a chal-lenging task requiring a highly sophisticatedDSP algorithm and different types of acquisi-tion media, such as synthetic aperture radarsensors. As is typical of all electromagneticcoherent sources (such as lasers), SAR imagingdevices suffer from speckle noise. Therefore, the first stage of anySAR-based DSP chain is a bidimensional (2D) adaptive FIR filterto reduce that noise (it isn’t possible to completely remove it).Figure 6 shows a MATLAB simulation of speckle noise. The noisesynthetically deteriorated the image on the left; the right image isthe output of the 2D FIR filter golden model.

Speckle noise is a multiplicative noise following an exponential dis-tribution and is completely defined by its variance σ. Hence, a wide-ly adopted anti-speckle noise method is the Frost filter, named forauthor V. S. Frost, who wrote a paper on the phenomenon in 1981.In the case of a 3x3 block size, it can be modeled by the formula:

with xij and yij representing the input and output samples, respec-tively, of the Frost filter. K is a gain factor to control the filterstrength (for simplicity’s sake, I assume in the following thatK=1), μ1 and σ are respectively the mean and variance values ofthe 2D kernel, Tij is the matrix of the distance from the centraloutput pixel (of index ij=22) and all the surrounding ones. Thenext equations illustrate that the key factor in implementing sucha filter is R1, the ratio between the momentum of first- and sec-ond-order μ1 and μ2 in the 3x3 block:

R1 has a range between 0 and 1, and I have found experimen-tally that it can be represented with 16 to 20 bits to get goodnumerical accuracy.

Once I designed the stage to compute R1 in System Generatorfor DSP, I decided to implement the normalized filter coefficients

via interpolated LUTs. The content of the LUTs is shown in the fol-lowing MATLAB code:

nb = 10; % number of bits to address the LUT

% input addresses to the LUT

R1 = 0 : 1/(2^nb) : 1-1/(2^nb);

R = 9*R1 - 1; % R = N * R1 -1 with N=9

ind_R = (R<0); R(ind_R)=0; % just to be sure that R >= 0

always

x = R*sqrt(2);

M11_LUT = exp( -x ); % coeff. for diagonal indexes

ij=11, 13, 31, 33

M12_LUT = exp( -R ); % coeff. for vert. & horiz. Indexes

ij=12, 21, 23, 32

tot = 4*(M11_LUT + M12_LUT) +1; % sum of all coeff.,

including index ij=22

norm_M11_LUT = M11_LUT ./tot; % normalized coeff. for diag-

onal (ij=11, 13, 31, 33)

norm_M12_LUT = M12_LUT ./tot; % normalized coeff. for vert.

& horiz. (ij=12, 21, 23, 32)

M22_LUT = 1 ./ tot; % normalized coeff. for cen-

tral position (ij=22)

figure; plot(x, norm_M11_LUT, 'r', x, norm_M12_LUT, 'g', x,

M22_LUT, 'b');

grid; title 'normalized M11 (red), M12 (green), M22 (blue)'

Figure 7 illustrates the curves of the normalized coefficients alongthe R1 input signal. There are only three curves, since the Tij matrixis symmetric around the central pixel of index ij=22. The numericalresults exhibit SNR values from 81.28 to 83.38 dB, depending onthe curve, when compared against purely floating-point referencemodels. For the curious reader, the following MATLAB fragmentcode shows the 2D filter processing (for conciseness, ILUT func-tions are not included).

ASK FAE -X

Figure 6 – Speckle noise affects the image on the left; on the right side is the filtered image.



function [out] = synth_frost3x3(inp) % inp is a vector of 9 samples

mu1=0; mu2=0; for k = 1 : 9

mu1 = mu1 + inp(k); % first order momentum (or mean value)mu2 = mu2 + inp(k)^2; % second order momentum

end

R1 = mu2 / (mu1^2 +1); if (tmp_R1<0) R1=0; end % since R1>=0 always

% in MATLAB array indexing goes from 1 to 1024 (not from 0 to 1023 as in HW)if (R1 >=1024) addr=1024; elseif (R1==0) addr=1; else addr=R1+1; end

% the ILUT functions are not shown hereM11 = M11_ILUT(addr); M12 = M12_ILUT(addr); M22 = M22_ILUT(addr);

reg11 = inp(1) + inp(3) + inp(7) + inp(9); % pre-add pixels of index 11, 13,31, 33reg12 = inp(2) + inp(4) + inp(6) + inp(8); % pre-add pixels of index 12, 21,23, 32out11 = M11 * reg11; % sum of filtered pixels index 11, 13, 31, 33out12 = M12 * reg12; % sum of filtered pixels index 12, 21, 23, 32out22 = M22 * inp(5); % central pixel of index 22 in the 3x3 blockout = out11 + out12 + out22; % filter output pixel

In short, these examples show that interpolated lookup tables are a simple yet pow-erful method for implementing DSP functions in Xilinx FPGAs. They can help youachieve very good numerical accuracy (SNR) and high data rates while keeping areaoccupation relatively low.

AcknowledgementsI would like to thank my colleague Michel Pecot, who first introduced me to the interpolated lookuptable approach with his Gamma Correction design in System Generator for DSP, when I joined Xilinxthree years ago.

Daniele Bagni is a DSP Specialist FAE in the Xilinx Milan sales office in Italy. What Daniele enjoysmost about his job is providing customers with feasibility studies and facing a large variety of DSP appli-cations and problems. In his spare time, he likes to play tennis. At home he loves biking or playing tabletennis with his wife and two sons.

ASK FAE -X

0 2 4 6 8 10 120

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

R1

mag

nitu

denormalized M11 (red), M12 (green), M22 (blu)

Figure 7 – Normalized coefficients along R1 parameter of the speckle noise reduction filter

If you want to get up to speed fast onXilinx® Targeted Design Platforms, don’tmiss X-fest 2009, a 36-city global technicalseminar series offering practical, how-totraining for FPGA, DSP and embedded-sys-tems designers. The events will provideinformation on how to design with theXilinx Virtex®-6 and Spartan®-6 FPGAfamilies, ISE® Design Suite software, Xilinxand third-party IP, evaluation kits and refer-ence designs.

The Avnet Electronics Marketingoperating group and Xilinx havenow opened registration for X-fest2009. Seminars will take place inlocations around the globe begin-ning in October and ending inFebruary 2010. The events sched-uled for North America, Europe andIsrael are listed in the table.

This series of free one-day semi-nars will feature multitrack trainingsessions ranging from building-blockto system-level solutions that use thelatest FPGA technologies. Attendeeswill discover the advantages of thenew Xilinx Targeted DesignPlatforms and their constituentparts, including cutting-edge FPGAdevices, IP and design tools.

You will have access to world-classindustry experts and partner exhibitsfrom Cypress Semiconductor, Intel,Maxim Integrated Products,National Semiconductor, NXP, TexasInstruments and Tyco Electronics.Designers will leave X-fest armedwith the tools and practical knowl-edge you need to tackle existing andfuture design challenges, and extendyour product’s life cycle.

X-fest 2009 will include several interestingcourses. In “Interfacing DDR3 andLPDRAM with the Xilinx Spartan-6 HardMemory Controller,” you will learn aboutthe built-in memory controller in theSpartan-6, its features and how to connectlogic to the hard core to achieve maximumperformance with DDR3. Also, the class willdescribe how to implement the low-powerDDR memory for power-saving applications.

In another course, “High-speedClocking: Challenges, Pitfalls andSolutions,” you’ll learn about the manyon-chip resources of the Spartan-6 andVirtex-6 FPGAs, including the mixed-mode clock managers, phase-lockedloops, digital clock managers, BUFH,GCLK and BUFG. You will learn whenyou should use each of them and why.The class will also review several external

clocking solutions, why they are nec-essary and how they support high-performance sampling systems.

Yet another course, “XilinxNetworking: 10/100/1000 to Real-Time,” aims to help designers tack-ling network connectivity in theembedded realm. Many designers inthis space struggle to satisfy opera-tional and performance goals withintheir cost, power and real estatebudgets. In addition, many network-ing designs have unique demandsthat affect the construction of bothhardware and software platforms.For example, applications such asEtherCAT or SerCOS-III requirespecialized IP or the use of IEEE-1588 components for real-time net-work transactions. This course willexamine multiple networkingoptions, starting from the lowesthardware layers to the top of theTCP/IP protocol stack. It will alsodiscuss the advantages of includingan FPGA in your system design.

For complete information about X-Fest 2009 including full course descrip-tions, locations and registrationinformation, please visit http://www.weboom.com/avnet/index.html.

Sign Up for X-fest 2009 ARE YOU XPER IENCED?

City Delivery Date

Americas Austin, Texas Tuesday, Nov. 3

Baltimore Thursday, Oct. 8

Boston Thursday, Oct. 22

Dallas Tuesday, Oct. 20

Denver Thursday, Nov. 5

Irvine, Calif. Thursday, Oct. 29

Melbourne, Fla. Tuesday, Oct. 20

Minneapolis Tuesday, Oct. 13

San Diego Wednesday, Oct. 28

San Jose, Calif. Tuesday, Oct. 6

Toronto Thursday, Oct. 15

Vancouver, B.C. Wednesday, Nov. 18

Europe, Israel Antwerp, Belgium Thursday, Nov. 26

Århus, Denmark Tuesday, Dec. 1

Leipzig, Germany Tuesday, Nov. 24

Madrid, Spain Thursday, Nov. 19

Milan, Italy Thursday, Nov. 12

Munich, Germany Wednesday, Nov. 11

Oslo, Norway Thursday, Nov. 5

Paris Wednesday, Nov. 18

Silverstone, U.K. Tuesday, Nov. 3

Tel Aviv, Israel Monday, Nov. 9

Warsaw, Poland Tuesday, Nov. 17

Seminars in 36 cities center on Virtex-6, Spartan-6 and Xilinx Targeted Design Platforms.


Hit your marketing target by advertising your product or service in the Xilinx Xcell Journal,you’ll reach thousands of engineers, designers, and engineering managers worldwide!

The Xilinx Xcell Journal is an award-winning publication, dedicated specifically to helping programmable logic users – and it works.

We offer affordable advertising rates and a variety of advertisement sizes to meet any budget!

Call today : (800) 493-5551 or e-mail us at [email protected]

Join the other leaders in our industry and advertise in the Xcell Journal!

Let Xcell Publications help you get your message out to thousands of

programmable logic users.

Get on Target


XAPP1020: Post-Configuration Access to SPI Flash Memory with Virtex-5 FPGAshttp://www.xilinx.com/support/documentation/application_notes/xapp1020.pdf

Virtex®-5 FPGAs support direct configuration from industry-stan-dard Serial Peripheral Interface (SPI) flash memories. After configura-tion, it is possible for your application to read and write to thismemory for general-purpose use. However, before the application cancommunicate with the SPI flash memory, it must first instantiate theSTARTUP_VIRTEX5 primitive to gain access to some of the signalsconnected to the memory. In this application note, Daniel Cherryuses a reference design to explain the techniques you can use to imple-ment the STARTUP_VIRTEX5 primitive and interface it with anexternal SPI flash memory. In particular, this application note walksyou through the instantiation process for implementing the START-UP_VIRTEX5 primitive and a software application to demonstratehow, after you configure it, you can use the SPI flash memory.

The reference design targets the ML505 evaluation board, whichincludes a 32-Mbit Numonyx (formerly STMicroelectronics)M25P32 serial flash memory. For custom applications, we recom-mend you use an SPI flash memory that is supported by theiMPACT configuration software and then confirm device function-ality with iMPACT before testing this reference design.

XAPP1107: Getting Started Using Githttp://www.xilinx.com/support/documentation/application_notes/xapp1107.pdf

Xilinx provides many offerings enabling customers to use Linuxwith Xilinx processor architectures. In addition to obtaining ker-nel sources from supported Xilinx third-party partners, you canalso download kernel sources from the Xililnx Linux Git Tree, a

distributed version control system commonly used for distribut-ing Linux kernel sources.

In this application note, Kris Chaplin describes a way to set up auser environment for building Linux kernels using the Xilinx Gittree. To effectively use this application note, you must be running aLinux operating system that is compatible with the Xilinx ISE®

Design Suite and EDK tools. The processes in this documentationare based on running Red Hat Linux 4 with the EDK and the ISEDesign Suite 10.1 or newer. You must have the Xilinx EDK and ISEDesign Suite for generating custom board hardware images.

XAPP875: Dynamically Programmable DRU for High-Speed Serial I/Ohttp://www.xilinx.com/support/documentation/application_notes/xapp875.pdf

In this application note, authors Paolo Novellini and GiovanniGuasti show how to implement a noninteger data recovery unit (NI-DRU) in your Virtex-5 LXT, SXT, TXT and FXT FPGA-baseddesigns to extend the data rate limit of RocketIO GTP and GTXtransceivers in those FPGAs.

The NI-DRU extends the lower data rate limit to 0 Mbits/sec-ond and the upper limit to 1,250 Mbits/s, making embedded high-speed transceivers the ideal solution for true multirate serial interfaces.You can dynamically program the NI-DRU’s operational settings (datarate, jitter bandwidth, input ppm range and jitter peaking), thus avoid-ing the need for bitstream reload or partial reconfiguration. Operatingon a synchronous external reference clock, the NI-DRU supports frac-tional oversampling ratios. As such, you need only one BUFG, inde-pendent of the number of channels you are setting up, even if allchannels are operating at different data rates.

Given the absence of a relationship between the reference clockand incoming data rate, two optional barrel shifters ease the inter-


Application NotesApplication NotesIf you want to do a bit more reading about how our FPGAs lend themselves to a broad number of applications, we recommend these notes. If you want to do a bit more reading about how our FPGAs lend themselvesto a broad number of applications, we recommend these notes.

XAMPLES . . .

facing of the NI-DRU with an external FIFO or with any requireddecoder. The first barrel shifter has a 10-bit output that you can eas-ily couple to an 8B/10B or 4B/5B decoder (neither of which isincluded in the reference design). The second barrel shifter has a 16-bit output, and Xililnx specifically designed it for 8-bit protocolssuch as Sonet/SDH. You can also design your own barrel shifters.

The authors divide the application note into three main parts:the NI-DRU usage model, simulating the NI-DRU and testing theNI-DRU on the ML523 RocketIO™ transceiver characterizationplatform, revision C or higher. In the NI-DRU usage model section,the authors describe in detail a block diagram of the NI-DRU, inwhich they calculate the overall transfer function of the DRU as afunction of all the hardware settings.

XAPP1110: BFM Simulation of an EDK System Which Uses the PLBv46 Endpoint Bridge for PCI Expresshttp://www.xilinx.com/support/documentation/application_notes/xapp1110.pdf

In this application note, Lester Sanders and Mark Sasten demon-strate how to run a simulation using IBM CoreConnect BusFunctional Language (BFL) commands in an EDK system contain-ing the PLBv46 Endpoint Bridge for PCI Express®. The simulationconsists of a PCIe® downstream port model communicating over aPCIe link to an EDK system that contains the PLBv46 EndpointBridge for PCI Express. The bridge uses the Xilinx Block PlusEndpoint core for PCI Express in the Virtex-5 FPGA. A bus func-tional model (BFM) drives the EDK system.

Xilinx provides a simulation environment based on a downstreamport model that has a test program interface. You can build thisdownstream port model with the Xilinx CORE Generator™ toolusing prewritten programs and Verilog tasks to generate transaction-layer packets. In the note, Sanders and Sasten show how to set up thesimulation and the steps you can use to run the system simulationwith BFL commands. The note also provides example stimuli (specif-ically, root-complex-to-endpoint and endpoint-to-root-complextransactions) to test the PLBv46 Endpoint Bridge using the EDKsystem. The authors then show how to analyze the results from thesetests in the waveform viewer.

This application note is a companion note to XAPP1111,“Simulation of an EDK System Which Uses the PLBv46 EndpointBridge for PCI Express.”

XAPP1111: Simulation of an EDK System Which Uses the PLBv46 Endpoint Bridge for PCI Expresshttp://www.xilinx.com/support/documentation/application_notes/xapp1111.pdf

In this companion note to XAPP1110, Lester Sanders shows how torun a simulation of an EDK system containing the PLBv46Endpoint Bridge for the PCI Express core. Where application noteXAPP1110 describes a bus functional model driving the simulation,this note shows how to use C code to do it. In particular, the simu-lation consists of a PCIe downstream port model communicating

over a PCIe bus to an EDK system containing the PLBv46Endpoint Bridge for PCI Express. You can build the downstreamport model using the Xilinx CORE Generator tool. The PLBv46Endpoint Bridge uses the Xilinx Block Plus Endpoint core for PCIExpress in the Virtex-5 FPGA. C code running on the PowerPC®

440 drives the EDK system.This application note is a companion note to XAPP1110, “BFM

Simulation of an EDK System Which Uses the PLBv46 EndpointBridge for PCI Express.”

XAPP1014: Audio/Video Connectivity Solutions for Virtex-5 FPGAs:Reference Designs for the Broadcast Industry, Volume 2http://www.xilinx.com/support/documentation/application_notes/xapp1014.pdf

This comprehensive, 558-page guide describes how you can useVirtex-5 FPGAs to implement various serial digital video interfacescommonly employed in the professional video broadcast industry.After several chapters introducing the various SMPTE standards,the guide delves into such topics as multirate SD/HD/3G-SDIusing Virtex-5 FPGA RocketIO transceivers; SD-SDI using Virtex-5 FPGA SelectIO™ LVDS; DVB-ASI using Virtex-5 FPGASelectIO LVDS, and AES Digital Audio. A final section discussesmiscellaneous audio and video topics.

XAPP1129: Integrating an EDK Custom Peripheral with a LocalLink Interface into Linuxhttp://www.xilinx.com/support/documentation/application_notes/xapp1129.pdf

In this application note, Brian Hill discusses using a LocalLinkDMA peripheral with the Linux operating system, outlining thesteps and methodology you will need to use a custom LocalLinkscatter-gather DMA (SGDMA) core with Linux. The note providesa LocalLink loopback core with a Linux driver. Hill describes thedriver design and operation at length.

XAPP1130: Architecting ARINC 664, Part 7 (AFDX) Solutionshttp://www.xilinx.com/support/documentation/application_notes/xapp1130.pdf

Each new generation of commercial aircraft has grown more com-plex, especially with the heavy reliance on fly-by-wire and the asso-ciated avionics. As more electronic systems are designed intoairframes, traditional point-to-point wiring schemes are no longerpractical. The designers of the Airbus A380 searched for a solutionto reduce the amount of wiring, increase bandwidth and make useof commercial off-the-shelf (COTS) technology where possible.ARINC Specification 664, Part 7 is the result of that search.

In this application note, Ian Land and Jeff Elliott present adetailed overview of the architecture and function of avionics full-duplex switched Ethernet (AFDX), as defined in ARINC 664, Part7. In addition, the authors provide a detailed description of howyou can map various functional blocks required for an AFDX endsystem to both the Virtex®-4 and Virtex-5.


XAMPLES . . .

by Mike SantariniPublisher, Xcell JournalXilinx, [email protected]

After spending his entire adult life workingas an engineer, including stints in a numberof small electronics companies in New YorkState and the last 24 years at EastmanKodak Co., in Rochester, N.Y., Dick Coreyretired three months ago, at the age of 62.He is already getting antsy, missing thetrade, the lab, and is thinking of new designprojects—and a return to the workforce.

At Kodak, Corey primarily worked onadvanced research-and-development proj-ects, with stints in product design. “Mostof those were low-volume, large-equipmentprojects as opposed to consumer electron-ics, which Kodak is most noted for,” saidCorey. “I’m kind of a generalist. I do ana-log, digital and a bunch of software, usual-ly to support my own hardware projects.”

Corey comes from what he describes as“a nontraditional background” for some-one who spent the majority of his career inadvanced R&D. “I have an associate’sdegree from Rochester Institute ofTechnology,” he said. “That used to botherme, but about 10 years ago I passed thepoint where I needed to worry aboutwhether I had a bachelor’s or not.”

Among the many projects Corey workedon in more than two decades at Kodak, hesaid one of the most notable was a parallel-processor array system for the Transputer,the groundbreaking processor built by theBritish firm INMOS. “It was just about thefirst microprocessor developed for parallel-processing use,” said Corey. “We built thesehuge arrays of processors so they couldcrunch on images for motion picture digitalediting in real time.”


Dick Corey Has the Engineering BugForget the golf clubs. Spartan-3A development board is diversion of choice for one newly retired EE.

PROF I LES OF XCELLENCE

One problem was that at the time theTransputer was designed, in the 1980s,“we had no good way to get the data outto a display,” he said. “So while the pro-cessing was going on in real time, theoperator was waiting for display updatesto come across the Sun VME bus. So Icame up with a frame store architecturethat was itself based on Transputers inconjunction with a large XC4000-seriesFPGA. It fit into the mesh of Transputers[designated] for communications, andthen it delivered frame rate data out to adelivery device. We could configure hugearrays of these frame stores, with each onehandling a portion of the display window,and then stitch the tiles all together invideo on its way to the display.” In thisway, said Corey, “we could do real-timedisplay of these really huge images that themotion picture people were working on.”

Corey has also worked on the develop-ment of several digital-cinema projectorproof-of-concept projects. In the earlyyears, he did a lot of VME projects andcreated frame stores for both video andprint applications. In the course of this var-ied work life, Corey has used every genera-tion of Xilinx® FPGA, from the XC2064,Xilinx’s first device, to the Virtex®-5.

“To many people, the FPGA hasmorphed into something that’s really anASIC that you can reconfigure,” saidCorey. “When they first came out, FPGAsweren’t anything like an ASIC, but I’vealways found FPGAs to be devices that aregreat for rapid prototyping and low-volumemanufacturing in all kinds of applications.”

Although “a sizable part of the engineer-ing segment is of the opinion that an FPGAis just a small ASIC,” Corey perceives “someconsiderable differences, especially from thestandpoint of how you design it. The penal-ty for making a mistake in an FPGA designis minor compared with making a mistakein a huge ASIC. For that reason, it is OK totake more risks when designing with anFPGA, especially if you are doing researchprojects or advanced development.”

Corey was at Kodak roughly a year orso when he started using the XC2064.“That would have been in 1985 or 1986,”he said. “After using that first FPGA, I got

self busy. Not surprisingly for the consum-mate engineer, they all had some elementof construction and building. He’s current-ly doing a remodeling project on his homebut said that while ticking off jobs donefrom the list, he’s also been thinking of newelectronics design projects and is stronglyconsidering turning his “retirement” intomerely a nice long vacation.

“I’m one of those people who is afflict-ed with the engineering disease,” said

Corey. “I really love engineering. It has tobe the world’s greatest job. Unfortunately,it’s becoming more and more difficult tojust be an engineer. A lot of companieswant their engineers to become IP writers,and I find that stupefyingly boring. I’m a‘design it, simulate it, test it’ sort of engi-neer. Let other folks write about the IP.”

In particular, Corey said he’s interestedin further examining what he could dowith systems that incorporate multiplesoft cores. “The last few projects I wasworking on incorporated MicroBlaze™processor cores inside a Virtex-5 usingPlatform Studio,” said Corey. “I was doingsome neat stuff with MicroBlaze, and soone of the things I want to do is experi-ment with using multiple PicoBlaze™cores on an FPGA for some machine con-trol ideas I have. Avnet makes a very nice,low-cost Spartan®-3A development boardand I’m going to buy one of those.”

Asked if he has any other more-tradi-tional retirement diversions in mind,Corey said he occasionally goes withfriends to the shooting range but doesn’t,at least yet, like the idea of spending a fullafternoon doing the usual activities, suchas golf or travel.

“Why would anyone waste an entireafternoon playing golf when they could beout building something?” said Corey.

to the point where I never wanted to usehard logic again. I would go to greatlengths to incorporate it straight into theFPGA. These days if I need DACs or A/Dconverters, I usually roll a sigma-deltaimplementation right into the FPGA. I’veeven done a phase-frequency comparatorfor a phase-frequency synthesizer in one ofthe old 3000- and then 4000-series parts.”

As a longtime user of Xilinx FPGAs, he’switnessed firsthand the evolution of not

only the devices but the methodologies toprogram them.

“Back in the days of the XC2064, youwent into FPGA Editor and fiddledaround by turning all the switches,” saidCorey. “There was no real front-enddesign software whatsoever, but thankful-ly that’s not the case now. Today I preferto drive the design process from a highlevel. You can cause a lot more bits to gettoggled [that way].”

Over his many years in engineering, hehas learned a number of programming lan-guages, including Perl, Forth, C, VisualBasic and VHDL. “But I’ve always pre-ferred GUI-based design, vs. hard coding,”said Corey. “I think there still are a numberof us in that arena.” Nevertheless, Coreysaid that one of the many projects he’s slat-ed during his retirement is to become morefamiliar with Verilog. “I dabbled withVerilog a bit, but for the most part, I’vebeen a VHDL user,” said Corey, who notesthat a fundamental part of being in theelectronics engineering profession is a pas-sion to learn new things, evolve your tradeand embrace new technologies.

“It’s one of the reasons FPGAs and Ihave become so fond of each other over theyears,” he said. “They keep evolving, too.”

When he decided to retire in April,Corey penned a list of projects to keep him-


PROF I LES OF XCELLENCE

‘The penalty for making a mistake in an FPGA designis minor compared with making a mistake in a huge

ASIC. For that reason, it is OK to take more risks when designing with an FPGA.’

Frank TornaghiSenior Vice President of Worldwide SalesXilinx, [email protected]

Over my many years in the semiconductorbusiness, including a long stint sellingASICs, I’ve witnessed firsthand the remark-able evolution of the FPGA from a neat lit-tle novelty device that ASIC teams used inthe 11th hour of design projects for fixingproblems in their ASICs, to a system-on-chip that today has a much stronger overallvalue proposition than any other logicdevice, ASIC and ASSP included.

Ever since Xilinx invented the firstFPGA 25 years ago, FPGAs have radicallygrown in performance, capacity and func-tionality, displacing ASICs and ASSPs in agrowing number of designs targeting thewired and wireless communications, auto-motive, ISM (industrial, scientific and med-ical) and aerospace-and-defense markets.

An ever-increasing number of design-ers are coming to realize that FPGAs offerthe greatest mix of flexibility, time-to-market and overall cost savings of anylogic device. If you make a mistake inyour design or have to add last-minutefunctionality, you can simply modify yourdesign, reprogram the FPGA and test it inyour system immediately.

While the FPGA’s value proposition overASICs and ASSPs will only become moreapparent to a wider number of customers asprocess technologies advance, Xilinx ismaking a concerted effort to further thatadvantage by launching Targeted DesignPlatforms. In addition to offering the best

FPGA silicon on the market withour high-performance, 40-nanometerVirtex®-6 FPGAs and low-cost, 45-nmSpartan®-6 FPGAs, we’re also offering youaccess to domain- and market-specific IP,tools, development boards and referencedesigns to help you get your differentiateddesigns to market quickly.

We created the Xilinx Targeted DesignPlatform concept to better serve your needs,so that if you are creating a design that’s tar-geting a new standard or even entering a newmarket, the plan is to put all the elements inplace to help you succeed.

If, for example, your next design is target-ing the 40G/100G communications market,using our Targeted Design Platform you willnot only have in your arsenal the most sophis-ticated, highest-transceiver-rate FPGAs on themarket—offered in our 40-nm, Virtex-6 HXTdevices, which have up to 64 transceivers, eachoperating at 11.2 Gbits/second—you will alsohave access to a comprehensive library ofXilinx and Xilinx-partner connectivity IP, pro-viding you with key pieces of intellectual prop-erty required specifically for 40G/100Gcommunications designs.

You will be able to implement this mar-ket-specific IP, our domain-specific platformIP and our base platform IP to quickly assem-ble a high percentage of your design, so thatyou can focus your efforts on the veneer ofdifferentiation to maximize your success in

your market. What’s more, we offer a familyof evaluation kits (base, domain- and mar-ket-specific kits) and reference designs tohelp you get up and designing quickly.

If you are an engineer in the embedded-software domain, we offer the domain-spe-cific tools as well as microprocessor andsubsystem IP you’ll need to quickly pro-duce embedded innovations in an environ-ment that’s familiar to you. Xilinx remainscommitted to embedded processing, lever-aging our current family of microprocessorcores, with more road map announcementsto come very soon.

If you are an algorithm developer in theDSP domain, we offer the domain-specif-ic tools and IP you’ll need to implementyour algorithms in our DSP-slice-richSpartan-6 and Virtex-6 FPGAs. What’smore, you will do so in an environmentthat’s familiar to you.

In fact, I believe every single one of ourcustomers will see great benefits and reachnew levels of productivity with our TargetedDesign Platform offerings. Over its 25 yearsof existence, Xilinx has taken great pride inworking with you, our customers, in creat-ing new, exciting innovations. WithTargeted Design Platforms, I believe we cantake your innovations to new heights.


Targeted Design Platforms TakeFPGA Innovation to New HeightsNew device families and accompanying IP, tools and development boards will help you get differentiated designs to market faster.

XPECTAT IONS

© Copyright 2009 Xilinx, Inc. XILINX, the Xilinx logo, Virtex, and Spartan, are trademarks of Xilinx in the United States and other countries. All other trademarks are the property of their respective owners.

New Xilinx Targeted Design Platforms give designers a boost to takeinnovation to a much higher level. These comprehensive platforms combine

Virtex®-6 or Spartan®-6 devices, the ISE Design Suite 11, development hardware, IP,

reference designs, documentation, and service and support—so you’re way ahead of the

competition before you even start. Plus, everything works together seamlessly, so design

teams can focus on product differentiation, add more value, and make the whole project

more successful. Learn more now at www.xilinx.com/6.

RISE ABOVE

PN 2400

Documents

Xcell Journal: Issue 68 - Xilinx - All Programmable · · 2018-01-09a formulation that Cal Tech professor Carver Mead famously dubbed “Moore’s Law.” ... The exorbitant cost,