
Issue 66, Fourth Quarter 2008

Xcell journal
SOLUTIONS FOR A PROGRAMMABLE WORLD

INSIDE

Algorithm Developers Power New DA System on Xilinx Automotive FPGA Platform

Engineer Turns Blown Engine into Hot Startup

How to Beat Your Son at Guitar Hero Using Xilinx FPGA

Tips and Tricks for Using FPGA Editor and SystemVerilog


www.xilinx.com/xcell/

Automotive Innovators Hit Top Gear in Driver Assistance with FPGA Platforms

Support Across The Board.™

Technical Training Brings New Products to Life

Engineering teams from Avnet Electronics Marketing and Xilinx spend countless hours crafting technical training programs tailored to meet an array of customer needs. With training opportunities that run the gamut from full-day, intensive design courses to short video demos of new tools, Avnet maximizes the time developers spend exploring new products and technologies.

SpeedWay Design Workshops™

Participate in hands-on training classes that feature technical presentations from factory-trained Field Application Engineers (FAEs), along with hardware-based lab exercises that spotlight time-saving development tools. Workshop topics include general, embedded, DSP and serial I/O design. Attendees are eligible for discounted pricing on the featured development tools.

Learn more about available Xilinx SpeedWay Design Workshops at www.em.avnet.com/xilinxspeedway

» An Introduction to Xilinx® Embedded Development Kit (EDK) and the PowerPC® 440: Part 1 & 2

» An Introduction to Xilinx® EDK and MicroBlaze™: Part 1 & 2

» Creating FPGA-Based DSP Co-Processors

On-Ramp Technical Sessions™

Engage in two-hour technical sessions, typically held over lunch at the customer’s facility, that meld tool demonstrations with instruction on specific technologies or design topics. Numerous technical sessions featuring Spartan® and Virtex® FPGAs are now available.

Learn more about available Xilinx On-Ramp Technical Sessions at www.em.avnet.com/xilinxonramp

» Xilinx® Spartan®-3A Configuration

» Xilinx® Virtex®-5 FXT PowerPC® Processor

» Embedded Processor Design Using the Project Navigator Design Flow

Behind the Wheel Video Demos

View short but detailed video demos of new development tools engineered by Avnet, which highlight the tools’ features and capabilities.

Learn more about Behind the Wheel demos at www.em.avnet.com/drc

FEATURED KITS

» NEW! Xilinx® Virtex®-5 FXT Evaluation Kit

» NEW! Xilinx® Spartan®-3A Evaluation Kit

1.800.332.8638

www.em.avnet.com

Copyright© 2008, Avnet, Inc. All rights reserved. AVNET and the AV logo are registered trademarks of Avnet, Inc. All other brands are the property of their respective owners.

LETTER FROM THE PUBLISHER

Xilinx, Inc.
2100 Logic Drive
San Jose, CA 95124-3400
Phone: 408-559-7778
FAX: 408-879-4780
www.xilinx.com/xcell/

© 2008 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.

The articles, information, and other materials included in this issue are provided solely for the convenience of our readers. Xilinx makes no warranties, express, implied, statutory, or otherwise, and accepts no liability with respect to any such articles, information, or other materials or their use, and any use thereof is solely at the risk of the user. Any person or entity using such information in any way releases and waives any claim it might have against Xilinx for any loss, damage, or expense caused thereby.

Mike Santarini
Publisher

PUBLISHER Mike Santarini [email protected]

EDITOR Jacqueline Damian

ART DIRECTOR Scott Blair

DESIGN/PRODUCTION Teie, Gelwicks & Associates 1-800-493-5551

ADVERTISING SALES Dan [email protected]

INTERNATIONAL Melissa Zhang, Asia [email protected]

Christelle Moraga, Europe/Middle East/[email protected]

Yumi Homura, [email protected]

SUBSCRIPTIONS All Inquiries www.xcellpublications.com

REPRINT ORDERS 1-800-493-5551


Happy 20th Anniversary, Xcell Journal Readers!

Twenty years ago this quarter, an applications engineer who had just joined Xilinx from AMD, with prior stints at Zilog and Fairchild, started a technical publication for Xilinx customers. He named it Xcell, “The Newsletter for Xilinx Programmable Gate Array Users.”

Two decades later, that Xilinx legend, Peter Alfke, says he fashioned Xcell Journal after Fairchild’s now-defunct magazine Progress. “In those days, FPGAs were a really new and unconventional technology, and we wanted to tell designers how to best use them,” said Alfke. “We used to issue Data Books once a year; this was, of course, before the Internet. So we decided to make Xcell Journal a quarterly applications update with a lot of technical detail, how-to content and innovative ideas, as well as silicon, tools and IP availability information.”

Peter and his daughter Karen, at the time a Berkeley student, created the debut issue of Xcell in the fourth quarter of 1988. “She brought her Mac down to the office and she did the typesetting and layout for the first five issues,” said Alfke.

The lead story reported on the company’s new Data Book, which contained complete data sheets for the XC2000 and XC3000 device families and for a military-grade version of the XC2000. Issue No. 1 also featured an article on DOS (“All DOS Are Not Created Equal”) and several pieces on the XACT FPGA design tool, the ISE® of its day. One story described XACT as “a large, demanding program, using interactive graphics, requiring megabytes of RAM,” a behemoth that pushed the IBM-PC, which at the time could address only 640 kbytes, “into uncharted water.”

These older issues are fun to read, and not just for the sake of nostalgia. Alfke, still at Xilinx and still involved in applications and technical documentation, points out that some of the content remains fresh and somewhat useful today. So, instead of blabbing about how Xcell has evolved over time, I’d like to honor the guy who started an amazing legacy that we hope to continue for at least another 20 years. Below are Peter’s Picks of the best of the oldies but goodies from issues 1 through 28.

Pack rats will have no problem laying hands on them. For anyone who didn’t save every issue that came your way, we’re in the process of placing the content online. You’ll soon be able to find back issues at the Xcell Journal Archives page. In the meantime, if you want a specific issue, e-mail me at [email protected] with the heading “Xcell back issue request,” and I’ll send you an electronic copy.

Peter’s Picks

#11, 4Q93, page 31: “Reduce SPROM Standby Current to Zero” by grounding through LDC

#13, 2Q94, page 25: “Carry and Overflow: A Short Tutorial”

#17, 2Q95, page 30: “Manchester Decoder in Three CLBs”

#18, 3Q95, page 30: “Overshoot and Undershoot”

#18, 3Q95, page 36: “Hold is a Four-Letter Word”—hold-time, that is

#19, 4Q95, page 34: “User-Defined Schmitt Trigger” with two pins, two resistors

#21, 2Q96, page 35: “10-Digit Fully Synchronous BCD Counter @ 87 MHz”

#21, 2Q96, page 40: “A Look at Minimum Delays,” and why they are so elusive

#22, 3Q96, page 28: “Power, Package and Performance” and how to trade off among them

#24, 1Q97, page 20: “Trouble-Free Switching Between Clocks,” with no glitches

#24, 1Q97, page 21: “Demultiplexing 200-MHz Data Streams”

#27, 4Q97, page 27: “Reduce EMI with a Spread-Spectrum Clock”

#27, 4Q97, page 28: “The Dangers of Hot Plug-In”

#28, 1Q98, page 22: “PC-Board Design Considerations”

#28, 1Q98, page 28: “Self-Initiated Global Reset”

#28, 1Q98, page 29: “CMOS I/O Characteristics”

#28, 1Q98, page 33: “Low-Power XC4002XL Achieves 400-MHz Performance” in a self-contained frequency counter


Fastest FPGA Debug Available

Agilent Logic Analyzers (up to 1.2 GHz timing, 667 MHz state, and 256 M deep memory)

OR

Agilent Mixed Signal Oscilloscopes (4 scope channels + 16 timing channels)

+

Agilent FPGA Dynamic Probe (application software to increase visibility inside your FPGA)

Start saving time now. Download FREE application information. www.agilent.com/find/fpgatools

CONTENTS

FOURTH QUARTER 2008, ISSUE 66

Cover Story
Driver Assistance Revs Up on Xilinx FPGA Platforms...8

VIEWPOINTS

Letter from the Publisher: Happy 20th Anniversary, Xcell Journal Readers!...4

Xpert Opinion: Driver Assistance Systems Pose FPGA Opportunities...16

Xpectations: FPGA Platforms: Silicon Was Just the Beginning...66

XCELLENCE BY DESIGN APPLICATION FEATURES

Xcellence in Automotive & ISM: Building Automotive Driver Assistance System Algorithms with Xilinx FPGA Platforms...20

Xcellence in Automotive & ISM: Security Video Analytics on Xilinx Spartan-3A DSP...28

Xcellence in New Applications: A/V Monitoring System Rides Virtex-5...34

THE XILINX XPERIENCE FEATURES

Xperts Corner: Verifying Xilinx FPGAs the Modern Way: SystemVerilog...40

Xplanation: FPGA 101: Extend the PowerPC Instruction Set for Complex-Number Arithmetic...44

Xperiment: Using a Xilinx FPGA to Beat Your Son at Guitar Hero...48

Ask FAE-X: Digital Duct Tape with FPGA Editor...54

Profiles of Xcellence: Engineer Turns Blow-up Into Hot Automotive Electronics Startup...62

XTRA READING

Xamples: A mix of new and popular application notes...58

Are You Xperienced? Embedded for success...59

Xtra, Xtra: The latest Xilinx tool updates and patches, as of Sept. 30, 2008...60

Tools of Xcellence: A bit of news about our partners and their latest offerings...64

8 Xcell Journal Fourth Quarter 2008

Driver Assistance Revs Up on Xilinx FPGA Platforms

COVER STORY

Automotive driver aid systems are rapidly evolving, thanks to ingenious engineering and programmable platforms.

by Mike Santarini
Publisher, Xcell Journal
Xilinx, [email protected]

It’s widely known that the use of automotive safety systems (seatbelts, followed by front-facing airbags, seatbelt pre-tensioners, antilock brakes and side airbags) has dramatically reduced injuries and lowered the fatality rate in vehicular accidents over the last 50 years. But now carmakers are going a step further, cranking up innovation in a relatively new class of system called driver assistance. DA stands poised to revolutionize the driving experience even as it further improves safety. And FPGA platforms (tools, IP, as well as silicon) are playing a key role in making it happen.

“Driver assistance systems are systems companies are putting on vehicles to help make drivers better drivers,” said Paul Zoratti, automotive system architect and DA specialist at Xilinx. “DA systems provide drivers with information that either they look for or that is ‘pushed’ to them, in the form of a warning, to make driving safer and help drivers make informed choices about driving in all conditions.”

Thanks largely to advances in electronics and the OEM innovations they have enabled, what DA systems can do now is pretty remarkable; what they may do in the future is amazing.

Colin Barnden, Semicast’s principal analyst covering electronics in the automotive market, said that car manufacturers, their tier-one suppliers and academia have been researching DA systems for many decades. But only over the last 10 years have electronic systems and design techniques advanced to the point where OEMs can readily and feasibly deploy them. “It’s amazing how far DA systems have come in such a short amount of time,” he said (see Barnden’s Xcellent Opinion column in this issue).

There are many types of DA systems that OEMs, their suppliers and even aftermarket electronic control unit (ECU) makers are offering today and designing for tomorrow (see Figure 1). DA systems can help drivers park their cars effectively and maintain a safe distance from cars ahead. They can inform drivers about threats that perhaps they would otherwise not see, and aid them in safely changing lanes.

Many of these systems started out just a few years ago as fairly simple technologies. But OEMs have rapidly devised more-sophisticated spins and are now in the process of integrating them into “sensor fusion technologies”: essentially, the merger of multiple DA systems, all operating independently off the output of the same, shared sensors. The goal is to supply drivers with more-accurate information about their surroundings, to help them make informed driving decisions and enrich the driving experience. In doing so, the systems require sophisticated compute technology, and a lot of it. That’s where FPGA platforms are starting to make a strong play.

Rapid Evolution of Parking with DA Systems

Around 10 years ago, OEMs launched their first foray into DA with convenience systems such as one for use when backing up. A back-up aid system is essentially a series of sensors in a car’s rear bumper that send an ultrasonic or radar signal to measure the distance to an object behind the vehicle. As drivers back up, the sensors typically trigger an audible beeping that increases in frequency as the vehicle nears the obstacle. The signal becomes a constant tone when the driver backs up to within four inches of the obstruction.

“It’s a fantastic feature if you are driving a big vehicle such as a pickup and you want to know how close you are to another vehicle,” said Barnden. “But it’s the most basic system; there really isn’t any intelligence built into it. The driver is doing pretty much all the work. The system tells you if you are nearing something, but you ultimately decide if you want to get a bit closer or want to stop.”

Back-up aid systems are the most commonly deployed, highest-volume form of DA that OEMs offer today. In 2007, Barnden said, carmakers put back-up aid systems in 5 million vehicles in Europe and 2 million in the United States, shipping a total of 10 million units worldwide (roughly one in six cars built in 2007 has the feature). “I expect that number will grow in all regions because it’s useful, fairly simple and inexpensive,” said Barnden.
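The back-up aid behavior described above (beeps that speed up as the obstacle nears, then a constant tone within four inches) can be sketched as a mapping from measured distance to beep interval. This is only an illustration: the 60-inch detection range, the interval bounds and the name `beep_interval` are assumptions, not values from any production system.

```python
from typing import Optional

CONSTANT_TONE_IN = 4.0   # from the article: constant tone within four inches
MAX_RANGE_IN = 60.0      # assumed detection range (hypothetical)
SLOWEST_BEEP_S = 1.0     # assumed pause between beeps at maximum range
FASTEST_BEEP_S = 0.1     # assumed pause just outside the constant-tone zone


def beep_interval(distance_in: float) -> Optional[float]:
    """Return seconds between beeps, 0.0 for a constant tone, None for silence."""
    if distance_in > MAX_RANGE_IN:
        return None          # nothing in range: stay silent
    if distance_in <= CONSTANT_TONE_IN:
        return 0.0           # too close: constant tone
    # Linear interpolation: closer obstacle -> shorter pause -> faster beeping.
    frac = (distance_in - CONSTANT_TONE_IN) / (MAX_RANGE_IN - CONSTANT_TONE_IN)
    return FASTEST_BEEP_S + frac * (SLOWEST_BEEP_S - FASTEST_BEEP_S)
```

As Barnden notes, there is no intelligence here: the system only reports distance, and the driver decides what to do with it.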

OEMs then took that technology a step further, adding intelligence to create a DA system called park assist. A camera situated somewhere in or near the vehicle’s rear license plate connects to the navigation display system in front of the driver’s seat. When drivers put the car in reverse, the navigation system automatically activates the camera so they can literally see how close they are to objects behind them. Some systems do some basic image recognition to put visual cues onto the screen so drivers know when to turn the wheel or how far they are from an obstruction.

Barnden noted that five years ago, OEMs such as Mercedes and BMW started offering park assist systems on their highest-end models. Today, the technology is showing up in more-mainstream vehicles, especially SUVs, where owners may find it difficult to locate obstacles by glancing over their shoulders or looking in their rearview mirrors. Barnden said OEMs sold a total of 1 million units with park assist systems in 2007.

The next evolution is automated parking. Lexus was the first to spin this type of DA system a few years ago, when the luxury carmaker made an automated parking system available on its LS-series vehicles.

In an automated parking system, several near-range sensors, some ultrasonic and some cameras, are located all around the vehicle. As drivers cruise down the road, the system looks at gaps between parked cars and reports whether the space is big enough to get into. What’s more, the system goes even further. At the driver’s ultimate discretion and direction, it will take over the task altogether and automatically park the vehicle.

“You push a button and the car will actually steer itself into the space,” said Barnden. “It’s absolutely crazy. You would think a car that could park itself would be 20 or 25 years away, but it’s actually being done by OEMs today.”

A park assist system makes measurements in 360 degrees and instantly calculates how far it is from other objects on every side. “And the system knows the mechanics of parking, because the designers have programmed the algorithms for parking into the system. So it knows what directions it can turn the wheel, what direction it can move the front relative to the back of the vehicle and all those sorts of parameters,” said Barnden.
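The gap check that an automated parking system performs while cruising past parked cars can be sketched roughly as follows. The sample format (position along the road, side distance), the 2-meter free-depth threshold, the 1-meter maneuvering margin and both function names are hypothetical choices for illustration; the article does not describe how any OEM implements this step.

```python
from typing import List, Tuple

Sample = Tuple[float, float]   # (position along the road in m, side distance in m)
Gap = Tuple[float, float]      # (start position in m, end position in m)


def find_gaps(samples: List[Sample], free_depth_m: float = 2.0) -> List[Gap]:
    """Return stretches where the side sensor sees at least free_depth_m of room."""
    gaps: List[Gap] = []
    start = None
    for pos, depth in samples:
        if depth >= free_depth_m:
            if start is None:
                start = pos              # a free stretch begins here
        elif start is not None:
            gaps.append((start, pos))    # a parked car ends the free stretch
            start = None
    if start is not None:                # still free at the last sample
        gaps.append((start, samples[-1][0]))
    return gaps


def parkable(gaps: List[Gap], car_length_m: float, margin_m: float = 1.0) -> List[Gap]:
    """Keep only gaps long enough for the car plus an assumed maneuvering margin."""
    return [g for g in gaps if g[1] - g[0] >= car_length_m + margin_m]
```

A real system would also have to plan the steering path into the space, which is the "mechanics of parking" Barnden mentions.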

OEMs in Japan are three to five years ahead of the rest of the world in this technology, but European carmakers are starting to introduce park assist as well, said Barnden. “It isn’t a feature that is too common in the U.S. yet, but it definitely will become more common,” he said. From its genesis in high-end vehicles, park assist is trickling down to smaller Japanese and European cars, and is proving extremely popular in cities, where parking spaces are difficult to find and tight to negotiate.

“What’s even more remarkable is that park assist systems are unbelievably reliable and incredibly precise,” said Barnden. And it’s likely that as the technology becomes more common, owners will grow to trust that the system is able to consistently park more precisely than they can.

“It’s often the case that if you give a computer just one task to do, it can typically do that task better than we can,” said Barnden. Nevertheless, one of the key attributes of DA is that the driver has the ultimate say in controlling the vehicle and can override the automated system.

Rapid Evolution of Cruise Control

Another segment of DA that has advanced rapidly over the last 10 years is cruise control.

OEMs have offered standard cruise control for several decades, but those systems were not intelligent. With traditional cruise control, users push a button to set the speed of their automobile. However, if drivers are not paying attention, their car will maintain that speed and possibly veer off the road or run into something straight ahead.

[Figure 1 diagram: “Driver Assistance Features,” grouped as convenience oriented or safety oriented and as awareness warning, temporary control or performance enhancement. Features shown: lane departure warning, front collision warning, back-up aid, pedestrian detection, drowsy driver, blind spot detection, sign recognition, park assist, lane change assist, night vision, adaptive cruise control, automated parking, automatic braking, lane keeping, panic brake assist, pre-crash sensing, side impact detection, pedestrian protection.]

Figure 1 – Automotive electronics companies are rapidly expanding the number, and advancing the maturity, of driver assistance (DA) systems available to consumers, thanks in part to FPGA platforms.

Around eight years ago, Mercedes began to offer an intelligent DA system called adaptive cruise control in its S-class automobiles. A forward-facing radar measures the distance to the vehicle ahead. Drivers push a button to determine whether they want their automobile to maintain a headway of two, three or more seconds behind the car in front. The system will then adjust the car’s accelerator to maintain the user-specified gap.

Barnden said that recently introduced second-generation adaptive cruise control systems actually control braking as well. “If another driver cuts in front of you too close, the system will apply a small amount of braking power to maintain that two- or three-second gap, keeping a safe distance behind that car,” he said.

Likewise, if traffic grinds to a halt, the adaptive cruise control system will slow or stop the vehicle, still maintaining that same driver-specified distance from the car ahead.
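The headway idea above reduces to simple arithmetic: a time gap of two seconds at a given speed corresponds to a distance of speed times headway, and the controller speeds up or brakes to close the difference between that target and the radar-measured gap. The sketch below uses a plain proportional rule; the gain, the acceleration limits and the function names are assumptions for illustration, not a description of the Mercedes system.

```python
def desired_gap_m(speed_mps: float, headway_s: float) -> float:
    """Distance that corresponds to the driver-selected time headway."""
    return speed_mps * headway_s


def acc_command(own_speed_mps: float, radar_gap_m: float, headway_s: float = 2.0,
                gain: float = 0.1, max_accel: float = 1.5, max_brake: float = -3.0) -> float:
    """Return a commanded acceleration in m/s^2 (negative means braking).

    gain, max_accel and max_brake are hypothetical tuning values.
    """
    error_m = radar_gap_m - desired_gap_m(own_speed_mps, headway_s)
    accel = gain * error_m                       # proportional control on gap error
    return max(max_brake, min(max_accel, accel)) # clamp to comfortable limits
```

At 30 m/s with a two-second headway the target gap is 60 m, so a car cutting in at 40 m produces a braking command, which is the second-generation behavior Barnden describes.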

While admittedly more sophisticated than traditional cruise control, by DA standards adaptive cruise control is “one of the more basic advanced systems,” Barnden said. Even more-advanced adaptive cruise control systems will help drivers safely deal with stop-and-go traffic, adjusting to the rapid changes in speed while maintaining reasonable distances. “Some can do that now,” said Barnden.

In 2007, he said, OEMs sold 1 million cars with adaptive cruise control systems. Barnden expects that number to grow substantially, especially in Europe and Japan, as the technology makes its way into mass deployment and becomes even more sophisticated.

Indeed, OEMs and their suppliers are already revving up third-generation adaptive cruise control that combines a camera with the radar system to ensure that the car is actually matching the speed of the vehicle in its lane, and to anticipate threats such as other drivers pulling into the driver’s lane, perhaps too closely.

“Third-generation systems add more intelligence to better estimate what’s ahead and how much of a threat it is to you,” said Barnden. “And based on that, what is the probability that you can simply ignore it or sound a warning to take preventative action to avoid a crash.”

Night Vision and Threat Assessment

Another segment of DA systems aims to give drivers a clearer view of the road ahead.

A prime example is night vision. Typically, these systems employ a camera with infrared illumination or thermal-imaging technology and use the navigation system display of the automobile to show heat-sensitive or IR images to drivers. The systems are very useful for spotting people and animals in the road that you might not otherwise see in time to avoid a collision. Due to the current cost of night vision technology, OEMs typically offer systems only on their highest-end vehicles, such as the Mercedes S-class and BMW 7 series. However, as the technology evolves, the industry expects night vision systems to migrate to mainstream vehicles.

OEMs sold 800,000 units with night vision technology worldwide in 2007, said Barnden, who expects that number will grow to 3 million units by 2015.

Evolution of Lane Changing with DA Systems

Yet another segment of DA in which companies are rapidly innovating involves assisting drivers in changing lanes.

OEMs have just started offering blind-spot detection systems that use radar sensors or, in some cases, cameras located on the side and rear of the automobile to monitor whether another car is approaching a driver’s blind spot. If so, the system will display a warning light to inform the driver that someone is there, perhaps reducing the amount of time drivers spend looking over their shoulders and allowing them to keep their eyes on the road ahead.

Barnden is waiting to see if blind-spot detection will take off as a market. He notes that in 2007, the automotive industry sold 300,000 units. OEMs typically offer the systems only in high-end vehicles, such as the Audi Q7 luxury SUV.

Barnden said that blind-spot detection becomes more compelling when OEMs combine it with other DA systems. For example, if a driver took his or her eyes off the road to double check what was going on in a blind spot, the adaptive cruise control would sense a slowdown and adjust the speed accordingly.

“That’s why driver’s assistance is so interesting,” said Barnden. “There are all these things coming together that can really change the driving experience and provide some useful and helpful advice to the driver.”

Barnden sees great growth potential in another emerging DA technology called lane departure warning, and great potential for FPGAs to power these systems.

Lane departure warning systems typically use a camera module mounted on the rearview mirror to gather images of the road ahead. “Typically there is a DSP or, increasingly, an FPGA that is doing high-speed math to identify the white markings on the road to help users see what lane they are in relative to the road ahead,” said Barnden. And should the driver swerve into another lane accidentally, perhaps because he or she is sleepy or busy tuning the radio, the system would sound an audible warning, jiggle the driver’s seat or make the steering wheel vibrate.

But the system will not send a warning if the driver first uses the turn indicators before changing lanes, Barnden notes. So in essence, the technology not only warns drivers if they are accidentally swerving into another lane, but also encourages good driving habits.
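The warning rule described above (warn on an unintended drift, stay quiet when the turn indicator is on) can be written as a small decision function. The lane half-width input, the 0.2-meter margin and the name `should_warn` are illustrative assumptions; the image processing that produces the lateral offset is a separate and much harder step.

```python
def should_warn(lateral_offset_m: float, lane_half_width_m: float,
                turn_signal_on: bool, margin_m: float = 0.2) -> bool:
    """Decide whether to alert the driver.

    lateral_offset_m is the vehicle's distance from lane center (either sign).
    margin_m is an assumed safety margin before the marking is reached.
    """
    drifting = abs(lateral_offset_m) >= lane_half_width_m - margin_m
    # A signaled lane change is intentional, so it suppresses the warning.
    return drifting and not turn_signal_on
```

For example, `should_warn(1.7, 1.8, False)` is true, while the same drift with the indicator on returns false.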

OEMs today offer lane departure warning systems in their highest-end vehicles (Peugeot-Citroën, BMW, Infiniti, Cadillac and Buick), and in 2007, sold just under 1 million units. “I see that as one of the highest-growth segments going forward, growing to 17 million units over the next decade,” said Barnden. “I think one in every four vehicles will have the technology by 2015.”

Like lane departure warning systems, another emerging DA technology, sign recognition, uses forward-facing cameras to read signs posted on the sides of the road. “For example, the system could see a speed indicator and detect that you’ve gone past the sign, and display on the dashboard that you are now in a 30-mph zone,” said Barnden. “Or it could display things you may have overlooked, such as ‘no right on red.’”

Sign recognition systems are still largely in development and must overcome some big challenges before coming to market, Barnden said. For example, if you are driving past a series of signs grouped very closely together, the system may have to figure out how to prioritize which sign to display to the driver. One road sign could read “Speed Limit 35 mph,” for instance, and the next one, “Flooding Ahead.” Clearly, the system would need to display the more ominous of those two signs.
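One simple way to picture the prioritization problem is a ranking by sign category, with the highest-ranked sign shown to the driver. The categories, their ranks and the function name below are hypothetical; the article says only that such systems are still in development and does not describe how they resolve conflicts.

```python
from typing import List, Optional, Tuple

# Assumed ranking: hazards outrank regulatory signs, which outrank information.
SIGN_PRIORITY = {"hazard": 3, "regulatory": 2, "information": 1}


def pick_sign(signs: List[Tuple[str, str]]) -> Optional[str]:
    """Return the text of the highest-priority sign from (text, category) pairs."""
    if not signs:
        return None
    # Unknown categories get rank 0 so they never displace a known one.
    best = max(signs, key=lambda s: SIGN_PRIORITY.get(s[1], 0))
    return best[0]
```

With the article's example, `pick_sign([("Speed Limit 35 mph", "regulatory"), ("Flooding Ahead", "hazard")])` selects "Flooding Ahead".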

“The technology could prove very useful,” said Barnden. “It’s an assistance system to inform the driver, but the liability still remains with the driver.”

The Future: FPGA Platforms Drive Sensor Fusion

While all these segments of DA systems have advanced rapidly over the course of the last 10 years, experts say the real innovation is just beginning. Indeed, those designing DA systems and other automotive electronics are in the process of combining many of these functions, for example using one set of sensors to perform multiple jobs and then connecting them with other ECUs in the automobile. The idea is to reduce system cost, along with the cost and heft of the wire harness that links these multiple systems, thereby cutting fuel consumption. Ultimately, that will make the automobile more affordable and more environmentally friendly.

A key to fusing these advanced systems is the use of FPGA platforms. Many first- and second-generation systems that use ultrasonic, radar and camera-based sensor technologies have relied on DSPs, or DSPs in conjunction with FPGAs, to quickly perform the calculations needed to inform drivers of their surroundings. But as DA systems become more complex, and especially as developers seek to use a given set of sensors to perform multiple DA system tasks, experts say DSP-only systems can’t do the job effectively. As a result, the role of FPGAs will expand.

For example, Xilinx’s Zoratti said that evolving a basic lane departure warning system to simultaneously evaluate the road ahead for sign recognition and oncoming headlamps requires a significant amount of compute power as well as advanced algorithm development.

“Today, DSPs can perform the computing for less advanced systems,” said Zoratti. “But as the systems become more advanced, and especially as companies start using sensors to do multiple functions and interconnect them with other systems, DSPs just don’t have the compute horsepower. The parallel-processing resources offered by FPGAs provide a cost-efficient, scalable and flexible solution” (see Figure 2).

According to analyst Barnden, sensor fusion is proceeding especially rapidly around the forward-facing camera. This camera is typically located behind the automobile’s rearview mirror and faces forward, peering out the front windshield. Today, OEMs are using it to perform lane departure warning. But in future automobiles, they would like to be able to team that camera with one radar sensor to handle adaptive cruise control, sign recognition and perhaps night vision, as well as lane departure warning and, maybe one day, a lane-keeping feature.

To have one camera feeding images to several DA systems requires advanced computing. Having separate DSPs on a board somewhere in the system is cumbersome, demands multiple connections (making for a heavier wire harness) and introduces possible latency and reliability issues. But OEMs can use the parallel resources on one FPGA platform to do the job of several DSPs, to create a much more cost-effective, scalable and flexible FPGA-based fusion sensor system, Zoratti said.

“Today’s lane departure warning systems can use VGA cameras with 640 x 480 resolution,” he said. “Systems will need twice that resolution soon, when we get to applications such as sign recognition.” Zoratti said DSP devices can’t keep up with that amount of data processing. “When you talk about multiple features, each of those features may require a different processing algorithm. That’s where FPGAs really offer a strong value proposition.”
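To put the resolution figures in perspective, raw pixel throughput is simply width times height times frame rate. The 30 frames-per-second rate below is an assumption (the article gives no frame rate), and "twice that resolution" is read here as doubling each dimension, which quadruples the pixel count; doubling the pixel count alone would give half the larger figure.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second that every per-pixel processing stage must sustain."""
    return width * height * fps


ASSUMED_FPS = 30  # hypothetical frame rate, not stated in the article

vga = pixel_rate(640, 480, ASSUMED_FPS)        # 9,216,000 pixels/s
doubled = pixel_rate(1280, 960, ASSUMED_FPS)   # 36,864,000 pixels/s
print(f"VGA: {vga:,} px/s; doubled dimensions: {doubled:,} px/s")
```

Each feature that runs its own algorithm on the same stream multiplies that per-pixel workload, which is the scaling argument Zoratti makes for parallel hardware.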

An ASIC might be an option, he went on, “but the problem is, in most cases we are in the infancy of the market, so you don’t know what your algorithm is ultimately going to be. FPGAs provide power, flexibility and scalability…because now you can take the strength of the FPGA and adjust it to whatever feature set a particular vehicle wants to have. It’s a platform design: one design that can be modified for multiple models of a vehicle and multiple class sets. Invest in one platform that you can scale yourself: activate and deactivate.”

Tools and IP play a crucial role in helping designers to rapidly create innovations in DA system development. Xilinx and its many partner companies offer advanced IP blocks (see Figure 3) for many sophisticated FPGA-based DA applications.

On page 20 of this issue, Zoratti and co-authors Daniele Bagni, Xilinx, and Roberto Marzotto, Embedded Vision Systems, describe how engineers at the two companies developed an FPGA-based platform design for Embedded Vision Systems’ lane departure warning image-processing algorithms using System Generator for DSP.

TRW Conekt, a division of TRW Automotive, and Ibeo Automobile Sensor GmbH are two of the many companies that have evolved their technologies from DSP or other IC-based DA systems to advanced sensor fusion systems using Xilinx Automotive platform FPGAs.

TRW offers a camera-based lane departure warning system that has just gone into production, but the company is currently developing a system that fuses lane departure warning with adaptive cruise control. The video camera in its current system resides behind the rearview mirror and faces forward. “We pick out the lane markings, and if the driver gets too close to them, we trigger the electric-powered steering system to give the steering wheel a small nudge,” said Martin Thompson, principal electronic engineer at Conekt. “It’s designed to feel like the sensation of touching a curbstone at the side of the road. The system helps drivers keep from straying from their lane, for example if they are fatigued or lose concentration.”

At the heart of the TRW system is a Conekt-programmed XA Spartan®-3E250 FPGA, which handles low-level image processing: edge detection and feature extraction. The product also uses an external microcontroller that Conekt may integrate into the FPGA for future versions, Thompson said.

[Figure 2 block diagram. Labels shown: CMOS imager, LVDS, format decoding, pixel correction, noise reduction filter, sharpening, edge detect (Sobel), line tracker, diagnostics, LDW app / cam ctrl, mem controller, CAN controller, I2C comms, co-processor I/F, OPB Arb CTRL, SDRAM or DDR, FLASH, CAN PHY, micro and/or DSP coprocessor. Legend: digital, uP or uC, programmable mixed signal, IP block, SW, memory.]

Figure 2 – DA designers can integrate many advanced functions on a single Spartan-3E FPGA.

Positioned behind the rearview mirror, the system picks out white lines and estimates the geometry of the road ahead and the offset of the vehicle within the lane: where the vehicle sits laterally, its angle relative to the lane and how the lane is curving one way or the other. “That gives us the information we need for the lane departure system,” said Thompson. “But for the future, it can be fused with the data from the adaptive cruise control radar to select the vehicle that’s in your lane rather than it having to infer this from indirect measurements.”

In addition, Conekt engineers are developing video obstacle detection. “Radar is very good at measuring distance. It isn’t so good at lateral-position measurements, because the radar return rarely comes off the middle of the vehicle; potentially, we don’t know which of the two corners it came off,” Thompson said. “Video is the opposite. We can determine the width and angle to the vehicle and fuse this with the radar distance measurement.”
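The complementary roles Thompson describes can be illustrated with a little trigonometry. The sketch below is not Conekt's method; it simply assumes a flat road, a radar that supplies range, and a camera that supplies the bearing to the target and its angular width. The function and parameter names are hypothetical.

```python
import math


def fuse_radar_camera(radar_range_m, camera_bearing_rad, camera_width_rad):
    """Combine a radar range with camera angular measurements.

    Illustrative assumption: radar gives the distance to the target,
    the camera gives its bearing and the angle its width subtends.
    Returns (longitudinal_m, lateral_m, width_m).
    """
    # Position of the target centre in the host vehicle's frame.
    longitudinal = radar_range_m * math.cos(camera_bearing_rad)
    lateral = radar_range_m * math.sin(camera_bearing_rad)
    # Physical width subtended by the angular width at that range.
    width = 2.0 * radar_range_m * math.tan(camera_width_rad / 2.0)
    return longitudinal, lateral, width


if __name__ == "__main__":
    # A target 50 m ahead, slightly to one side, subtending 0.04 rad.
    print(fuse_radar_camera(50.0, 0.02, 0.04))
```

Neither sensor alone yields all three numbers: the lateral position and width depend on both the radar range and the camera angles.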

In addition, he said, “video allows Conekt to perform some classification: to identify if an object in front of the car is a pedestrian or a bicycle, for example.” Similarly, “a fused video-radar system could use dynamic classification of whether the vehicle is a car or a truck. The tracking system can make use of that assessment to improve its behavior,” said Thompson.

Thompson pointed out some other advantages of fusing radar and camera sensors in one system. The radar sees through rain, falling snow and fog, while the video would deliver better information about the actual visual range of the driver (and presumably, the driver ahead) to help estimate a safe driving speed given the current weather and visibility conditions.

At some point down the road, he said, fusion products could start to benefit existing collision-mitigation systems as well, so that if an accident looked inevitable, the system could activate the seatbelt pre-tensioners, prime the airbag and engage the brakes to start dissipating the collision energy earlier than is now possible. Beyond that capability comes collision avoidance, where the car autonomously takes some kind of action (possibly including steering as well as braking) to prevent a crash if it appears the driver isn’t going to. “That’s some time in the future, however,” Thompson said.

Ibeo Automobile Sensor GmbH is another company using Xilinx FPGA platforms for its advanced DA systems. Director of sales Mario Brumm said the 10-year-old company has developed a laser scanner with accompanying software that detects the environment around the car (other vehicles, pedestrians, bicycles) and measures their position and speed. Ibeo designed the system to provide adaptive cruise control for high-speed driving and traffic jam assistance. And in critical situations where, for example, a child darts out in front of you, the sensor can trigger braking to help avoid an accident.

“We’ve developed hardware and the software, but I think the software will grow in importance over the next few years,” said Brumm. “We have some sensors in the market already for single applications, but our intent from the beginning is to use one sensor for multiple applications. The FPGA is a very important component in our design.”


Figure 3 – Xilinx and its partners offer many types of IP for the DA market. Shown are some of the varieties of image-processing functional IP available for vision-based DA. (Functions listed in the figure include format decoding, color space conversion, frame buffer control, de-interlacing, image stabilization, gamma/color correction, bad pixel replacement, distortion correction, noise reduction, non-uniformity correction, Hough transform, Sobel edge detection, 2-D correlation, sharpening, spatial contrast enhancement, thresholding, histogram equalization, temporal filtering, multi-target tracking, FIR filtering, motion detection, object tracking, graphics overlay, alpha blending, rate conversion, graphics acceleration, display controller, image fusion, picture-in-picture, image scaling and translate/rotate/zoom.)


Prior to using an FPGA, Ibeo relied on an analog chip, “and we could use it to measure from up to 80 meters,” Brumm said. “But that isn’t good enough for adaptive cruise control, especially in Germany, where we tend to drive quite fast. Our customers told us that they needed it to measure 200 meters. It wasn’t possible with an analog system. The idea was to do it with digital and the core measurement.”

Specifically, the analog system had a limited V-range (area and width that the sensor could detect). “With an analog system you also have more noise in the signal, and that means we could see objects up to 80 meters and no more,” said Brumm. “With the new FPGA system, if you have a big car or truck you will see up to 350 meters, which is absolutely unique for laser scanner systems. This is only possible because of the digital measurement, and it can detect very low energy.”

Typically, said Brumm, these types of systems debut in luxury cars, such as the Mercedes S-class and BMW 7 series. “But our main goal is to bring down the cost so that these technologies can be widely deployed and available in all classes of cars,” he said.

Brumm foresees great possibilities for merging laser technology with video. “Digital camera technology is nice because it allows you to see what’s going on with your own eyes, but it can have the same drawbacks too,” said Brumm. “For example, camera technology doesn’t work in the dark, so designers need to augment it with night vision, which can be expensive. And digital camera processing typically requires a lot more data processing than laser technology.” Lasers, for their part, are not “hindered by darkness or even fog,” he said.

But cameras could eliminate some issues with laser scanners. For example, laser scanners have trouble distinguishing pedestrians from trees. Pairing a laser with a camera would allow the sensor to react appropriately, Brumm said. If, for example, it seemed unavoidable that the car was going to hit a tree, the system would send information to other sensors to protect the driver. And if it sensed the car were about to hit a pedestrian, it could send information to other sensors to help protect the potential victim, perhaps activating an airbag under or on the hood of the automobile.

FPGAs are also playing a key role in aftermarket DA systems. PLX Devices, for example, developed its first product (an award-winning, user-customizable multifunctional gauge popular with car enthusiasts) with a Xilinx FPGA platform. The company then built Kiwi, a mainstream consumer product, which in a fun way helps drivers monitor their fuel efficiency. Xilinx devices are central in that design as well (learn more about PLX Devices and its CEO, Paul Lowchareonkul, in the Profile of Xcellence section in this issue).

The End Market and Liability Restrictions

While engineers have made leaps and bounds in developing ever more advanced DA systems over the last 10 years, just about everyone in this market is aware that each step of sensor fusion progress has to be tempered and well thought out to consider the real value to the driver as well as regional liability constraints.

Indeed, the experts interviewed for this story, including Barnden, Zoratti, Brumm and Thompson, all noted that one of the reasons manufacturers in Europe and Japan are leading the way in DA development (and why consumers in those regions typically become the early adopters) is that legal liability is a much bigger issue in the United States. As a result, auto manufacturers are far more cautious in introducing new features into the U.S. market.

Most of the DA systems discussed here are merely assistance features that provide the driver with information; ultimately, it’s the driver who is responsible for making the right decisions, and the driver who bears the liability. However, it’s not unforeseeable that as advanced sensor fusion technology progresses, many of these systems could be tied directly to safety systems.

Some say the rapid evolution of DA systems is even a crucial step toward achieving the automotive industry’s holy grail of driving: the autonomous vehicle. That points to a day, perhaps in the not too distant future, when cars can drive themselves and in doing so alleviate traffic congestion and lower fuel consumption, while drastically reducing traffic-related injuries.


XPERT OPINION

Driver Assistance Systems Pose FPGA Opportunities
Invisible intelligence is coming soon to a car near you.

by Colin Barnden
Principal Analyst, Semicast
[email protected]

One of the most important issues in the automotive industry over the last 10 years has been the rapid adoption of safety systems. Features such as airbags, antilock braking and tire pressure monitoring have become increasingly common on many new vehicles, as drivers have become aware of their benefits and, in some cases, lawmakers have made installation mandatory to increase road safety and reduce fatalities. The emphasis on driver safety shows no signs of abating, and over the next decade, drivers will become increasingly familiar with a new generation of systems commonly referred to as “driver assistance.”

Driver assistance systems differ subtly from conventional safety systems. Their primary purpose is to detect conditions that could potentially lead to an accident, and to either warn the driver accordingly or to take preemptive action. A conventional safety system such as the airbag, by contrast, serves as the “last line of defense” and is triggered only in the event of a crash.

Perhaps the best introduction to the functions and utility of driver assistance systems is to look at everyday driving experiences and see how these devices can help in certain challenging conditions.

Lane Departure Warning

Let’s start with the commute home after an all-day meeting. Chances are you are not at your most alert and your thoughts are probably not all focused on the road ahead, so much so that while changing the play list on your iPod, you begin to drift out of your lane and move dangerously close to the adjacent vehicle. Left unchecked, this could be one commute you never finish.

For assistance in such a situation, lane departure warning systems use sensors mounted in the front of the car to “see” the markings in the road ahead, combined with complex computers that do the high-speed math to detect the car’s position on the highway. Stay comfortably within the confines of your lane and all is well; stray too far to the left or right without using the indicators and the system will detect your lapse and automatically provide an audible warning.
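The decision rule described above can be sketched in a few lines. This is only an illustration of the idea, not any carmaker's logic: the lane width, vehicle width and safety margin are assumed inputs, and all names are hypothetical.

```python
def lane_departure_warning(lateral_offset_m, lane_width_m, vehicle_width_m,
                           indicator_on, margin_m=0.2):
    """Return True when an audible warning should sound.

    lateral_offset_m is the distance of the vehicle centre from the lane
    centre (either sign). margin_m is an assumed buffer before the
    vehicle's side actually reaches the lane marking.
    """
    if indicator_on:
        return False  # the driver signalled, so treat it as deliberate
    # Offset at which the side of the vehicle reaches the marking,
    # reduced by the safety margin.
    limit = (lane_width_m - vehicle_width_m) / 2.0 - margin_m
    return abs(lateral_offset_m) > limit


if __name__ == "__main__":
    print(lane_departure_warning(0.1, 3.5, 1.8, indicator_on=False))
    print(lane_departure_warning(0.8, 3.5, 1.8, indicator_on=False))
```

A production system would also consider the rate of drift and the confidence of the lane estimate; the sketch keeps only the threshold and the indicator check named in the text.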

While lane departure warning may sound somewhat extravagant, there actually is a need. According to statistics from the National Highway Traffic Safety Administration, around 130,000 people are injured each year in the United States alone in accidents related to lane changing. No wonder, then, that automakers have already taken note of the safety benefits.

As is often the case with new technology, a number of European carmakers are slightly ahead of the pack, offering lane departure warning on vehicles such as the Audi Q7 SUV and BMW 5-Series. In the United States, GM has already shipped Buicks and Cadillacs with such systems, and in Japan, both Nissan and Toyota have cars in production with lane departure warning.

Blind Spots and Night Vision

Take another scenario: Perhaps you’ve been out on the highway for an hour with the stereo blasting and the cruise control engaged. You go to overtake a truck, forget to check to see if it is safe to pull out and miss the compact in your mirror’s blind spot that was passing you at the same time.

This is an accident that wouldn’t happen in a car with blind-spot monitoring, a technology that uses camera modules or even short-range radar to constantly peruse the blind spots to the left and right of your vehicle. You pull out to pass and the system will know there’s a vehicle in your blind spot, and warn you of the danger accordingly.

Another driver assistance technology is designed to help out in the dark. Night vision assist is, as the name suggests, a system used at night to see objects ahead of the vehicle that are farther away than the illumination range of conventional front headlamps. Night vision assist systems use cameras with either IR illumination or thermal imaging to project an enhanced image of the road ahead onto the screen in the center console.

Intelligent Cruise Control

Perhaps the most obviously useful example of driver assistance is intelligent cruise control. One of the greatest drawbacks of conventional cruise control is that this is a “set-and-forget” system and will, quite literally, drive the car into a wall if the driver is distracted or not paying attention to the road ahead. In contrast, intelligent cruise control typically uses radar to provide a real-time measurement of the distance to the object directly ahead of the vehicle, controlling the throttle and brakes to adjust the speed accordingly.

Taking safety another step further, the most advanced systems feature predictive collision warning, which combines functions of intelligent cruise control with the airbag system. In the event that a crash is deemed imminent, the system takes action prior to the impact by firing the seatbelt pre-tensioners to move the driver and front-seat passenger into an optimum safety position.

Brisk Market Foreseen

Table 1 presents the worldwide market for driver assistance systems in terms of system shipments.

As can be seen, the intense interest in driver assistance systems is well justified when looking ahead to the forecast for future years. Shipments of lane departure warning systems, for example, are set to rise to almost 11.5 million units in 2012, up from less than 1 million in 2007. Further, intelligent cruise control is expected to enjoy rapid adoption over the next five years, as this feature makes the transition from luxury cars to high-volume upper- and midrange models. Overall, Semicast estimates shipments of driver assistance systems to pass 23 million in 2012, compared with about 3 million last year.

The growth in unit shipments means a concomitant rise in semiconductor content. Table 2 presents the worldwide market for semiconductors in driver assistance systems in terms of revenue.

Semicast forecasts that semiconductor revenue in driver assistance systems will grow at a CAGR approaching 33 percent from 2007 to 2012, from $229 million to $926 million. By comparison, the total automotive semiconductor market, which totaled $20 billion in 2007, is forecast to grow at around 5.5 percent a year, to $27 billion by 2012. Clearly, driver assistance systems will be one of the highest-growth areas by far for automotive semiconductors over the next five years.

The highest revenue growth for semiconductors in driver assistance systems is forecast for the optoelectronics category, where CCD and CMOS image sensors and millimeter-wave radar modules are likely to be the main drivers. The next-highest revenue growth is forecast for MCUs, MPUs and DSPs, where growth is mostly limited to 32-bit devices for high-end control.

Opportunity for FPGAs

Historically, whenever automotive OEMs needed a highly customized logic product to meet the demands of a specific application, they would automatically turn to an ASIC vendor to help them develop either a gate array or standard-cell-based product to meet their exact requirements. As ASIC development costs have risen and design times have extended over the last five years, automotive OEMs have been forced to consider other solutions.

In some cases, semiconductor vendors have been successful in developing off-the-shelf ASSPs that meet many of the generic needs in systems for, say, entertainment or navigation. In driver assistance systems, however, design changes are so frequent that an ASIC is rarely a suitable choice. Moreover, OEMs typically each have such complex and individual requirements that it is often not possible for semiconductor companies to develop suitable off-the-shelf ASSPs that meet the cost, power or reliability goals of the application.

To meet their needs in terms of cost, flexibility and time-to-market, automotive OEMs developing driver assistance systems are increasingly looking to FPGAs. While FPGAs have been used in the development and prototype stages of many automotive systems for the last 10 years, the design has typically involved an ASIC conversion prior to full-scale production, to minimize cost in high volume. However, as FPGA unit costs continue to come down, the technology becomes financially viable in designs up to much higher volume. Add in the unrivaled flexibility to make changes late into the design process and the excellent performance of FPGAs when configured to do the high-speed computational analysis required by many driver assistance systems, and you have a winning combination.

As the data in Table 2 shows, Semicast does not forecast an end of ASICs and ASSPs in driver assistance systems anytime soon. ASIC/ASSP technology will continue to appeal over the long term, especially once the design requirements of the systems begin to stabilize.

However, Semicast forecasts substantial revenue growth for FPGAs in driver assistance systems over the next five years, from $55 million in 2007 to $182 million in 2012, a compound annual growth rate exceeding 25 percent. The FPGA market in automotive applications is now accelerating at full throttle, with driver assistance the leader of the pack.
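The growth rates quoted in this article are compound annual growth rates, and they can be checked from the endpoint figures alone. The short sketch below uses the standard CAGR formula with the 2007 and 2012 revenue figures given in the text; the function name is our own.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two endpoint values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Revenue in $M for 2007 and 2012, five years apart.
    fpga = cagr(55, 182, 5)
    total = cagr(229, 926, 5)
    print(f"FPGA: {fpga:.1%}  all semiconductors: {total:.1%}")
```

Working from the rounded endpoints gives roughly 27 percent for FPGAs and 32 percent for the total, in line with the forecast; small differences from the tabulated rates are presumably due to rounding in the published figures.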

To the average consumer, driver assistance systems may sound like unnecessary and expensive luxuries. Of course, much the same was said about airbags and antilock brakes when they were introduced, but their ability to reduce fatalities and injuries and improve road safety did not go unnoticed by lawmakers for long.

The road forward for driver assistance systems looks assured, with vehicle makers increasingly using the technology to demonstrate their commitment to road safety. While these systems will not save lives in quite the same way that airbags and ABS do, they sure can help in a crisis. And as with all the best technology, you don’t even notice they are there until you need them, at which point you’re grateful you had them.

About the Author

Colin Barnden is principal analyst for Semicast’s Automotive Electronics & Entertainment Systems Service. He has worked as a market analyst for 14 years and has researched and reported on the automotive industry since 1999. He holds a BS in electronic engineering from Aston University, England.


Table 1 – Driver Assistance System Shipments (million units)

                            2007  2008  2009  2010  2011  2012  CAGR
Lane Departure Warning       0.9   2.5   4.5   6.9   9.9  11.5  65.9%
Blind Spot Monitoring        0.3   0.6   1.1   1.6   2.6   3.6  60.9%
Night Vision Assist          0.8   1.3   1.4   1.9   1.9   2.4  24.4%
Intelligent Cruise Control   1.1   1.7   2.6   3.6   4.9   6.1  41.8%
Total                        3.1   6.1   9.6  14.0  19.3  23.6  49.9%

Table 2 – Worldwide Market for Semiconductors in Driver Assistance Systems ($M)

                            2007  2008  2009  2010  2011  2012  CAGR
MCU/MPU/DSP                   54    94   136   182   232   260  36.8%
ASIC/ASSP/Other Logic         12    28    46    63    82    89  49.8%
FPGA                          55   107   133   155   173   182  26.8%
Optoelectronics               93   144   185   240   284   318  28.0%
Other Semiconductors          15    28    39    54    67    77  38.2%
Total                        229   400   540   693   838   926  32.2%


XCELLENCE IN AUTOMOTIVE & ISM

Building Automotive Driver Assistance System Algorithms with Xilinx FPGA Platforms
System Generator for DSP is a high-abstraction-level design tool that gives algorithm developers and system architects an efficient path from a Simulink-based algorithmic reference model to an FPGA hardware implementation without any need for HDL coding.

by Daniele Bagni
DSP Specialist
Xilinx, [email protected]

Roberto Marzotto
Design Engineer
Embedded Vision Systems, [email protected]

Paul Zoratti
Automotive Senior System Architect
Xilinx, [email protected]

The emerging market for automotive driver assistance systems (see cover story) requires high-performance digital signal processing as well as low device costs appropriate for a volume application. Xilinx® FPGA devices provide a platform with which to meet these two contrasting requirements. However, algorithm designers unfamiliar with FPGA implementation methods may be apprehensive about the perceived complexity of the move from a PC-based algorithmic model to an FPGA-based hardware prototype. They don’t need to be.

A Xilinx tool, the System Generator for DSP, offers an efficient and straightforward method for transitioning from a PC-based model in Simulink® to a real-time FPGA-based hardware implementation. This high-abstraction-level design tool played a central role in a project conducted between engineers at Xilinx and Embedded Vision Systems. The goal of the project was to implement an image-processing algorithm applicable to an automotive lane departure warning system in a Xilinx FPGA using System Generator for DSP, with a focus on achieving overall high performance, low cost and short development time.

Challenges in DA System Development

Automotive driver assistance (DA) system engineers commonly use PC-based models to create the complex processing algorithms necessary for reliable performance in features such as adaptive cruise control, lane departure warning and pedestrian detection. Developers highly value the PC-based algorithm models, since such models allow them to experiment with, and quickly evaluate, different processing options. However, in the end, a properly designed electronic hardware solution is necessary to realize economical high-volume production and deployment.

Verifying algorithm performance consistency between deployable target hardware and the software algorithm model can be problematic for many developers. Moving from floating-point to fixed-point calculations (for example, employing different methods for trigonometric functions) can sometimes cause significant variation in the outputs between the reference software algorithm and the hardware implementation model. Further complicating the algorithm performance consistency problem for DA system developers is the fact that the input stimulus is very nondeterministic. For DA systems, which generally rely on inputs from remote sensing devices (cameras, radar and so on), the inputs are the incredibly diverse range of roadway and environmental conditions that drivers may encounter. Engineers may find designing a processing algorithm to adequately address all situations is extremely challenging, and verifying compliance between the software model and its electronic hardware implementation is critical.
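To make the floating-point versus fixed-point concern concrete, the sketch below compares a floating-point sine against a quantized lookup-table version of the kind a hardware model might use, and reports the worst-case deviation. The table size, the 10 fractional bits and all function names are assumptions chosen for illustration; they are not taken from our design.

```python
import math

FRAC_BITS = 10  # assumed fractional precision of the hardware model


def to_fixed(value, frac_bits=FRAC_BITS):
    """Quantize a float to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def sin_fixed(angle_rad, table_size=256):
    """Sine via a quantized lookup table indexed by truncated phase."""
    index = int(angle_rad / (2 * math.pi) * table_size) % table_size
    return to_fixed(math.sin(2 * math.pi * index / table_size))


def max_deviation(samples=1000):
    """Largest gap between the float reference and the table model."""
    worst = 0.0
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        worst = max(worst, abs(math.sin(angle) - sin_fixed(angle)))
    return worst


if __name__ == "__main__":
    print(f"worst-case deviation: {max_deviation():.4f}")
```

A compliance test of this kind sweeps the same stimulus through both models and checks the deviation against a tolerance; with real roadway imagery the stimulus set has to be far broader than a simple sweep.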

For many applications involving image processing, the parallel resources of Xilinx Spartan®-3 FPGA devices deliver higher performance per dollar than VLIW DSP platforms (see our paper in Xcell Journal issue 63: http://www.xilinx.com/publications/xcellonline/xcell_63/xc_pdf/p16-19_63-block.pdf). However, some system designers still wrongly assume that the only way to program an FPGA is with a hardware description language such as VHDL, which many engineers don’t know. But in fact, this is no longer the case. Our design methodology, applying the Simulink modeling tool and the Xilinx System Generator for DSP FPGA synthesis tool, provided not only an easy, efficient way of implementing FPGA designs, but also a means of accelerating algorithm compliance testing with hardware-software co-simulation. Further, you don’t have to be familiar with an HDL to use System Generator.

Lane Departure Warning Model Description

The overall function of a lane departure warning (LDW) system is to alert the driver when the vehicle inadvertently strays from its highway lane. A camera in front of the host vehicle captures images of the roadway to identify markings that constitute the lane boundaries. The system continuously tracks those boundaries, as well as the host vehicle’s position relative to them. When the vehicle crosses the bounds of the lane, the system issues a warning.

The automotive industry and academic world have widely adopted MATLAB® and Simulink as algorithm and system-level design tools. In particular, Simulink enables automotive algorithm engineers to quickly and easily develop a sophisticated DSP algorithm, thanks to its very high abstraction level and the tool’s graphical schematic entry.

Figure 1 shows the top-level block diagram of our LDW system model, designed in Simulink. The green block, labeled Lane Detection, contains an image-preprocessing subsystem, the various stages of which we show in Figure 2. The purpose of the lane detection function is to extract those features of the roadway image most likely to represent lane boundaries.

Figure 1 – LDW Simulink high-level block diagram (testbench for the lane detection demo in System Generator; blocks shown include Input, Lane Detection, FPGA Processing, Hough Transform, Peak Detection, Pattern Search, Lane Tracking, Departure Warning and Output, with parameters HST = 0.01, the percentage of pixels to be saturated, and ETH = 0.9, the percentage of non-edge pixels)

Figure 2 – LDW preprocessing function chain (input image, ROI cropping, Gaussian noise reduction, histogram stretching, H/V gradients extraction, magnitude and phase calculation, adaptive threshold and thinning, lane-marking pattern search, morphological operators, lane-marking candidates)

To improve the performance of the edge detection with respect to noise, the first stage of the pipeline is a 2-D 5x5 Gaussian noise reduction (GNR). The second stage is histogram stretching (HST), a technique developers use to enhance the contrast of the image, exploiting as much as possible the whole gray-level range. The third step, the horizontal/vertical gradient (HVG), enhances those pixels in which a significant change in local intensity is seen. Developers perform HVG by computing the 2-D 5x5 gradients of the image (via the 2-D Euclidean distance).
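As an illustration of the second stage, the sketch below shows one common way to implement histogram stretching in plain software. It is not the System Generator implementation: we assume here that the saturation parameter is the total fraction of pixels allowed to clip (split between the dark and bright ends) and that pixels are 10-bit values; the function name is hypothetical.

```python
def histogram_stretch(image, saturate_fraction=0.01, max_level=1023):
    """Stretch contrast so the gray levels span 0..max_level.

    image is a list of rows of integer gray levels. saturate_fraction is
    the total share of pixels allowed to clip, half at each end (an
    assumed reading of the HST parameter). max_level=1023 assumes
    10-bit pixels.
    """
    pixels = sorted(p for row in image for p in row)
    cut = int(len(pixels) * saturate_fraction / 2)
    low, high = pixels[cut], pixels[-1 - cut]
    if high == low:
        return [row[:] for row in image]  # flat image, nothing to stretch
    scale = max_level / (high - low)
    # Map [low, high] onto [0, max_level] and clip anything outside.
    return [[min(max_level, max(0, round((p - low) * scale))) for p in row]
            for row in image]


if __name__ == "__main__":
    print(histogram_stretch([[100, 200], [300, 400]], saturate_fraction=0.0))
```

In hardware the same mapping needs two passes over the frame, one to gather the histogram and one to apply the gain, which is one reason a frame buffer or a one-frame delay is usually involved.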

The edge-thinning (ETH) block determines which points are edges by thresholding the gradient magnitude and applying non-maximum suppression to generate thin contours (one-pixel thick). The lane-marking pattern search (LMPS) acts as a filter, selecting a subset of edge points that display a particular configuration consistent with the lane markings, and removing spurious edge points that arise due to shadows, other vehicles, trees, signs and so on. The last step of the pipeline is 3x3 morphological filtering (MRP), which the system uses for the final cleaning of the map of lane-marking candidates.
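The text does not say which morphological operator the final stage applies. As one plausible example of a 3x3 cleaning step that preserves one-pixel-thick contours, the sketch below removes isolated pixels, that is, set pixels with no set neighbour in their 3x3 window. The operator choice and the function name are assumptions for illustration only.

```python
def remove_isolated_pixels(binary):
    """Clear set pixels that have no set neighbour in their 3x3 window.

    binary is a list of rows of 0/1 values. Pixels outside the image
    are treated as 0. Thin connected contours survive; single stray
    pixels are removed.
    """
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c]:
                continue
            # Count set pixels in the 3x3 window, excluding the centre.
            neighbours = sum(
                binary[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ) - binary[r][c]
            out[r][c] = 1 if neighbours > 0 else 0
    return out


if __name__ == "__main__":
    edge_map = [[1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 1]]
    for row in remove_isolated_pixels(edge_map):
        print(row)
```

A 3x3 window maps naturally onto FPGA line buffers: two buffered rows plus the incoming row supply all nine pixels each clock cycle.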

In the Simulink model, we have implemented the various stages of the image-preprocessing subsystem using a mixture of Simulink blockset functions and MATLAB blocks. Because of its ability to process large amounts of data through parallel hardware paths, an FPGA is well-suited for the implementation of the lane detection function of our model. Therefore, we targeted this function as the starting point for transitioning the LDW Simulink design to an FPGA.

With this partitioning, the FPGA performs the processing-intensive pixel-level analysis of each frame and reduces the data from a 10-bit gray-scale image to a simple binary image for our downstream processing. For the entire system design, we are targeting an XA Spartan-3A DSP 3400, but we could also fit the system on a smaller 3A DSP 1800 or a 3E 1600.

System Generator Overview

The System Generator for DSP design tool works within Simulink. It uses the Xilinx DSP blockset for Simulink and will automatically invoke the Xilinx CORE Generator™ tool to generate highly optimized netlists for the DSP building blocks. You can access the Xilinx DSP blockset via the Simulink Library browser, which you can, in turn, launch from the standard MATLAB toolbar. More than 90 DSP building blocks are available for constructing a DSP system, along with FIR filters, FFTs, FEC cores, embedded processing cores, memories, arithmetic, logical and bit-wise blocks. Every block is cycle- and bit-accurate and you can configure each of them for latency, area vs. speed performance optimization, number of I/O ports, quantization and rounding.

Two blocks, called Gateway-In and Gateway-Out, define the boundary of the FPGA system from the Simulink simulation model. The Gateway-In block converts the floating-point input to a fixed-point number. Afterwards, the tool correctly manages all the bit growth in fixed-point resolution, depending on the mathematical operation you are implementing during the following functional stages.
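To make the floating-point to fixed-point boundary concrete, here is a minimal Python sketch of signed fixed-point quantization and of the bit growth of a full-precision multiply. It is not the Gateway-In implementation: the function names are hypothetical, and rounding with saturation is only one possible choice of quantization and overflow behaviour, picked here for illustration.

```python
def to_fixed(x, word_bits, frac_bits):
    """Quantize x to a signed fixed-point value with rounding and saturation.
    Returns the stored integer and the real value it represents."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    raw = int(round(x * scale))      # Python rounds ties to even
    raw = max(lo, min(hi, raw))      # saturate instead of wrapping
    return raw, raw / scale

def product_format(word_a, frac_a, word_b, frac_b):
    """Format of a full-precision signed product: word lengths and
    fraction lengths both add, which is the 'bit growth' to be managed."""
    return word_a + word_b, frac_a + frac_b

raw, value = to_fixed(0.3, word_bits=8, frac_bits=6)    # 0.3 * 64 = 19.2 -> 19
saturated = to_fixed(5.0, word_bits=8, frac_bits=6)     # clamps to the largest code
grown = product_format(10, 0, 8, 7)                     # 10-bit pixel times 8-bit coefficient
```

The quantization error of 0.3 in this format is 0.3 - 19/64, about 0.003, which is the kind of difference a later comparison against a floating-point reference has to tolerate.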

Since Simulink is built on top of MATLAB, System Generator allows the use of the full MATLAB language for input-signal generation and output analysis. You can use the From-Workspace and To-Workspace blocks from the Simulink Source and Sink libraries to read an input signal from a MATLAB variable (From-Workspace) or to store a partial result of a signal to a MATLAB variable (To-Workspace). Furthermore, you can set many parameters of the System Generator blocks via MATLAB variables, thus allowing you to customize the design in sophisticated ways, just by updating a MATLAB script containing all such variables (you can assign MATLAB functions to the model and call them back before opening it, or even before starting or after stopping the simulation).

Another important feature of System Generator for DSP is hardware-software co-simulation. You can synthesize a portion of the design into the target FPGA board (hardware model), leaving the remaining part as a software model in the host PC. That allows you to make an incremental transition from software model to hardware implementation. The tool transparently creates and manages the communication infrastructure via Ethernet and shared memories (between the host PC and the target FPGA device). In this way, when running a simulation, the part you've implemented in hardware actually runs on the target silicon device, while the software model emulates the rest on the host PC. You can use the shared memories to store, for example, the input image and the generated output image. The Ethernet communication provides enough bandwidth for pseudo-real-time processing. You can find more details in the user manual.

The flexible partitioning between software model and hardware processing, combined with the hardware-software co-simulation capabilities, provides you with a powerful verification tool to measure compliance between the original software-only algorithm and the production-intent hardware implementation. You can use Simulink itself to compare the results of the software-processed data to the hardware-processed data. This functionality is especially useful in driver assistance applications, where the general system input images are nondeterministic.

Now, let's examine in detail how to model an image-processing algorithm in System Generator for DSP. For the sake of conciseness, we use as an example the GNR, which is the first module of the image-preprocessing pipeline.

System Generator Implementation of GNR Function

Random variations in intensity values (that is, noise) often corrupt images. Such variations have a Gaussian or normal distribution and are very common among different sensors (that is, CMOS cameras). Linear-smoothing filters are a good way to remove Gaussian and, in many cases, other types of noise as well. To achieve such functionality, we can implement a linear finite impulse response (FIR) filter using the weighted sum of the pixels in successive windows.

Before starting the implementation of the GNR System Generator block, we realized its behavioral model in MATLAB. It takes only a couple of lines of code to implement. First, we calculate the kernel, specifying the mask size (5x5 in our case) and the sigma of the Gaussian. Then we filter the input image by convolution:

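% 5x5 Gaussian mask with sigma = 0.8 (fspecial is part of the Image Processing Toolbox)
n_mask = fspecial('gaussian', 5, 0.8);

% 2-D convolution; 'same' returns an output the same size as in_img
out_img = conv2(in_img, n_mask, 'same');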

We can also use this behavioral model to tune the coefficients of the mask by testing the filter on real video data. It can also validate the hardware by verifying that the outputs of the System Generator for DSP subsystem match the output of the MATLAB function within a specified accuracy. Exact equality cannot be expected, since MATLAB works in floating-point and System Generator in fixed-point arithmetic.
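The "within a specified accuracy" check can be sketched in a few lines. The Python example below (standard library only, standing in for the MATLAB reference) filters a small synthetic image once with floating-point coefficients and once with coefficients rounded to a fixed number of fraction bits, then measures the largest difference. The 12-bit coefficient precision, the zero padding at the borders and the synthetic test image are assumptions made for illustration; only the 5x5 mask size and sigma of 0.8 come from the text.

```python
import math

def gaussian_kernel(size, sigma):
    """Normalized size x size Gaussian mask (coefficients sum to 1)."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def filter_same(img, mask):
    """2-D filtering with zero padding; the output has the size of img.
    The mask is symmetric, so correlation and convolution coincide."""
    h, w, half = len(img), len(img[0]), len(mask) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += img[rr][cc] * mask[dr + half][dc + half]
            out[r][c] = acc
    return out

FRAC_BITS = 12  # hypothetical coefficient precision
mask = gaussian_kernel(5, 0.8)
mask_fx = [[round(v * (1 << FRAC_BITS)) / (1 << FRAC_BITS) for v in row]
           for row in mask]

# Synthetic test image with values inside the 10-bit range
img = [[(7 * r + 13 * c) % 1024 for c in range(8)] for r in range(8)]
ref = filter_same(img, mask)      # floating-point reference
fx = filter_same(img, mask_fx)    # quantized-coefficient version
max_err = max(abs(a - b) for ra, rb in zip(ref, fx) for a, b in zip(ra, rb))

# Worst case: 25 taps, each off by at most half an LSB, times a full-scale pixel
tolerance = 25 * 1023 * 2.0 ** -(FRAC_BITS + 1)
```

Only the coefficients are quantized here; a fuller model would also quantize the accumulator and the output, which would widen the tolerance accordingly.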

The 2-D GNR module processes the input image in a streaming way (that is, line by line). Figure 3 shows the top-level System Generator block diagram for the entire preprocessing chain as well as the top-level diagram specific to the Gaussian noise reduction function.

The data_in and data_out ports in Figure 3 receive the input stream of pixels and return the filtered stream, respectively. We use the remaining ports for timing synchronization and processing control between adjoining blocks.

We based the internal architecture of the GNR block on two main subsystems, as highlighted by the yellow and blue blocks in the figure. We will drill down into each of these blocks to describe the details of the System Generator design.

We needed the first main subsystem, the line buffer (shown in yellow), to buffer four lines of the input image stream in order to output the pixels aligned 5 x 5. For each input pixel I(u,v), the line buffer returns a 5x1 vector composed of the current pixel and the pixels at the same horizontal position in the four previous lines, that is [I(u,v-4); I(u,v-3); I(u,v-2); I(u,v-1); I(u,v)]. As Figure 4 shows, we implemented the line buffer block by concatenating two dual-line buffers, each using a dual-port block RAM (a memory resource resident on the FPGA device), a counter for the memory addressing, some simple binary logic and registers to implement appropriate delays.

Figure 3 – Top-level preprocessing and Gaussian noise reduction diagrams
[Block diagrams. Top level: the GNR, HST, HVG, ETH, LMPS and MRP blocks, connected through data, vs, hs, valid and fip ports. GNR level: a five-line buffer (latency = 6) with outputs L1 to L5 feeding a 5x5 FIR block (latency = 7), plus a control-signal buffer for vs, hs and valid.]

Figure 4 – System Generator implementation of a four-line buffer
[Block diagrams. Two dual-line buffers (latency = 3 each) in series; each contains a dual-port RAM (latency = 2), an address counter, constants, a relational block, concat blocks and delay registers.]
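The behaviour of the line buffer can be mimicked in software. The Python sketch below consumes a row-major pixel stream and, for every incoming pixel, emits the 5x1 column formed by that pixel and the four pixels above it. A deque of four stored lines stands in for the block RAMs; treating the rows above the first image line as zeros is an assumption of this sketch, as are the function name and the tiny test image.

```python
from collections import deque

def five_line_window(pixels, width):
    """For each pixel of a row-major stream, yield the column
    [I(u,v-4), I(u,v-3), I(u,v-2), I(u,v-1), I(u,v)].
    Lines before the start of the image are taken as zero (assumption)."""
    lines = deque([[0] * width for _ in range(4)], maxlen=4)  # four buffered lines
    current = []
    for i, p in enumerate(pixels):
        u = i % width                      # horizontal position in the line
        yield [line[u] for line in lines] + [p]
        current.append(p)
        if u == width - 1:                 # end of line: oldest line drops out
            lines.append(current)
            current = []

# Encode each pixel as 10*v + u so the origin of every value is visible
width, height = 4, 6
stream = [10 * v + u for v in range(height) for u in range(width)]
columns = list(five_line_window(stream, width))
column_at_u2_v5 = columns[5 * width + 2]
```

In hardware the same effect is obtained with a write address that trails the read address by one line, so no data is ever copied; the deque rotation here is only the simplest software equivalent.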

The blue shading and the Xilinx logo indicate the System Generator primitive blocks, each of which corresponds to optimized synthesizable HDL code. Through this graphical interface, System Generator allows the algorithm developer to easily implement a DSP algorithm in hardware without having any knowledge of HDL coding techniques.

We use the second main subsystem, the 5x5 FIR kernel (shown in blue in Figure 3), to implement the convolution operation. It receives as input a block of 25 pixels from the line buffer block, multiplies them by the 25 coefficients of the Gaussian mask and accumulates the result into a register. We show the details of the implementation in Figure 5. To fit the performance of the target device and to achieve a specified sample rate of 9 million samples per second (MSPS), we chose to use three multiply-accumulators (which we implemented using the FPGA's DSP48 programmable multiply-accumulator functional units) in parallel. We dedicated each of them to nine multiplications at most, via time-division multiplexing (see the larger System Generator blocks in the top diagram of Figure 5). We split the set of 25 coefficients of the mask and stored it in three ROMs that we implemented as distributed RAM (another memory resource the FPGA provides).
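The partitioning of 25 products over three multiply-accumulators can be sketched as follows. This Python model is only an illustration of the time-division idea: the 9 + 9 + 7 split of the taps follows the A0-A8, B0-B8, C0-C6 labels visible in Figure 5, while the function name and the integer test data are hypothetical.

```python
def mac_partitioned(window, coeffs, lanes=3, per_lane=9):
    """Compute sum(window[i] * coeffs[i]) using `lanes` accumulators,
    each handling at most `per_lane` products one after the other."""
    assert len(window) == len(coeffs) <= lanes * per_lane
    acc = [0] * lanes
    for step in range(per_lane):          # one product per lane per clock step
        for lane in range(lanes):         # the lanes work in parallel in hardware
            i = lane * per_lane + step    # lane 0: taps 0-8, lane 1: 9-17, lane 2: 18-24
            if i < len(window):
                acc[lane] += window[i] * coeffs[i]
    return sum(acc)                       # final adder stage combines the lanes

window = list(range(1, 26))               # 25 pixels of one 5x5 neighbourhood
coeffs = [1] * 25
result = mac_partitioned(window, coeffs)
```

With nine sequential steps per output, the multiply-accumulators must run about nine times faster than the pixel rate, which is the trade made to use three DSP48 units instead of 25.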

In our implementation, we chose to fix the coefficients of the mask at compile time. However, we could have easily updated the design to accept these values as real-time inputs. In this way we could dynamically change the run-time intensity of the filtering based on environmental conditions.

(Implementation Note: Because the isotropic Gaussian kernel is separable, it could be decomposed into two 1-D masks, and the filtering could be obtained as two consecutive convolutions with 1x5 and 5x1 masks along the horizontal and vertical directions. Even though this approach would reduce the FPGA resources required, we didn't adopt it, to avoid losing generality and readability of the design.)
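The separability mentioned in the note is easy to verify numerically. The Python sketch below builds a normalized 1-D Gaussian, forms the outer product of it with itself, and compares the result with the directly computed 5x5 mask. The helper names are hypothetical; the mask size and sigma are the ones used in the MATLAB model.

```python
import math

def gaussian_1d(size, sigma):
    """Normalized 1-D Gaussian mask."""
    half = size // 2
    g = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-half, half + 1)]
    s = sum(g)
    return [v / s for v in g]

def gaussian_2d(size, sigma):
    """Normalized 2-D Gaussian mask computed directly."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    s = sum(map(sum, k))
    return [[v / s for v in row] for row in k]

g = gaussian_1d(5, 0.8)
outer = [[gy * gx for gx in g] for gy in g]     # 5x1 mask times 1x5 mask
full = gaussian_2d(5, 0.8)
max_diff = max(abs(a - b) for ro, rf in zip(outer, full) for a, b in zip(ro, rf))

mults_full = 5 * 5        # multiplications per pixel with the 2-D mask
mults_separable = 5 + 5   # multiplications per pixel with two 1-D passes
```

The two masks agree to within floating-point rounding, and the separable form needs 10 multiplications per output pixel instead of 25, which is where the resource saving mentioned in the note would come from.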

Another powerful feature found in System Generator is the ability to apply a preload MATLAB function to customize the design before compilation. By setting parameters of the System Generator module (such as image resolution, FIR kernel values and the number of computational precision bits) and initializing the workspace signals we used during the run-time simulation, we can quickly and easily experiment with different processing methods and sets of input data. Furthermore, after the simulation completes, a stop MATLAB function is called to display the results by computing some useful information and comparing the fixed-point results against the floating-point reference ones. This methodology allows algorithm developers to closely analyze any portion of the hardware implementation and compare it to the original software model to verify compliance.

Figure 5 – System Generator implementation of FIR (convolution) filter
[Block diagrams. Inputs Y_in1 to Y_in5 and sync_inp feed five four-tap delay lines; a three-way, nine-input multiplexer (latency = 2) selects pixels A0-A8, B0-B8 and C0-C6 for three DSP48 macro blocks (latency = 4), whose coefficients come from ROM A, ROM B and ROM C; down-sample-by-9, adder and convert blocks produce the filtered output.]

System Generator FPGA Synthesis Results

Those developing driver assistance systems must implement their designs at a cost level appropriate for high-volume production. The die resources needed to achieve a certain level of processing performance will define the size of the FPGA device they require and, therefore, its cost.

In our lane departure warning preprocessor implementation, we target the XA Spartan-3A DSP 3400, currently the largest device available in the Xilinx automotive product line. We took this approach to support future planned development activities with this model, but analysis of the resources consumed for the preprocessing function clearly shows that this design would fit into a much smaller device.

The following table reports the resources occupied by the GNR block on the XA Spartan-3A DSP 3400 device. The estimation assumes gray-level input images at VGA resolution at a 30-Hz frame rate (which implies an input data rate of 9.2 MSPS).

From the perspective of timing performance, the GNR design runs at a 168.32-MHz clock frequency and accepts an input data rate of up to 18.72 MSPS.

Total resources needed for the entire lane detection preprocessing subsystem are summarized below:

The corresponding timing-performance analysis showed a clock frequency of 128.24 MHz, with a maximum input data rate of 14.2 MSPS.
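The figures quoted in this section can be cross-checked with a few lines of arithmetic. The Python sketch below takes only numbers reported in the text (VGA at 30 Hz, the two clock frequencies and the two maximum input rates) and derives the required pixel rate, the ratio of clock cycles to input samples, and the headroom over VGA. Reading that ratio as "about nine clock cycles per pixel" is an inference from the numbers, not a statement made by the authors.

```python
# Required input rate for VGA gray-level video at 30 frames per second
required_msps = 640 * 480 * 30 / 1e6          # 9.216 MSPS

# (clock frequency in MHz, maximum input rate in MSPS) as reported in the text
reported = {
    "GNR block": (168.32, 18.72),
    "full preprocessing chain": (128.24, 14.2),
}

# Clock cycles available per input sample, and margin over the VGA requirement
clocks_per_sample = {name: clk / rate for name, (clk, rate) in reported.items()}
headroom = {name: rate / required_msps - 1.0 for name, (_, rate) in reported.items()}
```

Both designs come out at roughly nine clock cycles per sample, which matches the nine time-multiplexed multiplications per multiply-accumulator, and the full chain accepts a rate a little over 50 percent above what VGA at 30 Hz needs.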

Given these resource requirements, we estimated the preprocessing function would even fit into an XA Spartan-3E 500, roughly one-seventh the density of the XA Spartan-3A DSP 3400 device.

Results and Future Work

Figure 6 provides a sample of the performance of our LDW system, including an FPGA-based image-preprocessing function for lane-marking candidate extraction. You can see the input frame in the two images on the right. The pair of images on the left illustrates the performance of the preprocessing function we implemented in the FPGA. The picture at the top left represents the magnitude of the edge detection function after thresholding. The one at the lower left is taken after the edge-thinning and lane-marking pattern search processes. Clearly, our LDW preprocessor is very effective at taking a roadway scene and reducing the data to only the primary lane-marking candidates. The yellow and red lines in the top-right and lower-right images represent, respectively, instantaneous and tracked estimations of the lane boundaries based on a simple, straight-line roadway model.

In order to accurately predict the trajectory of the vehicle with respect to the lane boundaries, our future LDW system will use a curvature model. We are currently adopting a parabolic lane model in the object space, assuming the width of the lane is locally constant and is located on a flat ground plane. We can describe a parabolic lane model by creating four parameters incorporating the position, the angle and the curvature. Using a robust fitting technique, it is possible to estimate the four parameters for each frame of the video sequence.

Noise, changing light, camera jitter, missing lane markings and tar strips could weaken the model extraction. To compensate for this information gap and make this phase more robust and reliable, the system needs a tracking stage. Tracking can be done using the Kalman filter in the space of the parameters of the lane model.
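To show how tracking bridges frames in which the extraction fails, here is a deliberately reduced Python sketch: a scalar Kalman filter following a single lane-model parameter (say, the lateral offset) under a constant-state assumption. The actual tracker discussed here would have four or six states; the noise variances q and r, the function name and the measurement values below are all hypothetical.

```python
def kalman_track(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a constant-state model.
    A measurement of None marks a frame where extraction failed,
    so only the prediction step is applied for that frame."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                    # predict: state unchanged, uncertainty grows
        if z is not None:
            k = p / (p + r)          # Kalman gain
            x = x + k * (z - x)      # correct with the measurement residual
            p = (1.0 - k) * p
        estimates.append(x)
    return estimates

# Per-frame lateral offset estimates; None marks frames with no usable markings
offsets = [1.0, 1.1, None, 0.9, None, None, 1.05]
tracked = kalman_track(offsets, x0=1.0)
```

During the frames without a measurement the estimate is simply held while its uncertainty grows, so the next valid measurement is weighted more heavily when it arrives.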

The extraction and the tracking of the lane model are the next two stages we will implement in the FPGA soon. For this job we plan to use the AccelDSP™ synthesis tool, another Xilinx high-level DSP design tool. Because it supports a linear algebra library, we can use AccelDSP to implement a four- or six-state Kalman filter.


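GNR block resources on the XA Spartan-3A DSP 3400:

Resource   Used    Available   Utilization
DSP48s     3       126         2%
BRAMs      3       126         2%
Slices     525     23872       1%

Entire lane detection preprocessing subsystem resources on the XA Spartan-3A DSP 3400:

Resource   Used    Available   Utilization
DSP48s     12      126         9%
BRAMs      16      126         12%
Slices     2594    23872       10%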

Figure 6 – LDW processing model outputs

AccelDSP can also generate gates directly from MATLAB code and we can use it synergistically with System Generator for DSP. Furthermore, the AccelDSP synthesis tool is well suited for feasibility analysis and fast prototyping. It automatically quantizes the original floating-point MATLAB into fixed-point, and it maps the various MATLAB instructions into the FPGA resources. This is probably the easiest-to-use DSP tool that Xilinx provides to driver assistance algorithm designers and system architects.

In short, algorithm designers and system architects working on driver assistance technology can now rely on a highly sophisticated DSP design tool to build their reference algorithmic models, and then easily implement those models in low-cost Xilinx FPGA devices. The result is high quality, high performance and low cost simultaneously.

One crucial feature of System Generator for DSP is the capability to implement a portion of the design into the silicon target device (of a specific board connected via Ethernet) while the remaining part runs on the host PC. Such a hardware-software co-simulation allows easy verification of the hardware behavior while also accelerating simulation speed.

As you can see, we used the Xilinx System Generator for DSP to create an image-preprocessing pipeline for an LDW system. While in this discussion we only revealed some of the details of one of its modules, namely the GNR 2-D FIR filter, the entire lane detection preprocessing function (shown in Figure 2) took only 12 DSP48s, 16 BRAMs and 2,594 slices of an XA Spartan-3A DSP 3400 device, running at 128.24 MHz with an input data rate of 14.2 MSPS, 50 percent higher than what is needed by VGA image resolution. The whole algorithm design and FPGA implementation required a few weeks of work and did not necessitate the writing of any VHDL code.

We look forward to continuing the project by implementing the extraction and tracking of the lane models in the AccelDSP design tool and then integrating such stages within the System Generator for DSP model. For further detail you can contact any of us by e-mail. (The authors are grateful to professor Vittorio Murino of the computer science department at Verona University for his support and contributions.)




Security Video Analytics on Xilinx Spartan-3A DSP

The processing requirements of video analytics take advantage of Xilinx FPGA parallelism, embedded and DSP processing.

by Csaba Rekeczky
co-CTO and Vice President
Eutecus, [email protected]

Joe Mallett
Senior Product Line Manager
Xilinx, [email protected]

Akos Zarandy
co-CTO and Vice President
Eutecus, [email protected]

The processing bandwidth requirements for a wide range of security analytics applications are forcing companies to reconsider their approach to system hardware. A single video and imaging DSP processor is insufficient for performing some of the computationally intensive analytics operations at acceptable data rates. Also, no reliable and robust solution has been demonstrated that handles high-definition (HD) resolution at full video frame rates. This has forced systems engineers to consider either a multichip or an alternative single-chip system. Both solutions have advantages and disadvantages.

A multichip system composed of multiple DSPs generally offers designers a more familiar design flow, but has added PCB costs, takes up board/system space and can create system performance issues. A single-chip solution, on the other hand, would seemingly have cost, footprint and power advantages, but it could potentially present designers with a steeper learning curve, adding complexity and engineering cost to the design project and potentially delaying the product release.

That was the dilemma we faced here at Eutecus, Inc., a video analytics company based in Berkeley, Calif., during the system specification phase of our next-generation analytics product, the Multi-core Video Analytics Engine (MVE™).

We had implemented our first-generation product on Texas Instruments' DaVinci Digital Media System-on-Chip platform. But for our second generation, we needed a bit more processing power and system integration. We quickly decided that a multidevice DSP solution wasn't cost- or system-effective. We needed a single-chip solution that would allow us to easily port the IP developed in our earlier product and add more to it for the MVE.

With a bit of research, we found the Xilinx® Spartan®-3A DSP 3400A. The device provided 126 dedicated XtremeDSP™ DSP48A slices, had more than enough performance to accommodate our system requirements and came in at an attractive price.

Further, our migration fears were quickly laid to rest when we realized that the Spartan-3A DSP was supported by the Xilinx Embedded Development Kit. The EDK allowed us to implement a dual-processor hardware architecture, based on the Xilinx MicroBlaze™ embedded processor, similar to the dual-processor hardware architecture we had been using on Texas Instruments' DaVinci platform.

With our device selected, we set out to create a single-chip analytics design by porting our existing DaVinci code base to the Xilinx dual-processor embedded system. We then created just the right set of accelerator blocks in the FPGA fabric to meet our exact performance requirements, which included processing high-definition video at full frame rates. The result was the MVE, which is sold into the aerospace/defense, machine vision and surveillance markets.

Video Analytics Product Overview

The Multi-core Video Analytics Engine relies on our InstantVision Embedded™ software and a specialized Cellular Multi-core Video Analytics (C-MVA™) coprocessor equipped with many advanced features and capabilities.

The latest version of the MVE/C-MVA is capable of handling HD resolution at video frame rates. It consumes less than 1 watt and executes multiple event-detection and classification algorithms fully in parallel. Figure 1 shows the output of a video analytics traffic-monitoring example, which classifies different types of vehicles, flow direction, lane changes and lane violations, all concurrently and marked by different colors.

We designed the C-MVA coprocessor in such a way that we can significantly expand the complexity of its operations to support the analytics functions in the dense-object space, which is particularly challenging because it requires analysis of overlapping and incomplete objects/events. Application-specific DSPs offer extremely poor support for this type of feature as well as for processing scaling. Both are much more flexible within FPGAs.

The 126 XtremeDSP DSP48A slices within the Xilinx Spartan-3A DSP 3400A FPGA are capable of 30 GMACs of DSP performance, so the device was well suited to the demanding cost and performance requirements of video analytics. The Xilinx FPGA also allowed us to add future video analytics functions and the associated event-detection examples, based on our customers' needs. We've summarized them in Table 1.

Further, the Xilinx FPGA and ISE® Design Suite tools gave our analytics design teams more flexibility in customizing solutions for end customers. We can tailor the video analytics engines and system-on-chip (SoC) solutions quickly, by rapidly prototyping for both standard- and high-definition video processing. This allows us to efficiently use the available resources in the Spartan-3A DSP 3400A or the lower-cost Spartan-3A DSP 1800A FPGA device, based on the customer's needs.

An FPGA solution has the added benefit of allowing us to create a variety of derivative end products that use the same hardware platform. Since we have designed multiple analytics accelerator engines using VHDL, we can integrate specific cores into the C-MVA coprocessor. This approach allowed our engineers to reuse our dual-MicroBlaze embedded system to create a different FPGA programming file, resulting in an extremely scalable solution that we can easily tailor to a wide variety of analytics applications.

Figure 1 – Road map of the Multi-core Video Analytics Engine (MVE) and example application
[Chart. Axes: resolution in pixels (QVGA 77k, HD 720p 921k, HD 1080p 2073k), power (W) and speed (frames/sec). Plotted items: MVE/C-MVA™ V1, V2 and V3.]

Table 1 – Supported video analytics functions for typical event detection applications

Event Detection Examples Enabled by C-MVA (columns: Ver. 1, Ver. 2, Ver. 3)
Detect and count all moving persons, vehicles and other objects
Detect, count and differentiate people, vehicles and other objects
Detect objects moving in line with the specified motion flow/counterflow
Detect and classify lost or stolen objects, entering-loitering in forbidden area, lane violation
Detect and classify gestures, characteristic motion paths
Detect and classify related behaviors within the crowd

Migrating from DaVinci to Xilinx FPGA

Our previous-generation video analytics products were based on the TI DaVinci Digital Media System-on-Chip TMS320DM6446, which included both the ARM9x processor and the C64x+ DSP coprocessor. Our design used the ARM9x for communications and control and the C64x+ for the DSP processing for the analytics algorithms. However, that combined system could not address the processing requirements our second-generation product would need. Thus, we turned to the Spartan-3A DSP FPGA family.

We simplified the task of design migration by creating a Xilinx embedded system that included two MicroBlaze v7 soft-core processors running independently. This architecture allowed us to port the ARM and DSP processor code separately. Figure 2 shows a block diagram of the Eutecus hardware system and the MVE-based reference SoC design.

Our MVE engine consists of the InstantVision Embedded software running on the MicroBlaze (MB0), system control and communications on the MicroBlaze (MB1) and the C-MVA coprocessor, which we designed as a modular chain of hardware accelerator IP cores running in the FPGA fabric.

Migrating our ARM and DSP code proved to be straightforward using the Xilinx ISE Design Suite and the MicroBlaze soft cores. One of the distinct advantages of our InstantVision cross-platform environment is that it was written in high-level, standard C/C++ language and required little modification.

Once we ported the code, we validated that it had the correct functional behavior and identified any performance bottlenecks. Accelerating the C/C++ code that we initially developed for the TI processors proved to be the critical challenge, as we used several of the DaVinci C64x+ coprocessor accelerator blocks during the assembly-level optimization for this platform. We followed a series of steps in this transition, starting with initially replacing these blocks with high-level C functions. Eventually, we replaced the majority of these functions with equivalent accelerator blocks running on the FPGA fabric.

From a functional point of view, our solution has three layers that comprise the MVE, which receives the standard-/high-definition video flow as input data and then generates the event-detection metadata. This resultant metadata provides the object/event-tracking and classification results, along with several image flows for debugging purposes as the output of the analysis. We implemented functional blocks as either embedded software running on a MicroBlaze processor or specialized IP cores. We placed these specialized hardware accelerators into the FPGA fabric, and the complete chain of these accelerators comprises the C-MVA analytics coprocessor.

As shown in Figure 3, the three algorithmic layers of the MVE video analytics engine consist of several main functional blocks, most of which we can significantly accelerate by using specialized IP cores that rely on dynamic configuration of the resources available in the FPGA. We designed the C-MVA coprocessor based on these IP cores, so as to accelerate the processing front end and midlayer (see Figure 4) of the entire analytics algorithm. This modular approach, supported by Xilinx's ISE Design Suite, allowed us to scale the system in terms of both performance and power consumption.



Figure 2 – Dual-MicroBlaze™ System-on-Chip (SoC) architecture MVE Engine coprocessor block diagram
[Block diagram. Labeled elements include the MicroBlaze processors, the C-MVA chain (front end, mid-layer and back end) built from IP cores (HW accelerators), MPMC with DDR2, PLB and OPB buses with an OPB2PLB bridge, FSL links, mailbox, Ethernet, UART, JTAG debug, clock and reset, C-VIO, DVI out, and NTSC / ITU BT.656 / GPIO video in-out. External connections: display, external uP/DSP and IT network.]

Figure 3 – Block diagram of the video analytics algorithm organization
[Block diagram. Video In to Video/Event Out through the front-end, mid-layer and back-end layers of the MVE™, implemented by the C-MVA™ (HW accelerators) and InstantVision™ Embedded on MicroBlaze (embedded SW). Labeled functions: Scaling (Filtering/Interpolation), Noise Filtering and Conditioning, Foreground-Background Separation, Morphological Shaping, Event/Object Focused Windowing, Event/Object Focused Enhancement, Event/Object Focused Feature Extraction, Contextual Feature Extraction, Multi-Object Tracking, Spare-Time Signature Extraction, Event/Object Classification and Event Based Control.]


Turbo-charging using FPGA Accelerator Blocks

To truly realize the full potential of an FPGA-based video analytics system, we needed to design and integrate the video accelerator engines into the embedded base system. We anticipated several of the performance bottlenecks, so our design team had begun early development of a set of accelerators using VHDL. The code profiler, included as part of the Xilinx ISE Design Suite and the Embedded Development Kit, proved instrumental in helping us identify further performance bottlenecks and develop all the accelerator blocks we required for this design. Table 2 provides a comprehensive list of IP core families.

Our development team, like those at many other companies, consisted of separate hardware and software developers. It was critical to the success of this project to maintain developer productivity by preserving sufficient abstraction between these two design domains. We streamlined this task using a feature in Xilinx Platform Studio, Create IP Wizard, which generates RTL templates and software driver files for hardware accelerator blocks. These templates include the interface logic the design required to access registers, DMA logic and FIFOs from the embedded system. Once we used the template to create the RTL, we placed the RTL into the embedded IP catalog, where a developer can further modify it as needed.

Our IP core development procedure includes a generic, modular periphery block development flow for the PLB46-MPMC-OPB-based backbone. These peripheries consist of both single- and multi-I/O prototypes (SIMO, MIMO, MISO models), allowing us to flexibly create a multithread coprocessor pipeline for demanding image flow processing algorithms. We achieved this by combining the IP cores in almost arbitrary order and configuring them during the design and customization of various analytics engines.

The MVE analytics engine consists of the InstantVision Embedded software modules and the hardware accelerators that make up the C-MVA analytics coprocessor. We prototyped the MVE in a Xilinx Spartan-3A-DSP 3400A FPGA and created our SoC reference design. It includes all the required I/O functions for communication and data streaming (see Figure 2 for the complete hardware-firmware block diagram). This complete SoC reference design, encompassing not only the MVE analytics engine but also all the supporting I/O modules, uses 91 percent of the logic slices, 81 percent of the block RAMs and 32 percent of the DSP slices.

The MVE analytics engine on its own (excluding the MPMC-PLB part of the backbone and specialized I/O components) uses only 46 percent of the logic slices, 44 percent of the block RAMs and 23 percent of the DSP slices, which makes a migration path to the lower-cost Spartan-3A-DSP 1800A FPGA device feasible.
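The migration claim can be checked with a little arithmetic. The sketch below rescales the MVE-only utilization figures quoted above from the 3400A to the 1800A. The device totals are not given in the article; they are assumed values that should be confirmed against the Spartan-3A DSP data sheet, and the function name is purely illustrative.

```python
# Assumed device totals (NOT from the article); confirm against the data sheet.
DEVICE_TOTALS = {
    "XC3SD3400A": {"slices": 23872, "block_rams": 126, "dsp_slices": 126},
    "XC3SD1800A": {"slices": 16640, "block_rams": 84, "dsp_slices": 84},
}

# MVE-only utilization on the 3400A, as quoted in the article.
MVE_UTILIZATION_3400A = {"slices": 0.46, "block_rams": 0.44, "dsp_slices": 0.23}


def projected_utilization(utilization, source, target):
    """Rescale fractional utilization from the source device to the target device."""
    projection = {}
    for resource, fraction in utilization.items():
        used = fraction * DEVICE_TOTALS[source][resource]
        projection[resource] = used / DEVICE_TOTALS[target][resource]
    return projection


projection = projected_utilization(MVE_UTILIZATION_3400A, "XC3SD3400A", "XC3SD1800A")
fits = all(fraction < 1.0 for fraction in projection.values())
for resource, fraction in projection.items():
    print(f"{resource}: {fraction:.1%} of the XC3SD1800A")
print("fits:", fits)
```

Under these assumed totals every resource stays below 100 percent on the smaller device, which is consistent with the feasibility claim. The estimate ignores routing congestion and any I/O logic that would still be needed.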

We designed all the IP cores of the C-MVA coprocessor to complete their associated processing within a single clock cycle. This feature, combined with the asynchronous FSL interfaces, in turn allows the system integrator to drive the C-MVA coprocessor with a different clock domain from the rest of the system. Doing so allows the C-MVA to run at the lower pixel clock frequency while driving the backbone at a higher-frequency internal system clock, greatly reducing power consumption while maintaining the system’s performance requirements.


C-MVA IP Core Family Ver. 1 Ver. 2 Ver. 3 Function

IPC-WSC Image flow, up/down scaling and windowing

IPC-CNF Image flow conditioning and noise filtering, including gain control and contrast modification

IPC-FBS Foreground-background separation

IPC-BMF Binary morphological filtering, with size classification and contour-structure shaping

IPC-SFE Multi-event/object signature and/or feature extraction

IPC-EFE Event/object-focused enhancement

IPC-EBC Application-specific event/object-based control

InstantVision™ Embedded Algorithmic framework and specific modules for video flow analytics

[Figure 4 block labels. Client: .NET app (VA configuration), WIN32 app 1 (user-defined configuration), .NET app 2 (user-defined configuration), .NET API, WIN32 API, drivers (serial/Ethernet). Server: MVE application (InstantVision™ Embedded running on MB, C-MVA IP cores), MVE drivers (serial/Ethernet). MVE SoC I/O: video-in data streaming/conversion (ITU BT.656, ITU BT.1120), display and video-out data streaming/conversion (DVI, GPIO), communication and media data streaming (UART, Ethernet).]

Figure 4 – MVE analytics engine, InstantVision and driver software

Table 2 – IP core families developed as special hardware accelerator blocks for three generations of MVE / C-MVA

Customization, Packaging and System Integration

To prove out and further develop the system, we created a security/surveillance demonstration along with all software layers, which allows users to rapidly integrate our product in their systems at various layers (see sidebar). The high-level block diagram of the complete SoC design, which encompasses hardware IP cores, firmware and software in a single reference design, is shown in Figure 4.

We can combine system integration with flexible customization at varying levels within the hardware, firmware and software components. The server-level customization can include tailor-made SoC designs in FPGA, while at the client (configuration) level, modifications are applied to the WIN32 or .NET API layers. This scheme allows us and our customers to rapidly prototype various configuration and test interfaces.

Users can build client-server communication on UART or TCP/IP to provide flexible configuration management, performance fine-tuning, status monitoring and firmware updating.

Even though we’ve just finished our second-generation product, we’ve already begun to look at requirements for our third generation. Judging from our experience with this project, we’ll strongly consider Xilinx for the new one, especially as the company introduces reliable, newer and more advanced devices and DSP capabilities on the most advanced process technologies.



[Figure 5 block labels. Video analytics subsystem (e.g., IP camera FPGA processor board): Xilinx Spartan-3A DSP 3SD3400A carrying the MVE SoC and the MVE firmware and software, with flash memory, DDR memory, video in (ITU BT.656, ITU BT.1120), video out (DVI, GPIO), metadata, UART interfaces (serial port) and TCP/IP (Ethernet PHY). PC/laptop (configuration, testing): Pentium x86/AMD running MVE Config (software only).]

Accelerating Development Using the XtremeDSP Video Starter Kit, Spartan-3A DSP Edition

As part of our development and demonstration strategies, Eutecus created an MVE Video Analytics Development Kit to give users a rapid development and prototyping platform for FPGA-based video systems. Our development kit is built upon the XtremeDSP Video Starter Kit–Spartan-3A DSP Edition (http://www.xilinx.com/vsk_s3), which includes an FMC video I/O daughtercard, CMOS camera, cables and Xilinx development software.

After migrating our MVE analytics engine, we were able to leverage this development platform and provide our MVE analytics solution to an existing community of video systems developers for evaluation and purchase with no added hardware costs. Developers who don’t already have a Video Starter Kit can easily buy one from a Xilinx distributor. Once programmed into the FPGA, the VSK will boot and begin performing the Eutecus analytics operations. The result is to give developers a quick and easy way to evaluate the performance, capabilities and cost of an FPGA-based video analytics system.

Figure 5 – Complete hardware-firmware-software reference design

XCELLENCE IN NEW APPLICATIONS

A/V Monitoring System Rides Virtex-5

State-of-the-art FPGA forms the basis for a multi-input audio/video remote-monitoring application.

by Manish Desai
Project Lead – ASIC - [email protected]

The growing demand for high-end security systems with video monitoring for a variety of applications has fueled a need for embedded systems that can quickly capture multiple audio/video channels, process and compress the information and send it to a central monitoring system via a high-speed Internet connection or host PC interface.

The new state-of-the-art Xilinx® Virtex®-5 FPGA offers an exciting opportunity for single- or few-chip solutions to such A/V monitoring applications, thanks to advanced features such as a high-end system and networking interface, built-in processor, high-speed serdes, advanced clock management and per-bit deskew in the I/O blocks. It might seem as if all these sophisticated features could complicate the design process. But in fact, early-stage planning can streamline the job while ensuring effective usage of the FPGA’s available resources.

Let’s examine some of the challenges of implementing designs in Virtex-5 FPGAs and delineate techniques to get the most out of its feature set, as demonstrated by a recent project. The process involved a number of steps, starting with choosing the right Virtex-5 for the application. Other concerns included clock requirement analysis, initial floor planning, core generation and IP integration, timing considerations and constraint definition, all the way to post-place-and-route timing analysis and timing fixes. The Virtex-5 FPGA’s hard-wired macros proved central, as did its I/O blocks and source-synchronous design. For the sake of this article, we’ll assume that we’ve assembled the IP blocks and that they are either ready to use or already generated with CORE Generator™.

Picking the Right Device for the Job

Most audio/video capture devices support a single channel and generate source-synchronous digital data in the Y/Cr/Cb data format. Although DSPs are capable of capturing digital audio/video and can perform digital signal-processing tasks, they typically support only a few channels. Therefore, in this design we chose an FPGA, which proved to be a good alternative for both multiple-channel inputs and signal-processing tasks.

Figure 1 shows a typical security/video monitoring system with a 3G/SD/HD/SDI video interface. For this design, the camera sends information in 3G-SDI format to the board, which in turn collects the data and converts it into 10-bit (Y/Cr/Cb formatted) source-synchronous video data (10/20-bit interface) at a maximum clock frequency of 145.5 MHz. It handles source-synchronous audio data at a maximum clock frequency of 96 kHz.

We used the Virtex-5 to capture the video and audio data, and then synchronize it with the internal FPGA clock and store it in DDR memory. (The memory was 512 Mbytes and 32 bits wide, so the FPGA had to support expandability up to 2 Gbytes.) The internal logic of the FPGA performs compression (if enabled) and sends data over Ethernet; alternatively, it sends uncompressed data to the host PC for storage and further processing via the PCI/PCI Express® link.

For our design, the FPGA had to support up to 10 digital audio/video source-synchronous input channels (20-bit source-synchronous Y/Cr/Cb data format), and it had to be configurable for the SD/HD data format. It also needed a PCI Express link with support for four lanes (default, one lane), single-channel 10/100/1000 Ethernet, a DDR memory controller with interface capacity of 128 Mbytes to 1 Gbyte and an embedded microcontroller in the form of a soft core for configuration control. Other requirements included an A/V signal-processing and optional compression algorithm, a central control unit with an advanced DMA engine and one A/V output port connected to VGA or a standard TV.



[Figure 1 block labels. Inputs: camera modules #1 through #10, each feeding a 3G/HD/SD receiver that supplies video data, video clock, audio data and audio clock. FPGA blocks on the 3G SD video monitoring card: audio-video data capture, ASYNC FIFO, audio/video signal processing (image resize + OSD) with optional compression, embedded microprocessor, system clock-reset, network controller, DDR controller, system interface controller (DMA engine), PCIE_IF to system bus bridge, AV output interface. External: 1000BASE-T PHY, DDR memory, PC interface (PCI/PCI-E) to the host PC for local monitoring, audio/video analog interface to an analog/digital TV (for local monitoring).]

Figure 1 – Typical Video Monitoring System Block

To meet these specifications, we had to consider several factors when implementing the design. Chief among them were clock requirement analysis, initial floor planning, core generation and IP integration, timing-constraint definition and post-place-and-route timing analysis and timing fix. But the first decision was choice of the FPGA.

Selection of the FPGA

We based our selection on a number of factors. The device needed to meet our estimated I/O requirements, and it had to have an appropriate number of logic cells, a suitable block RAM size as well as a number of clock buffers, clock management devices such as phase-locked loops (PLLs) and digital clock managers (DCMs), and multiply-accumulate blocks. In looking over the data sheets from various FPGA vendors, it became clear that the FPGA that best suited our requirements was the Xilinx Virtex-5 XC5VSX95T-FF1136.

The Virtex-5 contains all the design features we needed. It is equipped with 640 I/Os and an additional multi-gigabit transceiver (MGT) for PCI Express, along with Gigabit Ethernet media-access control (MAC) and RocketIO™ or an external PHY for local- and wide-area networking. A 3G-SDI receiver handles source-synchronous data capture. The PCI Express one-, four- or eight-lane interface links with a host PC for processing and storage requirements.

The Virtex-5 also boasts a built-in soft-core processor for configuration control, high-speed multiply-accumulate units (multiple DSP48E blocks) for digital signal processing and compression-algorithm deployment. Advanced clock management is accomplished by means of a global clock buffer, regional clock buffer and I/O clock buffer with clocking module, namely a PLL and two DCMs. Finally, it offers asynchronous FIFO and digital signal processing through more than 200 block RAMs of 18 kbits.

Clock Requirement Analysis

Once we selected the FPGA, we began the design process by analyzing clocking requirements before mapping the signals to the I/O bank or I/O pins. Over the years, we’ve learned to do this step early, because it often proves to be the most important part of the whole system design and plays a major role in determining overall performance.

For the analysis of clock requirements, it is important to consider a few factors:

• Does the FPGA have sufficient clock-capable I/O and global clock I/O lines?

• Are there enough PLLs, DCMs and global clock buffers?

• Does the global clock I/O/buffer support the maximum required frequency?

For example, this design’s clocking requirements consisted of one global system clock running at 150 to 200 MHz with PLLs used by all internal logic for processing; one global clock with a PLL/DCM for the PCI Express link running at 250 MHz; one global clock buffer/PLL and DCM for the Ethernet MAC running at 250 MHz; 16 regional/local clock I/Os for the source-sync video clock running at 145 MHz; one global capture clock for audio data/clock signals (48 kHz/96 kHz); one regional clock-capable pin for the DDR memory interface running at 200 MHz, and one 200-MHz clock (generated by the PLL/DCM) for per-bit deskew in I/O blocks.

In total, we needed four to six global clock buffers and 16 local clock buffers. The FPGA XC5VSX95T-FF1136 offers 20 global clock input pins and four clock-capable I/Os in each bank. The device has 14 banks with 40 pins, each supporting regional clock input buffers, and four banks containing 20 pins, each supporting global clock input buffers. You can directly connect the clock-capable pins of the I/O banks to regional or I/O buffers, and use them in specific or adjacent regions. In addition, each GTP/MGT has a reference-clock input pin.
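The clock budget described above can be tallied in a few lines. The sketch below lists the clock requirements from the text and compares them with the pin counts quoted for the device. Classifying each clock as "global" or "regional" is our reading of the paragraph, and counting the DDR pin alongside the 16 video clocks is an assumption, which is why the regional tally comes to 17 rather than 16.

```python
from collections import Counter

# (description, frequency in MHz, assumed clock resource class)
CLOCK_REQUIREMENTS = [
    ("system clock", 200.0, "global"),
    ("PCI Express link", 250.0, "global"),
    ("Ethernet MAC", 250.0, "global"),
    ("audio capture", 0.096, "global"),
    ("I/O deskew reference", 200.0, "global"),
    ("DDR memory interface", 200.0, "regional"),
] + [(f"video clock {n}", 145.0, "regional") for n in range(1, 17)]

# Availability figures quoted in the article.
GLOBAL_CLOCK_INPUT_PINS = 20
REGIONAL_BANKS = 14
CLOCK_CAPABLE_PINS_PER_BANK = 4


def clock_budget(requirements):
    """Return {class: (needed, available)} for global and regional clocks."""
    needed = Counter(kind for _, _, kind in requirements)
    available = {
        "global": GLOBAL_CLOCK_INPUT_PINS,
        "regional": REGIONAL_BANKS * CLOCK_CAPABLE_PINS_PER_BANK,
    }
    return {kind: (needed[kind], available[kind]) for kind in available}


budget = clock_budget(CLOCK_REQUIREMENTS)
for kind, (needed, available) in budget.items():
    print(f"{kind}: need {needed}, have {available}")
```

A tally like this only checks raw counts. It does not check that the clocks land in banks that can reach the PLLs or regions they need, which is the job of the floor planning step that follows.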

Initial Floor Planning

After performing the clock analysis, we created an initial floor plan. This is a critical phase of the design, because the decisions made at this point will determine whether the final design is going to meet timing. The bank selection and pin assignment are important steps of floor planning. How you handle them depends on the placement of other components around the FPGA.

The Virtex-5 FPGA has a total of 18 I/O banks on which to map various input/outputs. A few I/O banks support 20 input/outputs or 10 global clocks. Most of the other banks support 40 input/outputs, on which there are four input and eight output clock-capable pins.

IOBANK #3 and IOBANK #4 each support 10 single-ended/differential global clock inputs. Each bank supports 20 pins. Any pins not used for clock/reset input can be employed for general-purpose I/O. Two other banks, IOBANK #1 and IOBANK #2, are close to the center of the FPGA and each supports 20 I/O pins. Xilinx dictates that you must map all single-ended clock inputs to positive global clock input pins.

Meanwhile, the upper and lower halves of the FPGA each contain three clocking modules (CMTs), each made up of a PLL and two DCMs. We needed to ensure that we properly mapped all global clock signals that required a PLL in the upper and lower half of the device, such that the design had a direct connection from the global clock input buffers to the PLLs. We then used the remaining 14 I/O banks, supporting 40 I/O lines, in single-ended/differential mode. Each bank consists of four single-ended and eight differential clock-capable pins. We could then map or connect the clock-capable pins to regional or I/O clock buffers.

Normally, it is good to use these clock-capable pins and regional buffers (BUFR) to map source-synchronous clock inputs. The regional buffer has a lower skew and can access three regions (one where the regional buffer is located, one above and one below). But for bank selection of source-synchronous data, we prefer to use a single I/O bank. If we need additional I/O, it is better to use I/O banks for data signals that we’ve previously mapped to adjacent banks. (For package information, refer to ug195.pdf from the Xilinx Web site.)

We followed several steps for the initial floor planning of the design. First, we placed the system clock in the upper half and then placed the audio capture (optional) clock in the lower half. We locked the CMT of each half for the I/O bank 3/4 requirements. This map ensures that each half is left with two PLL/DCMs (CMTs) that we can use for the PCI Express and Gigabit Ethernet MAC (SGMII) features.

Because we mapped synchronous data to banks that consisted of regional clocks, we mapped 10 audio/video channel inputs on the remaining I/O banks. Each video channel consisted of 20 data lines, three control signals and video clock inputs. Meanwhile, each audio channel consisted of four data signals, three control signals and one audio clock signal. This made a total requirement of 32 signals with at least two clock-capable pins (the FPGA’s 14 banks can support 40 pins and four clock-capable pins).
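The per-channel pin budget above is simple to verify. The sketch below adds up the video and audio signals of one A/V channel and checks them against the capacity of a single 40-pin bank. It assumes one video clock per channel, which is what makes the total come to the 32 signals quoted in the text.

```python
# Signals per channel as described in the article (one video clock is assumed).
VIDEO_SIGNALS = {"data": 20, "control": 3, "clock": 1}
AUDIO_SIGNALS = {"data": 4, "control": 3, "clock": 1}

# Capacity of one regional I/O bank, as quoted in the article.
BANK_PINS = 40
BANK_CLOCK_CAPABLE_PINS = 4


def channel_budget(video, audio):
    """Return (total signals, clock pins) needed by one A/V channel."""
    total = sum(video.values()) + sum(audio.values())
    clocks = video["clock"] + audio["clock"]
    return total, clocks


total_signals, clock_pins = channel_budget(VIDEO_SIGNALS, AUDIO_SIGNALS)
fits_in_one_bank = (
    total_signals <= BANK_PINS and clock_pins <= BANK_CLOCK_CAPABLE_PINS
)
print(total_signals, clock_pins, fits_in_one_bank)
```

Because one channel fits inside one bank with pins to spare, each channel can keep its data and its clock in the same clock region.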

For this design, 10 A/V channels use 10 I/O banks. We mapped the video clock and audio clock to clock-capable pins to ensure we effectively used the regional and I/O clock buffers. Based on the PCB requirements, we selected for audio/video channels banks 5, 6, 13, 17, 18, 19, 20, 22 and 25.

For DDR memory, the design supports a 32-bit data bus, 14 address lines and a few control lines. We needed 85 to 90 signals to map the DDR memory interface. As per the PCB layout, we used I/O banks 11, 23 and 15 to map all DDR I/O signals. Since DDR memory works on the system clock, we chose to map the read data strobe signal generated by the DDR to clock-enabled I/O lines.

Xilinx offers the PCI Express and Gigabit Ethernet (GbE) MAC as hard macros. The Xilinx CORE Generator tool generates the proper IP core with the combination of hard macro, block RAM and some advanced RTL logic to render the blocks usable. The tool also provides detailed constraints for pin mapping, the PLL/DCM and timing for a specific Xilinx FPGA. We advise using the recommended pin definitions as described in the release notes or UCF file that CORE Generator creates for your design. Also, you can use Xilinx’s PlanAhead™ tool to confirm or cross-check any pin mapping you’ve defined manually.

Core Generation and IP Integration

The task of generating cores with CORE Generator and integrating intellectual property can be tricky. Let’s examine some of the challenges involved in generating and integrating the CMT, ASYNC FIFO, block RAM, PCI Express, GbE MAC and DSP48E blocks. (For more detailed information on the PCI Express and GbE MAC blocks, visit the Xilinx Web site to make sure you have the most recent version of CORE Generator and the latest IP.)

The Virtex-5 supports various configurations of clocking modules that you can generate with the CORE Generator utility. They include filter clock jitter PLLs, a PLL-DCM pair with filter clock jitters, a PLL-DCM pair or DCM for output dual-data rate (ODDR), a standard phase-shift clock DCM and dynamic clock-switching PLLs.

To generate PLLs, you first need to see whether the input is single-ended or differential (in the example design, it is all single-ended). Then you must determine whether clock jitter is appropriate (in our case, it was 120 picoseconds) and whether you’ve used the global buffer to buffer all the outputs.

Each PLL can generate up to six different frequency clocks. In our case, the design needed four 200-MHz system clocks, with phases of 0, 90, 180 and 270 degrees respectively, and one audio capture clock of 19.048 MHz or 39.096 MHz.
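To make the phase plan concrete, the sketch below converts the four system-clock phases into time offsets within one 200-MHz period and counts the PLL outputs in use against the six available. The function name is illustrative only.

```python
PLL_MAX_OUTPUTS = 6  # per the article: each PLL generates up to six clocks


def phase_offsets_ns(freq_mhz, phases_deg):
    """Map each phase in degrees to its time offset in ns within one period."""
    period_ns = 1000.0 / freq_mhz
    return {deg: period_ns * deg / 360.0 for deg in phases_deg}


system_phases = phase_offsets_ns(200.0, [0, 90, 180, 270])
outputs_used = len(system_phases) + 1  # four system phases plus the audio clock

for deg, offset in system_phases.items():
    print(f"{deg:3d} degrees -> {offset:.2f} ns")
print("PLL outputs used:", outputs_used, "of", PLL_MAX_OUTPUTS)
```

At 200 MHz the period is 5 ns, so each 90-degree step is a 1.25-ns shift, and the five clocks listed fit within a single PLL's six outputs.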

To drive clocks using ODDR flip-flops in source-synchronous outputs, we implemented a DCM that drives the ODDR flip-flops for forward clocking. This DCM runs in parallel to the DCM we used for internal clocking.

We generated the ASYNC FIFO or block RAM using CORE Generator and supported ECC with interrupting logic on an embedded microprocessor core to perform data error detection.

While generating the PCI Express core, we had to ensure the reference clock had the same performance as the PC motherboard’s PCI Express slot output (that is, 100 MHz). Also, we needed to define how many base address registers (BARs) the core needed and whether the BARs were memory mapped or I/O mapped. We used the BAR monitor, which helps in generating BAR hits, for address decoding.

During the design of the bridge between PCI Express and the system local bus, we used the BARs, which act as memory or I/O region chip selects, to access memory-mapped or I/O-mapped registers or block RAM. We designed the bridge logic in such a way as to make sure that the core and bus properly accessed all the registers or block RAM. The Xilinx PCI Express core also has a default ROM expansion capability, and to accommodate it we had to implement an address map for it inside the bridge. Bit position 6 of the BAR hit points to this expansion ROM area, and the internal interface must respond to these BAR hits.

If any of the above is missing, the host PC won’t get any response if it tries to communicate and perform a read transaction. It will enter an unknown state or generate an unrecoverable error.
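The requirement that every BAR hit gets a response can be modeled in software before it is written as bridge RTL. The sketch below is a behavioral model, not the core's interface: it decodes a BAR-hit bit vector and reports any region the bridge has not implemented. Bit 6 for the expansion ROM comes from the text; treating bits 0 to 5 as BAR0 to BAR5 is an assumption, and the function names are hypothetical.

```python
EXPANSION_ROM_BIT = 6  # from the article
NUM_BARS = 6           # assumption: bits 0-5 map to BAR0-BAR5


def decode_bar_hit(bar_hit):
    """Return the names of the regions selected by a BAR-hit bit vector."""
    regions = [f"BAR{i}" for i in range(NUM_BARS) if bar_hit & (1 << i)]
    if bar_hit & (1 << EXPANSION_ROM_BIT):
        regions.append("EXPANSION_ROM")
    return regions


def unhandled_regions(bar_hit, implemented):
    """Regions that were hit but have no responder in the bridge."""
    return [r for r in decode_bar_hit(bar_hit) if r not in implemented]


# A bridge that implements BAR0 only, and forgot the expansion ROM.
implemented = {"BAR0"}
print(decode_bar_hit(0b0000001))                  # hit on BAR0
print(unhandled_regions(0b1000000, implemented))  # expansion ROM read
```

In this model a read that hits the expansion ROM shows up as an unhandled region, which corresponds to the hung read transaction described above.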

We used the CoreGen utility to generate the GbE MAC with an RGMII/SGMII external interface. We used the built-in GTP module to communicate with selected PHY devices. The GbE MAC supports the MDIO interface to configure external physical devices, a host interface and a 16-bit single-channel client interface.

The DSP48E block, for its part, is a 25x18-bit multiplier and 48-bit hard-macro accumulator. You can use it directly as an instance, or you can let the Xilinx tools map multiply-accumulate, add and subtract functionality implemented in RTL logic onto it. We recommend using standard RTL logic to implement the multiply-accumulate, adder and multiplier. Include the design constraints during synthesis and placement and routing.

For IP integration, be sure to have a separate clock-reset module for each FPGA. The asynchronous reset must be synchronized to each and every clock, both global and regional. Internally, the reset signal is asserted asynchronously and deasserted synchronously with respect to specific clocks, and its output is applied to the specific module to which the clock belongs. Make sure you have connected all the global input clocks to the PLL/DCM core generated by CoreGen.

Also, be sure you’ve connected the regional clock to BUFR/BUFIO. In addition, to keep your placement-and-routing tool from using unnecessary routing resources, make sure you generate only the necessary reset signals. You need to ensure that PLL/DCM lock conditions are brought to external pins or to the configuration register. In our case, we only connected the 200-MHz system clock PLL lock to the I/O pins.

Definition of the IOB (input, output or both input and output) synthesis and mapping constraints must be part of the FPGA’s top RTL module. We instantiated the topmost module of core logic inside the FPGA’s top module. It must communicate with the external interface via these IOB definitions only.

IOB definitions consist of IBUF, OBUF, IBUFDS, OBUFDS and the like. Each in turn consists of supported user-defined parameters for IO_STANDARD (LVTTL, LVCMOS, etc). We used the instance definition of the above to map external I/O signals with the topmost RTL module signals.

Since we were designing with high-speed source-synchronous inputs and outputs, the Virtex-5’s per-bit deskew capability, which is built into all I/O blocks (IODELAY primitive), helped us to meet setup-and-hold requirements at the input and output stages. For source-synchronous inputs, the source-synchronous clock uses BUFIO or BUFR, and it introduces additional delay. To compensate for this delay, we drove data and clock inputs via an IODELAY instance that we configured in input delay mode with known delay counts. Changing the delay count value helped us meet timing requirements at the input stage.

Similarly at the output stage, as synchronous clock signals are driven with data, we needed to make sure that data and clock signals were driven so as to meet the setup-and-hold of an FPGA or ASIC at the other end. We used IODELAY instances configured in an output delay mode with a known delay count value for both clock and data outputs. The IODELAY needs an IDELAYCTRL primitive instance at the top of the FPGA. The 200-MHz input clocking to the IDELAYCTRL instance creates a delay count precision of 70 ps on IODELAY.
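Choosing a delay count is a rounding exercise. The sketch below converts a desired delay into the nearest tap count, taking the 70-ps tap precision quoted above as a parameter. The 0-to-63 tap range is an assumption about the primitive and the helper name is hypothetical, so both the tap size and the range should be checked against the device documentation.

```python
def delay_to_taps(delay_ps, tap_ps=70.0, max_tap=63):
    """Return the tap count whose delay is closest to delay_ps.

    tap_ps is the per-tap delay quoted in the article; max_tap is assumed.
    """
    if delay_ps < 0:
        raise ValueError("delay must be non-negative")
    taps = round(delay_ps / tap_ps)
    if taps > max_tap:
        raise ValueError("delay exceeds the range of one delay element")
    return taps


def residual_error_ps(delay_ps, tap_ps=70.0):
    """Difference between the requested delay and the delay actually applied."""
    return delay_ps - delay_to_taps(delay_ps, tap_ps) * tap_ps


print(delay_to_taps(700.0))      # a whole number of taps
print(delay_to_taps(500.0))      # rounds to the nearest tap
print(residual_error_ps(500.0))  # what the rounding leaves over
```

The residual error is at most half a tap, which is the margin to budget for when picking delay counts for the clock and data paths.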

Timing Consideration and Constraints Definitions

After generating and implementing the IP, the next step was to address timing. We constrained all the input clocks for period, jitter and input offset delays, and set all output delays with respect to the source clock and input-to-output delay. We then created the timing and placement constraints in Xilinx User Constraint Files (UCFs).

We constrained all the input clocks to specific frequencies and also defined the jitter input using the following UCF code:

NET "i_clk_200_s" TNM_NET = "IN_200_CLKGRP";
TIMESPEC "TS_IN_200_CLKGRP" = PERIOD "IN_200_CLKGRP" 5 ns HIGH 50% INPUT_JITTER 0.1 ns;
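A quick sanity check on a PERIOD constraint is to confirm that it is at least as tight as the clock it describes. The sketch below does that arithmetic for the 200-MHz system clock constrained at 5 ns, and for the video clock, using the 145.5-MHz maximum and the 6.8-ns period that appear elsewhere in this article. The helper names are illustrative only.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def constraint_covers(freq_mhz, constrained_period_ns):
    """True if the constrained period is no longer than the real clock period."""
    return constrained_period_ns <= period_ns(freq_mhz)


print(period_ns(200.0), constraint_covers(200.0, 5.0))
print(round(period_ns(145.5), 2), constraint_covers(145.5, 6.8))
```

A constrained period shorter than the real one simply adds margin; a longer one would let the tools sign off paths that fail in hardware.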

With respect to source-synchronous data, we can set the input clock to a 0-degree phase shift or 180-degree phase shift, in the case of SDR, and 90-degree phase shift in case of DDR. Figure 2 shows the source-synchronous DDR data input timing with the clock at a 90-degree phase shift. Table 1 shows the external interface input timing details.

[Figure 2 signal labels: Video ch1_data (DATA_P, DATA_N) and Video ch1_clk, annotated with tv, ts_pth and ts_ntclk; DDR source-synchronous data input timing parameters]

Figure 2 – Timing Diagram of DDR Inputs

The following are constraints we applied for minimum timing values in UCF:

// Define Clock Net
NET "i_video_ch1_clk" TNM_NET = "VIDEO_CH1_CLK";
TIMESPEC "TS_VIDEO_CH1_CLK" = PERIOD "VIDEO_CH1_CLK" 6.8 ns HIGH 50% INPUT_JITTER 0.1 ns;

// Define Time Group for Rising and Falling (In case of DDR Inputs)
TIMEGRP "VIDEO_CH1_CLK_R" = RISING "VIDEO_CH1_CLK";
TIMEGRP "VIDEO_CH1_CLK_F" = FALLING "VIDEO_CH1_CLK";

// Define Input Constraints.
OFFSET = IN 0.5 ns VALID 1 ns BEFORE "i_video_ch1_clk" TIMEGRP "VIDEO_CH1_CLK_R";
OFFSET = IN -1.5 ns VALID 1 ns BEFORE "i_video_ch1_clk" TIMEGRP "VIDEO_CH1_CLK_F";

For timing constraints on the PCI Express and Gigabit Ethernet MAC cores, we applied all timing and placement constraints for block RAM and PLL/DCM as defined in the CORE Generator example.

We defined the output timings with respect to input clock or PLL-generated clocks in a UCF.

// Define OFFSET OUT with respect to clock.
NET "video_data_p0" OFFSET = OUT 3 ns AFTER "i_clk_video_in";

// Define MAXDELAY from Flip Flop to pad to be minimum (Say 0.1 ns to 0.2 ns),
NET "video_data_p0_to_pad" MAXDELAY = 0.1 ns;

The OFFSET OUT constraint doesn’t guarantee that all outputs on all data and clock signals arrive at exactly 3 ns. The tool tries to meet timing with zero or positive slack (that is, with a clock-to-output delay of 3 ns or less).
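The notion of slack used here is just a subtraction. The sketch below computes slack for an output against the 3-ns requirement. The clock-to-pad delays passed in are made-up illustration values, not figures from the design.

```python
def offset_out_slack(requirement_ns, clock_to_pad_ns):
    """Slack = required clock-to-output time minus the achieved time."""
    return requirement_ns - clock_to_pad_ns


def meets_timing(requirement_ns, clock_to_pad_ns):
    """Zero or positive slack means the constraint is met."""
    return offset_out_slack(requirement_ns, clock_to_pad_ns) >= 0.0


# Hypothetical achieved delays checked against the 3-ns OFFSET OUT requirement.
print(offset_out_slack(3.0, 2.5), meets_timing(3.0, 2.5))
print(offset_out_slack(3.0, 3.5), meets_timing(3.0, 3.5))
```

Two outputs can both pass while arriving at different times, which is why the MAXDELAY constraint above is also used to keep the flip-flop-to-pad delay small and uniform.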

Since many Virtex-5 designs use multiple asynchronous clocks, we then had to define the false paths in the design so those clocks would not be affected. We did this with the following constraints settings in a UCF.

// Define False Path.
NET "video_data_p0" TNM_NET = "VIDEO_CH1_TIMGRP";
NET "core_clk_0" TNM_NET = "CORE_CLK";
TIMESPEC "TS_FROM_VIDEO_CH1_TO_CORE" = FROM "VIDEO_CH1_TIMGRP" TO "CORE_CLK" TIG;

Post-P&R Timing Analysis and Timing Fix

After placing and routing our design, we ran static timing analysis (STA) and timing simulation to see if we had any further timing errors. For STA, we ensured that the timing report covered all the constrained and unconstrained paths. By using an STA report, we can validate input/output timing and internal system timings. To fix the input timing violations (setup and hold), we used IDELAY with the appropriate tap value to meet timing requirements. To fix output timing violations, we made sure the respective signal flip-flop was in the IOB. To fix the internal logic timing, we used FPGA Editor to make changes to the floor plan and revised the design’s RTL code.

We then ran timing simulation to catch errors that we didn’t detect during static timing analysis. The process involved generating a netlist compatible with the simulator we used during RTL simulation, and adding it to the Xilinx library path in the timing simulation script.

Timing simulation will catch errors that STA doesn’t. One critical example is an address collision in dual-port RAM that occurs when two logic blocks in asynchronous clock domains drive the two ports’ clocks and addresses. Timing simulation also helps identify slow-changing signals, multicycle paths and multiclock domain paths in a design, thereby prompting designers to apply better timing constraints. That also helps fix timing issues in STA.

The Virtex-5 FPGA proved well-suited for our video monitoring system requirements. The regional clock buffer and I/O clock buffers (with per-bit deskew at IOB level using IODELAY) allowed us to support multichannel source-synchronous audio/video inputs. Moreover, the device’s PCI Express and Gigabit Ethernet MAC hard macros gave us global connectivity for remote monitoring.

The end result was a cost-effective solution for our A/V remote-monitoring application. A bit of work at the early stage of design made it easy to meet timing closure. In our future design work, we will rely on early-stage planning to ensure the effective usage of the available resources of specific FPGAs. Defining global and regional clocks in detail and performing clock-requirement analysis and initial floor planning will make our flow more efficient, enabling us to rapidly design value-added products.

For more information, contact eInfochips at [email protected].


Timing Parameter          Min (ns)   Max (ns)   Description
Tclk                      6.6        –          Clock period
ts_p (Setup Time)         0.5        1          Input setup time for pos-edge of clock with respect to posedge
ts_p (Setup Time)         -1.5       -2.0       Input setup time for neg-edge of clock with respect to posedge
th (Hold Time)            0.5        –          Input hold time
tv (Data Valid Window)    1          1.5        Data valid window

Table 1 – External Interface Input Timing Details

XPERTS CORNER

Verifying Xilinx FPGAs the Modern Way: SystemVerilog

You can improve design quality and time-to-market by investing up front in currently available verification tools and methodologies.

by Stacey Secatch
Senior Staff Design Engineer
Xilinx, [email protected]

Bryan Ramirez
Senior Design Engineer
Xilinx, [email protected]

Running an “is it alive?” test and then instantly downloading an FPGA design to a board is no longer sufficient for system development. Due to the increasing complexity of modern FPGAs, designs now require the same level of functional verification that engineers once reserved solely for ASSPs and ASICs. The good news is that advanced verification techniques are ready for use in FPGA development and will improve the quality of your design right now.

While it’s true that FPGA design entails no expensive masks to turn if you uncover a bug during system bring-up, ensuring that your design is correct before going to hardware is still critical to the success of your project. Finding and squashing bugs before heading to the lab will speed your overall design cycle and increase the likelihood of releasing your product on time, saving you and your customers money and frustration.

The Xilinx design team recently developed a new methodology and SystemVerilog infrastructure for verifying the Serial RapidIO™ LogiCORE™ (SRIO). In our latest release of the core, a new intelligent buffer automatically reorders and manages transaction priorities should your system need to retransmit packets. For this verification project in particular, our engineers used SystemVerilog (simulated using the Mentor Graphics® Questa™ tool) on top of standard AVM base classes (provided by Mentor Graphics) to verify the interactions between our newly designed Buffer LogiCORE and our existing Logical LogiCORE, while ensuring compliance to the applicable layers of the RapidIO standard.

We think you’ll find it easy to make useof some of the advanced techniques weemployed in our verification project as youdevelop new, portable, robust test benchesof your own. We’ll also describe some asser-tion and functional coverage techniques weused to improve the quality of our designwithout even changing our test bench.

Abstract Test Bench Development
Transactions give us a way to track data movement and control events during simulation. One transaction class in the SRIO verification infrastructure represents RapidIO logical packets and contains member elements for each field. We also use transaction classes to represent other events and conditions within the core. For example, we use one scheduling class to indicate whether packets need to be replayed on the link interface, and another configuration class to represent read and write transactions on the host interface.
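A transaction class of this kind is compact to write down. The sketch below models one in Python rather than SystemVerilog so that it can be run anywhere. The class name, field names and field widths are invented for illustration and do not reflect the RapidIO packet format or the Xilinx infrastructure. It shows fields held as members, with helper functions that translate between fields and a byte stream, plus a comparison.

```python
from dataclasses import dataclass


@dataclass
class LogicalPacket:
    """Hypothetical packet transaction; fields and widths are illustrative only."""
    priority: int          # assumed 2-bit priority
    ftype: int             # assumed 4-bit format type
    dest_id: int           # assumed 8-bit destination ID
    payload: bytes = b""

    def pack(self) -> bytes:
        """Translate the fields into the byte stream a driver would send."""
        header = ((self.priority & 0x3) << 4) | (self.ftype & 0xF)
        return bytes([header, self.dest_id & 0xFF]) + self.payload

    @classmethod
    def unpack(cls, stream: bytes) -> "LogicalPacket":
        """Rebuild a transaction from the byte stream a monitor observed."""
        header, dest_id = stream[0], stream[1]
        return cls(priority=(header >> 4) & 0x3, ftype=header & 0xF,
                   dest_id=dest_id, payload=bytes(stream[2:]))


sent = LogicalPacket(priority=2, ftype=5, dest_id=0x1F, payload=b"\xde\xad")
seen = LogicalPacket.unpack(sent.pack())
print(seen == sent)   # the dataclass supplies the field-by-field comparison
```

Because the rest of the test bench only ever handles LogicalPacket objects, the byte-level layout can change without touching generators or checkers.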

SystemVerilog interfaces abstract core signals in order to provide simple connections between the test bench and the device under test (DUT). The only classes that communicate with core interfaces are the drivers (which convert transactions to vectors) and the monitors (which convert vectors back into transactions). Figure 1 depicts vector-based interfaces with solid dark lines. All other connections between elements are shown as hashed gray lines and represent transaction-based communications. When processing the vector streams, the drivers and monitors use functions that are built into the transactions to translate between fields and data streams, as well as helper member functions such as comparisons.

Using abstracted transactions allows you to focus on what your DUT is supposed to be doing, rather than how it does it. For example, the transaction representing a RapidIO packet allowed our test bench to move and operate on a single entity without requiring knowledge about how the packet entered or left the DUT.

With transaction class definition complete, our next step was to specify how transactions travel through the core and how the test environment creates and consumes them. Individual test cases direct generators to create transactions and pass them along to drivers, as shown in Figure 1. Drivers and monitors contain protocol checkers to verify proper operation with external circuits. The main difference between the two classes is that drivers actively interface to the core, while monitors passively snoop what is happening. At the same time, functional coverage triggers measure what the test bench has exercised within the core.

At the end of a transaction sequence, the driver or monitor sends the transaction to the scoreboard, as shown in Figure 1. The scoreboard contains a reference model for the DUT to verify that it behaves as expected. Specifically, the reference model checks that the core correctly arbitrates between interfaces, constructs proper packets and retransmits packets correctly when required.

The SRIO core uses Local Link interfaces, not only for external communication, but also among all of its top-level modules. An instance of a common monitor class checks each Local Link interface in order to catch problems as they occur within the core, rather than waiting for them to become visible on external interfaces. Using standard interfaces in your design detects potential problems in your circuit early, allowing for quicker debug. Flagging an error immediately eliminates the need to track the problem back through a sequence of events.

Figure 1 - Simplified diagram of the SRIO cores surrounded by the test bench. Solid dark lines represent vector-based interfaces, and hashed light lines represent transaction-based interfaces. (Blocks shown: generators, drivers, LocalLink drivers and monitors, the scoreboard with its reference model, and the Logical core and Buffer with their transmit and receive paths to the core and to the link.)

Randomization and Coverage
A solid test bench infrastructure is important, but it doesn’t do you any good without transactions that really exercise your design. The old style would be to create a test plan and write a directed test to cover the functionality you care about. Unfortunately, advanced verification still requires a test plan. You can cover a lot of the functionality automatically by using the built-in constrained randomization of SystemVerilog. One benefit of randomization is the ability to hit hidden test points that you haven’t actually defined. After finding these new points, you can add them to your test plan to ensure the simulation environment exercises them.

We used the randomization engine to automatically create valid packets with varying content, and to issue them against the core with any valid timing. This guarantees that we tested the core against any kind of packet, in any sequence, with any other kind of packet already active in the core. This methodology very quickly proved that the core data path could handle any possible combination of packets and that the core control could handle any possible state.
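The idea of constrained randomization can be illustrated outside SystemVerilog. The following Python sketch is not the SRIO generator: the function name, the field names and the specific constraints (payloads in whole 32-bit words, at most eight idle cycles) are assumptions chosen only to show random content and random timing that both stay inside legal bounds.

```python
import random


def random_packet(rng, max_payload=16, priorities=(0, 1, 2, 3)):
    """Draw one valid packet: every field is random but within its constraint."""
    size = rng.randrange(0, max_payload + 1, 4)      # constraint: whole 32-bit words
    return {
        "priority": rng.choice(priorities),          # constraint: legal priorities only
        "payload": bytes(rng.randrange(256) for _ in range(size)),
        "gap_cycles": rng.randint(0, 8),             # random idle time before sending
    }


rng = random.Random(2008)    # fixed seed so that a failing run can be replayed
packets = [random_packet(rng) for _ in range(200)]

# A test case can tighten a constraint to steer the stimulus, for example
# toward high-priority traffic only, while everything else stays random.
urgent = [random_packet(rng, priorities=(3,)) for _ in range(20)]
```

Seeding the generator matters as much as the constraints: a random failure is only useful if the same sequence can be reproduced while debugging.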

Defining Test Stimulus
Completely random vectors will exercise the DUT, but won’t often provide challenging vectors for it to process. What’s needed is useful test stimulus. We recommend starting with very simple tests to check the basic functionality of your design. To illustrate, our very first test case sent only one packet in each direction. Sending a single packet verified that the communication between the logical and physical layers was correct, and that the test bench itself was working properly.

Once that basic test worked, we turned on randomization. This changed packet content and size, as well as timing on all of the interfaces, a simple step that helped us test more thoroughly. We then incrementally turned on increasingly complex tests to fully verify the core. This stair-step approach also allowed us to turn off some of the simpler tests and save simulation cycles as we progressed, as the more-advanced tests had already covered basic functionality.

To quickly get to tests that were interesting, we set up test cases that constrained portions of randomization space. For example, to test interface arbitration within the transmit side of the Logical core, we used fork/join constructs to direct the generators in parallel. Each generator created transactions that we constrained to each of the logical transmit interfaces at or near the same time in simulation. This shortened the testing time, because we could simulate this functionality directly, without requiring the constraint solver to bring us there as part of random simulations.

Once regressions were up and running, we examined our functional and code coverage data. As expected, there were coverage gaps. We wrote directed random tests targeting the areas that randomization didn’t adequately address. For example, we added a test case that changed data throttling on the user and link interfaces to simulate the buffer while mostly empty and mostly full. When closing coverage gaps, a hybrid approach using directed random test stimulus tends to be the most effective. By focusing the random vectors toward the area of interest, the stimulus is more useful and the benefits of randomness are maintained.

Immediate Adoption: Assertions and Coverage
The SystemVerilog language is divided into verification and development components. Binding is a verification construct that connects one module to another. It allows verification code to directly observe register-transfer-level (RTL) synthesizable code without making any changes to the RTL source files.

Binding enables you to start adding more-advanced verification features without having to fundamentally change your test bench. All you need do is create a new module with the verification constructs and “bind” it to your code. This white-box testing method allows immediate identification of bug events, in addition to determining that desired testing events have occurred.



RTL source code:

    module my_rtl
      #(parameter WIDTH = 8)
      (input                  clk,
       input                  rst,
       output reg [WIDTH-1:0] dout,   // reg: assigned in the always block
       input      [WIDTH-1:0] din1,
       input      [WIDTH-1:0] din2);

      // ensure no carryout lost
      // (`TCQ is assumed to be a delay macro defined elsewhere)
      always @(posedge clk) begin
        if (rst) begin
          dout <= #`TCQ 0;
        end else begin
          dout <= #`TCQ din1 + din2;
        end
      end

    endmodule

Embedded assertions and coverage:

    module my_rtl_bind
      #(parameter WIDTH = 8)
      (input             clk,
       input             rst,
       input [WIDTH-1:0] dout,
       input [WIDTH-1:0] din1,
       input [WIDTH-1:0] din2);

      // ensure all combinations of din1/din2 "signs"
      wire din1_s = din1[WIDTH-1];
      wire din2_s = din2[WIDTH-1];

      covergroup sign_bits @(posedge clk);
        coverpoint din1_s;
        coverpoint din2_s;
        combos: cross din1_s, din2_s;
      endgroup

      // a covergroup collects nothing until it is instantiated
      sign_bits sign_bits_inst = new();

      // overflow not allowed on dout
      wire [WIDTH:0] tmp = din1 + din2;

      property p_dout_over;
        @(posedge clk) tmp[WIDTH] == 0;   // carry bit must stay clear
      endproperty

      ap_dout_over: assert property (p_dout_over)
        else $error("dout overflowed!");

    endmodule

Bind verification to source:

    module top_bind
      #(parameter WIDTH = 8) ();

      bind <your hierpath to inst name for my_rtl> my_rtl_bind
        #(.WIDTH (WIDTH))
        my_rtl_bound
          (.clk,
           .rst,
           .dout,
           .din1,
           .din2);

    endmodule

Figure 2 - Binding embedded assertions and coverage example for a simple adder. At the top-level simulation, also include “top_bind” as a simulation target.


For each RTL module, we created an embedded verification module containing assertions and functional coverage. We first copied the port declaration for each RTL module, but changed output ports to input ports. Then, we added parameters to the verification module for any parameters or local parameters that we needed access to. Finally, we added input ports corresponding to any internal signals we wanted to probe.

The final step was a wrapper module to bind the verification modules to the existing RTL modules. The SystemVerilog bind keyword in combination with implicit connect-by-name made creating the wrapper module very simple. Note that SystemVerilog doesn’t allow implicit connections for parameters, so you must call them out explicitly at this level. Hierarchical references allowed access to signals and parameters residing in submodules.

This wrapper module becomes a secondary top-level module when simulating the design. Figure 2 shows this methodology at work on a simple adder example. You can also create any temporary signals you might need inside the verification module.

Embedded Assertions, Functional Coverage
We used SystemVerilog assertions to quickly detect any problems. Like internal monitors, assertions immediately find any bug that has been triggered, without requiring it to propagate to a port for analysis. SystemVerilog assertions are very flexible; we primarily focused on verifying that input was correctly formed (catching malformed packet demarcation and inconsistent control signals) and on error detection (such as X detection and overflow conditions). The assert property in Figure 2 detects an overflow condition on the carry bit.
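The behavior of that overflow check can be mirrored in a small software model. This Python sketch is only an illustration of the idea: the function name is hypothetical, and the inputs are assumed to already fit in WIDTH bits. It computes the sum one bit wider than the output, as the tmp signal does in Figure 2, and flags the failure at the point where the carry would be lost.

```python
def add_checked(din1, din2, width=8):
    """Model of the adder together with its embedded overflow assertion."""
    tmp = din1 + din2                    # (width+1)-bit sum, like `tmp` in Figure 2
    if (tmp >> width) & 1:               # carry bit set: dout would silently wrap
        raise AssertionError("dout overflowed!")
    return tmp & ((1 << width) - 1)      # value that dout would register


print(add_checked(100, 27))              # fits in 8 bits, no assertion fires
try:
    add_checked(200, 100)                # 300 does not fit in 8 bits
except AssertionError as err:
    print("caught:", err)
```

The point of the check is locality: the error is reported next to the adder that caused it, not wherever the wrapped value happens to surface later.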

We found embedded assertions to be most useful for detecting the initial failing events of sequences that cause functional failures much later in simulation. They were even more useful when the failure happened in logic not related to the source of the issue. Assertions allowed the failure to immediately present itself during simulation and be located in the incorrect source logic.

In addition, we used SystemVerilog covergroups and cover properties to ensure that the test coverage for the DUT was adequate. In general, covergroups worked best to check signal values or when we wanted to ensure that combinations of values were available using cross-coverage (such as a discontinued packet with every possible priority). Cover properties excelled when we wanted to detect sequences of events, such as a discontinued packet immediately followed by a valid one. The covergroup in Figure 2 provides coverage for all combinations of the high bit of each addend seen in simulation.
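What a cross of two one-bit coverpoints records can be shown with a few lines of bookkeeping. The Python class below is a hypothetical stand-in for the sign_bits covergroup in Figure 2, not a model of any simulator's coverage engine. It counts how often each combination of the two high bits has been sampled and reports the combinations never seen.

```python
from itertools import product


class SignCross:
    """Counts hits for every (din1 sign, din2 sign) combination."""

    def __init__(self, width=8):
        self.width = width
        self.hits = {combo: 0 for combo in product((0, 1), repeat=2)}

    def sample(self, din1, din2):
        msb = self.width - 1
        self.hits[((din1 >> msb) & 1, (din2 >> msb) & 1)] += 1

    def holes(self):
        """Combinations that the stimulus has not produced yet."""
        return sorted(combo for combo, n in self.hits.items() if n == 0)

    def percent(self):
        return 100.0 * sum(n > 0 for n in self.hits.values()) / len(self.hits)


cov = SignCross()
for a, b in [(0x01, 0x02), (0x80, 0x7F), (0xFF, 0x80)]:
    cov.sample(a, b)
print(cov.percent(), cov.holes())   # 75.0 [(0, 1)]
```

The reported hole is exactly the kind of gap a directed random test would then be written to close.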

Embedded coverage was most useful in two situations. The first was to prove that internally generated sequences covered all possible values, since the test bench doesn’t directly control those sequences. The second was to prove that buffers were in all possible states during simulation (empty, full and every possible one-from-full configuration). Moreover, embedded coverage lets you fully measure the testing of coverage points created as a result of implementation details rather than specification requirements. You can improve the quality of your design by ensuring that you cover these points during testing.

Looking forward, we’ll continue to use this base methodology on our LogiCORE IP cores while finding ways to integrate new verification techniques such as OVM (a new base class library developed jointly by Mentor Graphics and Cadence™). There are a lot of helpful books and white papers available about high-level test benches. For usable examples, we like the Writing Testbenches series by Janick Bergeron and the AVM Cookbook from Mentor Graphics Corporation. We found the SystemVerilog language reference manual to be irreplaceable.

For details about the Serial RapidIO protocol, go to the trade association Web site at www.rapidio.org. For more information about the Xilinx Serial RapidIO LogiCORE and the new buffer design, go to www.xilinx.com/rapidio.


by Endric Schubert, Felix Eckstein, Joachim Foerster, Lorenz Kolb and Udo Kebschull
Missing Link [email protected]

Automotive multimedia systems face a serious technical challenge: how to achieve system upgradability over the product's lengthy life cycle?

Cars and trucks have a typical lifetime of more than 10 years. That makes it difficult for automotive multimedia systems to keep pace with the rapidly changing standards in consumer electronics and mobile communications. In most cases, it is not sufficient (or even possible) to update only the multimedia software. Many applications, especially multimedia codecs, also require an increase in compute performance. Yet designing a system with “spare” compute power for later use is neither economical nor technically feasible, since many technology changes are simply not foreseeable.

One solution is to upgrade the compute platform together with the software in such a way that the upgraded system provides sufficient computation power for the additional software processing load. If you build a system with a Xilinx® Virtex®-5 FXT device, you can give your design additional computation power by adding special-purpose compute operations to the PowerPC processor’s Auxiliary Processing Unit (APU).


Extend the PowerPC Instruction Set for Complex-Number Arithmetic
Using the APU inside a Xilinx Virtex-5 FXT can ease the design of field-upgradable automotive multimedia systems.

XPLANATION: FPGA 101

At Missing Link Electronics, a company working on reconfigurable platforms to link reliable automotive and aerospace electronics with the rapidly changing consumer and mobile-communications markets, we believe the PowerPC processor APU inside the Xilinx Virtex-5 FXT devices is a little gem. It provides embedded-systems designers with the same optimization powers traditionally available only to the “big guys” who build their own custom ASSP devices. Given what’s now available to Xilinx users, we think everyone should tap the APU to optimize their designs.

Basics of Extending Instruction Set via the APU
When designers want to optimize an embedded system, they typically do so by looking for ways to extend the instruction set of the microprocessor at the heart of their design. Traditionally, this is the best option when the complexity of the embedded system lies in the software portion of the design. You could also simply put new functionality in the design by adding dedicated hardware blocks.

However, you’ll likely find that extending the instruction set holds some great advantages: it complements hardware changes, but is somewhat easier for designers to implement. For example, by extending the instructions, you can optimize your design in finer granularity. Also, extending the instruction set typically does not interfere with memory access; thus, it has the potential to optimize the system’s overall performance.

Even though individuals, companies and academic researchers have published papers on how to do it, extending an instruction set may seem like a “black art” to anyone new to this technique. But in reality, it isn’t that complex. Let’s examine how you can optimize your Virtex-5 FXT design by making some fairly simple additions to the PowerPC processor’s instruction set via the APU interface.

In general, to extend the instruction set of an embedded microprocessor, you need to understand you are modifying both the software and the hardware. First, you are adding hardware blocks to your system to perform specialized computations. These computations execute in parallel in the FPGA fabric rather than sequentially in software. In Xilinx-speak, these hardware blocks are called Fabric Coprocessing Modules, or FCMs. You can write FCMs in VHDL or Verilog, and they will end up in the FPGA fabric of the Virtex-5 FXT device. You can connect one or more FCMs to the PowerPC processor APU interface.

The next step is to adjust your software code to make use of those additional instructions. You have two options (assuming you are programming in C language). The first is to change the C compiler to automatically exploit cases where the use of the additional instructions would be beneficial. We’ll leave this option to the academics and certain folks working on ASSPs.

The second, and more elegant, option is not to touch the compiler but instead use so-called compiler-known functions. This means that manually, in our software code, we’ll call a C macro or a C function that makes use of these additional instructions.

Using either option, we must adjust the assembler so that it supports the new instructions. Fortunately, Xilinx includes a PowerPC compiler and assembler with the Embedded Development Kit (EDK) that already support these additional instructions.

When the PowerPC encounters these new instructions, it quickly detects that they are not part of its own original instruction set and defers the handling of them to the APU. Xilinx has configured the APU to decode these instructions, provide the appropriate FCM with the operand data and then let the FCM perform the computation.

If this is properly done, the software requires fewer instructions when running. Therefore, we can get more compute power out of our design without increasing the CPU clock frequency (which may cause other headaches).

The key reason to use the APU rather than connecting hardware blocks to the microprocessor via the PLB bus is the superior bandwidth and lower latency between the PowerPC processor and the APU/FCM. Another advantage lies in the fact that the APU is independent of the CPU-to-peripheral interface and therefore does not add an extra load to the PLB bus, which the system needs for fast peripheral access.

The APU provides various ways of interfacing between the PowerPC and the FCM. We can use a load-store method or the user-defined instruction (UDI) method. Chapter 12 of the Xilinx User Guide UG200 offers detailed descriptions of these techniques (http://www.xilinx.com/support/documentation/user_guides/ug200.pdf).

In our example we’ll deploy the UDI method, because it provides the most control over the system, enabling the highest performance. The example design is available for download from our Web site, at http://www.missinglinkelectronics.com/support/.

Example Design Description
By adding a UDI, we have extended the PowerPC processor’s instruction set to perform complex-number multiplications, a handy optimization for many multimedia decoding systems. The EDK diagram (see sidebar, Figure 1) shows the overall design, including how we connected the complex-number multiplier FCM to the PowerPC processor via the APU, and how software can make use of it.

We picked complex-number multiplication as an example because of its wide applicability in decoding streaming-media data, and because it clearly demonstrates how to make use of the APU by adding a special-purpose instruction.

Complex-number multiplication is defined as multiplying two complex numbers, each having a real value and an imaginary value (a_R + j a_I, where j*j = -1):

(a_R + j a_I) * (b_R + j b_I) = (a_R * b_R - a_I * b_I) + j (a_I * b_R + a_R * b_I)
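The definition translates directly into code. The short Python function below (the name cmul is ours) evaluates the real and imaginary parts exactly as written in the formula and cross-checks one case against Python's built-in complex type.

```python
def cmul(a_r, a_i, b_r, b_i):
    """Multiply (a_r + j*a_i) by (b_r + j*b_i) and return (real, imag)."""
    real = a_r * b_r - a_i * b_i
    imag = a_i * b_r + a_r * b_i
    return real, imag


# (3 + 2j) * (1 + 4j) = 3 + 12j + 2j + 8*j*j = -5 + 14j
print(cmul(3, 2, 1, 4))
print(complex(3, 2) * complex(1, 4))
```

Four multiplications, one subtraction and one addition are needed per result, which is the workload the coprocessor has to schedule.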

For efficiency, the complex-number multiplication hardware block (cmplxmul) performs the multiplication in three stages. This saves hardware resources by using only two multipliers and two adders in this multicycle implementation. Figure 2 shows a block diagram (in sketch form) of the complex-number multiplication FCM.
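One way to fit the four multiplications onto two multipliers in three stages is sketched below in Python. This schedule is an assumption made for illustration; the actual stage assignment in cmplxmul.vhd may differ. Each stage uses at most two multiplications and one addition or subtraction, and the function also returns how many multipliers each stage occupied.

```python
def cmul_three_stage(a_r, a_i, b_r, b_i):
    """Complex multiply with an assumed three-stage, two-multiplier schedule."""
    mults_per_stage = []

    # Stage 1: both multipliers form the products needed for the real part.
    p0, p1 = a_r * b_r, a_i * b_i
    mults_per_stage.append(2)

    # Stage 2: the first adder finishes the real part while the same two
    # multipliers are reused for the products of the imaginary part.
    real = p0 - p1
    p2, p3 = a_i * b_r, a_r * b_i
    mults_per_stage.append(2)

    # Stage 3: the second adder finishes the imaginary part.
    imag = p2 + p3
    mults_per_stage.append(0)

    return (real, imag), mults_per_stage


result, usage = cmul_three_stage(3, 2, 1, 4)
print(result, usage)
```

Trading one extra stage for half the multipliers is the area-versus-latency choice the article describes.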

As the VHDL code in cmplxmul.vhd demonstrates, we perform the complex-number multiplication in three clock cycles. In file cmplxmul.vhd we have implemented the FCM to perform this complex-number multiplication. File fcmcmul.vhd provides the FCM/APU interface wrapper to connect our FCM to the APU. As we will show in our step-by-step procedure (see sidebar), you can use this wrapper as a template to connect your own FCM to the APU when using the UDI method (the load-store method requires a different interconnect).

We synthesized our design with Xilinx EDK/XPS 10.1.02 using Xilinx ISE® 10.1.02. We simulated and tested the design with ModelSim 6.3d SE.

The PowerPC processor APU that resides within the Xilinx Virtex-5 FXT devices allows embedded engineers to accelerate their systems in a very efficient way, by adding special-purpose user-defined instructions for hardware acceleration and coprocessing. Using the example design described here as a starting point will show you that mastering the APU is straightforward, and can give your designs a major performance boost without the use of special tools.



Step-by-Step Guide to Using the APU
Here we present detailed information on how the engineers at Missing Link Electronics generated the necessary files for our example design, and how to use these files to reproduce the results on the Xilinx ML507 Evaluation Platform, which contains a Xilinx Virtex-5 XC5VFX70T device. We also show how to use this design as a starting point for your own APU-enhanced FPGA design.

Step 1: Build Your Coprocessor
Theoretically, you can build almost any coprocessor as long as it fits into your FPGA, but keep in mind that a user-defined instruction (UDI) can transport two 32-bit operands and one 32-bit result per cycle. Our coprocessor for complex-number multiplication is implemented in file src/cmplxmul.vhd.

Step 2: Build the FCM Wrapper
To be area-efficient, your coprocessor may need a multicycle behavior similar to ours. Therefore, you will need a state machine to implement a simple handshake protocol between the coprocessor and the Auxiliary Processing Unit (APU). In our example, we did this inside the wrapper “fcmcmul,” which we implemented in file src/fcmcmul.vhd.

Inside the wrapper fcmcmul, we instantiated the complex-number multiplication hardware block cmplxmul, which becomes the Fabric Coprocessing Module (FCM). Thus, fcmcmul provides the interface we need to connect it to the APU. You can find a detailed description of those interface signals in Xilinx document UG200, starting at page 188. The important detail is the timing diagram for “Non-Autonomous Instructions with Early Confirm Back-to-Back” on page 216, which shows the protocol between the APU and the FCM.

Step 3: Connect the FCM with the APU
In general, you can connect your FCM to the APU in two ways: by using the Xilinx Platform Studio (XPS) graphical user interface, or by editing the .mhs file. We have found that when cutting and pasting a portion of an existing design into a new one, it is easiest to edit the .mhs file. So for this example, we connect the FCM/wrapper and the APU in file syn/apu/system.mhs.

We suggest that you do the same. Just copy the section from “BEGIN fcmcmul” to “END” from our example and paste it into your .mhs file.

To make it all work in XPS, you must also provide a set of files in a predefined file/directory structure. In our example, we have called the wrapper block fcmcmul; therefore, the file/directory structure looks like this:

syn/apu/pcores/fcmcmul/data/fcmcmul_v2_1_0.mpd

syn/apu/pcores/fcmcmul/data/fcmcmul_v2_1_0.pao

syn/apu/pcores/fcmcmul/hdl/vhdl/fcmcmul.vhd

syn/apu/pcores/fcmcmul/hdl/vhdl/cmplxmul.vhd

The .mpd file contains the port declarations of the FCM. The .pao file provides the names of the blocks and files associated with the FCM, and XPS finds the VHDL source files for the coprocessor and the wrapper in the hdl/vhdl directory.

You should replicate and adjust this tree as needed for your own APU-enhanced FPGA design.

Step 4: Hardware Simulation
We have provided the necessary files to test the APU example using ModelSim. As a prerequisite, and only if you have not done this yet, you must generate and compile the Xilinx simulation library. You can do this from the XPS menu XPS → Simulation → Compile Simulation Library. Then generate all RTL simulation files for the entire design from the XPS menu XPS → Simulation → Generate Simulation.

Next, run the RTL simulation to verify your APU design, in particular the handshake protocol between the APU, the wrapper and your coprocessor.

The simulation shows the two possibilities for the APU delivering the operands in either one or two cycles (as explained in UG200, on page 216). Look for the signals FCMAPUDONE and FCMAPURESULTVALID.

Step 5: Software Testing
For the complex-number multiplication, we have written a small standalone program, syn/apu/aputest/aputest.c, that demonstrates the use of the APU and our coprocessor from a software point of view.

This program configures the APU and defines the UDI. Then it runs a loop to compute the complex-number multiplication using our hardware coprocessor, compares it against the result of a software-only complex-number multiplication and provides a performance analysis.

You must configure the PowerPC APU before it can function properly, in one of two ways: You can either click in XPS and enter the initialization values for certain control registers of the APU, or you can configure the APU directly from the software program that uses the APU. We feel the latter option is more explicit and robust.

In our C source code file, you will find descriptive C macros and function calls to properly initialize the APU. Feel free to copy and paste them into your program as needed.

Within the loop, we do complex-number multiplication first using the UDI and then using the software macro ComplexMult. We use our routines Start_Time and Stop_Time for performance analysis. The three calls to UDI1FCM_GPR_GPR_GPR implement the three-cycle hardware complex-number multiplication. The C macro UDI1FCM_GPR_GPR_GPR is defined in file syn/apu/ppc440_0/include/xpseudo_asm_gcc.h, which is a Xilinx EDK-generated file. The macro is implemented via the assembler mnemonic udi1fcm. Because Xilinx has patched the assembler, this udi1fcm mnemonic (though obviously not part of the original PowerPC 440 processor instruction set) is already a proper instruction the APU can handle.

In our test case, aputest is the XPS software project that we compiled, assembled, linked and downloaded into the Virtex-5 FXT block RAM for execution by the PowerPC processor.

Step 6: Generate the FPGA Configuration
You can generate the FPGA configuration bit file from the XPS menu XPS → Hardware → Generate Bit Stream. To save you some time, we have included a bit file for the Xilinx ML507 Development Platform. You can find it in syn/apu/implementation/download.bit.

Step 7: Execute the Example Design
Download the FPGA configuration bit file, start the XPS debugger XMD (UART settings are 115200-8-N-1) and view the example design.

The run-time it reports is 4,717 cycles for the software-only design and 1,936 cycles for the UDI hardware-accelerated complex-number multiplication. Thus, the acceleration is approximately 2.4 times, with the complex-number multiplication running with no timing optimizations at 50 MHz. Of course, if we were to use pipelining and increase parallelism, the coprocessor could run much faster, increasing the overall acceleration to five to ten times.
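The acceleration figure follows directly from the two cycle counts. As a quick check, using only the numbers reported above (the variable names are ours):

```python
sw_cycles = 4717   # software-only complex-number multiplication
hw_cycles = 1936   # UDI hardware-accelerated version

speedup = sw_cycles / hw_cycles          # ratio of the two run-times
cycles_saved = sw_cycles - hw_cycles     # absolute saving for the same workload
print(f"speedup: {speedup:.2f}x, cycles saved: {cycles_saved}")
```

The ratio comes out slightly above 2.4, consistent with the rounded figure quoted in the text.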

Figure 1 – EDK processor system block diagram

Figure 2 – Complex-number multiplication coprocessor was so easy to design, you can do it on graph paper.


Using a Xilinx FPGA to Beat Your Son at Guitar Hero

Electronically monitor the video signal from a Nintendo Wii console to operate a Guitar Hero game in real time.

XPERIMENT

by Michael [email protected]

This project started as a way to hang out with my 16-year-old son, Alex. “Dad, Guitar Hero rocks. We’ve got to get one.” Simple words that would unknowingly set me on a quest to play that perfect game of Guitar Hero myself. Simple, I thought, because on and off, I’ve had a guitar in my hands for upwards of 45 years. How hard could this be? Surely I was capable of pressing a plastic fret button and a plastic strum button on a plastic guitar.

Well, although my son and I started at the same level, it wasn’t long before he had moved up the difficulty scale and placed his initials on the top of the “high score” display of every song. Try as I might, there was just no way I could get my four fingers to the right place, at the right time, and play those five buttons on that plastic fret board anywhere close to the speed my son was able to do it. A quick search of YouTube confirmed my suspicions. Thousands of kids could beat me in their sleep.

There was only one way I was going to beat Alex at this game: I was going to have to cheat. I was going to have to put my engineering background to work to build my surrogate.

My first thoughts about a system to conquer Guitar Hero on my behalf were simple. Some kind of light sensors stuck to the front of the TV, connected to some kind of microcontroller-based computing platform, driving some kind of solenoids to actuate the physical buttons on the guitar. Rube Goldberg would have been proud.

Many of my projects start as a kludge and get, well, let’s say refined over time. This one was no exception. It didn’t take long for my friend Steve to say, “Just look at the composite video signal and electronically operate the buttons,” quickly followed by another piece of advice: “Use a Xilinx FPGA as the computing engine.” It’s like building a project the old-fashioned way, Steve told me.

“You go into your box of TTL parts and wire them together into any function you need, except you do it on your computer and you have an almost unlimited number of gates,” he said. “You want a 13-bit shift register, make one. You want a 5-bit adder, make one. Rewiring the project takes about 30 seconds. And everything works at gate speeds; no more counting cycles to see if you can get a task done in a certain period of time.”

Steve, you see, is a video engineer who was responsible for building one of the first nonlinear editing systems in the 1980s. No self-respecting video guy would let anyone build a kludge system like the one I was describing.

After doing some research, I found that not only can you build arbitrarily complex logic, but you can download a microprocessor core with code into the FPGA and get the best of all worlds. Long ago, gates were my crayons. But gates were expensive. They were packaged in their own plastic case with their own leads and in need of external connection, which was provided with a wire-wrap gun and a handful of Kynar-covered wire. It was time-consuming and error-prone. Changing the design was painful. Unwrap, wrap, unwrap, wrap.

This was going to be different. Write some code, compile it, download it to the part and try it out. Need an 11-bit counter? Just write a few lines of code. Problem with the design? Redo it. Want to extend it? Don’t move it from the test bench to the prototype area, just download new code one more time. A full loop takes a couple of minutes.

It’s not like this capability hasn’t been available for years; it’s just that I hadn’t tried it. I’ve been a microprocessor guy. Well, now I’ve tried it and I like it. A lot.

Soon the system design was starting to take shape. Composite video in, a little analog processing and level shifting, an FPGA-based computing engine and drivers for the buttons.

I know, I know: it’s a completely ridiculous project. Why waste time building a computer to play a computer video game? The enormity of commitment of both resources and time was baldly apparent. So as I have done countless times in the past, I simply disguised the exercise as a learning opportunity. Not knowing much about the inner details and intricacies of either composite video or FPGAs, I decided this was my kind of project. Not to mention, I’d get the chance to melt some solder.

What is Guitar Hero?
It’s an insidiously simple game that runs (in this case) on a Nintendo Wii. Guitar Hero has been a huge commercial success. The game comes with a plastic guitar that has only five buttons on the neck and a plastic bar on the guitar body that you use to “strum” the instrument. When you start the game, you’re presented with a list of songs. After you select one (say, “Purple Haze”), a vanishing-point guitar neck appears down the center of the screen. Around it swirl animated graphics and a scoreboard.

As you cross a line at the bottom of the guitar neck, the player’s job is to press the correct button or combination of buttons and hit the strum bar. Success gains you points at an ever-increasing multiplier, failure gets you an annoying sound and a multiplier reset. A whammy bar augments your score during sustained notes, and shaking the guitar at certain times during a song increases the point multiplier even further.

It takes consistency to build big scores. Miss enough notes, and the game delivers a notice of failure along with degrading


XPER IMENT

Figure 1 – As the pucks move down the screenthey expand, giving the graphic impression ofcoming toward you. I found scan line 198 wasabout the right place to look for pucks; notice thegreen puck passing through the scan. By havingfive instances of the puck locater running in theFPGA, the computer effortlessly found and storedthe presence or absence of a puck. Its “highlight”signal generated the highlighted line.

boos from the (virtual) crowd. There’snothing like a game of Guitar Hero tokeep you humble.

Sizing up the Design Challenge

The Guitar Hero game is, in concept, a simple one. Video is presented on a monitor in two basic sections. By using a DVD recorder I was able to capture, frame by frame, a game being played. Upon review, it was easy to see the step-by-step progress the game makes.

In general, there are two areas of the screen (see Figure 1). Down the center is the guitar neck, starting small at the top of the screen and getting progressively larger at the bottom, presented in just the same way as a vanishing-point drawing exercise you might have done in elementary school. On the left and right, graphics add excitement to the game, but they are extraneous and can be disregarded. Frets start at the top of the neck and travel down the screen, getting longer horizontally as they make their trek.

Overlaid on the neck, and traveling at the same speed as the frets, are colored pucks that look like cream-centered doughnuts, tilted in perspective. Designed to signify necessary button presses, these colored pucks start from the top of the screen as small circles and get wider and wider as they move down, spreading out and expanding in diameter, to give the illusion of moving toward the player.

As each puck approaches a corresponding cylinder at the bottom of the screen, if you’re fast enough to press the correct button on the plastic guitar neck and hit the strum bar at just the right moment, it will disappear inside. A small flame emanating from the cylinder signals success. Blow it and the sound of a muted guitar string echoes your failure as the multiplier counter resets. Blow it enough times and you’re booed off the stage by an angry crowd.

It seemed to me that by sensing the presence or absence of a puck and timing its progression down the neck of the guitar, we could build a system that could potentially turn Dad into a Guitar Hero master, the gaming equivalent of my real-life guitar heroes, Eric Johnson, Jimmy Page and Eddie Van Halen. But I had to surmount a few engineering hurdles first.

Challenge No. 1: There’s no still spot on the screen you can look at to gather the important information. Guitar Hero isn’t like yesterday’s Pac-Man or Pong. Graphics are in motion all over the place. Overlaid frames of moving frets, lightning bolts, flashes of fire and strobes may make for exciting game play, but they were going to make finding those pucks difficult.

I guess this would normally be the job for a frame store and a computer running complex algorithms, but buying or building a frame store was slightly out of the question. Finding those pucks was going to be the job of a single-pixel comparator. Yes, we could pick the line, and yes, we could pick the position on that line, but the only information available was going to be “is the signal above or below this value?” White, or less than white?



Figure 2 – Here’s the prototype of the analog board. It seems that I always make my prototype area just a little too small for the circuit I’m designing. Just as in software, “I’m not quite done, I have to add just one more section…” thus the two boards tacked together. The yellow RCA connectors in the middle of the board are video-in. The video signal moves down the center of the board, through sync separator and Zetex amp, to the row of resistors that shift the 5-V TTL levels to 3.3 V for the FPGA. The small header at the top right corner connects this board to the Digilent board (not shown). The board on the left is the 3.58-MHz color trap and comparator. The long white wire going from the bottom right to the middle left is video.

Figure 2A – This is the completed PCB that my friend Steve laid out for the Digilent project. The connector on the left now plugs directly into the Digilent Nexys2 FPGA Demo board. Counterclockwise from there are the DB9 connector that drives the guitar, the optoisolator subassembly and the two video connectors, one for in and one for out. I’d say this board is a little cleaner than the prototype shown in Figure 2.



Finding a puck on the screen was like trying to find a doughnut on a conveyor belt by looking through a pinhole. By choosing an imaginary line right above the flame tips that jump from the cylinders at the bottom of the screen, we found a relatively quiet spot from which to watch for the white centers of the pucks.

The only input to the system is a single, 1-volt peak-to-peak composite video signal containing all the information necessary to drive a video monitor. By separating the composite video into horizontal and vertical sync, I could control counters in the computing platform to locate any spot on the screen. Programmatically, I could say “look at line 198, position 1340, and tell me if it’s white or dark.”
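To get a feel for what “position 1340” means in time, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the position counter is clocked by the 50-MHz on-board clock described later in the article, and the function name is made up for the example.

```python
# Sketch: convert a position-counter value into elapsed time on a scan line.
# Assumption: the counter ticks once per cycle of a 50-MHz clock (20 ns per tick).

CLOCK_HZ = 50_000_000  # on-board clock frequency quoted in the article


def sample_to_microseconds(sample_count: int, clock_hz: int = CLOCK_HZ) -> float:
    """Return the time, in microseconds, represented by `sample_count` ticks."""
    return sample_count * 1_000_000 / clock_hz


if __name__ == "__main__":
    # Position 1340 is the example used in the text.
    print(sample_to_microseconds(1340))
```

Under that assumption, position 1340 sits 26.8 microseconds after the counter was last reset.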

Composite video in, comparator out at a single spot (see Figure 2). I used a National LM1881 sync separator and a Zetex ZXFV089 video amplifier with dc restore to generate the timing signals for the computing platform. Between these two parts, I generated horizontal and vertical sync and a referenced copy of the original video signal.

From here, the video signal takes two paths. Path one is through an op-amp-based adder, so that I could selectively add a small offset voltage before the video passed through an amplifier/buffer and out to a monitor. By using the computing platform to signal “highlight,” I could make areas of the screen lighter than normal. This came in handy when I was troubleshooting the system, since it let me highlight, for instance, “areas where the comparator sees high values” or “this is where I’m sampling the screen.”

Path two is to a 3.58-MHz trap to remove the color information from the signal. Color information rides on top of luminance information, and all I wanted to do was look at the brightness of a spot on the screen. If I left the color information in the signal, the next stage would have had a very difficult time dealing with it. Next, it was on to a very fast (7-nanosecond) comparator to generate the “white/not white” signal.

Challenge No. 2: Controlling the plastic guitar. There are five buttons on the guitar’s neck and a strum bar in its body. Each of those six controls had to be operated under computer control.

Figure 3 – The smaller green board is the guitar controller. Below it is the prototype board with the six optoisolators, used to electrically press the buttons on the guitar. The ribbon cable on the right goes to a DB9 connector mounted on the side of the guitar.

Figure 4 – Here’s the entire AutoGuitarHero system. Counterclockwise from the left, boards include the Digilent USB interface, Digilent Spartan-3 FPGA demo board, driver board and analog interface board. The small board in the middle of the picture is used to attach scope probes to the system for troubleshooting. Notice the yellow video-in connector and the SMA video-out connector.

I disassembled the guitar body and poked around until I figured out which pads on the circuit board were driven by the various buttons. Six bits from the computing platform drove an LS244 with current-limiting resistors on the outputs. Optoisolators (see Figure 3) mounted on a small circuit board inside the guitar body drove each of the button pads. The computing platform and guitar body are connected through a nine-conductor RS232 cable.

Challenge No. 3: With the analog section and I/O prototyped and debugged, my thoughts turned to picking the right computing platform. Since I was an FPGA novice, it had to be reasonably simple, really fast and really powerful, with a stable programming environment, and it had to be something I could grow into and learn from. That’s when I found Digilent.

Digilent is a great company, and they offer a Spartan-3®-based FPGA demo board that I thought would be perfect for the application. In fact, there are so many gates and so much I/O that I knew the board would be overkill, but as with all computing platforms today, you get a lot of bang for the buck.

Challenge No. 4: A small matter of programming the system. Now that I had a working hardware platform, it was time to write some code (Figure 4). I downloaded and installed ISE® from the Xilinx site and got the USB interface up and running so I could program the Spartan-3. There’s a pretty steep learning curve before you can do anything useful, but once you get your head around the idea of designing with gates, it’s incredibly powerful.

In the microprocessor world, everything happens sequentially. Set up this variable. Do this. Do this next. Then do this. If this happens, do something else. It’s systematic and progressive, and timing is important. “How long does it take to go through this loop?”

The world of programmable logic is different. Everything happens in parallel, and at almost wire speed. No counting clock cycles. Everything just occurs at the same time. If you need a 13-bit counter, you write a line of code. If you need a six-input gate, you write a line of code. It’s fantastic, like reaching into a magical junk box and pulling out the perfect part. Need to build a divide-by-50 million counter? Instead of wire-wrapping pin after pin to make one, just write a line of code. Didn’t get it right? No need for an unwrap tool; just rewrite the line.
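As a quick sanity check on how wide such counters need to be, here is a small Python sketch. It is not part of the original design; the helper name is hypothetical, and it only works out how many flip-flops a counter of a given modulus requires.

```python
# Sketch: how many bits does a counter need to count 0 .. modulus-1?

def counter_bits(modulus: int) -> int:
    """Smallest register width that can hold every count from 0 to modulus-1."""
    if modulus < 1:
        raise ValueError("modulus must be at least 1")
    return max(1, (modulus - 1).bit_length())


if __name__ == "__main__":
    print(counter_bits(50_000_000))  # the divide-by-50-million counter
    print(counter_bits(8192))        # a counter spanning 0 to 8191
```

A divide-by-50-million counter works out to 26 bits, which would have meant a lot of wire-wrapped packages in the TTL days.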

For instance, here’s the VHDL code to set up two counters:

    signal line_cnt : std_logic_vector(8 downto 0);   -- 9-bit counter, 0 to 511
    signal samp_cnt : std_logic_vector(12 downto 0);  -- 13-bit counter, 0 to 8191

Now, take a look at the VHDL code to count lines. It counts from 0 to 262 and then resets to zero each time vertical sync gets asserted:

    process(v_sync, burst, line_cnt)
    begin
      if burst'event and burst = '0' then
        if v_sync = '0' then
          line_cnt <= "000000000";
        else
          line_cnt <= line_cnt + 1;
        end if;
      end if;
    end process;

Similarly, here’s the counter for position in a line. It counts from 0 to about 3,054 using the on-board 50-MHz clock, and then resets to zero at the beginning of each line:

    process(clk, samp_cnt, burst)
    begin
      if clk'event and clk = '1' then
        if burst = '0' then
          samp_cnt <= "0000000000000";
        else
          samp_cnt <= samp_cnt + 1;
        end if;
      end if;
    end process;
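For readers more at home in software, the behavior of the line counter can be mimicked with a small Python model. This is a rough sketch, not a translation of the real design: the class and method names are invented here, and the model only reproduces the reset-or-increment decision made on each falling edge of burst.

```python
# Sketch: software model of the 9-bit line counter described above.
# Each call to burst_falling_edge() stands for one falling edge of `burst`.

class LineCounter:
    WIDTH_BITS = 9  # matches std_logic_vector(8 downto 0)

    def __init__(self) -> None:
        self.count = 0

    def burst_falling_edge(self, v_sync: int) -> None:
        """Reset while v_sync is low (asserted); otherwise count one more line."""
        if v_sync == 0:
            self.count = 0
        else:
            # A 9-bit register wraps around after 511.
            self.count = (self.count + 1) % (1 << self.WIDTH_BITS)


if __name__ == "__main__":
    counter = LineCounter()
    for _ in range(198):
        counter.burst_falling_edge(v_sync=1)
    print(counter.count)  # now sitting on the sense line used in the article
```

The position counter follows the same pattern, with clk as the clock and burst as the reset.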

Everything in the design is driven off these two counters. We need the program to watch five spots on the screen for a puck. We ask the code to generate five instances of the following code:

    process(clk)
    begin
      if clk'event and clk = '1' then
        if v_sync = '1' then
          if samp_cnt = puck_center_y then
            if line_cnt = sense_line and window_in = '1' then
              yellow_frame_latched <= 1;
            end if;
          end if;
        end if;
      end if;
    end process;

“If it’s the rising edge of clk (every 20 ns) and we’re not in reset, and if we’re in the right position (samp_cnt) and we’re on the right line (line_cnt), and if the comparator (window_in) has found a white spot, then there’s a puck in the right position on the screen. So store the event (in this case, yellow_frame_latched <= 1).”

The actual algorithm is a bit more complicated, because a fret moving over that area will also meet the above criteria. Thus, the code watches five lines in a row and increments a counter for each line the comparator finds to be white. Then, later in the program, it checks to see the value of the counter. If just a fret passed the spot, there is a count of less than 3. If a puck passes the spot, the counter contains a value of 3 or greater.
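The fret-versus-puck test is simple enough to sketch in Python. This is an illustrative model only: the function name and the list-of-flags input are assumptions made for the example, while the five-line window and the threshold of 3 come from the description above.

```python
# Sketch: decide whether five consecutive scan-line samples show a puck or a fret.
# Each flag is the comparator result for one line: truthy = white, falsy = not white.

PUCK_THRESHOLD = 3  # from the article: a count of 3 or greater means a puck


def is_puck(white_flags) -> bool:
    """Return True when enough of the sampled lines were white to be a puck."""
    white_count = sum(1 for flag in white_flags if flag)
    return white_count >= PUCK_THRESHOLD


if __name__ == "__main__":
    print(is_puck([0, 1, 0, 0, 0]))  # a thin fret crossing the window
    print(is_puck([1, 1, 1, 1, 0]))  # the white center of a puck
```

A thin fret lights up only a line or two, so it never reaches the threshold, while the white center of a puck covers most of the window.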




At the end of every frame we store the results of the puck hunt in a shift register, clocked every frame (about 1/30th of a second). Copy this code four more times, once for each remaining puck color, and you have five six-deep, 1-bit-wide shift registers that store the puck information over time, so the actual strum-button press can be offset by x/30th of a second while the puck moves into the proper position on the screen.

    process(v_sync)
    begin
      if v_sync'event and v_sync = '0' then
        for i in 0 to SR_SIZE - 2 loop
          shift_reg_y(i+1) <= shift_reg_y(i);
        end loop;
        shift_reg_y(0) <= yellow_frame_latched;
      end if;
    end process;
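The delay-line idea behind the shift register can be modeled in Python as follows. This is a sketch under stated assumptions: the class name is invented, the depth of six is taken from the “six-deep” description, and each call to clock() stands for one falling edge of v_sync.

```python
# Sketch: software model of a six-deep, 1-bit-wide shift register used as a delay line.
from collections import deque


class PuckDelayLine:
    def __init__(self, depth: int = 6) -> None:
        # Index 0 is the newest entry; index depth-1 is the oldest.
        self.taps = deque([0] * depth, maxlen=depth)

    def clock(self, frame_latched: int) -> None:
        """Shift everything one place and load the newest puck result at tap 0."""
        self.taps.appendleft(frame_latched)  # maxlen drops the oldest entry

    def tap(self, index: int) -> int:
        return self.taps[index]


if __name__ == "__main__":
    line = PuckDelayLine()
    line.clock(1)            # a puck was latched during this frame
    for _ in range(5):
        line.clock(0)        # five more frames with no puck
    print(line.tap(5))       # the puck has now reached tap 5
```

A puck loaded at tap 0 reaches tap 5 five clocks later, which at the article’s figure of about 1/30th of a second per clock is roughly 5/30ths of a second.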

Finally, we use the shift register tap 5 to see if we have to activate the strum bar. If there are pucks in this tap of the shift register, ping the strum bar. If not, check next time.

    process(clk, strum_stretched_srps_g, strum_stretched_srps_r,
            strum_stretched_srps_y, strum_stretched_srps_b,
            strum_stretched_srps_o)
    begin
      if clk'event and clk = '1' then
        strum_center <= strum_stretched_srps_g
                     OR strum_stretched_srps_r
                     OR strum_stretched_srps_y
                     OR strum_stretched_srps_b
                     OR strum_stretched_srps_o;
      end if;
    end process;

Although the actual code is about 1,000 lines, the code I showed above does all the heavy lifting. There are some pulse stretchers and timing helpers in there too, but this is the core.

Son of Guitar Hero

I worked on this project on and off, in my spare time, for about three months. Each step was a learning experience. Before Guitar Hero, I had little hands-on knowledge of composite video, FPGAs, VHDL or much else this project took to complete. Afterwards, I can say that I know my way around Xilinx’s IDE, and I’ve learned how unbelievably powerful an FPGA can be. It’s a great environment and one that I plan on using in the future.

In fact, after I posted the autoguitarhero.com Web site, the folks at Digilent noticed a marked increase in traffic to their site. They found out about the AutoGuitarHero project and wanted to use it in a demo. I agreed to supply them with a system they could show in a booth at an engineering show, and asked to be paid with store credit at Digilent. How’s that for commitment?

At this point, the AutoGuitarHero has been taken apart and the various subassemblies are sitting in a box. Its work is done. I put my initials at the top of the leader board and they stayed there for almost a full day. For you see, I didn’t computerize the whammy bar or the guitar shake, and by operating the whammy bar and shaking the guitar correctly, you can earn more points.

My wife and I went to dinner the evening that I completed the project and my son went up to my shop. He turned on the AutoGuitarHero, and operated the whammy bar and shook that guitar better than I ever could. When I got home, I found his initials, not mine, at the top of the screen (Figure 5). What’s a dad to do?

For the complete blow-by-blow of how Michael Seedman built this project, visit his site www.autoguitarhero.com.


Figure 5 – Despite my best efforts, it didn’t take long for my son, Alex, to trump my best score, as seen in the telltale on-screen leader board, by using a little bit of (uncomputerized) body English.

ASK FAE-X

Digital Duct Tape with FPGA Editor

Xilinx Senior FAE Clayton Cameron shows us his tips and tricks for using his favorite tool in the ISE tool suite, FPGA Editor.

by Clayton Cameron
Senior field applications engineer
Xilinx, Inc.
[email protected]

There comes a time in most design cycles where a little creativity (you might call it digital duct tape) is required to make your design work. Over the past eight years, I’ve seen some of the best engineers do truly amazing things with this approach, often using one essential tool: FPGA Editor.

FPGA Editor allows you to see your implemented design and review it to determine if that is truly what you wanted at the FPGA fabric level, a must for any engineer or FAE. Let’s say you’re given a co-worker’s design on which to make changes, and their HDL source is hard to understand or there are no source comments or documentation. Maybe you just want to lock down some clock logic but you don’t know the instance name or how to lock it in place. Certain tips and tricks for exploring the FPGA fabric and creating command line patches can help you meet fast-approaching deadlines.

General Fabric Exploration

One of the first things I normally do when Xilinx releases the tools for a new FPGA is open FPGA Editor and look at the FPGA fabric. You get there by going to the Xilinx → ISE → Accessories menu and clicking on the FPGA Editor icon or by typing fpga_editor at the command prompt. After the GUI is open, select New under the File menu. FPGA Editor will ask you for a design file name and a physical constraint file. At this point you have no design files, so enter anything for the design file name (for example, test.ncd) and select a part type you wish to review. FPGA Editor will use the same file name for the physical constraint file and load a blank design.

Another option is to compile one of the provided ISE® tool suite example designs and load it into FPGA Editor for fabric review. Loading an example design will give you more details and make it easier to locate items of interest.

To navigate FPGA Editor, you really only need to know two things:

1. How to zoom in and out using the Ctrl/Shift key shortcuts.

2. How to zoom to selected items using the F11 key.

To zoom in and out quickly without using the GUI buttons, simply hold down the Ctrl and Shift keys, and use the left mouse button to zoom in and the right one to zoom out. To find any item quickly, select it in the List window located in the upper-right corner of the GUI. Once you’ve located the desired item, hit F11. The Array window will zoom in on it.

FPGA Editor has four main windows: List, World, Array and Block. The List window shows all the active items in your design. The pulldown menu at the top of this window will allow you to select its contents: a list of placed or unplaced components, nets or unrouted nets, and so on.

The World window gives you a look at the complete FPGA die at all times; this comes in handy if you are trying to determine how you previously routed a net. The Array window, meanwhile, is your active view of the fabric and logic. When you double click on any item within the Array view, the Block view will appear, offering a detailed look at the item or logic element of interest.

You can duplicate any of these windows for easier navigation and editing of your design. In many cases it is handy to have a second Array window open to allow you to work on two different parts of the design at once. For example, let’s say you had to add a route between a global clock buffer and a flip-flop at the bottom of the chip. It’s much easier to do this if you have one Array window on the global clock buffer output and a second on the clock input of the flip-flop of interest. Otherwise, you will be zooming in and out to locate the source and destination of the route, which is not fun.

On the right side of the FPGA Editor GUI is a button bar with 20 function buttons to help you view and edit your design. You can add more function buttons with your own functions by editing the fpga_editor.ini file located in the $XILINX/data directory. While reviewing your design, use the INFO button from time to time. It dumps all the information on the selected item to the Console window. This can come in handy, since you can highlight the data in the Console window and copy it for use elsewhere, such as writing UCF constraints.

Once you have the basics down, you can start reviewing the fabric of the FPGA. I normally start my fabric review with the clocking logic. This would include the digital clock manager (DCM), phase-locked loop (PLL), global buffer (BUFG), regional clock buffer (BUFR), I/O buffer (BUFIO) and the different clock regions. (To alphabetize the items, go to the List window and click on the word Type.) Click on a DCM and hit F11. The Array window will locate the selected DCM and zoom in on it. Go ahead and click on the DCM once and watch the Console window at the bottom of the GUI as it produces something like this:

    comp "DCM_BASE_inst_star", site "DCM_ADV_X0Y9", type = DCM_ADV (RPM grid X73Y202)

This is useful data. Copy and paste the above line into your UCF and make the following changes to lock down this DCM logic:

    INST "DCM_BASE_inst_star" LOC=DCM_ADV_X0Y9;

Using this method, you can lock down almost anything in the FPGA. Here is another example for BUFG locking:

    comp "BUFG_inst_star", site "BUFGCTRL_X0Y20", type = BUFG (RPM grid X73Y124)

    INST "BUFG_inst_star" LOC=BUFGCTRL_X0Y20;

Return to the List window again and highlight the same DCM. Double clicking on it will bring up the Block view of the DCM and show you all the settings and parameters. This is a very powerful feature that can apply to any logic item within the fabric. If you select a slice and double click on it, you can see how that slice was routed and whether the carry chain or local flip-flop was used.

The Block view contains a button bar with many more options. One worth noting is the F= button, which displays the complete configuration of the items used in that slice. For example, if you used a LUT6 and a flip-flop, the F= button would give you the Boolean equation for the LUT and the mode the flip-flop is configured for.

It is one thing to read the Xilinx user guide; it’s another to see all the logic, switches and parameters spread out on your computer screen for review. Once you become familiar with where everything is located, you’ll be surprised at how it will help you write and verify your design.
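If you find yourself locking down many components, the Console-to-UCF step described earlier in this section is mechanical enough to script. The Python sketch below is one hedged way to do it: the function name is hypothetical, and it assumes the INFO output always follows the comp "name", site "location" pattern shown in the two examples.

```python
# Sketch: turn one line of FPGA Editor INFO output into a UCF LOC constraint.
# Assumption: the line looks like
#   comp "NAME", site "SITE", type = ... (RPM grid ...)
import re

_INFO_PATTERN = re.compile(r'comp\s+"(?P<inst>[^"]+)",\s*site\s+"(?P<site>[^"]+)"')


def info_to_ucf(info_line: str) -> str:
    """Return an INST ... LOC=...; constraint built from a Console INFO line."""
    # Text pasted from a typeset document may carry curly quotes; normalize them.
    cleaned = info_line.replace("\u201c", '"').replace("\u201d", '"')
    match = _INFO_PATTERN.search(cleaned)
    if match is None:
        raise ValueError("line does not look like FPGA Editor INFO output")
    inst = match.group("inst")
    site = match.group("site")
    return 'INST "{}" LOC={};'.format(inst, site)


if __name__ == "__main__":
    line = 'comp "BUFG_inst_star", site "BUFGCTRL_X0Y20", type = BUFG (RPM grid X73Y124)'
    print(info_to_ucf(line))
```

Treat the output as a starting point and review each constraint before adding it to your UCF.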

Scripting in Flow Patches

FPGA Editor has the ability to record your actions while you edit a design in the GUI. You can save and even play them back to reproduce your work at a later date. This is extremely powerful when it comes to making “in flow” changes to your design at times when it’s not possible to change your RTL. Let’s say you’ve created a design with third-party IP or Xilinx encrypted IP, and it contains a global clock and a DCM that generates a clock called interface_clk. Then let’s assume the ASIC to which you’re interfacing has a reported erratum and cannot accept data on the rising edge of the interface_clk as advertised. How do you fix this problem?

Well, you could alter your PCB and remove the broken ASIC or have the third-party IP team review altering the clock output logic to provide an interface_clk with a 90-degree phase shift. Both of these solutions are time-consuming and costly. A simpler suggestion would be to use FPGA Editor to record the actions and make the necessary changes to the interface_clk logic to provide the correct clock phase to the broken ASIC. Once you have an FPGA Editor script of the changes, you can play them back from your command line build script and continue your FPGA flow as normal. When the broken ASIC is fixed, you simply remove the FPGA Editor script playback from your build script and the interface_clk will return to its normal behavior.
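To see what a 90-degree phase shift buys you in time, a quick calculation helps. The Python sketch below is purely illustrative: the article does not give the frequency of interface_clk, so the 100-MHz figure is a made-up example value, and the function name is hypothetical.

```python
# Sketch: convert a clock phase shift in degrees into a delay in nanoseconds.
# The 100-MHz frequency used below is an assumed example, not from the article.

def phase_delay_ns(clock_mhz: float, phase_degrees: float) -> float:
    """Delay corresponding to `phase_degrees` of one period of a clock_mhz clock."""
    period_ns = 1000.0 / clock_mhz
    return period_ns * (phase_degrees / 360.0)


if __name__ == "__main__":
    print(phase_delay_ns(100.0, 90.0))  # a quarter of a 10-ns period
```

For that assumed 100-MHz clock, the 90-degree output moves the clock edge by 2.5 ns, a quarter of the period.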

To begin hand-editing your design, you need to enable read/write privileges in FPGA Editor. Go to the menu bar and click on File → Main Properties. Under this menu, you can adjust the edit mode from No logic change to Read/Write. Click Apply and you can now edit your design. The next step is to begin recording all your changes with FPGA Editor; simply go to the menu bar and click on Tools → Scripts → Begin Recording. FPGA Editor will prompt you for a script name (such as patch.scr). Once you’ve entered it, you can begin making the necessary changes to your design.

It is always a good idea to run a design rules check (DRC) on your design to see if it raises any red flags. In my example design, I have 14 warnings that should be ignored. Next we will need to locate the DCM for the interface_clk and create another clock called DCM_clk90_out from that DCM’s 90-degree output. You will need to route that clock to a BUFG to use the global clock routing. To add a BUFG, simply find an unused BUFG location in the fabric, right-click on it and select Add. The tool will then prompt you to give the BUFG a name (clk90_bufg) and determine its type: BUFG (see Figure 1).

Once you’ve created the new BUFG, you need to hook up its input and output to the desired locations. In this case, the DCM’s 90-degree output will drive the BUFG. To make this connection, in window Array1, click on the DCM’s 90-degree output pad and in window Array2, click on the input pad of the BUFG while holding down the Ctrl key. Then release the Ctrl key, click your right mouse button and select Add. The tool will prompt you for a name of that new net connection. This in turn links the DCM and BUFG together via the new net (see Figure 2).

Figure 1 – The properties window allows the user to configure and name the selected logic item.

Figure 2 – When hand routing between two logic items, use two Array windows for easy selection of the source and destination, as shown by the red triangles.

The output of the clk90_bufg needs to replace the clock on an IOB that is driven by the original interface_clk. To remove the IOB from the original clock domain, you need to locate the IOB, highlight the clock input pads and hit the Delete key to remove this connection. This will in turn allow us to connect the clock from our new clk90_bufg and complete the patch. To connect the output of the BUFG (clk90_bufg), highlight the output pad of the BUFG in window Array2 and select the clock inputs of the IOB in window Array1 while holding the Ctrl key. Release the Ctrl key and click the right mouse button to display the option menu and select Add. This makes the final connection between the BUFG output and the IOB that drives the newly created interface to the downstream ASIC, which in turn allows interface_clk90 to capture the transmitted data correctly.

This completes the patch for the ASIC. Now you should rerun the DRC checker to make sure you didn’t introduce any new errors. Go to the menu bar and click on Tools → DRC → Run.

Once your script is complete and error free, you need to go back to the menu bar and select Tools → Scripts → End Recording. This will stop and close the script for use the next time you wish to make any RTL changes and this ASIC patch is required. It’s a good idea to open the script file in a text editor and remove all the GUI Post and Unpost commands. These commands are not needed, and they make the script hard to read and review. The text below is the script for our ASIC patch. As you can see, it is fairly straightforward and easy to read.

    unselect -all
    setattr main edit-mode Read-Write
    add -s "BUFGCTRL_X0Y28" comp clk90_bufg ;
    setattr comp clk90_bufg type BUFG
    unselect -all
    select pin 'BUFGCTRL_X0Y28.I0'
    select pin 'DCM_ADV_X0Y11.CLK90'
    add
    post attr net $NET_0
    setattr net $NET_0 name DCM_clk90_out
    unselect -all
    select pin 'OLOGIC_X0Y2.CLK'
    delete
    unselect -all
    select pin 'ILOGIC_X0Y3.CLK'
    delete
    unselect -all
    select pin 'ILOGIC_X0Y3.CLK'
    select pin 'OLOGIC_X0Y2.CLK'
    select pin 'BUFGCTRL_X0Y28.O'
    add
    post attr net $NET_1
    setattr net $NET_1 name clk90_bufg_out
    unselect -all
    drc
    save -w design "patch.ncd" "patch.pcf"
    exit
    end

Take a look at the script and see if you can pick out the actions you did in the GUI.

It’s important to understand that you can play back this script from the GUI (under the menu bar Tools → Scripts → Playback) or the command line. To play back your patch from your build script, simply add the following command:

    fpga_edline yourdesign.ncd yourdesign.pcf -p yourscript.scr

You should execute this command after PAR, when the NCD and PCF files are completed.
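If your build flow is driven from a script, the playback step can be wrapped so it is easy to switch off once the ASIC is fixed. The Python sketch below is one possible arrangement, not a prescribed flow: the function names are hypothetical, and the only command-line syntax it relies on is the fpga_edline invocation shown above.

```python
# Sketch: wrap the fpga_edline playback step for use in a scripted build flow.
# Only the argument order shown in the article is used; nothing else is assumed
# about the tool's options.
import subprocess


def edline_command(ncd_file: str, pcf_file: str, script_file: str) -> list:
    """Build the argument list: fpga_edline <ncd> <pcf> -p <script>."""
    return ["fpga_edline", ncd_file, pcf_file, "-p", script_file]


def apply_patch(ncd_file: str, pcf_file: str, script_file: str) -> None:
    """Run the playback after PAR; raises CalledProcessError if the tool fails."""
    subprocess.run(edline_command(ncd_file, pcf_file, script_file), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so this works without the tools.
    print(" ".join(edline_command("yourdesign.ncd", "yourdesign.pcf", "yourscript.scr")))
```

Keeping the patch in its own function means removing it later is a one-line change to the build script.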

FPGA Editor truly is a power-user tool, although not everyone would want or need to use it in their designs. But when you need something special or you need to bend the rules a bit to get even more from your design, there is no other tool like it. Your FAE is the one who can show it to you, and demonstrate how FPGA Editor can help you with debug, verification and, of course, bending the rules.

Clayton Cameron is a Senior FAE based in Toronto. He joined Xilinx in 2000, supporting telecom customers in the Ottawa office. As an FAE, Clayton greatly enjoys helping customers and solving problems. He also enjoys the diversity of his position and the variety of challenges he faces on a daily basis.

In his spare time, he lets off steam in the gym, keeping physically and mentally fit. At home, he loves to spend time with his wife and two young children.


Figure 3 – The BUFG output net properties window shows the number of net connections and the fully routed net status.

XAPP469: Spread-Spectrum Clocking Reception for Displays www.xilinx.com/support/documentation/application_notes/xapp469.pdf

Display applications like flat panels and video players commonly use high-speed low-voltage differential signaling (LVDS) interfaces to transfer video data. To address electromagnetic compatibility (EMC) issues, designers can use spread-spectrum clocking to reduce the impact of the radiated energy these signals produce. When designers use a spread-spectrum clock as the source from which LVDS signals are derived, the radiated energy is spread across a range of frequencies, effectively reducing the peak energy at any one frequency.

In this application note, Jim Tatsukawa shows that a spread-spectrum clock will drive the LVDS interfaces of Spartan®-3E and Extended Spartan-3A family devices with no adverse effects on a system’s performance. Tatsukawa explains how to estimate the maximum spread-spectrum clock modulation frequency for the digital clock manager (DCM) and describes a simple test setup to evaluate the effects of the spread-spectrum clock on a typical LVDS communications path.

Note: This application note applies to Spartan-3E and Extended Spartan-3A family devices only.
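To see why spreading the clock lowers the peak energy at any one frequency, it can help to compare the spectrum of a fixed clock with that of a modulated one. The Python sketch below is an illustration only and is not taken from XAPP469: it assumes a sinusoidal modulation profile applied to the clock fundamental, and the window length, bin numbers and modulation index are hypothetical values chosen so the tones fall exactly on DFT bins.

```python
import cmath
import math

# All values below are illustrative assumptions, not figures from XAPP469.
N = 256            # samples in the analysis window
CARRIER_BIN = 32   # clock fundamental: 32 cycles per window
MOD_BIN = 2        # modulation rate: 2 cycles per window
BETA = 2.0         # modulation index (peak deviation / modulation rate)


def clock_samples(beta):
    """Sampled clock fundamental; beta = 0 gives a fixed-frequency clock."""
    samples = []
    for n in range(N):
        t = n / N
        phase = (2 * math.pi * CARRIER_BIN * t
                 + beta * math.sin(2 * math.pi * MOD_BIN * t))
        samples.append(math.cos(phase))
    return samples


def peak_magnitude(samples):
    """Largest DFT magnitude over the positive-frequency bins."""
    peak = 0.0
    for k in range(N // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / N)
                  for n, x in enumerate(samples))
        peak = max(peak, abs(acc))
    return peak


fixed_peak = peak_magnitude(clock_samples(0.0))
spread_peak = peak_magnitude(clock_samples(BETA))
reduction_db = 20 * math.log10(fixed_peak / spread_peak)
print(f"fixed peak:  {fixed_peak:.1f}")
print(f"spread peak: {spread_peak:.1f}")
print(f"peak reduction: {reduction_db:.1f} dB")
```

The total signal energy is the same in both cases; the modulated clock simply distributes it over several spectral lines, so the tallest line is lower. Real spread-spectrum clock sources often use other modulation profiles, so treat the numbers this prints as qualitative.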

XAPP1117: Software Debugging Techniques for PowerPC 440 Processor Embedded Platforms www.xilinx.com/support/documentation/application_notes/xapp1117.pdf

In this application note, Brian Hill discusses the use of the Xilinx Microprocessor Debugger (XMD) and the GNU software debugger (GDB) to debug software defects.

Hill describes how you can use XMD to download executables to the system, to control running these applications with breakpoints and to examine or modify memory and CPU registers. He also explains how to use the GDB’s symbolic software debugger in concert with XMD. Doing so can streamline tasks that are normally cumbersome to perform with XMD alone.

The note demonstrates how to use GDB to debug software locally (with a local process running on the same machine and operating system as GDB itself) to connect to the GDB stub, also called GDB server, running within XMD. XMD automatically starts the GDB server after the user connects to the target processor.

To use the application note effectively, get your hands on an ML507 board (www.xilinx.com/products/boards/ml507/reference_designs.htm), which includes a Virtex®-5 FXT and, in turn, a PowerPC® (PPC) 440 processor core.

The note includes a design implementation that Hill intentionally seeded with software defects. Hill then reviews how to find and fix these bugs and lists the best tools for the job.

XAPP1052: Bus Master DMA Reference Design for the Xilinx Endpoint Block Plus Core for PCI Express www.xilinx.com/support/documentation/application_notes/xapp1052.pdf

In this application note, Jake Wiltgen shows how to design and implement a bus master direct memory access (DMA) design for the Endpoint Block Plus wrapper core for PCI Express® using the Virtex®-5 FPGA, which includes an integrated block for PCI Express. A bus master DMA (BMD) design moves data to and from host memory. By using one in your applications, your design can achieve higher throughput and performance, along with lower overall CPU utilization.

Included in this BMD reference design are a DMA kernel mode driver, including source, and a Windows 32-bit software application, both provided by Avnet. The application note also includes instructions for installing both the driver and application.

To view other Xilinx application notes, visit www.xilinx.com/support/documentation/application_notes.htm.


Application Notes
If you want to do a bit more reading about how our FPGAs lend themselves to a broad number of applications, we recommend these application notes.

XAMPLES . . .

by Stuart Elston
Senior Manager, Customer Education
Xilinx, [email protected]

The programmable logic industry has come a long way from the world of glue logic that sits at the edge of the board. Today, FPGAs beat at the very heart of complex systems in many products and industries. Xilinx® FPGAs provide a new level of system design capabilities through soft MicroBlaze™ processors, hard PowerPC® processors and silicon-efficient architectural resources.

The technology and tools are without doubt becoming more sophisticated, yet the business pressures of time-to-market, market longevity and product flexibility demand that designers capitalize on these technology advances immediately. Training and the proactive acquisition of skills are key.

The Xilinx Customer Education group and its exclusive network of Authorized Training Providers are here to help. Complementary to our core FPGA classes, we offer four courses focused purely on embedded design, catering to both hardware and software engineers.

Our Embedded Systems Development course brings experienced FPGA designers up to speed on developing embedded systems using the Xilinx Embedded Development Kit (EDK). The lectures and labs also delve into the features and capabilities of the Xilinx MicroBlaze soft processor and the PowerPC 440 processor. The hands-on labs provide experience in the development, debugging and simulation of an embedded system. More specifically, the course will introduce you to the various tools that encompass the EDK, and teach you to rapidly architect an embedded system containing a MicroBlaze or IBM PowerPC processor and Xilinx-supplied CoreConnect bus architecture IP by using the Base System Builder. You will use the Eclipse-based Software Development Kit (SDK) to develop software applications and debug software, and will create and integrate your own IP into the EDK environment.

Our Advanced Features and Techniques of Embedded Systems Development course gives embedded-systems developers the necessary skills to develop complex systems. Building on skills gained in the Embedded Systems Development course, it teaches students to assemble and architect a complete embedded system, and to identify the steps involved in integrating user IP. You will use a Board Support Package to target multiple operating systems, apply advanced debugging techniques, design a flash memory-based system and boot load from flash, while applying various techniques to improve performance.

The Embedded Systems Software Development course introduces you to software design and development for Xilinx embedded-processor systems. You will learn the basic tool use and concepts required for the software phase of the design cycle, after the hardware design is completed. Topics cover the design and implementation of the software platform for resource access and management, including device driver development and user application debugging and integration. Practical implementation tips and best practices are provided throughout to enable you to make good design decisions and keep your design cycles to a minimum. You will have enough practical information to get started developing the software platform for a Xilinx embedded system based on a PowerPC 440 or MicroBlaze processor. This course is aimed at software engineers.

Embedded Open-Source Linux Development is a new, intermediate-level, two-day course that shows embedded-systems developers how to create an embedded open-source Linux operating system on a Xilinx development board. Hands-on experience ranges from building the environment to booting the system using a basic, single-processor system-on-chip design with Linux 2.6 from the Xilinx kernel tree. This course introduces embedded Linux components, open-source components, environment configurations, network components and debugging/profiling options for embedded Linux platforms. The primary focus is on embedded Linux development in conjunction with the Xilinx tool flow.

Our Authorized Training Providers can provide these courses in your locale. Contact them for an up-to-date schedule by going to www.xilinx.com/support/training/atp.htm.


Embedded for Success
The Xilinx Customer Education group prepares you to take advantage of embedded FPGA design.

ARE YOU XPERIENCED?

Upcoming Conferences

In addition to offering training at multiple locations, we exhibit our technologies at several conferences.

Conference                    Date                    Location
Convergence 2008              October 20-22, 2008     Detroit, MI
SDR Forum                     October 26-30, 2008     Washington, DC
Electronica 08                November 11-14, 2008    Munich, Germany
MILCOM 2008                   November 17-19, 2008    San Diego, CA
Embedded Technology 2008      November 19-21, 2008    Yokohama, Japan
InterBEE                      November 19-21, 2008    Chiba City, Japan
CES 2009                      January 8-11, 2009      Las Vegas, NV
Mobile World Congress 2009    February 16-19, 2009    Barcelona, Spain
Embedded World 2009           March 3-5, 2009         Nuremberg, Germany
Embedded Systems Conference   April 1-5, 2009         San Jose, CA

XTRA, XTRA

Xilinx Tool & IP Updates

Xilinx is continually improving its products, IP and design tools as it strives to help designers work more effectively. Here we report on the most current updates to the flagship FPGA development environment, the ISE® Design Suite, as well as other design tools and IP. The latest service packs offer significant enhancements and new features. Keeping your installation of ISE up to date with these service packs will ensure the best results for your design.

Updates are available from the Xilinx Download Center at www.xilinx.com/download. For more information on the ISE Design Suite or to download free 60-day evaluations of any of the products, visit www.xilinx.com/ise. Also, see the Tools of Xcellence section in this issue for news of IP, tools and development boards from Xilinx partners.

Logic Design Tools

ISE Foundation™ Software
Description: The industry’s most complete programmable logic design solution
Latest version number: 10.1.3
Date of latest release: September 2008
Previous release: 10.1.2
Download the latest patch: www.xilinx.com/download
Revision highlights: Besides adding support for the new Virtex®-5 TXT FPGA Platform as well as quality improvements, Service Pack 3 provides enhancements to the ISE Project Navigator, Constraints Editor, CORE Generator™ System, Floorplan Editor and implementation tools.

With Service Pack 3, the IBISWriter now provides updates to the IBIS models for the Virtex-5 family of FPGAs through XilinxUpdate.

ISE Simulator
Description: A complete, full-featured HDL simulator integrated with ISE Foundation
Latest version number: 10.1.3
Date of latest release: June 2008
Previous release: 10.1.2
Download the latest patch: www.xilinx.com/download
Revision highlights: Support for the new Virtex-5 TXT FPGA Platform and quality improvements

ModelSIM Xilinx Edition III (MXE-III)
Description: A low-cost version of the industry’s most popular simulation environment
Latest version number: 6.3c
Date of latest release: March 2008
Previous release: 6.2g
Revision highlights: No new updates since the release of the ISE Design Suite 10.1.

PlanAhead™
Description: A faster, more efficient FPGA design solution to help achieve your performance goals in less time
Latest version number: 10.1.3
Date of latest release: September 2008
Previous release: 10.1.2
Download the latest patch: www.xilinx.com/download
Revision highlights: Support for the new Virtex-5 TXT FPGA Platform and quality improvements

ChipScope™ Pro and ChipScope Pro Serial I/O Toolkit
Description: Real-time debug and verification tools for Xilinx FPGAs
Latest version number: 10.1.3
Date of latest release: September 2008
Previous release: 10.1.2
Download the latest patch: www.xilinx.com/download
Revision highlights: In addition to support for the new Virtex-5 TXT FPGA Platform and quality improvements, Service Pack 3 also improves the scroll bar in the Waveform viewer Bus/Signal column to adjust scroll and justify text. This new feature makes it easier to view signals and buses with extremely long hierarchical names.

Support for the Virtex-5 TXT FPGA Platform is now available for all ChipScope Pro and ChipScope Pro Serial I/O Toolkit cores, including the IBERT core. In addition, Service Pack 3 also includes improvements to the ChipScope Pro Serial I/O Toolkit to determine the optimal Decision Feedback Equalizer settings for the Virtex-5 GTX RocketIO™ transceivers.

ISE WebPACK™
Description: A free solution for your Xilinx CPLD or medium-density FPGA design
Latest version number: 10.1.3
Date of latest release: September 2008
Previous release: 10.1.2
Download the latest patch: www.xilinx.com/webpack
Revision highlights: The improvements described above in the revision highlights for ISE Foundation apply to all devices supported in ISE WebPACK.

Embedded Design and DSP Tools

Platform Studio and EDK (Embedded Development Kit)
Description: An integrated development environment of embedded processing tools, MicroBlaze™ soft processor core, IP, software libraries and design generators
Latest version number: 10.1.3
Date of latest release: September 2008
Previous release: 10.1.2
Download the latest patch: www.xilinx.com/download
Revision highlights: In addition to quality improvements, Service Pack 3 includes support in the EDK’s Base System Builder for the Virtex-5 FPGA ML510 Embedded Development Platform.

Service Pack 3 also includes new IP cores in Platform Studio. In System Flash v1.00a simplifies access to the on-board flash memory for the nonvolatile Spartan®-3AN family of FPGAs, while TFT Controller 1.00a provides easy control of the text display on FPGA development boards. In addition, Agilent trace capture tools support early versions of the MicroBlaze soft processing core. Upgraded trace capability includes capture of new MicroBlaze instructions such as those for MMU, PID, FPU and FSL.

System Generator for DSP Tool Kit
Description: Enables development of high-performance DSP systems using products from The MathWorks, Inc.
Latest version number: 10.1.3
Date of latest release: September 2008
Previous release: 10.1.2
Download the latest patch: www.xilinx.com/download
Revision highlights: In addition to quality improvements, Service Pack 3 adds new support for the FFT 6.0 blockset in System Generator, providing data and phase factor widths of up to 34 bits. Other improvements support block floating-point scaling for streaming, pipelined and I/O architecture, and DSP48 abstraction for mathematical operators. Accumulators, AddSub and counter blocks can now be implemented using either a DSP48 or the original LUT-based implementation, providing design portability across all supported Xilinx devices.

Thanks to enhanced printing support, users can now print directly from the WaveScope toolbar or file menu without having to perform manual screen captures. Finally, new IP version checking provides a warning if an IP core scheduled to be removed in a future version of System Generator for DSP is used.

AccelDSP™ Option to System Generator for DSP
Description: Enables a top-down MATLAB® language-based DSP design methodology
Latest version number: 10.1.3
Date of latest release: September 2008
Previous release: 10.1.2
Download the latest patch: www.xilinx.com/download
Revision highlights: In addition to quality improvements, Service Pack 3 includes the “use_logicore” directive, which tells AccelDSP to use an optimized LogiCORE™ for the specified operator in the design, allowing greater quality-of-results. Also, a new optional parameter called “enable” has been added to the “insertpipestage” directive. This allows you to specify whether an associated hierarchical directive is enabled or disabled.

A new parameter called “register_output,” which has been added to the “memmap” directive, allows you to specify whether or not the output of the memory is registered.

In addition, new LogiCORE support is now available for Accumulator, Multiply Accumulator and Multiply Adder.

Xilinx IP Updates

Name of IP: ISE IP Update 10.1.3
Type of IP: All
Targeted application: Xilinx develops IP cores and partners with third-party IP providers to decrease customer time-to-market. The powerful combination of Xilinx FPGAs with IP cores provides functionality and performance similar to ASSPs, but with flexibility not possible with ASSPs.
Latest version number: 10.1.3
Date of latest release: September 2008
Access the latest version: www.xilinx.com/download
Informational URL: www.xilinx.com/ipcenter/coregen/updates_101_ip3.htm
Release Notes: www.xilinx.com/support/documentation/user_guides/xtp025.pdf
Installation Instructions: www.xilinx.com/ipcenter/coregen/ip_update_install_instructions.htm
Listing of all IP in this release: www.xilinx.com/ipcenter/coregen/101_3_datasheets.htm
Revision highlights: Xilinx intellectual property (IP) cores, including LogiCORE IP cores, are delivered through software updates available from the Xilinx Download Center. The latest versions of IP products have been tested and are delivered with the current IP releases.

In addition to quality improvements, Service Pack 3 ISE IP Update 10.1.3 provides new features, functionality and examples for many of the Xilinx LogiCORE IP cores. It includes optimized uplink and downlink baseband modules that contain complex functionality, including rate matching/dematching, assembly/reassembly, turbo codecs and CRC. These high-quality cores are production ready, enabling users to realize fast and efficient baseband designs in Xilinx FPGAs, while significantly reducing development effort. These cores are scalable from femto- to macrocell applications, and are designed to meet 3GPP LTE wireless specifications for both FDD and TDD variants.

Learn more about the 3GPP LTE UL Channel Decoder at www.xilinx.com/products/ipcenter/DO-DI-CHDEC-LTE.htm. Details of the 3GPP LTE DL Channel Encoder are available at http://www.xilinx.com/products/ipcenter/DO-DI-CHENC-LTE.htm.

Enhancements to existing IP cores: A new low-power implementation option has been added to the Block Memory Generator. Also in this release are updates to other popular CORE Generator IP cores, including the Memory Interface Generator (MIG); PCI 32, PCI 64 and PCI-X; Content Addressable Memory (CAM), and FFT v6.0.

A number of cores now support the Virtex-5 TXT family of FPGAs. Among them are the Block Memory Generator, FIFO Generator, CAM, Virtex-5 RocketIO GTX Transceiver Wizard, Endpoint Block Plus Wrapper for PCI Express, 10 Gigabit Ethernet MAC, Virtex-5 Ethernet MAC Wrapper, XAUI, SPI-4.2 and FFT.



by Mike Santarini
Publisher, Xcell Journal
Xilinx, [email protected]

It’s not just the universe that started with a big bang. So did one of the hottest new companies in Silicon Valley, PLX Devices, an automotive electronics startup that owes its existence to its founder’s blown engine.

PLX’s latest product is a plug-in device that monitors gas usage and encourages fuel-efficient driving. Its first was an air-to-fuel ratio gauge that Paul Lowchareonkul, the 28-year-old CEO, crafted after ruining his car’s engine following a losing race with a Mercedes. Both are built on Xilinx FPGA platforms.

Lowchareonkul, who had long nursed a passion for cars and for racing, drove a souped-up Honda Prelude during his years at UC Irvine, where he created his own double major in EE and computer engineering. One day he pulled up next to a brand-new Mercedes SLK-320 and challenged the driver to race.

“I thought I had my car tuned perfectly and I thought I could beat him, but he beat me by a car length,” said Lowchareonkul. “I then went and bought a turbo charger for my car, but it wasn’t a perfect fit; it was for a different year of Prelude. I thought, ‘I’m an engineer, I can figure it out.’” So Lowchareonkul installed the turbo charger and raced a few times, after which the engine promptly blew up. “It turns out it wasn’t tuned properly, and when I took the engine apart, the pistons were cracked,” he said. “I said to myself, ‘there has to be a way to monitor the safety of your engine.’”

Lowchareonkul searched the Internet and found a few do-it-yourself fuel-to-air ratio circuits for sale. “But they were ugly and not advanced, so I designed one that is sexy,” he said. With a Xilinx FPGA, he developed a gauge that users could plug into the OBD II/CAN port in their automobiles to get instantaneous readouts from their engines (all cars built after 1997 have OBD II ports).


Engineer Turns Blow-up Into Hot Automotive Electronics Startup

A blown engine sparked the design of a novel air-to-fuel ratio gauge using a Xilinx FPGA. Ultimately, a new company grew up around it.

PROFILES OF XCELLENCE

Paul Lowchareonkul

“I designed a circuit that controlled your car’s oxygen sensor, which monitors your air-to-fuel ratio,” Lowchareonkul said. “Essentially, if you are running too lean, meaning not enough gas, you can detonate the gas in your pistons too early, destroying the pistons. And if you are running too rich, meaning too much gas, you are wasting gas and losing a lot of power, dumping fuel out your exhaust. So this product measured the amount of oxygen to the point that people can tune their engine so precisely, they can squeeze every bit of horsepower out of their car.”
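The lean/rich distinction Lowchareonkul describes can be expressed as a comparison against the stoichiometric air-to-fuel ratio, which for gasoline is roughly 14.7:1 by mass. The Python sketch below is a hypothetical illustration, not PLX’s firmware or algorithm: the function names and the dead band around the stoichiometric point are assumptions made for the example.

```python
# Approximate stoichiometric air-to-fuel mass ratio for gasoline.
STOICH_AFR_GASOLINE = 14.7


def lambda_value(afr):
    """Ratio of the measured AFR to stoichiometric (1.0 = ideal mixture)."""
    if afr <= 0:
        raise ValueError("air-to-fuel ratio must be positive")
    return afr / STOICH_AFR_GASOLINE


def classify_mixture(afr, tolerance=0.3):
    """Label a measured air-to-fuel ratio as lean, rich or near stoichiometric.

    `tolerance` is an illustrative dead band (an assumption, not a value
    from the article) so that tiny deviations are not flagged.
    """
    if afr <= 0:
        raise ValueError("air-to-fuel ratio must be positive")
    if afr > STOICH_AFR_GASOLINE + tolerance:
        return "lean"   # too much air, not enough gas
    if afr < STOICH_AFR_GASOLINE - tolerance:
        return "rich"   # too much gas, fuel is wasted
    return "near stoichiometric"


for reading in (12.5, 14.7, 16.0):
    print(reading, classify_mixture(reading), round(lambda_value(reading), 2))
```

Tuners typically aim for a particular ratio depending on load, so a real gauge would report the measured value continuously instead of reducing it to three labels.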

The Cupertino, Calif., native was no newcomer to Xilinx platforms. His father, Teratum Lowchareonkul, is an engineer at Xilinx, and during his junior and senior years at college, the younger Lowchareonkul interned with the company under the tutelage of Xilinx veteran Bill Pabst. “During this internship, I learned how an FPGA works and all the interesting stuff you can do with one,” Lowchareonkul said. “What was neat about the internship was, I was given the opportunity to experiment and play around with the technology.”

Turning a Blow-up into a Business
After building that first gauge for his own use, he started selling a few of the systems to his classmates. Lowchareonkul then decided to see how his invention would do on eBay. “I sold it with no reserve, and the first one went for $600, which was around $300 below [the price of] commercially offered sensors,” Lowchareonkul said. That sale marked the start of PLX Devices, Inc., which Lowchareonkul owns outright.

He immediately began to develop a sophisticated product lineup, including a gauge that allows users to check 50 aspects of engine performance, including the fuel-to-air ratio, as well as a multigauge system. All of the company’s more-advanced gauges are powered by Xilinx FPGA platforms.

Lowchareonkul’s designs use an organic LED rather than a mechanical needle. This allows users to customize the display to suit their tastes, and to switch what aspect of engine performance they want to monitor using a keychain fob-type remote control. What’s more, the device can record data for a driving instance. For example, after a race (on a legally sanctioned track, of course), drivers can download the data to their PC to analyze their engine’s performance.

The gauges have made a splash in the automotive market. In 2007, PLX won two awards (Best New Interior Product and Best New Mobile Electronics) at the Specialty Equipment Market Association conference, beating out products from much larger, established companies. PLX gauges have also won 18 media awards.

The latest offering, the Kiwi, catapults the company out of the auto enthusiast niche and into the consumer realm. The Kiwi, named for the green fruit and built around a Xilinx FPGA and OLED display, has arrived at a propitious moment. With gas prices spiraling ever upward, this system allows users to monitor their fuel efficiency and even awards a “Green Score” from 0 to 100 to promote fuel-efficient driving habits.

“It’s a device that you can plug into your car in minutes,” Lowchareonkul said. “You plug it in, start driving and the device monitors your driving habits to make sure you are driving for gas efficiency.” The Kiwi keeps score based on four parameters: acceleration, drag, smoothness and deceleration. It also monitors miles per gallon in real time.

The Kiwi comes with several tutorials to train drivers in how to get the most mileage out of every gallon. Thus, according to PLX, the device both saves money and reduces emissions. This “green” focus has garnered widespread media coverage in print and on radio and TV for the Kiwi and PLX. “We’re also now signing deals with mainstream distributors, and Kiwi is available in mainstream stores. It’s a consumer product,” said Lowchareonkul.

Lowchareonkul said Xilinx plays a critical role in PLX Devices’ products. “Because we used the FPGAs, we were able to create a design quickly to get into the market at the perfect time,” he said.

So, now that the company is on the road to success, one might wonder if the CEO is looking for an automotive upgrade. He currently drives a souped-up Honda S2000, but like any auto buff, Lowchareonkul dreams of buying new wheels. The car of his fantasies isn’t a Porsche, Mercedes, BMW or Ferrari. Instead, Lowchareonkul said his next vehicle is going to be the all-electric Tesla.

“I’m always looking for the next best thing, and the Tesla’s such an elegant design,” he said. “You don’t have to change the fluids, you don’t have a radiator and it has a very efficient electric motor. I want to get it so we can learn what to do with it.”

If the last eight years have been any indicator, the highly driven Lowchareonkul will no doubt do something amazing.

For more information on PLX’s products, visit www.plxdevices.com.



PLX Devices’ Kiwi, powered by a Xilinx FPGA, helps customers drive more fuel-efficiently.

Synopsys Prototyping Board Packs Virtex-5 LX330T; FXT Version Due

The latest ASIC prototyping board from Synplicity, newly acquired by Synopsys Inc. (Mountain View, Calif.), features the Virtex®-5 LX330T FPGA, along with 1 Gbyte of on-board DDR2 memory. “We’ve laid it out in a way that later in the year, we can also offer a derivative of this system with a Virtex®-5 FXT device. And we already have [customer] interest in that,” said Juergen Jaeger, director of HAPS product marketing in the Synplicity business group at Synopsys.

Released at the Design Automation Conference in Anaheim, Calif., in June, the High-performance ASIC Prototyping System (HAPS) 51T “provides users with a platform for all kinds of applications that incorporate high-speed interfaces,” Jaeger said. Along with the DDR2 memory, the board also includes 2M x 36-bit synchronous SRAM and 32M x 16-bit flash PROM.

Jaeger said Synplicity started shipping the HAPS-51T board in mid-May. “We got the first orders for the boards before we even started the design. We see a lot of interest in this board, especially as an add-on to our FPGA motherboards to add a high-speed interface to the system,” he said.

This is the first board to go beyond the company’s HAPSTrack connectors, Jaeger said. “In order to leverage the 24 RocketIO™ GTP channels in the Virtex-5 LX330T, we made a new connector, which we call the MGb [multigigabit] HAPS connector. Each MGb brings eight RocketIO channels out from the FPGA directly.”

Designers can plug several daughterboards directly into the HAPS-51T, Jaeger said, including USB, Ethernet, PCI Express and video processing. They can also plug riser cards into the system to add more sockets.

The system features three Vcco regions, which designers can individually set to 3.3, 2.5 or 1.8 volts. Designers can program the board configuration via JTAG, on-board flash PROM or SelectMAP, or they can buy an optional CompactFlash card from Synopsys.

For more details on the product, including gate counts, go to www.synopsys.com.


IP vendor Avalon Microelectronics (St. John’s, Newfoundland, Canada) recently expanded its intellectual-property lineup and now offers several cores for 40G, 10G and 2.5G Sonet/SDH and OTN optical transport.

Avalon’s IP is mainly targeted at FPGA implementations. “The number of IP blocks we offer is hard to quantify, because our IP is arranged around application spaces and needs,” said CEO Wally Haas, who founded Avalon in 2004 after a stint at AMCC. “We have a full suite of IP for the 40G, 10G and 2.5G data rates.”

The company offers framers, concatenated and channelized path processors, and packet-over-Sonet mappers to support SDH/Sonet applications. Its flagship product is an enhanced forward-error-correction (EFEC) core for 10G and 40G. “We have a core IP library that is tested and validated, and we have hardware platforms that we use to test these cores,” said Haas. “We customize these cores to create solutions for our customers.”

For example, he said, “a 10G transponder would take a 10G client SDH or 10GE, and that can be transported over 10G OTN with FEC or EFEC cores. That’s one application where we combine a few pieces of IP to create a full product or to create a larger core, depending on our customer’s needs.”

Haas said the company’s modular approach allows Avalon to take that same IP and configure it for a 40G muxsponder solution: create four 10G clients in Sonet or SDH, and aggregate those up to 40G with FEC or EFEC.

Avalon initially made its mark offering an SFI-5 core. “What that does is make FPGA- or serdes-based products compliant to the SFI-5 standard, which they normally aren’t,” said Haas. The company holds three patents on that IP, which Haas calls its “skew finder technology.” This patented technology uses a reference transceiver to relate skew by any transmitting serdes to any other transmitting serdes, he said. “We’ve deployed that in Virtex®-II Pro, Virtex®-4 and Virtex®-5 FPGAs.”

The SFI-5 core is a soft core that users download in conjunction with control software. The company also offers netlisted and RTL versions of its IP, as well as single-project, single-site and multiproject, multisite licensing.

Haas said that Avalon takes IP protection very seriously and has a full-time patent writer on staff. The company has thus far filed seven patents, he said. According to Haas, most IP licensing deals also require a small (usually very small) amount of integration services from Avalon. “We consider ourselves a product company,” said Haas.

For more information, visit http://www.avalonmicro.ca/about/mission/.

Tools of Xcellence
News and the latest products from Xilinx partners

TOOLS OF XCELLENCE

Avalon Bulks up IP Offerings for Optical Transport

Xilinx® Virtex®-5 FXT Evaluation Kit

The Xilinx® Virtex®-5 FXT Evaluation Kit provides a development platform for exploring PowerPC® 440-based architectures using Virtex-5 FXT FPGAs. This low-cost evaluation kit is ideal for both software and hardware developers and functions as an easy-to-use, entry-level tool for code development and debug, as well as processor system prototyping and general FXT evaluation.

Get Behind the Wheel of the Xilinx Virtex-5 FXT Evaluation Kit and take a quick video tour to see the kit in action (Run time: 7 minutes).

Ordering Information

Part Number          Hardware                              Resale
AES-V5FXT-EVL30-G    Xilinx Virtex-5 FXT Evaluation Kit    $395.00 USD

Take the quick video tour or purchase this kit at: www.em.avnet.com/virtex5fxt-evl

Target Applications
» PowerPC® 440 software development
» General FPGA prototyping
» Communications systems
» Image processing

Key Features
» Xilinx XC5VFX30T-1FFG665 Virtex-5 FPGA
» Eight LEDs
» DIP switches
» Four push-button switches
» On-board 100 MHz LVTTL oscillator
» User clock inputs via differential SMA connectors
» EXP half expansion slot
» System ACE™ module header
» 64 MB DDR2 SDRAM
» 16 MB Flash
» RS-232 and USB-UART serial ports
» 10/100/1000 Ethernet PHY
» System ACE option
» Xilinx JTAG interface
» BPI configuration

Kit Includes
» Xilinx Virtex-5 FXT evaluation board
» ISE® WebPACK™ 10.1 DVD
» Wall-mount power supply (5 V)
» Downloadable documentation and reference designs

1.800.332.8638
www.em.avnet.com

Copyright© 2008, Avnet, Inc. All rights reserved. AVNET and the AV logo are registered trademarks of Avnet, Inc. All other brands are the property of their respective owners. Prices and kit configurations shown are subject to change.

by Victor Peng
Senior Vice President, Xilinx Silicon Engineering Group
Xilinx, [email protected]

If you are a longtime customer of Xilinx, you’ll probably notice that the company has started to use the term “Xilinx® FPGA Platforms” with greater regularity. It isn’t just a marketing buzz phrase, but an expression that accurately captures the reality of what customers need today and what Xilinx is delivering. FPGA silicon is the engine of the platform, but it’s the combination of silicon, software and IP that delivers our full value proposition. The value lies in enabling you to design your innovative products and get them to market quickly, and to deal with multiple, changing product requirements and standards, at a cost several factors lower than that of designing an ASIC. Increasingly, the job takes world-class software design tools and embedded development tools, high-quality and reliable IP blocks, as well as world-class silicon.

Over the course of my career, I’veworked on many IC design projects and,like many of you, have witnessed theprogress of FPGA technology from the userperspective. At first, FPGA capacity wastoo small to address many applications thatASICs were handling, so design groupsused them as glue logic or to prototypelow-complexity ASICs. As FPGAs grew incapacity, they found their way into moreapplications and shipped in more endproducts. But the devices remained tooslow for some applications.

Once vendors found ways to increase their clock rates, FPGAs really picked up momentum and their value proposition grew tremendously relative to the costly alternative of designing an ASIC. Vendors started to add a greater number of high-speed I/Os, including serdes, to their device families for a number of new applications. Then, a few years ago, FPGA vendors began to move into entirely new territory, offering tools to help embedded-software engineers and algorithm developers to use FPGAs. The number of users from the embedded-software engineering and algorithm development spaces is steadily increasing year over year.

Today, the FPGA business is at the dawning of a new age of growth as FPGA platforms become more sophisticated and more users realize the FPGA platform value proposition.

Indeed, a wider variety of engineers are using FPGAs to design an ever-growing number of applications in wired and wireless communications, automotive and ISM, and aerospace and defense.

Now that I’m at Xilinx, I’ve been very pleased to learn through many meetings with many of my former colleagues from the design world, and even former competitors, that there is increasing demand for FPGAs in applications I’m sure Xilinx’s founders only dreamed of.

While I can’t reveal what those applications are, I can say that Xilinx is actively working on all fronts to seize new market opportunities, while building quality tools and IP, as well as state-of-the-art silicon, to help our existing customers get their innovations to market quickly.

Much of this demand is driven by the simple fact that ASICs are becoming too expensive and complex to design on the latest process technologies. But customers still want to differentiate their products, in both hardware and software, in ways that go beyond the possibilities most ASSP vendors can give them. They want to build a single product they can offer to multiple customers, and rapidly customize it to support changing and multiple standards and specific customer requirements. Thanks to their hardware and software programmability, FPGAs give designers the greatest mix of flexibility while meeting performance, capacity and, increasingly, power requirements for a growing number of applications.

The wind is in our sails. We invite you to come along for the voyage.


Xilinx FPGA Platforms: Silicon Was Just the Beginning

Demand for programmable logic is growing in a welter of new application realms. To seize these opportunities, Xilinx is working on all fronts to provide world-class silicon, tools and IP.

XPECTATIONS

Proven Xilinx® Power Solutions

Texas Instruments offers complete power solutions with a full line of high-performance products. These products range from standard linear ICs to plug-in and integrated power solutions. And, TI makes designing easier by providing leading-edge support tools that will help you accelerate your time to market.

For more information regarding FPGA power solutions, including downloadable reference designs and layout examples, please consult the following websites:

www.ti.com/xilinxfpga
www.em.avnet.com/tifpgapower
www.silica.com/tifpgapower

The platform bar is a trademark of Texas Instruments. All other trademarks are the property of their respective owners. © 2008 TI PN 2096
