Download pdf - Xcell Journal issue 71

Issue 71Second Quarter 2010

Xcell journalXcell journalS O L U T I O N S F O R A P R O G R A M M A B L E W O R L DS O L U T I O N S F O R A P R O G R A M M A B L E W O R L D

INSIDE

BDTI Study Certifies High-Level Synthesis Flows for DSP-Centric FPGA Design

A Mix of FPGA IP and Resources Makes DisplayPortCompliance Easy

CloudShield Uses Virtex-5 FPGAs to Speed Packet Processing

FPGA-based Control Plane/Data Plane Video ProcessingSuits Industrial Apps

www.xilinx.com/xcell/

Xilinx Unveils ARM-Based Architecture Targeting Software and System DevelopersXilinx Unveils ARM-Based Architecture Targeting Software and System Developers

Learn more about the new Spartan-6 and Virtex-6 FPGA baseboards and FMC modules designed by Avnet at www.em.avnet.com/drc

Avnet Electronics Marketing introduces three new development kits

based on the Xilinx Targeted Design Platform (TDP) methodology.

Designers now have access to the silicon, software tools and

reference designs needed to quickly ramp up new designs. This

approach accelerates time-to-market and allows you to focus on

creating truly differentiated products.

Critical to the TDP methodology is the FPGA Mezzanine Card

(FMC) from the VITA standards body. Avnet has collaborated with

several industry-leading semiconductor manufacturers to create a

host of FMC modules that add functionality and interfaces to the

new baseboards, allowing for easy customization to meet design-

specific requirements.

©Avnet, Inc. 2010. All rights reserved. AVNET is a registered trademark of Avnet, Inc.

New baseboards for Spartan®-6and Virtex®-6 FPGAs» Spartan-6 LX16 Evaluation Kit» Spartan-6 LX150T Development Kit» Virtex-6 LX130T Development Kit

New FMC Modules for Baseboards

» Dual Image Sensor FMC

» DVI I/O FMC

» Industrial Ethernet FMC

More are soon to be released!

Development kits help ramp up new Spartan®-6 or Virtex®-6 FPGA designs

1 800 332 8638www.em.avnet.com

Toggle among banks of internal signals for incremental real-time internal measurements without:

See how you can save time by downloading our free application note.

www.agilent.com/find/fpga_app_note

Quickly see inside your FPGA

u.s. 1-800-829-4444 canada 1-877-894-4414© Agilent Technologies, Inc. 2009

Also works with all InfiniiVision

and Infiniium MSO models.

L E T T E R F R O M T H E P U B L I S H E R

Xilinx, Inc.2100 Logic DriveSan Jose, CA 95124-3400Phone: 408-559-7778FAX: 408-879-4780www.xilinx.com/xcell/

© 2010 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands includedherein are trademarks of Xilinx, Inc. All other trade-marks are the property of their respective owners.

The articles, information, and other materials includedin this issue are provided solely for the convenience ofour readers. Xilinx makes no warranties, express,implied, statutory, or otherwise, and accepts no liabilitywith respect to any such articles, information, or othermaterials or their use, and any use thereof is solely atthe risk of the user. Any person or entity using suchinformation in any way releases and waives any claim itmight have against Xilinx for any loss, damage, orexpense caused thereby. Mike Santarini

Publisher

PUBLISHER Mike [email protected]

EDITOR Jacqueline Damian

ART DIRECTOR Scott Blair

DESIGN/PRODUCTION Teie, Gelwicks & Associates1-800-493-5551

ADVERTISING SALES Dan [email protected]

INTERNATIONAL Melissa Zhang, Asia [email protected]

Christelle Moraga, Europe/Middle East/[email protected]

Miyuki Takegoshi, [email protected]

REPRINT ORDERS 1-800-493-5551

Xcell journal

www.xilinx.com/xcell/

Xilinx Processor-First Architecture: Catalyst for Mainstream ESL?

t seems that every year for the last 15 years, I’ve heard at least one analyst or tool vendordeclare that this was the year electronic system-level design would go mainstream. Yet withESL still far removed from the standard hardware (let alone software developer) flow, I had

pretty much filed the idea in the same mental category as world peace and flying pigs. But now,thanks to Xilinx’s new architecture release (see cover story), I’m seriously starting to think thatESL’s day will finally come—and soon.

ESL design—essentially, modeling the behavior of an entire system at a high level of abstraction,using a high-level language—seems to be a perfectly sound answer to the ever-increasing transistorcounts, speed and complexity of system-on-chip (SoC) design. The promise of being able to use alanguage like C to describe desired system functionality—and then have tools partition thatfunctionality into hardware and software, and automatically implement it—could in conceptreally speed up design innovations. What’s more, if a company could find a way to enable softwareor systems developers to use such a flow, it would tap into a user base several times larger than thetraditional user base of hardware engineers. Imagine what system innovations and products thesepotential millions of engineers could design!

Though ESL’s path has been a rocky one, that hasn’t deterred a number of players in the electronic-design community—EDA companies, system software companies and even silicon vendors—from pursuing it. They have scored a few big successes, notably high-level synthesistools (see the article in this issue from research firm BDTI).

However, each of the many ESL vendors is taking a slightly different approach to the complexitiesof this type of design. Each has its strengths, weaknesses and target markets. Given this diversity, onecan’t help but see a need for an industrywide focus to bring ESL to fruition—perhaps a technologycatalyst, one development that could inspire the many players to hone their efforts and make ESL liveup to its promise.

My personal hunch is that the new Xilinx architecture could fit the bill. You may be saying, well Mike, you work at Xilinx and we expect you to say that. And of course,

you are correct. But regardless of my affiliation, I am someone who has heard a lot of pitches in myday. And I’m genuinely impressed with what I have seen of the Xilinx® Extensible ProcessingPlatform thus far. I applaud the folks in charge for taking the time to listen to customers and learnhow real products are developed at a system level, and for learning from the mistakes of the past—made by Xilinx and others—before starting to build this well-thought-out architecture.

A device that boots the low-power but very speedy ARM® MPU core first, and right out of thebox, rather than requiring that users first program the FPGA logic, is a brilliant move. Also inspiredis the support for industry-standard development software and operating systems, and the use ofa common industry-standard SoC bus for which customers have already developed oodles of IP.

That said, no one device launch can do the job alone. To truly make the dream of mainstreamESL a reality will take much effort on the part of many companies and individuals. I hope you, ourcustomers and partners, will be among them.

IAn upcoming device launch could be the spark that ignites electronic system-level design industrywide.

S E C O N D Q U A R T E R 2 0 1 0 , I S S U E 7 1

XCELLENCE BY DESIGN APPLICATION FEATURES

Xplanation: FPGA 101 A Xilinx FPGA Route Toward Implementing DisplayPort…30

XTRA READING

THE XILINX XPERIENCE FEATURES

1818

Xcellence in Wired Communications Using Xilinx FPGAs to Accelerate Packet Processing…18

Xcellence in Automotive & ISM Control Plane/Data Plane Video Processing with a Xilinx FPGA…24

Letter From the Publisher Xilinx Processor-First Architecture: Catalyst for Mainstream ESL?…4

Xpert Opinion BDTI Study Certifies High-Level Synthesis Flows for DSP-Centric FPGA Design…12

Xamples A Mix of New and Popular Application Notes…40

Xtra, Xtra A Rundown of New and Upcoming Targeted Design Kits…42

Tools of Xcellence A Bit of News About Our Partners and Their Latest Offerings…44

Xclamations! Supply an Xceptional Caption for Our Artwork and Win an SP601 Evaluation Kit…46

Cover Story Xilinx Architects ARM-Based Processor-First, Processor-Centric Device 66

2424

3030

1212

6 Xcell Journal Second Quarter 2010

Xilinx Architects ARM-Based Processor-First, Processor-Centric Device

COVER STORY

block (see figure). The processor subsystemwill be bootable and programmable out ofthe box. The balance of the new device willconsist of a tightly coupled programmable-logic extension block that will allow design-ers to partition their hardware and softwarefunctions based on system requirements.They can implement functions in the pro-grammable-logic extension block to createtheir own application-specific, highly opti-mized systems-on-chips (SoCs).

“We’ve put a lot of thought and plan-ning into the architecture of the device andhave learned a lot of lessons from our pre-vious devices, like our PowerPC™-basedVirtex®-II Pro,Virtex-4 and Virtex-5 FXTFPGAs. We’ve also learned from the mis-steps of our competitors,” said VinRatford, senior vice president for world-wide marketing and business developmentat Xilinx. “Where all those devices took ahardware design-centric view of systemdesign or simply don’t have enough pro-cessing horsepower, our new ExtensibleProcessing Platform takes a processor-firstapproach, so software designers can startdeveloping with this product right out ofthe box. They don’t even have to use theextension block if they choose not to.”

But many design teams that have a mixof software and hardware designers willembrace the extension block. Xilinx plansto refine the use model over time with theend goal of offering software and systemdevelopers an environment that will enablesoftware gurus to program the programma-ble-logic extension in addition to theprocessor without the assistance of hard-ware designers.

Ratford points out that unlike previousarchitectures in which the FPGA bootsbefore the processor, this new processor-first platform plays perfectly to how devel-opers actually work when building systemarchitectures.

“Electronic systems and software engi-neers typically develop what they wanttheir system to do in the software first, andthen determine what functions they needto accelerate by implementation in hard-ware,” he said. “This allows them to fittheir design into the appropriate perform-ance, cost and power footprint that ulti-

by Mike SantariniPublisher, Xcell JournalXilinx, [email protected]

For well over a decade, FPGA vendors havebeen searching for ways to add the vastnumbers of embedded-software engineersto a user base composed mainly of hard-ware design engineers. Software developersoutnumber their hardware engineer coun-terparts by up to 10:1 worldwide, so creat-ing a device that both camps can instantlyuse has obvious business benefits. Now,Xilinx is architecting a new ARM® micro-processor-based device—an ExtensibleProcessing Platform—tailored to the realway software and system developers work.In breaking through the barrier, this noveldevice promises to take the company intonew markets and to new levels of growth.

As FPGA devices and tools havematured over the last decade to incorporatea greater number of embedded processors(DSPs, microcontrollers and microproces-sors) in dominantly programmable-logicarchitectures, a growing number of embed-ded-system designers have broadened theirskill sets beyond middleware and softwaredevelopment and become competent withhardware design languages. This cross-disci-pline has allowed these happy few designersto begin using FPGAs to create highly opti-mized, differentiated architectures that havethe right mix of hardware and software forprime system performance, functionalityand power consumption.

But while the number of engineerswith this mixed skill set has grown steadi-ly but slowly over the last 10 years, thevast majority of systems designers still relyon hardware engineers to create customhardware functions their systems will needto achieve that optimal mix of perform-ance, functionality, power consumptionand system cost.

Processor Comes First With the Xilinx® Extensible ProcessingPlatform, Xilinx is planning to release afirst-of-its-kind device in which a 32-bitARM Cortex™-A9 processing subsystemrunning at up to 800 MHz is the dominant

Second Quarter 2010 Xcell Journal 7

New architecture targets both software and system developers.Processor boots before programmable logic to speed and simplify system development.

COVER STORY

mately the application is going to need.When they start a project, they are devel-oping a proof of concept. They are lessworried about having it tuned to the spe-cific requirements of a specific customer,and are more concerned about having themaximum amount of flexibility in deter-mining what can be done in hardware orsoftware. Through revisions, they decide

what functions will go into hardware andwhat will go in software, and then takesteps to refine both to fit their systemrequirements. Our device will help themdo their job faster and better than before.”

Xilinx will deliver the new ExtensibleProcessor Platform in the same low-power,high-performance 28-nanometer processtechnology that it will use for its next-gen-eration FPGAs (see sidebar).

Why ARM?Xilinx chose to partner with ARM Ltd.because the company is well-established andhas an incredible reputation for the quality ofboth its processor IP and software. Indeed,

the ARM architecture has become the defacto standard choice for designers seekinghigh-speed, low-power microprocessor cores.

“On all fronts—hardware and softwarefunctionality, performance, ecosystem, userfamiliarity and power—ARM was handsdown the best choice for this new architec-ture,” said Ratford. “With power consump-tion becoming a top-order consideration

for not only wireless but also wired appli-cations, adding the lowest-power processorto the device will give users a mind-blow-ing number of options for making systemtrade-offs. They can offload functions tothe hardware extension block to increasesystem performance. And they can createsystems that can have screaming perform-ance at one moment, then power down toconsume milliamps.”

A key feature of the new architecture isthe interfaces. Xilinx is interconnectingthe full, ARM processor-based system andprogrammable-logic extension block with ahigh-bandwidth interface connecting theprocessor-based system, extension block

and shared memory. By contrast, a typicalsystem that pairs a discrete MPU-basedASSP chip with an FPGA on the sameprinted-circuit board typically has just over100 I/Os connecting them.

Further, in the March release of ver-sion 4 of its AMBA® bus’ AdvancedExtensible Interface (AXI), ARM includ-ed an extension of the AXI specificationoptimized for use on programmable logic.The AXI-4 stream protocol extensionserves as a bidirectional crossbar commu-nication switch that will take advantage ofthe abundant I/Os, enabling engineersusing the new Xilinx device to achievenew levels of system interblock through-put and at the same time tap into a mul-titude of hardware peripheral cores thatIP vendors and customers have developedfor ARM ASIC and ASSP implementa-tions over the last two decades.

This tightly coupled integrationbetween the two domains, along with thenew AXI extension, also means that ifdesign teams find a particular function thatisn’t running optimally on the processor—or if they have a piece of code they want tospeed up—they can create hardware forthat function and place it in the program-mable-logic extension block using anindustry-standard interface.

Familiar Software Programming Model In creating the new architecture, Xilinx wasmindful of the requirements and workingpreferences of its targeted audience.

Because the new device boots theprocessor system first, at reset, a softwaredeveloper can program the processor out ofthe box, side-by-side with the hardwaredeveloper. This has the benefit of shorten-ing development cycles by putting thesekey functions on parallel tracks.

“In fact, someone could buy the partand just use the processor system,” saidKeith DeHaven, director of processor mar-keting at Xilinx. “But the value of thedevice lies in the user’s ability to leveragethe programmable logic to customize, opti-mize and differentiate their products whileleveraging the command, control andapplications features enabled by the ARM-based processor system.”


Processing SystemHardwired SoC

High-PerformanceLow Power, Low Cost

Boots OS at Rest

Programmmable Logic for Extensions

Rapid DifferentationHigh-Performance, ScalableProgrammed by Processor

High-Performance,Reconfigurable,

Application OptimizedAccelerators

High-BandwidthInterfaces

Extensible Processing Platform

AdditionalPeripherals

CommonPeripherals

Memory Interfaces

Off-the-Shelf

Off-the-Shelf

Custom

Custom

ARM®

Dual-Cortex™-A9MPCore Complex

Figure 1 – The Xilinx Extensible Processing Platform pairs an ARM processor with programmable elements.

COVER STORY


DeHaven said that the processor systemhas a constant set of peripherals, switchesand memory interfaces, providing the soft-ware developer with a consistent program-ming environment. In addition, developerscan get started with existing ARM toolsand available hardware (summarized inTable 1) to jump-start their efforts.

But of course, the real value of thearchitecture comes as design teams tradeoff functions between the processor systemand programmable-logic extension block.Now the software developer, not the hard-ware designer exclusively, will be dictatingfrom the processor view how the devicewill operate. For example, the processorsystem may be using data in the extensionblock to do peripheral functions, or it maydelegate control to the block. Developerswill likely run hardware/software co-simu-lation to see if a given function will run inhardware faster, consume less power orreduce cost. Still others may simply wantto take software functions that are not like-ly to change and offload them to theextension block to free up more room inthe processor code for other commands.

Once they solidify what functions will gointo hardware and software, they can thenhave their hardware engineers use Xilinx’sISE® Design Suite to implement those func-tions with an AMBA-AXI standard interfacein the programmable-logic extension block.Meanwhile, the developers can continue cre-ating software while the hardware team pro-grams the extension block.

While the processor-first architectureis unique and the use model more repre-sentative of how software developers real-ly work, Xilinx plans to make the floweven more intuitive.

Xilinx and its partners are currently devel-oping a comprehensive set of common andstandards-based accelerators and peripherals(IP cores, in hardware speak) and related

drivers and APIs that will further help soft-ware and system developers add functions totheir designs. Some of these accelerators andperipherals will be ready at the time oflaunch, allowing users to focus on creatingtheir own custom IP to meet their systemneeds and differentiate their products.

The accelerators and peripherals willrange in size from small functions that

designers can mix and match in the exten-sion block to fully populated extensionfunctions that target specific designdomains—connectivity, DSP and process-ing—and vertical markets such as auto-motive; industrial, scientific and medical;aerospace and defense; wired and wirelesscommunications, among others.

In the longer term, Xilinx is developingC-to-FPGA compiler flows in an effort toeventually offer software and electronic-system developers a way to readily movefunctions between software and hardwareprogramming environments to rapidlydevelop, evaluate and optimize their sys-tems. “The idea is that they will be able todevelop in a C-based environment and seeresults in hardware and software rapidly,”said DeHaven. In fact, Xilinx has beendiligently monitoring research conductedby the benchmarking and analysis firmBDTI in evaluating the use models of C-level synthesis tools (see related BDTIarticle in this issue).

While software developers will be able touse any commercial development tools sup-porting the ARM Cortex-A9, Xilinx plans tobundle its own tools with its new device tohelp folks get started. Bundled in the tool kitand PCB will be an Eclipse-based integrateddevelopment environment, GNU-basedcompiler, debugger and drivers. “Users willbe able to leverage the environment of theirchoice,” said DeHaven. “They will be able todevelop with industry tools or Xilinx devel-opment tools that support the Cortex-A9and ARM CoreSight™ debug interfaces.”

In addition to ARM native support,Xilinx is also working closely with leadingthird-party solution providers to developdevice-specific software bundles—operat-ing systems and development tools—aimedat engineers using the new device.

Suited for Multiple Vertical Markets Xilinx created the new architecture at theurging of customers who are looking for adevice that is scalable, flexible, upgradeableand will allow them to quickly create deriv-atives for their needs. The ExtensibleProcessing Platform will allow them to fur-ther differentiate their products from com-peting systems based on fixed-function

Vendor OS

eSol eT-kernel Multicore Edition

Express Logic ThreadX

Green Hills INTEGRITY 10

Kernel.org Linux 2.6+

Mentor Graphics Nucleus PLUS RTOS

Microsoft WinCE

MontaVista MobiLinux 5.0

QNX Neutrino RTOS

Symbian Symbian OS 9+

Wind River VxWorks 6.6 SMP

Table 1 – ARM has a mature and robust ecosys-tem for operating systems and OS development

tools. Here are some of the OSes the ARMCortex ecosystem supports.

T h e r e a l v a l u e o f t h e a r c h i t e c t u r e c o m e s a s d e s i g n t e a m s t r a d e o f ff u n c t i o n s b e t w e e n t h e p r o c e s s o r s y s t e m a n d p r o g r a m m a b l e - l o g i ce x t e n s i o n b l o c k . N o w t h e s o f t w a r e d e v e l o p e r, n o t t h e h a r d w a r ed e s i g n e r a l o n e , w i l l b e d i c t a t i n g h o w t h e d e v i c e w i l l o p e r a t e .

COVER STORY

ASSPs and ASICs. “We’ve previewed thisto several customers and they are eager toget their hands on the device,” saidRatford. “I predict the market reach of thisdevice will be incredible.”

For example, Xilinx expects any verticalmarket that incorporates intelligent videowill see immediate benefits from using thenew device. Intelligent video requires mul-tiple processing steps such as preprocessingpixel-leveling, which is computationallyintensive and thus well served by the paral-lel-processing capabilities of programmable

logic. Intelligent video also requires analyticprocessing at the element level, which isserved using a combination of compatibleparallel (programmable-logic) and serial(MPU-based) functions. Meanwhile, appli-cation processing at the frame level requiresdecision, control and communications pro-cessing typically performed by MPUs.

Among the specific video markets thatstand to gain are automotive driver assis-tance; consumer multiclass, multifunction-al printers; general embedded systems thatuse scanners; industrial smart cameras,

including Internet Protocol surveillancecameras and machine vision, DVRs, med-ical imaging systems, broadcasting studiocameras and transcoders; and defense-gradenight-vision equipment.

One intelligent-video application thatwill see immediate benefit from the newarchitecture is automotive driver assistance.Key customers in this space have been urg-ing Xilinx for years to create an ARM-basedextensible platform.

Automotive customers will be able toprogram the device to control and analyze


28-nm FPGAs will target the right mix of power consumption and performance

In February 2010, Xilinx announced it will manufacture its next-generation FPGAs in 28-nanometer high-k metal gate(HKMG) high-performance, low-power process technologies and will use a new, unified ASMBL™ architecture for the devices.Xilinx evaluated several technologies at the 28-nm node before concluding that the HKMG high-performance, low-power

process offered the ideal mix of power consumption and performance for Xilinx’s next-generation FPGAs.For the last four generations of IC processes, static power caused by transistor leakage has become increasingly problematic.

At 28 nm, it becomes a first-order effect. Where dynamic power refers to the power a device consumes when it is performing thetasks it was designed to do, static power is wasted juice that increases overall power consumption and produces more heat.

If left unchecked at the 28-nm node, static power draw can account for well over 50 percent of a device’s total power draw,drastically taxing power and thermal budgets. A key to the HKMG high-performance, low-power process is the material it usesas a gate dielectric (thermal insulator): instead of silicon dioxide, the technology employs hafnium dioxide. Where a 40-nm sil-icon dioxide material only afforded a k value (or gate dielectric constant) of 3.9, the 28-nm HKMG high-performance, low-power process will have a k value of 25, which is much more resistant to leakage.

By pairing HKMG with Xilinx’s ASMBL (Advanced Silicon Module Block) architecture—which enables Xilinx to rapidlyand cost-effectively assemble multiple domain-optimized platforms with an optimal blend of features—Xilinx believes its next-generation devices will have 50 percent lower static power consumption than devices implemented using alternative 28-nmhigh-performance processes. At the same time, Xilinx is including architectural innovations enabling a system-level perform-ance increase of up to 50 percent over previous-generation FPGAs.

Reducing power consumption also enables increased device capacity. Using this new process technology, Xilinx will intro-duce a new ultra-high-end 28-nm FPGA to enable applications that require increased capacity in keeping with Moore’s Law.

In addition, Xilinx’s design team is also introducing extra measures to help designers optimize their designs for the right mixof functionality and performance, as well as power.

Xilinx worked closely with customers to identify and understand the architectural bottlenecks in their systems. One of thebiggest areas constraining performance is in interfacing the FPGA with other devices. To address this issue, Xilinx will intro-duce new clocking technology and will harden critical datapath components. The company will also introduce fine-grainedclock-gating features and new place-and-route algorithms in the ISE Design Suite to further reduce power. Fine-grained clock-gating technology is a patented algorithm that analyzes the logic equation and disables wasted logic transitions that do not con-tribute to the final result. This methodology will help customers effectively reduce the power consumption of their designs byas much as 30 percent.

For more information, visit http://www.xilinx.com/technology/roadmap/28nm-technology.htm.

— Mike Santarini

COVER STORY


the data from multiple sensors positioned360 degrees around a vehicle, with eachsensor performing multiple functionssimultaneously. The intelligent controlsensor could, for example, have these sen-sors monitor lanes painted in the road,detect swerving vehicles in adjacent lanes,sync the automobile’s speed to that of thevehicle ahead, detect pedestrians andmonitor spaces between parked cars tolocate adequate parking spots—all simul-taneously. A system like this might warnthe driver instantly if it detects threats; itcould even automatically slow the vehicledown to avoid a collision.

Because such a device would be hard-ware and software programmable, tier-onevendors could make derivatives of this con-troller for various car manufacturers andvarious product lines of each of a carmak-er’s vehicles—without having to change theoverall configuration of the control unit.That capability will save OEMs vastamounts of time, effort and money.Further, the software and hardware pro-grammability means that these devices canbe serviced or upgraded in the field.

Similarly, in industrial controls, userscan create systems that will manage andanalyze data from a series of sensors andmotors that in real time can identify defec-tive products flying down an assemblyline, detect cracks in the machinery orpower down motors when they are run-ning too hot or are not in use to reducefactory costs, optimize operations andeven save the lives of workers.

The new device also holds great promisefor applications in the wired and wirelesscommunications markets, especially inwireless LTE radio, baseband and enter-prise femtocell markets, and in routers,switches and multiplexers in the wiredcommunications arena.

Xilinx anticipates the device will alsofind multiple uses in the military-aerospacebusiness, especially in cockpit control,munitions and communications equip-ment supporting the Global InformationGrid (see the cover story in Xcell Journal,Issue 69).

National Instruments, a longtime cus-tomer of Xilinx and alpha customer of thenew device, has been closely monitoring thedevelopment process and giving feedback toXilinx. Currently, National Instrumentspairs on a PCB a real-time processor with aXilinx FPGA in its NI Reconfigurable I/O(RIO) LabVIEW FPGA embedded plat-form (http://www.ni.com/fpga/). The plat-form offers a wide range of peripheral I/Oand predefined software libraries that cus-tomers can mix and match to create uniqueembedded systems for various vertical mar-kets. By offloading some of the functionsfrom the standalone processor and insteadrunning them in the FPGA, LabVIEWFPGA can run much faster and with thedeterminism required for instrumentation,measurement and control applications.The LabVIEW FPGA environment alsoenables the typical LabVIEW user or mar-ket-domain expert to accomplish this with-out knowing anything about the details ofFPGA design.

National Instruments expects embed-ded products based on the new Xilinxarchitecture will run much faster, with theadded benefit of consuming much lesspower, said NI R&D fellow Keith Odom.

“Now NI will have a very high-per-formance processor capable of running ourhighly productive graphical design envi-ronment, and we have very high band-width connecting the processor system tothe programmable-logic fabric,” saidOdom. “The amount of data we can trans-fer between the processor and the pro-

grammable-logic fabric is much higherthan you would typically see in an FPGAthat has an embedded processor or even amicrocontroller-based ASSP. Because thebandwidth is so high, you can movebeyond mechanical and audio and intomore electrical-, radio- or vision-relatedapplications, along with enabling muchmore sophisticated algorithms to beapplied to the data in all applications. Thisdevice opens up a lot of new possibilities.”

Odom notes that because it essentiallyintegrates two devices in one, data commu-nication on the new device will consumeless power. “Because these blocks are con-nected with so many I/Os, you don’t haveto burn the power you normally would inhigh-speed chip-to-chip communication.You’ll also be able to power it down intostandby modes,” he said.

Odom also likes the fact that the newdevice will be processor centric, rather thanFPGA centric. “It is absolutely key,” saidOdom. “In a lot of applications you wantthe controlling software to reprogram theFPGA depending on the application youare running, and there are times where youwant the processor to run autonomouslyfrom the FPGA fabric. So our applicationconstantly switches out what’s on the FPGAbased on your current processing needs, andthis architecture is ideal for that.”

“It will be incredible to see what cus-tomers will be able to do with device oncewe’ve released it,” said Ratford of Xilinx.“We’re very excited, but still have somework to do for the device to reach its fullpotential.”

Xilinx will announce pricing and avail-ability of the new device in early 2011. Tolearn more details about the new architectureso you can start development today, visitwww.xilinx.com/technolog y/roadmap/processing-platform.htm.

The new device will be processor centric, rather than FPGA centric.

‘In a lot of applications you want the controlling software to reprogram the FPGA

depending on the application you are running, and there are times where you

want the processor to run autonomously from the FPGA fabric.’

COVER STORY


BDTI Study Certifies High-Level Synthesis Flows for DSP-Centric FPGA DesignAdvances in high-level synthesis tools make it easier for DSP developers to implement their designs in FPGAs, benchmarking firm finds.

XPERT OP IN ION

by Jeff BierPresident [email protected]

Jennifer Eyre WhiteDSP [email protected]

In recent years, high-level synthesis toolshave become something of a holy grail forengineers who use—or want to use—FPGAs in their designs. These tools takeas their input a high-level representationof an application (written in C orMATLAB’s M language, for example) andgenerate a register-transfer-level HDLdescription of a hardware implementationtargeting an FPGA.

High-level synthesis tools (HLSTs) areparticularly interesting to two groups ofprospective users: FPGA users who areimplementing demanding digital signal-processing (DSP) applications, and high-performance DSP processor users doing thesame thing. That’s because taxing signal-processing workloads—which typicallyhave high data rates and high levels of par-allelism—are often well suited for imple-mentation on FPGAs using HLSTs.

For current FPGA users, these toolshold the promise of an easier and fasterdesign process. For current DSP processorusers, HLSTs offer a different, yet equallytantalizing prospect: the ability to migrateto a more powerful processing engine (anFPGA) without having to wrangle RTLcode. So, what’s not to like?

One key problem is that, in the past,high-level synthesis tools have not beenable to generate efficient RTL code (interms of resource usage). Most engineerswere unwilling to sacrifice the perform-ance and efficiency of hand-coded RTL,and the tools never achieved significantmarket penetration. Emerging anecdotalevidence suggests, however, that newerHLSTs for Xilinx® FPGAs are very effi-cient and usable. Given this conflictinginformation, how is a prospective user tojudge whether high-level synthesis toolsare worth considering?

not just the C-to-RTL portion, but alsothe Xilinx RTL tool chain.

Typically, the first step in implementingan application on any hardware target is torestructure the initial C code. By “restruc-turing,” we mean rewriting the initial Ccode (which is typically coded for clarityand ease of understanding rather than forperformance) into a format more suitablefor the target processing engine. On a DSPprocessor, for example, it may be appropri-ate to rearrange an application’s controlflow so that intermediate data always fitsin cache. With a high-level synthesis tooltargeting an FPGA, restructuring typicallyprovides a representation of the applica-tion that allows the tool to extract poten-tial parallelism, resulting in a streamingpipelined implementation.

In general, high-level synthesis tools donot handle restructuring automatically.Instead, the restructuring is done by hand.In fact, designers can do the restructuringentirely independently of the high-levelsynthesis tool. In our evaluation, for exam-ple, we used Microsoft Visual Studio forrestructuring and reverifying the C code.Compared with handwritten RTL code,where restructuring and language transla-tion occur as a single combined step,restructuring entirely in C is easier and lesserror-prone—a key advantage for high-level synthesis tools.

After restructuring the high-level code,the user directs the HLST to synthesize ahardware implementation of the specifiedfunctionality in the form of RTL HDLcode. Then Xilinx’s RTL tools (ISE andEDK) take the RTL code the HLST hasgenerated, perform synthesis and place-and-route tasks, report the resource utiliza-tion of the implementation and alert theuser to any timing issues.

BDTI’s Tool Certification ProgramBDTI’s goal in creating the High-LevelSynthesis Tool Certification Program wasto enable two key points of comparison inorder to serve the two classes of potentialHLST users. First, we wanted to comparethe efficiency (in terms of resource usage)of HLST-based FPGA application imple-mentations vs. an implementation based

To answer this question, we at inde-pendent benchmarking and analysis firmBDTI developed the BDTI High-LevelSynthesis Tool Certification Program in2009. Our aim was to provide objective,credible data and analysis of HLSTs forFPGAs so as to enable potential users toquickly understand their capabilities andlimitations in demanding signal-process-ing applications. We conducted the evalu-ation from the perspective of experiencedDSP software engineers who had no previ-ous experience in FPGA development—abackground that is characteristic of manyof the processor users who stand to benefitfrom HLSTs.

The first two HLSTs the program eval-uated were Synfora’s PICO and AutoESL’sAutoPilot. In early 2010, we released thefirst results from the evaluation program—including a few results that will surprisemany FPGA and DSP processor users.

Implementation Using HLSTsOur process for implementing test appli-cations using HLSTs starts with a high-level-language description of the desiredfunctionality, from which the high-levelsynthesis tools generate RTL implementa-tions. Then Xilinx’s RTL tools (theIntegrated Synthesis Environment, orISE®, and the Embedded DevelopmentKit, or EDK) transform that RTL imple-mentation into a complete FPGA imple-mentation in the form of a bitstream forprogramming a specific Xilinx FPGA on aspecific hardware platform with I/O andmemory. In this case, the platform was aXilinx XtremeDSP™ Video Starter Kit—Spartan®-3A DSP Edition, a TargetedDesign Platform based on a Spartan-3ADSP FPGA.

We could have limited our evaluationto the high-level synthesis tools alone,ignoring the RTL-to-bitstream portion ofthe design flow. But we believe thatpotential users need to know what it takesto get from the high-level applicationdescription all the way to an FPGAimplementation—a job that requires theRTL tools in addition to the high-levelsynthesis tool. For this reason, we evalu-ated the entire implementation flow—


XPERT OP IN ION

on handcrafted RTL code. This informa-tion is critical for current FPGA users pon-dering whether to adopt HLSTs toaccelerate their development times.Second, we wanted to gauge the perform-ance and development effort associatedwith using HLSTs on an FPGA relative toimplementing the same workload using aDSP processor with its associated softwaredevelopment tools. This comparisonenables DSP processor users to assess howdifficult it would be to switch technologiesand migrate to an FPGA-based design.

We evaluated the high-level synthesistool flows (including the associated RTLtools) using two well-defined sampleapplications, or “workloads.” These appli-cations (described briefly in the next sec-tion) are representative of demandingdigital signal-processing applications thatdesigners often implement on FPGAs,with high data rates and high computa-tional demands. Other types of applica-tions would likely yield different resultsthan those presented here.

We implemented the two applicationsusing several approaches. First, we imple-mented a given workload on the targetFPGA using each high-level synthesis toolin conjunction with the Xilinx RTL tools.We then implemented the same workloadon the same FPGA using a traditional RTLdesign approach, or on a DSP processorusing its associated development tools(depending on the workload under consid-eration). In this manner, we were able tocompare the quality of results and produc-tivity associated with using various tool-plus-chip combinations.

Evaluation WorkloadsThe two applications we used for evalua-tion purposes were the BDTI Optical

Flow Workload and the BDTI DQPSKReceiver Workload.

The term “optical flow” (or “opticflow”) refers to a class of video-processingalgorithms that analyze the motion ofobjects and object features (such as edges)within a scene. The BDTI Optical FlowWorkload operates on a 720p resolution(1,280 x 720 progressive scan) input videosequence and produces a series of two-dimensional matrices characterizing theapparent vertical and horizontal motionwithin the sequence. In designing thisworkload, we incorporated dynamic, data-dependent decision making and arrayindexing in order to ensure a challengingtest case for the tools.

There are two operating points associ-ated with the BDTI Optical FlowWorkload, each of which uses the samealgorithm but is optimized for a differentmetric. Operating Point 1 is a fixed work-load defined as processing video with720p resolution at 60 frames per second.The objective for Operating Point 1 is tominimize the resource utilization requiredto achieve the specified throughput.(Resource utilization refers to the fractionof available processing-engine resourcesrequired to implement the workload.)

The objective for Operating Point 2,meanwhile, is to maximize the throughput(measured in frames per second) using allavailable device resources.

The second workload, the BDTIDQPSK Receiver Workload, is a wirelesscommunications receiver baseband appli-cation that includes classical communica-tions blocks found in many types ofwireless receivers. It is a fixed workloadwith a single operating point defined asprocessing an input stream of complex,modulated data at 18.75 Msamples/s, with

the receiver chain clocked at 75 MHz. Thereceiver produces a demodulated outputbitstream of 4.6875 Mbits/s. The objectivefor this workload is to minimize the FPGAresource utilization needed to achieve thespecified throughput.

Memory usage and memory bandwidthrequirements vary significantly between theworkloads. The BDTI DSPSK ReceiverWorkload requires minimal memory usage(and therefore, no external memory chip).The BDTI Optical Flow Workload, how-ever, requires storing a history of four videoframes (1,280 x 720 pixels per frame), andthus requires an external memory chip toaccompany the Spartan-3A DSP FPGA.Optical Flow Workload Operating Point 1requires a single external memory chip andinterface (with a bandwidth of approxi-mately 450 Mbytes/s), while Optical FlowWorkload Operating Point 2 typicallyrequires two external memory chips andinterfaces with a combined bandwidth ofapproximately 1.4 Gbytes/s.

For the BDTI Optical Flow Workload,typical FPGA implementations process onepixel per clock cycle for Operating Point 1and two pixels per clock cycle for OperatingPoint 2. BDTI DQPSK Receiver Workloadimplementations process one input sampleevery four clock cycles.

Description of Metrics and PlatformsHistorically, demanding applicationsimplemented in handwritten RTL code onan FPGA typically achieved relatively goodquality of results (that is, performance andefficiency) but poor productivity, whileapplications implemented on DSP proces-sors provided good productivity but rela-tively poor quality of results. High-levelsynthesis tools targeting FPGAs seek toprovide the best of both worlds: good qual-


Historically, demanding applications implemented in handwritten RTL code

on an FPGA typically achieved relatively good quality of results but poor

productivity, while applications implemented on DSP processors provided

good productivity but relatively poor quality of results.

XPERT OP IN ION


ity of results achieved with high productiv-ity levels. Therefore, in our evaluation weconsider two categories of metrics: qualityof results and usability.

Quality-of-results metrics assess theperformance and resource utilization ofthe workload implementation. The BDTIOptical Flow Workload reports quality-of-results metrics for the HLST-Xilinx flowand for the DSP processor flow. The BDTIDQPSK Receiver Workload delivers qual-ity-of-results metrics for the HLST-Xilinxflow and for a traditional FPGA imple-mentation using a handwritten RTLdesign that Xilinx developed in accordancewith typical industry design practices,including the use of Xilinx COREGenerator™ intellectual-property blockswhere appropriate.

Usability metrics assess the productivityand ease of use associated with the HLST-Xilinx design flow, and are based on ourexperience in implementing the BDTIOptical Flow Workload. These metricscompare the productivity and ease of useassociated with the HLST-plus-Xilinx toolflows targeting an FPGA relative to using aDSP processor with its associated software

development tool chain. We evaluatedusability metrics qualitatively based onnine aspects of tool use, including out-of-the-box experience, ease of use, complete-ness of tool capabilities, efficiency ofoverall design methodology and quality ofdocumentation and support.

For this evaluation, the target FPGA isthe Xilinx Spartan-3A DSP 3400(XC3SD3400A). For the BDTI OpticalFlow Workload, the Xilinx XtremeDSPVideo Starter Kit—Spartan-3A DSPEdition is the target platform. We usedXilinx RTL tools, including the ISE andEDK tool suites (version 10.1.03, lin64),along with the high-level synthesis tools.

The target DSP processor for this projectis the Texas Instruments TMS320DM6437.This video-oriented processor includes a600-MHz TMS320C64x+ DSP core alongwith video hardware accelerators. (Thehardware accelerators are not applicable tothe BDTI Optical Flow Workload, andtherefore we didn’t use them.) The evalua-tion used the Texas Instruments DM6437Digital Video Development Environment asthe target platform, and used the TexasInstruments Code Composer Studio tools

suite (version V3.3.82.13, Code GenerationTools version 6.1.9).

Implementation, Certification ProcessWe distributed the job of implementingthe two workloads on the two chipsamong the high-level synthesis tool ven-dors, Xilinx and BDTI based on the chipand tool chain used. The HLST vendorsimplemented both workloads using theirtools along with the Xilinx tools, and sub-mitted performance and resource utiliza-tion results to BDTI for verification andcertification. We used these certifiedresults to generate the quality-of-resultsmetrics presented in this article.

In parallel, our engineers received train-ing from the HLST vendors and inde-pendently implemented portions of theBDTI Optical Flow Workload using thehigh-level synthesis tools and Xilinx tools.This process provided BDTI with first-hand insight into the usability of the toolchains and the quality of results they gen-erated. We also implemented the BDTIOptical Flow Workload on the DSPprocessor, while Xilinx implemented thehandwritten RTL FPGA version of the

250

200

150

100

50

0

DSP

195

HLST + FPGA

5.1

40x Perfo

rmance

Performance (Frames/Second)Higher is Better

5

4

3

2

1

0DSP

4.16

HLST + FPGA

0.15

30x Cost/Performance

Cost/Performance (Frames/Second)Lower is Better

Figure 1 – A Spartan-3A DSP FPGA using HLSTs achieved 195 frames/s at 720p resolution on the BDTI Optical Flow

Workload, a video application, whereas a C64x+ DSP processor attained just 5.1 frames/s.

Figure 2 – Cost/performance for the BDTI Optical FlowWorkload (720p) on a Spartan-3A DSP FPGA

using HLSTs was far better than on a 600-MHz TI C64x+ architecture DSP.

XPERT OP IN ION

BDTI DQPSK Receiver Workload (whichBDTI then verified and certified).

Quality of Results: Performance and EfficiencyAs shown in Figure 1, the FPGA imple-mentations of the BDTI Optical FlowWorkload created using high-level synthe-sis tools achieved roughly 40 times the per-formance of the DSP processorimplementation. Bringing chip cost intothe analysis, Figure 2 shows the correspon-ding cost/performance advantage, which isroughly 30x in favor of the FPGA imple-mentation. Clearly, FPGAs used withhigh-level synthesis tools can provide com-pelling performance and cost advantagesfor some types of applications. (Moredetailed results are available onwww.BDTI.com.)

We also evaluated the efficiency of theHLST-based FPGA implementations ofthe BDTI DQPSK Receiver Workload vs.the same workload implemented usinghand-coded RTL. Here, too, the HLSTsperformed extremely well. As shown in

Table 1, both AutoPilot and PICO wereable to generate RTL code that was com-parable in efficiency (that is, resourceusage) to handwritten RTL code. The sim-ilarity of HLST and handwritten RTLresults is probably not coincidental; weprovided AutoESL and Synfora with theresource utilization figure for the hand-coded RTL implementation at the outsetof the evaluation process. Those compa-

nies likely used this figure as a target inoptimizing their implementations. (Weshould note, however, that such informa-tion is not required for effective use of thehigh-level synthesis tools, and that theHLST vendors did not get the handwrittenRTL design.)

We also interviewed designers who haveused the AutoESL and Synfora high-levelsynthesis tools, who confirmed this


PlatformChip Resource Utilization

(Lower is better)

HLST plus Xilinx RTL tools targeting Xilinx XC3SD3400A FPGA 5.6% - 6.4%

Handwritten RTL code using Xilinx RTL tools targeting Xilinx XC3SD3400A FPGA 5.9%

Efficiency of Design Methodology

Out-of-BoxExperience

Ease of Use Completenessof Capabilities

Quality ofDocumentationand Support

Learning to Use the Tool

Design andImplementation

(FirstCompilingVersion)

Design andImplementation

(FinalOptimizedVersion)

PlatformInfrastructureDevelopment

Extent ofModificationsRequired toReference

Code

Combined HLST+ Xilinx RTLtools rating

Fair Good Good Good Very Good Very Good Good Good Good

TexasInstruments

software developmenttools rating

Good Very Good Very Good Very Good N/A (assuming

already familiar)

Excellent Good Good Fair

Table 1 – Resource utilization for BDTI DQPSK Receiver Workload, at 18.75 Msamples/s input data with a 75-MHz clock speed

Table 2 – Usability metrics of HLST and FPGA tools vs. DSP development software

XPERT OP IN ION


resource usage finding. Indeed, theyreported that the tools produce excellentresults comparable to those obtained viahandwritten RTL code, with much lessdesign and verification effort—a signifi-cant achievement.

Usability Metrics Our usability metrics assess how easy it isto use the high-level synthesis tool flowcompared with the DSP processor toolchain. For each usability metric, we assigna score of excellent, very good, good, fair orpoor. In assigning these scores, we considerthe overall design methodology for a com-plete project—starting with a C-languageapplication specification and ending with areal-time implementation on the targetchip (either an FPGA or a DSP processor).Table 2 presents the usability metrics.

In general, both PICO and AutoPilotwere easy to install and use, even withoutexpertise in FPGA design. In contrast, wehad difficulty installing and using Xilinx’sRTL tools, and we ultimately decided tobring in an experienced FPGA engineerto help get the design running on theFPGA. We needed the FPGA engineer tointerpret error messages from the XilinxRTL tools, for example, and to interfacethe HLST-generated RTL modules withI/O and memory modules to create acomplete design that would run on theFPGA. In general, we found it challeng-ing to resolve design problems uncoveredoutside the scope of the high-level syn-thesis tools. If the HLST user does nothave RTL design and tools skills (as wedidn’t), at this stage of the design flow heor she will need the assistance of an engi-neer who does have them.

Even given the challenges associatedwith the RTL-to-bitstream portion of theflow, however, Table 2 indicates that theHLST-Xilinx tool chains yielded usabilityand productivity results that were nearly asgood as those of the DSP processor flow.Overall, we found that it took a similar levelof effort to implement the BDTI OpticalFlow Workload on the TI DSP processor ason the Xilinx FPGA using either of the twoHLSTs, assuming the availability of anexperienced FPGA engineer to assist withsome portions of the flow.

This is a significant finding, and maysurprise many DSP software engineers.Development time has been a key impedi-ment for many system designers tradingoff the use of a programmable DSP proces-sor vs. an FPGA, and our evaluation indi-cates that this new approach largelyeliminates this barrier for applicationssuch as the BDTI Optical Flow Workload.

HLSTs: Game Changers?Our earlier benchmarking of FPGAs andDSP processors (published in the 2007report FPGAs for DSP) showed large per-formance and cost/performance advan-tages for FPGAs on some applicationswhen the FPGA implementations werecreated using traditional RTL designtechniques. The new analysis presentedhere confirms this performance advantage(for example, a 40x speed gain and 30xcost/performance advantage on the BDTIOptical Flow Workload), and shows thatFPGAs can achieve similar performanceand cost advantages when used with high-level synthesis tools. In addition, wefound that the two high-level synthesistools evaluated thus far—Synfora’s PICO

and AutoESL’s AutoPilot—achieved alevel of resource-use efficiency compara-ble to that attained using handwrittenRTL code. While we did not directlyevaluate the time savings that came fromusing the HLSTs rather than writing RTLcode by hand, we believe they will proveto be substantial, in part based upon ourinterviews with current HLST users.

FPGA designs created using traditionalhandwritten RTL coding typically takemuch more effort than the equivalentapplication implemented in software on aDSP processor—which is one good reasonwhy many DSP processor users are reluc-tant to switch horses. Perhaps the mostsurprising outcome of this project, there-fore, is that it took roughly the same effortto implement the evaluation workload onthe FPGA (using either AutoPilot orPICO, plus the Xilinx tools) as it took onthe DSP processor.

For FPGA users, our study indicatesthat HLSTs offer improved productivitywith no significant downsides. For DSPprocessor users, it’s clear that FPGAs areworth considering—with HLSTs quicklybecoming a game-changing technology.

The authors wish to thank the many individ-uals at AutoESL, Synfora, Xilinx and BDTIwho contributed to the evaluations describedhere. For more information on BDTI’s evalu-ation programs and detailed results for theHigh-Level Synthesis Tool CertificationProgram, visit www.BDTI.com. For newsand analysis of signal-processing technologiesand tools, sign up for BDTI’s monthlynewsletter at InsideDSP.com.

Development time has been a key impediment for many system designers

trading off the use of a programmable DSP processor vs. an FPGA.

Our evaluation indicates that this new approach involving high-level

synthesis tools largely eliminates this barrier.

XPERT OP IN ION


Using Xilinx FPGAs to Speed Packet ProcessingVirtex devices enable the programmable FAST processors to decode, inspect and modify packets with minimal CPU involvement.

XCEL LENCE IN WIRED COMMS

et processing using our intellectual-prop-erty (IP) cores. Our packet-switch FPGAuses a standard Xilinx SPI-4.2 IP core forinterfacing to our network processor(NPU) and our IP core search engine.

We used standard Xilinx IP cores wherev-er possible in order to focus our system-on-chip design efforts on our packet-processing capabilities. We chose a Xilinx 10-Gbit Ethernet MAC core with dual GTPtransceivers to implement the 4 x 3.125-GbpsXAUI physical-layer interface. For the NPUinterface, we used a Xilinx SPI-4 Phase 2 coresupporting up to 1 Gbps per LVDS differen-tial pair with dynamic phase alignment andChipSync technology. Our key packet-pro-cessing IP cores are as follows:

• FAST Packet Processor: Our FPP’sIngress Packet Processor (FIPP) isresponsible for first-level packet pars-ing, key and flow ID hash generation,and Layer 3-4 checksum verification ona per-port basis. The FPP’s EgressPacket Processor (FEPP) performsegress packet modifications and Layer3-4 checksum recalculation.

• FAST Search Engine: Our FSE maintains a flow database in TCAMand QDR SRAM that we use to deter-mine the processing actions that needto be performed on ingress packets.The FSE receives key messages fromeach port’s FIPP, determines the pro-cessing actions for the packet andsends a result message back to theoriginating queue.

• FAST Data Queue: Our FDQ storesincoming packets in an “unscheduled”holding buffer. When the ingresspackets are written to QDR SRAM,the queue forwards a key messagefrom the FIPP to the FAST SearchEngine. The FSE uses this key todetermine how to process the packet,and sends a result message back to theFDQ. Based on the result message,the queue can forward, replicate ordrop each buffered packet. The queuemay additionally perform packet mod-ification on the forwarded and repli-cated packets independently.

by Andy NortonDistinguished Engineer, Office of the CTO CloudShield Technologies, an SAIC company [email protected]

Next-generation network infrastructures areemerging as 10-Gigabit Ethernet maturesand the industry looks ahead to 40GbE and100GbE. Converged networks create newchallenges for scalable open platforms toprocess the traffic. Common componentsin converged next-gen infrastructure chassisinclude high-performance terabit switchingfabrics and programmable content proces-sors capable of handling tens of gigabits oftraffic at the application layer amid con-stantly increasing complexity and burgeon-ing applications. CloudShield has created anew class of programmable packet proces-sors able to inspect, classify, modify andreplicate packets, integrating dynamic inter-action with the application layer.

Our Flow Acceleration SubsysTem(FAST) uses Xilinx® Virtex®-class FPGAs topreprocess packets for CloudShield DeepPacket Processing and Modification blades.These FPGAs include 10-Gbit EthernetMACs with per-port ingress processors forclassification and key extraction, egressprocessors for packet modification, packetqueues using quad-data-rate (QDR)SRAMs, Xilinx Aurora-based messagingchannels and a search engine based on ter-nary content-addressable memory (TCAM).Our FPGA chip set provides caching andprocessing of packets with minimal CPUinvolvement for high-performance process-ing at up to 40 Gbits per second. FeaturingLayer 2-7 field-based lookups, it usesdynamically reconfigurable rules to modifypackets at wire speed in a flexible and deter-ministic manner.

FAST Packet Processor Core FunctionsOur currently deployed deep-packet pro-cessing blades use two blade-access con-troller FPGAs and a packet-switch FPGA,all implemented using LX110T Virtex-5FPGAs. In each of our blade-access con-trollers we provide data-plane connectivi-ty using two Xilinx 10GbE MAC/PHYcores, chip-to-chip interfaces based onXilinx ChipSync™ technology and pack-



Ins and Outs of Data FlowFigure 1 shows the data flow through ourflow-acceleration subsystem. Core FPGAfunctions are shown in green, packet flowin yellow, control messages in blue, exter-nal devices in gray.

First, we identify customer traffic begin-ning with a received packet from a 10GbEnetwork port. Packets on each port enter aFAST Ingress Packet Processor for packetparsing and analysis (No. 1 in the figure).After classifying protocols and encapsula-tions, the FIPP locates the Layer 2, 3 and 4header offsets. Next come flow hashing andkey extraction (flow selection lookup rulessuch as a 5-tuple using source and destina-tion Internet Protocol addresses, sourceand destination ports and protocol).

At this point, our queue managerbuffers receive packets to free memorypages in external QDR SRAM. Receivedpackets at this stage are consideredunscheduled. We park them in externalQDR SRAM while waiting for FASTscheduling. The FAST Data Queue (No. 2in the figure) assigns a packet ID and dis-patches key messages to the FAST SearchEngine (No. 3), which uses the key to identi-fy the flow. A matched flow entry in externalTCAM provides an index into our FlowAction Table in associated SRAM. Thematched flow action is based on our cus-tomer’s provisioned application subscriptions.

Our FAST Search Engine replies with aresult message to the FDQ (No. 4), whereour action scheduler assigns packets to an

output queue according to its specifiedaction. We then de-queue packets from thepacket queue to the indicated destinationoutput port (No. 5), where our FASTEgress Packet Processor (No. 6) handlespacket modification according to a provi-sioned rule in the Flow Modification Tableas indicated by the specified action.

When our FAST Search Engine matchesa customer flow, the specified actionoccurs, while a miss would result in the exe-cution of a default rule (drop or send toNPU). We allow basic actions such as drop-ping packets, forwarding packets directly toa network port, forwarding them to ourexception-packet-handling NPU or repli-cating and forwarding packets with inde-pendent rules. Our extended actions


10GE

10GE

10GE

10GE

10GE

10GE

10GE

10GE

Packet

Key Registers[LUT RAM]

Per-10GE Ingress Port

FAST Ingress PacketProcessor

Hash Registers[LUT RAM]

Packet Queue[External QDR SRAM]

FAST Data Queue

PIB1.0

PIB1.0

PIB1.0

PIB1.0

Packet

Packet

Packet

Per-10GE Egress Port

Flow Modification Table[BRAM]

FAST Egress PacketProcessor

Modified Packet

Exception Packet Handler[External NPU]

Flow Tables[External TCAM]

Flow Action Table[External QDR SRAM]

FAST SearchEngine

Packet

Key Message

Message Key

Message R

esult

Search Key

Match/No Match

Matched Flow Action

Match Index

(2)

(3) (4)

(1) (5)(6)

Figure 1 – Data flow in the Flow Acceleration SubsysTem



include packet collapse (deleting a portionof a packet), packet expand/write (insertinga series of bytes into a packet) and packetoverwrite (modifying a series of bytes)combinations. An example of an overwriterule might be modifiying a MAC source ordestination address, modifying a VLANinner or outer tag, or changing a Layer 4header flag. An insert/delete examplemight be as simple as deleting an existingEtherType and inserting MPLS labels orVLAN Q-in-Q tags, or as complex asinserting an IP header as a GRE deliveryheader followed by the GRE header(Generic Routing Encapsulation is a tun-neling protocol; see Internet RFC 1702).

FAST Packet ProcessorsOur FAST Ingress Packet Processordecodes all packets to determine the con-tents of Layers 2, 3 and 4, if present. Afterour initial Ethernet Layer 2 decoding, thepacket may undergo further Layer 2 pro-

cessing. Next we proceed with Layer 3,processing either an IPv4 or IPv6 packet.Assuming we find one of these valid Layer3 types, we continue processing at Layer 4.

At the same time the packets are beingdecoded, our key extraction unit is locat-ing and storing key fields to produce asearch key that our FAST Search Enginewill use for flow lookup later. Figure 2shows the format of a TCP/IP packet overEthernet Type II and a standard 5-tuplekey to be extracted. It also shows theresultant key extracted for this example.

We also perform IP, TCP, UDP andICMP checksum calculations on all classifiedpackets for both ingress and egress processors.Two Virtex-5 FPGA DSP48E slices providethe adders needed to calculate and verify thechecksums. Our first DSP sums the datastream on 32-bit boundaries, and the secondDSP folds the resulting sum into a 16-bitchecksum value at the end of the associatedlayer being calculated. We then calculate the

checksums; for recalculation, we zero out thechecksum byte positions of the incomingdata stream and reinsert the inverse of thechecksum result back into the data streamusing a storage buffer. The pseudo headerbytes required for the Layer 4 checksum aremultiplexed into the incoming data streamfor inclusion in the final calculation.

A FAST Egress Packet Processor at eachoutput port performs rule-table-based (rulesstored in internal BRAM) packet modifica-tions and Layer 3-4 checksum recalculationand insertion. The FEPP expands beyondtraditional packet modification “fixed-func-tion” techniques, enabling packet modifica-tions to include overwriting, inserting,deleting or truncating packets based on aspecified modification rule number. Ourflow modification rules allow specificationof an opcode indicating the type of opera-tion, with the OpLoc indicating the startinglocation, an OpOffset, Insert Size, DeleteSize, whether to perform Layer 3 and 4

EXTRACTED KEY (Hex Bytes):

EXTRACTED KEY (Network Notation):

C0A80A 14

Source IP Address:192.168.10.20

Dest IP Address:192.168.10.10

Src Port:4132

Dst Port:80 (http)

Protocol:60 (TCP)

C0A80 A 0A 10 24 00 50 06

Layer 2:Ethernet IIHeader

Layer 3:InternetProtocol

Layer 4:TransmissionControlProtocol

Extra

cted

Fie

lds

for K

ey

Figure 2 – 5-Tuple key extraction for TCP/IP packet over Ethernet Type II


checksum calculation and insertion as wellas mod-rule chaining.

We can use packet overwrite features tosimply modify existing fields such as a MACdestination address, MAC source address,VLAN tag or even a single TCP flag.

In order to modify only the MAC des-tination address, the “action” that theFEPP receives along with the packet wouldbe to use, for example, Rule 2 in the FlowModification Table (Figure 3). Rule 2would have been preprovisioned to specify

the opcode (overwrite), OpLoc (locationwithin the packet, for example Layer 2),OpOffset (offset from starting location),mask type (which bytes to use) and modi-fication data (actual overwrite data). Theresult would be to overwrite the 6 bytes,starting at location Layer 2, with the pre-provisioned modification data.

Another example of overwrite is shownfor Rule 6, where we want to modify a par-ticular TCP flag, possibly the ACK, SYN orFIN (Figure 4). This rule would use opcode(overwrite), OpLoc (Layer 4), OpOffset (0offset from Layer 4), mask type (use byte14) and BitMask (which bits within a byteto mask). We could have specified multiplefields for the overwrite, using mask types toinclude or exclude specific bytes.

Our overwrite capabilities are not limitedto only what we can store in the Flow

Modification Table, but also include what isstored in the Flow Action Table as associat-ed data. A rule could specify the use of asso-ciated data that passes to the FEPP as part ofthe action, significantly expanding the rangeof data that can be used for the modifica-tion. The result is to allow, for example,overwrites for an entire VLAN tag range.

Our insert/delete capabilities can achieveeven more-complex packet modifications.The example for Rule 5 (Figure 5) uses ourinsert/delete capability. The variousactions—including opcode (insert/delete),OpLoc (Layer 2), OpOffset (start at byte12), ISize (insert size = 22 bytes), DSize(delete size = 2 bytes) and Insert Data(0x8847, MPLS labels)—associated withRule 5 will result in the deletion of the exist-ing EtherType and insertion of the newEtherType=8847, indicating that the newpacket will be an MPLS unicast packet, fol-lowed by a stack of MPLS labels as specifiedby the insert data.

Floorplanning and Timing ClosureThe greatest challenge we faced in design-ing our novel packet processor arose fromincreasing FPGA design complexity, high-er routing and utilization density, theintegration of various IP cores, use ofmultiple hard logic objects (BRAMs,GTPs, DSPs and the like) and insufficientdata-flow planning back at the earliestproject phases. Our released Phase 1Virtex-5 FPGA bit files were based onlower utilization density, particularlylower BRAM usage, resulting in relativelyeasy timing closure. Adding significantnew functionality with BRAM utilizationapproaching 97 percent at a later phase,we became painfully aware of just howcritical optimal floorplanning can be, andhow decisions made early in a product lifecycle can affect later phases.

The primary goal of floorplanning isto improve timing by reducing routedelay. In order to accomplish this, designanalysis considering data flow and pinoutis of paramount importance. The XilinxPlanAhead™ tool, now integrated into


MAC DA MAC SA EtherType L 3 Header PayloadRule 2Rule 2 MAC DA MAC SA L3 Header PayloadEtherType

MAC DA MAC SA EtherType L 3 Header PayloadRule 2Rule 6 MAC DA MAC SA L3 Header L4 Header Payload0 x 0800

MAC DA MAC SA EtherType L 3 Header PayloadRule 2Rule 5 MAC DA MAC SAMPLSLabel

MPLSLabel

MPLSLabel

MPLSLabel

MPLSLabel L3 Header Payload0 x 8847

Figure 3 – Simple MAC destination address overwrite modification

Figure 4 – Overwrite modification of TCP flags

Figure 5 – MPLS label insertion modification

Our next-generat ion implementat ion wil l boost performance, extend caching capabi l i t ies and add new funct ional i ty. Migrat ing our

FAST chip set into a s ingle Xi l inx Vir tex-6 FPGA, we wil l take ournext-gen FAST funct ional i ty, inter faces and performance to a new

level whi le decreasing board space and power requirements.



ISE® as the point tool for floorplanningand timing analysis, provided the interac-tive analysis and visualization we needed tonavigate the complex web of how toachieve timing closure on a high-utilizationdesign. PlanAhead supplied the insightinto our design that we needed to providethe minimal number of constraints so as toguide the map, place and route tools tomeet our timing requirements. We havefound that this often means optimally plac-ing a key set of BRAMs in addition toblock-based design area constraints.

In retrospect, had we spent more timeat the earliest project stages, usingPlanAhead to try what-if scenarios andhelp us visualize the optimal data-flow andpin selections, our task at this late designphase would have been less difficult.

Dynamically Adaptable Packet ProcessingOur Flow Acceleration SubsysTem’s abilityto inspect and modify packets at wire speedwith maximum flexibility, combined withthe ability to dynamically interact withapplication-layer services, results in highlyadaptable packet processing. Virtex-classFPGAs have been a key enabler, providingthe system-on-chip platform to acceleratecontent-based routing and implement keypacket-processing functions previouslyunseen in prior-generation FPGAs.

Our next-generation implementationwill boost performance, extend cachingcapabilities and add new functionality.Migrating our FAST chip set into a singleXilinx Virtex-6 FPGA, we will take ournext-gen FAST functionality, interfacesand performance to a new level whiledecreasing board space and power require-ments, resulting in a single-chip deep-packet-processing coprocessor unit.

AcknowledgementsNo complex FPGA design effort such as ourscould happen without a great team. I’d liketo acknowledge our FPGA dream team: GregTriplett, FPGA team lead and search enginedesign lead; Scott Stovall, data queue designlead; Scott Follmer, packet-processing designlead; Steve Barrett, verification team lead;and Isaac Mendoza, SystemVerilog whiz andverification engineer.


Get Published

www.xilinx.com/xcell

It's easier than you think!

Submit an article draft for our Web-based or printed publications and we will assign an editor and a

graphic artist to work with you to make your work look as good as possible.

For more information on this exciting and highly rewarding program, please contact:

Mike SantariniPublisher, Xcell Publications

[email protected]

See all the new publications on our website.

Would you like to write for Xcell Publications?


Control Plane/Data Plane Video Processing with an FPGAControl Plane/Data Plane Video Processing with an FPGAAn HD video recognition system balances high data-processing bandwidth with embedded processing control.

An HD video recognition system balances high data-processing bandwidth with embedded processing control.

XCEL LENCE IN AUTOMOTIVE & ISM

by Glenn SteinerSenior Manager for Embedded Processing SolutionsXilinx, Inc.

Dan IsaacsDirector of Technical MarketingXilinx, Inc.

David PellerinFounder and Chief Technology AdvisorImpulse Accelerated Technologies

One of the greatest challenges facing anembedded-systems designer is scoping thesystem’s performance requirements. Theinformation needed to identify the trueperformance requirements may not beavailable or may be difficult to obtain. Thebest estimates sometimes fail due to addi-tional unanticipated computational bur-dens. At other times, the analysis canindicate that an embedded processing sys-tem is not cost-effective for the data-pro-cessing demands. As a result, a systemsdesigner finds it highly desirable to have anarchitecture that is scalable to match thepotentially changing performance require-ments and to handle high-performancedata processing. A control plane/data planeprocessing architecture implemented insidean FPGA can meet these requirements.

What is control plane/data plane pro-cessing, and why might you need it foryour next embedded system?

In systems where it’s impossible orimpractical to do all the processing in soft-ware, designers can obtain additional per-formance in a variety of ways. They can usemultiple processors in symmetric or asym-metric processing configurations; implementhardware coprocessors; or split the data-pro-cessing tasks off completely into one or morededicated processing elements—as in controlplane/data plane processing.

In this method of programming, thedata processing is divided into two distinctplanes. The control plane represents algo-rithm elements that are not performancecritical, such as administrative tasks, userinterfaces and operating system function-ality. Meanwhile, the data plane representsthe movement of data through the system,for example, a video or audio stream, as well

processing. A highly parallel processing ele-ment, an FPGA in this example, handlesthe video processing while a medium-per-formance processor inside the FPGA man-ages the video-processing pipeline. Theprocessor might be dedicated to a singleapplication, or it might be running anoperating system such as Linux. The result-ing mixed hardware/software implementa-

tion distributes the processing to where itcan be best handled and results in a low-cost, high-performance data-processingsolution. Figure 1 shows a typical controlplane/data plane system.

FPGAs Enable Computation Load Balancing Short of using an expensive ASIC, FPGAsare the highest-performance and most cost-effective method of implementing stream-ing data-processing elements. FPGAs, dueto their flexible architecture, enable hard-ware designers to implement processingsystems consisting of elements that areboth paralleled and pipelined. This allowsdesigners to tune a system for both per-formance and latency.

Designers can then couple this dataplane solution to an external, discretemicroprocessor for control. Having thatprocessor inside an FPGA presents severaladvantages. An internal processor dramati-cally reduces the control latency betweenthe processor and the data plane elements.Such latency reduction can amount tomany processor cycles. An external proces-sor must communicate with the data plane.The communication channel may be 32 ormore bits with additional wires for address

as its processing. In the data plane, designersuse techniques such as pipelining to increasedata throughput. Typical applications forcontrol plane/data plane processing includestreaming video, network packet processingand high-speed signal processing.

Let’s look more closely at a controlplane/data plane application that involvesreal-time processing of streaming data. To

do that, we confront the challenge of iden-tifying a unique pattern in a high-definition(HD) video stream. This example is repre-sentative of many applications that require amix of high-performance data processingand control functions involving an embed-ded microprocessor.

A 720p, 60-Hz HD video stream oper-ates at a 74.25-MHz pixel rate. This repre-sents a required processing rate of 222.75Mbytes/second. If a hypothetical dual-core,dual-issue processor operating at 2.5 GHzwere to process this data, the optimalinstruction rate would be 10 giga-instruc-tions per second. Such a processor would becapable of executing 22.4 instructions perbyte of data processed. For some applica-tions this may be sufficient, but 22.4instructions represents very little data pro-cessing. Complex video-processing func-tions such as kernel convolutions, noisereductions and other filtering takes manymore instructions per second. The solutionis to create parallel or pipelined processingelements in the data plane.

HD video processing represents a com-mon real-world application requirementthat is solved very effectively by splittingthe problem into control and data plane


High-def in i t ion v ideo processing

represents a common real-world

appl ica t ion that is so lved very

ef fec t ive ly by spl i t t ing the

problem into contro l p lane

and data p lane processing.


and control. These additional wires maynecessitate a larger package for both theprocessor and FPGA, driving up systemcost. Alternatively, PCI Express® (PCIe)can provide a dramatic reduction in thenumber of pins. Unfortunately, not allprocessors and FPGAs support this rela-tively new interface and even when avail-able, PCIe components are more costlythan equivalent parts without PCIe.

Implementing a control plane processorand the data plane within an FPGAreduces parts count, board space and, fre-quently, power. The result can be a lower-cost solution. Hardened implementationsof processors such as the PowerPC®, orsoft implementations such as the XilinxMicroBlaze™ processors, are available inFPGAs. FPGA-based processors may alsobe configured based upon applicationrequirements. FPGA-based systems enablesystem-level tuning by providing the abili-ty to move decision and computationalfunctions between the processor and theFPGA logic.

Implementation of a Control/Data Plane System Certain tools can simplify the implementa-tion of an FPGA-based control plane/dataplane system. Two common approaches are

to assemble the system using wizards or to doso by modifying an existing reference design.

FPGA tools allow the rapid assembly ofa microprocessor system via wizards. Usingdrop-down lists or checkboxes, you simplyspecify the targeted part and the desiredprocessor and peripherals. Similarly, youmay use tools such as MATLAB® softwareto rapidly assemble a signal-processingpipeline with interfaces to the processor busfor control. Alternatively, you can constructa digital signal-processing pipeline via C-to-HDL tools. The control plane/data planesystem can be connected by simply match-ing bus interfaces. Figure 2 presents theintroductory window to start the wizard aswell as the final system the wizard builds.

The second method involves modify-ing an existing reference design. FPGAreference designs continue to evolve andare becoming market focused. The one weused in our example case study has a com-plete microprocessor system with memoryand peripherals, as well as a 720p HD dig-ital signal-processing pipeline. Thus, thissystem represents a complete control/dataplane solution. The reference designdemonstrates the processor controllingthe gain and FIR filter in the pipeline.Using C-to-FPGA tools to create theobject detection and highlighting module,

the complete system was functional in lessthan 20 hours of effort.

The processor may control the datapipeline by using supplied drivers thatboard support packages provide. Drivers arenow available for Linux, enabling processorsto directly control the data-processingpipeline. Linux calls consist of opening theI/O device from the Linux application andthen reading or writing to the device.

Case Study of an HD Video Recognition SystemObject detection and recognition are ofinterest in industries ranging from sur-veillance to medical imaging and factoryautomation. The higher the image resolu-tion, the more accurate the object recog-nition. Thus, HD video cameras and theassociated HD video stream processingcapabilities are highly desirable. Our casestudy starts with the question (inspired bya famous animated movie): can we detectand highlight a clown fish in a 720p HDvideo stream?

The design requires a color patternmatch across 16 bits to recognize the clownfish stripe pattern. Once recognized, thefish is to be highlighted with a movingspotlight on the display. Further, the size ofthis spotlight is designed to grow or shrinkdepending on the likelihood of a match (in


UserInterface

Data In Data Out

Control PLaneProcessor

(OS)

Control

Data Plane

Coprocessor Coprocessor Coprocessor

Processor Bus orDedicated Control

Channels

Figure 1 – Typical control plane/data plane processing system



reality, the system reduces image bright-ness in all areas except for the spotlightaround the fish). The spotlight sizing andshape calculations, and the comparisonsrequired to search for the fish at each pixellocation, represent a significant amount ofcomputation to perform at every 74.25-MHz clock cycle. It is quickly determinedthat the processing requirements are wellbeyond the capabilities of a typical embed-ded processor.

In such situations, it is best to offload thestreaming-data processing to a coprocessor.Implementing a coprocessor in an FPGAprovides the flexibility to architect a solu-tion that meets performance requirementsat the lowest system cost. As a result, anFPGA-based control/data plane architec-ture is the best choice. An FPGA embed-ded processor controls, via a bus interface,the digital signal-processing pipeline that isresponsible for receiving the video data,detecting the fish, highlighting the fish andoutputting the video data for display.

Thus, for this object detection andhighlighting example, we chose a Micro-Blaze embedded processor operating at 50MHz to manage and control a data-pro-cessing pipeline operating at 74.25 MHz,and to manage the user interface. Freed of

the burden of performing actual videoprocessing, the processor can handle manyother functions, such as host data com-munication via Ethernet, graphical userinterface management and fine control ofthe data-processing pipeline (for example,gain control on a frame-by-frame basis).

An operating system such as Linux isideal for providing the multitasking capa-bilities, network stack and language sup-port for user interfaces. Figure 3 showsthe block diagram of the implementedsystem. This solution enables an opti-mized balance between the need for highdata-processing bandwidth and softwarecontrol of how the data is processed.

HW/SW Co-design with C-to-FPGA tools C-to-FPGA compilers let developers tack-le software/hardware development using anew set of development tools and newtechniques. Developers can start by cod-ing their algorithm in software. Experiencetells us that developing algorithms in soft-ware is more efficient than developing inhardware for several reasons. First, a soft-ware language such as C enables program-mers to develop algorithms at a higherlevel than they can when using hardwaredefinition languages like Verilog or

VHDL. Next, the debug and test tools forC tend to run faster and more effectively,and typically are easier to use, than corre-sponding hardware development tools. Calgorithms run at full speed on a targetprocessor compared with hardware algo-rithms, which are initially tested anddebugged on a simulator. Finally, C devel-opment tools tend to be significantly lessexpensive than their hardware counter-parts. Thus, engineers generally prefer todevelop algorithms in C or a similar high-level language.

Once an algorithm has been provenusing a software language such as C,designers must measure its performanceand determine whether the algorithm canrun entirely on an embedded processor orentirely in hardware, or if a mixed hard-ware/software coprocessing implementa-tion is best. Performance analysis tools areavailable to aid in such determination. Ifthe code must be converted to hardware,the designer must either convert the algo-rithm by hand or use a C-to-FPGA tool.

C-to-FPGA tools enable developers torapidly convert algorithms to HDL, opti-mize the generated hardware processorand perform what-if scenarios balancingperformance and FPGA resources. They

Figure 2 – Wizard start screen and completed system


also enable software engineers to becomehardware engineers, giving them access tothe high-performance data-processinglogic within an FPGA.

Connecting Processor to FPGA with Linux Linux suppliers working with FPGAmanufacturers have developed driversthat enable processors to communicatewith and control FPGAs. First, you mustconfigure Linux for the I/O device. Thisis a two-step process. To begin, load thecustom driver into the Linux kernel:

module_init(xll_example_init);

then, register the driver to a specific devicenumber (for example, 253):

err =

register_chrdev_region(devno, 1,

"custom_io_example");

bash# mknod /dev/custom_io_exam-

ple0 c 253 0

Communication is accomplished byopening the I/O device and then readingor writing to the device, as shown bythese sample code segments:

// Open custom I/O device from

Linux application

int custom_io_ex_open(struct inode

*inode, struct file *filp);

// Read / Write to custom periph-

eral I/O using standard Linux

read/function function calls

ssize_t custom_io_ex_read(struct

file *filp, char __user *buf,

size_t count, loff_t *f_pos);

ssize_t custom_io_ex_write(struct

file *filp, const char __user *buf,

size_t count, loff_t *f_pos);

The FPGA Advantage Signal-processing systems frequently havedata bandwidth requirements exceedingthat which can be economically achievedvia general-purpose processors. In suchcases, designers commonly split their data-processing system into two processingfunctions, using a general-purpose proces-sor for control processing and a hardwareaccelerator, such as an FPGA, for the dataprocessing. This constitutes a controlplane/data plane processing system.

FPGAs are idea for implementingboth control and data plane functions.

An FPGA may contain one or more softprocessors such as the MicroBlaze and/orhard processors such as the PowerPC.Integrating them into the FPGA enableslow-latency and high-bandwidth comuni-cation between the control plane proces-sor and the data plane processing system.

Assembling such systems for both theembedded and data-processing functionsis straightforward with wizards and pre-built reference designs. C-to-FPGA toolshelp streamline this process by convert-ing algorithms prototyped in C to high-performance hardware processingelements. Finally, Linux drivers are nowavailable, enabling easy coding of com-munication and control between theprocessor and the FPGA signal-process-ing pipeline.

Our case study is a typical example ofan application where processing an HDvideo stream is not practical with a low-cost general-purpose processor but is eas-ily accomplished via a signal-processingpipeline inside an FPGA. The processoris then freed up to provide user interface,networking and system managementfunctions while monitoring and control-ling the signal-processing pipeline.


FilterControl(UART)

DVI DVIDVIIn

GammaIn

2D FIRFilter

Fish FinderFilter

GammaOut

DVIOut

XilinxMicroBlazeProcessor

System

The processor is used todynamically configure filters.

Processor Local Bus (PLB)

Figure 3 – Clown fish finder control plane/data plane system


Six powerful Vir tex®-6 FPGAs, up to 24 Million ASIC gates, clock speeds to710 Mhz: this new board races ahead of last generation solutions. The Dini Group

has implemented new Xilinx V6 technology in an easy to use PCIe hosted or standalone board that features:

• 4 DDR3 SODIMMs, up to 4GB per socket

• Hosted in a 4-lane PCIe, Gen 1 slot

• 4 Serial-ATA ports for high speed data transfer

• Easy configuration via PCIe, USB, or GbE

• Three independent low-skew global clock networks

The higher gate count FPGAs, with 700 MHz LVDS chip to chip interconnects, provideeasier logic partitioning. The on-board Marvell Dual Sheeva processor provides multiplehigh speed interfaces optimized for data throughput. Both CPUs are capable of2 GFLOPS and can be dedicated to customer applications.

Order this board stuffed with 6 SX475Ts—that’s 12,096 multipliers and more than 21million ASIC logic gates—an ideal platform for your DSP based algorithmic accelerationand HPC applications.

Don’t spin your wheels with last year’s FPGAs, call Dini Group today and run yourbigger designs even faster.

www.dinigroup.com • 7469 Draper Avenue • La Jolla, CA 92037 • (858) 454-3419 • e-mail: [email protected]

DNV6F6PCIe


An FPGA Route Toward Implementing DisplayPort An FPGA Route Toward Implementing DisplayPort An FPGA Route Toward Implementing DisplayPort

XP LANAT ION:FPGA 101

Nor is this desire for more bandwidthlimited to the consumer market. Thebroadcast equipment, digital display, scien-tific and medical markets continue to drivethe need for bandwidth for displays inapplication such as MRI and CT scans,command and control, daisychained dis-plays, electronic billboards and 3-D ren-dering of DNA, aircrafts, weather andvarious parts of the human body.

To help meet this bandwidth demandwhile controlling the cost, the VideoElectronics Standards Association intro-duced DisplayPort into the market in2007 and has since worked diligentlywith partners to refine it. Today, VESADisplayPort 1.1a supports a data rate of2.7 Gbits/second per channel with up tofour channels in a single cable, whileDisplayPort 1.2 doubles the data rate to5.4 Gbps (enough to handle, for exam-ple, 3,840 x 2,400 pixels at 60 Hz, orfour monitors at 1,920 x 1,200 or, alter-natively, a 3-D display at 120 Hz and2,560 x 1,600 pixels). DisplayPort sup-ports both embedded displays, such asthose within a laptop computer, and box-to-box connections between a video“source” device (set-top boxes, DVDplayers, PC graphic cards, laptops) and aseparate display device, commonlyreferred to as “sinks” in HDMI andDisplayPort standards documents.

by Carol FieldsSenior Staff Product Marketing ManagerXilinx, Inc. [email protected]

Neal KendallMarketing ManagerQuantum Data, Inc.

At the Consumer Electronics Show inJanuary, several major flat-panel TV and dis-play companies introduced high-definition3-D-enabled TVs and monster 4K x 2KLCD monitors, dramatically upping the anteon the number of bits that will be flyingbetween your TV, display and other elec-tronics in your home, car or mobile device.With these new TVs, sports fans will be leap-ing for joy over features like a 176-degreefield of view, 1,200:1 contrast ratio and 450nits of brightness—more than enough topenetrate even the darkest man cave.

But for design engineers creating theseTVs or electronics that will connect tothem, all the new features translate intoone hefty bandwidth requirement. Forexample, a quad 4K x 2K HDTV with 8megapixels (providing digital cinema qual-ity for the home) would consume fourtimes the bandwidth required for optimaloperation of today’s top-line TVs and mon-itors. That translates to a lot of bits racingbetween the set-top box and the HDTV.


There’s a simpler way to design whizzy new 3-D TVs. The XilinxSpartan-6 FPGAConsumer Display Kit and IP will get you there.


Figure 1 – TED Spartan-6 FPGA Consumer Video Kit

While some chip makers have intro-duced standard off-the-shelf transmittersand receivers for these applications, Xilinxhas launched a flexible programmable ver-sion of the VESA DisplayPort v1.1a solu-tion called the Xilinx LogiCORE™DisplayPort v1.1 (Version 1.2 is coming inIDS 12.1). This IP is readily available forXilinx customers, but before you get start-ed, it’s wise to review some additional back-ground information about some of the keyfunctions of the standard, such as thePolicy Maker, and how you can implementthem in Xilinx® FPGAs using our upcom-ing XAPP “Implementing a DisplayPortSource Policy Maker Controller SystemReference Design using a MicroBlaze™Embedded System” in the Tokyo ElectronDevices (TED) Spartan®-6 ConsumerVideo Kit (http://www.teldevice.co.jp/eng/).

Policy Maker—A Key DifferenceThe DisplayPort protocol marks a signifi-cant change in connectivity technology forthe display market. This transition is similar

to the Intel-led PC market’s migration fromthe parallel PCI bus to serial PCI Express.For the display market, VESA is leading themigration from protocols like VGA, DVIand HDMI to a high-speed serial-transceiv-er, packet-based layer architecture protocolwith DisplayPort. Unlike parallel protocols,serial packetized protocols have an addeddegree of complexity to achieve and main-tain the connection or link. In the VESADisplayPort 1.1a specification, the controlfunctions are grouped as the Link PolicyMaker and the Stream Policy Maker. TheLink Policy Maker manages the link and isresponsible for keeping the link synchro-nized. Its jobs include link discovery, initial-ization and maintenance. The StreamPolicy Maker manages the transport initial-ization and maintains the isochronousstream by controlling sequences of actionsby the underlying hardware.

These elements of the Policy Maker areimplementation specific and may be han-dled within the operating system, softwaredriver, firmware or FPGA logic. Many

commercial DisplayPort ICs hide the Linkand Stream Policy Maker implementationfrom the designer to simplify usage. If yourdisplay requirements match an off-the-shelfDisplayPort ASSP, the price and ease of useare hard to beat. However, designers look-ing to differentiate their products from thecompetition turn to FPGAs.

Source Policy Maker Reference Design The DisplayPort Source Policy MakerController System Reference Designusing a MicroBlaze Embedded Systemimplements functionality similar to thatof a commercial off-the-shelf DisplayPortchip with the added advantage of beingavailable in source code for customiza-tion. By using the Source Policy MakerController System Reference Designapplication note, you don’t need to learnthe details of the Policy Maker to getstarted, just connect up the exampledesign and go.

In addition to this source code design,the DisplayPort transmit (Tx) or source



AB-32 AUX Channel

Main Link

Video Data

Audio Data

DisplayPort Source LogiCORE

Source Policy Maker

Controller

Line Buffer

Secondary Channel

PLL

GTPTransceivers

TTL Input

Differential IOConfiguration Space AUX Channel

Hot PlugDetect

Main LinkDisplayPort Cable

To Rx

HPD

ink_clk

Main Link

Control

Figure 2 – DisplayPort Source Policy Maker Controller System Reference Design and LogiCORE Source High-Level Block Diagram


core comes with additional example designsfor a finite state machine (FSM) controller,

The DisplayPort Tx FSM controllerexample design (which has the top-levelfile name dport_tx_fsm_cntrl) comes witha DisplayPort LogiCORE source designexample. This simple proof-of-conceptdesign contains an RTL-based finite statemachine to implement a simple PolicyMaker that demonstrates the proper start-up procedure. The dport_tx_fsm_cntrldesign example has the benefit of requir-ing less time to simulate than the othersample designs.

The Source Policy Maker ControllerSystem Reference Design using aMicroBlaze Embedded System XAPP,which is due out in late May, has the top-level ISE® project name dport_source_ref_design.xise (you will find itsoon at http://www.xilinx.com/products/ipcenter/EF-DI-DISPLAYPORT.htm).This design allows you to modify theSource Policy Maker Controller sourcecode for your own needs. It works in con-junction with the DisplayPort LogiCOREv1.2 (IDS 12.1) release and the Spartan-6TED Consumer Video Kit.

These two example designs contain thebasic procedure for setting up the cores andmaintaining the link and stream. Note thatthe TED Spartan-6 Consumer Video Kitdoes not include a DisplayPort cable.

Functional OverviewBoth source and sink/display specificationsuse the Policy Maker; however, Xilinx hasimplemented them differently with respectto the DisplayPort LogiCORE. The PolicyMaker function on the sink (Rx) side ismuch simpler than that of the source (Tx)side. Xilinx LogiCORE implemented most


AUX Channel APB-32

Main Link

Video Data

Audio Data

DisplayPort Sink LogiCORE

I2S MasterController

DeviceController

I2C MasterController

DisplayPort CableFrom Rx

ink_clk

HPDLine Buffer

EDID ROM

Main Link

Hot PlugDetect

AUXChannel

Main Link

PLL

GTPTranceivers

LVCMOS 3.3V

Differential IO

HDCP

Control

DPCD

Configuration Space

SecondaryChannel

Receiver

For the display market, VESA is leading the migration from protocols like VGA, DVI and HDMI to a high-speed serial-transceiver, packet-based

layer architecture protocol with DisplayPort.

Figure 3 – DisplayPort Rx High-Level Block Diagram



Designers take for granted the automation that Enhanced DisplayIdentification Data structures provide. One way to get a sense of theirimportance is to imagine what life would be like if EDIDs did not exist.

In modern home theater environments, the absence of EDIDs in a sink devicesuch as an HDTV, audio/video receiver or video processor would mean thatthe consumer would need to review the specifications of these devices andlearn their capabilities. The user would then have to set the audio and videoformats to ensure that the source device’s sound and image output did notexceed the capabilities of the audio system or display. If the specificationswere not available or were not intelligible, the user would have to employ trial-and-error methods to discover the optimal settings.

For PCs, the absence of an EDID in the monitor would mean that thegraphics card would have a default resolution. If the resolution had to bechanged, users would have to set it manually for audio in a manner similar tothat described above for home theater systems.

What Information Does an EDID Contain?EDIDs provide a variety of information describing the capabilities of a displaydevice or audio system. The data is arranged in 128-byte blocks. The VESAstandard requires just a single block for VGA, DVI and DisplayPort. However,DisplayPort EDIDs will be enhanced to support an option for an extensionblock in order to define additional capabilities not covered in Block 0. TheConsumer Electronics Association requires both the initial VESA block (Block0) and one or more extension blocks; therefore, HDMI display devices haveboth the VESA block and the CEA extension block.

EDIDs are stored in an EPROM of an audio or video rendering device.Because of the limited storage space, EDID data is stored in a very compactmanner using bit- or byte-oriented storage. In some cases the values aretruncated or abbreviated to conserve space.

The list of display capabilities in the base EDID block is a long one.The block includes a header with 8 bytes of fixed data; vendor, productand version information; basic displayparameters (video input definition, screensize and gamma) and color characteristicssuch as chromaticity and white point, alongwith timing information. The latter includesestablished and standard timings; a timingformula; and detailed timing descriptors. TheVESA E-EDID standard requires that the firstdetailed timing descriptor be the “preferred”video format, with subsequent ones listed inorder of decreasing preference. Consumerelectronics equipment with HDMI interfacesrequires both the VESA block and at least oneextension block that defines the more impor-tant audio and video capabilities for HDTVs oraudio systems.

EDID OperationA source device reads the EDID of a sink in response to a connection event—called a hot plug—downstream at the display. The EDID is transmitted overthe Display Data Channel (DDC) for consumer electronics products using VGA,DVI and HDMI, or over the auxiliary channel for monitors with DisplayPortinterfaces (see Figure A). In the simple case of a source directly connected toa display device, the EDID is read when the hot-plug lead is asserted.

In cases where there is a repeater device between the source andthe sink—common in home theaters—the EDID is read when the audiosystem transmits a hot-plug pulse in response to a connection eventdownstream at the sink. A repeater will forward the EDID directly to thesource device or, in the case of an audio system, will substitute its audioEDID to the source, as shown in Figure B.

What if EDIDs Did not Exist?

5V5V

EDID Read

EDID ReadRequest

EDID

EDID

HP

HP

5 volts presented to AVR & sink

“Hot plug” asserted & forwarded to source

Source & AVR systems request EDID

Sink sends EDID to AVR;AVR updates & forwards to source

SourceAVR

HDTV (Sink)

5V

EDID Read

EDID

HP

5 volts presented HDTV (sink)

“Hot plug” asserted to source

Source requests EDID

Sink sends EDID over DDC

Source

HDTV (Sink)

Figure A – Typical EDID operation involving a source device (set-top box) and sink (HDTV)

Figure B – Typical EDID operation with an audio system (AVR)



The ecosystem is not complete without considering validation of an EDIDimplementation. Given the importance of EDIDs in simplifying and optimizingthe user experience, it is critical to get an EDID implementation correct.Various types of tests are necessary to ensure proper operation, such as theone shown in Figure C.

Many Levels of TestingThe most basic type of testing conducted in a development lab environmentis functional testing, or simulating a properly operating device with testequipment. The test equipment interoperates with the sink device whoseEDID is being verified. Conversely, when developing a source device, func-tional testing is used to verify that a source responds properly to a known-good EDID that the test equipment emulates. In most cases, a developer willwant to test the source device against a variety of known-good EDIDs—oldand new—to verify proper operation. Testing the EDID of a repeater deviceis more challenging and requires test equipment that can emulate both aknow-good source and a known-good sink device.

Developers should also conduct fault testing on their EDID-equippeddevices. Fault testing is a more rigorous type of testing in which the testequipment is configured to simulate a series of anomalous behaviors toverify that a device operates as desired in suboptimal conditions. Fault test-ing is important to ensure interoperability.

For testing source devices, test equipment emulates a rendering deviceand is provisioned with a flawed EDID. Test engineers can introduce a vari-ety of anomalies to ensure that a source device responds appropriately. Toemulate a flawed EDID or a series of them, it is essential to have an EDIDeditor utility in the test equipment. With the editor’s help, developers canquickly construct a variety of modifications to existing EDIDs to verify thata source responds in the proper way.

EDID fault testing of a sink device or the input side of an audio systeminvolves requesting EDID data in a way that is not typical. For example,although permissible, reading the EDID one byte at a time may result in anundesirable response from the display.

Developers of repeater devices, such as an audio or video processor, maywant to mix functional testing and fault testing. For example, a developer could

use test equipment to emulate a known-bad EDID on the sink device and emu-late a known-good source device.

Since improper EDIDs can cause significant interoperability problems,an HDMI device has to pass compliance testing in an authorized test cen-ter (ATC) in order to use the HDMI logo. VESA recently approved a similarcompliance test for DisplayPort devices and it will soon be required forDisplayPort logo use.

Compliance test specifications define the series of tests necessary toensure that a device operates correctly in accordance with a standard.Compliance testing of an EDID begins by defining the intended capabilitiesof the sink device. The capabilities are entered or imported into a compli-ance test application residing on a piece of test equipment. The pass/failresults for each distinct test in the series are shown in a report. Designersmust take care to ensure that a failure is genuine and not the result of animproper configuration.

Since it’s expensive and time-consuming to submit and resubmit acommercial product for testing, the best approach is for developers to pro-cure test equipment for their lab to conduct precompliance testing. In manycases developers can obtain the same equipment used in the ATCs for theirown lab. This significantly reduces the chance of failure in the ATCs. In thecase of EDID compliance testing, the approved test tool for HDMI is com-mercially available from Quantum Data. A VESA-approved EDID compliancetest tool for DisplayPort is anticipated to be commercially available soonfrom Quantum Data as well.

Although the goal of compliance testing is to ensure that devices inter-operate, it is often not sufficient in and of itself because of the diversity ofequipment and suppliers. Therefore, additional interoperability testing isoften needed. One such area where interoperability testing has a highvalue is in ensuring backward compatibility.

To support backward compatibility with existing source devices, newEDIDs must include all fields and blocks that past versions of EDIDs havehad. The CEA extension blocks have length fields so that older sources canskip newer, unsupported data blocks. Backward compatibility can be veri-fied with the sink device under development but it is often more conven-ient to use test equipment because it enables developers to more quicklyupdate EDIDs in the emulated sink for testing.

In developing a new source device, engineers may want to verify that itinteroperates with the EDIDs of older sink devices. Having test equipmentthat can emulate a variety of older EDIDs is essential.

During all types of testing in the lab—functional, fault, compliance andespecially interoperability—the EDID transactions can be monitored. Thiscan help identify the root cause of some EDID-related interoperability prob-lems, particularly those related to timing and responses to hot-plug events.

EDIDs consist of complex data sets that are critical in simplifying and opti-mizing the user experience in both PC and consumer electronics environments.Ensuring that EDIDs are implemented correctly is a key part of the developmentprocess. Quantum Data is a recognized authority on the verification of EDIDimplementations. Its test equipment and associated test applications are usedin the authorized test centers and by developers all over the world.

— Carol Shields and Neal Kendall

5V

EDID Read

EDID

HP

5 volts presented HDTV (sink)

“Hot plug” asserted to source

Source requests EDID

Sink sends EDID over DDC

Source

HDTV (Sink)

Figure C – Setup for testing a source device with test equipment emulating a display device

of the sink Policy Maker function insidethe LogiCORE. The RTL-based sink con-troller provides the remaining portion. ThePolicy Maker’s function on the source side,being much more complex, is available as asource code reference design.

Let’s take a closer look at the source-side Policy Maker. This allows the system

designer maximum flexibility in function-ality and implementation. The top level ofthe example design contains two high-level component instantiations of thecore: the XAPP Implementing aDisplayPort Source Policy MakerController System Reference Design usinga MicroBlaze Embedded System and theDisplayPort core source (Tx) design.Xilinx divided the core’s implementationinto atomic link functions called the MainLink, the Secondary Channel and theAUX Channel protocol. The Main Linkprovides for the delivery of the primaryvideo stream. The Secondary Channel,which Xilinx will include in a futurerelease of the core, will integrate the deliv-ery of audio information into the MainLink during a blanking period.Meanwhile, the AUX Channel establishesthe dedicated source to the sink commu-nication channel (see Figure 2).

Xilinx added a line buffer to the userdata interface so users can easily implementthe example design in an FPGA (seeFigures 2, 3 and 4). The Policy Maker or

Device Controller in Figure 3 on the sinkside is part of the sink design example thatis available from CORE Generator™.

MicroBlaze Processor Plays Central RoleXilinx designed the Source Policy MakerController to be used with the core sothat it functions in much the same way as

an ASSP DisplayPort source device. Werecommend that you use a MicroBlazeembedded or external processor to prop-erly initialize and maintain the link. TheXAPP contains a preconfigured version ofthe Policy Maker Reference Designimplemented in a MicroBlaze processorinside the FPGA, allowing you to imme-diately turn this design into hardware.When it becomes available, the referencedesign will contain source code designerscan modify.

The “logic” portion of the SourcePolicy Maker Controller design sits ontop of a MicroBlaze processor and usesI2C commands to control the link,stream and configuration space. The Ccode implements the Policy Makerinstruction controls, the top-level instan-tiation files and the EmbeddedDevelopment Kit (EDK). Meanwhile,Xilinx provides Software DevelopmentKit (SDK) project files to give designersmaximum implementation flexibility.Xilinx also provides Policy Maker Csource code for applications with existing

control-plane processors. Designers canadd this source code to the existing con-trol software inside or outside of theFPGA. The license agreement providesthe option of implementing the controllerportion outside of the FPGA (that is, inan external processor) as long as the codeis used in conjunction with the core.

Designers can modify the XAPP designusing the Xilinx embedded hardwaredesign kit with Xilinx Platform Studio (theEDK) or Xilinx embedded software designkit with the SDK. In general, FPGAdesigners typically use the EDK and soft-ware developers rely on the SDK.

The EDK flow produces an intermedi-ate net file (NGC) that you can incorporateinto the top-level ISE® project prior toimplementing the design. The NGC filecontains the MicroBlaze code as part of theBRAM initialization.

Fast TurnaroundThe EDK flow generally takes more timewhen you have modified the software;however, once you’ve generated a netlist,you no longer need the EDK or SDK. TheSDK flow modifies the FPGA bitstream,updating only the contents of theMicroBlaze code in BRAM. This SDKflow provides a faster turnaround time forsoftware modification; however, in thisscenario, you must use the SDK everytime you generate a bitstream. The XAPPwhite paper on this topic contains detailedinstructions on how to use the XilinxFPGA Embedded Software DevelopmentKit to run this design.

The “Getting Started Guide” containsinformation on ordering and licensing, sim-ulation only, full-system hardware evalua-tion as well as technical support. In addition,it includes script files you can use to gener-ate the example designs as well as a descrip-tion of how to simulate using the sample testbench and sample pattern generator.

You can use the design with either thefull or evaluation version of the XilinxDisplayPort LogiCORE, downloaded in theTED Spartan-6 FPGA Consumer VideoKit with a DisplayPort FPGA MezzanineCard card (http://www.xilinx.com/products/devkits/TB-6S-CVK.htm).



Ser

ial T

rans

ceiv

ers ■ 8B/10B, AC Coupled

■ 1,2,4 Lanes

■ 1.62, 2.7 Gbps

■ 135 MHz or 108 MHz Reference Clock

■ Aggregate Bandwidth of 10.8 Gbps

■ Link Symbol Rate (after 8/B/10B) 8.64 Gbps

Figure 4 – VESA DisplayPort v1.1a Main Link


The Policy Maker on the source sidecontains a state machine that connects tothe processor interface through an AMBA®

APB port or a 32-bit PLBv46 bus using anAMBA-to-PLB bridge. Xilinx has stored aninstruction set in Block RAM that you canmodify. The C++ code Xilinx used to trainthe link was compiled using the GNU C++compiler and has been fully tested on a softMicroBlaze processor implemented insidethe FPGA using Xilinx’s EDK PlatformStudio processor design suite. The refer-ence design contains a complete XilinxSDK project. The sample test bench con-nects a 135-MHz clock to a VID clock,and a 100-MHz clock to the APB clock.Xilinx has checked that it has appropriate-ly connected all the inputs. Reset is avail-able at the top-level block.

Extended Display IdentificationAn extremely important part ofDisplayPort is interfacing different devicesvia VESA’s Enhanced Display Ident-ification Data structures. EDIDs are real-ly not new. In fact, for a number of yearsdesigners have been using a variety ofvideo interfaces to read sink device param-eters from EDIDs to interface devices.However, these early EDIDs and relatedinterface technologies generally didn’tinclude a sophisticated configurable com-munications channel. Now, however, withDisplayPort, VESA has added intelligenceto the system to negotiate capabilitiesbetween the source (for example, a set-topbox, DVD player or PC graphics card)and sink device (such as a display moni-tor) and to optimize communicationparameters (for a tutorial on VESAEDIDS, see the sidebar). The variablesnegotiated for DisplayPort v1.1a include

the number of lanes (one, two or four),the data rate per lane (1.62 or 2.7 Gbps),the voltage swing (0.2, 0.6, 0.8, 1.2 V),four levels of channel pre-emphasis and adown-spreading of the link clock.

The Rx sink example design providedwith the LogiCORE from the COREGenerator provides an example EDID (seeFigure 3) that is intended to be read by asource device in order to ensure that theuser’s viewing experience is optimal.

The sink example design implements theEDID data structure in BRAM inside theFPGA. The DisplayPort source code enablesan I2C protocol over the AUX Channel.Figures 3 and 4 show a block diagram of aDisplayPort sink connected to a source. TheLink and Stream Policy Maker on the sinkside is part of the sink core; however, theLink Policy Maker on the source side is morecomplex and will be provided as a sourcecode with the reference design. The EDIDinterfaces to the Rx sink side through theI2C interface.

The I2C protocol is ideal for boltingonto the EDID data structure and iscommonly used for this type of applica-tion. The I2C controller locates and man-ages the data found in the EDID andpasses it to the sink core through the I2Cinterface protocol (over the AUXChannel) through a serial interface. Inoperation mode, you do not need to beaware that the EDID is being accessed.By probing the I2C bus, you can monitorthe content of the ROM. In debug mode,you can modify the I2C controller andoverride the 3-bit content found in theEDID ROM. The I2C supplies the con-trol signals, which when tied to the prop-er open collector output provide the I2Cmaster interface.

The sink includes a data structure calledthe DisplayPort Configuration Data(DPCD), which stores configuration dataand acts as a communications mailbox thatboth the sink and source can read and writeto. The source generally consumes the con-tents of the DPCD across the AUX Channel(see Figures 3 and 4).

Policy Maker Link Training The process of establishing communicationson the DisplayPort link is called “link train-ing.” During link training, the core at thestart of communication will try to optimizethe link speed and power used while mini-mizing errors. If there are problems duringdata transfer, the core will automaticallyrepeat link training to adapt to the changingconditions. Communications between thesource and sink packets take place over thebidirectional half-duplex 1-Mbps AUXChannel. Video and audio data is trans-ferred on the main link lanes (1, 2 or 4),which are high-speed gigabit transceiverchannels going from the source to the sink.

The core performs link training in twophases: clock recovery followed by channelequalization, symbol lock and interlanealignment. During the first phase, thereceiver’s PLLs are locked to the incomingsignal and a link clock is recovered. Duringthe second phase, the system optimizeschannel equalization and establishes sym-bol lock and interlane alignment.

Here is a typical operation sequence ofthe Policy Makers on both the source andsink sides:

1. The Tx Link Policy Maker monitorshot-plug detect and, when detected,gives notice to the Stream SourcePolicy Maker. The Stream Source


I n D i s p l a y P o r t , V E S A h a s a d d e d i n t e l l i g e n c e t o t h e s y s t e m t o n e g o t i a t e c a p a b i l i t i e s b e t w e e n t h e s o u r c e

( f o r e x a m p l e , a s e t - t o p b o x , D V D p l a y e r o r P C g r a p h i c sc a r d ) a n d s i n k d e v i c e ( s u c h a s a d i s p l a y m o n i t o r )

a n d t o o p t i m i z e c o m m u n i c a t i o n p a r a m e t e r s .

Policy Maker reads the sink’s EDIDover the AUX Channel.

2. The Tx Link Policy Maker reads theDisplayPort Configuration Data fromthe sink via the AUX Channel.Depending on source and sink capa-bilities, it writes the configurationparameters to the link configurationfield of the sink DPCD and startslink training by writing to theTRAINING_PATTERN_SET byteof the sink DPCD. It then initiatessending the training patterns.

3. The Tx Link Policy Maker controlsthe clock recovery sequence byadjusting the voltage swing and bitrate if necessary with feedback fromthe Rx Link Policy Maker. Once thecore achieves clock recovery, linktraining moves on to the channelequalization stage, where pre-emphasis is adjusted if called for bythe Rx Link Policy Maker. The

receiver also establishes symbol lockand interlane alignment.

4. Once the core passes link training(that is, the system achieves bit lockand symbol lock), it is indicated with-in the DPCD. The Tx Link PolicyMaker reports the training status tothe Tx Stream Policy Maker, whichenables the isochronous stream alongwith stream attribute data transfer.

Policy Maker Additional FunctionalityBesides involvement in link training, the TxLink Policy Maker also uses the IRQ HPDsignals from the receiver to monitor for sinkevent notification and checks the link-statusfields of the DPCD to understand the rea-son for the interrupt. If it detects that thelink has lost lock, the Tx Link Policy Makermust retrain the link. It can also reconfigurethe link for increased or reduced main-linklane count if the receiver calls for it.

The Link Policy Maker also determinesthe order of multiple AUX request transac-

tions, since one transaction ends beforeanother can be initiated. Since a sink mayreply with a NACK or DEFER, the PolicyMaker must decide on the follow-up actionfor those cases. AUX transactions are limitedto 16 bytes of data, so the Policy Maker mustdivide larger transactions into multipletransactions with none larger than 16 bytes.

Thanks to this capability to negotiate andoptimize link settings, DisplayPort canachieve optimal results over varying condi-tions. The Link and Stream Policy Makersare the control functions that coordinate theprocess, allowing modern high-speed videoand audio transport. Xilinx’s Source PolicyMaker Controller System reference designusing a MicroBlaze Embedded System isdesigned to help you exploit these new capa-bilities to the fullest to bring to market yourfeature-rich display product. The XilinxDisplayPort LogiCORE offers a flexiblesource and sink solution with exampleEDIDs and source code ready to downloadinto the TED Spartan-6 Consumer VideoKit. Evaluation versions of this IP are avail-able at no charge. You can find everythingyou’ll need to get started as well as a link tothe XAPP Implementing a DisplayPortSource Policy Maker System ReferenceDesign using MicroBlaze Embedded Systemat http://www.xilinx.com/products/ipcenter/EF-DI-DISPLAYPORT.htm, coming in lateMay 2010.

References

1. Xilinx IP Center – DisplayPort LogiCORE:http://www.xilinx.com/products/ipcenter/EF-DIDISPLAYPORT.htm

2. Quantum Data 882E Video Test Instruments,http://www.quantumdata.com/pdf/882E_DP_DS_RevI.pdf

3. VESA DisplayPort Standard, v1.1a, January 2008

4. I2S Bus Specification, Philips Semiconductors, June 1996 (more on the I2S bus is athttp://www.nxp.com/acrobat_download/various/I2SBUS.pdf )

5. UG196 and UG 198, Virtex®-5 FPGA GTP Transceiver User Guides

6. UG386, Spartan-6 FPGA GTP TransceiverUser Guide

The authors would like to thank Carl Rohrer,Matt Ouellette, Tom Strader and Chris Arndtof Xilinx and Craig Bezek of Quantum Datafor their reviews and input.



JUST 10 YEARS AGO, you could

count on one hand the number of

electronic systems in an automobile.

Today, it is difficult to find a single

system in a car that isn’t electronic

or at least electromechanical, as

car manufacturers look beyond

engine block size and body design

to electronics to differentiate their

offerings. Xilinx® Automotive (XA)

Spartan® FPGAs are playing an

increasingly important role in this

auto electronics revolution. And this

is just the beginning.

In the past, auto electronics

suppliers mainly used FPGAs to

prototype their designs. To keep costs

down, they would almost always

move the design to an ASIC or

leverage an ASSP for final production,

despite the long arduous development

process. This is rapidly changing.

In 2004, Xilinx reached an

important milestone with the release

of its 90nm XA Spartan-3 FPGAs and

its commitment to full automotive

qualification. That FPGA family

achieved a cost level that allowed

customers not happy with the

spiraling costs of ASICs or limited

differentiation of ASSPs to use FPGAs

in production automobiles, not just

for prototyping.

In fact, today’s most advanced and

successful luxury automobiles have as

many as 18 Xilinx devices across driver

assistance, driver information and

infotainment systems. Now, with the

new XA Spartan-6 FPGAs, automotive

companies can bring more advanced

electronics to all of their vehicle lines,

not just their luxury models.

The XA Spartan-6 FPGA Advantage

Xilinx’s new 45nm XA Spartan-6

FPGAs have more than double

the effective logic of XA Spartan-3

devices, while also increasing

clock speeds, on-chip memory and

parallel DSP processing power. New

hard block functions like multiport

memory controllers and PCIe®,

combine with flexible Gigabit rated

SelectIO™ technology and available

Multi-Gigabit Transceivers to give

designers what they need to create the

industry’s most advanced automotive

applications at affordable price points.

For example, right out of the box,

the new XA Spartan-6 FPGAs support

faster DDR3 memories for camera,

video, or high resolution display-

based applications. Xilinx has also

made PCIe host interfacing available

for infotainment systems requiring

high throughput for multimedia

storage, playback, and transport. The

dedicated PCIe hard block and high

speed I/Os eliminate the need for

an external interface chip and keep

more programmable logic free for

application-specific features.

XA Spartan-6 FPGAs do all of

this while exceeding AEC-Q100

qualification standards to meet the

quality needs of the most demanding

automakers worldwide.

In the coming months, the XA

Spartan-6 value proposition will

become even more compelling as

Xilinx rolls out its XA Targeted Design

Platforms, allowing designers, in

turn, to further accelerate the pace of

automotive innovation.

To learn more about the XA

advantage, visit www.xilinx.com/

automotive.

A D V E R T I S E M E N T

By NICK DIFIORE

Drive the Automotive Market with a Distinct Advantage: Xilinx Automotive Spartan-6 FPGAs

About the Author: Nick DiFiore is the Director of Automotive System Architecture & Platforms based in Xilinx, Inc.’s Detroit offices. Contact him at [email protected]

The Automotive FPGA Market is projected to grow at 22% CAGR over the next decade.

© Copyright 2010 Xilinx, Inc. Xilinx, the Xilinx logo, Virtex, Spartan, ISE, and other designated brands included herein are trademarks of Xilinx in the United States and other countries. All other trademarks are the property of their respective owners.

Source: Semicast, October 2009

XAPP1065: Spread-Spectrum Clock Generation in Spartan-6 FPGAshttp://www.xilinx.com/support/documentation/application_notes/xapp1065.pdf

Consumer display applications commonly use high-speed LVDSinterfaces to transfer video data. Designers can use spread-spectrumclocking to address electromagnetic-compatibility issues withinthese consumer devices. This application note by Jim Tatsukawauses Spartan®-6 FPGAs to generate spread-spectrum clocks bymeans of the DCM_CLKGEN primitive.

Spartan-6 FPGAs can generate a spread-spectrum clock sourcefrom a standard fixed-frequency oscillator. The DCM_CLKGENprimitive can use either a fixed spread-spectrum solution withoutany logic, providing the simplest implementation, or a soft spread-spectrum solution using a state machine. The soft implementationadds flexibility but requires additional control logic to generate thespread-spectrum clock.

While the application note focuses specifically on LVDS displays,it’s applicable to other applications with similar DCM usage thatalso have need of spread-spectrum clocks.

XAPP1144: Virtex-6 Embedded Trimode Ethernet MAC Hardware Demonstration Platformhttp://www.xilinx.com/support/documentation/application_notes/xapp1144.pdf

This application note describes a system using the Virtex®-6 FPGAEmbedded Trimode Ethernet MAC Wrapper core on a Xilinx Virtex-6 FPGA ML605 development board. The embedded system is underthe control of a PC-based graphical user interface that provides accessto the Ethernet MAC’s features along with several data flow options.

The hardware demonstration platform shows how to integrate theEthernet MAC, its wrapper core and the Ethernet Statistics v3.3 coreinto a system, generate the required clock resources, handle theEthernet data flow using packet FIFOs and flow control, and connectto a physical interface. An embedded microprocessor manages theembedded system and the Ethernet MAC. Data flow occurs in the fab-ric logic; the microprocessor does not handle it in real time.

XAPP878: MMCM Dynamic Reconfigurationhttp://www.xilinx.com/support/documentation/application_notes/xapp878.pdf

This application note by Karl Kurbjun and Carl Ribbing pro-vides a method to dynamically change the clock output fre-quency, phase shift and duty cycle of the Virtex-6 FPGAmixed-mode clock manager (MMCM) through its dynamicreconfiguration port (DRP). The authors explain the behavior ofthe internal DRP control registers and offer a reference designthat uses a state machine to drive the DRP, ensuring that the reg-isters are controlled in the correct sequence. Due to its modularnature, the design can be used as a full solution for DRP or canbe easily extended to support additional reconfiguration states.The design also uses minimal Virtex-6 FPGA resources, con-suming only 24 slices.

The reference design supports two reconfiguration stateaddresses and can be extended to support additional states. Eachstate does a full reconfiguration of the MMCM, making it pos-sible to change most parameters. The design does not supportoutputs configured with fractional divider values. Nor does itsupport reconfiguring with fine phase shifting enabled.

XAPP876: Virtex-5 FPGA Interface to a JESD204A-Compliant ADChttp://www.xilinx.com/support/documentation/application_notes/xapp876.pdf

In this application note and accompanying reference design, authorMarc Defossez describes how to interface the Virtex-5 LXT, SXT,TXT and FXT devices featuring GTP/GTX transceivers to an ana-log-to-digital (ADC) converter compliant to JEDEC Standard No.204A (JESD204A) Serial Interface for Data Converters. Withsome restrictions, this design can also be used for ADC devices thatcomply with the older JESD204 standard.

The JESD204A standard describes a serialized interface betweendata converters and logic devices. It contains normative informa-tion to enable the implementation of designs that communicatewith JESD204A-compliant devices.


Application NotesApplication NotesIf you want to do a bit more reading about how our FPGAs lend themselves to a broad number of applications, we recommend these notes. If you want to do a bit more reading about how our FPGAs lend themselvesto a broad number of applications, we recommend these notes.

XAMPLES . . .

This application note shows how to implement a two-lane dualADC with each lane having a 14-bit resolution and running at125 Msamples/second. It provides an overview of how to imple-ment the serial data interface and the link protocol described inthe JESD204A standard.

The JESD204A standard describes the protocol for imple-mentation with general high-speed serdes devices. The Virtex-5TXT device contains GTX transceivers. The JESD204A stan-dard is interpreted accordingly, and a compliant interface isdelivered for GTX transceivers. The implementation described isfor a single device containing two converters, using one link oftwo lanes connected to the FPGA.

XAPP873: Virtex-5 FPGA Interface for Fujitsu Digital-to-Analog Converters with LVDS Inputshttp://www.xilinx.com/support/documentation/application_notes/xapp873.pdf

In a second application note, Marc Defossez describes how tointerface a Fujitsu MB86064 or MB86065 digital-to-analog con-verter (DAC) with parallel low-voltage differential signaling(LVDS) inputs to a Virtex-5 FPGA utilizing the dedicated I/Ofunctions of the FPGA family. The note and accompanying refer-ence design also illustrate a basic LVDS interface for connecting toany DAC with high-speed parallel interfaces.

The author demonstrates the implementation in hardwareusing the DK86065-2 Fujitsu development kit and the XilinxML550 and ML555 demonstration boards. Fujitsu has developeda passive interface adapter module for this purpose.

All three of the implementations described make use of theOSERDES I/O features of the Virtex-5 FPGA. Designers canextend the reference applications in resolution width and speedfor use in a wide range of applications. At the same time, youwill find techniques you can use to interface the Virtex-5 FPGAto other types of DAC components.

XAPP1064: Source-Synchronous Serialization and Deserialization (up to 1050 Mb/s)http://www.xilinx.com/support/documentation/application_notes/xapp1064.pdf

Spartan-6 FPGAs perform in a wide variety of applicationsrequiring various serdes factors up to 16:1, at speeds up to1,050 Mbits/second, depending on the application, speed gradeand package.

The input and output serdes blocks—ISERDES andOSERDES, respectively—in Spartan-6 devices simplify the designof serializing and deserializing circuits, while allowing higher oper-ational speeds. This application note by Nick Sawyer discusses howto efficiently use the ISERDES and OSERDES primitives in con-junction with the input delay blocks and phase detector circuitry.

Each Spartan-6 FPGA input/output block contains a 4-bitISERDES and a 4-bit OSERDES. You can cascade the serdesfrom two adjacent blocks (master and slave) to make an 8-bitblock. This gives the possibility of serdes ratios from 2:1 to 8:1on both output and input for both single- and double-data-rateI/O clocks.

XAPP880: SFI-4.1 16-Channel SDR Interface with Bus Alignment Using Virtex-6 FPGAshttp://www.xilinx.com/support/documentation/application_notes/xapp880.pdf

The 16-channel single-data-rate interface design described in thisapplication note targets a Virtex-6 FPGA, taking advantage ofChipSync™ technology, available for every I/O in all Virtex-6 devices.The design features the ability to dynamically adjust the delay of theclock path in the receiver with 78-picosecond resolution. Using thisdynamic delay feature, the receiver escapes the limitations of staticsetup/hold timing by creating its own dynamic setup/hold timing. Theinterface calibrates out process variations by finding the optimalsetup/hold timing for each individual device.

Author Vasu Devunuri shows a Virtex-6 FPGA SFI-4.1 interfacetalking to an SFI-4.1 interface in another device—either an ASIC or anFPGA—that supports a 16-channel SDR interface. Because the interfaceis a source-synchronous link, the receivers of both devices get their clockfrom the Tx side of the other device. The transmitter clocks originate inthe back-end systems and can be provided from a variety of sources, suchas an oscillator on the printed-circuit board. Because each of the 16 datachannels on the serial side of the interface runs at 4:1 serialization, thedata width on the parallel side of the SDR interface is 64 bits.

Based on the design and characterization described, the SFI-4.1 16-channel SDR reference design exceeds its performance targets under allconditions of process, voltage and temperature. However, designersshould be aware that any deviation from the timing budget candegrade the maximum performance.

XAPP1088: Correcting Single-Event Upsets in Virtex-4 FPGA Configuration Memoryhttp://www.xilinx.com/support/documentation/application_notes/xapp1088.pdf

Designers of high-reliability applications must be concerned with theeffect of single-event effects (SEEs) on FPGA configuration memory.Changes to configuration memory can cause changes in the function-ality and performance of the device. This application note by CarlCarmichael and Chen Wei Tseng describes the use of configurationscrubbing and readback in the Virtex-4 family of FPGAs for detectingand correcting single-event effects (SEUs) induced by cosmic rays.

In-orbit, high-energy charged particles (radiation) can have aneffect on the electronic components used in space-based and extrater-restrial applications. In particular, SEEs can alter the logic state of anystatic memory element (latch, flip-flop, routing pip or RAM cell).Because the user-programmed functionality of an FPGA depends onthe data stored in millions of configuration latches within the device,SEUs in the configuration memory array might have adverse effects onthe expected functionality.

Mitigation techniques, such as using the TMRTool to implementtriple-module design redundancy, or triple FPGA deployment, canharden the application against SEUs. However, designers must correctthe upsets so that errors do not accumulate.

Toward this end, the Virtex-4 FPGA SelectMAP interface providesthe most efficient, post-configuration read/write access to the configura-tion memory array. This application note focuses on configuration mit-igation operation performed via SelectMAP.


XAMPLES . . .

BASE PLATFORM KITSThe base-level Virtex®-6 and Spartan®-6FPGA evaluation kits provide all the ele-ments that engineers need to develop basicFPGA-based systems, including support forI/O, memory interfacing and more. Theseevaluation kits provide a flexible environ-ment for higher-level system design, includ-ing applications that need to implementfeatures such as Ethernet or DDR2.Equipped with industry-standard FMC con-nectors, these kits provide the capabilities forscaling and customization for specific appli-cations and markets.

The base platform comprises a robust setof well-integrated, tested and targeted ele-ments that enable customers to immediatelystart a design. These elements include:

• Xilinx FPGA silicon

• ISE® Design Suite Logic Edition

• Reference designs common to manyapplications, such as memory interfaceand configuration designs

• Development boards that run the reference designs

• A host of widely used IP, such as GigE,Ethernet and memory controllers

EMBEDDED, DSP AND CONNECTIVITY DOMAIN KITS The more you know in advance about yourdemanding design requirements, the moreXilinx can provide to help accelerate yourFPGA system design. Domain-specific plat-form kits provide additional integrated toolsand IP optimized for the three primaryXilinx FPGA user profiles, or domains: theembedded processing developer, the digitalsignal-processing (DSP) developer and thelogic/connectivity developer.

These platforms save time by providingan enhanced design infrastructure, enablingengineers to jump ahead in the developmentschedule to focus more time on creatingtheir own unique product differentiation.

Domain-specific kits extend the baseplatform with a predictable, reliable andintelligently targeted set of integrated tech-nologies, including:

• Xilinx FPGA silicon

• ISE Design Suite System Edition, supporting higher-level designmethodologies

• Targeted Reference Designs optimizedfor embedded processing, connectivityand DSP

• User interface enabling fast bring-up anddetailed evaluation of key capabilities

• Base development boards with addi-tional functionality provided via FMCdaughtercards

• Domain-specific IP includingMicroBlaze™, PCI Express® (gen1and gen2), XAUI, DSP filters andother functions required for embedded,DSP and connectivity-based design

• Operating systems (required forembedded processing) and software

Every element in these platforms is tested,targeted and supported by Xilinx and ourecosystem partners. Starting a design withthe appropriate domain-specific platformcan cut weeks, if not months, off yourdevelopment time.

Digital Signal Processing For DSP designers, the Xilinx DSP kits offeran easy way to explore the ultrahigh(GMACs/s) performance in Virtex-6FPGAs or the superior high-performance,low-cost advantages enabled by Spartan-6FPGAs. With support for design flows thatleverage MATLAB® and Simulink from


Xilinx’s Latest PlatformsXTRA, XTRA

Early last year, Xilinx introduced Targeted Design Platforms, a tremendous step forward in helping designers address what has beentermed the “programmable imperative”—the necessity to do more with less, remove risk wherever possible, and differentiate in order

to survive. The aim was to provide customers with simpler, smarter and more strategically viable design platforms for the creation of world-class FPGA-based solutions in a wide variety of industries (see cover story, Xcell Journal, Issue 68). In support of these platforms, Xilinxhas launched a family of evaluation and development kits, each of which combines all the individual components required to enable quickdevelopment right out of the box.

Development kits from Xilinx® combine “targeted” and “tested” resources in the form of hardware, development tools, IP and referencedesigns, enabling designers to immediately begin developing, integrating and debugging system logic, interfaces and software. FPGAMezzanine Card (FMC) daughterboards from Xilinx and third parties enable expansion with domain-specific, market-specific and customer-specific capabilities.

Whether for the general development needs that the Base Platform Kits address, or the specific needs of DSP, embedded or high-speedconnectivity development with the Domain-Specific Kits, Xilinx has worked to deliver complete solutions that will help designers get offthe ground faster and, thus, get their products to market in much less time.

by Mark GoosmanSenior Marketing Manager, Xilinx Platform Solutions

The MathWorks, or RTL examples, sea-soned DSP algorithm developers can getstarted quickly. The kits include versionsof the reference designs that support eitherflow, allowing you to begin your design inthe manner you find most familiar.

Embedded Design The Xilinx embedded kits enable rapidsoftware application development as well aseasy customization of the processor hard-ware subsystems. Software developers canbegin building applications right out of thebox using the Xilinx SDK (an Eclipse-based IDE) and lightweight real-time oper-ating systems like Micrium uC/OS-II. Oryou can develop applications using a full-fledged embedded Linux board supportpackage from PetaLogix. The TargetedReference Design for the Xilinx embeddedkits includes a MicroBlaze soft-core proces-sor subsystem design.

Connectivity Designing systems that incorporate high-speed serial I/O, especially those that workwith multiple protocols and different linerates, can be challenging. The Xilinx con-nectivity kits address these challenges,leveraging proven, high-performancetransceiver technology available in theVirtex-6 family along with the industry’sfirst transceivers available in an FPGA opti-

ity to upgrade the features and functionalityfrom one kit to another. In most cases, you canfind information on upgrading kits on thedocumentation page for the specific evalua-tion or development kit in the Xilinx Boardsand Kits area at www.xilinx.com/kits. Forexample, if you have purchased the Spartan-6FPGA SP605 Evaluation Kit and wish toupgrade to the Spartan-6 FPGA EmbeddedKit, there’s a three-step process detailed at thebottom of the page at www.xilinx.com/products/boards/s6embd/reference_designs.htm.

Caught between the growing complexitiesof technical advances and shrinking designschedules, it’s becoming more and more vitalfor designers to find new ways to streamlinedevelopment. The primary driver behind thelaunch of Targeted Design Platforms was theneed to accelerate customers’ design cycles andenable them to spend less time developing theinfrastructure of an application and more timecreating their unique value in the design. Withthe broad portfolio of development kits, Xilinxis helping customers leap ahead in their devel-opment to focus on the portion of their appli-cation that creates a unique difference fromtheir competition.

mized for low power and cost, in Spartan-6FPGAs. The ISE Design Suite software inte-grates state-of-the-art serial connectivity toolsto allow implementation of PCIe® (gen1 andgen2), XAUI and Gigabit Ethernet bridgesthat have been optimized for maximum band-width using DMA. The kits support designsthat utilize both established as well as new seri-al protocols. This complete and comprehen-sive approach accelerates development forexperienced users and simplifies the adoptionof FPGAs for new users.

TARGETED REFERENCE DESIGNSXilinx has dramatically accelerated designerproductivity by including Targeted ReferenceDesigns with each of these kits. Tuned to thespecific domain, each of the TargetedReference Designs can be used as is or modi-fied and extended. These preverified designsoffer significant benefits to a wide range ofdesigners. For new users, the TargetedReference Designs reduce the learning curveby demonstrating FPGA implementations ofreal-world concepts. Experienced designerscan use them to jump-start their applicationdevelopment or to quickly evaluate platformarchitecture and capabilities.

Even without purchasing the specific baseplatform or domain-specific kit, users candownload documentation and referencedesigns. This allows both greater evaluation ofthe capabilities of the Xilinx kits and the abil-


XTRA, XTRA

Kits for Virtex-6 FPGAs Virtex-6 FPGA ML605 Evaluation Kit A full-featured, highly scalable development environment for high-performance applications

Virtex-6 FPGA Connectivity Kit Enables designers to build high-speed serial and other connectivity applications

Virtex-6 FPGA ML623 MGT Characterization Kit For comprehensive characterization of Virtex-6 FPGA high-speed serial transceivers

Virtex-6 FPGA Embedded Kit Gives hardware and software developers a basic platform for building high-performanceembedded applications

Virtex-6 FPGA DSP Kit Accelerates DSP development with reference designs for building digital signal-processing-based applications

Kits for Spartan-6 FPGAs Spartan-6 FPGA SP601 Evaluation Kit Entry-level environment for evaluating the Spartan-6 family

Spartan-6 FPGA SP605 Evaluation Kit Full-featured kit for developing low-cost applications requiring high-speed serial connectivity

Spartan-6 FPGA Embedded Kit Gives hardware and software developers a basic platform for building low-cost embedded applications

Spartan-6 FPGA DSP Kit Kick-starts development of DSP applications with reference designs for digital signal-processing-based applications

Spartan-6 FPGA Connectivity Kit Enables designers to build high-speed serial and other connectivity applications

Spartan-6 FPGA SP623 MGT Characterization Kit For comprehensive characterization of high-speed serial transceivers

Visit www.xilinx.com/kits to see more details on all these products as well as to learn aboutmarket-specific offerings such as the newSpartan-6 FPGA Industrial Video ProcessingKit. From time to time, Xilinx offers special promotional pricing on select boards and kits.

Visit www.xilinx.com/kits for technical details and ordering information.

EDA vendor Altium Ltd.’s advanced FPGA-based developmentboard, the Altium NanoBoard 3000, targets design groups using

FPGAs with on-board processors at the heart of their system designs.Based on a Xilinx® Spartan®-3AN FPGA, the board is a feature-packedprogrammable design environment that Altium provides to customerscomplete with hardware, software, ready-to-use royalty-free IP and adedicated Altium Designer Soft Design license—all for $395.

“We call it an instant FPGA prototyping environment,” said BobPotock, director of technical marketing at Altium (Sydney, Australia).“It targets the mainstream company that doesn’t have the luxury ofhaving a team of FPGA development specialists but has systemsengineers focused on developing new products. A growing number ofdesign teams want to make FPGAs central in their system designs andare adding processor cores to those FPGAs.”

With the NanoBoard 3000, Potock said, electronics designers canconstruct soft-processor-based systems inside the FPGA without anyprior FPGA design expertise. Nor do NanoBoard 3000 developersneed any VHDL or Verilog skills, he said, because they can use theirexisting board-layout and systems-design know-how to construct, testand implement FPGA-based embedded systems.

“Users can add processors, memory controllers, peripheral blocksand software stacks, and create a system without having to write HDLor even low-level driver code,” said Potock.

Potock said the key to the NanoBoard 3000 is tight integrationamong the development board, the Altium Designer tools and the IP.

In addition to a Spartan-3AN, the prototyping board contains anLCD panel with touchscreen capabilities, along with flash RAM, stat-ic RAM and dynamic RAM memory. It supports USB, SVGA,Ethernet, PS2 and MIDI, and also has programmable clocks and A/Dand D/A converters on board.

Meanwhile, Altium has added a number of graphical editors andscripts to Altium Designer, streamlining many design tasks betweenFPGA programming and PCB design. “Altium Designer is really adevelopment platform that incorporates printed-circuit-board design,FPGA design and embedded-software development, all integratedtogether,” said Potock.

The system works hierarchically. “If I were building a new product,the PCB would be the top-level project,” he said. “Below that would bethe FPGA project and inside that project would be the embedded hard-ware, such as the MicroBlaze™ processor, and then the software proj-ect running on that processor. These are all linked in Altium Designer.”

Specifically for the FPGA design step of the flow, Altium Designerincludes a schematic-entry, HDL entry or even C-to-HDL compiler

to program the FPGA. “We also have a technology called ‘open bus,’which creates a layer of abstraction so users can quickly integrate IPblocks that we provide,” said Potock. “They can be quickly down-loaded into the FPGA.”

Altium provides customers with a fairly extensive library ofroyalty-free IP. “You can construct your FPGA design with theseblocks and also use the vertical market cores provided by Xilinxto develop very specific systems,” said Potock.

Altium has also built functionality into Altium Designer thatallows the tool to control Xilinx’s ISE® Design Suite. “After a user cre-ates an FPGA design in schematics, HDL or C, Altium Designerwraps around ISE Design Suite so that when you say ‘program orcompile the design,’ it can call Xilinx’s synthesis tool or Altium’s ownsynthesis tool,” said Potock. “From a customer perspective, they don’tsee the Xilinx tools, they see Altium Designer. I think folks appreciatethat they only have to work in one environment for all these steps.”

The company also provides drivers and software developmentscripts for each of the IP blocks to facilitate embedded-softwaredevelopment. “We have something called Software PlatformBuilder that detects all the cores you have put into your FPGA,and it will literally build up a software stack based on your IPchoices,” said Potock.

For more information on the Altium NanoBoard 3000, visit http://altium.com/products/altium-designer/en/altium-designer_home.cfm.


Tools of XcellenceTools of XcellenceNews and the latest products from Xilinx partnersNews and the latest products from Xilinx partners

TOOLS OF XCELLENCE



Altium Facilitates FPGA-Based System Development with NanoBoard 3000

The Altium NanoBoard 3000 and associated tools aim tosimplify FPGA design for the masses.

TOOLS OF XCELLENCE


Defense electronics company 4DSP LLC (Reno, Nev.) has releasedeight new analog-to-digital and digital-to-analog boards based on

the FPGA Mezzanine Card (FMC) standard, along with a new versionof its flagship FM680 PMC-XMC mezzanine card equipped with aVirtex®-6 FPGA.

The latest 4DSP FMC boards are based on the new open industrystandard developed by a consortium of companies working throughthe ANSI/VITA organizations, as defined in the ANSI/VITA 57.12008 specification. FMC modules are designed to connect to FMC-compliant carrier cards in the CompactPCI, VPX or PCI Express®

form factors. Xilinx has given FMC a large role in its Targeted DesignPlatforms and the Virtex-6 and Spartan-6 FPGA associated develop-ment kits (see cover story, Xcell Journal, Issue 68).

Pierrick Vulliez, CTO at 4DSP, said the company has offered veryhigh-speed analog cards for many years. Now, with both companies’cards supporting FMC, the 4DSP boards can run optimally withXilinx’s new kits. “The cards range in performance from 60Megasamples per second to 5 Gsps,” said Vulliez. “FMC allows us tooffer our products to a much larger audience.”

The 4DSP FMC Series cards include a number of unique fea-tures, such as a user-selectable option to have data sampled by aninternal clock source (optionally locked to an external reference) orto use an externally supplied sample clock. A trigger input for cus-tomized sampling control is also available. I/O connections are onthe front panel as per VITA 57.1. Cascading multiple FMC boardsfor synchronized high channel count is possible.

Equipped with power supply and temperature monitoring, the4DSP Series FMC cards have several power-down modes to switchoff unused functions and reduce system-level power and heat. Vulliezsaid these features are well suited for software-defined radio (SDR)and similar applications requiring batteries or other low-powersources. The FMC cards are also well suited for man-pack applica-tions, ground mobile vehicles, UAVs and other airborne applicationswhere limited power sources affect mission range and on-station mis-sion time. The boards are also available with Mil-I-46058c-compli-ant conformal coating for use in hostile environments.

The products include:

• FMC103, a four-channel, 210-Msps FMC-LPC board with 12-bit A/D converter

• FMC104, a four-channel, 250-Msps FMC-LPC board with 14-bit ADC

• FMC107, an eight-channel, 65-Msps FMC-LPC board with 12-bit ADC

• FMC108, an eight-channel, 250-Msps FMC-HPC board with 14-bit ADC

• FMC110, which provides two channels of 12-bit A/D at 1 Gsps and two channels of 16-bit D/A at 1 Gsps, enablingsimultaneous sampling at a maximum rate of 1 Gsps

• FMC122, a single-channel 2.5-Gsps at 8 bits FMC-HPCADC board that can alternatively be configured as a dual-channel 1.25-Gsps at 8 bits converter board

• FMC125, a quad-channel, trimode 8-bit FMC-HPC ADCboard that can sample at 1.25, 2.5 or 5 Gsps

• FMC126, a quad-channel trimode 10-bit FMC-HPC ADCboard that samples at 1.25, 2.5 or 5 Gsps

In addition, the company has released the latest version of itsFM series of multifunction high-performance digital signalprocessor boards, the FM680. With the addition of the Virtex-6FPGA, the XMC and PMC industry-standard VITA 42.3-com-pliant module allows users to implement ever-more-complexalgorithms in their systems.

“It can be used for quite a few applications,” said Vulliez. “Aswith our previous-generation products, customers can use the newcard for software-defined radio, real-time processing, radar andsonar applications as well as video processing and compression.”

Customers can configure the FM680 from a variety of memo-ry options such as QDRII SRAM, DDR2 SDRAM and DDR3SDRAM. Optionally, customers can also populate the card’sunique user-configurable BLAST mounting sites with JPEG2000codecs, solid-state flash drives or even specific logic devices or cir-cuit designs.

For more information on the FMC cards, visit http://www.4dsp.com/fmc_list.php. For more information on the FM680, go tohttp://www.4dsp.com/FM680.php.

4DSP Fields Eight A/D and D/A FMC Boards, New FM680 PMC-XMC Mezzanine Card

The 4DSP analog-to-digital / digital-to-analog board line offers users a wide range of performance options.


Xpress Yourself in Our Caption Contest

XCLAMAT IONS!

NO PURCHASE NECESSARY. You must be 18 or older and a resident of the fifty United States, the District of Columbia or Canada (excluding Quebec) to enter. Entries must be entirely original and must bereceived by 5:00 pm Pacific Time (PT) on June 18, 2010. Official rules available online at www.xilinx.com/xcellcontest. Sponsored by Xilinx, Inc. 2100 Logic Drive, San Jose, CA 95124.

GET

TY IM

AG

ES

If you have a yen to Xercise your funny bone, here’s your opportunity. We invitereaders to step up to our verbal challenge and submit an engineering- or technology-related caption for this photograph showing a fraught moment during robot

games. The image might inspire a caption like “The robots had succeeded in bringingtheir creator to their level. Now it was time. So much for artificial intelligence….”

Send your entries to [email protected]. Include your name, job title, company affiliationand location, and acknowledge that you have read the contest rules (www.xilinx.com/xcellcontest). After due deliberation, we will print the submissions we like the best in thenext issue of Xcell Journal and award the winner the new Xilinx® SP601 Evaluation Kit,our entry-level development environment for evaluating the Spartan®-6 family of FPGAs(approximate retail value, $295; see http://www.xilinx.com/sp601). Runners-up will gainnotoriety, fame and a cool, Xilinx-branded gift from our SWAG closet.

The deadline for submitting entries is 5 pm Pacific Time (PT) on June 18, 2010. So, get writing!

MANOJ VISWAMBHARAN,technical manager, Acceleration

Technologies, at Credit Suisse won anSP601 Evaluation Kit with this caption for the photograph of the oversize Puccaand her weary friend that appeared in

Issue 70 of Xcell Journal.

Congratulations as well to our two runners-up:

“Pucca and the engineer each regretted theirchoice of partner for the blind date.”

— Paul Lindemann, consultant, Montage Marketing

“Anime dating driven by Virtex®-6 FPGApower enables a little too much reality...”

— Keith Felton, group director of product management,

Cadence Design Systems Inc.

“This is the last time I try Internet dating.”

GETTY IMAGES

PN 2452