66
www.xilinx.com/xcell/ SOLUTIONS FOR A PROGRAMMABLE WORLD Xcell journal Xcell journal ISSUE 83, SECOND QUARTER 2013 How OpenCV and Vivado HLS Accelerate Embedded Vision Apps How to Configure Your Zynq SoC Bare-Metal Solution Using Xilinx’s Power Estimator and Power Analyzer Tools Vivado Design Suite 2013.1 Available for Download All Eyes on Zynq SoC for Smarter Vision All Eyes on Zynq SoC for Smarter Vision Inside Xilinx’s High-Level Synthesis Tool page 32

Xcell Journal issue 83

Embed Size (px)

DESCRIPTION

The spring 2013 edition of Xcell Journal includes a cover story on how the Zynq All Programmable SoC is enabling customers to create Smarter Vision systems. The issue also includes a variety of fantastic methodology and how-to articles for Xilinx users of all skill levels.

Citation preview

Page 1: Xcell Journal issue 83

www.xilinx.com/xcell/

S O L U T I O N S F O R A P R O G R A M M A B L E W O R L D

Xcell journalXcell journalI S SUE 83 , S ECOND QUAR TER 2013

How OpenCV and Vivado HLS Accelerate Embedded Vision Apps

How to Configure Your Zynq SoC Bare-Metal Solution

Using Xilinx’s Power Estimator and Power Analyzer Tools

Vivado Design Suite 2013.1 Available for Download

All Eyes on Zynq SoC forSmarter Vision

All Eyes on Zynq SoC forSmarter Vision

Inside Xilinx’s High-Level Synthesis Tool

page32

Page 2: Xcell Journal issue 83

Avnet Electronics Marketing presents a new series of video-based SpeedWay

Design Workshops™, featuring the new Xilinx Zynq™-7000 All Programmable SoC

Architecture. This workshop series includes three online courses progressing

from introductory, through running a Linux operating system on the Zynq-7000

SoC, and finally to the integration of the high-speed analog signal chain with

digital signal processing from RF to baseband for wireless communications.

© Avnet, Inc. 2013. All rights reserved. AVNET is a registered trademark of Avnet, Inc. Xilinx® and Zynq™ are trademarks or registered trademarks of Xilinx®, Inc.

www.zedboard.org/trainings-and-videos

www.zedboard.org

Page 3: Xcell Journal issue 83
Page 4: Xcell Journal issue 83

L E T T E R F R O M T H E P U B L I S H E R

Xilinx, Inc.2100 Logic DriveSan Jose, CA 95124-3400Phone: 408-559-7778FAX: 408-879-4780www.xilinx.com/xcell/

© 2013 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands includedherein are trademarks of Xilinx, Inc. All other trade-marks are the property of their respective owners.

The articles, information, and other materials includedin this issue are provided solely for the convenience ofour readers. Xilinx makes no warranties, express,implied, statutory, or otherwise, and accepts no liabilitywith respect to any such articles, information, or othermaterials or their use, and any use thereof is solely atthe risk of the user. Any person or entity using suchinformation in any way releases and waives any claim itmight have against Xilinx for any loss, damage, orexpense caused thereby.

PUBLISHER Mike [email protected]

EDITOR Jacqueline Damian

ART DIRECTOR Scott Blair

DESIGN/PRODUCTION Teie, Gelwicks & Associates1-800-493-5551

ADVERTISING SALES Dan [email protected]

INTERNATIONAL Melissa Zhang, Asia [email protected]

Christelle Moraga, Europe/Middle East/[email protected]

Tomoko Suto, [email protected]

REPRINT ORDERS 1-800-493-5551

Xcell journal

www.xilinx.com/xcell/

Work Smarter With Xilinx All Programmable Solutions

I f you’ve checked out Xilinx’s home page lately, you’ve probably noticed that we haveplunged full force into a new campaign touting “Smarter” systems. Inspired by whatour customers have already been able to create with Xilinx devices, we are delivering

tools today to help you create the Smarter technologies that will shape tomorrow.In the last two months, Xilinx has rolled out our Smarter Networks initiative, and now,

as reflected in the cover story of this issue, our Smarter Vision program. In both cases,Xilinx not only offers devices that allow customers to create smarter systems, but the IP,tools and design environments to support those chips.

In this issue, DSP specialist Chris Dick writes about how the wireless carriers are rapidlygetting smarter in finding ways to increase the profitability of their next-generation networks.Gone are the days when carriers would just install a greater number of faster and more pow-erful basestations to increase bandwidth. Today they are adding intelligence and lookingtoward self-organizing network architectures to dynamically shift coverage to where it isneeded most—working smarter, not harder, and raising profits while doing so.

On another front, my cover story discusses a subject I find very fascinating: Smarter Vision.As you’ll read, the world of embedded vision has been growing by leaps and bounds, to thepoint where vision systems are becoming ubiquitous. Today you can find them in everythingfrom gaming consoles to the car you drive to factories and hospitals. Even produce distribu-tors are benefiting from embedded vision technology, which traditionally has paired FPGAswith standalone processors. But we at Xilinx think that with a silicon platform as powerful asthe Zynq™-7000 All Programmable SoC, you’ll be able to create much more capable embed-ded vision systems—Smarter Vision systems—using one chip instead of two or three.

To facilitate that effort, Xilinx is going the extra mile by offering the industry thebroadest software development environment support and proliferating high-level synthe-sis technology through the Vivado™ HLS tool in our Vivado Design Suite. System architectscan design vision algorithms in C/C++ and use Vivado HLS to create RTL versions. In doingso, they can see, for example, whether certain algorithms or parts of an algorithm wouldrun better in the Zynq SoC’s processor system or in its FPGA fabric.

In early April, Xilinx added even more Smarter Vision automation to the Vivado DesignSuite in release 2013.1. Designers now can use a tool called IP Integrator to deliver theindustry’s fastest time to integration. With this release of Vivado, we also announced ourOpenCV cores library to help vision architects design more productively. In the initialrelease, Xilinx has provided more than 30 commonly used algorithms from the OpenCVlibrary and created RTL versions that you can add to your design to get a jump on develop-ing Smarter Vision systems. You can even use them to evaluate hardware platforms byrunning the functions on the Zynq SoC vs. other ARM+DSP and ARM+GPU-based hardwareplatforms. Jose Alvarez and Fernando Vallina discuss the OpenCV advantage on page 24.

Finally, I’m happy to announce that an old friend from the trade press, Steve Leibson,the former editor-in-chief of the Microprocessor Report and EDN, recently joined Xilinxand has penned a comprehensive backgrounder on smarter wired and wireless networksand smarter data centers. Steve’s backgrounder goes into great depth on the trends in thesemarkets and how Xilinx is helping customers address them with our Smarter Networkingtechnology. You can download a PDF from http://www.xilinx.com/publications/

prod_mktg/smarter-networks-backgrounder.pdf.I hope this issue and associated resources will inspire you to take the next step and try

out the Zynq SoC, if you haven’t already.

Mike SantariniPublisher

Page 5: Xcell Journal issue 83

Interested in adding “published author” to your resume and achieving a greater level of credibilityand recognition in your peer community? Consider submitting an article for global publication inthe highly respected, award-winning Xcell Journal.

The Xcell team regularly guides new and experienced authors from idea development to publishedarticle with an editorial process that includes planning, copy editing, graphics development, andpage layout. Our author guidelines and article template provide ample direction to keep you ontrack and focused on how best to present your chosen topic, be it a new product,research breakthrough, or inventive solution to a common design challenge.

Our design staff can even help you turn artistic concepts into effective graphics orredraw graphics that need a professional polish.

We ensure the highest standards of technical accuracy by communicating withyou throughout the editorial process and allowing you to review your article tomake any necessary tweaks.

Submit final draft articles for publication in our Web-based Xcell Online orour digital and print Xcell Journal. We will assign an editor and a graphicsartist to work with you to make your papers clear, professional, and effective.

For more information on this exciting and highly rewarding opportunity, please contact:

Mike SantariniPublisher, Xcell [email protected]

Would you like to write for Xcell Publications? It’s easier than you think.

Get Published

www.xilinx.com/xcell/

Page 6: Xcell Journal issue 83

C O N T E N T S

VIEWPOINTS XCELLENCE BY DESIGN APPLICATION FEATURES

Cover StoryAll Eyes on Zynq SoC for Smarter Vision

88

Xcellence in Wireless Communications

Xilinx All Programmable Devices Enable Smarter Wireless Networks… 16

Xcellence in Embedded Vision

Using OpenCV and Vivado HLS toAccelerate Embedded Vision Applications in the Zynq SoC… 24

Letter From the PublisherWork Smarter with Xilinx All

Programmable Solutions… 4

1616

2424

Page 7: Xcell Journal issue 83

S E C O N D Q U A R T E R 2 0 1 3 , I S S U E 8 3

Xplanation: FPGA 101

Xilinx High-Level Synthesis Tool Speeds FPGA Design… 32

Xplanation: FPGA 101

How to Configure Your Zynq SoC Bare-Metal Solution… 40

Xplanation: FPGA 101

Using Xilinx’s Power Estimator and Power Analyzer Tools… 46

THE XILINX XPERIENCE FEATURES

Excellence in Magazine & Journal Writing2010, 2011

Excellence in Magazine & Journal Design and Layout2010, 2011, 2012

Tools of Xcellence Tiny GPS disciplined oscillator

will tell you what time it is… 52

Xamples... A mix of new and

popular application notes… 56

Xpedite... Latest and greatest from

the Xilinx Alliance Program partners… 60

Xtra, Xtra The latest Xilinx tool updates,

as of April 2013… 62

Xclamations! Share your wit and wisdom

by supplying a caption for our techy cartoon;

winner nabs an Avnet Zedboard… 64

XTRA READING

3232

5252

4040

Page 8: Xcell Journal issue 83

COVER STORY

8 Xcell Journal Second Quarter 2013

All Eyes on Zynq SoC for Smarter Vision

All Eyes on Zynq SoC for Smarter Vision by Mike SantariniPublisher, Xcell JournalXilinx, Inc. [email protected]

Page 9: Xcell Journal issue 83

Second Quarter 2013 Xcell Journal 9

C O V E R S T O R Y

If you have seen a demonstration of Audi’sAutomated Parking technology in which thecar autonomously finds a parking spot andparks itself without a driver—or if you haveplayed an Xbox 360 game with its Kinectcontroller or even just bitten into a flawlesspiece of fruit from your local grocerystore—then you can count yourself as aneyewitness to the dawning of the era ofsmarter vision systems. All manner of prod-ucts, from the most sophisticated electronicsystems down to the humble apple, areaffected by smarter vision technologies. Andwhile today’s systems are impressiveenough, some experts predict that in 10years’ time, a vast majority of electronicssystems—from automotive to factoryautomation, medical, as well as surveillance,consumer, aerospace and defense—willinclude smarter vision technologies witheven more remarkable capabilities.

As smarter vision systems increase incomplexity, we’ll very likely become passen-gers in autonomous automobiles flowing innetworked highways. Medical equipmentsuch as Intuitive Surgical’s amazing robotic-assisted surgical system will advance evenfurther and may enable surgeons to performprocedures from remote locations.Television and telepresence will reach newlevels of immersion and interactivity, whilethe content on screens in theaters, homesand stores will cater to each individual con-sumer’s interests, even our moods.

IThe Zynq All ProgrammableSoC, in tandem with newXilinx tools and IP, forms the foundation for the nextgeneration of embeddedvision products.

Page 10: Xcell Journal issue 83

Xilinx® All Programmable solutionsfor Smarter Vision are at the forefront ofthis revolution. With the Zynq™-7000All Programmable SoC—the first deviceto marry an ARM® dual-core Cortex™-

A9 MPCore™, programmable logic andkey peripherals on a single chip—as thefoundation, Xilinx has fielded a support-ing infrastructure of tools and IP thatwill play a pivotal role in enablingthe development and faster deliveryof these innovations in vision. Thesupporting infrastructure includesVivado™ HLS (high-level synthesis), thenew IP Integrator tools, OpenCV (com-puter vision) libraries, SmartCORE™ IPand specialized development kits.

“Through Xilinx’s All ProgrammableSmarter Vision technologies, we areenabling our customers to pioneer thenext generation of smarter vision sys-tems,” said Steve Glaser, senior vicepresident of corporate strategy andmarketing at Xilinx. “Over the lastdecade, customers have leveraged ourFPGAs to speed up functions thatwouldn’t run fast enough in theprocessors they were using in theirsystems. With the Zynq-7000 AllProgrammable SoC, the processorand FPGA logic are on the same chip,which means developers now have asilicon platform ideally suited forsmarter vision applications.”

In support of the device, said Glaser,“We’ve complemented the Zynq-7000All Programmable SoC with a robustdevelopment environment consistingof Vivado HLS, new IP Integrator tools,OpenCV libraries, SmartCORE IP anddevelopment kits. With these SmarterVision technologies, our customerswill get a jump on their next design and

be able to achieve new levels of effi-ciency, lower system power, increasesystem performance and drasticallyreduce the bill of materials—enrichingand even saving lives while increasing

profitability as these innovations rollout at an ever faster pace.”

FROM DUMB CAMERAS TO SMARTER VISION At the root of Smarter Vision systemsis embedded vision. As defined bythe rapidly growing industry groupthe Embedded Vision Alliance( w w w. e m b e d d e d - v i s i o n . c o m ) ,embedded vision is the merging oftwo technologies: embedded systems(any electronic system other than acomputer that uses a processor) andcomputer vision (also sometimesreferred to as machine vision).

Jeff Bier, founder of the EmbeddedVision Alliance and CEO of consultingfirm BDTI, said embedded vision tech-nology has had a tremendous impact onseveral industries as the discipline hasevolved beyond motorized pan-tilt-zoomanalog camera-based systems. “We haveall been living in the digital age for sometime now, and we have seen embeddedvision rapidly evolve from early digitalsystems that excelled in compressing,storing or enhancing the appearance ofwhat cameras are looking at into today’ssmarter embedded vision systems thatare now able to know what they arelooking at,” said Bier.

Cutting-edge embedded vision sys-tems not only enhance and analyzeimages, but also trigger actions basedon those analyses. As such, the amountof processing and compute power, andthe sophistication of the algorithms,

have spiked dramatically. A case inpoint is the rapidly advancing marketof surveillance.

Twenty years ago, surveillance sys-tems vendors were in a race to providethe best lenses enhanced by mechani-cal systems that performed autofocusand tilting for a clearer and wider fieldof view. These systems were essential-ly analog video cameras connected viacoaxial cables to analog monitors, cou-pled with video-recording devicesmonitored by security guards. The clar-ity, reliability and thus effectiveness ofthese systems were only as good as thequality of the optics and lenses, and thediligence of the security guards in mon-itoring what the cameras displayed.

With embedded vision technology,surveillance equipment companiesbegan to use lower-cost cameras basedon digital technology. This digital pro-cessing gave their systems extraordi-nary features that outclassed andunderpriced analog and lens-basedsecurity systems. Fisheye lenses andembedded processing systems withvarious vision-centric algorithms dra-matically enhanced the image the cam-era was producing. Techniques thatcorrect for lighting conditions,improve focus, enhance color and digi-tally zoom in on areas of interest alsoeliminated the need for mechanicalmotor control to perform pan, tilt andzoom, improving system reliability.Digital signal processing has enabledvideo resolution of 1080p and higher.

But a clearer image that can bemanipulated through digital signal pro-cessing was just the beginning. Withconsiderably more advanced pixel pro-cessing, surveillance system manufac-turers began to create more sophisticat-ed embedded vision systems that per-formed analytics in real time on thehigh-quality images their digital systemswere capturing. The earliest of theseembedded vision systems had thecapacity to detect particular colors,shapes and movement. This capabilityrapidly advanced to algorithms thatdetect whether something has crossed a

10 Xcell Journal Second Quarter 2013

C O V E R S T O R Y

Cutting-edge vision systems enhance andanalyze images, but also trigger actions

based on those analyses. As such, the needfor compute power has spiked dramatically.

Page 11: Xcell Journal issue 83

virtual fence in a camera’s field of view;determine if the object in the image is infact a human; and, through links to data-bases, even identify individuals.

SUSPICIOUS BEHAVIORThe most advanced surveillance sys-tems include analytics that track indi-viduals of interest as they movethrough the field of view of the securitynetwork, even as they leave the field ofview of one camera, move into a blindspot and then enter into the field ofview of another camera in the surveil-lance network. Vision designers haveprogrammed some of these systems toeven detect unusual or suspiciousmovements. “Analytics is the biggesttrend in the surveillance market today,”said Mark Timmons, system architectin Xilinx’s Industrial, Scientific andMedical (ISM) group. “It can accountfor human error and even take awaythe need for diligent human viewingand decision making. As you can imag-ine, surveillance in crowded environ-ments such as train stations and sport-ing events can become extremely diffi-cult, so having analytics that can spotdangerous overcrowding conditions ortrack individuals displaying suspiciousbehavior, perhaps radical movements,is very advantageous.”

To further enhance this analysisand increase the effectiveness of thesesystems, surveillance and many othermarkets leveraging smarter vision areincreasingly using “fusion” architec-tures that combine cameras with othersensing technologies such as thermalvision, radar, sonar and LIDAR(Light/Laser Detection and Ranging).In this way, the systems can enablenight vision; detect thermal/heat signa-tures; or pick up objects not capturedby or visible to the camera alone. Thiscapability drastically reduces falsedetections and in turn allows for muchmore precise analytics. Needless tosay, the added complexity of fusing thetechnologies and then analyzing thatdata requires ever more analytic-pro-cessing horsepower.

Timmons said that another mega-trend in this market is products that per-form all these forms of complex analysis“at the edge” of a surveillance systemnetwork—that is, within each camera—rather than having each camera transmitits data to a central mainframe system,which then performs a more refinedanalysis from these multiple feeds.Localized analytics adds resilience tothe overall security system, makes eachpoint in the system much faster andmore accurate in detection, and thus canwarn security operators sooner if indeeda camera spots a valid threat.

Localized analytics means that eachunit not only requires greater process-ing horsepower to enhance and ana-lyze what it is seeing, but must also becompact and yet incorporate highlyintegrated electronics. And becauseeach unit must be able to communicatereliably with the rest of the network, itmust also integrate electronic commu-nication capabilities, adding furthercompute complexity. Increasingly,these surveillance units are connectedvia a wireless network as part of a larg-er surveillance system. And increasing-ly, these surveillance systems arebecoming part of larger enterprise net-works or even larger, global networks,like the U.S. military’s GlobalInformation Grid (see cover story, Xcell

Journal issue 69, www.xilinx.com/pub-

lications/ archives/xcell/Xcell69.pdf). This high degree of sophistication is

being employed in the military-and-defense market in everything from footsoldier helmets to defense satellitesnetworked to central command cen-ters. What’s perhaps more remarkableis how fast smarter vision technology ismoving into other markets to enhancequality of life and safety.

SMARTER VISION FOR THE PERFECT APPLE Take, for example, an apple. Ever won-der how an apple makes it to your gro-cery store in such good condition?Giulio Corradi, an architect in Xilinx’sISM group, said that food companies are

using ever-smarter vision systems infood inspection lines to, for example,sort the bad apples from the good ones.Corradi said first-generation embeddedvision systems deployed on high-speedfood inspection lines typically used acamera or perhaps several cameras tospot surface defects in apples or otherproduce. If the embedded vision systemspotted an unusual color, the applewould be marked/sorted for furtherinspection or thrown away.

BENEATH THE SKINBut what happens if, at some pointbefore that, the fruit was dropped but thedamage wasn’t visible? “In some cases,damage that resulted from a drop maynot be easily spotted by a camera, letalone by the human eye,” said Corradi.“The damage may actually be in the fleshof the apple. So some smarter vision sys-tems fuse an infrared sensor with thecameras to detect the damage beneaththe surface of the apple’s skin. Finding abruised fruit triggers a mechanical sorterto pull the apple off the line before it getspacked for the grocery store.” If the dam-aged apple had passed by without thesmarter fusion vision system, the dam-age would likely become apparent by thetime it was displayed on the grocerystore shelves; the fruit would probablyhave to be thrown away. One rottenapple can, of course, spoil the bunch.

Analytics can also help a food com-pany determine if the bruised apple is ingood enough condition to divert to anew line, in which another smartervision system can tell if it is suitable forsome other purpose—to make apple-sauce, dried fruit or, if it is too far gone,best suited for composting.

Factory floors are another site forsmarter vision, Corradi said. A growingnumber use robotic-assisted technolo-gies or completely automated roboticlines that manufacturers can retool fordifferent tasks. The traditional safetycages around the robots are too restric-tive (or too small) to accommodate therange of movement required to manu-facture changing product lines.

Second Quarter 2013 Xcell Journal 11

C O V E R S T O R Y

Page 12: Xcell Journal issue 83

So to protect workers while notrestricting the range of motion of auto-mated factory lines, companies areemploying smarter vision to createsafety systems. Cameras and laserserect “virtual fences or barriers” thataudibly warn workers (and safety mon-itor personnel) if someone is gettingtoo close to the factory line given theproduct being manufactured. Someinstallations include a multiphase vir-tual barrier system that will send anaudible warning as someone crossesan outer barrier, and shut down theentire line automatically if the individ-ual crosses a second barrier that iscloser to the robot, preventing injury.Bier of the Embedded Vision Alliancenotes that this type of virtual barriertechnology has wide applicability. “Itcan have a tremendous impact inreducing the number of accidents infactories, but why not also have virtualbarriers in amusement parks, or at ourhomes around swimming pools or oncars?” said Bier. “I think we’ll see a lotmore virtual barrier systems in ourdaily lives very soon.”

SMARTER VISION FOR BETTER DRIVING Automotive is another market that isfully embracing smarter vision to cre-ate a less stressful and safer drivingexperience. Paul Zoratti, a systemsarchitect within Xilinx Automotive,said that advanced driver assistancesystems (ADAS) are all about usingremote sensing technologies, includ-ing smarter vision, to assist drivers(see cover story, Xcell Journal issue66, www.xilinx.com/publications/

archives/xcell/Xcell66. pdf). Each year over the past decade,

automakers have unveiled ever-moreimpressive DA features in their luxury

lines, while also making a growingnumber of driver assistance featuresavailable in their sport and standardproduct lines. Many of these fea-tures—such as blind-spot detection,lane change assist, pedestrian and signdetection—send drivers a warning ifthey sense a potentially dangerous sit-uation. Recent offerings from automanufacturers include even moreadvanced systems such as automaticemergency braking and lane keeping,which not only monitor the vehicleenvironment for potential problemsbut assist the driver in taking correc-tive actions to avoid accidents ordecrease their severity.

Zoratti said that some new-modelcars today are outfitted with fourcameras—located on the sides, front,and rear of the vehicle—that offer acontinuous 360-degree view of thevehicle’s surroundings. While thefirst-generation surround-view sys-tems are using those cameras to pro-vide an image to the driver, futuresystems will bundle additional DAfeatures. Using the same four cam-eras and image-processing analytics,these next-generation systems willsimultaneously generate a bird’s eyeview of the vehicle and also warn ofpotential danger such as the presenceof a pedestrian. Furthermore, whilethe vehicle is traveling at higherspeeds, the automobile will use thecameras on the side and rear of thevehicle for blind-spot detection, lanechange assistance and lane departurewarning. Adding another, forward-looking camera behind the wind-shield will support traffic sign recog-nition and forward-collision warningfeatures. Finally, when the driverreaches his or her destination andactivates automated parking, the sys-

tem will employ those same camerasalong with other sensors to help thecar semi-autonomously maneuverinto a parking spot.

Handling all these tasks in real timerequires a tremendous amount of pro-cessing power that is well-suited forparallel hardware computation,Zoratti said. That’s why many early DAsystems paired standalone micro-processors with FPGAs, with theFPGA handling most of the parallelcomputations and microprocessorsperforming the serial decision making.

The cost pressures in automotive aredriving the analytics to be performed ina central compute hub, rather than ateach camera as is the case in other mar-kets, such as surveillance. In this way,carmakers can minimize the cost ofeach camera sensor and, ultimately, thecost of the entire system. That meansthe processing platform in the centralunit needs to deliver very high perform-ance and bandwidth to support thesimultaneous processing of four, five oreven six real-time video inputs.

A SMARTER VISION FOR LONGER LIVES Another area where smarter vision ismaking a dramatic difference is in themedical electronics industry, whichuses smarter vision technology in awide range of medical imaging systems,from endoscopes and imaging scanners(CT, MRI, etc.) to robotic-surgical sys-tems such as Intuitive Surgical’s DaVinci, detailed in Xcell Journal issue 77(see www.xilinx.com/publications/

archives/xcell/Xcell77.pdf). Da Vinci’s sophisticated 3D vision

system allows surgeons to guide robot-ic surgical instruments with extremeprecision, fluidity and tactile sensitivityto perform a number of delicate, intri-

C O V E R S T O R Y

12 Xcell Journal Second Quarter 2013

Virtual barriers can reduce factory accidents, but 'why not also have virtual barriers in amusement parks,

or at our homes, around swimming pools or on cars?'

Page 13: Xcell Journal issue 83

cate surgical procedures. With everygeneration of system, surgeons are ableto perform a greater number and vari-ety of surgeries, helping to ensure bet-ter patient outcomes and faster recov-ery times. The degree of technologicalsophistication to control and coordi-nate these procedures is remarkableand is heavily reliant on the combinedhorsepower of processing and logic.Each generation of newer technologywill thus benefit from greater integra-tion between processor and logic.

A SMARTER VISION FOR AN IMMERSIVE EXPERIENCE Smarter vision is also making greatstrides in keeping us connected. If youwork in a modern office building,chances are your company has at leastone conference room with anadvanced telepresence conferencingsystem that not only allows you to talkto others around the world, but alsolets you see them as though they werethere in person. These videoconferenc-ing systems are increasing in sophisti-cation to the point that they can sensewho at a table or conference is speak-ing, and automatically turn and zoomin on that person, displaying him or herin ever-higher-quality immersive video.

Ben Runyan, director of the broad-cast and consumer segment marketingat Xilinx, said that companies develop-ing telepresence technologies are seek-ing ways to create a more immersiveexperience for users. “The goal is tomake users feel like they are in thesame room when in fact they could beon the other side of the globe,” saidRunyan. “To do this requires state-of-the-art cameras and display technolo-gies, which require advanced imageprocessing. As these technologies

become more advanced and deliver amore immersive experience, it willmake collaboration easier and makecompanies more productive while cut-ting down the need, and thus expense,to travel.”

XILINX: ALL-PROGRAMMABLEFOR SMARTER VISION To enable smarter vision to progresson all fronts rapidly and to reachnew markets requires an extremelyflexible processing platform, a richset of resources and a viable ecosys-tem dedicated to smarter vision.Xilinx devices have played a keyrole in helping companies innovatethese vision systems over the last

decade. Today, after five years indevelopment, Xilinx is delivering aholistic solution that will helpdevelopers of smarter vision appli-cations to quickly deliver the nextgeneration of innovations.

For more than a decade, embeddedvision designers have leveraged theprogrammability, parallel computingand fast I/O capabilities of XilinxFPGAs in a vast number of embeddedvision systems. Traditionally, designershave used FPGAs to speed up func-

tions that were slowing down the mainprocessor in their systems, or wereusing the FPGA to run parallel comput-ing tasks that processors simply couldnot perform. Now, with the Zynq-7000All Programmable SoC, embeddedvision developers have a fully program-mable device that is ideally suited fordeveloping the next generation ofsmarter vision applications.

“Smarter vision can be implement-ed in separate processors and FPGAscommunicating on the same board,but what the Zynq SoC delivers is alevel of integration that the electronicsindustry didn’t have before,” said JoseAlvarez, engineering director of videotechnology at Xilinx. “Now, instead of

Second Quarter 2013 Xcell Journal 13

C O V E R S T O R Y

Microprocessor + DSP + FPGA Preprocessor

Micro Frame Processingand Communications

Processing System (PS)Features A, B, C, D, Application SW

DSP SerialElement Processing

FPGAFPGA Parallel

Pixel Processing

Features A Features B Features C Features D

Camera

Camera

Camera

Camera

Camera

Camera

Camera

Camera

Object Classification Motion Estimation

Image Captureand Transform

Image Warpand Stitch

Object Detection

Zynq SoC

Programmable Logic (PL) HW Acceleration

Solution Benefits

System Integration3 chips to 1 chip

System Performance+ 2x

BOM Cost-25%

Total Power-50%

Xilinx Zynq-7000 All Programmable SoC Solution

Figure 1 – Zynq All Programmable SoC vs. multichip, multiple-camera systems in driver assistance applications

Page 14: Xcell Journal issue 83

interchanging information betweenthe main intelligent processor andFPGA logic at board speeds, we can doit at silicon speeds through 3,000 high-performance connections between theprocessor and logic on the same chip.”

Figure 1 reveals the benefits of theZynq SoC over a traditional multicam-era, multichip architecture in the cre-ation of a multifeature automotivedriver assistance system. Using oneset of cameras connected to one ZynqSoC, the Xilinx architecture (bottomleft in the graphic) can enable featurebundles such as blind-spot detection,360-degree surround view, lane depar-ture warning and pedestrian detection.In comparison, existing multifeatureDA systems require multiple chips andmultiple cameras, which complicatesintegration, adversely affects perform-ance and system power consumption,and leads to higher BOM costs.

A few silicon vendors offer ASSPsthat pair ARM processors with DSPsor with GPUs, but those devices tendto be too rigid or provide insufficientcompute performance for many oftoday’s smarter vision applications.Often, solutions based on thesedevices require the addition of stand-alone FPGAs to fix these deficiencies.

PROGRAMMABILITY AND PERFORMANCEThe Zynq SoC’s programmability andperformance provide key advantagesover GPU- and DSP-centric SoCs. TheARM processing system is softwareprogrammable; the FPGA logic is pro-grammable via HDLs or C++; and eventhe I/O is fully programmable. As aresult, customers can create extremelyhigh-performance smarter vision sys-tems suited to their specific applica-tions, and differentiate their systemsfrom those offered by competitors.

Figure 2 details a generic signal flowof a smarter vision system and showshow the Zynq SoC stacks up againstARM-plus-DSP and ARM-plus-GPU-based ASSPs.

The first signal-processing block inthe flow (shown in green) is the inputthat connects the device to a camerasensor. In the Zynq SoC, developerscan accommodate a wide range of I/Osignals to conform to whatever cam-era connectivity their customersrequire. The next signal-processingblock performs pixel-level processingor video processing (depending onwhether the application is for imageprocessing or display). The next blockperforms analytics on the image, a

compute-intensive process that oftenrequires parallel computing bestimplemented in FPGA logic. The sub-sequent three blocks (in red) arewhere the processing system derivesmetadata results from the analytics,creates a graphic representation of theresult (the graphics step) and encodesresults for transmission.

In the Zynq SoC, the processing sub-system and FPGA logic work together.When compression is required, theappropriate codec can be readilyimplemented in FPGA logic. Then, inthe final signal-processing block(labeled “Output”), the Zynq SoC’s pro-grammable I/O allows developers totarget a vast number of communica-tion protocols and video transportstandards, whether they be propri-etary, market specific or industry-stan-dard IP protocols. In comparison, inboth DSP- and GPU-centric SoCs,developers run the risk of developingalgorithms that require performancenot achievable with the DSP or GPUsections of these ASSPs. They willoften have to make up for this deficien-cy by adding a standalone FPGA totheir systems.

While the Zynq SoC is clearly thebest silicon choice for smarter vision

C O V E R S T O R Y

14 Xcell Journal Second Quarter 2013

Input

4K2K

MIPI

Output

SDIHDMI

DisplayPort

ImageProcessing IP

VideoProcessing IP

Codec

H.265/HEVC

H.264

VideoAnalytics

Metadata

VideoAnalyticsAnalysis

Graphics

ComputeRequirements I/O I/O240 Gops 50-100 Gops 1-10 Gops

Dedicated50-200 Gops

Zynq SoC

DSP + ARM

GPU + ARM

ProgrammableI/O

ProgrammableFPGA

ProgrammableFPGA

FixedI/O

FixedIP

FixedI/O

LimitedFixed I/O

NotPossible

LimitedFixed I/O ARM

ARM +FPGA

DSPARM

GPUIP

GPUIP

ProgrammableFPGA

ProgrammableI/O

FixedI/O

FixedI/O

Figure 2 – Generic video- and image-processing system flow

Page 15: Xcell Journal issue 83

systems, Xilinx realized early in thedevice’s development that program-ming needed to be streamlined,especially for designers who aremore accustomed to C and C++based development of vision algo-rithms. To this end, in June 2012Xilinx delivered to customers astate-of-the-art software environ-ment called the Vivado Design Suitethat includes, among other tech-nologies, best-in-class high-levelsynthesis technology that the com-pany gained in its January 2011acquisition of AutoESL. Vivado HLSis particularly well-suited to embed-ded vision applications. If, forexample, vision developers usingthe Zynq SoC have created an algo-rithm in C or C++ that doesn’t runfast enough or is overburdening theprocessing system, they can sendtheir C algorithms to Vivado HLSand synthesize the algorithms intoVerilog or VHDL to run in the FPGAlogic on the device. This frees upthe processing subsystem on theZynq SoC to handle tasks it is bettersuited to run, and thus speeds upthe overall system performance.

OPENCV LIBRARYXilinx has also rounded out itsSmarter Vision technology offeringby releasing its OpenCV (computervision) library. OpenCV is an indus-try-standard, open-source library ofalgorithms from OpenCV.org thatembedded vision developers use toquickly create vision systems.Embedded vision developers acrossthe world actively contribute newalgorithms to the library, which nowcontains more than 2,500 algorithmswritten in C, C++, Java and Python(see OpenCV story, page 24).Algorithms in the library range incomplexity from simple functionssuch as image filters to moreadvanced functions for analyticssuch as motion detection.

Alvarez said that these OpenCValgorithms target implementation in

just about any commercial micro-processor and DSP. Because the ZynqSoC uses an ARM processing system,users can implement these algorithms,written in C++, in its processor portion.

Thanks to Vivado HLS, said Alvarez,users can also take these algorithmswritten in C or C++, modify functioncalls from OpenCV to HLS and then,using Vivado HLS, synthesize or com-pile the algorithms into RTL code opti-mized for implementation in the logicportion of the Zynq-7000 SoC. HavingOpenCV in the Vivado environmentallows smarter vision architects toeasily compare and contrast whether agiven algorithm in their design will runmost optimally in the processor orFPGA logic portion of the Zynq-7000All Programmable SoC. With therelease of Xilinx’s Open Source library,Xilinx has essentially given customersa head start. Using Vivado HLS, Xilinxhas already compiled more than 30 ofthe most used embedded vision algo-rithms from the OpenCV library.Customers can quickly make proces-sor vs. logic trade-offs at the systemslevel and run them immediately in theZynq-7000 All Programmable SoC toderive the optimal system for theirgiven application.

Xilinx and its Alliance members willactively migrate more functions fromthe OpenCV library on an ongoingbasis, making them available to Xilinx’suser base quarterly. Because develop-ers can run OpenCV libraries on justabout any commercial processor,vision designers will be able to com-pare and even benchmark the perform-ance of algorithms running on varioussilicon devices.

As part of its Smarter Vision initia-tive, Xilinx has also created an intellec-tual property (IP) suite calledSmartCORE IP, addressing smartervision requirements from across themany market segments that will designsmarter vision into their next-genera-tion products. Customers can imple-ment cores from the SmartCORE IPsuite and algorithms from the OpenCV

library into their designs quickly usingXilinx’s newly introduced IP Integratortool. The new tool is a modern plug-and-play IP environment that allowsusers to work in schematics or, if theyprefer, a command-line environment.

TARGETED PLATFORM AWAREAlvarez said that since the VivadoDesign Suite’s inception, Xilinx archi-tected the suite to be device aware, soas to take full advantage of eachdevice’s capabilities. Alvarez said thatthanks to IP Integrator, the VivadoDesign suite is not only device awarebut now targeted platform aware aswell—supporting all Zynq SoC and 7series FPGA boards and kits. Being tar-get platform aware means that theVivado Design Suite will configure andapply board-specific design rulechecks, which ensures rapid bring-upof working systems.

For example, when a designerselects the Xilinx Zynq-7000 SoCVideo and Imaging Kit, and instanti-ates a Zynq SoC processing systemwithin IP Integrator, Vivado DesignSuite preconfigures the processingsystem with the correct peripherals,drivers and memory map to supportthe board. Embedded design teamscan now more rapidly identify, reuseand integrate both software and hard-ware IP, targeting the dual-core ARMprocessing system and high-perform-ance FPGA logic.

Users specify the interface betweenthe processing system and their logicwith a series of dialog boxes. IPIntegrator then automatically gener-ates RTL and optimizes it for perform-ance or area. Then users can add theirown custom logic or use the Vivado IPcatalog to complete their designs.

It’s remarkable to see what smartervision systems Xilinx customers havecreated to date with Xilinx FPGAs.The advent of the Zynq-7000 AllProgrammable SoC and the powerfulSmarter Vision environment guaran-tees that the next crop of productswill be even more amazing.

Second Quarter 2013 Xcell Journal 15

C O V E R S T O R Y

Page 16: Xcell Journal issue 83

16 Xcell Journal Second Quarter 2013

XCELLENCE IN WIRELESS COMMUNICATIONS

Xilinx All Programmable DevicesEnable Smarter Wireless Networks

Page 17: Xcell Journal issue 83

Second Quarter 2013 Xcell Journal 17

Wireless carriers are urgently looking forways to capitalize on the explosivegrowth in mobile Internet usage. Butinstead of simply deploying more and

faster equipment, the carriers are seeking smarter waysto use their networks. They are actively developingnovel architectures such as self-organizing networks todeliver superior quality of service to customers and tomaximize profitability.

Xilinx® is helping wireless-network companies pio-neer these new architectures and deliver them to marketwith 28-nanometer All Programmable devices andSmarter Wireless Networking solutions and supportinginfrastructure, including SmartCORE™ IP, the Vivado™Design Suite and services expertise.

EXPLOSIVE GROWTH OF MOBILE IPThe last four years have seen explosive growth in mobileInternet Protocol usage. The International Telecom-munications Union’s ICT statistics show that in the peri-od from 2000 to 2010, the number of mobile cellular sub-scriptions increased by eightfold, a rate that far exceedsthe growth in the number of Internet users over the sameperiod. The dominance of mobility stands in stark con-trast to trends for fixed telephone line subscriptions,which continue to decline. As of 2010 there were fourtimes more mobile subscriptions than fixed-line. Theintroduction of smartphone and tablet technology hasfundamentally changed the way we do business, spendour leisure time and interact with our family and friends.

There is a symbiotic connection between the capa-bilities and the business models enabled by smart-phones and tablets, and the cellular network that is theanchor point to the Internet: Each feeds off the other.As smartphones and tablets become more capable,people figure out new ways to exploit their capabili-ties, stressing network capacity. Marketplace dynamicsengage, and network operators respond with networkupgrades. In turn, mobile equipment companies intro-duce new devices that consume the new capacity, andso the cycle continues. Today, analysts paint a pictureof a network in 2020 needing to support a capacity thatis 1,000 times greater than today’s.

In the four years since the smartphone moved into themainstream, we have seen screen pixel density doublefrom 163 pixels per square inch to 326 PPI today. At thesame time, storage has doubled from 32 to 64 gigabytes,and rear-facing camera capability has progressed from 3-megapixel photos and VGA video to 8-megapixel camerassupporting 1024p HD video at 30 frames/second. Mobiledevice manufacturers now offer front-facing cameras withapplications like FaceTime, which enable mobile business

X C E L L E N C E I N W I R E L E S S C O M M U N I C AT I O N S

by Chris DickChief DSP ArchitectXilinx, Inc. [email protected]

With the meteoric growth ofmobile IP, carriers need flexiblesilicon and design tools tobuild the high-performance heterogeneous network architectures of the future.

Page 18: Xcell Journal issue 83

videoconferencing and permit themobile user, say, on overseas businesstravel to stay connected with home.

From 2009 to now, there has beenan astonishing six generations ofsmartphone. In the three short yearssince the introduction of the first com-mercially successful tablet, we haveseen five generations of this technolo-gy, taking us from 720p video capabili-ty only two years ago to 1024p HDvideo on a typical rear-facing cameratoday. With in excess of 1 billionsmartphones on the planet, and withmore than half of the population ofNorth America owning a smartphone,feeding all of these pixels with media-rich content and delivering image andvideo data from the mobile back to thecloud, the upshot has been a networkcapacity and latency challenge. Andwhile it took 16 years to reach 1 billionsmartphones worldwide, it is estimat-ed that it will take only three years forthe next billion smartphone users tocome onboard. The capacity challengewill rise exponentially.

As people have invented new serv-ices and applications that increasinglycapable smartphones and tablets cansupport, the network has had toevolve and supply increased capacityand coverage, new levels of quality-of-experience and reduced latency toenhance the mobile experience foranything from videoconferencing andvideo streaming to media contentdownloading to the local device.Before boarding a long-haul flight, forexample, you’ll want to download thelatest edition of Xcell Journal or IEEE

Communications Magazine, a news-paper or two, other magazines, techni-cal reports from the office or multiplevideos to view later, at a more conven-ient time. The growth in online and

interactive gaming and financial trans-actions has also made demands oncapacity and latency.

Advanced socio-technical systems,such as Facebook for example, togeth-er with increasingly capable mobiledevices have become a central part ofdaily life for billions of people world-wide. Social media users are activelyusing mobile applications more thanever before. A recent study by mediaagency Ruder Finn pointed out that 91percent of the mobile subscribersengage in social computing applica-tions, compared with 79 percent ofdesktop users. People in the UnitedStates, on the average, spend 2.7 hoursper day on mobile devices, of which 45percent post comments, 43 percentconnect with friends, 40 percent sharecontent with others and 38 percentshare photos on social networkingwebsites, making it an increasinglyfavorable platform for socializing.

One consequence of the changingnature of how people exploit mobilityis that data traffic is now significantlygreater than voice traffic on the wire-less network. Some industry reportspredict that the global mobile datagrowth will scale by approximately afactor of 40 in the coming years, goingfrom 90 petabytes per month in 2009to in excess of 3.5 exabytes per monthin 2014.

STRUGGLING TO SCALE WITHBANDWIDTH DEMANDSIn response to these demands, and tothe invention of new mobile applica-tions and associated business models,wireless networks have made dramat-ic capacity improvements. Over thepast three decades, since the concep-tion of the 2G GSM system, the indus-try has achieved bit-rate improve-

ments in excess of three orders ofmagnitude, corresponding to an order-of-magnitude throughput improve-ment for each of the past threedecades. Clearly, the increasing smart-phone and tablet computer populationhas resulted in substantial teletrafficgrowth, and it is anticipated that thisgrowth will continue until 2020.

Since the first radio communica-tions at the end of the 19th century,mankind has been on a path of devel-oping technologies enabling people tocommunicate with anyone, anywhereand at any time using a range of media-rich services. However, despite signifi-cant advances in mobile user equip-ment and the wireless and wired net-works themselves, the provisioning ofa true telepresence capability is notyet within our grasp, as anyone whohas experienced dropped connections,capacity and coverage problems, slowcontent download times, terminal bat-tery-life issues and network latencydisrupting the quality of the mediaexperience would know. In addition tohuman-to-human and IP network typecommunications, we are also seeing anew class of service evolve, machine-to-machine (or machine-type commu-nication) that will place furtherdemands on network capabilities.

THE NETWORK EVOLVESStandards organizations, such as3GPP, along with industry conglomer-ates such as the Next GenerationNetwork Mobility Alliance andresearch laboratories in industry andacademia have been racing to defineand drive mobile broadband 4G tech-nologies such as LTE (long-term evolu-tion) and the Evolved Packet Core toaddress all of these varied challenges.In the past four years, we have gone

18 Xcell Journal Second Quarter 2013

X C E L L E N C E I N W I R E L E S S C O M M U N I C AT I O N S

It took 16 years to reach 1 billion smartphones worldwide; it is estimatedthat it will take only three years for the next billion smartphone users to

come onboard. The capacity challenge will rise exponentially.

Page 19: Xcell Journal issue 83

improvements promised by spatialmultiplexing in practice, and the reali-ty is that implementation losses—forexample through imprecise channelestimates—limit the gains promisedby theory.

We are already seeing, and will con-tinue to see at an accelerated pace, thedeployment of heterogeneous net-works comprising a macrocell aug-mented with relays and with a small-cell underlay to address issues of bothcapacity and coverage in dense urban-canyon environments, to extend in-building coverage and for use in envi-ronments where regulatory considera-tions prevent the installation of amacrocell. Heterogeneous networks(aka HetNets) and cell densificationbring yet further challenges to the net-work designer in the form of backhaul,intercell interference and managinghandover between cells or even to car-rier-grade Wi-Fi.

Sixty years of research followingShannon’s pioneering paper have ledto advanced coding schemes—turbo

from the standardization of the 3GPPRelease 8 specification in late 2008—the first 3GPP standard to define next-generation capability making use ofmulticarrier OFDMA and SC-FDMAtechnology—through the first majorevolution of LTE with the LTE-Advanced Release 10 specification,ratified in 2011. While LTE and LTE-Asystems are still being deployed, theRelease 12 working groups are plan-ning the requirements and technolo-gies for LTE-B.

One of the key challenges the wire-less industry faces is extending capac-ity. Claude E. Shannon’s famous chan-nel-capacity formula—outlined in aseminal 1948 paper (“A MathematicalTheory of Communication”) pub-lished when the comms pioneerworked at Bell Labs—informs us thatbandwidth efficiency can onlyimprove at the cost of reduced powerefficiency. The law shows that capaci-ty is a linear function of bandwidthbut a logarithmic function of signal-to-noise ratio. This has led the researchcommunity to invent increasinglyeffective ways to use bandwidth with-in a complex set of constraints involv-ing regulatory requirements, consider-ations for spectrum licensing, interop-erability, multiple access, reliability,quality of service and spectral flexibil-ity to the level of each individual pieceof user equipment.

These constraints are some of themotivating principles that led to themulticarrier modulation scheme andevolved packet core employed in LTE,and they continue to be the motivatorsfor features such as carrier aggrega-tion in LTE-Advanced. But relying onchannel bandwidth alone to addressthe capacity challenge is not enough.

Further capacity improvementshave been made possible through theuse of spatial-division multiplexing(SDM) MIMO, where (and in contrastto the logarithmic capacity formula) alinear increase in capacity is theoreti-cally possible with respect to the num-ber of antennas when the number of

transmit and receive antennas is thesame. While SDM can be employed toincrease the throughput for a singleuser, spatial-division multiple access(SDMA) configurations can maximizethe number of supported users in acell by sharing the total systemthroughput among the users. MIMOtransceivers, together with distributedor virtual MIMO, have permittedpower-efficient or “green” solutions tothe capacity problem. As the industrymoves to LTE-B, it is likely that 3DMIMO, or full-dimension MIMO, willbe embraced by both a future versionof the 3GPP standard and by OEMsbuilding macro basestations (Figure1). Operators will deploy heteroge-neous networks along with beam-forming small cells that offer verticaland horizontal electrical-beam tilt toaddress coverage holes and minimizeintercell interference.

However, augmenting carrier aggre-gation with advanced multi-antennatechniques is still not enough. It’s diffi-cult to realize the theoretical capacity

X C E L L E N C E I N W I R E L E S S C O M M U N I C AT I O N S

Second Quarter 2013 Xcell Journal 19

• 2D active antenna array (AAA) and up to ~100 at eNB

• MU-MIMO with 10s of EUs

• High-order MU-MIMO transmissionto more than 10 EUs

LTEInfrastructure

• IP

FD-MIMO eNB• CPRI

Solution for Capacity Demand

Figure 1 – Industrial and academic research communities, and the 3GPP PHY workinggroup, are analyzing the potential use of massive, or 3D, MIMO for future-generation cel-lular access. Massive MIMO scales the antenna system by an order of magnitude comparedwith the systems of today, deploying hundreds of antennas.

Page 20: Xcell Journal issue 83

codes, for example—along with multi-antenna techniques, modulation sys-tems, communication protocols anddigital-IF processing that networkcompanies can combine to realize sys-tems that operate arbitrarily close tochannel capacity. But there is still along way to go to address the fero-cious appetite for even more capacity,robust communications and reducedend-to-end latency. In addition to moreflexible and sophisticated or smarter(if you will) use of spectrum by meansof OFDMA and smart-antenna configu-rations, concepts such as enhancedintercell interference cancellation arecoming online to assist with handoverand maintaining the resilience of aconnected device.

The rich set of concepts capturedunder the umbrella of self-organizingnetworks (SONs) is one of the “must-have” technologies on the LTE roadmap. There are multiple facets toSON, including automatic neighborrelations, load balancing, trafficgrowth and capacity management.The use of self-optimizing, self-heal-ing technology promises energy sav-ings, improved coverage and LTEparameter optimization, an examplebeing the random-access channel, orRACH, optimization.

Another key aspect of SON is mak-ing the network smarter, with the endgoal of minimizing OPEX-related drivetests. To improve their networks, oper-ators often deploy engineers in thefield to collect radio measurements, todiscover problems such as coverageholes in the network and to determinewhether there’s a need to tune basesta-tion or network parameters. However,such conventional drive tests areexpensive, and the measurements theycollect can only give a limited view, atone point in time, of the entire net-work. Member companies of the NextGeneration Mobile Networks and3GPP organization are actively dis-cussing ways to address these chal-lenges. The Release 10 specificationintroduced the concept of minimiza-

tion of drive tests and is an importantcontribution to the area of SON thatwill assist with node insertion in thenetwork and enabling automaticparameter tuning over the life of thenode and network.

Yet another consideration for net-work design is the operating cost, withone of the significant contributors toOPEX, and actually CAPEX, being theRF platform, which includes thepower amplifier, or amplifiers formulti-antenna systems. The green con-siderations for the RF electronics aremultifaceted. Deployment of a nonlin-ear power amplifier is desirable fromthe vantage point of cost, but the non-constant-envelope OFDMA modulationscheme used in the LTE RAN downlinkhas the undesirable consequence of

spectral regrowth when processed by alow-cost nonlinear amplifier. Thesehighly undesirable out-of-band spectralemissions, which are delivered to neigh-boring spectrum, compromise nearbycommunication systems and breachregulatory requirements.

One solution is to use a large back-off and exercise only the linear por-tion of the amplifier transfer function.However, since we are also interestedin the efficiency of the power amplifi-er or the effectiveness of the amplifi-er to convert DC power to the RF car-rier, it’s best to operate the amplifierwith only a small output backoff,since the amplifier is most efficient atthis bias point. Biasing the amplifierto minimize the backoff means mov-ing closer to the compressive part of

20 Xcell Journal Second Quarter 2013

X C E L L E N C E I N W I R E L E S S C O M M U N I C AT I O N S

Figure 2 – The first-ever published waterfall plot for convolutional turbocodes, published by Berrou et al. in their famous 1993 paper. Turbo codesrevolutionized the coding system employed in 3G and 4G wireless systems.

Page 21: Xcell Journal issue 83

tectures that integrate the basebandinto the radio head, or remote radiohead architectures supporting the“baseband hotel” model exploiting thecloud. The go-to-market and technolo-gy considerations will even be differ-ent for a single OEM based on thegeography of the deployed network.Moreover, now more than ever time-to-market governs whether a companyis successful, wildly successful orobsolete. It comes down to selectingtechnology that is flexible, scalableand possesses the rich mix of embed-ded software, compute capability andconnectivity to match the complex setof attributes that characterizes wire-less networks. Beyond the silicon plat-form, design tools and intellectual-property (IP) libraries also play a cru-cial role in enhancing productivity.

Xilinx’s 28-nm 7 series technology,and in particular the All ProgrammableZynq SoC, bring to bear the connectiv-ity, signal-processing and embeddedcomputing capability that is required atthe heart of next-generation networks.The FPGA compute fabric, and itsunparalleled signal-processing capabili-ty, can be applied to OFDMA and SC-FDMA LTE baseband processing forrealizing the high-speed math require-ments of advanced receivers that mightemploy iterative techniques, exchang-ing statistical information between thechannel decoder, equalizer and MIMOdetector in a multi-antenna system. TheZynq SoC likewise supports future-gen-eration systems that might exploit 3DMIMO as one option for increasingcapacity and exceeding the minimumrequirements defined by the ITU as partof the IMT-Advanced requirements.

Another application of FPGA signalprocessing is to address the cost,CAPEX and OPEX, and what we might

the transfer function, which causesissues with multicarrier signals. TheLTE OFDMA downlink waveform isformed as the weighted sum oforthogonal sampled sine waves. Thesignal formed by this sum exhibits alarge peak-to-average power ratio.The amplitude of the real and imagi-nary time series tends to be Gaussianwith a Rayleigh distributed envelope.

The Rayleigh envelope exhibits alarge peak-to-average ratio, with peakvalues exceeding four times the aver-age value, with a probability of 0.00035.To preserve the fidelity of the OFDMAtime series and to avoid spectral arti-facts due to power amplifier clipping,we may choose to operate the amplifierwith the average signal level at one-fourth of full scale; however, the effi-ciency sacrificed in passing signals witha 4-to-1 peak-to-average ratio through apower amplifier is excessive. The peakpower level would be 16 times the aver-age power level. This means that anamplifier required to deliver 5 watts ofaverage power must be capable ofdelivering 80 W of peak power.

Power amplifiers are very inefficientin their transduction process of turningDC power to signal power when theyoperate at small fractions of their peakpower level. For a Class B amplifier, todo a first approximation and for a widerange of output power levels, thepower pulled from the power supply isconstant and approximately 35 percentof the peak power level. Thus, our 80-Wamplifier will pull 28 W from the DCpower supply and deliver 23 of thesewatts to its heat sink while delivering5 W to the external load.

The answer to these dilemmas isdigital signal processing. A midrangeAll Programmable device such asXilinx’s 28-nm Kintex™-7 410T, with

1,540 multiply-accumulators, operat-ing at 491.52 MHz, can deliver a peakcompute performance of 757 billionMACs per second. That is a 300,000xincrease over the first generation ofHarvard architecture, integrated DSPprocessors released in the early 1980s.Companies can use this DSP computepower to implement advanced crest-factor reduction to control the peak-to-average ratio of the transmissionwaveform, and to apply digital predis-tortion processing to linearize a low-cost, highly nonlinear Doherty poweramplifier. In this way, designers con-trol the overall cost of their equipmentby defraying some of the high costassociated with building high-fidelityRF processing and amplification to therelatively low-cost world of digital sig-nal processing, which unlike the for-mer, has been able to benefit fromMoore’s Law process node scaling.

XILINX AND NEXT-GENERATIONSMARTER NETWORKSThe approach to solving for capacity,latency, coverage, quality of service,OPEX and CAPEX is going to be differ-ent for each mobile-network operatorand cellular-infrastructure manufac-turer. Each unique solution will be afunction of existing assets, expertiseand different go-to-market strategies.A single hardware platform with a sin-gle SoC is not going to allow OEMs todifferentiate, or to have a unified, orcommon platform approach that per-mits technology scaling from small-cell through to macrocell through tocloud RAN network architectures.

A conventional merchant siliconsystem-on-a-chip doesn’t allow for thevarious architectures that an OEMmight need to support, ranging fromthe conventional basestation to archi-

X C E L L E N C E I N W I R E L E S S C O M M U N I C AT I O N S

Second Quarter 2013 Xcell Journal 21

The approach to solving for capacity, latency, coverage, quality of service, OPEX and CAPEX is going to be different for each

mobile-network operator and cellular-infrastructure manufacturer.

Page 22: Xcell Journal issue 83

call green considerations for the net-work. With LTE-A and carrier aggrega-tion, channel bandwidths have movedto 100 MHz. Crest-factor reduction(CFR) and power amplifier lineariza-tion techniques need to evolve to han-dle these wide channels, and thismeans more signal processing. TheCommunications Business Unit atXilinx has been busy for some yearsnow working on these problems andhas developed CFR and digital-predis-tortion IP to service a wide range of airinterface protocols.

As we look to the future, it is easyto see that radio processing is goingto become more complicated as anever-widening range of carrier config-urations must be supported. In lessthan four years, the LTE specificationhas evolved from a basic set of sixchannel bandwidths (1.4, 3, 5, 10, 15and 20 MHz) through to Release 10,where component carriers can beaggregated in intra- as well as inter-band configurations. It is within thescope of Release 12 work that is onlybeginning now to investigate solu-tions to achieve even further spec-trum flexibility.

Of course, not all basestations,whether macro or small cell, need tosupport all possible scenarios. Thatfact leaves OEMs with the dilemma ofneeding to rapidly build a suite of cost-effective solutions that can capture thefrequency-planning requirements of anetwork operator. These requirementsare a complex equation based on theparameters of spectrum licensing,spectrum fragmentation and the needto support multiple radio-access tech-nologies from a single RF chain—themulti-RAT challenge. Further compli-

cating the picture is the introductionof heterogeneous networks comprisinga mixture of macro basestations, small-cell underlays and Wi-Fi offload. It’s aquestion of scale and flexibility, twoproperties at the very heart of XilinxAll Programmable SoCs and FPGAs.

But there is more to the problemthan signal processing in the PHY.LTE, and the future LTE-A and LTE-Bnetworks, are getting smarter in somany dimensions, especially as soft-ware increasingly becomes a key con-sideration for anything from runningself-organizing network applicationsand minimizing drive tests, performing

LTE parameter optimization, imple-menting real-time performance man-agement, collecting statistics for net-work and equipment planning purpos-es, through to running protocol stacksand O&M software.

With the introduction of the ZynqSoC in 2012, Xilinx put an industry-firstcomputing paradigm into the hands ofsystem designers by providing a combi-nation of dual-Cortex™ A9 processingtightly coupled with a high-performanceprogrammable logic fabric that can beused for signal processing, MAC-layer

acceleration and connectivity such asEthernet, CPRI and OBSAI. Thesedevices, and other products in the port-folio, go directly toward meeting thediverse mix of compute requirements,ISA and datapath, scalability and flexi-bility needed for an OEM to build differ-entiated products and get them to mar-ket quickly and cost-effectively.

Backhaul in this emerging age willalso be a challenge. As the capabilityof the radio-access network evolves,the backhaul requirements of the net-work increase dramatically. Differentcell structures and deployment sce-narios will use different backhaul

22 Xcell Journal Second Quarter 2013

X C E L L E N C E I N W I R E L E S S C O M M U N I C AT I O N S

DAC

DAC

ADC

ADC

ADC

OpticalModules

Memory

MemoryControl SPI/I2C UARTS GPIO

6/10GSerdes

6/10GSerdes

CPRI/OBSAIMaster

CPRI/OBSAIMaster

ARM A9:O&MRTOS

ARM A9:DPD

Update

Digital Up-conversion

(DUC)

Digital Down-

conversion(DDC)

Crest-Factor

Reduction(CFR)

DigitalPre-

distortion(DPD)

JESD204or LVDS

JESD204or LVDS

JESD204or LVDS

Figure 3 – The Zynq SoC enables the construction of a single-chip digital-IF processing chain incorporating the functions of up- and downconversion, crest-factor reduction, digital predistortion and connectivity.

Page 23: Xcell Journal issue 83

brought a new generation of designtools to the market. One of the excitingadvances in this area was with the 2012public release of high-level synthesistechnology: Vivado HLS. With this flowcustomers can write in C, C++ orSystemC, and compile to Xilinx silicon.This ability to program FPGAs from ahigh-level language enhances productiv-ity and puts the All Programmabledevices within reach of designers whodon’t speak hardware description lan-guages as a first language. A new Vivadophysical-implementation tool bringsadvances in quality of results, run-timeand ease of use to customers.

For more information on how Xilinxis helping customers build smarter net-works, visit http://www.xilinx.com/

applications/smarter-networks/

index.htm.

technologies, ranging from Ethernetand optical fiber for some small-cellinstallations, through point-to-pointmicrowave and e-band millimeter-wave links in other configurations.One way Xilinx is assisting OEMs inthis area is with backhaul modem IP.

The recent acquisition of a compa-ny with expertise in this space meansthat Xilinx can now supply backhaultechnology that not only competeswith incumbent ASSP silicon, but goesa step further by providing the flexibil-ity to scale up, or down, link capacitybased on the exact needs of the prob-lem at hand. For example, for somehigh-end scenarios, dual-mode trans-mission on both the vertical and hori-zontal polarizations of the electromag-netic wave can be used to increaselink capacity or reliability. In other sit-

uations, a simpler, lower-data-rate andalso lower-cost, single polarizationscheme might be sufficient. The soft-modem IP can be integrated in a chipto realize any of the common configu-rations, 1+1, 2+0 and so on, in a flexi-ble manner, enabling a customer witha progression of solutions that scaleover performance and cost.

The Zynq SoC can bring all of theseaspects of network design to reality ina single chip: software running on theARM Cortex-A9 processors; RANphysical-layer functions and the back-haul modem implemented within thecompute fabric and massively parallelarray of multiply-accumulate engines;and connectivity protocols built usingthe high-speed serdes (Figure 3).

Along with advanced silicon and IPsolutions, Xilinx has also recently

X C E L L E N C E I N W I R E L E S S C O M M U N I C AT I O N S

Second Quarter 2013 Xcell Journal 23

Debugging Xilinx's ZynqTM -7000 family

with ARM CoreSight

RTOS support, including Linux

kernel and process debugging

SMP/AMP multicore CortexTM-

A9 MPCoreTMs debugging

Up to 4 GByte realtime trace

including PTM/ITM

Profiling, performance and

statistical analysis of Zynq's

multicore CortexTM -A9 MPCoreTM

Page 24: Xcell Journal issue 83

24 Xcell Journal Second Quarter 2013

XCELLENCE IN EMBEDDED VIS ION

Using OpenCV and Vivado HLS to AccelerateEmbedded Vision Applicationsin the Zynq SoC

Page 25: Xcell Journal issue 83

Second Quarter 2013 Xcell Journal 25

Computer vision has been awell-established discipline inacademic circles for several

years; many vision algorithms todayhail from decades of research. Butonly recently have we seen the prolif-eration of computer vision in manyaspects of our lives. We now have self-driving cars, game consoles that reactto our every move, autonomous vacu-um cleaners and mobile phones thatrespond to our gestures, among othervision products.

The challenge today is how toimplement these and future vision sys-tems efficiently while meeting strictpower and time-to-market constraints.The Zynq™ All Programmable SoCcan be the foundation for such prod-ucts, in tandem with the widely usedcomputer vision library OpenCV andthe high-level synthesis (HLS) toolsthat accelerate critical functions inhardware. Together, this combinationmakes a powerful platform fordesigning and implementing SmarterVision systems.

Embedded systems are ubiquitousin the market today. However, limita-tions in computing capabilities, espe-cially when dealing with large picturesizes and high frame rates, haverestricted their use in practical imple-mentations for computer/machinevision. Advances in image sensortechnologies have been essential inopening the eyes of embeddeddevices to the world so they can inter-act with their environment using com-puter vision algorithms. The combina-tion of embedded systems and com-puter/machine vision constitutes

embedded vision, a discipline that isfast becoming the basis for designingmachines that see and understandtheir environments.

DEVELOPMENT OF EMBEDDEDVISION SYSTEMSEmbedded vision involves runningintelligent computer vision algorithmsin a computing platform. For manyusers, a standard desktop-computingprocessing platform provides a conve-niently accessible target. However, ageneral computing platform may notmeet the requirements for producinghighly embedded products that arecompact, efficient and low in powerwhen processing large image-datasets, such as multiple streams of real-time HD video at 60 frames/second.

Figure 1 illustrates the flow thatdesigners commonly employ to createembedded vision applications. Thealgorithm design is the most impor-tant step in this process, since thealgorithm will determine whether wemeet the processing and quality crite-ria for any particular computer visiontask. At first, designers explore algo-rithm choices in a numerical-comput-ing environment like MATLAB® inorder to work out high-level process-ing options. Once they have deter-mined the proper algorithm, they typi-cally model it in a high-level language,generally C/C++, for fast executionand adherence to the final bit-accu-rate implementation.

System partitioning is an importantstep in the development process. Here,through algorithm performance analy-sis, designers can determine what por-

Pairing Vivado HLS with the OpenCVlibraries enables rapid prototyping anddevelopment of Smarter Vision systemstargeting the Zynq All Programmable SoC.

X C E L L E N C E I N E M B E D D E D V I S I O N

by Fernando Martinez Vallina HLS Design Methodology Engineer Xilinx, [email protected]

José Roberto AlvarezEngineering Director for Video TechnologyXilinx, [email protected]

Page 26: Xcell Journal issue 83

tions of the algorithm they will needto accelerate in hardware given thereal-time requirements for processingrepresentative input data sets. It isalso important to prototype the entiresystem in the target platform, so as torealistically measure performanceexpectations. Once the prototypingprocess indicates that a design hasmet all performance and quality tar-gets, designers can then start imple-menting the final system in the actualtargeted device. Finally, the last stepis to test the design running on thechip in all use-case scenarios. Wheneverything checks out, the team canrelease the final product.

ZYNQ SOC: SMARTEST CHOICEFOR EMBEDDED VISIONIn the development of machinevision applications, it is very impor-tant for design teams to choose ahighly flexible device. They need acomputing platform that includespowerful general-purpose process-

ing capabilities supporting a widesoftware ecosystem, along withrobust digital signal-processing capa-bilities for implementing computa-tionally demanding and memory-effi-cient computer vision algorithms.Tight integration at the silicon level isessential for implementing efficientand complete systems.

Xilinx® All Programmable SoCs areprocessor-centric devices that offersoftware, hardware and I/O program-mability in a single chip. The ZynqSoC features an ARM® dual-coreCortex™-A9 MPCore™ processingsystem coupled with FPGA logic andkey peripherals on a single device. Assuch, the device enables designers toimplement extremely efficient embed-ded vision systems.

This level of integration betweenthe processing subsystem, FPGA logicand peripherals in the Zynq SoCensures faster data transfer speeds anda much lower power requirement andBOM cost than a system designed withindividual components. It is feasible toimplement systems in the Zynq SoCthat require real-time processing for1080p60 video sequences (1,920 x 1,080

RGB pictures at 60 frames/s) with pro-cessing capabilities in the hundreds ofgiga-operations per second.

To take full advantage of the manyfeatures of the Zynq SoC, Xilinx pro-vides the Vivado™ Design Suite, an IP-and system-centric design environ-ment that increases designer develop-ment productivity with the fast inte-gration and implementation cyclesneeded to dynamically developsmarter embedded products. A com-ponent of that suite, Vivado HLS,allows you to take algorithms you’vedeveloped in C/C++ and compile theminto RTL to run in the FPGA logic.

The Vivado HLS tool is particularlywell-suited to embedded vision design.In this flow, you create your algorithmsin C/C++; compile the algorithm orparts of the algorithm into RTL usingVivado HLS; and determine which func-tions are better suited to run in FPGAlogic and which are better suited to runon the ARM processor. In this way, yourdesign team can home in on the optimalperformance for their vision systemsrunning in the Zynq SoC.

To further help embedded visiondevelopers to create Smarter Vision

26 Xcell Journal Second Quarter 2013

X C E L L E N C E I N E M B E D D E D V I S I O N

AlgorithmDesign

Modeling

Implementation

Release

SystemPartitioning

Prototyping

+_ +_

C, C++ or SystemC

VHDL or Verilog

System IP Integration

Algorithmic Specification

Microarchitecture Exploration

RTL Implementation

Comprehensive Integration withthe Xilinx Design Environment

Accelerate Algorithmic C-to-IP Integration

Vivado HLS

Figure 1 – Embedded vision system

development process

Figure 2 – High-level synthesis design flow

Page 27: Xcell Journal issue 83

systems, Xilinx has added to Vivadosupport for the OpenCV libraries ofcomputer vision algorithms. Xilinxhas also introduced the new IPIntegrator tools and SmartCORE™ IPto support these kinds of designs (seecover story, page 8).

OPENCV MAKES COMPUTERVISION ACCESSIBLEOpenCV provides a path to the devel-opment of intelligent computer visionalgorithms that are predicated on real-time performance. The libraries pro-vide designers with an environmentfor experimentation and fast proto-typing of algorithms.

The design framework of OpenCVis supported in multiple platform. Butin many cases, to make the librariesefficient for embedded productsrequires implementation in an embed-ded platform that is capable of accel-erating demanding routines for real-time performance.

While OpenCV was designed withcomputational efficiency in mind, itoriginated in traditional computingenvironments with support for multi-core processing. These computingplatforms may not be optimal for anembedded application where efficien-cy, cost and power consumption areparamount.

FEATURES OF OPENCVOpenCV is an open-source computervision library released under a BSDlicense. This means that it is free touse in academic and commercialapplications. It was originallydesigned to be computationally effi-cient on general-purpose multipro-cessing systems, with a strong focuson real-time applications. In addi-tion, OpenCV provides access tomultiple programming interfaceslike C/C++ and Python.

The advantage of an open-sourceproject is that the user community isconstantly improving algorithms andextending them for use in a widevariety of application domains. There

are now more than 2,500 functionsimplemented in OpenCV; here aresome examples:

• Matrix math

• Utilities and data structures

• General image-processing functions

• Image transforms

• Image pyramids

• Geometric descriptor functions

• Feature recognition, extractionand tracking

• Image segmentation and fitting

• Camera calibration, stereo and3D processing

• Machine learning: detection,recognition

For a more detailed description ofOpenCV, please go to opencv.org andopencv.willowgarage.com.

ACCELERATING OPENCV FUNCTIONS USING HLSOnce you have partitioned thearchitecture of an embedded visionsystem to single out computational-ly demanding portions, HLS tools

can help you accelerate these func-tions while still written in C++.Vivado HLS makes use of C, C++ orSystemC code to produce an effi-cient RTL implementation.

Furthermore, the Vivado IP-centricdesign environment provides a widerange of processing IP SmartCOREsthat simplify connections to imagingsensors, networks and other neces-sary I/O interfaces, easing the processof implementing those functions in theOpenCV libraries. This is a distinctadvantage from other implementationalternatives, where there is a need toaccelerate even the most fundamentalOpenCV I/O functionality.

WHY HIGH-LEVEL SYNTHESIS?The Vivado HLS compiler from Xilinxis a software compiler designed totransform algorithms implemented inC, C++ or SystemC into optimized RTLfor a user-defined clock frequency anddevice in the Xilinx product portfolio.It has the same core technology under-pinnings as a compiler for an x86processor in terms of interpretation,analysis and optimization of C/C++programs. This similarity enables arapid migration from a desktop devel-opment environment into an FPGA

X C E L L E N C E I N E M B E D D E D V I S I O N

Second Quarter 2013 Xcell Journal 27

Current Frame

Previous Frame

Detected New Car

Output Frame

OpenCVMotion-Detection

Algorithm

Figure 3 – Motion-detection example from the OpenCV library of algorithms

Page 28: Xcell Journal issue 83

implementation. The default behaviorof Vivado HLS will generate an RTLimplementation without the need foruser input after you have selected thetarget clock frequency and device. Inaddition, Vivado HLS, like any othercompiler, has optimization levels.Since the final execution target of thealgorithm is a tailor-made microarchi-tecture, the level of optimizations pos-sible in Vivado HLS is finer-grainedthan in a traditional compiler. Theconcept of O1 – O3 optimizations typ-ical in software design for a processoris replaced with architecture-explo-ration directives. These directivesdraw on the expertise of the user toguide Vivado HLS in creating the bestpossible implementation for a particu-lar algorithm in terms of power, areaand performance.

The user design flow for the HLScompiler is shown in Figure 2. At a

conceptual level, the user provides aC/C++/SystemC algorithmic descrip-tion and the compiler generates anRTL implementation. The transfor-mation of a program code into RTL isdivided into four major regions: algo-rithm specification, microarchitec-ture exploration, RTL implementa-tion and IP packaging.

The algorithmic-specification stagerefers to the development of the soft-ware application that will be targetedto the FPGA fabric. This specificationcan be developed in a standard desk-top software-development environ-ment and can make complete use ofXilinx-provided software librariessuch as OpenCV. In addition toenabling a software-centric develop-ment flow, Vivado HLS elevates theverification abstraction from RTL toC/C++. The user can carry out com-plete functional verification of the

algorithm using the original software.After RTL generation through VivadoHLS, the generated design is analo-gous to processor assembly code gen-erated by a traditional software com-piler. The user can debug at thisassembly code level, but is notrequired to do so.

While Vivado HLS can handlealmost all C/C++ code targeted atother software compilers, there isone restriction placed on the code.For compilation into an FPGA usingVivado HLS, the user code cannotinclude any run-time dynamic memo-ry allocations. Unlike a processor,where the algorithm is bounded by asingle memory architecture, an FPGAimplementation has an algorithm-specific memory architecture. Byanalyzing the usage patterns ofarrays and variables, Vivado HLS candetermine the physical-memory lay-

28 Xcell Journal Second Quarter 2013

X C E L L E N C E I N E M B E D D E D V I S I O N

Figure 4 – Motion detection on the Zynq SoC using the ARM processor

Page 29: Xcell Journal issue 83

out and memory types that will bestfit the storage and bandwidthrequirements in an algorithm. Theonly requirement for this analysis towork is that you explicitly describeall memory that an algorithm con-sumes in the C/C++ code in the formof arrays.

The second step in the transfor-mation from a C/C++ implementa-tion into an optimized FPGA imple-mentation is the microarchitectureexploration. At this stage, you applyVivado HLS compiler optimizationsto test out different designs for theright mix of area and performance.You can implement the same C/C++code at different performance pointswithout having to modify sourcecode. The Vivado HLS compiler opti-mizations or directives are how theperformance of different portions ofan algorithm are stated.

The final stage in the Vivado HLScompiler flow is RTL implementationand IP packaging. These are automat-ic stages inside the Vivado HLS com-piler that do not require RTL knowl-edge from users. The details of opti-mized RTL creation for differentdevices in the Xilinx product portfo-lio are built into the compiler. At thisstage, the tool is, for all intents andpurposes, a pushbutton utility thathas been thoroughly tested and veri-fied to produce timing-driven andFPGA fabric-driven RTL. The outputfrom the Vivado HLS compiler isautomatically packaged in formatsaccepted by other Xilinx tools such asIP-XACT. Therefore, there are noadditional steps in using an HLS-gen-erated IP core in Vivado.

The OpenCV libraries from Xilinxprovide a shortcut in the process ofdesign optimization using Vivado

HLS. These libraries have beenprecharacterized to yield functionscapable of pixel-rate processing at1080p resolutions. The optimizationknowledge required to guide theVivado HLS compiler is embeddedinto these libraries. Thus, you are freeto quickly iterate between an OpenCVconcept application in a desktop envi-ronment to a running OpenCV applica-tion on the Zynq SoC, which enablesoperations on the ARM processor andthe FPGA fabric.

Figure 3 shows an overview of amotion-detection application devel-oped in OpenCV. The goal of thisdesign is to detect moving objects ina video stream by comparing the cur-rent image frame with the previousone. The first stage in the algorithmis to detect the edges in both frames.This data-reduction operation makesit easier to analyze the relative

X C E L L E N C E I N E M B E D D E D V I S I O N

Second Quarter 2013 Xcell Journal 29

Figure 5 – Motion detection on the Zynq SoC using the programmable fabric

Page 30: Xcell Journal issue 83

change between consecutive frames.Once the edge information has beenextracted, edges are compared todetect edges that appear in the cur-rent image but not in the previousimage. These new detected edgescreate a movement-detection maskimage. Before the results of the newedge detection can be highlighted onthe current image, you must takeinto account the effects of imagesensor noise. This noise, which canvary from frame to frame, can leadto random false edges in the motion-detection mask image. Therefore,you must filter this image to reducethe impact of noise on the quality ofthe algorithm.

Noise reduction for this applicationis accomplished through the use of a7x7 median filter on the movement-detection mask image. The centralidea of a median filter is to take themiddle value in a 7x7 window ofneighboring pixels. The filter thenreports back the median value as thefinal value for the center pixel of thewindow. After noise reduction, themovement-detection mask image iscombined with the live input image tohighlight moving edges in red.

You can fully implement the appli-cation to run on the ARM processingsubsystem with the source code toZynq SoC mapping shown in Figure4. The only hardware elements in thisimplementation are for the “cvget-frame” and “showimage” functions.These video I/O functions are imple-mented using Xilinx video I/O sub-systems in the FPGA fabric. At thetime of the cvgetframe function call,the input side of the video I/O sub-system handles all the details, grab-bing and decoding a video streamfrom an HDMI interface and placingthe pixel data into DDR memory. Atthe time of showimage, this subsys-tem handles the transfer of pixel datafrom DDR memory into a video dis-play controller in order to drive a TVor other HDMI-compliant video dis-play device.

Vivado HLS-optimized OpenCVlibraries for hardware accelerationenable the porting of the code inFigure 4 into a real-time 60-frame/spixel-processing pipeline on theFPGA fabric. These OpenCV librariesprovide foundational functions forthe elements in OpenCV that requirehardware acceleration. Without hard-ware acceleration—that is, if youwere to run all the code in the ARMprocessors only—this algorithm hasa throughput of merely 1 frame every13 seconds (0.076 frames/s). Figure 5shows the new mapping of the appli-cation after Vivado HLS compilation.Note that the video I/O mapping ofthe original system is reused. Thecomputational core of the algorithm,which was previously executing onthe ARM processor, is compiled intomultiple Vivado HLS-generated IPblocks. These blocks, which are con-nected to the video I/O subsystem inVivado IP Integrator, are optimizedfor 1080p resolution video process-ing at 60 frames/s.

The All Programmable environ-ment provided by the Zynq SoC andVivado Design Suite is well-suitedfor the design, prototyping and test-ing of embedded vision systems atthe high data-processing ratesrequired by the latest high-definitionvideo technologies. Using the open-source set of libraries included inOpenCV is the best way to imple-ment demanding computer visionapplications in short developmenttimes. Since OpenCV libraries arewritten in C++, we use Vivado HLSto create source code that can beefficiently translated to hardwareRTL in the Zynq SoC FPGA fabric,and used as convenient processingaccelerators without sacrificing theflexibility of the design environmentoriginally envisioned by OpenCV.

For more information on creatingSmarter Vision designs with VivadoDesign Suite, visit http://www.xil-

inx.com/products/design-tools/viva-

do/index.htm.

30 Xcell Journal Second Quarter 2013

X C E L L E N C E I N E M B E D D E D V I S I O N

FPGA S OLUTIONS

? 100K Logic Cells + 240 DSP Slices? DDR3 SDRAM + Gigabit Ethernet? SO-DIMM form factor (68 x 30 mm)

Mars AX3 Artix®-7 FPGA Module

from

Mars ZX3 SoC Module

? Xilinx Zynq™-7000 All Programmable SoC (Dual Cortex™-A9 + Xilinx Artix®-7 FPGA)

? DDR3 SDRAM + NAND Flash? Gigabit Ethernet + USB 2.0 OTG? SO-DIMM form factor (68 x 30 mm)

We speak FPGA.

www.enclustra.com

? Xilinx Kintex™-7 FPGA? High-performance DDR3 SDRAM? USB 3.0, PCIe 2.0 + 2 Gigabit Ethernet ports? Smaller than a credit card

Mercury KX1 FPGA Module

FPGA Manager Solution

Simple, fast host-to-FPGA data transfer, for PCI Express, USB 3.0 and Gigabit Ethernet. Supports user applications written in C, C++, C#, VB.net,

MATLAB®, Simulink® and LabVIEW.

Page 32: Xcell Journal issue 83

32 Xcell Journal Second Quarter 2013

XPLANATION: FPGA 101

Xilinx High-Level Synthesis Tool SpeedsFPGA Designby Sharad SinhaPhD CandidateNanyang Technological University, [email protected]

Page 33: Xcell Journal issue 83

H igh-level synthesis (HLS)refers to an automated wayof synthesizing a digital

design beginning with its C, C++ orSystemC description. Engineershave a keen interest in high-levelsynthesis not only because it allowsthem to work at a high level ofabstraction, but also because itoffers the ability to easily generatemultiple design solutions. With HLS,you get the opportunity to investi-gate numerous possibilities andweigh their area and performancecharacteristics before settling onone of them to implement on yourFPGA chip. For instance, you caninvestigate the effects of mappingmemories to Block RAM (BRAM) ordistributed RAM, or explore theeffects of loop unrolling and otherloop-related optimizations—all with-out manually generating differentregister-transfer-level (RTL) designs.Just setting the relevant directives inthe C/C++/SystemC design is all youneed to do.

Xilinx has introduced an HLS toolwithin its newly released Vivado™tool suite. Vivado HLS, a rebrandingand respin of the AutoESL tool, pro-vides many techniques to optimizeyour C/C++/SystemC code to achievetarget performance. HLS tools likethis one help you rapidly imple-ment algorithms on FPGAs withoutresorting to a time-consuming RTLdesign methodology based on hard-ware description languages such asVerilog and VHDL.

To get a handle on how VivadoHLS operates from a user’s perspec-tive, let’s take a walk through an end-to-end synthesis process, fromdesign description (C/C++/SystemC)all the way to FPGA implementation,using matrix multiplication as ourdesign example. Matrix multiplica-tion is common in many applicationsand is extensively used in image andvideo processing, scientific comput-ing and digital communications. Allthe results in this project were gen-erated using Vivado HLS 2012.4,along with Xilinx® ISE® software(version 14.4) for physical synthesisand place-and-route. This flow alsouses ModelSim and GCC-4.2.1-mingw32vc9 for RTL co-simulation.

Figure 1 shows a simple synthe-sis-oriented flow beginning with aC/C++/SystemC description of thedesign to be synthesized in anFPGA. The C/C++/SystemC test-bench is the testbench needed toverify the correct functioning of thisdesign. You will also need it for RTLand C co-simulation. Co-simulationinvolves verifying the generatedRTL design (the .v or .vhd design)for functionality using this C/C++/SystemC testbench rather than anRTL testbench or a testbench writ-ten in a verification language like“e” or Vera. The clock period con-straint sets the target clock periodat which the design is supposed torun. The target FPGA device is theXilinx FPGA on which the designwill be mapped.

Second Quarter 2013 Xcell Journal 33

X P L A N A T I O N : F P G A 1 0 1

Pairing Vivado HLS with high-level languages like C allows you to rapidlyimplement algorithms on FPGAs.

Page 34: Xcell Journal issue 83

Synthesis takes place in Step 3, asVivado HLS synthesizes the functiondefined in the source file. The output ofthis stage includes Verilog and VHDLcode (the RTL design) of the C functionas well as estimates of resource utiliza-tion on the target FPGA and the esti-mated clock period with respect to thetarget clock period. Vivado HLS alsogenerates estimates of latency as wellas loop-related metrics.

Step 4 involves simulating the gener-ated RTL using the C testbench. Thisstep is called RTL co-simulation becausethe tool employs the same testbenchthat was used to verify the C source codeto now verify the functional correctnessof the RTL. For this step to be success-ful, your PATH environment variable onyour system (Windows or Linux) shouldinclude the path to your ModelSim

MATRIX MULTIPLICATION IN CTo get the most of our matrix multipli-cation example, we will explore vari-ous modifications of the C implementa-tion of matrix multiplication to showtheir effect on the synthesized design.This process will highlight the impor-tant points you will need to be aware ofwhen using HLS for prototyping as wellas actual design. I will skip the stepsinvolved in setting up a project, as youcan easily find these in the tool docu-mentation. Instead, I will focus on thedesign and implementation aspects.

In a typical Vivado HLS flow, youwill need three C/C++ files: the sourcefile, which includes the C function tobe synthesized; the header file; and afile describing the testbench within acall to the main() function.

The header file includes not onlythe declaration of the function imple-mented in the source file, but alsodirectives to support user-defined datatypes with specific bit widths. Thisallows a designer to use bit widths dif-ferent from the standard bit widthsdefined in C/C++. For instance, aninteger data type (int) is always 32 bitslong in C. However, in Vivado HLS, youcan specify a user-defined data type,“data,” that uses only 16 bits.

Figure 2 shows a simple C functionfor matrix multiplication. Two matri-ces, mat1 and mat2, are multiplied toget matrix prod. For simplicity, allmatrices are of the same size—namely,two rows by two columns.

34 Xcell Journal Second Quarter 2013

X P L A N A T I O N : F P G A 1 0 1

The steps you will need to executein the HLS flow are:

• Step 1: Build the project

• Step 2: Test the build

• Step 3: Synthesis

• Step 4: RTL co-simulation

• Step 5: Export RTL / RTL implementation

Step 1 compiles the project and testsfor syntax errors and so on in the differ-ent design files. Step 2 tests the func-tion to be implemented (present in thesource file) for its functional correct-ness. In this step, you will execute thecall to the function in the testbench andverify it as functionally correct. If thefunctional verification fails, you can goback and modify the design files.

C/C++/SystemC

.v/.vhd Design

Target FPGA Device

Clock Period Constraint

C/C++/SystemC Testbench Vivado HLS Xilinx ISE FPGA bitstream

void matrixmultiply(data matleft[2][2], data matright[2][2], dataproduct[2][2])

{ data i,j,k; for(i=0;i<2;i++) { for(j=0;j<2;j++)

{ for(k=0;k<2;k++)

{ product[i][j]=product[i][j]+matleft[i][k]* matright[k][j]; }}

}}

Figure 1 — FPGA synthesis flow using Vivado HLS

Figure 2 – Simple C code for 2 x 2 matrix multiplication

Page 35: Xcell Journal issue 83

X P L A N A T I O N : F P G A 1 0 1

Second Quarter 2013 Xcell Journal 35

installation. Also, you must also havethe package GCC-4.2.1-mingw32vc9 inyour ModelSim installation folder.

Finally, Step 5 involves exportingthe RTL as an IP block to be used in abigger design and to be processed byother Xilinx tools. You can export theRTL as an IP-XACT-formatted IPblock, as a System Generator IP blockor as pcore-formatted IP for use witha Xilinx embedded design kit. Whileexporting the Vivado-generated RTL,you can evaluate its post-place-and-route performance by selecting thetool’s “evaluate” option, thus resultingin RTL implementation. In this caseXilinx ISE runs from within theVivado HLS tool itself. For this to hap-pen, you need to set the path to yourISE installation in your PATH environ-ment variable setting, and Vivado HLSwill search for an ISE installation.

Of course, you are also free not toexport the Vivado-generated RTL asan IP block in one of the three for-

mats just described. The exported for-mat files are available in three places:<project_directory>/<solution_num-ber>/impl/<pcores> or <project_direc-tory>/<solution_number>/impl/<sys-

gen> or <project_directory>/<solu-tion_number>/impl/<ip>. You can aswell use the Vivado-generated RTL in abigger design or use it as a top-leveldesign in itself. You need to take careof its interface requirements wheninstantiated in the bigger design.

When the C function in Figure 2 issynthesized, you get the RT-levelimplementation shown in Figure 3. Asyou can see, in this implementation,the elements of matrices 1 and 2 areread into the function and the ele-ments of the product matrix are writ-ten out. Thus, this implementationassumes that memory external to the“matrixmultiply” entity is availablefor storing the elements of matrices 1,2 and prod. Table 1 gives a descrip-tion of the signals and Table 2 showsthe design metrics.

In Table 1, start, done and idle sig-nals are related to the finite statemachine (FSM) that controls the data-path in the design. As you can see, theVivado HLS-generated Verilog assumesthat the operation starts with the startsignal and that valid output data isavailable, with the ap_done signalgoing high from low. The Vivado HLS-generated Verilog/VHDL will alwayshave at least three basic signals,ap_start, ap_done and ap_idle, along

product_we0

product_ce0

product_q0[15:0]

ap_idle

ap_done

ap_start

ap_rst

ap_clk

matright_ce0

matleft_address0[1:0]

matleft_q0[15:0]

product_d0[15:0]

product_address0[1:0]

matrixmultiply

matleft_ce0

matright_address0[1:0]

matright_q0[15:0]

ap_ready

Figure 3 – Design resulting from the code in Figure 2

Design Metric

DSP48E

Lookup tables

Flip-flops

Best achieved clock period (ns)

Latency

Throughput (intitiation interval)

Device: XC6VCX75TFF784-2

1

44

61

2.856

69

69

Table 1 – Design metrics for the design in Figure 3

Page 36: Xcell Journal issue 83

Figure 2 does. This implementationexpects the input and the output matri-ces’ data to be available in memoriesexternal to the implementation.

RESTRUCTURING THE CODEThe code in Figure 4 will serve yourpurpose. It will be part of the sourcefile, which should be a C++ file andnot a C file as earlier. You shouldinclude the relevant headers—hls_stream.h and ap_int.h—in the

with the ap_clk signal. This meansthat no matter which design youimplement using Vivado HLS, thelatency of the design will constrainyour streaming throughput. Thedesign in Figure 2 has a latency of 69clock cycles at a target clock period of3 nanoseconds. That means that forthis particular case, it will take 69clock cycles for all the product matrixelements to be available. Hence, youcannot feed a new set of input matri-

36 Xcell Journal Second Quarter 2013

X P L A N A T I O N : F P G A 1 0 1

ces to your design before at least 69clock cycles.

Now, the implementation as shownin Figure 3 is not probably what youwould have in mind when you tried toimplement matrix multiplication on anFPGA. You would probably want animplementation where you feed theinput matrices, store and computethem internally, and then read out theproduct matrix elements. This is clear-ly not what the implementation in

Signal

matleft_ce0

matleft_q0[15:0]

matleft_address[1:0]

matright_ce0

matright_q0[15:0]

matright_address[1:0]

product_ce0

product_we0

product_d0[15:0]

product_q0[15:0]

product_address0[1:0]

ap_clk

ap_rst

ap_start

ap_done

ap_idle

ap_ready

Description

Chip enable for memory that stores matrix 1

16-bit element of matrix 1

Address to read data from memory storing matrix 1

Chip enable for memory that stores matrix 2

16-bit element of matrix 2

Address to read data from memory storing matrix 2

Chip enable for memory that stores product matrix

Write enable for memory that stores product matrix

Data written to memory that stores product matrix

Data read from memory that stores product matrix

Address from which data is to be read/written to in product matrix

Clock signal for design

Active high synchronous reset signal for design

Start signal for start of computation

Done signal for end of computation and output ready

Idle signal indicating that the entity (design) is idle

Indicates to the data-feeding stage that the design is ready for new input data; to be used in conjunction with ap_idle

Table 2 – Description of signals for the design in Figure 3

Page 37: Xcell Journal issue 83

X P L A N A T I O N : F P G A 1 0 1

Second Quarter 2013 Xcell Journal 37

header file matrixmultiply.h. Notethat in Figure 2, when the source filewas a C file, the header file includedan ap_cint.h header. The header filesap_int.h and ap_cint.h help defineuser-defined data types with arbi-trary bit widths for C++ and C sourcefiles respectively. The header filehls_stream.h file is required to makeuse of the stream interfaces and canonly be used when the source file isin the C++ language.

In order to have a design where youcan just stream in input matrices andstream out the product matrix, youneed to implement read and writestreams in your code. The codehls::stream<> stream_name is used toname the read and the write streams.Thus, d_mat1 and d_mat2 are readstreams and d_product is a writestream. The streaming interfacesbehave as FIFOs. By default, the depthof the FIFOs is 1. You will need to setthe depths in the Vivado HLS direc-tives pane by selecting the definedstreams. For the code in Figure 4, eachstream is four data units deep. Notehere that the (i,j) loop executes beforethe (p,q) loop due to the sequentialnature of C++ code. Hence, thed_mat2 stream will fill up after thed_mat1 stream.

Once you are done with the streaminterface, you can select the matrices tobe mapped to BRAM by applying thedirective RESOURCE and choosing acore through the directives pane.Otherwise, the matrices will be imple-mented using flip-flops and lookup

#include "matrixmultiplystream.h"using namespace hls;stream <data> d_mat1;stream <data> d_mat2;stream <data> d_prodmat;

void matrixmultiplystream (stream<data>& d_mat1, stream<data>& d_mat2,stream<data>& d_prodmat){data matrixleft[2][2]={{0}};data matrixright[2][2]={{0}};data matrixproduct[2][2]={{0}};

data elementmat1;data elementmat2;data i,j,p,q,k;for (i=0;i<2;i++) {for(j=0;j<2;j++)

{ matrixleft[i][j] = d_mat1.read(); }}

for (p=0;p<2;p++) {for(q=0;q<2;q++)

{ matrixright[p][q]=d_mat2.read(); }}for(i=0;i<2;i++) {

for(j=0;j<2;j++){for(k=0;k<2;k++) {

matrixproduct[i][j] = matrixproduct[i][j]+matrixleft[i][k]*matrixright[k][j];

}}

}

for(i=0;i<2;i++) {

for(j=0;j<2;j++){

d_prodmat << matrixproduct[i][j];}

}}

Figure 4 – Restructured source code for matrix multiply

In order to have a design where you can just stream in input matrices and stream out the product matrix, you need

to implement read and write streams in your code. The streaming interfaces behave as FIFOs. By default,

the depth of the FIFOs is 1.

Page 38: Xcell Journal issue 83

38 Xcell Journal Second Quarter 2013

X P L A N A T I O N : F P G A 1 0 1

Without BRAM or distributed RAM for matrices

With single-port BRAM for matrices

With distributed RAM (implemented in LUTs) for matrices

Design Metric

Device: XC6VCX75TFF784-2

DSP48E

Lookup tables

Flip-flops

BRAM

Best achieved clock period (ns)

Latency

Throughput (initiation interval)

1

185

331

0

2.886

84

84

1

109

102

3

3.216

116

116

1

179

190

0

2.952

104

104

Signal

d_mat1_V_read

d_mat1_V_dout [15:0]

d_mat1_V_empty

d_mat2_V_read

d_mat2_V_dout [15:0]

d_mat2_V_empty

d_product_V_din [15:0]

d_product_V_full_n

d_product_V_write

ap_clk

ap_rst

ap_start

ap_done

ap_idle

ap_ready

Description

Signals when the design is ready for inputs to matrix 1 (left matrix)

16-bit streaming element of matrix 1

Signals to the design that no more elements for matrix 1 are left

Signals when the design is ready for inputs to matrix 2 (right matrix)

16-bit streaming element of matrix 2

Signals to the design that no more elements for matrix 2 are left

16-bit output element of product matrix

Signals that product matrix should be written to

Signals that data is being written to the product matrix

Clock signal for design

Active high synchronous reset signal for design

Start signal to begin computation

Done signal to end computation and signal output ready

Idle signal to indicate that the entity (design) is idle

Indicates to the data-feeding stage that the design is ready for new input data; to be used in conjunction with ap_idle

Table 3 – Description of signals for the design in Figure 5

Table 4 – Design metrics for the design in Figure 5

Page 39: Xcell Journal issue 83

X P L A N A T I O N : F P G A 1 0 1

Second Quarter 2013 Xcell Journal 39

tables (LUTs). Note that the directivespane will be active only if you keep thesource file active in the synthesis view.

The implemented design for thecode in Figure 4 is shown in Figure 5.Table 3 describes the signals availableon this design’s interface. In Table 3,d_product_V_full_n is an active lowsignal when the core is to be signaledthat the product matrix is full. Thiswould not be generally required in animplementation.

Table 4 shows the different designmetrics after place-and-route for aclock period constraint of 3 ns, withand without mapping matrix arrays toBRAM or distributed RAM. You cansee from Table 4 that the design fails tomeet a timing constraint of 3 ns whenmatrices are mapped to single-portBRAM. This result has been deliber-ately included in the table to show thatyou can use this methodology to gen-erate a variety of designs with differ-ent area-time metrics. You can also seefrom Table 1 that although the code in

Figure 2 gives a latency of 69 clockcycles, which is less than the restruc-tured design based on code in Figure 4,this design would require memoryexternal to the matrixmultiply entity asexplained earlier.

PRECISION OF THEIMPLEMENTATIONFor the results shown here, I definedthe data type “data” to be 16 bits wide.Hence, elements of all matrices (left,right and product) were only 16 bitswide. As a result, the matrix multiplica-tion and addition operations were notdone at full precision. You can chooseto define another data type, data_t1, inthe header file that would be 32 bitswide; all the elements of the productmatrix would be of this data type,because a 16-bit number (element ofleft matrix) multiplied by another 16-bitnumber (element of right matrix) canbe at most 32 bits wide. In this case, theresource utilization and timing resultswill differ from those in Tables 1 and 4.

The restructured source codeshows how multiple design solutionscan arise from the same source files. Inthis case, one design solution hadBRAM while the other one did not.Within each Vivado HLS project direc-tory, you will see that Vivado HLS hasgenerated separate directories for sep-arate solutions. Within each solutiondirectory will be a subdirectory named“impl” (for implementation). Here,within this subdirectory, you will havea directory titled “Verilog” or “VHDL,”depending on which source code youused during the RTL implementationphase. This same subdirectory alsocontains a Xilinx ISE project file (fileextension .xise). If the Vivado HLS-generated design is your top-leveldesign, you can launch your solution inXilinx ISE by clicking on this file, andthen generate the post-place-and-routemodel for gate-level timing and func-tional simulation. You cannot do thissimulation from within Vivado HLS.

After launching your solution in ISE,you should assign I/O pins to yourdesign. Then you can select “generateprogramming file” in ISE ProjectNavigator to generate the bitstream.

In this exercise, we have walkedthrough an actual Vivado HLS end-to-end flow and then implementation onan FPGA. For many of the advancedfeatures in Vivado HLS, you need toknow your desired hardware architec-ture in order to restructure the sourcecode. For further information, a coupleof documents will prove helpful: theVivado High-Level Synthesis Tutorial(UG871; http://www.xilinx.com/sup-

port/documentation/sw_manuals/xil-

inx2012_2/ug871-vivado-high-level-

synthesis-tutorial.pdf) and the VivadoDesign Suite User Guide (UG002;http://www.xilinx.com/support/docu-

m e n t a t i o n / s w _ m a n u a l s / x i l -

inx2012_2/ug902-vivado-high-level-

synthesis.pdf).

To read more by Sharad Sinha, follow his

blog at http://sharadsinha.wordpress.com.

ap_rst

d_mat2_V_empty_n

d_mat1_V_dout [15:0]

d_mat1_V_empty_n

d_mat2_V_dout [15:0]d_product_V_write

d_product_V_full_n

ap_idle

ap_done

ap_start

ap_clk

d_product_V_din [15:0]

matrixmultiply

d_mat1_V_read

d_mat2_V_read

ap_ready

Figure 5 – Design resulting from the code in Figure 4

Page 40: Xcell Journal issue 83

40 Xcell Journal Second Quarter 2013

XPLANATION: FPGA 101

Having developed your Zynq SoC applicationand tested it using the JTAG interface, the nextstep is to get the boot loader working.

How to ConfigureYour Zynq SoCBare-Metal Solution

by Adam TaylorPrincipal EngineerEADS [email protected]

Page 41: Xcell Journal issue 83

Because of its unique mix of ARM processingclout and FPGA logic in a single device, theZynq™-7000 All Programmable SoC requires a

twofold configuration process, one that takes intoaccount both the processor system and the program-mable logic. Engineers will find that the configurationsequence differs slightly from that of traditionalXilinx® FPGAs. Nevertheless, the methodology isfamiliar and it’s not at all difficult to generate a bootimage and program the configuration memory.

Where standard FPGA configuration practices nor-mally require only the FPGA bit file, you will need toadd a second type of configuration file to get the max-imum benefit from your Zynq SoC: the SW Executableand Linakble Format (ELF) file. The FPGA bit filedefines the behavior of the programmable logic sectionof your design, while ELF file is the software programthat the processing system will execute.

So let’s have a look at how to implement a bare-metal (no operating system) software application onyour Zynq SoC.

CONFIGURATION OVERVIEWWithin a Zynq SoC, the processing system (PS) is themaster and therefore configures the programmablelogic (PL) side of the device. (The only exception tothis rule is if you use the JTAG interface for configu-ration.) This means you can power the processingsystem and have it operating while the programma-ble logic side is unpowered, if so desired, to reducethe overall system power. Of course, if you want touse the PL side of the Zynq SoC, you will need topower it too.

The software application and FPGA bit file arestored within the same configuration memory deviceattached to the processing system. The PS supportsconfiguration by means of a number of nonvolatilememory types, including quad SPI flash, NAND flash,NOR flash and SD cards. You can also configure thesystem via JTAG, just like any other device.

Therefore, the Zynq SoC follows a typical processorboot sequence to configure both sides of the device,initially running from an internal boot ROM that can-not be modified. This boot ROM contains drivers for

Second Quarter 2013 Xcell Journal 41

X P L A N A T I O N : F P G A 1 0 1

This is the second in a planned series of hands-on

Zynq-7000 All Programmable SoC tutorials by Adam

Taylor. A frequent contributor to Xcell Journal, Adam wrote

the cover story on Zynq SoC design in Issue 82 as well as

the article on XPE and XPA in this issue (see page 46).

He is also a blogger at All Programmable Planet. – Ed.

Page 42: Xcell Journal issue 83

quad SPI configuration, FSBL off-set and image length)

2. First-stage boot loader

3. Programmable-logic bit file

4. Software application (ELF file)for the processing-system side

Like all other Xilinx FPGAs, theZynq SoC device uses a number ofmode pins to determine which type ofmemory the program is stored within,

the nonvolatile memories supported.You configure it by means of a headercontained within the nonvolatile memo-ry. The header marks the start of theconfiguration image and is the firstthing the boot ROM looks for. The head-er defines a number of boot options thatthe boot ROM can implement, such asexecute in place (not for all memories,however), first-stage boot loader(FSBL) offset and secure configuration.This header process ensures the bootROM operates in the mode that is com-patible with how the configurationmemory has been formatted.

For designs, users have the optionof either secure or no-secure methodsof configuration. The boot ROM head-er supports and defines both modes.In secure configuration, the program-mable logic section of the device mustbe powered up as the hard macrosAES and SHA. You need both fordecryption and must place them inthe PL side of the device.

For the next stage of configuration,you will need to provide an FSBL thatwill configure the DDR memory andother peripherals on the processor asdefined in Xilinx Platform Studio(XPS) before loading the softwareapplication and configuring the pro-grammable logic. Overall, the FSBL isresponsible for four major tasks:

• Initializing the processing systemwith the information that XPSprovides

• Programming the PL side of theZynq SoC if a bit file is provided

• Loading either a second-stageboot loader (SSBL), if an operat-ing system is being used, or a bare-metal application into DDR

• Starting the execution of the SSBLor the bare-metal application

You program the PL side of the ZynqSoC via the Processor ConfigurationAccess Port (PCAP), which allowsboth partial and full configuration ofthe programmable logic. This meansyou can program the FPGA at any time

X P L A N A T I O N : F P G A 1 0 1

once the processing system is up andrunning. The PCAP also lets you readback and check for errors, if you areusing the device in an environmentwhere it may be subject to single-eventfunctional interrupts (SEFIs).

To create a bootable image for ourZynq SoC, you will need at least the fol-lowing functions:

1. Boot ROM header to control set-tings for the boot ROM (for exam-ple, execute in place, encryption,

Figure 1 – Creating the first-stage boot loader project

Figure 2 – Creating the FSBL from the template provided

42 Xcell Journal Second Quarter 2013

Page 43: Xcell Journal issue 83

X P L A N A T I O N : F P G A 1 0 1

along with other crucial system set-tings. These mode pins share theMultiuse I/O (MIO) pins on the PS sideof the device. In all, there are sevenmode pins mapped to MIO[8:2], withthe first four defining the boot mode.The fifth defines whether the PLL isused or not, and the sixth and seventhdefine the bank voltages on MIO bank0 and bank 1 during power-up. Thefirst-stage boot loader can change thevoltage standard defined on MIO

banks 0 and 1 to the correct standardfor the application; however, if you aredesigning a system from scratch, makesure the voltage used during power-upwill not damage the device connectedto these pins.

FIRST-STAGE BOOT LOADER CREATIONThe next step in making a boot imageis to create a first-stage boot loader.Thankfully, you don’t have to do this

manually—the FSBL that Xilinx hasprovided will load your applicationand configure your FPGA. You can, ofcourse, customize the code providedto alter the boot sequence as desiredfor your application. Within the cur-rent Software Development Kit (SDK)workspace—the one that containsyour project—create a new projectusing the new -> application project,as shown in Figure 1.

Choose whether you want to use Cor C++, pick the board support packagedefined for your system and name theproject—in this case, I used the namezynq_fsbl_0 for our example design.

On the next tab, select the ZynqFSBL option from the available tem-plates as shown in Figure 2, and yourFSBL project will be created. You arethen nearly ready to create the bootimage. If you have “compile” automat-ically selected, the FSBL will be com-piled; if not, it will be compiled ondemand later on.

However, we need to make achange to the linker script providedwith the FSBL, as it has no ideawhere the DDR memory is located inthe processor address space.Therefore, we need to open thelscrip.ld and add the DDR memorylocation into the file. The location ofthe DDR within the address spacecan be found in the linker script youhave created for your application(see cover story, Xcell Journal Issue82, for details on how to create thisscript). Otherwise, you can also findit in the system.xml file under thehardware platform definition.

Figure 3 shows the FSBL lscript.ldwith the DDR address for the systemadded in. If you forget to add it, youwill find that the boot loader will runand the FPGA will configure—but theapplication will not run if you haveconfigured it to execute in DDR.

GENERATING A CONFIGURATION IMAGELooking under the Xilinx SDK ProjectExplorer, you should hopefully now

Figure 4 – Finding the complete files needed for the boot image

Figure 3 – Defining the DDR address space

Second Quarter 2013 Xcell Journal 43

Page 44: Xcell Journal issue 83

0x0 in the dialog box that appears.Then, after making sure your targethardware is powered on, click on“program” (Figure 6).

It may take a few minutes for thedevice to be programmed and verified(if you ticked the verify option). Oncethe process is complete, power downyour hardware and reapply the power(alternatively, you can reset the proces-sor). Provided you have the mode pins

have the following four modules. Eachshould have a slightly different symbolidentifying the type of module. UsingFigure 4 as an example, you will seethe following structure.

1. proc_subsystem_hw_platform –Named after the processing sub-system you created, this is thehardware definition of your file.

2. zed_blog_0 – This is the boardsupport package you created.

3. zed_blog_C – This is the applica-tion itself.

4. zynq_fsbl_0 – This is the first-stage boot loader you have justcreated and modified the linkerscript for.

Since we are creating a bare-metalapplication (no OS), you will need threefiles to create a boot image, in this spe-cific order: first-stage boot loader; FPGAprogramming bit file; and C application.

Creating a boot image is very simplewithin the SDK using the “Create ZynqBoot Image” option under the Xilinxtools menu. Once you have selectedthe boot image, a new dialog box willopen, allowing you to choose the filesneeded, as seen in Figure 4.

It is important to stress that theFPGA bit file must always follow theFSBL. Clicking on “create image” willcreate both a *.bin and *.mcs file, whichyou can program into the target device.

PROGRAMMING THECONFIGURATION MEMORYYou can program your configurationmemory through either the SDK, usingthe “program flash” option under theXilinx tools options, or by using Xilinx’siMPACT programming tools. For thisexample, we will be storing the configu-ration file within quad SPI flash. TheQSPI interface is defined in the systemassembly view of Xilinx PlatformStudio. Hence, the hardware definitionwill contain the QSPI IP core and thelocation in the address map, allowingthe first-stage boot loader to interface tothis device and configure it.

X P L A N A T I O N : F P G A 1 0 1

The next stage in programmingthe device is to connect the ZynqSoC hardware system to your PCusing the JTAG programming cable.Programming the configurationmemory is then as simple as select-ing the “program flash” option fromthe Xilinx tools menu within theSDK, as shown in Figure 5.

Navigate and select the MCS fileyou generated and enter an offset of

Figure 6 – Programming the flash dialog

Figure 5 - Programming the configuration memory

44 Xcell Journal Second Quarter 2013

Page 45: Xcell Journal issue 83

X P L A N A T I O N : F P G A 1 0 1

Second Quarter 2013 Xcell Journal 45

configured correctly, your Zynq SoCsystem will spring to life, fully config-ured, and start to run the applicationsoftware and FPGA design.

ADVANCED CONFIGURATION:PARTITION SEARCH ANDMULTIBOOTThe example provided above walksyou through the configuration of asingle configuration image. However,it is possible for the boot ROM to per-form a partition search should the ini-tial image fail to load if the boot ROMchecks fail. If this is the case, the bootROM will look for another goldenimage located at a higher locationwithin the configuration memory. The

type of configuration memory youuse will impact how much of theaddress space the boot ROM search-es. If you are using an SD card,there is no partition search.

It is also possible for the ZynqSoC to load a different image fromthe configuration memory follow-ing a warm reset. To do this, youmust define the multiboot addressfor the boot ROM either throughthe FSBL or at a later stage withinthe second-stage boot loader orthe application. This allows theZynq SoC’s boot ROM following awarm reset to configure from thelocation toward which the multi-boot address points, allowing a dif-

ferent image from the initial one tobe loaded. Unlike in the partitionsearch, these images can be locatedin any order within the configura-tion memory. However, followingfrom power-on reset, the boot ROMwill load the first valid image it findswithin the configuration memory.

While different from the traditionalFPGA configuration approach, ZyncSoC configuration is not an arcaneart. Many engineers will be familiarwith the two-pronged methodology,and will find it very easy to generate aboot image and program the configu-ration memory using the tools provid-ed within the Xilinx SoftwareDevelopment Kit.

Page 46: Xcell Journal issue 83

46 Xcell Journal Second Quarter 2013

XPLANATION: FPGA 101

Using Xilinx’s PowerEstimator and Power Analyzer Tools

Page 47: Xcell Journal issue 83

F PGAs are unlike many classesof components in that thepower they will require on their

core, auxiliary and I/O voltagesdepends upon the implementation ofthe design. Determining the power dis-sipation of the FPGA in your applica-tion is thus a little more complicatedthan just reading the datasheet. It cantherefore be challenging to ensure youhave the correct power architecture—one that takes into account not only therequired quiescent currents, ramp ratesand sequencing, but also has the abilityto suitably power the end applicationwhile remaining within the acceptablejunction temperature of the device.

What are XPE and XPA? These aretwo design tools provided by Xilinxthat enable you to obtain accuratepower analysis of your FPGA design.You will use the spreadsheet-basedXilinx® Power Estimator (XPE) in theearly stages of your design and turn toXilinx Power Analyzer (XPA) aftercompleting the implementation. XPAallows you to perform further analysison the power requirements of yourdesign and assists in power optimiza-tion should it be required.

INITIAL STEPSWhen you first launch a developmentproject, it is rare that the completeFPGA design will exist (if you arelucky you may get to reuse some code,

which will provide more accurateinformation for the power prediction).Therefore, the starting point for yourpower estimation will be the XilinxPower Estimator spreadsheet (seehttp://www.origin.xilinx.com/prod-

ucts/design_tools/logic_design/xpe.htm).Your first power estimation will bebased on the engineering team’s initialthoughts on the number of clocks,logic and additional resources that thedesign will require.

Using XPE is very straightforward.Even better, the tool allows you to doplenty of “what if” implementations todetermine the impact of variousdesign choices on your power estima-tion. If your solution is power hungry,this capability can prove to be veryimportant in helping you find the opti-mal implementation

The ability of XPE to predict thejunction temperature of the devicedependent upon the heat sinking, theairflow and the number of layers with-in the printed-circuit board is very use-ful at an early stage too. It can showwhether your design can achieve de-rated junction temperatures for theintended implementation.

Within XPE, your first step is tocomplete the settings tab as accu-rately as you can. Along with select-ing the device, pay particular atten-tion that the package, speed gradeand temp grade are correctly set—

Second Quarter 2013 Xcell Journal 47

X P L A N A T I O N : F P G A 1 0 1

Obtaining an accurate power estimation of your FPGA design can be critical for ensuring you havethe correct power architecture.

by Adam TaylorPrincipal EngineerEADS [email protected]

Page 48: Xcell Journal issue 83

responsible for power engineering—informed of any change in the estima-tion. The power engineers should beable to provide worst-case maximumvoltages for the supply rails, which canfurther improve the accuracy of yourestimation. A higher worst-case maxi-mum supply voltage will increase thepower dissipation.

XPE is intelligent. It will color thevoltage cells on your spreadsheetorange and issue a warning if theworst-case maximum supply voltage isoutside of the acceptable tolerance forthe device, as shown in Figure 2.

likewise stepping, process andpower mode, if applicable. All ofthese parameters will have a consid-erable impact on the overall powerrequired—especially the process,which has the settings of “maxi-mum” or “typical.” The typical set-ting will give you the power yourapplication would statistically see,while the maximum will give theworst case. It is important to ensurethe power solution can handle themaximum case, but this can becomechallenging with larger devices thathave higher current requirements.

You can also define the operatingenvironment here as well, with theambient temperature, heat sinkingand airflow. Defining the maximumambient temperature at this point willprovide a more accurate power esti-mation, because the power requiredwill increase as the device junctiontemperature rises. Including heatsinking, airflow or both will thereforelower your power estimation.

Also at this stage, don’t overlookthe Xilinx ISE® optimization setting.This setting will also have an impacton the power estimation, as differentoptimization schemes—for example,timing performance against areaminimization—will result in varyingimplementations, each with its ownresource usage and fan-out. They toowill affect the power estimation.

The next stage of the estimationis to go through the tabs at the bot-tom of the XPE spreadsheet and fillin details on your proposed solutionas accurately as possible. To ensurethe most accurate estimation at anearly stage, it is very important todefine at least the resource usage,clock frequencies, toggle rate andenable rate. It is also a good idea toinclude a level of contingency toaddress the ever-present require-ments creep.

This process will result in XPEproviding an overall power and junc-tion temperature estimation on thesummary tab, like the one in Figure 1.

48 Xcell Journal Second Quarter 2013

X P L A N A T I O N : F P G A 1 0 1

Once your estimation is complete,you can, if you wish, export the set-tings from XPE for use later within theXilinx Power Analyzer (XPA) by usingthe “export file” option on the summa-ry page. This ensures the settings inXPA are aligned with those you initial-ly used for the estimate.

As the development progressesthrough RTL production, trail synthe-sis and place-and-route, it is importantto keep the estimation up to date asmore-accurate information becomesavailable. Remember also to keep thehardware team—especially the group

Figure 1 – Summary of the XPE device settings and power estimation

Figure 2 – The warning issued when your power supply voltage is outside of acceptable tolerance

Page 49: Xcell Journal issue 83

X P L A N A T I O N : F P G A 1 0 1

Second Quarter 2013 Xcell Journal 49

MOVING ON TO XPAOnce you have implemented thedesign, it is possible to obtain a muchmore accurate estimate of its powerpicture using the Xilinx PowerAnalyzer. How accurate this result willbe depends upon the inputs you givethe tool. You can open XPA by clickingon Analyze Power Distribution underthe Place and Route Options withinthe processes window of the ISEdesign suite (see Figure 3).

Once XPA opens, you will again bepresented with a summary screensimilar to that in XPE (Figure 4). Thisis the place where you can define theenvironment and either create addi-tional settings or import the settingsfrom your XPE analysis.

To include the .xpa file or open anew project, you will need to use thefollowing flow, which will include:

• A Native Circuit Design (NCD)file that defines the physicalFPGA implementation

• A Settings File, to define the set-tings imported from XPE

• A Physical Constraints File(PCF), which contains the clock-ing information and mapping andUCF constraints

• A simulation file, either a ValuesChange Dump (VCD) or SwitchingActivity Interchange File (SAIF),which you can generate with yoursimulation tool, allowing XPA toobtain the switching informationof the design

Naturally, the more information youinclude upfront, the more accurate thepower estimation will be. Helpfully,XPA provides a confidence level on itsestimation—either low, medium orhigh—for five categories: designimplementation state, clock nodesactivity, I/O nodes activity, internalnodes activity and device models.

These individual confidence levelscontribute to the overall confidencelevel of the estimation. Again, ratherhelpfully, XPA will provide sugges-

Figure 3 – Opening XPA from ISE

To ensure you obtain the most accurate power estimation possible, here is a checklist that covers the most important steps.

1. Start with the correct environmental setting—stepping, speed grade,heat sinking, airflow and ambient temperature.

2. Ensure the process is set correctly to “typical” or “maximum” (for worst case).

3. Set the supply voltages to their worst-case maximum.

4. Make sure you have the most accurate toggle rate, fan-out, enable rate,I/O properties and clock rate.

5. Use a simulation VCD/SAIF file to provide the best information on thedesign performance within XPA. To obtain the best results, this fileneeds to be a gate-level simulation.

6. Keep the power estimation up to date as both the hardware and FPGAdesigns progress, so revisit it often.

7. Include contingency within your power estimation to address requirements creep in your application.

8. Be sure to define termination schemes and load capacitance correctly for the I/O in both XPA and XPE.

9. Make sure the confidence level reported by XPA is high.

10. Ensure the power solution meets all the other power requirements for the device and for reliability, and is not running at or near its load capacity.

10 Steps to Accurate Power Estimation

Page 50: Xcell Journal issue 83

When you exit the simulation, theresults will be saved into the VCDfile, which you can then use withXPA. If you are using ISim, the formatis a little different:

vcd dumpfile myvcd.vcd—define the file name andlocation if desired

vcd dumpvars -m/memory_if_tb/uut_apply

—the region of the designthat is to be recorded

Run Simulation

vcd dumpflush —save results to the vcdfile created

There are many more-advanced com-mands within the simulation tools thatyou can use to generate a VCD; theseare detailed within the tool documenta-tion itself.

Having included the simulation VCDfile, the level of confidence of yourpower estimation should increase. Ifyou cannot provide a simulation result(some can take a long time to run),then XPA will run using its internalanalysis engine. For this to be accu-rate, it is important you again specifythe toggle and enable rates.

Just as in XPE, it is also important toinclude the worst-case maximum sup-ply voltages within XPA. It is possibleas well to export the XPA design backinto XPE to allow the engineering teamto try out more experimental changesand determine their effect on the powerand junction temperature estimates.

WHAT IF IT IS NOT OK?Having included a simulation and pro-vided the rest of the information, youwill have achieved a high level of confi-dence in the power estimate and pre-dicted junction temperature. In anideal world, this will hopefully beacceptable for the application.However, should that not be the case,what other options are open to the

tions on how to increase the confi-dence level of each individual category,hence boosting the overall confidencelevel (Figure 5).

To obtain the highest level of confi-dence in the power estimation, youare required to include a VCD or SAIFfile from a gate-level simulation. Thisallows XPA to understand the behav-ior of the internal nodes, hence pro-viding a better estimate.

Obtaining the VCD from your sim-ulation is pretty straightforward, butdepending on the tool you are using

50 Xcell Journal Second Quarter 2013

X P L A N A T I O N : F P G A 1 0 1

(Mentor Graphics’ ModelSim, Xilinx’sISim, etc.), the format of the com-mands required might be slightly dif-ferent. If you are using ModelSim,you can create a simple VCD file withthe following syntax:

vcd file myvcd.vcd —define the file name andlocation if desired

vcd add /memory_if_tb/uut_apply/*

—the region of the designthat is to be recorded

Figure 4 – The XPA summary screen

Figure 5 – XPA confidence-level report and suggested actions to improve it

Page 51: Xcell Journal issue 83

X P L A N A T I O N : F P G A 1 0 1

Second Quarter 2013 Xcell Journal 51

engineer to ensure your design hasachieved the available power budgetor required junction temperature?

To obtain the desired results, thereare three places where we can affectthe design: within the source, withinthe implementation or (in the case ofjunction temperature) within thephysical-module design.

Within the design source, you cantake the following steps:

1. Remove any asynchronous resets.

2. Include pipeline stages betweenlarge combinatorial functions.

3. Ensure you are using dedicatedresources available within thedevice by either inference orinstantiation.

The options available within synthesisand place-and-route are:

1. Set the optimization scheme forarea minimization or powerreduction.

2. Examine the design constraintsto ensure they are not overly con-straining the device.

3. Limit the fan-out of high-fan-outpaths.

4. Utilize RAMs for things like statemachines wherever possible.

5. Ensure retiming is enabled.

Hopefully, the above steps will reduceboth the estimated power and the pre-dicted junction temperature. However, ifnecessary you can reduce the junction

temperature by using heat sinks orforced airflow, should the above not besufficient. Doing so may, however, affectthe mechanical design of the project.

ACCURATE ESTIMATESYou can use Xilinx Power Estimator andXilinx Power Analyzer to obtain accuratepower estimations for an FPGA design,and information can be easily sharedbetween the two tools. The accuracy ofthe power estimate will depend on thequality of data from both the simulationand your power-engineering team.

You should also provide sufficientcontingencies to account for require-ments creep and to ensure the solu-tion is capable of addressing thepower on currents and ramp rates ofthe voltage supplies.

Page 52: Xcell Journal issue 83

52 Xcell Journal Second Quarter 2013

TOOLS OF XCELLENCE

A tiny GPS disciplined oscillator from Symmetricom can easily substitute for an atomic clock in your system design. Pair it with aXilinx Zynq SoC to build an NTP server.

What Time Is It?

Page 53: Xcell Journal issue 83

Second Quarter 2013 Xcell Journal 53

What if you need to know what time itis, or what frequency you have, orneed to provide a precise frequencyin your system? A few months back Iobtained a sample of a small assem-bly, measuring roughly 1 x 1 x 0.25inch, that seems to perform magic: aGPS disciplined temperature-com-pensated crystal oscillator (GPS-TCXO) from Symmetricom.

The evaluation board has anRS232-to-USB chip on the mother-board that converts the interfaceto/from the TCXO and its GPS receiv-er device. Installing the software tosupport the USB is probably the onlyreal effort required, other than find-ing a place to stick the magnet baseso that the GPS antenna can viewsome satellites in the sky.

Cellular basestations, radio systemsand computer systems all requireeither precise time, or precise frequen-cies, to operate. One big advantage ofusing a fixed GPS system is that,because it isn’t moving, it gives a bet-ter frequency reference, a more pre-cise location and more precise timeinformation than a system that is notfixed. If you have a mobile applicationsome of that precision gets lost, but byusing the TCXO you still wind up witha really nice system.

WHAT IS PRECISE?An atomic clock is not really a clock,but an atomic oscillator with an activeor passive maser. A cesium clock isthe typical frequency source used forwhat is known as a “primary refer-ence.” It provides an accurate and pre-cise frequency, which varies by no

more than 1E-11. Typical cesium refer-ences operate down to perhaps 1E-12,but have a wander of about +/-100nanoseconds associated with them (butno drift). Such a reference is also some-times referred to as a Stratum 1 clock.

The GPS constellation consists ofmany such precise clocks in orbit(mostly rubidium sources that do notwander, but do drift a tiny bit), steeredand corrected periodically by theworld’s stable time sources. Some 200of the world’s most accurate clocksget “voted” in a complex mathematicalprocess, and then we all agree whattime it was some 30 days ago andapply corrections.

Thus, a 10-MHz temperature-com-pensated crystal oscillator steered bythe GPS constellation becomes a pre-cise Stratum 1 time/frequency/positionreference while locked to the satel-lites, and when the satellites are lost, itthen drifts. However, even this driftmay be anticipated and canceled if youhave a history of the drift and knowl-edge of your temperature (the temper-ature compensation is not perfect, andis easily observed in operation).

NOW, FOR SOME REALLY SMALL NUMBERSAfter a month of operation pluggedinto my desktop machine at work and,from time to time, into a laptop athome, my little board settled into itsroutine. When it first turned on, it waswithin +/-1E-10, but after a few weeksthe inherent drift of my unit was clear-ly visible, about 3.031E-12, which isbeing compensated for, so the deliv-ered precision is less than 3E-12 (Allanvariance[v] for ~6 hours; the Allan, ortwo-sample, variance is a measure offrequency stability in clocks, oscilla-tors and amplifiers). See Figure 1.This is about three times better than acesium clock, which wanders about abit more but would have zero driftunless it were broken. The drift of aTCXO is not constant, because crys-tals do age. So the drift is now down to3.029E-12 after six months.

T O O L S O F X C E L L E N C E

Wby Austin LeseaPrincipal EngineerXilinx [email protected]

Page 54: Xcell Journal issue 83

The pulse-per-second (1-PPS) out-put had a six-month set of equallyimpressive statistics: a median of -397picoseconds (it is averaging just under400 ps of late, which could be removedas a fixed delay) with a variance of1.72E-16.

Such levels of precision easily allowyou to measure the effect of gravityand velocity on time (both of whichthe GPS satellites correct for as theyorbit around the Earth, affected byboth gravity and acceleration).

The delivered frequency is remark-ably stable and constant while it islocked, which is almost 100 percent ofthe time. Why does it not stay lockedall the time? My window faces south-east, so all satellites directly to thenorthwest are blocked from view ofthe antenna. Even so, the time spentwith no satellites in view is only a fewminutes a day and has little to noeffect on the performance.

The receiver needs a minimum offour satellites in view to operate, and itusually sees 10 or 11 in a clear skyview. If the GPS signal is lost, I willstart to see the intrinsic drift of myunit: ~3.03E-12. Since every crystal isdifferent, your unit will have its owndrift. Now that I know that drift isthere, I could modify the 10-MHz out-put with an external synthesizer with areverse correction while the satellitesare not being tracked. I could thenturn off that correction when the satel-lites come back in view.

WHAT ELSE?I am also able to download the latitude,longitude and altitude, and form a long-term running average. After six months,you could say that I really know exact-ly where my antenna is located! Of

54 Xcell Journal Second Quarter 2013

T O O L S O F X C E L L E N C E

In the event that you cannot acquire any satellites,there is another alternative: the Network Time Protocol.

NTP provides time stamping over the Internet.

Figure 2 – Chart demonstrates the spectral purity of the outputs

Figure 1 – The Allan variance from a test run of another unit

Page 55: Xcell Journal issue 83

course, I can see it from my desk, soperhaps I should say I can locate it withreference to WGS84, the WorldGeodetic System standard, on GoogleEarth. If I were in a mobile application,the receiver would also report velocity.

If I needed to make precise meas-urements, I would use the 10 MHz as areference. Most high-end lab equip-ment has a back-panel input for anexternal 10-MHz reference, so theinstruments are able to lock to thatreference, and thus all their measure-ments are traceable back to the inter-national standard.

If I were building a frequencymeter, counter or RF signal source, Iwould put such a module in there,and pretty much forget about everhaving to calibrate it. Of course, you

have to occasionally get a view ofthe sky or place the instrument in thewindow so it can see as many satel-lites as possible. The spectral purityof the outputs is also impressive, asseen in Figure 2 (also from a testreport of another unit).

NO GPS? WHAT THEN?In the event that you cannot acquireany satellites, there is another alter-native: the Network Time Protocol.NTP provides time stamping over theInternet so that you may measuretime or frequency, or control a fre-quency source. So, in a future fre-quency meter one could envision awireless 802g modem for theInternet, a GPS disciplined oscillatorand the necessary control electronics

(perhaps a Xilinx® Zynq™-7000 AllProgrammable SoC device). Such aninstrument would always be accurateand would always know just how accu-rate it was. It would be able to diag-nose itself and report whether it hadany problems. I need a view of the skyfor an hour or two, please….

You could port an NTP server to theZynq SoC platform, as there is publiccode available. Running an NTP serveron one of the ARM® cores in the Zynqdevice under a Linux operating systemis a fairly straightforward task.

For more on the SymmetricomTCXO, see http://www.symmetri-

com.com/. Additional information onthe Zynq-7000 SoC is available athttp://www.xilinx.com/products/silicon-

devices/soc/zynq-7000/index.htm.

T O O L S O F X C E L L E N C E

Second Quarter 2013 Xcell Journal 55

Page 56: Xcell Journal issue 83

56 Xcell Journal Second Quarter 2013

XAPP897: DESIGNING VIDEO STREAMING SYSTEMSUSING ZYNQ-7000 AP SOC WITH FREERTOShttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp897-video-streaming-system-freertos.pdf

This application note by Dinesh Kumar describes how tocreate video systems for Digital Visual Interface (DVI)input and video test pattern generator (TPG) input usingXilinx® native IP in the Zynq™-7000 All Programmable(AP) SoC. The reference design, which is targeted for theZC702 evaluation board, configures video IP cores with aprocessing frame rate of 60 Hz and 1,920 x 1,080 resolu-tion, and displays system-level bandwidth utilization andvideo latency as metrics. In this way, designers can createcomplex, high-performance video systems using the Zynq-7000 AP SoC, with one input from the DVI and one inputfrom the TPG.

This application note demonstrates the application ofFreeRTOS, one of the two recommended operating systemsfor the Zynq-7000 AP SoC (Linux is the other). FreeRTOS isa free operating system that consists of only a few files. It iseasy to port, use and maintain. FreeRTOS supports multiplethreads or tasks, mutexes, semaphores and software timers.In the reference design, the main application runs in one ofthe FreeRTOS threads while another thread is created togradually change the transparency of the on-screen display(OSD) to show the blending effect.

The design uses two AXI video direct-memory access(VDMA) cores to simultaneously move four streams (twotransmit video streams and two receive video streams),each in a 1,920 x 1,080 frame size at 60 frames/second and24 data bits per pixel (RGB). A TPG with a video timingcontroller (VTC) block drives one VDMA, while incomingvideo from DVI-In drives the other. The data from bothstreams to memory map (S2MM) paths of the VDMA coresare buffered into DDR, read back by the MM2S channel ofthe AXI VDMA and sent to a common OSD core that mul-

tiplexes or overlays multiple video streams to a single out-put video stream. The output of the OSD core drives theonboard HDMI video display interface through the colorspace converter.

The reference design is created with the Xilinx PlatformStudio (XPS) in the Vivado™ System Edition 2012.4.Software created with the Xilinx Software Development Kitruns on the ARM® dual-core processor and implements con-trol, status and monitoring functions. The reference designhas been fully verified and tested on hardware.

XAPP1078: SIMPLE AMP RUNNING LINUX AND BARE-METAL SYSTEM ON BOTH ZYNQ SOC PROCESSORShttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp1078-amp-linux-bare-metal.pdf

The Zynq-7000 All Programmable SoC contains two ARMCortex™-A9 processors that can be configured to concur-rently run independent software stacks or executables. Thisapplication note describes a method of starting up bothprocessors, each running its own operating system andapplication, and allowing the two to communicate throughshared memory.

The Zynq-7000 SoC’s Cortex-A9 processors share com-mon memory and peripherals. Asymmetric multiprocess-ing (AMP) is a mechanism that allows both processors torun their own operating systems or bare-metal applica-tions with the possibility of loosely coupling those appli-cations via shared resources. The reference designincludes the hardware and software necessary to run bothCortex-A9 processors in an AMP configuration: CPU0runs Linux and CPU1 runs a bare-metal application.Author John McDougall has taken care to prevent theCPUs from conflicting on shared hardware resources.This document also describes how to create a bootablesolution and how to debug both CPUs.

XAMPLES. . .

Application NotesIf you want to do a bit more reading about how our FPGAs lend themselves to a broad number of applications,we recommend these application notes.

Page 57: Xcell Journal issue 83

Second Quarter 2013 Xcell Journal 57

XAPP890: ZYNQ ALL PROGRAMMABLE SOC SOBEL FILTER IMPLEMENTATION USING THE VIVADO HLS TOOLhttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp890-zynq-sobel-vivado-hls.pdf

This application note describes how to generate the Sobeledge-detection filter in the Zynq-7000 All Programmable SoCZC702 Base Targeted Reference Design (TRD) using theVivado high-level synthesis (HLS) tool. The techniquesdescribed by authors Fernando Martinez Vallina, ChristianKohn and Pallav Joshi present the fundamental flow for inte-grating an IP block generated by Vivado HLS into a ZynqSoC-based system.

The Vivado HLS tool provides a methodology for migrat-ing algorithms from a processor onto the FPGA logic. In thecontext of Zynq devices, this means moving code from theARM dual-core Cortex-A9 processor to the FPGA logic foracceleration. The code implemented with the HLS tool in hard-ware represents the computational bottleneck of the algo-rithm. For this application note, the computational bottleneckis the Sobel edge-detection algorithm running at 60 frames persecond on a resolution of 1080p. The authors describe how totake a C description of the algorithm, generate RTL with theHLS tool and integrate the resulting block into a hardware sys-tem design. While this application note focuses on the Sobel IPblock, the techniques described apply to all designs in which aVivado tool-generated IP block is integrated into a Zynq SoC.

XAPP896: SMPTE2022-5/6 HIGH-BIT-RATE MEDIATRANSPORT OVER IP NETWORKS WITH FORWARDERROR CORRECTION ON KINTEX-7 FPGAShttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp896-k7-smpte2022-56-fec.pdf

This application note by Gilbert Magnaye, Josh Poh, MyoTun Aung and Tom Sun covers the design considerations ofa video-over-IP network system using the performance fea-tures of the LogiCORE™ IP SMPTE 2022-5/6 video-over-IPtransmitter and receiver cores. The design focuses on high-bit-rate native media transport over 10-Gbit/second Ethernetwith a built-in forward error correction engine. The design isable to support up to three SD/HD/3G SDI streams.

The reference design has two platforms: transmitter andreceiver. The transmitter platform design uses threeLogiCORE SMPTE SDI cores to receive the incoming SDIvideo streams. The SMPTE 2022-5/6 video-over-IP transmit-ter core receives the SDI streams, multiplexes them andencapsulates them into fixed-size datagrams before send-ing them out through the LogiCORE IP 10-Gigabit EthernetMAC. On the receiver platform, the Ethernet datagrams arecollected at the 10-Gigabit Ethernet MAC.

The DDR traffic passes through the AXI interconnect tothe AXI 7 series memory controller. A MicroBlaze™processor is included in the design to initialize the coresand read the status. The reference design targets theXilinx Kintex™-7 FPGA KC705 evaluation kit.

XAPP596: 4K2K UPCONVERTER REFERENCE DESIGNhttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp596-4k2k-up-converter-ref-design.pdf

In the digital display market, the next innovation wave—ultrahigh-definition (UHD) 4K2K—is now emerging. Gettingto market faster than the competition with 4K2K viewingexperiences is the challenge for product developmentdesigners. Xilinx Kintex-7 FPGA Display Targeted ReferenceDesigns give designers immediate access to the power effi-ciency and price/performance of 28-nanometer 7 seriesFPGA devices for efficiently managing increased bandwidthand algorithm complexity. An upconverter is among threeXilinx reference designs that will enable customers to con-centrate on product differentiation by providing basic infra-structure for 4K2K digital display signal processing.

The HDTV-to-4K2K upconverter reference designdescribed in this application note by Yasushi Tatehiraenables upconversion from 1080p high-definition TV to4K2K progressive images. The upconversion results inshowing HDTV content, which is very popular in broadcast-ing and packaged media, on a 4K x 2K flat-panel display. Thereference design is built from two pieces of LogiCORE IP:the video scaler and the on-screen display (OSD).

This reference design, which is part of the Kintex-7FPGA-based Display Targeted Design Platform, makes itpossible to show HDTV content on a 4K2K monitor or4K2K flat panel, providing design engineers with the basefor their product designs. With this foundation address-ing the mandatory upconversion, engineers can focustheir efforts on differentiating their 4K2K digital TVs, dis-plays and projectors.

XAPP891: AXI USB 2.0 DEVICE:DEMONSTRATING PERFORMANCE FOR BULKAND ISOCHRONOUS TRANSFERShttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp891-7series-axi-usb-2-0.pdf

This application note demonstrates the performancemeasurement of the Xilinx USB 2.0 high-speed devicewith AXI for bulk and isochronous transactions. The testsystem generated is based on a Kintex-7 FPGA. The per-formance is measured with two separate host drivers forbulk and isochronous transactions. Authors Ravi KiranBoddu and Dinesh Kumar describe how to develop a

Page 58: Xcell Journal issue 83

58 Xcell Journal Second Quarter 2013

USB system and corresponding ELF files for these bulkand isochronous transactions.

The AXI USB 2.0 device enables USB connectivity for adesign using minimal resources. This interface is suitablefor USB-centric, high-performance designs, bridges andlegacy port replacement operations. The USB 2.0 protocolmultiplexes many devices over a single, half-duplex serialbus. Running at 480 Mbits/second (high speed) or at 12Mbps (full speed), the AXI USB 2.0 device is designed to beplug-and-play. The host controls the bus and sends tokens tothe device specifying the required action.

The AXI USB 2.0 device supports up to eight end-points. Endpoint 0 is used to enumerate the device withcontrol transactions. The seven remaining user endpointscan be configured as bulk, interrupt or isochronous. Also,endpoints can be configured as input (to the host) or out-put (from the host). The user endpoints’ data buffers areunidirectional and are configured by the configuration-and-status register of the respective endpoint. The size ofthe buffers can be configured from 0 to 512 bytes forbulk, 64 bytes for interrupt and up to 1,024 bytes forisochronous endpoints.

XAPP554: XADC LAYOUT GUIDELINEShttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp554-xadc-layout-guidelines.pdf

The Xilinx analog-to-digital converter (XADC) is a precisionmixed-signal measurement system. As with any othermixed-signal/analog circuitry, a suboptimum printed-circuitboard (PCB) layout can have a big impact on performance.In this application note, author Cathal Murphy details anumber of simple layout guidelines to help a board designerachieve the best possible performance from the XADC.

The first step is to stagger the via associated with eachBGA ball to maximize the routing space for differential sig-nals and shields. Then:

• Route the N and P lines together at the minimum PCBspacing where possible, to maximize the benefits of dif-ferential sampling that the XADC offers.

• Place all decoupling and anti-aliasing capacitors asclose to the FPGA pins as possible.

• Minimize the impedance on the power supply andground lines.

• Minimize any common impedance between the RefNline and the GNDADC of the XADC, preferably using astar connection.

Following these guidelines will maximize the probability ofachieving excellent XADC results on the first board iteration.

XAPP1077: PHYSICAL LAYER HDMI RECEIVER USING GTP TRANSCEIVERShttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp1077-phy-hdmi-rx-gtp.pdf

The High-Definition Multimedia Interface (HDMI) I/Ostandard utilizes 3.3-volt terminated transition-minimizeddifferential signaling (TMDS). Although TMDS signalingcan be received natively using the Spartan®-6 FPGASelectIO™ interface, the GTP transceivers can increaseperformance. The focus of this application note is on tech-niques to enable systems using the higher-bandwidthcapability of GTP receivers.

It is possible to use an external passive network toadapt the GTP transceivers to receive a signal that com-plies with HDMI technology. Authors Paolo Novelliniand Rodney Stewart present, analyze and compare twoalternative passive networks from both a theoreticaland a practical point of view. The authors tested bothnetworks on a custom development board to exploretheir signal integrity limits and to confirm theoreticalexpectations. Although the results are general, the chip-to-chip use case is the one they primarily consider inthis application note. The target data rate used for HDMIis 1.45 Gbps.

XAPP586: USING SPI FLASH WITH 7 SERIES FPGAShttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp586-spi-flash.pdf

This application note by Arthur Yang describes the advan-tages of selecting a serial peripheral interface (SPI) flashas the configuration memory storage for the Xilinx 7series FPGAs. The document includes the required con-nections between the FPGA and the SPI flash memory,and the details necessary to select the proper SPI flash.

The SPI flash is a simple, low-pin-count solution forconfiguring 7 series FPGAs. Support of indirect program-ming enhances ease of use by allowing in-system pro-gramming updates through reuse of connections alreadyrequired for the configuration solution. Although someother configuration options permit faster configurationtimes or higher density, the SPI flash solution offers agood balance of speed and simplicity.

XAPP797: THROUGHPUT PERFORMANCEMEASUREMENThttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp797.pdf

This application note discusses the SPI bandwidth meas-urement for 1 Mbyte of data, writing and reading from the

XAMPLES. . .

Page 59: Xcell Journal issue 83

Second Quarter 2013 Xcell Journal 59

SPI flash in the Enhanced Quad mode of the AXI Quad SPIIP core. The document is based on using the KC705 boardwith Numonyx SPI memory, which, with a few modifica-tions in the software example files, can be tested on anyother board.

Authors Sanjay Kulkarni and Prasad Gutti show howto measure the performance of the system by writingand reading 1 Mbyte of data to and from SPI flash. Thesesystems are built using Xilinx Platform Studio (XPS),v14.4, which is part of the ISE® Design Suite SystemEdition. The design also includes software, built usingthe Xilinx Software Development Kit, that runs on aMicroBlaze processor subsystem and implements con-trol, status and monitoring functions. The main focus ofthis application note is to measure the SPI bandwidthwhere the core is configured in Quad SPI mode with anSPI clock rate of 40 MHz.

XAPP1084: DEVELOPING TAMPER-RESISTANTDESIGNS WITH XILINX VIRTEX-6 AND 7 SERIES FPGAShttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp1084_tamp_resist_dsgns.pdf

Keeping one step ahead of the adversary, whether mili-tary or commercial, is a continuous process that involvesunderstanding the potential vulnerabilities and attacks,and then developing new mitigation techniques or coun-termeasures to combat them. By taking advantage of var-ious Xilinx FPGA anti-tamper (AT) features, a systemsengineer can choose how much AT to include with theFPGA design, enabling individual silicon AT features orcombining a number of them.

This application note provides anti-tamper guidanceand practical examples to help the FPGA designer pro-tect the intellectual property (IP) and sensitive data thatmight exist in an FPGA-enabled system. Tamper resist-ance needs to be effective before, during and after theFPGA has been configured by a bitstream. Sensitive datacan include the configuration data that sets up the func-tionality of the FPGA logic, critical data or parametersthat might be included in the bitstream, along with exter-nal data that is dynamically brought in and out of theFPGA during post-configuration normal operation.

Author Ed Peterson summarizes the silicon AT fea-tures available in the Virtex-6 and 7 series devices andoffers guidance on various methods you can employ toprovide additional tamper resistance.

XAPP739: AXI MULTI-PORTED MEMORY CONTROLLERhttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp739_axi_mpmc.pdf

Designers use a multiported memory controller (MPMC) inapplications where several devices share a common memo-ry controller. This is often a requirement in many video,embedded and communications applications, where datafrom multiple sources moves through a common memorydevice, typically DDR3 SDRAM memory. This applicationnote by Khang Dao and Dylan Buli demonstrates how to cre-ate a basic DDR3 MPMC design using the ISE Design SuiteLogic Edition tools, including Project Navigator and COREGenerator™. The idea is to create a high-performanceMPMC by combining the Memory Interface Generator (MIG)IP block and the AXI Interconnect IP block, both provided inthe ISE Design Suite Logic Edition.

The AXI interfaces used in this example design consist ofAXI4, AXI4-Lite and AXI4-Stream, all of which provide a com-mon IP interface protocol framework for building the sys-tem. The example design, a full working hardware system onthe Virtex-6 FPGA ML605 evaluation platform board, imple-ments a simple video system in which data from a video testpattern generator loops in and out of memory multiple timesbefore being sent to the DVI display on the board. The DDR3memory therefore acts as a multiported memory shared bymultiple video frame buffers.

XAPP593: DISPLAYPORT SINK REFERENCE DESIGNhttp://www.xilinx.com/support/documentation/applica-

tion_notes/xapp593_DisplayPort_Sink.pdf

This application note by Arun Ananthapadmanaban andVamsi Krishna describes the implementation of aDisplayPort™ sink core and policy maker referencedesign targeted for a MicroBlaze processor in the Spartan-6FPGA Consumer Video Kit. The reference design is a loop-through system that receives video from a DisplayPortsource via the receive link, buffers the video data andretransmits it over the DisplayPort transmit link. The pol-icy maker performs several tasks, such as initialization ofGTP transceiver links, register probing and other featuresuseful for bring-up and use of the core. The applicationcontrols both the sink and source of the reference designand communicates with the monitor (sink) connected onthe transmit port of the reference design using the auxil-iary channel. The reference design uses DisplayPortsource and sink cores generated from the Xilinx COREGenerator tool, along with a policy maker and framebuffer logic using external memory.

Page 60: Xcell Journal issue 83

T he Xilinx® Alliance Pro-

gram is a worldwide

ecosystem of qualified

companies that collaborate with

Xilinx to further the develop-

ment of All Programmable tech-

nologies. Xilinx has built this

ecosystem, leveraging open

platforms and standards, to

meet customer needs and is

committed to its long-term suc-

cess. Alliance members—includ-

ing IP providers, EDA vendors,

embedded software providers,

system integrators and hard-

ware suppliers—help accelerate

your design productivity while

minimizing risk. Here are reports

from five of these members.

CUSTOM VISIONALGORITHMS FROMMATHWORKS

MathWorks (Natick, Mass.), providerof the MATLAB® and Simulink®

tools, offers several add-ons integralfor use with Xilinx hardware forembedded vision. The company’sComputer Vision System Toolboxprovides algorithms and tools for thedesign and simulation of computer-vision and video-processing systems.HDL Coder generates portable, syn-thesizable Verilog and VHDL codefrom MATLAB functions, Simulinkmodels and Stateflow charts. HDLVerifier automates the verification ofHDL code on Xilinx FPGA boards by

enabling FPGA-in-the-loop (FIL) test-ing. In addition, MathWorks recentlyreleased a hardware support packagefor working with the Xilinx Zynq™-7000 All Programmable SoC.

“Smarter vision applications areconstantly evolving, adding new fea-tures like analytics, recognition andtracking,” said Ken Karnofsky, seniorstrategist at MathWorks “These fea-tures require improved system per-formance and greater design flexibilityfor hardware and software implemen-tation. Over the last few years,MathWorks and Xilinx have enabled

custom visiona l g o r i t h m sdeveloped withMATLAB andSimulink to beeasily integrat-ed with avail-able IP andimplementedon Xilinx hard-ware, including

the Zynq-7000 SoC for hardware/soft-ware coprocessing. This workflowaccelerates development by bridgingthe divide between algorithm explo-ration and system implementation.”

Please visit www.mathworks.com/

zynq for more information.

NI LABVIEW SIMPLIFIESFPGA PROGRAMMING

National Instruments’ LabVIEWchanges the rules of FPGA program-ming, delivering new technologies

that convert graphical block diagramsinto digital hardware circuitry, simplify-ing development and shortening time-to-market for advanced control, moni-toring and test applications.

The Austin, Texas, company offersframe grabbers with a user-programmableFPGA in the image path as well as embed-ded systems, such as CompactRIO, thathave image-processing capabilities com-bined with FPGA-enabled I/O.CompactRIO combines an embedded con-troller for communication and processing,

plug-in I/O mod-ules and a recon-figurable chassishousing the user-programmableFPGA. Systemdesigners use theLabVIEW graphi-cal developmentplatform for pro-gramming float-

ing-point processors and FPGAs for imageprocessing, digital and analog I/O, com-munication and more within a singledevelopment environment.

“Xilinx All Programmable FPGA tech-nology is at the center of the NI LabVIEWRIO architecture,” said Jamie Smith,director of embedded systems marketingat NI. “LabVIEW lets engineers who arefamiliar with FPGAs take advantage oftheir inherent benefits of performance,versatility and determinism.”

Learn more about NI’s approachto FPGA-based design athttp://www.ni.com/fpga/.

60 Xcell Journal Second Quarter 2013

Latest and Greatest from theXilinx Alliance Program PartnersXpedite highlights the latest technology updates

from the Xilinx Alliance partner ecosystem.

XPEDITE

Ken KarnofskyMathWorks

Jamie Smith National Instruments

Page 61: Xcell Journal issue 83

ZYNQ SoC-BASED VIDEOENGINE FROM OMNITEK

The OSVP IP block from IP vendorOmniTek (Basingstoke, U.K.) pro-vides complete multivideo formatconversion and compositing. Allinterfaces conform to ARM®’sAXI4 protocol. The IP supports upto eight channels of simultaneouschroma upsample, color-spaceconversion, deinterlacing andresizing, with overlay graphicscapability.

The OZ745 is a video develop-ment platform based around theXilinx Zynq-7045 SoC. The kitincludes all the basic componentsof hardware, design tools, IP, pre-verified reference designs and aboard support package to rapidlydevelop video- and image-process-ing designs.O m n i T e kalso suppliessoftware andfirmware IPcores alongwith designservices tofurther accel-erate time-to-market.

“At OmniTek, we are extremelyproud of our design team’s skills inimage-processing algorithm designfor optimum FPGA and CPU parti-tioning,” said managing directorRoger Fawcett. “The latest ZynqSoC devices provide a single-chipsolution for such designs. We haveexploited this efficient architec-ture in the Real Time Video Engine2.1, which combines highly opti-mized Xilinx and OmniTek FPGAand software IP.”

Please visit http://www.omnitek.tv/

sites/defaul/files/OSVP.pdf andhttp://www.omnitek.tv/sites/defa

ult/files/OZ745.pdf for moreinformation.

NEWS FROM THE EMBEDDEDVISION ALLIANCE

The Embedded Vision Alliance(Walnut Creek, Calif.) is an industrypartnership formed to inspire andempower design engineers to createmore capable and responsive prod-ucts through integration of visioncapabilities.

According to the alliance’s founder,Jeff Bier, computer vision has longbeen a niche technology, becausecomputer vision equipment has beenbig, expensive and complex to use.Recently, however, products like theMicrosoft Kinect and vision-basedautomotive safety systems havedemonstrated that computer visioncan now be deployed even in cost-sensitive applications, and in waysthat are easy for nonspecialists to use.

“We use the term ‘embedded vision’to refer to the incorporation of visualintelligence into a wide range of sys-tems, creating ‘machines that see andunderstand,’” said Bier. “One of thekey factors enabling the current tran-sition from the niche technology ofcomputer vision to pervasive embed-ded vision is more capable and effi-cient proces-sors. Xilinx hasbeen a leaderin this realm,d e v e l o p i n gnew devices,like the Zynq-7000 SoC fami-ly, that deliveran enormousamount of pro-grammable processing power in avery modest price, power and sizeenvelope.”

The Embedded Vision All ianceprovides training videos, tutori -al articles, code examples andan array of other resources (allfree of charge) on its website,www.Embedded-Vision.com.

VIDEO AND GRAPHICS IP FROM XYLON

The power of the logicBRICKSvideo IP cores from Xylon (Zagreb,Croatia) are best demonstrated bythe logiADAK Automotive DriverAssistance kit. Built around theXilinx Zynq-7000 All ProgrammableSoC, this kit makes it easy to com-bine hardware-accelerated video-processing features implemented inprogrammable logic with high-level,complex control algorithms run-ning on the ARM processor systemin a single chip.

G r a p h i c scores for full-range imple-mentations of2D and 3Dgraphics pro-cessing units(GPUs) seam-lessly blendwith the videoIP cores andcan be immediately used with differ-ent operating systems (e.g., Linux,Microsoft Windows EmbeddedCompact 7 and Android).

“More than 15 years ago, whenwe started developing our logic-BRICKS IP cores for video process-ing with Xilinx FPGAs, no one couldimagine today’s smart vision appli-cations running on a single XilinxAll Programmable device,” saidDavor Kocacec, CEO of Xylon. “Oureasy-to-use IP cores, software anddesign services enable our cus-tomers to quickly build large por-tions of their next Xilinx-based SoCand fully concentrate on key differ-entiating features.”

More information about Xylonproducts and solutions for embed-ded vision, as well as a number ofpreverified reference designs fordownload, can be found atwww.logicbricks.com.

Second Quarter 2013 Xcell Journal 61

Roger FawcettOmniTek

Davor KocacecXylon

Jeff Bier Embedded Vision

Alliance

Page 62: Xcell Journal issue 83

VIVADO IP INTEGRATOR:ACCELERATED TIME TOIP CREATION AND INTEGRATION

To accelerate the creation of highlyintegrated, complex designs in AllProgrammable FPGA devices, Xilinxhas delivered the early-accessrelease of the Vivado IP Integrator(IPI). Vivado IPI accelerates the inte-gration of RTL, Xilinx® IP, third-partyIP and C/C++ synthesized IP. Basedon industry standards such as theARM® AXI interconnect and IP-XACT metadata for IP packaging,Vivado IPI delivers intelligent cor-rect-by-construction assembly ofdesigns co-optimized with Xilinx AllProgrammable solutions.

Built on the foundation of theVivado Design Suite, IP Integrator is adevice- and platform-aware interactive,graphical and scriptable environmentthat supports IP-aware automated AXIinterconnect, one-click IP subsystemgeneration, real-time DRC, interfacechange propagation and a powerfuldebug capability. When targeting aZynq™-7000 All Programmable SoC,embedded design teams can now morerapidly identify, reuse and integrateboth software and hardware IP target-ed for the dual-core ARM processingsystem and high-performance FPGA

fabric. To obtain an early-accesslicense, please contact your local salesrepresentative.

To see a video of the IP Integratorcreating an IP subsystem, please visithttp://www.xilinx.com/training/

vivado/creating-ip-subsystems-

with-vivado-ip-integrator.htm.

VIVADO HIGH-LEVEL SYNTHESIS LIBRARYENHANCEMENTS

To accelerate C/C++ system-leveldesign and high-level synthesis(HLS), Xilinx has enhanced its VivadoHLS libraries with support for indus-try-standard floating-point math.hoperations and real-time video-pro-cessing functions. More than 350active users and upwards of 1,000customers evaluating Vivado HLSwill now have immediate access tovideo-processing functions integrat-ed into an OpenCV environment forembedded vision running on thedual-core ARM processing system.The resulting solution enables up to a100x improvement in the perform-ance of existing C/C++ algorithmsthrough hardware acceleration. Atthe same time, Vivado HLS acceler-ates system verification and imple-mentation times by up to 100x com-pared with RTL design entry flows.

When targeting a Zynq-7000 AllProgrammable SoC, design teams cannow more rapidly develop C/C++ codefor the dual-core ARM processing sys-tem, while compute-intensive func-tions are automatically accelerated inthe high-performance FPGA fabric.

VIVADO DESIGN SUITEDEVICE SUPPORT

Vivado now supports Zynq-7000 AllProgrammable SoC devices, includingthe 7Z030 and 7Z045 (supportrequires early access to the IPIntegrator). The Vivado Design Suitesupports all 7 series devices and the2013.1 release includes these updates:

■ Production-ready: Virtex®-77VX690T, 7VX1140T, 7VX330T,7VX415T and 7VX980T

■ Defense-grade: Kintex™-7Q(7K325T and 7K410T) andVirtex-7Q (7V585T and 7VX485T)

■ General ES ready: Virtex-77VH580T and 7VH870T

The Vivado Design Suite, Web-PACK™ Edition is a free downloadthat provides support for Artix™-7(100T and 200T) and Kintex-7 (70Tand 160T) devices, and for Zynq-7000All Programmable SoC (early-accessZ010, Z020 and Z030) devices.

62 Xcell Journal Second Quarter 2013

What’s New in theVivado 2013.1 Release? The Vivado™ Design Suite provides a highly integrated design environment with

a completely new generation of system-to-IC-level features, including high-level

synthesis, analytical place-and-route and an advanced timing engine. These tools

enable developers to increase design integration and implementation productivity.

XTRA, XTRA

Page 63: Xcell Journal issue 83

www.trenz-electronic.de

difference by design

Platform Features• 4×5 cm compatible footprint• up to 8 Gbit DDR3 SDRAM• 256 Mbit SPI Flash• Gigabit Ethernet• USB option

All ProgrammableFPGA and SoC modules

4form x factor

5

Available SoMs:

Design Services• Module customization• Carrier board customization• Custom project development

rugged for harsh environmentsextended device life cycle

Q: WHAT ARE THE DIFFERENT EDITIONS OF THE VIVADO

DESIGN SUITE?

A: The Vivado Design Suite is now available in three editions: Design,System and WebPACK—the no-cost, device-limited version of the

Vivado Design Suite Design Edition. In-warranty Vivado Design Suite customers will receive the ISE® Design Suite

at no additional cost.The Vivado and ISE Design Suites are now separate downloads and

installers. The Vivado Design Edition includes the ISE Embedded Edition; theVivado System Edition includes the ISE System Edition, which also includesVivado High-Level Synthesis and System Generator for DSP.

For license generation, please visit www.xilinx.com/getlicense.

Q: SHOULD I CONTINUE TO USE THE ISE DESIGN SUITE

OR MOVE TO VIVADO?

A: ISE Design Suite is an industry-proven solution for all generations ofXilinx’s All Programmable devices. The Xilinx ISE Design Suite contin-

ues to bring innovations to a broad base of developers, and extends the familiardesign flow for 7 series and Xilinx Zynq-7000 All Programmable SoC projects.ISE 14.5, which brings new innovations and contains updated device support, isavailable for immediate download.

The Vivado Design Suite 2013.1, Xilinx’s next-generation design environment,supports 7 series devices and Zynq All Programmable SoC (early-access) devices.It offers enhanced tool performance, especially on large or congested designs.

Xilinx recommends that customers starting a “new” design contact theirlocal FAE to determine if Vivado is right for the design. Xilinx does not recom-mend transitioning during the middle of a current ISE Design Suite project, asdesign constraints and scripts are not compatible between the environments.

For more information, please read the Vivado 2013.1 and ISE 14.5release notes.

Q: IS VIVADO DESIGN SUITE TRAINING AVAILABLE?

A: Vivado takes full advantage of industry standards such as powerful inter-active Tcl scripting, Synopsys Design Constraints, SystemVerilog and

more. To reduce your learning curve, Xilinx has rolled out new instructor-ledclasses that will show you how to use the Vivado tools. To learn more aboutinstructor-led courses, please visit www.xilinx.com/training.

Q: IS THERE A WAY TO EXPLORE THE FEATURES OF

THE VIVADO DESIGN SUITE ONLINE?

A: Vivado Quick Take Tutorials provide a fast review of specific featuresof the new design environment. Topics include flow overview, system-

level design, synthesis, design analysis, I/O planning and high-level synthesis.New topics are added regularly. Visit www.xilinx.com/training/vivado

Second Quarter 2013 Xcell Journal 63

Page 64: Xcell Journal issue 83

64 Xcell Journal Second Quarter 2013

Xpress Yourself in Our Caption Contest

H eavens to Murgatroyd! If you’re looking to Xercise your funnybone, here’s your opportunity. We advise readers to duck forcover before taking on our verbal challenge and submitting

an engineering- or technology-related caption for this cartoon showing anasteroid about to crash into an engineering lab. The image might inspire acaption like “3D printing experiment run amok.”

Send your entries to [email protected]. Include your name, job title, companyaffiliation and location, and indicate that you have read the contest rules atwww.xilinx.com/xcellcontest. After due deliberation, we will print the sub-missions we like the best in the next issue of Xcell Journal. The winner willreceive an Avnet Zedboard, featuring the Zynq™-7000 All Programmable SoC(http://www.zedboard.org/). Two runners-up will gain notoriety, fame and acool, Xilinx-branded gift from our swag closet.

The contest begins at 12:01 a.m. Pacific Time on April 15, 2013. All entriesmust be received by the sponsor by 5 p.m. PT on July 1, 2013.

So, get writing!

WARREN TUSTIN, a design engineer at Agilent Technologies(Colorado Springs, Colo.), won ashiny new Avnet Zedboard with

this caption for the Tarzan cartoonin Issue 82 of Xcell Journal:

Congratulations as well to

our two runners-up:

“Tarzan switches careers from ape man to code monkey.”

– Joseph Trebbien, software engineer

“You know things are slow in theValley when the headhunters start

making house calls.”

– David Pariseau, Technology Plus, Inc.

(Los Altos, Calif.)

“It always was a bit noisy in the lab at the start of swing shift.”

DA

NIE

LG

UID

ER

AXCLAMATIONS!

NO PURCHASE NECESSARY. You must be 18 or older and a resident of the fifty United States, the District of Columbia, or Canada (excluding Quebec) to enter. Entries must be entirely original. Contest begins onApril 15, 2013. Entries must be received by 5:00 pm Pacific Time (PT) on July 1, 2013. Official rules available online at www.xilinx.com/xcellcontest. Sponsored by Xilinx, Inc. 2100 Logic Drive, San Jose, CA 95124.