
Comparison of Different Design Methodologies of Hardware-based Image Processing for Automation in Microrobotics

Tobias Tiemerding1, Claas Diederichs1, Christian Stehno2 and Sergej Fatikow1

Abstract— Object detection and classification is a key task in micro- and nanohandling. The microscopy image is often the only available sensor to detect information about the positions and orientations of objects. FPGA-based image processing is superior to state-of-the-art PC-based image processing in terms of achievable update rate, latency and jitter. Development of specialized FPGA solutions for object detection and tracking using a hardware description language is time consuming and requires deep knowledge of the target system. Using SystemC, a C++ based class library, fast implementation with less knowledge of the system is possible. However, the gain in development speed is accompanied by higher resource usage of the FPGA and lower performance regarding computation speed. In this paper, several image processing algorithms are implemented in both manners and compared regarding achievable update rate, resource consumption and development time. The SystemC implementations could be completed twice as fast and with less knowledge about the system. The increased resource usage of SystemC is negligible for small-series and prototyping applications, while the decreased computation speed may be problematic for high-speed applications.

I. MOTIVATION

Vision-based object detection and classification is a key task in micro- and nanohandling, as the microscopy image is often the only available sensor in the system. The image information is used to detect and classify objects and specimens. It is also used to detect the position of the manipulator if the used positioning system has no internal sensor or the internal sensor's resolution is not sufficient for the task. Vision-based tracking of the manipulator for closed-loop positioning is called visual servoing. State of the art is to use PC-based computer vision for both tasks. Several algorithms are available for these tasks, such as template matching with cross correlation or active contours [1].

PC-based image processing has several drawbacks for visual servoing, as discussed in [2]. The speed and quality of closed-loop control is directly connected to the speed and the quality of the sensors. Three timing quality characteristics of an optical sensor are update rate, latency and jitter.

• The sensor's update rate is a limiting factor for the closed-loop control of a highly dynamic system. For vision-based sensor systems, the update rate is comparatively low, because a full image must be acquired and then transferred to a PC. Common USB or FireWire cameras have update rates of 10 to 30 Hz.

1Division Microrobotics and Control Engineering, University of Oldenburg, Ammerländer Heerstraße 114-118, D-26111 Oldenburg, Germany {tobias.tiemerding, claas.diederichs, fatikow} at uni-oldenburg.de

2CoSynth Embedded Systems GmbH & Co. KG, Escherweg 2, D-26121 Oldenburg, Germany stehno at cosynth.com

• The latency of a sensor describes the age of a sensor value. With high latency, the closed-loop control works with old data. Vision-based sensors have a high latency because the object position can only be calculated after a full image has been captured. The latency of vision-based object tracking is usually at least one update interval.

• Jitter is the time variation in a periodic signal (e.g. the update rate), adding an uncertainty for closed-loop control. Jitter is a main problem in software-based object tracking on general-purpose CPUs because standard operating systems have an unpredictable scheduling of tasks.

Fig. 1: Nanorobotic handling cell

Hardware-based image processing using FPGAs can overcome these drawbacks. It was successfully used for high-speed visual servoing of mobile microrobots and nanostages [3], [4]. Figure 1 shows a nanorobotic cell with hardware-based image processing for pick-and-place handling of microspheres as used in [5].

However, FPGAs need dedicated hardware description languages (HDLs) such as VHDL1 or Verilog to be programmed [6], [7]. These languages model a hardware description including, e.g., parallelism and clocking. Therefore, they operate at a very low abstraction layer and require deep knowledge of hardware design and syntax [8]. In addition, the low abstraction layer is more error-prone. The overall design process using HDLs is significantly more complicated and time consuming. "Compared to software-based languages, HDLs are a lot like programming in assembly language" [8], which makes a broad exploration of the design space infeasible.

New design processes based on software-based languages have been developed in recent years. The idea is to implement software algorithms and automatically transform them into a hardware description using a technique called

1Very-high-speed integrated circuit hardware description language

2013 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), Wollongong, Australia, July 9-12, 2013

978-1-4673-5320-5/13/$31.00 ©2013 IEEE 565


high level synthesis. This raises the level of abstraction; the design process becomes more accessible and requires less time, thus reducing the overall development costs [9]. Software-based languages such as C++ are often used as a base. This is justified by their wide spread and the assumption that it is possible to turn every software developer into a hardware developer [10], [9]. The C++ based class library SystemC follows the syntax modification approach of software-based languages and enables the description of hardware. This aims to overcome the differences between soft- and hardware [11], [10], [8]: modeling concurrency, providing a timing model, using special data types such as bit vectors, creating a memory model and supporting communication patterns. The design methodology used for this paper is described in further detail in Section II.

This paper evaluates C++ with SystemC (hereinafter referred to as SystemC) for image processing algorithms targeting FPGAs. In Section III, the image processing algorithms are presented. The following Section IV describes the implementation of these algorithms, both in SystemC and VHDL, which represents the quasi-standard for modeling hardware descriptions. In addition, this paper presents an approach to raise the level of abstraction even further for the development of image processing algorithms, by introducing a C++ class library based on SystemC. The implementations are compared in terms of development time, performance and resource consumption on the FPGA in Section V. The paper concludes with a discussion in Section VI and an outlook on future work in Section VII.

II. SYSTEM LEVEL DESIGN WITH SYSTEMC

Today, image processing systems are usually developed as a software prototype first. Implementing this initial version on an embedded device, here an FPGA, requires intensive engineering and is a long and expensive task. Electronic system level (ESL) methodologies facilitate this task by raising the low-level hardware engineering to simpler software design tasks. The formerly separated hardware and software flows are unified. ESL synthesis automates the step from software models to the hardware implementation. Thus, it is possible to create an early hardware prototype of an FPGA-based system using an ESL approach with automated synthesis from the software prototype into the hardware.

The evaluated approach is built around the CoSynth Synthesizer and a SystemC based development flow. It starts with a pure software design, where initially the functional specification of the system is implemented as a plain C++ program. This first implementation is augmented with details about the hardware architecture and eventually also with timing requirements and an implicit schedule. These hardware-specific extensions are provided by SystemC and are integrated with the functional C++ implementation. This step results in another executable program, which is still a usual C++ program and makes use of the freely

available SystemC runtime library2. The program can be run on all PCs and simulated and optimized with common development tools. As a last step, the CoSynth Synthesizer automatically generates the hardware description required for the FPGA directly from the software model. Function and timing specifications are exactly reproduced, such that the previously tested properties are preserved by the hardware.

III. IMAGE PROCESSING ALGORITHMS

Image processing algorithms can be categorized with regard to the pixel access method: random or non-random. Random access algorithms require access to arbitrary pixels at random times. This requires the image, sometimes several MB in size, to be stored in memory. FPGAs often do not have large memories3. Therefore, non-random algorithms are preferred, e.g. point operations or local filters [8].

This section presents the algorithms that were chosen for comparison. They are all local filters, where the output Q not only depends on one pixel value but also on values within a neighborhood (see equation (1) and figure 2). W is the neighborhood centered on I[x, y]. The function f determines how to use the values inside the neighborhood.

Fig. 2: A local filter. The gray area denotes the input located around the source pixel x that is used to calculate the filtered value for the corresponding location in the output image [8].

Q[x, y] = f(I[x, y], . . . , I[x + ∆x, y + ∆y]),  (∆x, ∆y) ∈ W    (1)
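As an illustrative software sketch (not from the paper, whose filters run as FPGA pipelines), a 3 × 3 mean filter is one instance of equation (1), where f averages the neighborhood W = {−1, 0, 1} × {−1, 0, 1}:

```cpp
#include <cstdint>
#include <vector>

// 3x3 mean filter: one instance of equation (1), where f averages the
// neighborhood W. Border pixels are handled by clamping coordinates.
// (Illustrative sketch only.)
std::vector<uint8_t> mean_filter_3x3(const std::vector<uint8_t>& img,
                                     int width, int height) {
    std::vector<uint8_t> out(img.size());
    auto clamp = [](int v, int lo, int hi) {
        return v < lo ? lo : (v > hi ? hi : v);
    };
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy)       // iterate over W
                for (int dx = -1; dx <= 1; ++dx) {
                    int sx = clamp(x + dx, 0, width - 1);
                    int sy = clamp(y + dy, 0, height - 1);
                    sum += img[sy * width + sx];
                }
            out[y * width + x] = static_cast<uint8_t>(sum / 9);
        }
    }
    return out;
}
```

On an FPGA, the same computation is realized as a streaming pipeline fed by a row buffer rather than by random access to the whole image.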

Figure 3 shows the algorithms demosaicing (3a), binary morphology (3b) and the edge-detection algorithm proposed by Canny in 1986 [13] (3c). The upper image shows the

2http://www.accellera.org/home/, last access: 16.04.2013
3Example: the Xilinx Spartan-3E FPGA has only 136 KB of RAM [12]

(a) Demosaicing (b) Binary morphology (c) Canny algorithm

Fig. 3: Image processing algorithms



input for the algorithm and the lower image the output. The algorithms will be explained in detail in the following subsections. The required computation increases from demosaicing to the multistage edge-detection algorithm.

A. Demosaicing

Almost every CCD image sensor captures raw image data with a Bayer pattern [14]. Each pixel of the sensor measures only a single color due to light filtering with a color filter array (see the upper image in Figure 3a). The RGB values for every pixel can be interpolated from its neighborhood. This is called demosaicing. One possible solution is a bilinear interpolation, which takes eight neighborhood pixels into account [15], [8]. The missing values are calculated with addition and averaging of the neighboring values.

B. Binary morphology

The basic operations of binary morphology are erosion and dilation. With erosion, a foreground pixel is only kept if a structure element completely fits the object. This way the object becomes smaller. Dilation operates complementarily. The combination of both defines the more complex operations opening and closing [8]. They are used to reduce image noise. Opening removes objects in the foreground that are smaller than the structure element, while closing does the same with objects in the background (see Figure 3b). "The relative simple processing . . . has made morphological filters one of the most implemented filters [on FPGAs]" [8].
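Erosion with a 3 × 3 structure element can be sketched as follows (an illustrative software version, not the paper's FPGA design): a foreground pixel survives only if its entire 3 × 3 neighborhood is foreground, so objects shrink by one pixel at their boundary.

```cpp
#include <vector>

// Binary erosion with a 3x3 structure element (illustrative sketch):
// a foreground pixel is kept only if the whole 3x3 neighborhood is
// foreground. Border pixels are set to background for simplicity.
std::vector<bool> erode_3x3(const std::vector<bool>& img, int w, int h) {
    std::vector<bool> out(img.size(), false);
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            bool keep = true;
            for (int dy = -1; dy <= 1 && keep; ++dy)
                for (int dx = -1; dx <= 1 && keep; ++dx)
                    keep = img[(y + dy) * w + (x + dx)];
            out[y * w + x] = keep;
        }
    return out;
}
```

Dilation is the complementary operation (a pixel becomes foreground if any neighbor is foreground); opening is erosion followed by dilation, closing the reverse.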

C. Edge detection using the Canny algorithm

The goal of edge detection is the segmentation of an image into different parts through edge separation. At the edges, there is a change of pixel values evoked by a contrast difference between object and background. Canny proposed an algorithm considering the following goals [13], [16]:

1) Detection: Detect only real edges; maximize the signal-to-noise ratio.

2) Localization: The detected edges should be as close as possible to real edges.

3) Number of responses: Every real edge should only be detected once.

In comparison to the algorithms from III-A and III-B, the Canny algorithm has multiple stages: smoothing to remove image noise and thus reduce the false detection of edges, calculating edge gradients, reducing edge thickness and determining potential edges. The determination is done with thresholding and hysteresis. Strong edges are marked as certain edges. Weak edges are only kept if they are connected to a certain edge. This is a recursive process [13].
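The hysteresis stage can be sketched in software as follows (an illustrative sketch with assumed names; the paper's version runs as an FPGA pipeline): gradient magnitudes above a high threshold are certain edges, and weak edges between the two thresholds are kept only if connected to a certain edge.

```cpp
#include <utility>
#include <vector>

// Hysteresis thresholding, the final Canny stage (illustrative sketch):
// magnitudes >= high seed certain edges; edge regions then grow into
// 8-connected neighbors whose magnitude is >= low.
std::vector<bool> hysteresis(const std::vector<int>& mag, int w, int h,
                             int low, int high) {
    std::vector<bool> edge(mag.size(), false);
    std::vector<std::pair<int, int>> stack;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (mag[y * w + x] >= high) {   // seed with strong edges
                edge[y * w + x] = true;
                stack.push_back({x, y});
            }
    while (!stack.empty()) {                // grow into weak edges
        auto [x, y] = stack.back();
        stack.pop_back();
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                int i = ny * w + nx;
                if (!edge[i] && mag[i] >= low) {
                    edge[i] = true;
                    stack.push_back({nx, ny});
                }
            }
    }
    return edge;
}
```

The explicit stack replaces the recursion mentioned above; this data-dependent traversal is also what makes the stage the hardest one to map onto a fixed streaming pipeline.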

IV. IMPLEMENTATION

Each of the algorithms presented in Section III was implemented using both VHDL and SystemC. The result of each implementation is a ready-made intellectual property core for Xilinx FPGA platforms. In this way, each implementation was easy to evaluate inside a test assembly (see Figure 6). This section describes the implementations as well as necessary adaptions and problems that occurred.

A. SystemC and OSSS

With SystemC alone, common programming constructs of C++, e.g. object orientation, inheritance and polymorphism, are not accessible. OFFIS4 and CoSynth5 developed OSSS, a C++ class library [17], [18], [19], to remedy these limitations. On top of OSSS, an extra class library was developed, which encapsulates hardware-specific constructs to raise the level of abstraction and thus allow for a simple creation of image processing modules. The library contains:

• Classes for image processing modules (see figure 4)
• Classes for module behavior (called inner filter)
• Helper classes, e.g. a row buffer

Fig. 4: SystemC-module for local filters [20]

A schematic representation of an image processing module is presented in figure 4. Next to the process calculate, which determines the behavior by calling an apply() method, a row buffer is instantiated. The row buffer creates the neighborhood for a local filter. In addition, an optional port param exists, to be used by algorithms with parameters, e.g. the operation mode for demosaicing. The apply() method calculates the output based on data from the row buffer and optional parameters. Overall, the class library simplifies the development: a developer has to subclass a template module class and the inner filter. By subclassing the inner filter, the apply() method has to be implemented. As an example, the implementation of binary thresholding follows. A detailed explanation of the implementation can be found in [20].

Fig. 5: Binary thresholding

#include "Library/encapsulationLibrary.h"

class threshold_filter :
    public inner_filter<int, bool, int> {

    // Override apply method
    bool apply(int in, int thr) {
        if (in >= thr) return true;
        else return false;
    }
};

Listing 1: apply() for binary thresholding

The template class for an image processing module without neighborhood and with a parameter was used as base class. The parameter defines the threshold. Listing 1 shows the implemented apply() method (compare figure 5).
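The row buffer mentioned above can be sketched in plain C++ as follows (an illustrative model with assumed names, not the OSSS implementation): to feed a 3 × 3 local filter from a pixel stream, the two most recent full rows are buffered so that each incoming pixel completes a 3-pixel column of the neighborhood.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Row buffer sketch (illustrative): buffers the two previous image rows
// so that each pushed pixel yields the vertical 3-pixel column
// (two rows ago, one row ago, current) at the same x position.
class RowBuffer {
    std::vector<uint8_t> row0, row1;  // two previous rows
    int width, col = 0;
public:
    explicit RowBuffer(int w) : row0(w, 0), row1(w, 0), width(w) {}

    std::array<uint8_t, 3> push(uint8_t px) {
        std::array<uint8_t, 3> column{row0[col], row1[col], px};
        row0[col] = row1[col];        // shift the row history
        row1[col] = px;
        col = (col + 1) % width;      // advance along the row
        return column;
    }
};
```

Three consecutive columns then form the full 3 × 3 neighborhood, which is why in hardware only two image rows, not the whole frame, must be stored in BRAM.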

4http://www.offis.de, last access: 16.05.2013
5http://www.cosynth.com, last access: 16.05.2013



V. EXPERIMENTAL RESULTS AND COMPARISON

The goal of the following evaluation is to determine if the development approach with SystemC is a feasible alternative to long-established ones based on VHDL. The evaluation compares the approaches regarding:

1) General development and derived development times
2) Achievable performance on different FPGAs
3) Resource consumption on a low-cost FPGA

Subsection V-A describes the test environment with the used software and FPGA development boards. The subsections V-B to V-D discuss the results.

A. Test assembly

Fig. 6: Evaluation environment [20]

Figure 6 shows the test assembly. The assembly consists of an FPGA board which can be configured with several IP cores to create an image processing pipeline. An off-the-shelf PC is used to program the FPGA, to send and receive test image data via USB, and to compare the performance. The test assembly was used to evaluate the implementations and generate results. The output images of Figure 3 have also been generated with this assembly.

FPGAs from two different Xilinx FPGA families6 were used to generate the experimental results: from the Spartan-6 family, designed for high volume and low cost, the LX45, and from the Virtex-5 family, designed for high-end applications, the LX50T. Both FPGAs are packaged on boards by Digilent Inc.7: Atlys for the Spartan-6 LX45 and Genesys for the Virtex-5 LX50T [21], [22].

B. Achievable performance

"The achievable clock rate of an FPGA configuration depends on the depth of the computation in terms of logic blocks, and their relative placement, which determines the length of the wires needed to connect them" [23]. From the achievable clock rate in the synthesis reports generated by Xilinx ISE and the number of clock cycles for every algorithm, the maximum throughput can be calculated [20]. Here, some assumptions were made:

• The image size is one mega-pixel (1024 × 1024 pixels).

6http://www.xilinx.com/products/silicon-devices/fpga/index.htm, last access: 17.05.2013

7http://www.digilentinc.com/, last access: 10.05.2013

Fig. 7: Performance (in frames per second)

• The structure element size for morphology is 3 × 3 pixels.

In addition to the performance on an FPGA, the achievable performance on the off-the-shelf PC is taken into account8. The OpenCV library for C++ was used, which provides highly optimized implementations of the presented algorithms [24]. This gives an impression of how the embedded solution performs. Figure 7 shows the result.

For the FPGA-based solutions, VHDL offers the highest performance for every algorithm. Depending on the algorithm, SystemC reaches between 20% (Canny algorithm on Spartan-6) and 76.8% (binary morphology on Spartan-6) of the performance of the VHDL implementation. The binary morphology shows the smallest difference, at 23.2%, for both types of FPGAs. The largest difference is found for the Canny algorithm realized with SystemC on the Spartan-6: it reaches only ≈ 20% of the performance of VHDL on the Virtex-5 FPGA. By comparing the FPGA solutions to OpenCV, it is possible to see how well the FPGA performs. Every VHDL solution exceeds OpenCV. The performance for the Canny algorithm is more than doubled on the Virtex-5. This is noteworthy because the FPGA has a significantly lower energy consumption9. For SystemC, only the Canny algorithm on the Virtex-5 exceeds OpenCV. Overall, SystemC achieves an average performance of 46.6% of VHDL.
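The throughput calculation described above reduces to a one-line formula; the numbers in the sketch below are made up for illustration, not taken from the paper's synthesis reports.

```cpp
// Frame rate from a synthesis report (illustrative, made-up numbers):
// for a pipelined design, throughput is the clock rate divided by the
// number of clock cycles spent per frame.
double frames_per_second(double clock_hz, int cycles_per_pixel,
                         long pixels_per_frame) {
    return clock_hz / (cycles_per_pixel * pixels_per_frame);
}

// Example: a 100 MHz clock and one cycle per pixel on a one-mega-pixel
// (1024 x 1024) image give roughly 95 frames per second.
```

This is why a lower achievable clock rate in the generated SystemC designs translates directly into a proportionally lower frame rate.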

C. Resource consumption

An FPGA contains a limited number of resources. First, there are slices, which are used to realize logic functions.

8Hardware: Intel Core i7 CPU with 2800 MHz and 8 GB RAM
9Virtex-5 LX50T: about 10 W [22]; PC with Intel Core i7: about 150 W



Second, there are components such as hard-wired memory cells or multipliers. A hardware description should use as few of these resources as possible. In this way, complex processing chains with several hardware descriptions can be realized. Here, some assumptions were made:

• Target platform for synthesis is the Spartan-6 FPGA.
• The image size is one mega-pixel (1024 × 1024 pixels). This influences the size of internal data structures.
• The structure element size for morphology is 3 × 3 pixels.

Fig. 8: Resource consumption for Xilinx Spartan-6 FPGA

Figure 8 shows the results for every algorithm. The data was gathered from the synthesis reports generated by Xilinx ISE. Three different types of resources have been used to realize the designs: slices, Block RAMs (BRAMs) and DSP48s. BRAMs are a special kind of memory, whereas the DSP48s are digital signal processing logic elements10. They can realize different arithmetic operations such as multiplication or FIR11 filtering.

For the demosaicing, the overall usage of resources is low. For SystemC, the resource consumption for slices is multiplied by a factor of 5.4, the highest increase overall, and for BRAMs by 3.7. The increased consumption of BRAMs for SystemC is easily explained: the row buffer in VHDL is optimized to use only one BRAM, whereas SystemC uses two BRAMs for every buffered row. With the DSP48s, SystemC uses special resources that VHDL does not need. The SystemC designs use them to realize multiplications; in VHDL, all multiplications are replaced by shift operations. This also applies to the Canny algorithm. The binary morphology overall has the lowest resource

10see http://www.xilinx.com/support/documentation/user_guides/ug369.pdf, last access: 10.04.2013

11Finite impulse response

Fig. 9: Development time (in person days)

consumption, due to the low complexity of the design. The used slices are multiplied by a factor of 3 for SystemC compared to VHDL. The number of BRAMs is the same. Compared to demosaicing and morphology, the multistage Canny algorithm is more complex. The implementation with VHDL uses 1.2% and SystemC 6.4% of the available slices (factor 5.3). The usage of BRAMs is increased by a factor of 8.

D. Development time

Figure 9 shows the rounded development times for the presented algorithms. The familiarization period with the language and the respective algorithm is not taken into account. This also applies to the development of the image processing class library in SystemC. The development times for demosaicing and binary morphology are comparable; they differ only by a few hours. The development time is reduced to only one and a half person-days when using SystemC. For the Canny algorithm, the effort is reduced by 40%, from seven and a half to four and a half days.

VI. DISCUSSION

This paper describes the hardware implementation of several image processing algorithms and compares them regarding performance, resource consumption and development time. The implementation was done using VHDL and C++ with SystemC and OSSS. The development with SystemC was improved by creating an image processing class library on top of OSSS, which raised the abstraction level by encapsulating hardware-specific constructs.

The development procedure has an important influence on the needed development time. Using SystemC, combined with the extra class library, it was possible to reduce the development expenses for every implemented algorithm. For less complex designs, an improvement of up to 78% is possible, due to the higher abstraction level. By reducing the needed knowledge about hardware development and the target platform, it is possible to fully concentrate on the algorithm. C++ as a base also shortens the familiarization period. The development of hardware with VHDL is not possible without learning the tools, the syntax and the synthesis subset. In addition to the higher abstraction level, simulation and testing have a high influence on the development time. While VHDL development relies on the analysis of often complex waveforms, SystemC allows the direct usage of image data. Because C++ is



used as a base, class libraries for image input and output can be used. The reduced development time allows for a faster time-to-market and an improved reaction to changes in the market. Overall, with SystemC it is possible to include software developers in the development process.

The comparison of the performance showed that the throughput of every SystemC design is below that of the VHDL-based solution. This is mostly due to the fact that the SystemC design was only slightly optimized towards the synthesis tool, and that the CoSynth Synthesizer does not extract parallelism automatically from the hardware descriptions. Furthermore, the automated translation results in larger designs, which prevents the high clock rates of handcrafted VHDL. In comparison to OpenCV, SystemC sometimes has less throughput. However, this is not a general drawback, because image processing chains consist of several algorithms. While the performance of PC-based image processing drops due to chaining, FPGA-based systems still deliver the same performance [8].

The general resource consumption for both approaches is low. SystemC has a higher need for resources compared to VHDL. Nevertheless, both approaches are able to realize complex image processing chains with several hardware modules. The difference between VHDL and SystemC is partially due to the overhead in the generated VHDL. Another factor for more complex designs is that shared resources are not automatically identified by the synthesis tool and thus are sometimes buried beneath the generated structures. Furthermore, the synthesizer applies some restrictions to the designs in SystemC, which requires further optimization techniques.

For microscale automation, image processing speed is a major concern. If the achievable update rate of the SystemC design is sufficient for an automation task, this approach can be a good alternative to the handcrafted designs, due to its shorter development time. The above-mentioned characteristics of FPGAs (e.g. latency, update-rate deviation) are also valid for the SystemC design. However, for high-speed visual servoing, the timing performance of the handcrafted VHDL design is superior to the generated design. Therefore, to improve the closed-loop positioning speed of tools and specimens, VHDL or Verilog designs should be used.

VII. OUTLOOK AND FUTURE WORK

The next steps will target the implementation and evaluation of other algorithms, such as random access algorithms for image processing and control tasks. Additionally, the evaluation will be extended to more developers in order to improve the comparison of development times.

A full image-processing chain for visual servoing of microrobots will be developed in SystemC and benchmarked against the existing VHDL designs. In addition to statistical data (e.g. the update rate), the effects of different update rates

on the closed-loop positioning speed will be measured.

REFERENCES

[1] T. Sievers and S. Fatikow, “Real-Time Object Tracking for the Robot-Based Nanohandling in a Scanning Electron Microscope,” Journal of Micromechatronics - Special Issue on Micro/Nanohandling, vol. 3, no. 3-4, pp. 267–284, 2006.

[2] C. Diederichs, “Hardware-Software Co-Design Tracking System for Predictable High-Speed Mobile Microrobot Position Control,” in Proc. of IFAC Symposium on Mechatronic Systems, 2010.

[3] ——, “Fast Visual Servoing of Multiple Microrobots using an FPGA-Based Smart Camera System,” in Proc. of the 18th IFAC World Congress, 2011.

[4] D. Jasper, C. Diederichs, C. Edeler, and S. Fatikow, “High-speed nanorobot position control inside a scanning electron microscope,” ECTI Transactions on Electrical Eng., Electronics, and Communications, vol. 9, no. 1, pp. 177–186, February 2011.

[5] C. Diederichs and S. Fatikow, “FPGA-based high-speed object detection and classification inside optical microscopes,” in Proc. of the 8th International Workshop on Microfactories, June 2012.

[6] IEEE Standard VHDL Language Reference Manual, 1076-2000 ed., The Institute of Electrical and Electronics Engineers, Inc., 345 East 47th Street, New York, NY 10017-2394, USA, 2000.

[7] IEEE Standard Hardware Description Language Based on the Verilog Hardware Description Language, 1364-2001 ed., The Institute of Electrical and Electronics Engineers, Inc., 345 East 47th Street, New York, NY 10017-2394, USA, 2001.

[8] D. G. Bailey, Design for Embedded Image Processing on FPGAs. John Wiley & Sons (Asia) Pte Ltd, 2011.

[9] I. Alston and B. Madahar, “From C to netlists: Hardware engineering for software engineers?” Electronics & Communication Engineering Journal, vol. 14, no. 4, pp. 165–173, 2002.

[10] S. A. Edwards, “The Challenges of Synthesizing Hardware from C-Like Languages,” IEEE Design & Test of Computers, vol. 23, no. 5, pp. 375–386, 2006.

[11] ——, “The Challenges of Hardware Synthesis from C-Like Languages,” in Proceedings of the conference on Design, Automation and Test in Europe - Volume 1, ser. DATE ’05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 66–67. [Online]. Available: http://date.eda-online.co.uk/proceedings/papers/2005/date05/pdffiles/01e_1.pdf

[12] Spartan-3E FPGA Family Data Sheet, Xilinx Inc., October 2012.

[13] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, pp. 679–698, November 1986.

[14] Axis Communications, “CCD and CMOS sensor technology - Technical white paper,” Tech. Rep., 2010. [Online]. Available: http://www.axis.com/files/whitepaper/wp_ccd_cmos_40722_en_1010_lo.pdf

[15] R. Jean, “Demosaicing with the bayer pattern,” University of North Carolina, Tech. Rep., 2004.

[16] M. Nosrati, R. Karimi, and M. Hariri, “Detecting circular shapes from areal images using median filter and CHT,” Global Journals Inc. (US), vol. 12, 2012.

[17] K. Grüttner, C. Grabbe, F. Oppenheimer, and W. Nebel, “Object Oriented Design and Synthesis of Communication in Hardware-/Software Systems with OSSS,” in Proceedings of SASIMI 2007, Hokkaido, Japan, Oct. 2007.

[18] K. Grüttner, F. Oppenheimer, and W. Nebel, “OSSS Methodology - System-Level Design and Synthesis of Embedded HW/SW Systems in C++,” in Proceedings of ISABEL 2008, Jan. 2008.

[19] OSSS - A Library for Synthesisable System Level Models in SystemC, OFFIS, R&D Division Transportation, Hardware/Software Design Methodology Group, Oldenburg, 2010.

[20] T. Tiemerding, “Implementation and Evaluation of automated generated IP-Cores for image processing systems on FPGAs,” Master’s thesis, University of Oldenburg, 2012.

[21] Digilent Inc., Genesys Board Reference Manual, 2012.

[22] ——, Genesys Board Reference Manual, 2012.

[23] B. A. Draper, J. R. Beveridge, A. P. W. Böhm, C. Ross, and M. Chawathe, “Accelerated Image Processing on FPGA,” IEEE Transactions on Image Processing, vol. 12, pp. 1543–1551, 2003. [Online]. Available: http://www.cs.colostate.edu/pubserv/pubs/Draper-draper-publications-draper_tip03.pdf

[24] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.
