Comparison of Different Design Methodologies of Hardware-based Image Processing for Automation in Microrobotics
Tobias Tiemerding1, Claas Diederichs1, Christian Stehno2 and Sergej Fatikow1
Abstract: Object detection and classification is a key task in micro- and nanohandling. The microscopy image is often the only available sensor to detect information about the positions and orientations of objects. FPGA-based image processing is superior to state-of-the-art PC-based image processing in terms of achievable update rate, latency and jitter. Development of specialized FPGA solutions for object detection and tracking using a hardware description language is time consuming and requires deep knowledge of the target system. Using SystemC, a C++ based class library, fast implementation with less knowledge of the system is possible. However, the gain in development speed is accompanied by higher resource usage of the FPGA and lower performance regarding computation speed. In this paper, several image processing algorithms are implemented in both manners and are compared regarding achievable update rate, resource consumption and development time. The SystemC implementations could be implemented twice as fast and with less knowledge about the system. The increased resource usage of SystemC is negligible for small series and prototyping applications, while the decreased computation speed may be problematic for high-speed applications.
I. MOTIVATION
Vision-based object detection and classification is a key task in micro- and nanohandling, as the microscopy image is often the only available sensor in the system. The image information is used to detect and classify objects and specimens. It is also used to detect the position of the manipulator if the positioning system has no internal sensor or the internal sensor's resolution is not sufficient for the task. Vision-based tracking of the manipulator for closed-loop positioning is called visual servoing. The state of the art is to use PC-based computer vision for both tasks. Several algorithms are available for these tasks, such as template matching with cross correlation or active contours [1].
PC-based image processing has several drawbacks for visual servoing, as discussed in [2]. The speed and quality of closed-loop control is directly connected to the speed and the quality of the sensors. Three timing quality characteristics of an optical sensor are update rate, latency and jitter.
• The sensor's update rate is a limiting factor for the closed-loop control of a highly dynamic system. For vision-based sensor systems, the update rate is comparatively low, because a full image must be acquired and then transferred to a PC. Common USB or FireWire cameras have update rates of 10 to 30 Hz.
1Division Microrobotics and Control Engineering, University of Oldenburg, Ammerländer Heerstraße 114-118, D-26111 Oldenburg, Germany {tobias.tiemerding, claas.diederichs, fatikow} at uni-oldenburg.de
2CoSynth Embedded Systems GmbH & Co. KG, Escherweg 2, D-26121 Oldenburg, Germany stehno at cosynth.com
• The latency of a sensor describes the age of a sensor value. With high latency, the closed-loop control works with old data. Vision-based sensors have a high latency because the object position can only be calculated after a full image has been captured. The latency of vision-based object tracking is usually at least one update interval.
• Jitter is time variation in a periodic signal (e.g. update rate), adding an uncertainty for closed-loop control. Jitter is a main problem in software-based object tracking on general-purpose CPUs because standard operating systems have an unpredictable scheduling of tasks.
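The timing characteristics listed above can be quantified from a series of arrival timestamps. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, timestamps are assumed to be in seconds, and jitter is measured here as the standard deviation of the update intervals, which is only one of several common definitions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean update rate in Hz, computed from arrival timestamps in seconds.
double mean_update_rate(const std::vector<double>& t) {
    assert(t.size() >= 2);
    return static_cast<double>(t.size() - 1) / (t.back() - t.front());
}

// Jitter, taken here as the standard deviation of the update intervals
// in seconds (an assumed definition, chosen for illustration).
double interval_jitter(const std::vector<double>& t) {
    assert(t.size() >= 2);
    const double n = static_cast<double>(t.size() - 1);
    const double mean = (t.back() - t.front()) / n;
    double sum_sq = 0.0;
    for (std::size_t i = 1; i < t.size(); ++i) {
        const double d = (t[i] - t[i - 1]) - mean;
        sum_sq += d * d;
    }
    return std::sqrt(sum_sq / n);
}
```

A perfectly periodic sensor yields a jitter of zero; a software tracker on a general-purpose operating system would typically show a nonzero value because of scheduling delays.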
Fig. 1: Nanorobotic handling cell
Hardware-based image processing using FPGAs can overcome these drawbacks. It was successfully used for high-speed visual servoing of mobile microrobots and nanostages [3], [4]. Figure 1 shows a nanorobotic cell with hardware-based image processing for pick and place handling of microspheres as used in [5].
However, FPGAs need dedicated hardware description languages (HDLs) such as VHDL1 or Verilog to be programmed [6], [7]. These languages model a hardware description including, e.g., parallelism and clocking. Therefore, they operate at a very low abstraction layer and require deep knowledge of hardware design and syntax [8]. In addition, the low abstraction layer is more error-prone. The overall design process using HDLs is significantly more complicated and time consuming. "Compared to software-based languages, HDLs are a lot like programming in assembly language" [8], which makes a broad exploration of the design space infeasible.
New design processes based on software-based languages have been developed in recent years. The idea is to implement software algorithms and automatically transform them into a hardware description using a technique called high-level synthesis. This raises the level of abstraction; the design process becomes more accessible and requires less time, thus reducing the overall development costs [9]. Software-based languages such as C++ are often used as a base. This is justified by their wide adoption and the assumption that it is possible to turn every software developer into a hardware developer [10], [9]. The C++ based class library SystemC follows the syntax modification approach of software-based languages and enables the description of hardware. It aims to overcome the differences between software and hardware [11], [10], [8]: modeling concurrency, providing a timing model, using special data types such as bit vectors, creating a memory model and supporting communication patterns. The design methodology used for this paper is described in further detail in section II.
1Very-high-speed integrated circuit hardware description language
2013 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), Wollongong, Australia, July 9-12, 2013
978-1-4673-5320-5/13/$31.00 ©2013 IEEE
This paper evaluates C++ with SystemC (hereinafter referred to as SystemC) for image processing algorithms targeting FPGAs. In section III, the image processing algorithms are presented. The following section IV describes the implementation of these algorithms, both in SystemC and VHDL, which represents the quasi-standard for modeling hardware descriptions. In addition, this paper presents an approach to raise the level of abstraction even further for the development of image processing algorithms, by introducing a C++ class library based on SystemC. The implementations are compared in terms of development time, performance and resource consumption on the FPGA in section V. The paper concludes with a discussion in section VI and an outlook on future work in section VII.
II. SYSTEM LEVEL DESIGN WITH SYSTEMC
Today, image processing systems are usually developed as a software prototype first. Implementing this initial version on an embedded device, here an FPGA, requires intensive engineering and is a long and expensive task. Electronic system level (ESL) methodologies facilitate this task by raising the low-level hardware engineering to simpler software design tasks. The formerly separated hardware and software flows are unified. ESL synthesis automates the step from software models to the hardware implementation. Thus, it is possible to create an early hardware prototype of the system for an FPGA-based system using an ESL approach with automated synthesis from the software prototype into the hardware.
The evaluated approach is built around the CoSynth Synthesizer and a SystemC based development flow. It starts with a pure software design, where initially the functional specification of the system is implemented as a plain C++ program. This first implementation is augmented with details about the hardware architecture and eventually also with timing requirements and an implicit schedule. These hardware-specific extensions are provided by SystemC, and are integrated with the functional C++ implementation. This step results in another executable program, which is still a usual C++ program and makes use of the freely available SystemC runtime library2. The program can be run on all PCs and simulated and optimized with common development tools. As a last step, the CoSynth Synthesizer automatically generates the hardware description required for the FPGA directly from the software model. Function and timing specifications are exactly reproduced, such that the previously tested properties are preserved by the hardware.
III. IMAGE PROCESSING ALGORITHMS
Image processing algorithms can be categorized with regard to the pixel access method: random or non-random. Random access algorithms require access to arbitrary pixels at random times. This requires the image, sometimes several MB in size, to be stored in memory. FPGAs often do not have large memories3. Therefore, non-random algorithms are preferred, e.g. point operations or local filters [8].
This section presents the algorithms that were chosen for comparison. They are all local filters, where the output Q depends not only on one pixel value but also on values within a neighborhood (see equation (1) and figure 2). W is the neighborhood centered on I[x, y]. The function f determines how to use the values inside the neighborhood.
Fig. 2: A local filter. The gray area denotes the input located around the source pixel x that is used to calculate the filtered value for the corresponding location in the output image [8].
$$Q[x, y] = f\bigl(I[x, y], \ldots, I[x + \Delta x, y + \Delta y]\bigr), \quad (\Delta x, \Delta y) \in W \tag{1}$$
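Equation (1) can be sketched in software as a generic window operation. The following is an illustrative sketch under stated assumptions, not the implementation used in this paper: the neighborhood W is fixed to 3 × 3, the image is an 8-bit row-major array, border pixels are copied unchanged, and all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// The 3x3 neighborhood W, stored row by row (index 4 is the center pixel).
using Window3x3 = std::array<std::uint8_t, 9>;

// Computes Q[x, y] = f(W) for every interior pixel of a row-major image.
// Border pixels keep their input value (an assumption of this sketch).
std::vector<std::uint8_t> apply_local_filter(
    const std::vector<std::uint8_t>& in, std::size_t width, std::size_t height,
    const std::function<std::uint8_t(const Window3x3&)>& f) {
    assert(in.size() == width * height);
    std::vector<std::uint8_t> out(in);
    for (std::size_t y = 1; y + 1 < height; ++y) {
        for (std::size_t x = 1; x + 1 < width; ++x) {
            Window3x3 w{};
            std::size_t k = 0;
            for (std::size_t dy = 0; dy < 3; ++dy)
                for (std::size_t dx = 0; dx < 3; ++dx)
                    w[k++] = in[(y + dy - 1) * width + (x + dx - 1)];
            out[y * width + x] = f(w);
        }
    }
    return out;
}
```

Only the function f changes between filters; the window traversal stays the same, which is what makes local filters attractive for streaming hardware.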
Figure 3 shows the algorithms demosaicing (3a), binary morphology (3b) and the edge-detection algorithm proposed by Canny in 1986 [13] (3c). The upper image shows the input for the algorithm and the lower image the output. The algorithms will be explained in detail in the following subsections. The computation required increases from demosaicing to the multistage edge-detection algorithm.
2http://www.accellera.org/home/, last access: 16.04.2013
3Example: Xilinx Spartan-3E FPGA has only 136KB of RAM [12]
(a) Demosaicing (b) Binary morphology (c) Canny algorithm
Fig. 3: Image processing algorithms
A. Demosaicing
Almost every CCD image sensor captures raw image data with a Bayer pattern [14]. Each pixel of the sensor measures only a single color due to light filtering with a color filter array (see upper image in Figure 3a). The RGB values for every pixel can be interpolated from its neighborhood. This is called demosaicing. One possible solution is a bilinear interpolation, which takes eight neighborhood pixels into account [15], [8]. The missing values are calculated with addition and averaging of the neighboring values.
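As an illustration of the bilinear interpolation described above, the following sketch reconstructs the RGB value at a single red site of the mosaic. It relies on a general property of the Bayer pattern: at a red site the four direct neighbors are green and the four diagonal neighbors are blue. The function and type names are hypothetical, and only interior pixels are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

// Bilinear interpolation at an interior red site (x, y) of a row-major
// Bayer mosaic: green is the mean of the four direct neighbors,
// blue is the mean of the four diagonal neighbors.
Rgb demosaic_at_red(const std::vector<std::uint8_t>& raw, std::size_t width,
                    std::size_t x, std::size_t y) {
    assert(x >= 1 && y >= 1 && x + 1 < width);
    assert((y + 2) * width <= raw.size());
    auto at = [&](std::size_t xx, std::size_t yy) -> unsigned {
        return raw[yy * width + xx];
    };
    const unsigned g = at(x, y - 1) + at(x, y + 1) + at(x - 1, y) + at(x + 1, y);
    const unsigned b = at(x - 1, y - 1) + at(x + 1, y - 1) +
                       at(x - 1, y + 1) + at(x + 1, y + 1);
    return Rgb{static_cast<std::uint8_t>(at(x, y)),
               static_cast<std::uint8_t>(g / 4),
               static_cast<std::uint8_t>(b / 4)};
}
```

The division by four is a right shift by two bits in hardware, so this case needs only additions and shifts. Green and blue sites follow the same scheme with different neighbor sets.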
B. Binary morphology
The basic operations of binary morphology are erosion and dilation. With erosion, a foreground pixel is only kept if the structure element completely fits into the object. This way the object becomes smaller. Dilation operates complementarily. The combination of both defines the more complex operations opening and closing [8]. They are used to reduce image noise. Opening removes objects in the foreground that are smaller than the structure element, while closing does the same with objects in the background (see Figure 3b). "The relative simple processing . . . has made morphological filters one of the most implemented filters [on FPGAs]" [8].
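The four operations can be sketched for a 3 × 3 square structure element as follows. This is a plain software illustration with hypothetical names, assuming a row-major binary image (1 = foreground) and treating pixels outside the image as background; it is not the hardware design evaluated later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;  // row-major, 1 = foreground

// 3x3 erosion (erode = true) or dilation (erode = false).
// Pixels outside the image count as background.
Image morph3x3(const Image& in, std::size_t w, std::size_t h, bool erode) {
    assert(in.size() == w * h);
    Image out(in.size(), 0);
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            bool all = true, any = false;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const std::ptrdiff_t yy = static_cast<std::ptrdiff_t>(y) + dy;
                    const std::ptrdiff_t xx = static_cast<std::ptrdiff_t>(x) + dx;
                    const bool inside =
                        yy >= 0 && xx >= 0 &&
                        yy < static_cast<std::ptrdiff_t>(h) &&
                        xx < static_cast<std::ptrdiff_t>(w);
                    const bool fg =
                        inside && in[static_cast<std::size_t>(yy) * w +
                                     static_cast<std::size_t>(xx)] != 0;
                    all = all && fg;  // erosion: every pixel must be set
                    any = any || fg;  // dilation: one set pixel suffices
                }
            }
            out[y * w + x] = (erode ? all : any) ? 1 : 0;
        }
    }
    return out;
}

// Opening: erosion followed by dilation.
Image opening3x3(const Image& in, std::size_t w, std::size_t h) {
    return morph3x3(morph3x3(in, w, h, true), w, h, false);
}

// Closing: dilation followed by erosion.
Image closing3x3(const Image& in, std::size_t w, std::size_t h) {
    return morph3x3(morph3x3(in, w, h, false), w, h, true);
}
```

Because the image border is treated as background, closing in this sketch also erodes objects that touch the border; a real design would have to pick a border policy explicitly.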
C. Edge detection using the Canny algorithm
The goal of edge detection is the segmentation of an image into different parts through edge separation. At the edges there is a change of pixel values evoked by a contrast difference between object and background. Canny proposed an algorithm considering the following goals [13], [16]:
1) Detection: Detect only real edges; maximizing the signal-to-noise ratio.
2) Localization: The detected edges should be as close as possible to real edges.
3) Number of responses: Every real edge should only be detected once.
In comparison to the algorithms from III-A and III-B, the Canny algorithm has multiple stages: smoothing to remove image noise and thus reduce the false detection of edges, calculating edge gradients, reducing edge thickness and determining potential edges. The determination is done with thresholding and hysteresis. Strong edges are marked as certain edges. Weak edges are only kept if they are connected to a certain edge. This is a recursive process [13].
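The final hysteresis stage can be sketched as follows. This is a software illustration only, with hypothetical names and an assumed 8-connected neighborhood; the recursion is replaced by an explicit stack, and the input is assumed to be the gradient magnitude after edge thinning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marks pixels with magnitude >= high as certain edges, then keeps weak
// pixels (magnitude >= low) that are 8-connected to a certain edge.
std::vector<std::uint8_t> hysteresis(const std::vector<int>& mag,
                                     std::size_t w, std::size_t h,
                                     int low, int high) {
    assert(mag.size() == w * h && low <= high);
    std::vector<std::uint8_t> edge(mag.size(), 0);
    std::vector<std::size_t> stack;
    for (std::size_t i = 0; i < mag.size(); ++i) {
        if (mag[i] >= high) {
            edge[i] = 1;
            stack.push_back(i);
        }
    }
    // Grow from the certain edges into connected weak pixels.
    while (!stack.empty()) {
        const std::size_t i = stack.back();
        stack.pop_back();
        const std::ptrdiff_t x = static_cast<std::ptrdiff_t>(i % w);
        const std::ptrdiff_t y = static_cast<std::ptrdiff_t>(i / w);
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dx = -1; dx <= 1; ++dx) {
                const std::ptrdiff_t nx = x + dx, ny = y + dy;
                if (nx < 0 || ny < 0 ||
                    nx >= static_cast<std::ptrdiff_t>(w) ||
                    ny >= static_cast<std::ptrdiff_t>(h))
                    continue;
                const std::size_t n = static_cast<std::size_t>(ny) * w +
                                      static_cast<std::size_t>(nx);
                if (!edge[n] && mag[n] >= low) {
                    edge[n] = 1;
                    stack.push_back(n);
                }
            }
        }
    }
    return edge;
}
```

The data-dependent traversal order in this stage is what distinguishes it from the pure window operations of the earlier stages.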
IV. IMPLEMENTATION
Each of the algorithms presented in section III was implemented using both VHDL and SystemC. The result of each implementation is a ready-made intellectual property core for the Xilinx FPGA platforms. In this way each implementation was easy to evaluate inside a test assembly (see Figure 6). This section describes the implementation as well as necessary adaptations and problems that occurred.
A. SystemC and OSSS
With SystemC alone, common programming constructs of C++, e.g. object orientation, inheritance and polymorphism, are not accessible. OFFIS4 and CoSynth5 developed OSSS, a C++ class library, to remedy these limitations [17], [18], [19]. On top of OSSS an extra class library was developed, which encapsulates hardware-specific constructs to raise the level of abstraction and thus allows for a simple creation of image processing modules. The library contains:
• Classes for image processing modules (see figure 4)
• Classes for module behavior (called inner filter)
• Helper classes, e.g. row buffer
Fig. 4: SystemC-module for local filters [20]
A schematic representation of an image processing module is presented in figure 4. Next to the process calculate, which determines the behavior by calling an apply() method, a row buffer is instantiated. The row buffer creates the neighborhood for a local filter. In addition, an optional port param exists, to be used by algorithms with parameters, e.g. the operation mode for demosaicing. The apply() method calculates the output based on data from the row buffer and optional parameters. Overall, the class library simplifies the development. A developer has to subclass a template module class and the inner filter. When subclassing the inner filter, the apply() method has to be implemented. As an example, the implementation of binary thresholding follows. A detailed explanation of the implementation can be found in [20].
Fig. 5: Binary thresholding
#include "Library/encapsulationLibrary.h"

class threshold_filter : public inner_filter<int, bool, int> {
    // Override apply method
    bool apply(int in, int thr) {
        if (in >= thr) return true;
        else return false;
    }
};
Listing 1: apply() for binary thresholding
The template class for an image processing module without neighborhood and with parameter was used as base class. The parameter defines the threshold. Listing 1 shows the implemented apply() method (compare figure 5).
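For filters that do need a neighborhood, the role of the row buffer from figure 4 can be illustrated with a plain C++ model. The class below is a hypothetical sketch and not the helper class of the library described here: it stores the two previous image rows and assembles a 3 × 3 window from a pixel stream in raster order, without any SystemC constructs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software model of a two-row buffer that turns a raster-order pixel
// stream into 3x3 neighborhoods (hypothetical, for illustration only).
class RowBuffer3 {
public:
    explicit RowBuffer3(std::size_t width)
        : width_(width), row0_(width, 0), row1_(width, 0) {
        assert(width >= 3);
    }

    // Consumes one pixel. Returns true once win holds a complete window,
    // which is then centered on the pixel at (x - 1, y - 1).
    bool push(std::uint8_t pixel, std::array<std::uint8_t, 9>& win) {
        // Shift the window one column to the left.
        for (std::size_t r = 0; r < 3; ++r) {
            win_[r * 3 + 0] = win_[r * 3 + 1];
            win_[r * 3 + 1] = win_[r * 3 + 2];
        }
        // New right column: rows y - 2 and y - 1 from the buffers, row y live.
        win_[2] = row0_[x_];
        win_[5] = row1_[x_];
        win_[8] = pixel;
        // Age the buffered rows at this column.
        row0_[x_] = row1_[x_];
        row1_[x_] = pixel;
        const bool valid = (y_ >= 2 && x_ >= 2);
        win = win_;
        if (++x_ == width_) {
            x_ = 0;
            ++y_;
        }
        return valid;
    }

private:
    std::size_t width_;
    std::size_t x_ = 0, y_ = 0;
    std::vector<std::uint8_t> row0_, row1_;  // rows y - 2 and y - 1
    std::array<std::uint8_t, 9> win_{};
};
```

Only two image rows are stored regardless of the image height, which is why local filters fit into the limited on-chip memory discussed in section III.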
4http://www.offis.de, last access: 16.05.2013
5http://www.cosynth.com, last access: 16.05.2013
V. EXPERIMENTAL RESULTS AND COMPARISON
The goal of the following evaluation is to determine whether the development approach with SystemC is a feasible alternative to long-established ones based on VHDL. The evaluation compares the approaches with respect to:
1) General development and derived development times
2) Achievable performance on different FPGAs
3) Resource consumption on a low-cost FPGA
Subsection V-A describes the test environment with the software and FPGA development boards used. Subsections V-B to V-D discuss the results.
A. Test assembly
Fig. 6: Evaluation environment [20]
Figure 6 shows the test assembly. The assembly consists of an FPGA board which can be configured with several IP cores to create an image processing pipeline. An off-the-shelf PC is used to program the FPGA, to send and receive test image data via USB, and to compare the performance. The test assembly was used to evaluate the implementation and generate results. The output images of Figure 3 have also been generated with this assembly.
FPGAs from two different Xilinx FPGA families6 were used to generate the experimental results: from the Spartan-6 family, designed for high volume and low cost, the LX45, and from the Virtex-5 family, designed for high-end applications, the LX50T were chosen. Both FPGAs are packaged on boards by Digilent Inc.7: Atlys for Spartan-6 LX45 and Genesys for Virtex-5 LX50T [21], [22].
B. Achievable performance
"The achievable clock rate of an FPGA configuration depends on the depth of the computation in terms of logic blocks, and their relative placement, which determines the length of the wires needed to connect them" [23]. With respect to the achievable clock rate from the synthesis reports generated by Xilinx ISE and the number of clock cycles for every algorithm, the maximum throughput can be calculated [20]. Here, some assumptions were made:
• The image size is one megapixel (1024 × 1024 pixels).
6http://www.xilinx.com/products/silicon-devices/fpga/index.htm, last access: 17.05.2013
7http://www.digilentinc.com/, last access: 10.05.2013
Fig. 7: Performance (in frames per second)
• The structure element size for morphology is 3 × 3 pixels.
In addition to the performance on an FPGA, the achievable performance on the off-the-shelf PC is taken into account8. The OpenCV library for C++ was used, which provides highly optimized implementations of the presented algorithms [24]. This gives an impression of how the embedded solution performs. Figure 7 shows the result.
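The throughput estimate described at the start of this subsection can be written down as a short calculation. The sketch below is an assumption about its general form, not the exact formula from [20]: it divides the achievable clock rate by the number of clock cycles needed per frame, with the cycle count expressed per pixel. The function name and the example numbers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on frames per second for a streaming design that needs
// cycles_per_pixel clock cycles for each pixel of a width x height image.
double max_frames_per_second(double clock_hz, double cycles_per_pixel,
                             std::size_t width, std::size_t height) {
    assert(clock_hz > 0.0 && cycles_per_pixel > 0.0 && width > 0 && height > 0);
    const double cycles_per_frame = cycles_per_pixel *
                                    static_cast<double>(width) *
                                    static_cast<double>(height);
    return clock_hz / cycles_per_frame;
}
```

For instance, a hypothetical design clocked at 100 MHz that consumes one pixel per clock cycle would reach roughly 95 frames per second for the one-megapixel image assumed above; needing four cycles per pixel would cut this to a quarter.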
For the FPGA-based solutions, VHDL offers the highest performance for every algorithm. Depending on the algorithm, SystemC reaches between 20% (Canny algorithm on Spartan-6) and 76.8% (binary morphology on Spartan-6) of the performance of the VHDL implementation. Binary morphology shows the smallest difference, with 23.2% for both types of FPGAs. The largest difference occurs for the Canny algorithm realized with SystemC on Spartan-6. It has only ≈ 20% of the performance compared to VHDL on the Virtex-5 FPGA. Comparing the FPGA solutions to OpenCV shows how well the FPGA performs. Every VHDL solution exceeds OpenCV. The performance for the Canny algorithm is more than doubled on the Virtex-5. This is noteworthy because the FPGA has a significantly lower energy consumption9. For SystemC, only the Canny algorithm on Virtex-5 exceeds OpenCV. Overall, SystemC allows for an average performance of 46.6% of VHDL.
C. Resource consumption
8Hardware: Intel Core i7 CPU with 2800MHz and 8GB RAM
9Virtex-5 LX50T: about 10W [22]; PC with Intel Core i7: about 150W
An FPGA contains a limited number of resources. First, these are slices that are used to realize logic functions. Second, these are components such as hard-wired memory cells or multipliers. Therefore, a hardware description should use as few of these resources as possible. In this way, complex processing chains with several hardware descriptions can be realized. Here, some assumptions were made:
• Target platform for synthesis is the Spartan-6 FPGA.
• The image size is one megapixel (1024 × 1024 pixels). This influences the size of internal data structures.
• The structure element size for morphology is 3 × 3 pixels.
Fig. 8: Resource consumption for Xilinx Spartan-6 FPGA
Figure 8 shows the results for every algorithm. The data was gathered from the synthesis reports generated by Xilinx ISE. Three different types of resources have been used to realize the designs: slices, Block RAMs (BRAMs) and DSP48s. BRAMs are a special kind of memory, whereas the DSP48s are digital signal processing logic elements10. They can realize different arithmetic operations such as multiplying or FIR11 filtering.
10see http://www.xilinx.com/support/documentation/user_guides/ug369.pdf, last access: 10.04.2013
11Finite impulse response
Fig. 9: Development time (in person days)
For demosaicing, the overall usage of resources is low. For SystemC, the consumption of slices is multiplied by a factor of 5.4, the highest increase overall, and that of BRAMs by 3.7. The increased consumption of BRAMs for SystemC can be easily explained: the row buffer in VHDL is optimized to use only one BRAM. In contrast, SystemC uses two BRAMs for every buffered row. With the DSP48s, SystemC also uses special resources that the VHDL design does not need. The SystemC designs use them to realize multiplications. In VHDL all multiplications are replaced by shift operations. This also applies to the Canny algorithm. Binary morphology has the lowest resource consumption overall, due to the low complexity of the design. The number of slices used is multiplied by a factor of 3 for SystemC compared to VHDL. The number of BRAMs is the same. Compared to demosaicing and morphology, the multistage Canny algorithm is more complex. The implementation with VHDL uses 1.2% and SystemC 6.4% of the available slices (factor 5.3). The usage of BRAMs is increased by a factor of 8.
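The replacement of multiplications by shift operations mentioned for the VHDL designs can be illustrated in software. The sketch below is a generic shift-and-add scheme with hypothetical function names; the paper does not state which constants occur in the designs or how the replacement was carried out in detail.

```cpp
#include <cassert>
#include <cstdint>

// Multiplies x by a constant using only shifts and additions:
// one shifted copy of x is added for every set bit of the constant.
inline std::uint32_t shift_add_multiply(std::uint32_t x, std::uint32_t constant) {
    std::uint32_t result = 0;
    for (unsigned bit = 0; bit < 32; ++bit) {
        if (constant & (1u << bit)) result += x << bit;
    }
    return result;
}

// Division by a power of two reduces to a single right shift.
inline std::uint32_t divide_by_power_of_two(std::uint32_t x, unsigned shift) {
    assert(shift < 32);
    return x >> shift;
}
```

In hardware the loop disappears: for a fixed constant, each set bit becomes a wired shift feeding an adder, so no dedicated multiplier such as a DSP48 is required.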
D. Development time
Figure 9 shows the rounded development times for the presented algorithms. The familiarization period with the language and the respective algorithm is not taken into account. This also applies to the development of the image processing class library in SystemC. The development times for demosaicing and binary morphology are comparable. They only differ by a few hours. The development time is reduced to only one and a half person-days when using SystemC. For the Canny algorithm, the effort is reduced by 40%, from seven and a half to four and a half days.
VI. DISCUSSION
This paper describes the hardware implementation of several image processing algorithms and compares them regarding performance, resource consumption and development time. The implementation was done using VHDL and C++ with SystemC and OSSS. The development with SystemC was improved by creating an image processing class library on top of OSSS, which raised the abstraction level by encapsulating hardware-specific constructs.
The development procedure has an important influence on the required development time. Using SystemC combined with the extra class library, it was possible to reduce the development effort for every implemented algorithm. For less complex designs an improvement of up to 78% is possible, due to the higher abstraction level. By reducing the required knowledge about hardware development and the target platform, it is possible to fully concentrate on the algorithm. C++ as a base also shortens the familiarization period. The development of hardware with VHDL is not possible without learning the tools, the syntax and the synthesis subset. In addition to the higher abstraction level, simulation and testing have a strong influence on the development time. While VHDL development relies on the analysis of often complex waveforms, SystemC allows the direct usage of image data. Because C++ is used as a base, class libraries for image input and output can be used. The reduced development time allows for a faster time-to-market and an improved reaction to changes in the market. Overall, with SystemC it is possible to include software developers in the development process.
The comparison of the performance showed that the throughput of every SystemC design is below that of the VHDL-based solution. This is mostly due to the fact that the SystemC design was only slightly optimized towards the synthesis tool, and that the CoSynth Synthesizer does not extract parallelism automatically from the hardware descriptions. Furthermore, the automated translation results in larger designs, which prevent the high clock rates of handcrafted VHDL. In comparison to OpenCV, SystemC sometimes has less throughput. However, this is not a general drawback, because image processing chains consist of several algorithms. While the performance of PC-based image processing decreases due to chaining, FPGA-based systems still deliver the same performance [8].
The general resource consumption for both approaches is low. SystemC has a higher need for resources compared to VHDL. Nevertheless, both approaches are able to realize complex image processing chains with several hardware modules. The difference between VHDL and SystemC is partially due to the overhead in the generated VHDL. Another factor for more complex designs is that shared resources are not automatically identified by the synthesis tool and thus are sometimes buried beneath the generated structures. Furthermore, the synthesizer applies some restrictions to the designs in SystemC, which requires further optimization techniques.
For microscale automation, image processing speed is a major concern. If the achievable update rate of the SystemC design is sufficient for an automation task, this approach can be a good alternative to the handcrafted designs, due to its shorter development time. The above-mentioned characteristics of FPGAs (e.g. latency, update-rate deviation) are also valid for the SystemC design. However, for high-speed visual servoing the timing performance of the handcrafted VHDL design is superior to the generated design. Therefore, to improve the closed-loop positioning speed of tools and specimens, VHDL or Verilog designs should be used.
VII. OUTLOOK AND FUTURE WORK
The next steps will target implementation and evaluation of other algorithms, such as random access algorithms for image processing and control tasks. Additionally, the evaluation will be extended to more developers in order to improve the comparisons of development times.
A full image processing chain for visual servoing of microrobots will be developed in SystemC and benchmarked against the existing VHDL designs. In addition to statistical data (e.g. update rate), the effects of different update rates on the closed-loop positioning speed will be measured.
REFERENCES
[1] T. Sievers and S. Fatikow, “Real-Time Object Tracking for the Robot-Based Nanohandling in a Scanning Electron Microscope,” Journal of Micromechatronics - Special Issue on Micro/Nanohandling, vol. 3, no. 3-4, pp. 267–284(18), 2006.
[2] C. Diederichs, “Hardware-Software Co-Design Tracking System for Predictable High-Speed Mobile Microrobot Position Control,” in Proc. of IFAC Symposium on Mechatronic Systems, 2010.
[3] C. Diederichs, “Fast Visual Servoing of Multiple Microrobots using an FPGA-Based Smart Camera System,” in Proc. of the 18th IFAC World Congress, 2011.
[4] D. Jasper, C. Diederichs, C. Edeler, and S. Fatikow, “High-speed nanorobot position control inside a scanning electron microscope,” ECTI Transactions on Electrical Eng., Electronics, and Communications, vol. 9, no. 1, pp. 177–186, February 2011.
[5] C. Diederichs and S. Fatikow, “FPGA-based high-speed object detection and classification inside optical microscopes,” in Proc. of the 8th International Workshop on Microfactories, June 2012.
[6] IEEE Standard VHDL Language Reference Manual, 1076-2000 ed., The Institute of Electrical and Electronics Engineers, Inc., 345 East 47th Street, New York, NY 10017-2394, USA, 2000.
[7] IEEE Standard Hardware Description Language Based on the Verilog Hardware Description Language, 1364-2001 ed., The Institute of Electrical and Electronics Engineers, Inc., 345 East 47th Street, New York, NY 10017-2394, USA, 2001.
[8] D. G. Bailey, Design for Embedded Image Processing on FPGAs. John Wiley & Sons (Asia) Pte Ltd, 2011.
[9] I. Alston and B. Madahar, “From C to netlists: Hardware engineering for software engineers?” Electronics & Communication Engineering Journal, vol. 14, no. 4, pp. 165–173, 2002.
[10] S. A. Edwards, “The Challenges of Synthesizing Hardware from C-Like Languages,” vol. 23, no. 5, 2006, pp. 375–386.
[11] S. A. Edwards, “The Challenges of Hardware Synthesis from C-Like Languages,” in Proceedings of the conference on Design, Automation and Test in Europe - Volume 1, ser. DATE ’05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 66–67. [Online]. Available: http://date.eda-online.co.uk/proceedings/papers/2005/date05/pdffiles/01e_1.pdf
[12] Spartan-3E FPGA Family Data Sheet, Xilinx Inc., October 2012.
[13] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, pp. 679–698, November 1986.
[14] Axis Communications, “CCD and CMOS sensor technology - Technical white paper,” Tech. Rep., 2010. [Online]. Available: http://www.axis.com/files/whitepaper/wp_ccd_cmos_40722_en_1010_lo.pdf
[15] R. Jean, “Demosaicing with the bayer pattern,” University of North Carolina, Tech. Rep., 2004.
[16] M. Nosrati, R. Karimi, and M. Hariri, “Detecting circular shapes from areal images using median filter and CHT,” Global Journals Inc. (US), vol. 12, 2012.
[17] K. Grüttner, C. Grabbe, F. Oppenheimer, and W. Nebel, “Object Oriented Design and Synthesis of Communication in Hardware-/Software Systems with OSSS,” in Proceeding of SASIMI 2007, Hokkaido, Japan, Oct. 2007.
[18] K. Grüttner, F. Oppenheimer, and W. Nebel, “OSSS Methodology - System-Level Design and Synthesis of Embedded HW/SW Systems in C++,” in Proceedings of ISABEL 2008, Jan. 2008.
[19] OSSS - A Library for Synthesisable System Level Models in SystemC, OFFIS, R&D Division Transportation, Hardware/Software Design Methodology Group, Oldenburg, 2010.
[20] T. Tiemerding, “Implementation and Evaluation of automated generated IP-Cores for image processing systems on FPGAs,” Master’s thesis, University of Oldenburg, 2012.
[21] Digilent Inc., Genesys Board Reference Manual, 2012.
[22] Digilent Inc., Genesys Board Reference Manual, 2012.
[23] B. A. Draper, J. R. Beveridge, A. P. W. Böhm, C. Ross, and M. Chawathe, “Accelerated Image Processing on FPGA,” IEEE Transactions on Image Processing, vol. 12, pp. 1543–1551, 2003. [Online]. Available: http://www.cs.colostate.edu/pubserv/pubs/Draper-draper-publications-draper_tip03.pdf
[24] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.