Technische Universität München
Institute for Integrated Systems
Prof. Dr. sc. techn. Andreas Herkersdorf
Prof. Dr.-Ing. Walter Stechele
Arcisstrasse 21
80290 Munich, Germany
http://www.lis.ei.tum.de
AutoVision - Reconfigurable Hardware for Video-based Driver Assistance Systems
Speakers
Dipl.-Ing. Christopher Claus
Technische Universität München
AutoVision - 24.9.2009 - 2
Agenda
• Video-based Driver assistance system: AutoVision
• Dynamic Partial Reconfiguration
– Fast dynamic partial reconfiguration
– Inter- and Intra-video frame reconfiguration
– Reconfiguration Scheduling
– Performance Results
– Configuration verification
• Cooperations
• Demonstrators
• Publications
• Outlook & Conclusion
• Prototype
AutoVision Processor
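[Figure: AutoVision processor on a Virtex-II Pro FPGA. Two PowerPC cores (PPC0, PPC1), video interface, I/O, two coprocessor slots (Coproc0, Coproc1), ICAP and a memory interface to SDRAM are connected by the PLB. Coprocessor configurations: Shape Engine, Edge Engine, Tunnel Engine, Taillight Engine. A matrix marks which engines (Taillight, Optical Flow, Cont/Edge, Tunnel, Shape) are needed in each driving situation (highway, tunnel entrance, inside tunnel, urban environment). Image annotation: region to enhance contrast.]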
Research Foci
• Fast dynamic partial reconfiguration (reducing reconfiguration costs)
• Reconfigurable SoC architectures
• (HW/SW codesign of image processing algorithms)
DPR in mutually exclusive driving situations
• Daytime vs. Nighttime:
Detection of feature points (corners) at daytime vs. detection of taillights at nighttime
• Driving direction:
backward vs. forward driving, different algorithms for front-and rear camera
• Different velocities:
optical flow algorithm for motion detection during driving (highway), background subtraction when car is standing still (urban environment)
• Different weather conditions:
fog, sun, snow, rain etc.
• Fixed hardware (ASIC) is usually too inflexible for this
• Dynamic partial reconfiguration (DPR) is used to cope with that problem
• Situation adaptive system for driver assistance
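The idea of picking one algorithm from each mutually exclusive pair can be sketched in a few lines of Python. This is only an illustration of the selection logic listed above: the `Situation` type and the function name are hypothetical, and only the daytime/nighttime and moving/standing pairs from the slide are modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Situation:
    """Hypothetical description of the current driving situation."""
    daytime: bool  # True by day, False at night
    moving: bool   # False while the car is standing still


def select_algorithms(s: Situation) -> List[str]:
    """Pick exactly one algorithm from each mutually exclusive pair."""
    feature = "corner detection" if s.daytime else "taillight detection"
    motion = "optical flow" if s.moving else "background subtraction"
    return [feature, motion]


# Night drive on the highway: only these two engines need to be loaded.
print(select_algorithms(Situation(daytime=False, moving=True)))
```

Because only one member of each pair is ever active, the inactive one does not need to occupy FPGA resources, which is the motivation for DPR here.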
Fast DPR
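FIFO filled?

$$\frac{BL\,[\text{cycles}] \cdot DW\,[\text{bit}]}{BL\,[\text{cycles}] + L\,[\text{cycles}]} \cdot f_W\,[\text{MHz}] > IIW\,[\text{bit}] \cdot f_R\,[\text{MHz}]$$

[Figure: data path MEM -> PLB (data width DW, latency L) -> FIFO (written at f_W, read at f_R) -> ICAP (input width IIW), controlled by an FSM.]

BL: burst length
IIW: ICAP input width
DW: data width
L: memory latency
f_W: write frequency
f_R: read frequency
TP: throughput
BF: busy factor
BS: bitstream size
T_R: reconfiguration time

Throughput:

$$TP\,[\text{byte/s}] = f_R\,[\text{Hz}] \cdot IIW\,[\text{byte}] \cdot BF, \quad BF \in [0, 1]$$

Reconfiguration time:

$$T_R\,[\text{s}] = \frac{BS\,[\text{byte}]}{TP\,[\text{byte/s}]}$$

1st year: Combitgen -> BS
2nd year: ICAP Ctrl -> TP
3rd year: Optimized Engines -> BS
4th year: Overclocking ICAP -> TP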
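As a quick plausibility check, the three relations above can be evaluated in Python. The Virtex-II Pro and Virtex-4 inputs are the values listed on the DPR Results (1) slide; the burst length, data width and latency used for the FIFO check are hypothetical, because the slides give no numbers for them.

```python
def fifo_stays_filled(bl, dw, lat, f_w, iiw, f_r):
    """FIFO-filled condition: (BL * DW) / (BL + L) * fW > IIW * fR.

    bl, lat in cycles; dw, iiw in bits; f_w, f_r in MHz.
    """
    return (bl * dw) / (bl + lat) * f_w > iiw * f_r


def throughput(f_r_hz, iiw_bytes, bf):
    """TP [byte/s] = fR [Hz] * IIW [byte] * BF."""
    return f_r_hz * iiw_bytes * bf


def reconfig_time(bs_bytes, tp_bytes_per_s):
    """TR [s] = BS [byte] / TP [byte/s]."""
    return bs_bytes / tp_bytes_per_s


# Virtex-II Pro: fR = 100 MHz, IIW = 1 byte, BF = 0.9, BS = 72192 byte
tp_v2p = throughput(100e6, 1, 0.9)
tr_v2p = reconfig_time(72192, tp_v2p)
print(f"V2P: TP = {tp_v2p / 1e6:.1f} MB/s, TR = {tr_v2p * 1e6:.0f} us")

# Virtex-4 FX: fR = 100 MHz, IIW = 4 byte, BF = 1.0, BS = 1280 byte
tp_v4 = throughput(100e6, 4, 1.0)
tr_v4 = reconfig_time(1280, tp_v4)
print(f"V4:  TP = {tp_v4 / 1e6:.1f} MB/s, TR = {tr_v4 * 1e6:.1f} us")

# Hypothetical memory-side parameters (not taken from the slides)
print(fifo_stays_filled(bl=16, dw=64, lat=10, f_w=100, iiw=32, f_r=100))
```

The computed values agree with the calculated TP and TR entries of the results table (90.0 MB/s and 802 μs, 400.0 MB/s and 3.2 μs).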
Motivation of fast reconfiguration
Remove overhead, speed up the reconfiguration process.
Frame input rate 25 frames/s -> 40 ms to process one image.
Safety-critical systems -> no frame may be dropped.
Inter-videoframe reconfiguration (Inter VFR)
One reconfiguration between two video frames.
If image processing can be done in 35 ms -> 5 ms remain to reconfigure the device.
No frame has to be dropped.
Intra-videoframe reconfiguration (Intra VFR)
Multiple reconfigurations within one video frame.
Tunnel entrance detection followed by contrast enhancement -> conventionally both coprocessors are implemented in parallel.
If tunnel entrance detection can be done in 17 ms and contrast enhancement in 15 ms (32 ms of the 40 ms frame period) -> the modules can time-share one slot if reconfiguration is fast enough.
Only one module is active at a time -> resources are saved.
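The timing argument can be written down as a small budget calculation. This is a minimal sketch: the function names are hypothetical, and the assumption of two reconfigurations per frame in the intra-videoframe case (load contrast enhancement, then load tunnel entrance detection again) is mine, not stated on the slide.

```python
FRAME_RATE_FPS = 25


def frame_period_ms(fps=FRAME_RATE_FPS):
    """Time available per frame; 25 frames/s gives 40 ms."""
    return 1000.0 / fps


def reconfig_budget_ms(processing_ms, n_reconfigs, fps=FRAME_RATE_FPS):
    """Time available per reconfiguration so that no frame is dropped."""
    slack = frame_period_ms(fps) - sum(processing_ms)
    if slack < 0:
        raise ValueError("image processing alone exceeds the frame period")
    return slack / n_reconfigs


# Inter VFR: one 35 ms processing step, one reconfiguration
print(reconfig_budget_ms([35.0], n_reconfigs=1))

# Intra VFR: 17 ms + 15 ms in the same slot, assumed two reconfigurations
print(reconfig_budget_ms([17.0, 15.0], n_reconfigs=2))
```

Under these assumptions the inter-videoframe case leaves 5 ms for its single reconfiguration, while the intra-videoframe case leaves 4 ms for each of the two.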
Inter-/Intra-videoframe reconfiguration
[Figure: two timelines over two frame periods (frame #1 and frame #2, 40 ms each, 80 ms in total). Intra-videoframe reconfiguration (Intra VFR): image processing and reconfiguration phases alternate within frame #1, so several reconfigurations fit into one frame. Inter-videoframe reconfiguration (Inter VFR): image processing is followed by a single reconfiguration before the next image processing phase.]
Intra VFR
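[Figure: sequential processing. Two engines, marked 1 and 2 (1+2), are attached to the PLB through LIS IPIF interfaces and exchange data with the SDRAM. Engine 1 (low-pass filter and census transformation): Input FSM, input local memory, Matrix, Userlogic1, intermediate local memory, Matrix, Userlogic2, output local memory. Engine 2 (matching): Input FSM, input local memory, Matrix, Userlogic3, output local memory. Data paths are labelled 64 and 8.]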
Intra VFR
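[Figure: reconfigurable processing. Engine 1 (low-pass filter and census transformation: Input FSM, input local memory, Matrix, Userlogic1, intermediate local memory, Matrix, Userlogic2, output local memory) and engine 2 (matching: Input FSM, input local memory, Matrix, Userlogic3, output local memory) are attached to the PLB through the LIS IPIF and exchange data with the SDRAM. Engines 1 and 2 are exchanged by reconfiguration (label: Exchange). Data paths are labelled 64 and 8.]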
Reconfiguration scheduling
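[Figure: schedule of one frame within the 40 ms deadline (25 fps). Image processing consists of the Census Engine, the Matching Engine and postprocessing on the PowerPC, with durations labelled 5 ms, 9 ms and 6 ms. DPR and Verify phases are scheduled alongside.]
Steps of one DPR:
• Disable macros
• Reset engine
• Configure ICAP
• Reconfigure device
• Send interrupt
• Handle interrupt
• Clear reset
• Initialize engine
• Enable macros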
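A schedule like this can be checked against the deadline with a small timeline builder. The sketch below is illustrative only: the 5 ms, 9 ms and 6 ms durations are read from the slide in label order, whereas the DPR and verification durations and the strictly sequential ordering of all phases are assumptions.

```python
DEADLINE_MS = 40.0  # one frame at 25 fps

# Hypothetical durations: the slide gives no numbers for DPR or Verify.
DPR_MS = 1.0
VERIFY_MS = 1.0

# Assumed strictly sequential order of the phases within one frame.
STAGES = [
    ("Census engine", 5.0),
    ("DPR", DPR_MS),
    ("Verify", VERIFY_MS),
    ("Matching engine", 9.0),
    ("DPR", DPR_MS),
    ("Verify", VERIFY_MS),
    ("Postprocessing (PowerPC)", 6.0),
]


def build_timeline(stages):
    """Return (name, start_ms, end_ms) for stages run back to back."""
    t = 0.0
    timeline = []
    for name, duration in stages:
        timeline.append((name, t, t + duration))
        t += duration
    return timeline


def meets_deadline(timeline, deadline_ms=DEADLINE_MS):
    """True if the last stage ends no later than the deadline."""
    return timeline[-1][2] <= deadline_ms


timeline = build_timeline(STAGES)
for name, start, end in timeline:
    print(f"{start:5.1f} - {end:5.1f} ms  {name}")
print("deadline met:", meets_deadline(timeline))
```

Replacing `DPR_MS` and `VERIFY_MS` with measured values shows how much slack remains before the 40 ms deadline.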
DPR Results (1)
Device                      Virtex-II Pro (XC2VP30)   Virtex-4 FX (XC4VFX20)
Frequency                   100 MHz                   100 MHz
Main memory                 DDR                       DDR
IIW                         1 byte                    4 byte
BS [byte]                   72192                     1280
BF                          0.9                       1.0
Calculated TP               90.0 MB/s                 400.0 MB/s
Measured TP (PLB_ICAP)      89.9 MB/s                 400.0 MB/s
Measured TP (OPB_HWICAP)    4.77 MB/s                 5.07 MB/s
Calculated TR               802 μs                    3.2 μs
Measured TR (PLB_ICAP)      803 μs                    3.2 μs
Measured TR (OPB_HWICAP)    15134 μs                  252.5 μs
80 times faster
• Maximum frequency has been determined on V2P (160 MHz).
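The "80 times faster" annotation can be related to the measured reconfiguration times with a short calculation. The numbers below are the measured TR values from the table; assigning the annotation to the Virtex-4 comparison is my reading of the ratios, not something the slide states explicitly.

```python
# Measured reconfiguration times in microseconds, per device and controller
measured_tr_us = {
    "Virtex-II Pro": {"OPB_HWICAP": 15134.0, "PLB_ICAP": 803.0},
    "Virtex-4 FX": {"OPB_HWICAP": 252.5, "PLB_ICAP": 3.2},
}


def speedup(times):
    """Ratio of OPB_HWICAP to PLB_ICAP reconfiguration time."""
    return times["OPB_HWICAP"] / times["PLB_ICAP"]


for device, times in measured_tr_us.items():
    print(f"{device}: {speedup(times):.1f}x")
```

The Virtex-4 ratio comes out at roughly 79, which matches the annotation; on the Virtex-II Pro the ratio is closer to 19.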
DPR Results (2)
Source   Device     fW [MHz]   PIM       IIW [bit]   fR [MHz]   Measured throughput [MB/s]   Theoretical throughput [MB/s]
[1]      Virtex 4   100        OPB       32          100        350.0                        400
Xilinx   Virtex 4   100        PLB       32          100        22.9                         400
[2]      Virtex 4   100        PLB/NPI   32          100        371.4                        400
LIS      Virtex 4   100        NPI       32          100        400                          400
LIS      Virtex 5   100        PLB       32          100        400                          400
LIS      Virtex 5   100        PLB       32          125        500                          500
LIS      Virtex 5   150        NPI       32          150        600                          600
LIS      Virtex 5   150        NPI       32          200        800                          800
LIS      Virtex 5   150        NPI       32          250        1000                         1000
LIS      Virtex 5   200        NPI       32          300        1200                         1200
=> fastest dynamic partial reconfiguration worldwide
[1] Manet et al.: "An Evaluation of Dynamic Partial Reconfiguration for Signal and Image Processing in Professional Electronics Applications", EURASIP Journal, 2008
[2] Liu et al.: "Run-time partial reconfiguration speed investigation and architectural Design Space Exploration", FPL 2009
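The theoretical throughput column follows directly from the ICAP input width and read frequency, and dividing measured by theoretical throughput gives a utilization figure for each controller. The sketch below uses the numbers from the table; the function name and the "utilization" wording are mine.

```python
def theoretical_tp_mb_s(iiw_bits, f_r_mhz):
    """Theoretical ICAP throughput in MB/s: (IIW / 8) byte per cycle * fR."""
    return iiw_bits / 8 * f_r_mhz


# Theoretical throughput for the read frequencies listed in the table
for f_r in (100, 125, 150, 200, 250, 300):
    print(f"fR = {f_r} MHz -> {theoretical_tp_mb_s(32, f_r):.0f} MB/s")

# Measured throughput at fR = 100 MHz, IIW = 32 bit, by source
measured_mb_s = {"[1]": 350.0, "Xilinx": 22.9, "[2]": 371.4, "LIS": 400.0}
utilization = {
    source: tp / theoretical_tp_mb_s(32, 100)
    for source, tp in measured_mb_s.items()
}
for source, u in utilization.items():
    print(f"{source}: {u:.1%} of the theoretical throughput")
```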
Online Verification
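[Figure: online verification. Blocks: Mem Ctrl, DDR SDRAM, ICAP Ctrl, ICAP port, Config Mem, CRC IP, ICAP OUT; signals: Data, Busy. The design is split into a user domain and a system domain.]
[Table: the User and System flags are merged into one Result; User = 0 and System = 0 give Result = 0.]
• Merging information from two domains
• Long-term test to see if overclocking the ICAP is safe.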
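The role of a CRC block in configuration verification can be illustrated in software. This is only a sketch of the general technique: the slide names a "CRC IP" but not its polynomial or where the data is tapped, so the CRC-32 from Python's `zlib` module is used as a stand-in and the 1280-byte buffer is a made-up placeholder for a partial bitstream.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """CRC-32 of the data (stand-in for the hardware CRC IP)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def configuration_ok(readback: bytes, expected_crc: int) -> bool:
    """Compare the CRC of the configured data with the expected value."""
    return crc32_of(readback) == expected_crc


# Placeholder bitstream (hypothetical content, 1280 bytes)
golden = bytes(range(256)) * 5
expected = crc32_of(golden)

# Simulate a single bit error, e.g. caused by overclocking the ICAP too far
corrupted = bytearray(golden)
corrupted[100] ^= 0x01

print(configuration_ok(golden, expected))
print(configuration_ok(bytes(corrupted), expected))
```

A check of this kind after every reconfiguration is one way to run the long-term overclocking test mentioned above without inspecting the configuration memory by hand.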
Cooperations
SPPRR:
• Teich, Angermeier, FAU Erlangen: "Videofilter Engines on the ESM", Raphael Polig, Matthias Kovatsch, Ulrich Batzer
• Becker, Hübner, ITIV Karlsruhe: "Visualization of fast DPR", Lars Braun, Bin Zhang
• Maehle, El Sayed Auf, ITI Lübeck: "Embedded Image Processing for robots", Johnny Paul
• Platzner, Lübbers, Paderborn: "Fast Reconfiguration"
others:
• LRT, Munich: "HW Acceleration for still image compression - FPGAs in space", Stephan Schropp, Florian Aschauer
• Robert Bosch GmbH: "FPGAs in Cars", Robert Hartl
• BMW: "Optical flow", Andreas Laika
• Xilinx Automotive: "Intra Video Frame Reconfiguration of the Optical Flow", Florian Altenried
• Xilinx Automotive: "Head and Taillamp detection on low cost FPGA devices", Syed Abbas Ali, Darmstadt (Glesner)
Demonstrators
AutoVision in a 5-series BMW
Intelligent Vehicles Symposium 2009
DATE 2008
Cebit 2008
Publications
• C. Claus, R. Huitl, J. Rausch, W. Stechele, "Optimizing the SUSAN corner detection algorithm for a high speed FPGA implementation", 19th International Conference on Field Programmable Logic and Applications (FPL09), Prague, Czech Republic, August 31 - September 2, 2009
• C. Claus, A. Laika, L. Jia, W. Stechele, "High performance FPGA based optical flow calculation using the census transformation", The Intelligent Vehicles Symposium (IV'09), Xi'an, China, June 3-5, 2009
• C. Claus, W. Stechele, M. Kovatsch, J. Angermeier, J. Teich, "A comparison of embedded reconfigurable video-processing architectures", Proceedings of the International Conference on Field Programmable Logic and Applications (FPL08), Heidelberg, Germany, September 8-10, 2008
• C. Claus, B. Zhang, W. Stechele, L. Braun, M. Hübner, J. Becker, "A multi-platform controller allowing for maximum dynamic partial reconfiguration throughput", Proceedings of the International Conference on Field Programmable Logic and Applications (FPL08), Heidelberg, Germany, September 8-10, 2008
• J. Angermeier, U. Batzer, M. Majer, J. Teich, C. Claus, W. Stechele, "Reconfigurable HW/SW Architecture of a Real-Time Driver Assistance System", International Workshop on Applied Reconfigurable Computing (ARC2008), Imperial College London, U.K., March 26-28, 2008
• N. Alt, C. Claus, W. Stechele, "Hardware/software architecture of an algorithm for vision-based real-time vehicle detection in dark environments", Design, Automation & Test in Europe (DATE 2008), Munich, March 10-14, 2008
• M. Ihmig, N. Alt, C. Claus, A. Herkersdorf, "Resource-efficient Sequential Architecture for FPGA-based DAB Receiver", Workshop zu Software Radio “WSR 08”, Karlsruhe, March 5-6, 2008
• C. Claus, W. Stechele, A. Herkersdorf, "Autovision - A Run-time Reconfigurable MPSoC Architecture for future Driver Assistance Systems", it - Information Technology Journal, Issue No. 3, June 20, 2007
Publications
• C. Claus, B. Zhang, M. Huebner, C. Schmutzler, J. Becker, W. Stechele, "An XDL-based busmacro generator for customizable communication interfaces for dynamically and partially reconfigurable systems", Workshop on Reconfigurable Computing Education at ISVLSI 2007, Porto Alegre, Brazil, May 12, 2007
• M. Hübner, L. Braun, J. Becker, C. Claus, W. Stechele, "Physical Configuration On-Line Visualization of Xilinx Virtex-II FPGAs", IEEE Computer Society Annual Symposium on VLSI (ISVLSI '07), pp. 41-46, Porto Alegre, Brazil, May 9-11, 2007
• C. Claus, J. Zeppenfeld, W. Stechele, "Using Partial-Run-Time Reconfigurable Hardware to accelerate Video Processing in Driver Assistance Systems", Proceedings of DATE 2007, Nice, France, April 16-20, 2007
• C. Claus, J. Zeppenfeld, F. H. Müller, W. Stechele, "A new framework to accelerate Virtex-II Pro dynamic partial self-reconfiguration", 14th Reconfigurable Architectures Workshop, Long Beach, CA, March 26-27, 2007
• M. Vuletic, P. Ienne, C. Claus, W. Stechele, "Multithreaded Virtual-Memory-Enabled Reconfigurable Hardware Accelerators", accepted for IEEE International Conference on Field Programmable Technology, FPT, Bangkok, Thailand, December 13-15, 2006
• C. Claus, H. C. Shin, W. Stechele, "Tunnel Entrance Recognition for video-based Driver Assistance Systems", IWSSIP 2006, 13th International Conference on Systems, Signals and Image Processing, Budapest, Hungary, September 21-23, 2006
• A. Herkersdorf, C. Claus, M. Meitinger, R. Ohlendorf, T. Wild, "Reconfigurable Processing Units vs. Reconfigurable Interconnects", Dagstuhl Seminar on Dynamically Reconfigurable Architectures, Dagstuhl Seminar Proceedings 06141, April 2-7, 2006
• C. Claus, F. Müller, W. Stechele, "Combitgen: A new approach for creating partial bitstreams in Virtex-II Pro devices", International Conference on Architecture of Computing Systems, ARCS 2006, in GI Lecture Notes in Informatics, Workshop on Dynamically Reconfigurable Systems, Frankfurt, March 16, 2006
Outlook: Computing in space
• Image compression in a rocket on a Spartan-3 device
• REXUS rocket project (Rocket Experiments for University Students)
• Video frames are grabbed by a camera, compressed and sent back to Earth
• Reconfiguration in space is meaningful
Outlook
• Proof that partial exchange of modules is possible within 40 ms.
• Demonstrator for Intra VFR on Virtex 5
• > 1 000 000 reconfigurations with overclocked ICAP
• Two projects in collaboration with Xilinx Automotive
• Implementation of optical flow (OF) on Virtex-5 (internal structure similar to upcoming automotive devices)
• Intelligent headlamp and taillight detection on Spartan-3
Conclusion
Lessons Learned:
• Reconfigurable Computing (DPR) offers great potential
• HW blocks can time share the same resources.
• Beneficial when massively parallel processing and mutually exclusive tasks are required.
But:
• The current design flow requires a lot of manual work.
• Limited tool support and documentation.
• New PR Flow (ISE Design Suite 12) will significantly ease this process.
Prototype
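[Figure: prototype on a Virtex-II Pro FPGA. PPC0, PPC1, I/O, VGA core, coprocessor slots Coproc0 (DVI) and Coproc1 (Engine), ICAP and memory interface are connected by the PLB; a bitstream is fed to the ICAP. Screenshots: placement visualization and FPGA Editor view.]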
Acknowledgements
Special thanks to:
Walter Stechele, Andreas Herkersdorf
Abdallah Youssef, Andreas Laika, Benjamin Gatscher, Bin Zhang, Carlos Bernal, Dirk Pomsler, Feng Tao, Firat Kilinc, Florian Altenried, Florian Müller, Hoo Chang Shin, Ingmar Cramm, Jian Wang, Jian Lu, Joachim Rausch, Johannes Zeppenfeld, Lars Braun, Lei Jia, Nico Kiessling, Nicolas Alt, Ning Chen, Radoslav Denchev, Raphael Polig, Rehan Ahmed, Robert Hartl, Robert Huitl, Stephan Schropp, Tobias Kaffl, Yun Hu, Zike Chen, Florian Aschauer, Aurang Zaib