Corporate Overview
Founded in 2007, located in McLean, VA
Mission: Provide easy-to-use, real-time 3D computer vision (CV) technology for embedded and mobile applications
Move from 2D to 3D for better visualization, higher reliability, and greater accuracy
Solve problems that require spatial measurements (e.g., parts inspection)
Target customers: application and system developers who want to enhance existing products or develop new ones
Product: Starry Night 3D-CV Middleware (Unity plugin)
Operating systems: Android and Linux
3D sensors: Occipital Structure and Intel RealSense
Processors: ARM and Xilinx Zynq
Our focus: object recognition, feature detection, and analysis (e.g., measurements)
Slide 5
Potential Applications
3D printing, parts inspection, robotics, entertainment, automotive safety, security, medical imaging
Slide 6
Challenges for Implementing Real-Time 3D Computer Vision
Busy, uncontrolled real-world environments
Limited processing power and memory
Noisy, uncalibrated low-cost scanners
Difficult-to-use libraries
Hard to find proficient computer vision engineers
Lack of standards
Large development investment
Slide 7
Starry Night Unity Plugin (patent pending)
Starry Night video: https://www.youtube.com/watch?v=IZX-9PH7Erw&feature=youtu.be
Slide 8
The Starry Night Template-Based 3D Model Reconstruction
Reliable: the output is always a fully formed 3D model with known feature points, despite noisy or partial scans
Easy to use: fully automated process
Powerful: known data structure for easy analysis and measurement
Fast: real-time modeling
Input scan (partial) + reference model = full 3D model
Slide 9
3D Object Recognition Algorithm for Mobile and Embedded Devices
Slide 10
Challenges - Scene: busy scene, object orientation, and occlusion
Slide 11
Challenges - Platform: mobile and embedded devices (ARM Cortex-A9 or A15)
Slide 12
Previous Approaches
(2D) Texture-based methods
  Color-based: depends heavily on the lighting or the color of the object
  Machine learning: robust, but requires training for each object
  Neither method provides the transform (i.e., orientation)
(3D) Methods
  Hough transform and geometric hashing: slow (geometric hashing is even slower)
  Tensor matching: not good for noisy and sparse scenes
  Correspondence-based methods using rigid geometric descriptors: the models must have distinctive feature points, which is not true for most models (e.g., a cylinder)
  Tried
Slide 13
General Concept for CV-Based Object Recognition
[Diagram: reference object → object descriptor (distance & normal); scene → distance & normal of random sample points; compare the two using match criteria; fine-tune orientation and location; transpose]
Slide 14
Block Diagram
Slide 15
Model Descriptor (Pre-Processed)
Sample all point pairs in the model that are separated by the same distance D.
Use the surface normals of each pair to compute a hash table key, and group the pairs under that key:
  Key (1,1,1): (P1, P2), (P3, P4)
  Key (2,2,2): (P5, P6), (P7, P8), (P9, P10), (P11, P12)
  Key (3,3,3): (P13, P14)
Note: In the bear example, D = 5 cm, which resulted in 1000 pairs.
Note: The keys are angles derived from the normals of the points:
  α(): angle from the first point's normal to the second point
  β(): angle from the second point's normal to the first point
  ω(): angle of the plane between the two points
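As a rough illustration of this pre-processing step, the Python sketch below builds a descriptor table from model points and unit normals. It assumes Papazov-and-Burschka-style angles (α between the first normal and the vector to the second point, β between the second normal and the vector back to the first, ω taken here as the angle between the two normals), a hypothetical 12° quantization bin, and a distance tolerance `tol`. The names `pair_key` and `build_model_descriptor` are illustrative only, not part of Starry Night.

```python
import math
from collections import defaultdict
from itertools import combinations

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _angle(u, v):
    """Angle in radians between non-zero vectors u and v."""
    c = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error

def pair_key(p1, n1, p2, n2, bin_rad=math.radians(12)):
    """Quantized (alpha, beta, omega) key for an oriented point pair (assumed definitions)."""
    alpha = _angle(n1, _sub(p2, p1))  # first normal vs. direction to the second point
    beta = _angle(n2, _sub(p1, p2))   # second normal vs. direction to the first point
    omega = _angle(n1, n2)            # angle between the two normals (tangent planes)
    return tuple(int(a // bin_rad) for a in (alpha, beta, omega))

def build_model_descriptor(points, normals, D, tol):
    """Hash table: key -> list of (i, j) model pairs whose separation is D +/- tol."""
    table = defaultdict(list)
    for i, j in combinations(range(len(points)), 2):  # O(n^2): acceptable offline
        d = _sub(points[j], points[i])
        if abs(math.sqrt(_dot(d, d)) - D) > tol:
            continue
        # Store both orderings so a scene pair can be sampled in either order
        table[pair_key(points[i], normals[i], points[j], normals[j])].append((i, j))
        table[pair_key(points[j], normals[j], points[i], normals[i])].append((j, i))
    return table
```

Storing both orderings doubles the table size but lets a scene pair be looked up without first deciding which point is "first". The brute-force pair loop is fine for an offline step; a spatial index would be the natural next step for large models.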
Slide 16
Object Recognition Workflow
1. Grab the scene.
2. Sample a point pair separated by distance D, using RANSAC.
3. Generate a key using the same hash function as the model descriptor.
4. Use the key to retrieve similarly oriented point pairs in the model and compute a rough transform.
5. Apply the match criteria to find the best match.
6. Use ICP to refine the transform.
Note: The example scene has around 16K points.
Note: We iterated this sampling process 100 times.
Note: The entire process can easily be parallelized.
Very important: multiple models can be found using a single hash table (for example, from the same sampled point pair in the scene).
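The rough transform and the match criteria from the workflow above can be sketched as follows. This is an illustration under stated assumptions, not the Starry Night implementation: it builds an orthonormal frame from each oriented pair (pair direction plus the first point's normal), takes the rotation that maps the model frame onto the scene frame, and scores a candidate by counting model points that land within `eps` of some scene point. All helper names are hypothetical, and ICP refinement is not shown.

```python
import math

# Small vector helpers on 3-element lists
def _sub(a, b): return [x - y for x, y in zip(a, b)]
def _dot(a, b): return sum(x * y for x, y in zip(a, b))
def _scale(a, s): return [x * s for x in a]
def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]
def _unit(a):
    n = math.sqrt(_dot(a, a))
    return None if n < 1e-9 else _scale(a, 1.0 / n)

def pair_frame(p1, n1, p2):
    """Right-handed orthonormal frame [x, y, z] from an oriented point pair."""
    x = _unit(_sub(p2, p1))                      # along the pair
    if x is None:
        return None
    z = _unit(_sub(n1, _scale(x, _dot(n1, x))))  # first normal, made orthogonal to x
    if z is None:                                # normal parallel to the pair: degenerate
        return None
    return [x, _cross(z, x), z]

def transform_point(R, t, p):
    """Return R * p + t."""
    return [_dot(R[i], p) + t[i] for i in range(3)]

def rough_transform(model_pair, scene_pair):
    """Rigid (R, t) mapping a model pair (p1, n1, p2) onto a scene pair, or None."""
    fm, fs = pair_frame(*model_pair), pair_frame(*scene_pair)
    if fm is None or fs is None:
        return None
    # R = Fs * Fm^T, with the frame axes as matrix columns
    R = [[sum(fs[k][i] * fm[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = _sub(scene_pair[0], transform_point(R, [0.0, 0.0, 0.0], model_pair[0]))
    return R, t

def count_inliers(R, t, model_pts, scene_pts, eps):
    """Hypothetical match criterion: model points landing within eps of a scene point."""
    return sum(1 for m in model_pts
               if any(math.dist(transform_point(R, t, m), s) <= eps for s in scene_pts))
```

In a RANSAC loop, each sampled scene pair would be looked up in the hash table, every retrieved model pair would yield one candidate `(R, t)`, and the candidate with the most inliers would be handed to ICP. The brute-force inlier count is only for clarity; a k-d tree would replace the inner `any(...)` in practice.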
Slide 17
Implementation Result
Object recognition video: https://www.youtube.com/watch?v=h7whfei0fTw&feature=youtu.be
Slide 18
Object Recognition Examples * CONFIDENTIAL *
Slide 19
Adaptive 3D Object Recognition Algorithm: Resize and Reshape
Slide 20
Object Recognition for Different Sizes & Shapes
Objects in the real world are not always identical
Similarity Factor, S%, can be used to denote the % of shape difference
  This allows recognition of an object that is similar to, but does not have the exact shape of, the reference model
Size Factor, Z%, can be used to denote the % of size difference the algorithm can recognize
  This allows recognition of objects that are of different sizes from the reference model
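One way to make S% and Z% concrete is sketched below. The definitions are assumptions, not the ones used by Starry Night: given corresponding model and scene points that already share the same orientation, the scale is the ratio of their RMS distances to their centroids, size is accepted if |scale - 1| is within Z%, and the mean residual left after resizing, as a percentage of the resized model's RMS radius, is compared with S%. The function name `size_and_shape_check` is hypothetical.

```python
import math

def _centroid(pts):
    return [sum(p[i] for p in pts) / len(pts) for i in range(3)]

def _rms_radius(pts, c):
    """Root-mean-square distance of the points from c (assumed > 0)."""
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in pts) / len(pts))

def size_and_shape_check(model_pts, scene_pts, Z_pct, S_pct):
    """Hypothetical S%/Z% test; model_pts[k] corresponds to scene_pts[k],
    and both sets are assumed to already share the same orientation."""
    cm, cs = _centroid(model_pts), _centroid(scene_pts)
    rm = _rms_radius(model_pts, cm)
    scale = _rms_radius(scene_pts, cs) / rm
    # Resize the model about its centroid and move it onto the scene centroid
    resized = [[cs[i] + scale * (p[i] - cm[i]) for i in range(3)] for p in model_pts]
    # Whatever mismatch remains after resizing is treated as shape difference
    residual = sum(math.dist(a, b) for a, b in zip(resized, scene_pts)) / len(scene_pts)
    shape_diff_pct = 100.0 * residual / (scale * rm)
    size_ok = abs(scale - 1.0) * 100.0 <= Z_pct
    return scale, shape_diff_pct, size_ok and shape_diff_pct <= S_pct
```

Separating the two factors this way means a uniformly enlarged object consumes only the size budget Z%, while local deformations show up in the shape budget S%.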
Slide 21
General Approach
Dynamically resizes the reference model
Dynamically reshapes the reference model
Uses our shape-based registration technique
As a result, the reference model is deformed to match the object in the scene
This results in very robust object recognition
The final reference model best represents the object in the scene in both size and shape
Slide 22
Block Diagram: Adaptive Object Recognition with Feedback
The reference model is iteratively modified with every new frame until it converges to the object in the scene.
Note: Currently being implemented; it will be available in Version 1.2 later this year.
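The per-frame feedback loop can be illustrated with a deliberately simple update rule, chosen here as an assumption: on every frame, each reference-model point moves a fraction `rate` of the way toward its matched scene point, and the loop stops once the largest per-point shift drops below `tol`. The per-frame correspondences are taken as given, the function names are hypothetical, and the slides do not describe the actual shape-based registration step.

```python
import math

def adapt_step(model_pts, matched_pts, rate):
    """Move each model point a fraction `rate` toward its matched scene point."""
    new_pts = [[m[i] + rate * (s[i] - m[i]) for i in range(3)]
               for m, s in zip(model_pts, matched_pts)]
    shift = max(math.dist(a, b) for a, b in zip(model_pts, new_pts))
    return new_pts, shift

def adapt_over_frames(model_pts, frames, rate=0.5, tol=1e-3):
    """Feed one set of correspondences per frame until the model stops moving.

    Returns (model_pts, frames_used, converged)."""
    used = 0
    for matched_pts in frames:
        used += 1
        model_pts, shift = adapt_step(model_pts, matched_pts, rate)
        if shift < tol:  # largest per-point shift is below tolerance: converged
            return model_pts, used, True
    return model_pts, used, False
```

With a static target and `rate=0.5`, the remaining gap halves every frame, so a single point starting 1 unit away converges at frame 10 with `tol=1e-3`. A smaller `rate` trades convergence speed for robustness to noisy frames.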
Slide 23
Object Recognition Performance Numbers
Slide 24
Reliability (w/ bear model)
Reliability (and the false-positive rate) depends on the scene
  Clean scene: 99%
  Model facing sideways (narrower profile): 85%
Slide 25
Performance - Mobile
Performance on a 2 GHz ARM Cortex-A15 (Android mobile device)
Time to find    Single thread    Multi-thread & NEON
One object      2 s              0.3 s
Two objects     2.5 s            0.5 s
Note: Effective use of NEON led to significant performance gains of 2.5x for certain functions.
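The table implies the overall speedups and the marginal cost of a second object; the short computation below only reworks the slide's own numbers.

```python
def speedup(baseline_s, optimized_s):
    """How many times faster the optimized run is."""
    return baseline_s / optimized_s

# Timings from the slide, in seconds: (single thread, multi-thread & NEON)
timings = {"one object": (2.0, 0.3), "two objects": (2.5, 0.5)}

for name, (single, optimized) in timings.items():
    print(f"{name}: {speedup(single, optimized):.1f}x faster with multi-threading & NEON")

# Extra cost of the second object relative to finding one object
for label, idx in (("single thread", 0), ("multi-thread & NEON", 1)):
    extra = timings["two objects"][idx] / timings["one object"][idx] - 1.0
    print(f"{label}: second object adds {extra:.0%}")
```

It reports speedups of about 6.7x (one object) and 5.0x (two objects), and shows that the second object adds 25% to the single-thread time and about 67% to the multi-thread & NEON time.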
Slide 26
Hardware Acceleration Using FPGA
Xilinx Zynq SoC provides 20 to 1,000 parallel voxel processors, depending on the size of the FPGA
[Diagram: Zynq SoC with an ARM processor (1 processor) and FPGA voxel processors (20+) operating on a voxel scan]
Slide 27
Hardware Acceleration: FPGA (Xilinx Zynq)
Functions selected for implementation in the Zynq FPGA: matrix operations
Dual-core ARM: data management + floating point
Entire implementation done in C++ (Xilinx Vivado HLS)
Slide 28
Performance: Embedded Using FPGA
Note: Currently, only 30% of the computationally intensive functions are implemented on the FPGA, with the rest still running on the ARM A9. Speed will improve significantly once the remaining high-intensity functions are moved to the FPGA.
Performance on Xilinx Zynq (Cortex-A9 800 MHz + FPGA)
Time to find one object:
  Zynq 7020: 0.7 s
  Zynq 7045 (est.): 0.1 s
No test results yet for two objects, but performance should scale the same way as on the ARM.
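How much moving the remaining functions could help can be estimated with Amdahl's law, S = 1 / ((1 - f) + f / s), where f is the fraction of runtime offloaded and s is the FPGA speedup for that part. The slide gives neither value (the 30% refers to the share of functions, not of runtime), so the inputs below are hypothetical.

```python
def amdahl_speedup(offloaded_fraction, accel_factor):
    """Amdahl's law: overall speedup when a fraction of the runtime is accelerated."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if accel_factor <= 0:
        raise ValueError("accel_factor must be positive")
    return 1.0 / ((1.0 - offloaded_fraction) + offloaded_fraction / accel_factor)

# Hypothetical inputs: fraction of runtime moved to the FPGA, FPGA speedup of 10x
for f in (0.3, 0.6, 0.9):
    print(f"{f:.0%} of runtime offloaded at 10x -> {amdahl_speedup(f, 10.0):.2f}x overall")
```

With these made-up inputs the overall speedup grows from about 1.37x to 5.26x, which matches the qualitative point of the note: gains stay modest while most of the runtime remains on the ARM.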
Slide 29
Future
The chosen algorithm works well in most real-world conditions
The chosen algorithm is tolerant to size and shape differences with respect to the reference model
The chosen algorithm can find multiple objects at the same time with minimal additional processing power
Additional performance improvements are needed:
  Algorithm: application-specific parameters (e.g., size of the model descriptor)
  ARM: NEON
  FPGA: optimize the use of the FPGA core
Slide 30
Summary
Key implementation issues: model descriptor, data structure, sampling technique, platform
IMPORTANT: Both ARM and FPGA provide scalability
Therefore: real-time 3D object recognition was very difficult, but it was successfully implemented on both mobile and embedded platforms!
LIVE DEMO AT THE Xilinx BOOTH!
Slide 31
Resources
www.vangoghimaging.com
Android 3D printing: http://www.youtube.com/watch?v=7yCAVCGvvso
Challenges and Techniques in Using CPUs and GPUs for Embedded Vision, Ken Lee, VanGogh Imaging: http://www.embedded-vision.com/platinum-members/vangogh-imaging/embedded-vision-training/videos/pages/september-2012-embedded-vision-summit
Using FPGAs to Accelerate Embedded Vision Applications, Kamalina Srikant, National Instruments: http://www.embedded-vision.com/platinum-members/national-instruments/embedded-vision-training/videos/pages/september-2012-embedded-vision-summit
Demonstration of Optical Flow Algorithm on an FPGA: http://www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/videos/pages/demonstration-optical-flow-algorithm-fpg
* Reference: An Efficient RANSAC for 3D Object Recognition in Noisy and Occluded Scenes, Chavdar Papazov and Darius Burschka, Technische Universität München (TUM), Germany.