"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation from Qualcomm

1

High-resolution 3D Reconstruction on a Mobile Processor

Michael Mangan

Senior Product Manager

Qualcomm Technologies, Inc.

May 3, 2016

2

30 years of driving the evolution of wireless

#1 in 3G/4G LTE modem

#1 in RF

Source: Qualcomm Incorporated data. Currently, Qualcomm semiconductors are products of Qualcomm Technologies, Inc. or its subsidiaries

IHS, Jan. ’16 (RF); Strategy Analytics, Dec. ’15 (modem, AP)

3

Qualcomm® Snapdragon™ Chipsets drive new experiences

Context aware computing

Machine learning

Computing performance

VR / AR - beyond small screen

360 degree camera

3D and low-light photography

Security

Biometric sensor

Virtual SIM/Multiple devices

Ultra HD VoLTE / audio quality

4G+

Wi-Fi

Superior converged connectivity

Gaming

Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc.

4

What is Active Depth Capture?

Depth adds the z-dimension to a scene; a photograph provides only x-y information.

There are two ways to capture depth information from a scene or object:

Passive Depth Capture (no IR transmitter):

• Stereo RGB cameras can passively generate a depth map of a scene.

• The baseline separation between the cameras causes parallax between the two received images.

• Parallax can be used to infer a disparity estimate, which in turn is used to generate a depth map.

Active Depth Capture (IR transmitter):

• An IR laser transmits light, and various techniques are used to infer depth from the reflected laser light:

» Time of Flight

» Active Stereo

» Structured Light
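The following is a minimal sketch of the passive stereo case described above. It takes two rectified scanlines, finds the disparity with sum-of-absolute-differences (SAD) block matching, and converts it to depth with the standard triangulation relation Z = f·B/d. Several things here are assumptions made only for illustration: the SAD matcher, the focal length, the baseline, and the synthetic scanlines. None of them are parameters of the system in this presentation.

```python
import random


def sad(a, b):
    """Sum of absolute differences between two equal-length windows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def disparity_at(left, right, x, half_win, max_disp):
    """Disparity d minimizing SAD between left[x] and right[x - d] (rectified pair)."""
    patch = left[x - half_win : x + half_win + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half_win < 0:
            break  # candidate window would fall off the image
        cost = sad(patch, right[xr - half_win : xr + half_win + 1])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, focal_px, baseline_m):
    """Triangulation for a rectified pair: Z = f * B / d (infinite depth at d = 0)."""
    return float("inf") if d <= 0 else focal_px * baseline_m / d


# Hypothetical synthetic scanlines: a random texture seen with a true disparity of 5 px.
rng = random.Random(0)
left = [rng.randint(0, 255) for _ in range(64)]
true_d = 5
right = [left[x + true_d] if x + true_d < len(left) else 0 for x in range(len(left))]

disparities = [disparity_at(left, right, x, half_win=3, max_disp=10) for x in range(20, 50)]
depths = [depth_from_disparity(d, focal_px=500.0, baseline_m=0.05) for d in disparities]
print(disparities[0], depths[0])  # expected: disparity 5 and a depth of about 5.0 m
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant points (small d) than for near ones.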

5

Depth from Structured Light: Technology Overview

Depth information is generated using a structured light sensor:

• A coded pattern is projected onto the scene using near-infrared (NIR) light

• An NIR camera receives the reflected, distorted pattern

• Codes in the received image are matched against known codes in the transmitted pattern

• Depth at each code location is estimated from the disparity between the original and received code positions, yielding a dense depth map

[Figure: transmitter projecting the coded pattern, receiver capturing the NIR image, and the resulting depth map]
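Here is a sketch of the code-matching step on a single scanline. It assumes the transmitted pattern is a binary de Bruijn sequence, in which every window of n bits occurs only once. Each received window therefore identifies its original column through a table lookup. The pattern type, the window length, the sign convention for disparity and the camera parameters are all illustrative assumptions, not the actual pattern or calibration used by this sensor. Depth again follows from the disparity, assuming a rectified projector-camera pair with disparity measured relative to a pattern at infinity.

```python
def de_bruijn(k, n):
    """Cyclic de Bruijn sequence B(k, n): every length-n word over k symbols appears once."""
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1 : p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def build_lookup(pattern, n):
    """Map each length-n code word to its column in the transmitted pattern."""
    return {tuple(pattern[c : c + n]): c for c in range(len(pattern) - n + 1)}


def decode_scanline(received, lookup, n, focal_px, baseline_m):
    """Return (column, disparity, depth) for every received window whose code is recognized."""
    out = []
    for x in range(len(received) - n + 1):
        col = lookup.get(tuple(received[x : x + n]))
        if col is None:
            continue  # corrupted or occluded code: no depth at this position
        d = col - x
        depth = focal_px * baseline_m / d if d > 0 else float("inf")
        out.append((x, d, depth))
    return out


N = 6
pattern = de_bruijn(2, N)  # 64 bits; every 6-bit window is unique
lookup = build_lookup(pattern, N)

# Hypothetical flat surface: the code transmitted at column c is received at column c - 4.
shift = 4
received = pattern[shift:]
result = decode_scanline(received, lookup, N, focal_px=500.0, baseline_m=0.05)
print(result[0])  # (received column, disparity, depth in metres)
```

The uniqueness of each window is what makes the lookup unambiguous. A real decoder must also cope with blur, noise and partially occluded codes, which this sketch skips.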

6

Scanner Flow in Action

3DR_Demo.mp4

7

Scanner Block Diagram

• Scan starts: Color + Depth (structured-light-based depth generation)

• Tracking / Alignment: Computer Vision Based Initial Pose Estimation, Inertial Motion Sensor Fusion, Bundle Adjustment

• Live 3D Renderer/Viewer (user moves)

• After the user stops: 3D Mesh Generation, Color Correction, HD Texture Generation

• Scan finishes. Use cases: 3D printing, social networking, gaming avatars, etc.

8

Scanner System Architecture

Layers, top to bottom: Apps (Java), Middleware (C++), Drivers (C), Hardware

• 3D Scanner Application

• RGBD Image Grabber

• Camera 2 API, Depth JNI, 3D Scanner JNI

• Depth Engine (DSP/HVX), RGB Grabber, NIR Grabber, 3D Scanner Engine (CPU/GPU)

• SysFS, Camera HAL, Camera HAL, Driver; Raw RGB Data, Raw NIR Data

• Laser, NIR Camera, RGB Camera, Active Sensing Module

[Figure note: arrows indicate dependency, not dataflow]

9

3DR Workload Summary: Running on Snapdragon 820

3D Reconstruction requires running several computationally demanding processes simultaneously:

1. Camera Pose Tracking

2. Sensor Fusion

3. Bundle Adjustment

4. Rendering

5. Mesh Generation

6. Texture Mapping

7. Structured Light Sensor Decoding

Thanks to the heterogeneous computational framework of the Snapdragon 820, we are able to do all of this at 15 FPS:

Kryo CPU/NEON:

• Pose Tracking

• Bundle Adjustment

• Sensor Fusion

• Mesh Generation

Adreno GPU:

• Rendering

• Texture Mapping

Hexagon DSP/HVX:

• Depth from Structured Light

Spectra ISP:

• RGB sensor processing

• Depth sensor interface

3DR powered by Snapdragon 820
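This sketch shows why spreading the workload across processors matters. If the stages run concurrently on different cores in a pipeline, throughput is bounded by the busiest processor rather than by the sum of all stage times. The sketch assumes ideal pipelining with no data-transfer or synchronization overhead. Its per-stage millisecond figures are invented purely for illustration; they are not measurements from Snapdragon 820. Only the stage-to-processor mapping follows the slide.

```python
def frame_rates(stage_ms, assignment):
    """Return (serial_fps, pipelined_fps) for per-stage times and a stage->processor map.

    Assumes ideal pipelining: each processor works on a different frame at the
    same time, and transfers between processors cost nothing.
    """
    serial_ms = sum(stage_ms.values())
    load = {}
    for stage, ms in stage_ms.items():
        proc = assignment[stage]
        load[proc] = load.get(proc, 0.0) + ms
    return 1000.0 / serial_ms, 1000.0 / max(load.values())


# Hypothetical per-frame costs in milliseconds (illustrative only).
stage_ms = {
    "pose_tracking": 20.0, "bundle_adjustment": 15.0, "sensor_fusion": 5.0,
    "mesh_generation": 20.0, "rendering": 10.0, "texture_mapping": 15.0,
    "structured_light_depth": 30.0,
}
assignment = {
    "pose_tracking": "Kryo", "bundle_adjustment": "Kryo", "sensor_fusion": "Kryo",
    "mesh_generation": "Kryo", "rendering": "Adreno", "texture_mapping": "Adreno",
    "structured_light_depth": "Hexagon",
}
serial_fps, pipelined_fps = frame_rates(stage_ms, assignment)
print(f"serial: {serial_fps:.1f} FPS, pipelined: {pipelined_fps:.1f} FPS")
```

Under these made-up numbers, the busiest processor (the CPU) sets the frame rate. Moving work off it is what raises throughput.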

10

Lessons Learned

The highest-quality 3DR requires great hardware and software. Efficient computer vision (CV) algorithms, operating with accurate depth sensors and power-efficient processors, bring commercial-grade 3DR to mobile platforms.

Running 3DR on mobile requires tuning algorithms for power as well as performance.

Power-efficient heterogeneous processors are mandatory for 3DR to run within mobile power and thermal envelopes.

The heterogeneous processing cores on Snapdragon 820 enable a high-quality 3DR experience on mobile platforms.

11

3DR Algorithmic Details

12


13

Pose Estimation (6-DOF)

Based on the Iterative Closest Point (ICP) concept, images are aligned by minimizing the sum of squared pixel intensity differences (errors) plus the weighted sum of squared depth errors:

$$\text{cost} = \sum (\text{Pixel Intensity Error})^2 + \lambda \sum (\text{Pixel Depth Error})^2$$

[Figure: pixel intensity error and depth error]

• F. Steinbruecker et al., “Real-Time Visual Odometry from Dense RGB-D Images”, ICCV 2011

• C. Kerl et al., “Dense Continuous-Time Tracking and Mapping with Rolling Shutter RGB-D Cameras”, ICCV 2015

14

Flow: the Current Image is warped toward the Reference Image; the Warped Image is subtracted from the Reference Image to give the Error Image; repeat to minimize the error.
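The warp, subtract and repeat loop can be illustrated with a deliberately reduced sketch. A single 1D translation stands in for the full 6-DOF rigid-body warp. Gauss-Newton iterations then minimize the combined cost: the sum of squared intensity errors plus λ times the sum of squared depth errors. The synthetic signals, the value of λ and the helper names are assumptions made for illustration. A real RGB-D tracker instead warps each pixel through its depth, the camera intrinsics and an SE(3) pose.

```python
import math


def sample(signal, x):
    """Linearly interpolate a 1D signal at real-valued position x (clamped to range)."""
    x = min(max(x, 0.0), len(signal) - 1.000001)
    i = int(math.floor(x))
    a = x - i
    return (1.0 - a) * signal[i] + a * signal[i + 1]


def slope(signal, x, h=0.5):
    """Central-difference derivative of the interpolated signal."""
    return (sample(signal, x + h) - sample(signal, x - h)) / (2.0 * h)


def align(ref_i, ref_d, cur_i, cur_d, lam=1.0, iters=30, border=5):
    """Estimate the shift t such that cur(x + t) matches ref(x) in intensity and depth."""
    t = 0.0
    for _ in range(iters):
        hess = 0.0  # J^T J (a scalar here, since there is one parameter)
        grad = 0.0  # J^T r
        for x in range(border, len(ref_i) - border):
            r_i = sample(cur_i, x + t) - ref_i[x]  # intensity error after warping
            r_d = sample(cur_d, x + t) - ref_d[x]  # depth error after warping
            j_i = slope(cur_i, x + t)
            j_d = slope(cur_d, x + t)
            hess += j_i * j_i + lam * j_d * j_d
            grad += j_i * r_i + lam * j_d * r_d
        if hess == 0.0:
            break
        step = -grad / hess  # Gauss-Newton update
        t += step
        if abs(step) < 1e-8:
            break
    return t


def intensity(x):
    return math.sin(0.3 * x) + 0.5 * math.sin(0.11 * x)


def depth(x):
    return 1.0 + 0.01 * x  # a tilted plane, in metres


true_shift = 2.5
ref_i = [intensity(x) for x in range(100)]
ref_d = [depth(x) for x in range(100)]
cur_i = [intensity(x - true_shift) for x in range(100)]
cur_d = [depth(x - true_shift) for x in range(100)]
estimate = align(ref_i, ref_d, cur_i, cur_d, lam=100.0)
print(round(estimate, 3))  # expected to be close to 2.5
```

The depth term keeps the problem constrained where intensity is flat (little texture), and the intensity term does the same where the surface is flat. This complementarity is the motivation for combining both errors in one cost.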



15

Example



Track.mp4

16

Motion Sensor Fusion

The vision-based pose will likely contain some errors.

• One cause is a lack of geometric and textural structure in the scene

This can be overcome by fusing the vision pose with the tablet's Inertial Measurement Unit (IMU).

Using the Extended Kalman Filter (EKF), poses are predicted from the IMU; the vision-based pose is then fused in the EKF update step to obtain the fused pose estimate.

• M. Li et al., “3-D motion estimation and online temporal calibration for camera-IMU systems”, ICRA 2013

• S. Weiss et al., “Real-Time Metric State Estimation for Modular Vision-Inertial Systems”, ICRA 2011

[Figure: Gyro and Accelerometer → Extended Kalman Filter (Predict); Vision Based Pose Estimation → Extended Kalman Filter (Update)]
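The predict/update structure can be shown with a minimal, linear stand-in for the EKF. The sketch assumes a 1D state of position and velocity. Accelerometer readings drive the prediction, and vision-based position measurements correct it. The vision measurements occasionally drop out, mimicking frames with too little texture. With this linear motion model the EKF equations reduce to the ordinary Kalman filter. A real camera-IMU filter also tracks orientation and sensor biases with nonlinear models. All noise levels, rates and the motion profile here are hypothetical.

```python
import math
import random


def kf_predict(x, P, accel, dt, accel_var):
    """Propagate state [position, velocity] with a measured acceleration."""
    p, v = x
    x = [p + v * dt + 0.5 * accel * dt * dt, v + accel * dt]
    # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and white acceleration noise Q.
    p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1]
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1]
    q00, q01, q11 = accel_var * dt**4 / 4, accel_var * dt**3 / 2, accel_var * dt**2
    return x, [[p00 + q00, p01 + q01], [p10 + q01, p11 + q11]]


def kf_update(x, P, z, meas_var):
    """Fuse a vision-based position measurement z (measurement matrix H = [1, 0])."""
    s = P[0][0] + meas_var             # innovation variance
    k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain
    y = z - x[0]                       # innovation
    x = [x[0] + k0 * y, x[1] + k1 * y]
    P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
         [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    return x, P


def simulate(steps=150, dt=1 / 15, seed=0):
    """Return (fused RMS error, raw vision RMS error) for a synthetic trajectory."""
    rng = random.Random(seed)
    true_p = true_v = 0.0
    x, P = [0.0, 0.0], [[1e-4, 0.0], [0.0, 1e-4]]
    fused_sq = vision_sq = 0.0
    n_vision = 0
    for k in range(steps):
        accel = 0.5 * math.sin(0.05 * k)          # true acceleration
        true_p += true_v * dt + 0.5 * accel * dt * dt
        true_v += accel * dt
        imu_accel = accel + rng.gauss(0.0, 0.1)   # noisy accelerometer reading
        x, P = kf_predict(x, P, imu_accel, dt, 0.1**2)
        if k % 10 != 9:                           # every 10th frame: no vision pose
            z = true_p + rng.gauss(0.0, 0.05)     # noisy vision-based position
            x, P = kf_update(x, P, z, 0.05**2)
            vision_sq += (z - true_p) ** 2
            n_vision += 1
        fused_sq += (x[0] - true_p) ** 2
    return math.sqrt(fused_sq / steps), math.sqrt(vision_sq / n_vision)


fused_rms, vision_rms = simulate()
print(f"fused RMS {fused_rms:.3f} m vs vision-only RMS {vision_rms:.3f} m")
```

When a vision pose is missing, the filter simply skips the update and coasts on the IMU prediction. This is the behavior the slide relies on for low-texture frames.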

17

Bundle Adjustment

Fused poses need to be refined to reduce visual errors.

• Reason: poses are computed locally, “between consecutive frames”, so small errors accumulate over the scan

We use bundle adjustment to find optimal global or semi-global poses:

• Construct links (red lines) between captured frames (blue nodes). A link is established if the re-projection between two captured images is above a certain threshold

• Jointly optimize the connected nodes

• V. Indelman et al., “Incremental Light Bundle Adjustment for Robotics Navigation”, IROS 2013

• R. Newcombe et al., “KinectFusion: Real-Time Dense Surface Mapping and Tracking”, IEEE ISMAR 2011

• K. Konolige et al., “FrameSLAM: from Bundle Adjustment to Real-Time Visual Mapping”, IEEE Transactions on Robotics 2008

[Figure: pose graph of captured frames (blue nodes) connected by links (red lines)]
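A toy pose graph illustrates the effect of jointly optimizing linked frames. In this sketch each pose is a single coordinate rather than a 6-DOF camera pose, and each link stores a measured offset between two frames. The poses are solved jointly by least squares using Gauss-Seidel sweeps, with the first frame held fixed. Real bundle adjustment instead minimizes re-projection errors over full camera poses, and often over 3D points as well. The function name and the drift values are hypothetical.

```python
def optimize_pose_graph(num_poses, links, iters=10000, tol=1e-12):
    """Least-squares 1D poses from links (i, j, offset, weight) meaning x_j - x_i ~ offset.

    Pose 0 is anchored at 0. Each Gauss-Seidel sweep sets every other pose to the
    weighted average of the positions its links predict for it.
    """
    x = [0.0] * num_poses
    # Initialize by chaining consecutive links, as a frame-to-frame tracker would.
    for i, j, z, _ in sorted(links):
        if j == i + 1:
            x[j] = x[i] + z
    for _ in range(iters):
        max_change = 0.0
        for k in range(1, num_poses):
            num = den = 0.0
            for i, j, z, w in links:
                if j == k:
                    num += w * (x[i] + z)
                    den += w
                elif i == k:
                    num += w * (x[j] - z)
                    den += w
            if den > 0:
                new = num / den
                max_change = max(max_change, abs(new - x[k]))
                x[k] = new
        if max_change < tol:
            break
    return x


# Hypothetical: frames are truly 1.0 apart, but each frame-to-frame link over-estimates
# the motion by 0.1 (drift); one extra link directly relates frame 0 and frame 4.
odometry = [(k, k + 1, 1.1, 1.0) for k in range(4)]
loop = [(0, 4, 4.0, 1.0)]

chained = optimize_pose_graph(5, odometry)       # local links only
joint = optimize_pose_graph(5, odometry + loop)  # jointly optimized with the extra link
print(round(chained[4], 3), round(joint[4], 3))  # prints 4.4 4.08
```

The extra link spreads the accumulated drift over all frames. The last pose's error drops from 0.4 to 0.08 in this example.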

18

Surface Reconstruction / Mesh Generation

Having computed the 3D points, we need to generate the 3D surface mesh that best describes the scene while reducing the noise.

Many methods for surface reconstruction are available in the literature: Moving Least Squares (MLS), Truncated Signed Distance Function (TSDF) fusion, and Poisson. In theory, any of them can be used; TSDF is the least computationally demanding, while MLS and Poisson are more demanding.

These are then followed by the marching cubes algorithm to generate the mesh.

• S. Fleishman et al., “Robust Moving Least-squares Fitting with Sharp Features”, ACM SIGGRAPH 2005

• M. Kazhdan et al., “Poisson Surface Reconstruction”, Symposium on Geometry Processing 2006

• R. Newcombe et al., “KinectFusion: Real-Time Dense Surface Mapping and Tracking”, IEEE ISMAR 2011
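This minimal 1D sketch, in the style of KinectFusion, shows why TSDF fusion reduces noise. It works along a single camera ray. Each depth measurement writes a truncated signed distance into the voxels near the observed surface, and repeated measurements are combined by a running weighted average. The surface is then recovered at the zero crossing by linear interpolation, which is what marching cubes does along each voxel edge in 3D. The voxel size, truncation distance, unit weights and depth readings are hypothetical.

```python
def integrate(tsdf, weights, voxel_pos, depth, trunc):
    """Fuse one depth reading (distance along the ray) into a 1D TSDF."""
    for i, v in enumerate(voxel_pos):
        sdf = depth - v  # positive in front of the surface, negative behind it
        if sdf < -trunc:
            continue  # far behind the observed surface: occluded, leave untouched
        d = min(sdf, trunc)  # truncate free space in front of the surface
        w = weights[i]
        tsdf[i] = (tsdf[i] * w + d) / (w + 1)  # running weighted average (unit weights)
        weights[i] = w + 1


def extract_surface(tsdf, weights, voxel_pos):
    """Locate the first positive-to-negative zero crossing and interpolate it linearly."""
    for i in range(len(tsdf) - 1):
        if weights[i] > 0 and weights[i + 1] > 0 and tsdf[i] > 0 >= tsdf[i + 1]:
            a = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return voxel_pos[i] + a * (voxel_pos[i + 1] - voxel_pos[i])
    return None


voxel_size, trunc = 0.01, 0.05
voxel_pos = [i * voxel_size for i in range(200)]  # 0 to 1.99 m along the ray
tsdf = [trunc] * len(voxel_pos)
weights = [0] * len(voxel_pos)

readings = [1.02, 0.98, 1.01, 0.99]  # hypothetical noisy readings of a surface at 1.0 m
for r in readings:
    integrate(tsdf, weights, voxel_pos, r, trunc)
surface = extract_surface(tsdf, weights, voxel_pos)
print(round(surface, 4))
```

Each individual reading is off by up to 2 cm, but the averaged distance field places the surface at the mean of the readings.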

19

Color Correction

Captured color images can suffer from color casts for many reasons, such as differing lighting sources. We need to correct these casts so that the overall color of the 3D model is consistent.

Solution: estimate the color casts and remove them

• Gray points provide the best estimate of the color cast

• Estimate gray pixels and shift the appropriate channel gain to bring them to neutral gray

• Repeat until convergence

• J. Huo et al., “Robust Automatic White Balance Algorithm Using Gray Color Points in Images”, IEEE Trans. Consumer Electronics, 2006

[Figure: before and after color correction]
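Below is a sketch of the gray-point loop. It follows the spirit of the cited gray-point method but is simplified:

- Pixels are converted to YUV.
- Pixels whose chroma is small relative to their luma are treated as gray points.
- On each iteration, the red or blue gain is nudged by a fixed step to counter whichever of mean V or mean U is further from zero.
- The loop stops once both means are near zero.

The gray threshold, step size, tolerance and synthetic color cast are assumptions, not values from the presentation or the paper.

```python
import math


def to_yuv(r, g, b):
    """BT.601-style RGB -> YUV (U and V are zero for neutral gray)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return y, u, v


def gray_point_white_balance(pixels, thresh=0.3, step=0.005, tol=0.001, max_iters=1000):
    """Return (red_gain, blue_gain) that drive the detected gray points to neutral."""
    gain_r = gain_b = 1.0
    for _ in range(max_iters):
        us, vs = [], []
        for r, g, b in pixels:
            y, u, v = to_yuv(r * gain_r, g, b * gain_b)
            if y > 0 and (abs(u) + abs(v)) / y < thresh:  # near-gray pixel
                us.append(u)
                vs.append(v)
        if not us:
            break  # no gray points found: leave the gains unchanged
        mean_u, mean_v = sum(us) / len(us), sum(vs) / len(vs)
        if max(abs(mean_u), abs(mean_v)) < tol:
            break  # converged
        if abs(mean_u) > abs(mean_v):
            gain_b -= math.copysign(step, mean_u)  # U > 0 means too blue
        else:
            gain_r -= math.copysign(step, mean_v)  # V > 0 means too red
    return gain_r, gain_b


# Hypothetical scene: four gray patches plus two saturated colors, under a warm cast.
scene = [(0.2, 0.2, 0.2), (0.4, 0.4, 0.4), (0.6, 0.6, 0.6), (0.8, 0.8, 0.8),
         (0.9, 0.1, 0.1), (0.1, 0.7, 0.2)]
cast = [(1.2 * r, g, 0.85 * b) for r, g, b in scene]
gain_r, gain_b = gray_point_white_balance(cast)
corrected = [(r * gain_r, g, b * gain_b) for r, g, b in cast]
print(f"red gain {gain_r:.3f}, blue gain {gain_b:.3f}")
```

The saturated red and green pixels fail the gray test and do not bias the estimate. That is the reason for using gray points rather than the average color of the whole image.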

20

Texture Mapping

The captured images need to be combined into one or more images called texture maps. Texture mapping can be thought of as “3D stitching of the images onto the 3D model”.

In general, obtaining the texture map consists of two steps:

• Determine where to put the pixels on the 3D model (texture coordinates)

• Determine the color of each pixel given a sequence of input images

• P. Debevec et al., “Efficient View-Dependent Image-Based Rendering with Projective Texture-Mapping”, Eurographics Rendering Workshop 1998

• M. Waechter et al., “Let There Be Color! Large-Scale Texturing of 3D Reconstructions”, ECCV 2015

[Figure: input camera images, output texture map, and the colored 3D model rendered using the texture map]

Rabbit.mp4
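The two texture-mapping steps can be illustrated per vertex with a small sketch. A pinhole projection gives the texture coordinates in a chosen image. The color source is the camera that sees the surface most head-on, meaning the largest cosine between the surface normal and the direction to the camera. Choosing a single best view is only one simple option; the cited works use view-dependent blending and more elaborate view selection. The camera layout, the intrinsics and the `Camera` class are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    R: list  # 3x3 world-to-camera rotation; rows are the camera axes in world coordinates
    t: list  # translation, so that X_cam = R X + t
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int

    def center(self):
        """Camera center in world coordinates: C = -R^T t."""
        return [-sum(self.R[k][i] * self.t[k] for k in range(3)) for i in range(3)]

    def project(self, X):
        """Pinhole projection of world point X to pixel (u, v), or None if not visible."""
        xc = [sum(self.R[i][k] * X[k] for k in range(3)) + self.t[i] for i in range(3)]
        if xc[2] <= 0:
            return None  # behind the camera
        u = self.fx * xc[0] / xc[2] + self.cx
        v = self.fy * xc[1] / xc[2] + self.cy
        if not (0 <= u < self.width and 0 <= v < self.height):
            return None  # outside the image
        return u, v


def texture_source(cameras, X, normal):
    """Pick the most frontal camera for vertex X; return (camera index, normalized uv)."""
    best, best_score = None, 0.0
    for idx, cam in enumerate(cameras):
        if cam.project(X) is None:
            continue
        d = [c - x for c, x in zip(cam.center(), X)]
        # Cosine between the (unit) surface normal and the direction to the camera.
        score = sum(n * di for n, di in zip(normal, d)) / math.sqrt(sum(di * di for di in d))
        if score > best_score:
            best, best_score = idx, score
    if best is None:
        return None  # no camera sees the front of this surface
    u, v = cameras[best].project(X)
    return best, (u / cameras[best].width, v / cameras[best].height)


intr = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0, width=640, height=480)
front = Camera(R=[[1, 0, 0], [0, 1, 0], [0, 0, 1]], t=[0, 0, 2], **intr)  # at (0, 0, -2), looking +z
side = Camera(R=[[0, 0, -1], [0, 1, 0], [1, 0, 0]], t=[0, 0, 2], **intr)  # at (-2, 0, 0), looking +x
cams = [front, side]

print(texture_source(cams, [-1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]))  # vertex facing the side camera
print(texture_source(cams, [0.0, 0.0, -1.0], [0.0, 0.0, -1.0]))  # vertex facing the front camera
```

Preferring frontal views keeps texels from grazing-angle views, which are stretched and blurred, out of the texture map.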

21

Some 3DR Examples

22

Some Results

Using our system we can scan a small toy, a human face or body, or an object.

All of this runs easily on the Snapdragon 820, thanks to its powerful heterogeneous computational framework.

Sairam.mp4 Suzy.mp4 Printer.mp4 Bunny.mp4

Thank you


For more information, visit us at:

www.qualcomm.com & www.qualcomm.com/blog

Nothing in these materials is an offer to sell any of the components or devices referenced herein.

©2016 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.

Qualcomm and Snapdragon are trademarks of Qualcomm Incorporated, registered in the United States and other countries. Why Wait is a trademark of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners.

References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.

23