
DEEP LEARNING FOR RECOGNITION OF OBJECTS IN AUGMENTED REALITY
Henrik Pedersen, PhD
Senior Computer Vision Engineer

VISUAL COMPUTING LAB

Visual computing is a generic term for all computer science disciplines handling images and 3D models, i.e. computer graphics, image processing, visualization, computer vision, virtual and augmented reality, video processing, but also includes aspects of pattern recognition, human computer interaction, machine learning and digital libraries. The core challenges are the acquisition, processing, analysis and rendering of visual information (mainly images and video). Application areas include industrial quality control, medical image processing and visualization, surveying, robotics, multimedia systems, virtual heritage, special effects in movies and television, and computer games.

[https://en.wikipedia.org/wiki/Visual_computing]

Computer Graphics & Visualization

Computer Vision & Image Analysis

Physics Simulations

High Performance Computing

PEOPLE

[Team competence matrix: members in Aarhus and Copenhagen covering Graphics, Vision, Physics and HPC]

https://alexandra.dk/dk/om_os/labs/visual-computing-lab

PROJECT EXAMPLES

• Real-time terrain visualization in a web browser
– Point cloud generated using LiDAR (17.6 billion points)
– 100 terabytes of data
– 40x40 cm resolution
– 1 cm in height
– All of Denmark
– Overlays from satellite photos and OpenStreetMap

DENMARK’S ELEVATION MODEL VISUALIZED IN WebGL

https://denmark3d.alexandra.dk

• LEGO Digital Designer, LEGO Universe, LEGO House (Fish Tank)

HIGH GRAPHICS QUALITY AND IMAGE RECOGNITION

https://www.legohouse.com/da-dk/explore/yellow-zone

VISIBLE EAR SIMULATOR

Virtual Reality for surgical training and pre-operative planning

• See and feel the inner ear.
• Simulation of bone drilling with haptic feedback.
• Realistic visualization based on real anatomy.

https://ves.alexandra.dk/

AUGMENTED REALITY

Strategic focus: Industrial training
[Examples: 3D lungs, tracking of book front pages]

COMPUTER VISION / IMAGE ANALYSIS

Medical image registration (from hours to seconds)

Field segmentation from satellite images

DEEP LEARNING

DEEP LEARNING IN AUGMENTED REALITY

• Object detection (where?)
• Object classification (what?)
• Camera pose estimation
• Markerless tracking

WHAT IS DEEP LEARNING?

• Neural networks are machine learning algorithms inspired by the structure and function of the brain.

• Interest in Deep Neural Networks has sky-rocketed within the past 5 years.
• Big data + GPUs + algorithmic progress

WHAT IS DEEP LEARNING?

• All you need is lots of training data and computing power.


”A car has four wheels, which are placed approximately …”


Database of Cars and ”Not cars”

WHAT IS DEEP LEARNING?

Andrej Karpathy, Director of AI at Tesla

Neural networks are not just another classifier, they represent the beginning of a fundamental shift in how we write software. They are Software 2.0.

The “classical stack” of Software 1.0 is what we’re all familiar with  […] It consists of explicit instructions to the computer written by a programmer.

In contrast, Software 2.0 is written in neural network weights. No human is involved in writing this code […] Instead, we specify some constraints on the behavior of a desirable program (e.g., a dataset of input output pairs of examples) and use the computational resources at our disposal to search the program space for a program that satisfies the constraints.

”Software 2.0”, Nov 11, 2017
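As a toy illustration of ”searching the program space”, here is a minimal sketch (plain NumPy; the dataset, model and learning rate are made up): the rule y = 2x + 1 is never written down, it is recovered from input-output pairs by gradient descent.

import numpy as np

# "Constraints on the desirable program": a dataset of input-output pairs.
# The hidden rule is y = 2x + 1, but it appears nowhere in the code.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([[1.0], [3.0], [5.0], [7.0]])

# The "program" is just two numbers (weights), found by gradient descent.
w, b = 0.0, 0.0
learning_rate = 0.05
for _ in range(2000):
    pred = w * X + b
    w -= learning_rate * 2.0 * np.mean((pred - y) * X)
    b -= learning_rate * 2.0 * np.mean(pred - y)

print(w, b)  # approaches 2 and 1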

CONVOLUTIONAL NEURAL NETWORKS

• Look for parts and check if their relative positions in the image are consistent with the type of object you are looking for.

Simple model of a car with three parts

CONVOLUTIONAL NEURAL NETWORKS

• But computers don’t ”see” the way humans do.

CONVOLUTIONAL NEURAL NETWORKS

[Figure: an 8x8 binary input image of the letter ”A”]

0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 0
0 0 1 0 0 1 0 0
0 0 1 0 0 1 0 0
0 0 1 1 1 1 0 0
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 0 0 0 0 0

Input image → Convolution (locating parts with small 3x3 filters) → Compression (pooling)

This configuration is consistent with the letter ”A”
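To make the ”locating parts” step concrete, here is a minimal sketch (assuming NumPy and SciPy) that slides one 3x3 part detector over the 8x8 image of the letter A. The kernel is one of the filters shown on the slide; note that what CNNs call convolution is technically cross-correlation, which is what correlate2d computes.

import numpy as np
from scipy.signal import correlate2d

# 8x8 binary input image of the letter "A" (from the slide).
image = np.array([
    [0,0,0,0,0,0,0,0],
    [0,0,0,1,1,0,0,0],
    [0,0,1,0,0,1,0,0],
    [0,0,1,0,0,1,0,0],
    [0,0,1,1,1,1,0,0],
    [0,1,0,0,0,0,1,0],
    [0,1,0,0,0,0,1,0],
    [0,0,0,0,0,0,0,0],
])

# One 3x3 "part detector" (a diagonal stroke, as on the slide).
kernel = np.array([
    [0,1,0],
    [1,0,0],
    [0,0,0],
])

# Slide the kernel over the image; large responses mark locations
# where this part of the letter is present.
response = correlate2d(image, kernel, mode="same")
print(response)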

Training: the network learns to look for parts and to check whether their relative positions in the image are consistent with the type of object you are looking for.

CONVOLUTIONAL NEURAL NETWORKS

[Architecture: Input image → Convolutional layer (Convolution → Activation → Pooling) → Feature maps → Convolutional layer (Convolution → Activation → Pooling) → … → Fully connected layer(s) → Output]

• 1st layer feature maps: Tell the network where to find simple features like edges and blobs.

• 2nd layer feature maps: Tell the network where to find more complex features like eyes, nose, etc.

• Fully connected layers: A trainable ”program” that does ”something” with the features, like checking if their positions are consistent with a face.
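A minimal sketch of this kind of network (assuming PyTorch; the number of layers, channel counts and the 64x64 input size are illustrative, not the networks used in the projects below):

import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Two convolutional layers: convolution -> activation -> pooling.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 1st layer: edges, blobs
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 2nd layer: more complex parts
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Fully connected layers: combine the detected parts into a decision.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):                          # x: batch of 3 x 64 x 64 images
        return self.classifier(self.features(x))

net = SmallConvNet()
logits = net(torch.randn(1, 3, 64, 64))            # one random 64x64 RGB image
print(logits.shape)                                # torch.Size([1, 2])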

ENCODER/DECODER PERSPECTIVE

Encoder: the convolutional layers (feature extraction).
Decoder: the fully connected layer(s) and output.

ENCODER/DECODER PERSPECTIVE

Decoder

• Classifier
– Uses features to distinguish between two or more classes.
– Output: Discrete labels (”Dog” or ”Cat”)

• Regressor
– Uses features to predict some functional relationship.
– Output: Real numbers (”Age” of person in image)
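A sketch of the same idea in code (PyTorch, illustrative layer sizes): one shared encoder, with either a classifier head or a regressor head acting as the decoder.

import torch
import torch.nn as nn

# Shared encoder: turns an image into a feature vector.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
)

features = encoder(torch.randn(1, 3, 64, 64))     # shape: (1, 32*16*16)

# Decoder as classifier: two classes, e.g. "dog" vs "cat".
classifier_head = nn.Linear(32 * 16 * 16, 2)
class_scores = classifier_head(features)

# Decoder as regressor: a single real number, e.g. the age of a person.
regressor_head = nn.Linear(32 * 16 * 16, 1)
age = regressor_head(features)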

OUR USE OF DEEP LEARNING

OUR APPROACH

• Deep learning needs labeled training data – and lots of it
– Annotated images
– Very time consuming

• We specialize in rendering training data
– Drastically reduce the time spent on acquiring and annotating images.
– If CAD data and material descriptions are available, much can be automated.

OBJECT CLASSIFICATION

• Given an image, tell which of a number of classes it belongs to
– What type of object
– Quality control: OK or needs manual inspection
– View direction

OBJECT CLASSIFICATION

• Train on synthetic photorealistic images
– Render images of objects from CAD files
– Random viewing angle and distance to camera
– Random lighting conditions

[Example renderings: Tetris controller, Motor]

OBJECT CLASSIFICATION

• Image augmentation (see the sketch below)
– Add a ”Background” class consisting of natural images.
– Put rendered objects on top of background.
– Add a little noise to the colors.

[Example augmented training images: Background, Tetris controller, Motor]
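A sketch of the compositing step described above (plain NumPy; the function name, image layout and noise level are made up for illustration, not the lab's actual pipeline):

import numpy as np

def composite(render_rgba, background_rgb, noise_std=5.0):
    """Paste a rendered RGBA object on top of a natural background image
    and add a little color noise. Both arrays are HxWx(4|3) uint8 images
    of the same height and width."""
    alpha = render_rgba[..., 3:4].astype(np.float32) / 255.0
    rgb = render_rgba[..., :3].astype(np.float32)
    bg = background_rgb.astype(np.float32)
    out = alpha * rgb + (1.0 - alpha) * bg               # alpha blending
    out += np.random.normal(0.0, noise_std, out.shape)   # slight color noise
    return np.clip(out, 0, 255).astype(np.uint8)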

OBJECT CLASSIFICATION

• AlexNet (A. Krizhevsky, 2012)
– Breakthrough deep neural network for image classification (originally trained for 2 weeks!).
– Transfer learning (network weights pre-trained on ImageNet).
– Training time: 1-2 hours
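A sketch of that transfer-learning setup (assuming a recent PyTorch/torchvision; the exact training configuration used in the project is not stated in the slides):

import torch.nn as nn
from torchvision import models

# AlexNet pre-trained on ImageNet (transfer learning).
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Freeze the pre-trained feature extractor.
for p in model.features.parameters():
    p.requires_grad = False

# Replace the last fully connected layer: 3 classes
# (Tetris controller, Motor, Background).
model.classifier[6] = nn.Linear(4096, 3)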

• Results
– Robustly distinguishes between: Tetris controller, Motor, Background (anything else).
– Only recognizes what it has seen during training: when the motor appears too small in the image, it is classified as background.

OBJECT CLASSIFICATION

DOES IT SCALE?

• ImageNet has 1000 object classes.
• State-of-the-art deep learning algorithms perform better than humans on this benchmark!
• We successfully tested this approach with up to 150 object classes.

VIEW CLASSIFICATION

• Train a neural network to roughly estimate camera pose.
• Formulate it as a classification problem.

Input image → ConvNet → Which view is it?

VIEW CLASSIFICATION

• Again, we train on synthetic photorealistic images
• 48 different views/classes
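One way to turn pose into discrete view classes is to bin the viewing direction. The sketch below assumes the 48 classes come from 8 azimuth x 6 elevation bins; the actual binning used is not stated in the slides.

AZIMUTH_BINS = 8     # assumption: 8 x 6 = 48 view classes
ELEVATION_BINS = 6

def view_class(azimuth_deg, elevation_deg):
    """Map a viewing direction to one of 48 discrete view classes
    (azimuth in [0, 360), elevation in [0, 180))."""
    az = int(azimuth_deg % 360 // (360 / AZIMUTH_BINS))
    el = int(min(elevation_deg, 179.9) // (180 / ELEVATION_BINS))
    return el * AZIMUTH_BINS + az

print(view_class(95.0, 30.0))   # class index in [0, 47]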

• Training image examples

VIEW CLASSIFICATION

View 1

View 2

• Results
– Works fine for rough view classification like up/down or left/right.

• Use case in AR
– Instructional assistance
– “Flip object left-to-right”
– “Turn object upside-down”

VIEW CLASSIFICATION

Input image → Closest view

• Can we use Deep Learning to detect objects in images?
– Input: Image
– Output: Bounding boxes + labels for each object in the image

OBJECT DETECTION

• How it used to be done (see the sketch below)
– Train a classifier such as a deep neural network.
– Run a sliding window over the image at multiple scales.

• Disadvantages
– (Hand-crafted features)
– Computationally expensive
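For reference, a sketch of the sliding-window approach mentioned above (plain Python/NumPy; window size, stride and threshold are illustrative). It makes the cost obvious: the classifier runs once per window position, repeated at every scale.

import numpy as np

def sliding_window_detect(image, classify, window=64, stride=16):
    """Run a crop classifier over every window position (single scale).
    `classify` returns a score for 'object present'; in practice this
    is repeated at multiple scales, which is computationally expensive."""
    detections = []
    h, w = image.shape[:2]
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            score = classify(image[y:y + window, x:x + window])
            if score > 0.5:
                detections.append((x, y, window, window, score))
    return detections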

OBJECT DETECTION

• Region-based Fully Convolutional Net (R-FCN)
– Run a region proposal network (RPN) to generate regions of interest.
– For each region of interest, look for object parts (3x3 grid).
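R-FCN itself is not bundled with torchvision, but a closely related region-proposal detector (Faster R-CNN) shows the same input/output contract; the sketch below is a stand-in, not the model used in the project.

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# A related region-proposal-based detector (stand-in for R-FCN).
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)          # one RGB image, values in [0, 1]
with torch.no_grad():
    out = model([image])[0]              # dict with boxes, labels, scores
print(out["boxes"].shape, out["labels"].shape, out["scores"].shape)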

OBJECT DETECTION

• Train on synthetic photorealistic images
– Store bounding boxes of objects + class labels.
– Allow multiple objects in the same image.
– Partially occlude objects with a gray square.

OBJECT DETECTION

• Results
– Works well on both synthetic and real images.
– Detects multiple objects in same image.
– Real-time performance.

• Example: Segmentation
– Given an image, divide pixels into classes

IMAGE-TO-IMAGE TRANSFORMATION

• Fully Convolutional Networks for semantic segmentation
– Input: Image
– Output: Image with class labels
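A sketch of that input/output contract using torchvision's pre-trained FCN (a stand-in for illustration, not necessarily the network used here):

import torch
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(weights="DEFAULT")   # pre-trained weights, 21 output classes
model.eval()

image = torch.rand(1, 3, 480, 640)        # input: image
with torch.no_grad():
    logits = model(image)["out"]          # per-pixel class scores
labels = logits.argmax(dim=1)             # output "image" with class labels
print(labels.shape)                       # torch.Size([1, 480, 640])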

• Image-to-image translation
– Add color to an image
– CT to histology
– Building geometry

IMAGE-TO-IMAGE TRANSLATION

• Tracking object keypoints
– Select 3D keypoints in the CAD model that you want to track.

[Input image → Output image]


IMAGE-TO-IMAGE TRANSLATION

• Tracking of 48 object keypoints
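One common way to phrase keypoint tracking as an image-to-image problem is to let the output image contain one Gaussian heatmap per keypoint. The slides do not spell out the exact encoding, so the sketch below (NumPy, illustrative sizes and random keypoint positions) is an assumption.

import numpy as np

def keypoint_heatmap(x, y, height, width, sigma=3.0):
    """Target 'output image' for one keypoint: a Gaussian bump centred
    on its 2D projection (one such channel per tracked keypoint)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))

# 48 keypoints -> a 48-channel target image (positions here are random).
h, w = 128, 128
points = np.random.rand(48, 2) * [w, h]
target = np.stack([keypoint_heatmap(px, py, h, w) for px, py in points])
print(target.shape)   # (48, 128, 128)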

REGRESSION NETWORKS

• Given an image, compute one or more numbers that describe what you see.
– The size of an object
– Age of a person
– Positions of facial landmarks
– How many millimeters of tread depth left on a tire?

KEYPOINT REGRESSION

• Predict object keypoint positions using a regression network.
• Use keypoints to estimate object pose.

[Predicted keypoints → Estimated object pose]

POSE ESTIMATION

• Where is the camera relative to the object?
• Mapping between 3D object coordinates and 2D image coordinates.

[Figure: projection of 3D points (a cube) onto an image, showing the camera, the object and the scene]
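Given matched 3D object points and their 2D image positions (for example the predicted keypoints above), the camera pose can be recovered with a Perspective-n-Point solver. A sketch with OpenCV follows; the point values and camera intrinsics are made up for illustration.

import numpy as np
import cv2

# 3D keypoints on the object (object/CAD coordinates, e.g. a unit cube).
object_points = np.array([
    [0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
    [0, 0, 1], [1, 0, 1],
], dtype=np.float32)

# Their predicted 2D positions in the image (illustrative values).
image_points = np.array([
    [320, 240], [400, 238], [405, 310], [322, 315],
    [318, 170], [398, 168],
], dtype=np.float32)

# Illustrative pinhole camera intrinsics (focal length, principal point).
K = np.array([[800, 0, 320],
              [0, 800, 240],
              [0, 0, 1]], dtype=np.float32)

# Perspective-n-Point: camera pose (rotation, translation) relative to the object.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
print(ok, rvec.ravel(), tvec.ravel())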

FINE-TUNING CAMERA POSE

• Work in progress

DEEP LEARNING IN AUGMENTED REALITY

• Our approach
– Use synthetic images to train Deep Neural Networks.
– Works well for object recognition, detection and markerless tracking.

[Examples: markerless tracking, detection and recognition]

WHERE TO GO NEXT?

• Using synthetic images for training doesn’t always work that well…
• We need to close the ”simulation-to-reality gap”.

Thank you for your attention!
