DEEP LEARNING FOR RECOGNITION OF OBJECTS IN AUGMENTED REALITY Henrik Pedersen, PhD Senior Computer Vision Engineer

DEEP LEARNING FOR RECOGNITION OF OBJECTS IN …...VISUAL COMPUTING LAB Visual computing is a generic term for all computer science disciplines handling images and 3D models, i.e. computer




VISUAL COMPUTING LAB

Visual computing is a generic term for all computer science disciplines handling images and 3D models, i.e. computer graphics, image processing, visualization, computer vision, virtual and augmented reality, video processing, but also includes aspects of pattern recognition, human computer interaction, machine learning and digital libraries. The core challenges are the acquisition, processing, analysis and rendering of visual information (mainly images and video). Application areas include industrial quality control, medical image processing and visualization, surveying, robotics, multimedia systems, virtual heritage, special effects in movies and television, and computer games.

[https://en.wikipedia.org/wiki/Visual_computing]

Computer Graphics & Visualization

Computer Vision & Image Analysis

Physics Simulations

High Performance Computing


PEOPLE

Graphics ✓ ✓ ✓ ✓ ✓ ✓
Vision ✓ ✓ ✓ ✓
Physics ✓ ✓ ✓ ✓ ✓ ✓ ✓
HPC ✓ ✓

Aarhus Copenhagen

https://alexandra.dk/dk/om_os/labs/visual-computing-lab


PROJECT EXAMPLES


DENMARK’S ELEVATION MODEL VISUALIZED IN WebGL

• Real-time terrain visualization in a web browser
  – Point cloud generated using LiDAR (17.6 billion points)
  – 100 terabytes of data
  – 40x40 cm resolution
  – 1 cm in height
  – All of Denmark
  – Overlays from satellite photos and OpenStreetMap.

https://denmark3d.alexandra.dk


• LEGO Digital Designer, LEGO Universe, LEGO House (Fish Tank)

HIGH GRAPHICS QUALITY AND IMAGE RECOGNITION

https://www.legohouse.com/da-dk/explore/yellow-zone


VISIBLE EAR SIMULATOR

Virtual Reality for surgical training and pre-operative planning

• See and feel the inner ear.
• Simulation of bone drilling with haptic feedback.
• Realistic visualization based on real anatomy.

https://ves.alexandra.dk/


AUGMENTED REALITY

Strategic focus: Industrial training

3D lungs

Tracking of book front pages


COMPUTER VISION / IMAGE ANALYSIS

Medical image registration (from hours to seconds)

Field segmentation from satellite images


DEEP LEARNING


DEEP LEARNING IN AUGMENTED REALITY

Camera pose estimation

Object detection (where?)

Object classification (what?)

Markerless tracking


WHAT IS DEEP LEARNING?

• Neural networks are machine learning algorithms inspired by the structure and function of the brain.

• Interest in deep neural networks has skyrocketed within the past 5 years.
• Big data + GPUs + algorithmic progress


WHAT IS DEEP LEARNING?

• All you need is lots of training data and computing power.


WHAT IS DEEP LEARNING?

• All you need is lots of training data and computing power.

A car has four wheels, which are placed approximately …


WHAT IS DEEP LEARNING?

• All you need is lots of training data and computing power.

Database of Cars and ”Not cars”


WHAT IS DEEP LEARNING?

Andrej Karpathy, Director of AI at Tesla

Neural networks are not just another classifier, they represent the beginning of a fundamental shift in how we write software. They are Software 2.0.

The “classical stack” of Software 1.0 is what we’re all familiar with  […] It consists of explicit instructions to the computer written by a programmer.

In contrast, Software 2.0 is written in neural network weights. No human is involved in writing this code […] Instead, we specify some constraints on the behavior of a desirable program (e.g., a dataset of input output pairs of examples) and use the computational resources at our disposal to search the program space for a program that satisfies the constraints.

Software 2.0, Nov 11, 2017
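The idea of specifying a program through input/output pairs and then searching for weights that satisfy them can be illustrated with a toy sketch. The example below is an illustration only, not taken from the talk or from the quoted post: the dataset, the linear model and the learning rate are all assumptions chosen to keep it small.

```python
# Toy illustration: instead of hand-writing f(x) = 2x + 1 (Software 1.0),
# we give examples of the desired behavior and search for the weights.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # (input, output)


def fit(examples, steps=5000, lr=0.05):
    """Gradient descent on the mean squared error of y = w * x + b."""
    w, b = 0.0, 0.0  # the whole "program" is these two numbers
    n = len(examples)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit(examples)
print(f"learned program: y = {w:.3f} * x + {b:.3f}")
```

A deep network follows the same recipe with millions of weights instead of two.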


CONVOLUTIONAL NEURAL NETWORKS

• Look for parts and check if their relative positions in the image are consistent with the type of object you are looking for.

Simple model of a car with three parts


CONVOLUTIONAL NEURAL NETWORKS

• But computers don’t ”see” the way humans do.


CONVOLUTIONAL NEURAL NETWORKS

Input image

0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 0
0 0 1 0 0 1 0 0
0 0 1 0 0 1 0 0
0 0 1 1 1 1 0 0
0 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 0 0 0 0 0

Convolution (locating parts)

Compression

3x3 grids shown in the figure:

0 1 0    0 1 0    1 0 1
1 0 0    0 0 1    0 0 0
0 0 0    0 0 0    1 0 1

This configuration is consistent with the letter ”A”

Training

• Look for parts and check if their relative positions in the image are consistent with the type of object you are looking for.
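As a rough sketch of "locating parts" followed by "compression", the code below slides two small part templates over the 8x8 letter image from this slide and then max-pools the responses into a coarse 3x3 map. The image is the one on the slide; the 2x2 stroke templates, the threshold and the pooling size are assumptions made for this illustration and are not the filters used in the talk. (What CNNs call convolution is implemented here, as in most frameworks, as a sliding sum of products without flipping the template.)

```python
# 8x8 binary image of the letter "A", copied from the slide.
IMAGE = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# Hypothetical part templates: a "/" stroke and a "\" stroke.
SLASH = [[0, 1], [1, 0]]
BACKSLASH = [[1, 0], [0, 1]]


def correlate(image, template):
    """Sum of element-wise products at every position where the template fits."""
    th, tw = len(template), len(template[0])
    return [[sum(image[r + i][c + j] * template[i][j]
                 for i in range(th) for j in range(tw))
             for c in range(len(image[0]) - tw + 1)]
            for r in range(len(image) - th + 1)]


def activate(fmap, threshold):
    """Keep only strong responses (1 where the score reaches the threshold)."""
    return [[1 if v >= threshold else 0 for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Compress the map: keep the maximum of each size x size block."""
    return [[max(fmap[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(len(fmap[0]) // size)]
            for r in range(len(fmap) // size)]


slash_map = max_pool(activate(correlate(IMAGE, SLASH), threshold=2))
backslash_map = max_pool(activate(correlate(IMAGE, BACKSLASH), threshold=2))
for row in slash_map:
    print(row)
```

Each pooled map says roughly where a given part occurs; a later layer can then check whether the parts are arranged the way the letter requires.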


CONVOLUTIONAL NEURAL NETWORKS

Input image

Convolutional layer

Feature maps

1st layer feature map: Tells the network where to find simple features like edges and blobs.


CONVOLUTIONAL NEURAL NETWORKS

[Diagram labels: Input image; Convolutional layer (Convolution, Activation, Pooling); Feature maps]


CONVOLUTIONAL NEURAL NETWORKS

[Diagram labels: Input image; Convolutional layer (Convolution, Activation, Pooling); Feature maps; Convolutional layer (Convolution, Activation, Pooling)]


CONVOLUTIONAL NEURAL NETWORKS

[Diagram labels: Input image; Convolutional layer (Convolution, Activation, Pooling); Feature maps; Convolutional layer (Convolution, Activation, Pooling)]

2nd layer feature map: Tells the network where to find more complex features like eyes, nose, etc.


CONVOLUTIONAL NEURAL NETWORKS

[Diagram labels: Input image; Convolutional layer (Convolution, Activation, Pooling); Feature maps; Convolutional layer (Convolution, Activation, Pooling); Fully connected layer(s); Output]

Fully connected layers: A trainable “program” that does “something” with the features, like checking if their positions are consistent with a face.


CONVOLUTIONAL NEURAL NETWORKS


ENCODER/DECODER PERSPECTIVE

[Diagram labels: Input image; Convolutional layer (Convolution, Activation, Pooling); Feature maps; Convolutional layer (Convolution, Activation, Pooling); Fully connected layer(s); Output]

Encoder: the convolutional layers

Decoder: the fully connected layer(s) and output


ENCODER/DECODER PERSPECTIVE

[Diagram labels: Decoder; Fully connected layer(s); Output]

Classifier
Uses features to distinguish between two or more classes.
Output: Discrete labels (“Dog” or “Cat”)

Regressor
Uses features to predict some functional relationship.
Output: Real numbers (“Age” of person in image)
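The difference between the two decoder types can be sketched in a few lines. In this illustration the same feature vector feeds a classifier head (softmax over class scores, giving a discrete label) and a regressor head (a plain weighted sum, giving a real number). The feature values and all weights are made-up numbers, not trained values.

```python
import math


def dense(features, weights, bias):
    """Fully connected layer: one weighted sum of all features per output."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


features = [0.2, 1.5, 0.7]  # hypothetical encoder output

# Classifier head: one score per class, then pick the most probable label.
class_names = ["Dog", "Cat"]
probs = softmax(dense(features,
                      weights=[[1.0, -0.5, 0.3], [-0.4, 0.8, 0.1]],
                      bias=[0.0, 0.0]))
label = class_names[probs.index(max(probs))]

# Regressor head: a single real-valued output, e.g. an age in years.
age = dense(features, weights=[[10.0, 12.0, 5.0]], bias=[3.0])[0]

print(label, round(age, 1))
```

The encoder can stay the same for both tasks; only the last layer and the training loss differ.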


OUR USE OF DEEP LEARNING


OUR APPROACH

• Deep learning needs labeled training data – and lots of it


OUR APPROACH

• Deep learning needs labeled training data – and lots of it
  – Annotated images
  – Very time consuming

• We specialize in rendering training data
  – Drastically reduce the time spent on acquiring and annotating images.
  – If CAD data and material descriptions are available, much can be automated.


OBJECT CLASSIFICATION

• Given an image, tell which of a number of classes it belongs to
  – What type of object
  – Quality control: OK or needs manual inspection
  – View direction


OBJECT CLASSIFICATION

• Train on synthetic photorealistic images
  – Render images of objects from CAD files
  – Random viewing angle and distance to camera
  – Random lighting conditions

Tetris controller

Motor


OBJECT CLASSIFICATION

• Image augmentation
  – Add a “Background” class consisting of natural images.
  – Put rendered objects on top of background.
  – Add a little noise to the colors.

Background

Tetris controller

Motor
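The augmentation steps on this slide can be sketched as follows. This is a minimal illustration, assuming the renderer provides an alpha mask alongside each rendered object and that "a little noise" means independent Gaussian noise per color channel; neither detail is stated in the talk, and the tiny 4x4 images are placeholders.

```python
import random


def composite(background, render, alpha):
    """Paste a rendered object over a background image.
    alpha is 1.0 on object pixels and 0.0 where the background shows."""
    return [[tuple(a * f + (1.0 - a) * b for f, b in zip(fg, bg))
             for fg, bg, a in zip(fg_row, bg_row, a_row)]
            for fg_row, bg_row, a_row in zip(render, background, alpha)]


def add_color_noise(image, sigma, rng):
    """Add Gaussian noise to every channel and clamp to [0, 255]."""
    return [[tuple(min(255.0, max(0.0, ch + rng.gauss(0.0, sigma)))
                   for ch in px)
             for px in row]
            for row in image]


rng = random.Random(0)  # fixed seed so the example is repeatable
background = [[(30.0, 120.0, 200.0)] * 4 for _ in range(4)]  # stand-in photo
render = [[(200.0, 50.0, 50.0)] * 4 for _ in range(4)]       # stand-in render
alpha = [[1.0 if 1 <= r <= 2 and 1 <= c <= 2 else 0.0 for c in range(4)]
         for r in range(4)]

clean = composite(background, render, alpha)
augmented = add_color_noise(clean, sigma=2.0, rng=rng)
```

Background images without any pasted object can be fed through the same noise step and labeled as the “Background” class.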


OBJECT CLASSIFICATION

• AlexNet (A. Krizhevsky et al., 2012)
  – The deep convolutional network that won the 2012 ImageNet challenge (the original training took five to six days on two GPUs).
  – Transfer learning (network weights pre-trained on ImageNet).
  – Training time: 1-2 hours


OBJECT CLASSIFICATION

• Results
  – Robustly distinguishes between
    • Tetris controller
    • Motor
    • Background (anything else).
  – Only recognizes what it has seen during training
    • When motor is too small, it is classified as background.


DOES IT SCALE?

• ImageNet has 1000 object classes.
• On the ImageNet classification benchmark, state-of-the-art deep learning algorithms have reported lower error rates than human annotators.
• We successfully tested this approach with up to 150 object classes.


VIEW CLASSIFICATION

• Train a neural network to roughly estimate camera pose.
• Formulate as classification problem.

Input image ConvNet Which view is it?


VIEW CLASSIFICATION

• Again, we train on synthetic photorealistic images
• 48 different views/classes
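One way to turn camera pose into a classification target is to bin the viewing direction. The sketch below is an assumption about how 48 classes could be laid out (12 azimuth steps times 4 elevation bands); the talk does not say how its 48 views were chosen, and the function name is hypothetical.

```python
N_AZIMUTH = 12    # assumed: 30-degree steps around the object
N_ELEVATION = 4   # assumed: 45-degree bands from -90 (below) to +90 (above)


def view_class(azimuth_deg, elevation_deg):
    """Map a viewing direction to a class index in 0..47."""
    az_step = 360.0 / N_AZIMUTH
    el_step = 180.0 / N_ELEVATION
    # Wrap azimuth into [0, 360) and clamp the bin against rounding at 360.
    az_bin = min(int((azimuth_deg % 360.0) // az_step), N_AZIMUTH - 1)
    # Clamp elevation to [-90, 90]; +90 exactly belongs to the top band.
    el = min(max(elevation_deg, -90.0), 90.0)
    el_bin = min(int((el + 90.0) // el_step), N_ELEVATION - 1)
    return el_bin * N_AZIMUTH + az_bin


# When rendering a training image from a random camera position,
# the label is simply the bin the camera falls into.
print(view_class(45.0, 0.0))
```

With labels generated this way, the view classifier is trained exactly like the object classifier, only with view bins as classes.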


VIEW CLASSIFICATION

• Training image examples

View 1

View 2


VIEW CLASSIFICATION

• Results
  – Works fine for rough view classification like up/down or left/right.

• Use case in AR
  – Instructional assistance
  – “Flip object left-to-right”
  – “Turn object upside-down”

Input image Closest view


OBJECT DETECTION

• Can we use Deep Learning to detect objects in images?
  – Input: Image
  – Output: Bounding boxes + labels for each object in the image


OBJECT DETECTION

• How it used to be done
  – Train a classifier such as a deep neural network.
  – Run sliding window over image at multiple scales.

• Disadvantages
  – (Hand-crafted features)
  – Computationally expensive
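To see why the sliding-window approach is computationally expensive, it helps to count the windows. The sketch below enumerates square windows at three scales over a 640x480 image; the image size, window sizes and 50% stride are assumptions picked for illustration.

```python
def sliding_windows(img_w, img_h, window_sizes, stride_fraction=0.5):
    """Yield (x, y, size) for square windows at several scales."""
    for size in window_sizes:
        stride = max(1, int(size * stride_fraction))
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield x, y, size


windows = list(sliding_windows(640, 480, window_sizes=[64, 128, 256]))
print(len(windows), "classifier evaluations for a single frame")
```

Every window needs its own pass through the classifier, and finer strides or more scales multiply the count further.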


OBJECT DETECTION


OBJECT DETECTION

• Region-based Fully Convolutional Net (R-FCN)
  – Run a region proposal network (RPN) to generate regions of interest.
  – For each region of interest, look for object parts (3x3 grid)


OBJECT DETECTION

• Train on synthetic photorealistic images
  – Store bounding boxes of objects + class labels.
  – Allow multiple objects in same image.
  – Partially occlude objects with gray square.
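Because the images are rendered, the bounding-box labels come for free from the object mask. The sketch below derives a tight box from a binary mask and paints a gray square over part of the object. It assumes a grayscale image stored as nested lists, and the occluder size (half the shorter box side) and gray value are arbitrary choices, not values from the talk.

```python
import random


def bounding_box(mask):
    """Tight box (x_min, y_min, x_max, y_max) around nonzero mask pixels,
    or None if the mask is empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def occlude(image, box, rng, gray=128, fraction=0.5):
    """Return a copy of the image with a gray square placed at a random
    position inside the box. The box label itself is left unchanged."""
    x0, y0, x1, y1 = box
    side = max(1, int(min(x1 - x0 + 1, y1 - y0 + 1) * fraction))
    ox = rng.randint(x0, x1 - side + 1)
    oy = rng.randint(y0, y1 - side + 1)
    out = [row[:] for row in image]
    for y in range(oy, oy + side):
        for x in range(ox, ox + side):
            out[y][x] = gray
    return out


# Stand-in render: an 8x8 mask with the object on rows 2..5, columns 1..6.
mask = [[1 if 2 <= r <= 5 and 1 <= c <= 6 else 0 for c in range(8)]
        for r in range(8)]
image = [[255 if v else 0 for v in row] for row in mask]

box = bounding_box(mask)
occluded = occlude(image, box, random.Random(0))
```

For images with several objects, the same two steps would be run once per object mask, giving one box and one class label per object.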


OBJECT DETECTION

• Results
  – Works well on both synthetic and real images.
  – Detects multiple objects in same image.
  – Real-time performance.


IMAGE-TO-IMAGE TRANSFORMATION

• Example: Segmentation
  – Given an image, divide pixels into classes


IMAGE-TO-IMAGE TRANSFORMATION

• Fully Convolutional Networks for semantic segmentation
  – Input: Image
  – Output: Image with class labels
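The "image with class labels" output can be read off the network's per-pixel class scores by taking the highest-scoring class at each pixel. The sketch below shows only that final step, on made-up scores for a 2x2 image and two classes; the score layout `scores[class][y][x]` is an assumption for this illustration.

```python
def label_map(scores):
    """scores[c][y][x] is the score for class c at pixel (x, y).
    Returns an image whose pixel values are the winning class indices."""
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [[max(range(n_classes), key=lambda c: scores[c][y][x])
             for x in range(width)]
            for y in range(height)]


scores = [
    [[0.9, 0.2], [0.6, 0.1]],  # class 0, e.g. background
    [[0.1, 0.8], [0.4, 0.9]],  # class 1, e.g. object
]
labels = label_map(scores)
print(labels)
```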


IMAGE-TO-IMAGE TRANSFORMATION

• Image to image translation
  – Add color to an image
  – CT to histology
  – Building geometry


IMAGE-TO-IMAGE TRANSLATION

• Tracking object keypoints

Select 3D keypoints in the CAD model that you want to track

Input image Output image


IMAGE-TO-IMAGE TRANSLATION

• Tracking object keypoints

Select a 3D keypoint in the CAD model that you want to track


IMAGE-TO-IMAGE TRANSLATION

• Tracking of 48 object keypoints


REGRESSION NETWORKS

• Given an image, compute one or more numbers that describe what you see.
  – The size of an object
  – Age of a person
  – Positions of facial landmarks
  – How many millimeters of tread depth left on a tire?


KEYPOINT REGRESSION

• Predict object keypoint positions using a regression network.
• Use keypoints to estimate object pose.

Predicted keypoints Estimated object pose


POSE ESTIMATION

• Where is the camera relative to the object?
• Mapping between 3D object coordinates and 2D image coordinates.

Projection of 3D points (cube) onto an image.

Camera

Object

Scene
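The mapping between 3D object coordinates and 2D image coordinates can be sketched with a pinhole camera model. In the example below a pose (here simplified to a rotation about the vertical axis plus a translation) moves cube corners into the camera frame, and a perspective division projects them to pixels. The focal length, principal point and pose values are assumed numbers, and lens distortion is ignored.

```python
import math


def to_camera(point, yaw_rad, translation):
    """Apply the object-to-camera pose: rotate about the Y axis, then translate."""
    x, y, z = point
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return (c * x + s * z + tx, y + ty, -s * x + c * z + tz)


def project(point_cam, focal, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point_cam
    return (focal * x / z + cx, focal * y / z + cy)


# Unit cube centered on the object origin.
cube = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)]

# Assumed pose: object 5 units in front of the camera, turned by 0.3 rad.
pixels = [project(to_camera(p, 0.3, (0.0, 0.0, 5.0)),
                  focal=800.0, cx=320.0, cy=240.0)
          for p in cube]
```

Pose estimation runs this mapping in reverse: given the 3D keypoints and their predicted 2D positions, it searches for the rotation and translation that best reproduce the observed pixels.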


FINE-TUNING CAMERA POSE

• Work in progress


DEEP LEARNING IN AUGMENTED REALITY

• Our approach
  – Use synthetic images to train Deep Neural Networks.
  – Works well for object recognition, detection and markerless tracking.

Markerless tracking Detection and recognition


WHERE TO GO NEXT?

• Using synthetic images for training doesn’t always work that well…
• We need to close the “simulation-to-reality gap”.


Thank you for your attention!