Source: on-demand.gputechconf.com/gtc/2018/presentation/s8389-multi-resolution...

MULTI-LEVEL 3D CONVOLUTIONAL NEURAL NETWORK FOR OBJECT RECOGNITION

SAMBIT GHADAI

XIAN LEE

ADITYA BALU

SOUMIK SARKAR

ADARSH KRISHNAMURTHY

Outline

• Multi-Level Volumetric Representations for CAD Models

• Object Recognition using Dense Voxels

• Object Recognition using Multi-level Voxels

March 26, 2018

Motivation

• Object recognition of 3D models from volumetric data

• Learn volumetric features from CAD models

  • Local features

  • 3D spatial features

• Memory-efficient way to learn from volumetric data

Boundary Representation (B-Rep) CAD Models

• De facto representation for CAD models

• Can be easily tessellated into triangles for rendering

• Difficult to interpret volumetric information

  • Size of a feature

  • Internal location of a feature

Voxel Representation

• Binary occupancy information

  • Augmented with extra geometry information

• Can be used as direct input to a convolutional neural network

• A dense, high-resolution voxel grid has high memory and computation requirements

Why Do We Need Multi-Resolution?

• As the resolution increases, the fraction of occupied voxels decreases

  • Empty voxels still need to be stored

• A hierarchical (multi-level) representation is useful for capturing key features at a finer resolution
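The shrinking occupancy fraction can be illustrated with a small sketch. It assumes a sphere surface as a stand-in for a CAD boundary and uses a conservative center-distance test; the function name `surface_occupancy_fraction` and the radius are illustrative choices, not part of the talk.

```python
import numpy as np

def surface_occupancy_fraction(n, radius=0.8):
    """Fraction of voxels in an n^3 grid over [-1, 1]^3 that may touch a sphere surface."""
    size = 2.0 / n                                  # voxel edge length
    c = -1.0 + (np.arange(n) + 0.5) * size          # voxel centers along one axis
    x, y, z = np.meshgrid(c, c, c, indexing="ij")
    dist = np.sqrt(x**2 + y**2 + z**2)              # distance of each center from the origin
    # Conservative test (assumption): a voxel may touch the surface when its
    # center lies within half a voxel diagonal of the sphere radius.
    occupied = np.abs(dist - radius) <= 0.5 * np.sqrt(3.0) * size
    return occupied.mean()

fractions = {n: surface_occupancy_fraction(n) for n in (8, 16, 32, 64)}
for n, frac in fractions.items():
    print(f"{n:>2}^3 grid: {frac:.3f} of voxels occupied, {n**3} voxels stored")
```

A dense grid stores all n^3 voxels regardless of how few are occupied, which is the cost a multi-level representation tries to avoid.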


Level 1 Voxels

Level 2 Voxels

[2] http://openaccess.thecvf.com/content_cvpr_2017/poster/1319_POSTER.pdf

ModelNet10 Dataset

• 3D CAD models for objects

• 10 categories of objects:

  • Bathtub

  • Bed

  • Chair

  • Desk

  • Dresser

  • Monitor

  • Night Stand

  • Sofa

  • Table

  • Toilet

Source: Princeton ModelNet

[1] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang and J. Xiao, 3D ShapeNets: A Deep Representation for Volumetric Shapes, Proceedings of 28th IEEE Conference on Computer Vision and Pattern Recognition (CVPR2015)

http://vision.princeton.edu/projects/2014/3DShapeNets/paper.pdf



Volumetric Voxelization of ModelNet10

• Overlay a regular voxel grid on the object

• Test point membership of each voxel's bounding-box center point and classify the voxel as in or out
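A minimal sketch of center-point voxelization follows. It assumes the solid is available as a point-membership predicate; here an analytic unit sphere (`in_unit_sphere`, a hypothetical stand-in) replaces the point-in-model test that a real B-Rep or mesh would need.

```python
import numpy as np

def voxelize(inside, lo, hi, n):
    """Classify each voxel of an n x n x n grid over the box [lo, hi] by its center point."""
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    size = (hi - lo) / n                              # voxel edge length per axis
    idx = np.arange(n) + 0.5                          # center offsets in voxel units
    axes = [lo[d] + idx * size[d] for d in range(3)]
    cx, cy, cz = np.meshgrid(*axes, indexing="ij")
    centers = np.stack([cx, cy, cz], axis=-1)         # shape (n, n, n, 3)
    return inside(centers)                            # boolean grid: True = in, False = out

def in_unit_sphere(p):
    """Stand-in membership test (assumption): points inside a unit sphere at the origin."""
    return np.sum(p * p, axis=-1) <= 1.0

grid = voxelize(in_unit_sphere, (-1, -1, -1), (1, 1, 1), 8)
print(grid.shape, int(grid.sum()), "voxels classified as in")
```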


Identifying Boundary Voxels

• Boundary voxels must be identified in order to generate the fine-level voxel grid

• Identify the voxels that contain vertices

• Use the separating-axis test for all other voxels within the triangle's bounding box

Classify Vertices

Triangle Box Intersection
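The triangle-box step can be sketched with the standard separating-axis test, which checks 13 candidate axes: the 3 box normals, the triangle normal, and the 9 cross products of triangle edges with box axes. The function name `tri_box_overlap` and its argument layout are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def tri_box_overlap(center, half, tri):
    """Separating-axis test between a triangle and an axis-aligned box.

    center, half: box center and half-extents (3 values each)
    tri: three triangle vertices, shape (3, 3)
    """
    v = np.asarray(tri, dtype=float) - np.asarray(center, dtype=float)  # box-centered vertices
    h = np.asarray(half, dtype=float)
    edges = [v[1] - v[0], v[2] - v[1], v[0] - v[2]]
    box_axes = np.eye(3)
    axes = list(box_axes)                                  # 3 box face normals
    axes.append(np.cross(edges[0], edges[1]))              # triangle normal
    axes += [np.cross(e, a) for e in edges for a in box_axes]  # 9 edge cross products
    for axis in axes:
        if not np.any(axis):                               # zero axis (edge parallel to box axis)
            continue
        proj = v @ axis                                    # triangle projected onto the axis
        r = h @ np.abs(axis)                               # box projection radius on the axis
        if proj.min() > r or proj.max() < -r:
            return False                                   # found a separating axis
    return True                                            # no separating axis: they overlap

unit = dict(center=(0, 0, 0), half=(0.5, 0.5, 0.5))
print(tri_box_overlap(tri=[(-1, -1, 0), (1, -1, 0), (0, 1, 0)], **unit))
print(tri_box_overlap(tri=[(2, 0, 0), (0, 2, 0), (0, 0, 2)], **unit))
```

The second triangle passes near a box corner: its bounding box overlaps the voxel, and only the triangle-normal axis separates the two, which is why a bounding-box check alone is not enough.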

Fine Level Voxelization (Level 2)

• Same method as the coarse level

• Clip the model using the axis-aligned bounding box (AABB) of each boundary voxel

• Perform a similar triangle-box intersection test to identify Level 2 boundary voxels

• All the information is stored in a flat data structure
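The slides do not describe the flat layout, so the following is only one possible sketch: a coarse occupancy grid, an index grid that maps each boundary voxel to a row, and a single array of fine blocks. For testability it packs an existing dense grid and treats partially filled coarse voxels as boundary voxels, which is an assumption standing in for the triangle-box test; `build_flat` and `expand` are hypothetical names.

```python
import numpy as np

def build_flat(dense, c=8, f=4):
    """Pack a dense (c*f)^3 occupancy grid into (coarse, index, fine) arrays."""
    blocks = dense.reshape(c, f, c, f, c, f).transpose(0, 2, 4, 1, 3, 5)  # (c, c, c, f, f, f)
    count = blocks.reshape(c, c, c, -1).sum(axis=-1)     # occupied fine voxels per coarse voxel
    coarse = count > 0                                   # Level 1 occupancy
    boundary = coarse & (count < f ** 3)                 # partially filled (assumption)
    index = np.full((c, c, c), -1, dtype=np.int32)       # -1 marks "no fine block stored"
    index[boundary] = np.arange(boundary.sum())
    fine = blocks[boundary]                              # (num_boundary, f, f, f)
    return coarse, index, fine

def expand(coarse, index, fine):
    """Rebuild the dense grid from the two-level layout."""
    c, f = coarse.shape[0], fine.shape[1]
    blocks = np.broadcast_to(coarse[..., None, None, None], (c, c, c, f, f, f)).copy()
    mask = index >= 0
    blocks[mask] = fine[index[mask]]                     # restore stored fine blocks
    return blocks.transpose(0, 3, 1, 4, 2, 5).reshape(c * f, c * f, c * f)

# Demo on a solid sphere voxelized at 32^3
g = np.indices((32, 32, 32)).transpose(1, 2, 3, 0) + 0.5 - 16
dense = (g ** 2).sum(axis=-1) <= 12 ** 2
coarse, index, fine = build_flat(dense)
print("dense voxels stored:", dense.size)
print("two-level voxels stored:", coarse.size + fine.size)
```

Fully filled and fully empty coarse voxels need no fine block, so only the boundary rows add to the storage.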



3D CNN on Dense Voxel Grid

[Figure: Dense Voxel Grid → Convolution Layer 1 → Convolution Layer 2 → Pooling Layer → Dense Layer 1 → Dense Layer 2 → 10 Classes]

• Dense voxel grid as the input model

• 3D-CNN with two convolutional layers and a max-pooling layer for feature extraction

• The flattened features pass through fully connected dense layers to produce the 10-class classification
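The layer sequence above can be sketched as a NumPy forward pass. The kernel size, filter counts, hidden width, ReLU activations, and random weights below are all placeholder assumptions chosen to keep the example small; they are not the values used in the talk.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d(x, w, b):
    """Valid 3D convolution followed by ReLU. x: (C, D, H, W); w: (F, C, k, k, k)."""
    k = w.shape[-1]
    win = sliding_window_view(x, (k, k, k), axis=(1, 2, 3))   # (C, D', H', W', k, k, k)
    y = np.einsum("cdhwijk,fcijk->fdhw", win, w) + b[:, None, None, None]
    return np.maximum(y, 0.0)

def max_pool3d(x, p=2):
    """Non-overlapping p x p x p max pooling over the spatial axes."""
    c, d, h, w = x.shape
    x = x[:, : d // p * p, : h // p * p, : w // p * p]
    return x.reshape(c, d // p, p, h // p, p, w // p, p).max(axis=(2, 4, 6))

def init_params(rng, n=8, k=3, f1=4, f2=8, hidden=16, classes=10):
    """Random placeholder weights (assumed sizes, not the talk's)."""
    side = (n - 2 * (k - 1)) // 2            # size after two valid convs and 2x pooling
    flat = f2 * side ** 3
    return {
        "conv1": (rng.normal(0, 0.1, (f1, 1, k, k, k)), np.zeros(f1)),
        "conv2": (rng.normal(0, 0.1, (f2, f1, k, k, k)), np.zeros(f2)),
        "dense1": (rng.normal(0, 0.1, (hidden, flat)), np.zeros(hidden)),
        "dense2": (rng.normal(0, 0.1, (classes, hidden)), np.zeros(classes)),
    }

def forward(grid, params):
    """Two convolutions, one max pool, two dense layers, 10 class logits."""
    x = grid[None].astype(float)             # add a channel axis: (1, D, H, W)
    x = conv3d(x, *params["conv1"])
    x = conv3d(x, *params["conv2"])
    x = max_pool3d(x).reshape(-1)            # flatten the pooled features
    w1, b1 = params["dense1"]
    x = np.maximum(w1 @ x + b1, 0.0)
    w2, b2 = params["dense2"]
    return w2 @ x + b2

rng = np.random.default_rng(0)
grid = rng.random((8, 8, 8)) < 0.3           # toy occupancy grid
logits = forward(grid, init_params(rng))
print(logits.shape)
```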

Data Augmentation

• ModelNet10: 3991 training and 908 testing 3D models

• Dataset size is insufficient to train the parameters of the 3D-CNN

• 6 rigid body transformations on the voxel grid for data augmentation

• 7x the original data size used for training

  • Rotation (x, y, z axis)

  • Mirroring (x, y, z axis)

  • Original model

[Figure: a voxel grid on x-y axes before and after a 90° rotation about the z axis (Rot-z)]
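The seven training copies can be sketched with NumPy array operations. The sketch assumes a single 90° rotation about each axis, as suggested by the Rot-z figure; the rotation angles actually used for the x and y axes are not stated in the slides.

```python
import numpy as np

def augment(grid):
    """Return the original grid plus 3 rotated and 3 mirrored copies (7 in total)."""
    out = [grid]
    for axes in [(1, 2), (0, 2), (0, 1)]:        # planes for rotation about x, y, z
        out.append(np.rot90(grid, k=1, axes=axes))
    for axis in range(3):                        # mirror along x, y, z
        out.append(np.flip(grid, axis=axis))
    return out

rng = np.random.default_rng(0)
grid = rng.random((32, 32, 32)) < 0.1            # toy occupancy grid
copies = augment(grid)
print(len(copies), "grids per model")
```

Each copy keeps the same number of occupied voxels, since rotations by 90° and mirroring only permute voxel positions.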


Need to learn from Multi-Resolution data

• Learn efficiently from complex and intricate features of a CAD model

• Improve performance with fewer computations

• Amenable to model interpretability by learning finer features at specific spatial locations

• Low memory usage


Data Augmentation

• Similar to data augmentation on coarse-level voxels

• The rigid body transformation is first applied to the coarse voxels

• The transformation is then applied to the finer voxels inside each coarse voxel

[Figure: 90° Rot-z applied at Level 1 (coarse voxels) and then at Level 2 (fine voxels inside each coarse voxel), shown on x-y axes]
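The two-step transformation can be checked with a short sketch. For simplicity it assumes every coarse voxel carries a fine block (a dense block view), whereas the multi-level structure would only store blocks for boundary voxels; the helper names are illustrative.

```python
import numpy as np

def to_blocks(dense, c=8, f=4):
    """View a (c*f)^3 grid as c^3 coarse voxels, each holding f^3 fine voxels."""
    return dense.reshape(c, f, c, f, c, f).transpose(0, 2, 4, 1, 3, 5)

def from_blocks(blocks):
    """Inverse of to_blocks: reassemble the dense grid."""
    c, f = blocks.shape[0], blocks.shape[3]
    return blocks.transpose(0, 3, 1, 4, 2, 5).reshape(c * f, c * f, c * f)

def rotate_z_two_level(blocks):
    """90° rotation about z: first the coarse voxels, then the fine voxels inside each."""
    coarse_rotated = np.rot90(blocks, k=1, axes=(0, 1))   # Level 1: move coarse voxels
    return np.rot90(coarse_rotated, k=1, axes=(3, 4))     # Level 2: rotate inside each voxel

rng = np.random.default_rng(0)
dense = rng.random((32, 32, 32)) < 0.2
rotated = from_blocks(rotate_z_two_level(to_blocks(dense)))
print(np.array_equal(rotated, np.rot90(dense, k=1, axes=(0, 1))))
```

Applying the rotation at both levels reproduces the rotation of the equivalent dense grid; rotating only the coarse level would leave the contents of each coarse voxel in the wrong orientation.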

Multi-Level 3D CNN

[Figure: multi-level network architecture and training flow]

Level 2 network: Boundary Voxels → 4 x 4 x 4 Voxel Grid (Fine Voxels) → Convolution Layers → Pooling → Dense → Sigmoid Output → Coarse Level Fusion

Level 1 network: 8 x 8 x 8 Voxel Grid → Convolution Layer 1 → Convolution Layer 2 → Pooling Layer → Dense Layer 1 → Dense Layer 2 → 10 Classes

Training flow:

• Level-2 forward

• Linking Level-2 with Level-1

• Level-1 forward classification

• Compute loss

• Compute Level-1 gradients

• Extract voxel gradients based on the forward pass

• Compute Level-2 gradients

• Update weights
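One plausible reading of the linking and gradient-extraction steps is sketched below. It assumes the Level-2 network emits one sigmoid value per boundary voxel, that this value replaces the boundary voxel's entry in the coarse grid, and that the loss gradient at those entries is gathered back for the Level-2 backward pass. The functions `fuse` and `extract_boundary_grads` and the `index` layout are hypothetical; the slides do not specify the mechanism.

```python
import numpy as np

def fuse(coarse, index, level2_out):
    """Link Level-2 to Level-1: write each boundary voxel's Level-2 output into the coarse grid.

    coarse: (c, c, c) occupancy; index: (c, c, c) row of each boundary voxel, -1 elsewhere;
    level2_out: (num_boundary,) sigmoid outputs of the Level-2 network.
    """
    fused = coarse.astype(float)                 # copy; non-boundary voxels stay 0 or 1
    mask = index >= 0
    fused[mask] = level2_out[index[mask]]
    return fused

def extract_boundary_grads(grad_fused, index):
    """Gather d(loss)/d(fused voxel) at boundary voxels, ordered like the Level-2 outputs."""
    mask = index >= 0
    grads = np.zeros(int(mask.sum()))
    grads[index[mask]] = grad_fused[mask]
    return grads

# Toy 2 x 2 x 2 example with two boundary voxels
coarse = np.zeros((2, 2, 2), dtype=bool)
coarse[0, 0, 0] = coarse[1, 1, 1] = coarse[0, 1, 0] = True
index = np.full((2, 2, 2), -1)
index[0, 0, 0], index[1, 1, 1] = 0, 1
fused = fuse(coarse, index, np.array([0.3, 0.7]))
grads = extract_boundary_grads(np.arange(8.0).reshape(2, 2, 2), index)
print(fused[0, 0, 0], fused[1, 1, 1], grads)
```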

Results

• Multi-level training parameters:

  • Batch size: 64 3D models of size 8x8x8 coarse & 4x4x4 fine voxels

  • Optimizer: SGD with learning rate of 0.001

  • Loss function: Softmax cross-entropy

• Network (Level-1):

  • Convolution: 64 filters

  • Convolution: 128 filters

  • Max Pooling

  • Dense Layer: 256 units

• Network (Level-2):

  • Convolution: 8 filters

  • Max Pooling
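The loss and optimizer named above can be sketched in NumPy. The example applies them to a hypothetical linear classifier on random features purely to show one update; the batch size of 64, 10 classes, and learning rate of 0.001 follow the slide, while everything else is an assumption.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy and its gradient with respect to the logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)      # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    batch = logits.shape[0]
    loss = -log_probs[np.arange(batch), labels].mean()
    grad = np.exp(log_probs)                                  # softmax probabilities
    grad[np.arange(batch), labels] -= 1.0                     # subtract one-hot targets
    return loss, grad / batch

def sgd_step(weights, grad, lr=0.001):
    """Plain stochastic gradient descent update."""
    return weights - lr * grad

rng = np.random.default_rng(0)
features = rng.normal(size=(64, 16))          # batch of 64 feature vectors (placeholder data)
labels = rng.integers(0, 10, size=64)         # 10 classes
weights = np.zeros((10, 16))                  # linear classifier weights (assumption)

loss0, grad_logits = softmax_cross_entropy(features @ weights.T, labels)
weights = sgd_step(weights, grad_logits.T @ features)
loss1, _ = softmax_cross_entropy(features @ weights.T, labels)
print(loss0 > loss1)
```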


Results (Contd.)

• Dense level training parameters:

  • Batch size: 64 3D models of size 32 x 32 x 32 voxels

  • Optimizer: SGD with learning rate of 0.001

  • Loss function: Softmax cross-entropy

• Network A:

  • Convolution: 64 filters

  • Max Pooling

  • Max Pooling

• Network B:

  • Convolution: 64 filters

  • Max Pooling


Results (Contd.)

[Figure: accuracy of three networks: 1 – Coarse (8x8x8), 2 – Multi-Level (8x8x8 and 4x4x4), 3 – Dense (32x32x32)]

Results (Contd.)

[Figure: GPU memory usage (MB, axis from 0 to 16000) of multi-resolution voxel training and equivalent single-resolution training: Multi-Level, Dense with MaxPool, Dense without MaxPool]

Conclusions

• We have developed methods to represent CAD models using a multi-resolution voxel grid

• We have developed a multi-level 3D-CNN for object recognition using the multi-resolution voxel grid

• Memory usage of the multi-level 3D-CNN is much lower than that of the dense voxel 3D-CNN, without compromising accuracy

Future work

• Efficient training algorithms for the Level-2 3D-CNN

• Explore the effect of different resolutions on training the 3D-CNN

• Build model interpretability for hierarchical learning

• Test the algorithm on different datasets

Acknowledgements

• AI-based Design and Manufacturability Lab (ADAM Lab)

  • Xian Lee

  • Aditya Balu

  • Gavin Young

• Funding Sources

  • National Science Foundation

    • CMMI:1644441 – CM: Machine-Learning Driven Decision Support in Design for Manufacturability

  • NVIDIA

    • Titan Xp GPU for Academic Research

Thank You!

Questions?
