MULTI-LEVEL 3D CONVOLUTIONAL NEURAL NETWORK FOR OBJECT RECOGNITION
SAMBIT GHADAI
XIAN LEE
ADITYA BALU
SOUMIK SARKAR
ADARSH KRISHNAMURTHY
Outline
• Multi-Level Volumetric Representations for CAD Models
• Object Recognition using Dense Voxels
• Object Recognition using Multi-level Voxels
March 26, 2018
Motivation
• Object recognition of 3D models from volumetric data
• Learn volumetric features from CAD models
  • Local features
  • 3D spatial features
• Memory-efficient way to learn from volumetric data
Boundary Representation (B-Rep) CAD Models
• De facto representation for CAD models
• Can be easily tessellated into triangles for rendering
• Difficult to interpret volumetric information
  • Size of a feature
  • Internal location of a feature
Voxel Representation
• Binary occupancy information
  • Augmented with extra geometry information
• Can be used as direct input to a convolutional neural network
• A dense, high-resolution voxel grid has high memory and computation requirements
Why Do We Need Multi-Resolution?
• As the resolution increases, the fraction of occupied voxels decreases
  • Empty voxels still need to be stored
• A hierarchical (multi-level) representation is useful to capture key features at a finer resolution
[Figure: Level 1 voxels and Level 2 voxels]
[2] http://openaccess.thecvf.com/content_cvpr_2017/poster/1319_POSTER.pdf
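The storage argument above can be made concrete with a small sketch. It is only an illustration: the sphere is a stand-in shape (not one of the CAD models), the helper names are hypothetical, and the cell count ignores any index or bookkeeping overhead. It counts the voxels that a surface passes through at several resolutions and compares a dense 32^3 grid with an 8^3 coarse grid plus one 4^3 fine grid per boundary voxel.

```python
from itertools import product

def count_surface_voxels(n, radius=0.4, center=0.5):
    """Count voxels of an n^3 grid over the unit cube that a sphere surface passes through."""
    h = 1.0 / n
    crossed = 0
    for idx in product(range(n), repeat=3):
        near2 = far2 = 0.0
        for i in idx:
            lo, hi = i * h, (i + 1) * h
            nearest = min(max(center, lo), hi)                 # closest coordinate inside [lo, hi]
            farthest = lo if center - lo > hi - center else hi # farthest coordinate inside [lo, hi]
            near2 += (nearest - center) ** 2
            far2 += (farthest - center) ** 2
        # The surface crosses the voxel when the radius lies between the
        # nearest and farthest distances from the sphere center to the voxel.
        if near2 <= radius ** 2 <= far2:
            crossed += 1
    return crossed

def multilevel_cells(coarse_n, fine_n, num_boundary):
    """Cells stored by a two-level grid: the coarse grid plus one fine grid per boundary voxel."""
    return coarse_n ** 3 + num_boundary * fine_n ** 3

for n in (8, 16, 32):
    print(n, count_surface_voxels(n) / n ** 3)   # fraction of voxels on the surface

boundary = count_surface_voxels(8)
print(multilevel_cells(8, 4, boundary), "cells vs", 32 ** 3, "cells for a dense 32^3 grid")
```

For this shape, the fraction of surface voxels shrinks as the grid is refined, so a dense grid spends a growing share of its storage on voxels that carry no boundary detail.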
ModelNet10 Dataset
• 3D CAD models for objects
• 10 categories of objects:
  • Bathtub
  • Bed
  • Chair
  • Desk
  • Dresser
  • Monitor
  • Night Stand
  • Sofa
  • Table
  • Toilet
Source: Princeton ModelNet
[1] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang and J. Xiao, 3D ShapeNets: A Deep Representation for Volumetric Shapes, Proceedings of 28th IEEE Conference on Computer Vision and Pattern Recognition (CVPR2015)
Volumetric Voxelization of ModelNet10
• Overlay a regular voxel grid on the object
• Test point membership of each voxel's center point and classify the voxel as inside or outside the object
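A minimal sketch of the two steps above, assuming the point-membership test is available as a function. Here `in_box` is a hypothetical stand-in for a real point-in-solid query on a CAD model, and `voxelize` is an illustrative name, not the authors' implementation.

```python
from itertools import product

def voxelize(inside, n, lower=(0.0, 0.0, 0.0), size=1.0):
    """Return the set of (i, j, k) voxels whose center point satisfies inside(x, y, z)."""
    h = size / n                                   # edge length of one voxel
    occupied = set()
    for i, j, k in product(range(n), repeat=3):
        center = (lower[0] + (i + 0.5) * h,
                  lower[1] + (j + 0.5) * h,
                  lower[2] + (k + 0.5) * h)
        if inside(*center):                        # classify the voxel as in or out
            occupied.add((i, j, k))
    return occupied

def in_box(x, y, z):
    """Hypothetical membership test: an axis-aligned box standing in for a CAD model."""
    return 0.25 <= x <= 0.75 and 0.25 <= y <= 0.75 and 0.0 <= z <= 0.5

voxels = voxelize(in_box, 8)
print(len(voxels), "occupied voxels out of", 8 ** 3)
```

Storing only the occupied indices keeps the sketch short; a dense binary array of shape n x n x n would carry the same information.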
[Figure: regular grid of voxel center points]
Identifying Boundary Voxels
• Boundary voxels must be identified in order to generate the fine-level voxel grid
• Identify the voxels that contain vertices
• Use the separating-axis test for all other voxels within the bounding box
[Figure: classifying vertices; triangle-box intersection]
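The separating-axis test mentioned above can be sketched as follows. This is a generic textbook formulation with 13 candidate axes (3 box axes, the triangle normal, and 9 edge-axis cross products), not necessarily the authors' code. For brevity it skips the separate vertex-classification step and runs the test on every voxel inside each triangle's bounding box; the function names and the unit-cube grid are assumptions.

```python
from itertools import product

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

BOX_AXES = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

def triangle_intersects_box(tri, center, half):
    """Separating-axis test between a triangle and an axis-aligned box."""
    v = [sub(p, center) for p in tri]              # move the box center to the origin
    edges = [sub(v[1], v[0]), sub(v[2], v[1]), sub(v[0], v[2])]
    axes = BOX_AXES + [cross(edges[0], edges[1])]  # box axes and triangle normal
    axes += [cross(e, a) for e in edges for a in BOX_AXES]
    for axis in axes:
        if dot(axis, axis) < 1e-12:                # degenerate axis, nothing to test
            continue
        proj = [dot(p, axis) for p in v]
        radius = sum(half[i] * abs(axis[i]) for i in range(3))  # box extent along axis
        if min(proj) > radius or max(proj) < -radius:
            return False                           # found a separating axis
    return True

def boundary_voxels(triangles, n, size=1.0):
    """Voxels of an n^3 grid over [0, size]^3 touched by any triangle."""
    h = size / n
    half = (h / 2, h / 2, h / 2)
    found = set()
    for tri in triangles:
        # Only voxels inside the triangle's bounding box need to be tested.
        lo = [max(0, int(min(p[a] for p in tri) // h)) for a in range(3)]
        hi = [min(n - 1, int(max(p[a] for p in tri) // h)) for a in range(3)]
        for i, j, k in product(*[range(lo[a], hi[a] + 1) for a in range(3)]):
            center = ((i + 0.5) * h, (j + 0.5) * h, (k + 0.5) * h)
            if triangle_intersects_box(tri, center, half):
                found.add((i, j, k))
    return found
```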
Fine Level Voxelization (Level 2)
• Same method as the coarse level
• Clip the model using the axis-aligned bounding box (AABB) of each boundary voxel
• Perform a similar triangle-box intersection test to identify Level 2 boundary voxels
• All the information is stored in a flat data structure
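One possible flat layout for the two levels is sketched below. The slides do not describe the actual data structure, so the class name, the fields, and the row-major ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

def flat_index(i, j, k, n):
    """Row-major position of voxel (i, j, k) in a flat list of n**3 entries."""
    return (i * n + j) * n + k

@dataclass
class MultiLevelGrid:
    coarse_n: int = 8
    fine_n: int = 4
    coarse: list = field(default_factory=list)  # flat occupancy, one value per coarse voxel
    slot: dict = field(default_factory=dict)    # flat coarse index -> block number in `fine`
    fine: list = field(default_factory=list)    # fine blocks stored back to back

    def __post_init__(self):
        if not self.coarse:
            self.coarse = [0] * self.coarse_n ** 3

    def set_boundary(self, i, j, k, block):
        """Mark coarse voxel (i, j, k) as a boundary voxel and append its flat fine block."""
        assert len(block) == self.fine_n ** 3
        c = flat_index(i, j, k, self.coarse_n)
        self.coarse[c] = 1
        self.slot[c] = len(self.fine) // self.fine_n ** 3
        self.fine.extend(block)

    def occupancy(self, x, y, z):
        """Occupancy at fine-resolution coordinates, each in [0, coarse_n * fine_n)."""
        f = self.fine_n
        c = flat_index(x // f, y // f, z // f, self.coarse_n)
        if c not in self.slot:
            return self.coarse[c]               # non-boundary voxel: one value for the whole cell
        start = self.slot[c] * f ** 3
        return self.fine[start + flat_index(x % f, y % f, z % f, f)]
```

Only boundary voxels own a fine block, so `fine` grows with the number of boundary voxels rather than with the full 32^3 resolution.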
3D CNN on Dense Voxel Grid
[Figure: Dense voxel grid → Convolution Layer 1 → Convolution Layer 2 → Pooling Layer → Dense Layer 1 → Dense Layer 2 → 10 classes]
• Dense voxel grid as the input model
• 3D-CNN with two convolutional layers and a max-pooling layer for feature extraction
• Fully connected (dense) layers on the flattened features produce the 10-class classification
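To show what the feature-extraction layers compute on a voxel grid, here is a deliberately naive single-channel sketch of a 3D convolution followed by max pooling. It omits bias, activation, multiple filters and training; the kernel size, the absence of padding and the 2x2x2 pooling window are assumptions, since the slides do not state them. A real model would use a deep learning framework.

```python
from itertools import product

def conv3d(grid, kernel):
    """Unpadded, stride-1 3D convolution (cross-correlation) on cubic nested lists."""
    n, k = len(grid), len(kernel)
    m = n - k + 1                                   # output edge length
    out = [[[0.0] * m for _ in range(m)] for _ in range(m)]
    for x, y, z in product(range(m), repeat=3):
        out[x][y][z] = sum(grid[x + a][y + b][z + c] * kernel[a][b][c]
                           for a, b, c in product(range(k), repeat=3))
    return out

def max_pool3d(grid, p=2):
    """Non-overlapping p x p x p max pooling on a cubic nested list."""
    m = len(grid) // p
    return [[[max(grid[p * x + a][p * y + b][p * z + c]
                  for a, b, c in product(range(p), repeat=3))
              for z in range(m)] for y in range(m)] for x in range(m)]

# Toy input: a 5^3 grid with a solid 3^3 block in the middle.
grid = [[[1 if all(1 <= t <= 3 for t in (x, y, z)) else 0
          for z in range(5)] for y in range(5)] for x in range(5)]
kernel = [[[1] * 2 for _ in range(2)] for _ in range(2)]  # counts occupied voxels per window

features = conv3d(grid, kernel)
pooled = max_pool3d(features)
print(len(features), len(pooled))
```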
Data Augmentation
• ModelNet10: 3991 training and 908 testing 3D models
• Dataset size is insufficient to train the parameters of the 3D-CNN
• 6 rigid-body transformations on the voxel grid for data augmentation
• 7x the original data size used for training
  • Rotation (x, y, z axis)
  • Mirroring (x, y, z axis)
  • Original model
[Figure: voxel grid in the x-y plane before and after a 90° rotation about the z axis (Rot-z)]
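The seven training variants listed above can be sketched on a sparse set of occupied voxel indices. The 90° angle is taken from the Rot-z figure and assumed for the x and y rotations as well; the function names are illustrative.

```python
def rotate90(voxels, n, axis):
    """Rotate occupied voxel indices by a quarter turn about a grid axis (0=x, 1=y, 2=z)."""
    a, b = [d for d in range(3) if d != axis]      # the two axes spanning the rotation plane
    rotated = set()
    for v in voxels:
        w = list(v)
        w[a], w[b] = n - 1 - v[b], v[a]
        rotated.add(tuple(w))
    return rotated

def mirror(voxels, n, axis):
    """Mirror occupied voxel indices along one grid axis."""
    mirrored = set()
    for v in voxels:
        w = list(v)
        w[axis] = n - 1 - v[axis]
        mirrored.add(tuple(w))
    return mirrored

def augment(voxels, n):
    """Original model plus three rotations and three mirrorings: seven grids in total."""
    return ([set(voxels)]
            + [rotate90(voxels, n, axis) for axis in range(3)]
            + [mirror(voxels, n, axis) for axis in range(3)])

model = {(0, 0, 0), (1, 0, 0)}
variants = augment(model, 4)
print(len(variants), "grids from one model")
```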
Need to learn from Multi-Resolution data
• Learn efficiently from complex and intricate features of a CAD model
• Improve performance with fewer computations
• Amenable to model interpretability by learning finer features at specific spatial locations
• Low memory usage
Data Augmentation
• Similar to data augmentation at coarse level voxels
• Rigid-body transformation is first applied to the coarse voxels
  • The transformation is then applied to the finer voxels inside each coarse voxel
[Figure: 90° rotation about the z axis (Rot-z) in the x-y plane, applied at Level 1 and then at Level 2]
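A small sketch of the two-step transformation, assuming the boundary voxels are held as a dictionary from coarse index to the set of occupied fine indices (an illustrative layout, not the authors' format). Rotating the coarse positions and then the fine voxels inside each of them gives the same result as rotating the equivalent 32^3 dense grid, which the last line checks.

```python
def rot_z(index, n):
    """Quarter turn about the z axis for one voxel index in an n^3 grid."""
    i, j, k = index
    return (n - 1 - j, i, k)

def rotate_multilevel(blocks, coarse_n=8, fine_n=4):
    """Rotate the coarse positions first, then the fine voxels inside each coarse voxel."""
    return {rot_z(c, coarse_n): {rot_z(f, fine_n) for f in fine}
            for c, fine in blocks.items()}

def to_dense(blocks, fine_n=4):
    """Expand the two-level indices to indices in the equivalent dense grid."""
    return {(c[0] * fine_n + f[0], c[1] * fine_n + f[1], c[2] * fine_n + f[2])
            for c, fine in blocks.items() for f in fine}

# Two boundary voxels with a few occupied fine voxels each (made-up data).
blocks = {(1, 2, 3): {(0, 1, 2), (3, 3, 0)}, (7, 0, 0): {(2, 2, 2)}}
rotated = rotate_multilevel(blocks)
print(to_dense(rotated) == {rot_z(v, 32) for v in to_dense(blocks)})
```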
Multi-Level 3D CNN
[Figure: multi-level 3D-CNN architecture and training flow]
• Level-2 network: fine voxels of the boundary voxels (4 x 4 x 4 voxel grid) → convolution layers → pooling → dense → sigmoid output → coarse-level fusion
• Level-1 network: 8 x 8 x 8 voxel grid → Convolution Layer 1 → Convolution Layer 2 → Pooling Layer → Dense Layer 1 → Dense Layer 2 → 10 classes
• Training flow: Level-2 forward → linking Level-2 with Level-1 → Level-1 forward → classification → compute loss → compute Level-1 gradients → extract voxel gradients based on the forward pass → compute Level-2 gradients → update weights
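One plausible reading of the linking and gradient-extraction steps is sketched below; the slides do not give the details, so this is an assumption rather than the authors' method. It assumes the Level-2 network emits one logit per boundary voxel, that the sigmoid of that logit replaces the voxel's value in the coarse grid, and that the backward pass picks the Level-1 input gradients at those same voxels. All names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def link_levels(coarse, level2_logits):
    """Forward link (assumed): write each boundary voxel's sigmoid output into the coarse grid."""
    fused = dict(coarse)
    for idx, logit in level2_logits.items():
        fused[idx] = sigmoid(logit)
    return fused

def extract_voxel_gradients(coarse_grads, level2_logits):
    """Backward link (assumed): take the Level-1 input gradients at the boundary
    voxels used in the forward pass and push them through the sigmoid."""
    grads = {}
    for idx, logit in level2_logits.items():
        s = sigmoid(logit)
        grads[idx] = coarse_grads[idx] * s * (1.0 - s)   # chain rule through the sigmoid
    return grads

# Toy data: two coarse voxels, one of them a boundary voxel with a Level-2 logit.
coarse = {(0, 0, 0): 1.0, (1, 0, 0): 0.0}
logits = {(1, 0, 0): 0.3}
level1_input_grads = {(0, 0, 0): 2.0, (1, 0, 0): -1.5}  # stand-in for d(loss)/d(coarse input)

fused = link_levels(coarse, logits)
level2_grads = extract_voxel_gradients(level1_input_grads, logits)
print(fused, level2_grads)
```

The extracted values would then serve as the upstream gradients for the Level-2 network before the weights of both levels are updated.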
Results
• Multi-level training parameters:
  • Batch size: 64 3D models, each with 8x8x8 coarse and 4x4x4 fine voxels
  • Optimizer: SGD with learning rate of 0.001
  • Loss Function: Softmax cross-entropy
• Network (Level-1):
  • Convolution: 64 filters
  • Convolution: 128 filters
  • Max Pooling
  • Dense Layer: 256 units
• Network (Level-2):
  • Convolution: 8 filters
  • Convolution: 16 filters
  • Max Pooling
  • Dense Layer: 32 units
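For reference, the loss and optimizer named above can be written out in a few lines. This is a generic sketch of softmax cross-entropy and one plain SGD update; for illustration it treats the 10 class logits themselves as the parameters, which is a simplification and not how the network is trained.

```python
import math

def softmax(logits):
    m = max(logits)                                  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Softmax cross-entropy loss for one sample with an integer class label."""
    return -math.log(softmax(logits)[label])

def grad_logits(logits, label):
    """Gradient of the loss with respect to the logits: softmax minus one-hot."""
    p = softmax(logits)
    p[label] -= 1.0
    return p

def sgd_step(params, grads, lr=0.001):
    """One plain SGD update: move each parameter against its gradient."""
    return [w - lr * g for w, g in zip(params, grads)]

logits = [0.0] * 10                                  # 10 ModelNet10 classes, uniform prediction
label = 3
before = cross_entropy(logits, label)
logits = sgd_step(logits, grad_logits(logits, label))
after = cross_entropy(logits, label)
print(before, after)
```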
Results (Contd.)
• Dense level training parameters:
  • Batch size: 64 3D models of size 32 x 32 x 32 voxels
  • Optimizer: SGD with learning rate of 0.001
  • Loss Function: Softmax cross-entropy
• Network A:
  • Convolution: 64 filters
  • Max Pooling
  • Convolution: 128 filters
  • Max Pooling
  • Dense Layer: 256 units
• Network B:
  • Convolution: 64 filters
  • Convolution: 128 filters
  • Max Pooling
  • Dense Layer: 256 units
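A rough way to see why the placement of max pooling matters for memory is to count the activations each layer produces per sample. The sketch below does that for the two dense networks under assumptions the slides do not state: convolutions keep the spatial size (padded), and each pooling layer halves every dimension. It counts activation elements only, not weights or framework overhead, so it is not a prediction of measured GPU memory.

```python
def activation_counts(layers, size=32, channels=1):
    """Per-sample activation elements after each layer for a cubic input.

    Assumes size-preserving convolutions and 2x2x2 pooling with stride 2
    (both are assumptions, not taken from the slides).
    """
    counts = []
    for kind, width in layers:
        if kind == "conv":
            channels = width            # one feature map per filter
        elif kind == "pool":
            size //= 2                  # halve each spatial dimension
        elif kind == "dense":
            size, channels = 1, width   # flattened to a vector of `width` units
        counts.append(channels * size ** 3)
    return counts

NETWORK_A = [("conv", 64), ("pool", None), ("conv", 128), ("pool", None), ("dense", 256)]
NETWORK_B = [("conv", 64), ("conv", 128), ("pool", None), ("dense", 256)]

print("Network A:", activation_counts(NETWORK_A))
print("Network B:", activation_counts(NETWORK_B))
```

Under these assumptions, the second convolution of Network B runs at full 32^3 resolution, which is where most of its extra activations come from.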
Results (Contd.)
[Figure: accuracy of three models: 1 – Coarse (8x8x8), 2 – Multi-Level (8x8x8 and 4x4x4), 3 – Dense (32x32x32)]
Results (Contd.)
[Figure: GPU memory usage (MB) of multi-resolution voxel training and equivalent single-resolution training; vertical axis from 0 to 16000 MB; bars: Multi-Level, Dense with MaxPool, Dense without MaxPool]
Conclusions
• We have developed methods to represent CAD models using a multi-resolution voxel grid
• Developed a multi-level 3D-CNN for object recognition using the multi-resolution voxel grid
• Memory usage of the multi-level 3D-CNN is much lower than that of the dense-voxel 3D-CNN, without compromising accuracy
Future work
• Efficient training algorithms for the Level-2 3D-CNN
• Explore the effect of different resolutions on training the 3D-CNN
• Build model interpretability for hierarchical learning
• Test the algorithm on different datasets
Acknowledgements
• AI-based Design and Manufacturability Lab (ADAM Lab)
  • Xian Lee
  • Aditya Balu
  • Gavin Young
• Funding Sources
  • National Science Foundation
    • CMMI:1644441 – CM: Machine-Learning Driven Decision Support in Design for Manufacturability
  • NVIDIA
    • Titan Xp GPU for Academic Research
Thank You!
Questions?