
1 / 57

Innfarn Yoo, 3/29/2018

POINT CLOUD DEEP LEARNING

2 / 57

AGENDA

• Introduction

• Previous Work

• Method

• Result

• Conclusion

3 / 57

INTRODUCTION

4 / 57

2D OBJECT CLASSIFICATION

• Convolutional Neural Network (CNN) for 2D images works really well

• AlexNet, ResNet, & GoogLeNet

• R-CNN → Fast R-CNN → Faster R-CNN → Mask R-CNN

• Recent 2D image classification can even extract precise boundaries of objects (FCN → Mask R-CNN)

Deep Learning for 2D Object Classification

[1] He et al., Mask R-CNN (2017)

5 / 57

3D OBJECT CLASSIFICATION

• 3D object classification approaches are getting more attention

• Collecting 3D point data is easier and cheaper than before (LiDAR & other sensors)

• Size of data is bigger than 2D images

• Open datasets are increasing

• Recent research approaches human-level detection accuracy

• MVCNN, ShapeNet, PointNet, VoxNet, VoxelNet, & VRN Ensemble

Deep Learning for 3D Object Classification

[2] Zhou and Tuzel, VoxelNet (2017)

6 / 57

GOALS

• Evaluating & comparing different types of Neural Network models for 3D object classification

• Providing a generic framework to test multiple 3D neural network models

• Simple & easy to implement neural network models

• Fast preprocessing (remove bottleneck of loading, sampling, & jittering 3D data)

The goals of our method

7 / 57

PREVIOUS WORK

8 / 57

3D POINT-BASED APPROACHES

• PointNet

• First 3D point-based classification

• Unordered dataset

• Transform → Multi-Layer Perceptron (MLP) → Max Pool (MP) → Classification

3D Points → Neural Nets

[5] Qi et al., PointNet (2017)
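Why a max pool suits an unordered dataset can be shown with a minimal sketch. It assumes a single shared per-point layer with hypothetical weights (`WEIGHTS` is made up for illustration and is not PointNet's actual architecture or parameters). Because the maximum over points does not depend on their order, the pooled feature is identical for any shuffling of the cloud:

```python
import random

def per_point_features(point, weights):
    """Apply one shared linear layer + ReLU to a single (x, y, z) point."""
    return [max(0.0, sum(w * c for w, c in zip(row, point))) for row in weights]

def global_max_pool(points, weights):
    """Max over points of each feature channel: a symmetric (order-free) function."""
    feats = [per_point_features(p, weights) for p in points]
    return [max(channel) for channel in zip(*feats)]

# Hypothetical 4-channel shared weights, chosen only for illustration.
WEIGHTS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, 0.5, 0.25]]

rng = random.Random(0)
cloud = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(16)]
shuffled = cloud[:]
rng.shuffle(shuffled)

pooled_a = global_max_pool(cloud, WEIGHTS)     # original point order
pooled_b = global_max_pool(shuffled, WEIGHTS)  # same points, different order
```

Both `pooled_a` and `pooled_b` hold the same four values, which is the property that lets a classifier consume points without sorting them first.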

9 / 57

PIXEL-BASED APPROACHES

• Multi-Layer Perceptron (MLP)

• Convolutional Neural Network (CNN)

• Multi-View Convolutional Neural Network (MVCNN)

3D → 2D Projections → Neural Nets

[3] Su et al., MVCNN (2015)

[4] Krizhevsky et al., AlexNet (2012)

10 / 57

VOXEL-BASED APPROACHES

• VoxNet

• VRN Ensemble

• VoxelNet

3D Points → Voxels → Neural Nets

[6] Maturana and Scherer, VoxNet (2015)

[7] Brock et al., VRN Ensemble (2016)

[2] Zhou and Tuzel, VoxelNet (2017)

11 / 57

METHOD

12 / 57

PREPROCESSING

• Loading 3D polygonal objects

• Required Operations on 3D objects

• Sampling, Shuffling, Jittering, Scaling, & Rotating

• Projection, & Voxelization

• Python is not well suited to multi-core processing (or multi-threading)

• # of objects is notoriously large for single-core processing

Requirement

13 / 57

FRAMEWORK

Basic Pipeline

nn3d_trainer (C++ program):

• Load 3D objects

• Sampler threads (Thread 1, Thread 2, Thread 3, …, Thread N)

• Each thread: 3D point sampling, then a converter (pixel, point, or voxel)

• Call Python NN model functions: Train, Test, Eval, Report, & Save

• Epoch #i: increase epoch and repeat
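The control flow of this pipeline can be sketched as follows. This is a structural illustration only: the deck's trainer is a C++ program, whereas this sketch uses Python threads, which do not speed up CPU-bound sampling. All names here (`sample_points`, `convert`, `prepare_batch`, `run`) are hypothetical stand-ins, not the `nn3d_trainer` API.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def sample_points(obj, n_points, seed):
    """Stand-in sampler: pick n_points vertices at random (with replacement)."""
    rng = random.Random(seed)
    return [rng.choice(obj["vertices"]) for _ in range(n_points)]

def convert(points, mode):
    """Stand-in converter; a real one would emit pixels, points, or voxels."""
    return {"mode": mode, "data": points}

def prepare_batch(objects, n_points, mode, epoch, n_threads=4):
    """Sample and convert every object of a batch on a pool of worker threads."""
    def work(indexed):
        i, obj = indexed
        return convert(sample_points(obj, n_points, seed=epoch * 10007 + i), mode)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(work, enumerate(objects)))

def run(objects, train_step, max_epochs, n_points=8, mode="point"):
    """Epoch loop: prepare a batch in parallel, then hand it to the model callback."""
    for epoch in range(max_epochs):
        train_step(epoch, prepare_batch(objects, n_points, mode, epoch))

# Two hypothetical objects and a callback that only records what it received.
objects = [{"vertices": [(0, 0, 0), (1, 0, 0), (0, 1, 0)]},
           {"vertices": [(0, 0, 1), (1, 1, 1)]}]
log = []
run(objects, lambda epoch, batch: log.append((epoch, len(batch))), max_epochs=3)
```

The callback plays the role of the Python model functions (Train, Test, Eval, Report, & Save); resampling with a new seed each epoch means the model sees a different point set for the same object every time.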

14 / 57

3D DATASETS

MODELNET10

• Princeton ModelNet Data: http://modelnet.cs.princeton.edu/

• 10 Categories

• 4,930 Objects (2 GB)

• OFF (CAD) File Format

MODELNET40

• Princeton ModelNet Data: http://modelnet.cs.princeton.edu/

• 40 Categories

• 12,431 Objects (10 GB)

• OFF (CAD) File Format

SHAPENET CORE V2

• ShapeNet: https://www.shapenet.org/

• 55 Categories

• 51,191 Objects (90 GB)

• OBJ File Format

15 / 57

NEURAL NETWORK MODELS

Point-Based Models

Pixel-Based Models

Voxel-Based Models

16 / 57

POINT-BASED NEURAL NETWORK MODELS

• Preprocessing:

• Rotate randomly

• Scale randomly

• Uniform sampling on 3D object surfaces

• Sample 2048 points

• Shuffle points

Types of Models

• Tested Models

• Multi-Layer Perceptron (MLP)

• Multi Rotational MLPs

• Single Orientation CNN

• Multi Rotational CNNs

• Multi Rotational Resample & Max Pool Layers

• ResNet-like
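The preprocessing steps listed on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the deck's implementation: surface sampling is done by area-weighted triangle selection with uniform barycentric coordinates, the rotation is taken about the z-axis only (the slides do not name an axis), and the rotation and scale ranges are borrowed from the deck's hyper-parameter slide. The function names are hypothetical.

```python
import math
import random

def triangle_area(a, b, c):
    """Area of a 3D triangle from the cross product of two edges."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def sample_surface(vertices, faces, n_points, rng):
    """Uniformly sample points on a triangle mesh surface."""
    tris = [tuple(vertices[i] for i in f) for f in faces]
    areas = [triangle_area(*t) for t in tris]
    points = []
    # Larger triangles are picked proportionally more often.
    for a, b, c in rng.choices(tris, weights=areas, k=n_points):
        r1, r2 = math.sqrt(rng.random()), rng.random()
        w = (1.0 - r1, r1 * (1.0 - r2), r1 * r2)  # uniform barycentric weights
        points.append(tuple(w[0] * a[i] + w[1] * b[i] + w[2] * c[i] for i in range(3)))
    return points

def rotate_z_and_scale(points, angle_deg, scale):
    """Rotate about the z-axis (an assumed axis) and scale uniformly."""
    t = math.radians(angle_deg)
    ct, st = math.cos(t), math.sin(t)
    return [(scale * (ct * x - st * y), scale * (st * x + ct * y), scale * z)
            for x, y, z in points]

# Toy mesh: a unit square in the z = 0 plane, split into two triangles.
vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
faces = [(0, 1, 2), (0, 2, 3)]

rng = random.Random(0)
points = sample_surface(vertices, faces, 2048, rng)
cloud = rotate_z_and_scale(points, rng.uniform(-25, 25), rng.uniform(0.7, 1.0))
rng.shuffle(cloud)  # remove any ordering left over from the sampler
```

Weighting the triangle choice by area is what makes the sampling uniform over the surface rather than uniform over faces; without it, finely tessellated regions would be oversampled.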

17 / 57

18 / 57

POINT-BASED NEURAL NETWORK MODELS

MLP

[Diagram blocks: 3D points, Flatten Vector, Fully Connected Layer (ReLU + Dropout), …, Class Onehot Vector, Softmax Cross Entropy]
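The loss named in this diagram, softmax cross entropy against a one-hot class vector, can be written out directly. The sketch below is a plain-Python illustration of the formula, not the TensorFlow code used in the deck; the example logits are made up.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, onehot):
    """Negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(onehot, probs) if t > 0)

logits = [2.0, 0.5, -1.0]   # hypothetical network outputs for 3 classes
onehot = [1.0, 0.0, 0.0]    # true class is class 0

loss = softmax_cross_entropy(logits, onehot)
uniform_loss = softmax_cross_entropy([0.0, 0.0, 0.0], onehot)  # equals log(3)
```

A network that has learned nothing outputs equal scores and pays `log(number of classes)`; any loss below that means the true class is receiving more than its uniform share of probability.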

19 / 57

POINT-BASED NEURAL NETWORK MODELS

Multi Rotational MLPs

[Diagram blocks: 3D points, Random 3x3 Rotation, 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer (ReLU + Dropout), …, Class Onehot Vector, Softmax Cross Entropy]

20 / 57

POINT-BASED NEURAL NETWORK MODELS

Single Orientation CNN

[Diagram blocks: 3D points, Random 3x3 Rotation, 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (x2), …, Class Onehot Vector, Softmax Cross Entropy]

21 / 57

POINT-BASED NEURAL NETWORK MODELS

Multi Rotational CNNs

[Diagram blocks: 3D points, Random 3x3 Rotation, 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (x2), …, Class Onehot Vector, Softmax Cross Entropy]

22 / 57

POINT-BASED NEURAL NETWORK MODELS

Multi Rotational Resample & Max Pool Layers

[Diagram blocks: 3D points, Random 3x3 Rotation, Resample Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (x2), …, Class Onehot Vector, Softmax Cross Entropy]

23 / 57

POINT-BASED NEURAL NETWORK MODELS

ResNet-like

[Diagram blocks: 3D points, Random 3x3 Rotation, Resample Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (x3), …, Class Onehot Vector, Softmax Cross Entropy]

24 / 57

PIXEL-BASED NEURAL NETWORK MODELS

• Preprocessing:

• Sample 8192 points

• Same as point-based models

• Depth-only orthogonal projection

• 32x32 or 64x64

• Generating multiple rotations

• 64x64x5 & 64x64x10

Types of Models

• Tested Models:

• MLP

• Depth-Only Orthogonal MVCNN
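The depth-only orthogonal projection from the preprocessing list can be sketched as below. The conventions are assumptions, since the slides do not state them: points are normalized to the unit ball, the viewer looks along the z-axis, each pixel stores the nearest depth rescaled to 0..1 (0 for empty pixels), and the multiple views are evenly spaced rotations about the y-axis. Function names are hypothetical.

```python
import math

def depth_image(points, res=32):
    """Orthographic depth map: keep the nearest point per pixel."""
    img = [[0.0] * res for _ in range(res)]
    for x, y, z in points:
        col = min(res - 1, max(0, int((x + 1.0) * 0.5 * res)))
        row = min(res - 1, max(0, int((y + 1.0) * 0.5 * res)))
        depth = (z + 1.0) * 0.5  # 1.0 = closest to a viewer at z = +1
        if depth > img[row][col]:
            img[row][col] = depth
    return img

def multi_view_depth(points, n_views=5, res=32):
    """Stack depth images from n_views rotations about the y-axis (assumed layout)."""
    views = []
    for k in range(n_views):
        t = 2.0 * math.pi * k / n_views
        ct, st = math.cos(t), math.sin(t)
        rotated = [(ct * x + st * z, y, -st * x + ct * z) for x, y, z in points]
        views.append(depth_image(rotated, res))
    return views

points = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (0.5, 0.5, 0.0)]
img = depth_image(points, res=32)           # one 32x32 view
views = multi_view_depth(points, 5, 32)     # five views, i.e. 32x32x5
```

With five views at 32x32 this yields the 32x32x5 input shape mentioned for the pixel-based models; the front and back points at the image center collapse into one pixel, which is the information loss inherent to a single projection.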

25 / 57

26 / 57

PIXEL-BASED NEURAL NETWORK MODELS

MLP

[Diagram blocks: Images (32x32x5), Flatten Vector, Fully Connected Layer, …, Class Onehot Vector, Softmax Cross Entropy]

27 / 57

PIXEL-BASED NEURAL NETWORK MODELS

Depth-Only Orthogonal MVCNN

[Diagram blocks: Images (32x32x5), Image Separation, 3D Conv Layer, Max Pooling Layer, Concat, Flatten Vector, Fully Connected Layer, …, Class Onehot Vector, Softmax Cross Entropy]

28 / 57

VOXEL-BASED NEURAL NETWORK MODELS

• Preprocessing:

• Sample 8192 points

• Same as point-based models

• Voxelization

• 3D points → Voxels

• Each voxel has intensity 0.0 ~ 1.0

• Based on how many points hit the same voxel

• 32x32x32 & 64x64x64

Types of Models

• Tested Models:

• MLP

• CNN

• ResNet-like
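The voxelization step in the preprocessing list can be sketched as follows. Two details are assumptions rather than facts from the slides: points are taken to lie in the cube [-1, 1]^3, and the hit counts are mapped to 0.0 ~ 1.0 by dividing by the largest count in the grid. The names `voxelize` and `to_dense` are hypothetical.

```python
def voxelize(points, dim=32):
    """Count points per voxel, then normalize counts to the range 0.0 ~ 1.0."""
    counts = {}
    for p in points:
        # Map each coordinate from [-1, 1] to a voxel index in [0, dim - 1].
        idx = tuple(min(dim - 1, max(0, int((c + 1.0) * 0.5 * dim))) for c in p)
        counts[idx] = counts.get(idx, 0) + 1
    peak = max(counts.values()) if counts else 1
    return {idx: n / peak for idx, n in counts.items()}

def to_dense(grid, dim=32):
    """Expand the sparse {index: intensity} map into a dim x dim x dim nested list."""
    dense = [[[0.0] * dim for _ in range(dim)] for _ in range(dim)]
    for (i, j, k), value in grid.items():
        dense[i][j][k] = value
    return dense

points = [(0.0, 0.0, 0.0), (0.01, 0.01, 0.01), (-1.0, -1.0, -1.0)]
grid = voxelize(points, dim=32)   # two points share voxel (16, 16, 16)
dense = to_dense(grid, dim=32)
```

Keeping the grid sparse until the last moment is a design choice for this sketch: a 32x32x32 grid has 32,768 cells, and a surface sample typically touches only a small fraction of them.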

29 / 57

30 / 57

VOXEL-BASED NEURAL NETWORK MODELS

MLP

[Diagram blocks: Images (32x32x5), Flatten Vector, Fully Connected Layer, …, Class Onehot Vector, Softmax Cross Entropy]

31 / 57

VOXEL-BASED NEURAL NETWORK MODELS

CNN

[Diagram blocks: Voxels (32x32x32), 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, …, Class Onehot Vector, Softmax Cross Entropy]

32 / 57

VOXEL-BASED NEURAL NETWORK MODELS

ResNet-like

[Diagram blocks: Voxels (32x32x32), 3D Conv Layer, Resample Layer, Avg Pooling Layer, Max Pooling Layer, Concat, Flatten Vector, Fully Connected Layer, …, Class Onehot Vector, Softmax Cross Entropy]

33 / 57

IMPLEMENTATION

• System: Ubuntu 16.04, RAM 32 GB & 64 GB, & SSD 512 GB

• NVIDIA Quadro P6000, Quadro M6000, & GeForce Titan X

• GCC 5.2.0 for C++ 11x

• Python 3.5

• TensorFlow-GPU v1.5.0

• NumPy 1.0

System Setup

34 / 57

HYPER PARAMETERS

• Object Perturbation

• Random Rotations: -25 ~ 25 degrees

• Random Scaling: 0.7 ~ 1.0

• Learning Rate: 0.0001

• Keep Probability (Dropout layer): 0.7

• Max Epochs: 1000

• Batch Size: 32

• Number of Random Rotations: 20

• Voxel Dim: 32x32x32

• MVCNN Number of Views: 5
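These settings could be collected into a single configuration object, as in the sketch below. The values are the ones listed on this slide; the class name, field names, and the choice of a uniform distribution for drawing the perturbation are assumptions made for illustration.

```python
from dataclasses import dataclass
import random

@dataclass(frozen=True)
class HyperParams:
    """Hyper-parameter values from the slide; field names are hypothetical."""
    rotation_deg: tuple = (-25.0, 25.0)   # random rotation range
    scale: tuple = (0.7, 1.0)             # random scaling range
    learning_rate: float = 1e-4
    keep_prob: float = 0.7                # dropout keep probability
    max_epochs: int = 1000
    batch_size: int = 32
    n_random_rotations: int = 20
    voxel_dim: int = 32                   # 32x32x32 grid
    mvcnn_views: int = 5

def draw_perturbation(hp, rng):
    """Draw one (angle, scale) pair; uniform sampling is an assumption."""
    return rng.uniform(*hp.rotation_deg), rng.uniform(*hp.scale)

hp = HyperParams()
angle, scale = draw_perturbation(hp, random.Random(0))
```

Freezing the dataclass keeps a training run from silently changing its own settings partway through.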

35 / 57

RESULT

36 / 57

MODELNET10

# OF TEST & TRAIN OBJECTS

[Bar chart: # of Test Models and # of Train Models per category (axis 0 to 1000). Categories: table, toilet, monitor, bathtub, sofa, chair, desk, dresser, night_stand, bed]

37 / 57

MODELNET10 ACCURACY

Iter: 1000

[Bar chart: Train Accu, Test Accu, and mAP (%, axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

38 / 57

MODELNET40

# OF TEST & TRAIN OBJECTS

[Bar chart: # of Test Models and # of Train Models per category (axis 0 to 900)]

39 / 57

MODELNET40 ACCURACY

Iter: 1000

[Bar chart: Train Accu, Test Accu, and mAP (%, axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

40 / 57

MODELNET40 ACCURACY

7 CATEGORIES (# OF TRAIN OBJECTS > 400)

[Bar chart: # of Test Models and # of Train Models per category (axis 0 to 900)]

41 / 57

MODELNET40, 7 CATEGORIES ACCURACY

Iter: 1000

[Bar chart: Train Accu, Test Accu, and mAP (%, axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

42 / 57

MODELNET40 ACCURACY

10 CATEGORIES (# OF TRAIN OBJECTS > 300)

[Bar chart: # of Test Models and # of Train Models per category (axis 0 to 900)]

43 / 57

MODELNET40, 10 CATEGORIES ACCURACY

[Bar chart: Train Accu, Test Accu, and mAP (axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

44 / 57

MODELNET40 ACCURACY

17 CATEGORIES (# OF TRAIN OBJECTS > 200)

[Bar chart: # of Test Models and # of Train Models per category (axis 0 to 900)]

45 / 57

MODELNET40, 17 CATEGORIES ACCURACY

Iter: 1000

[Bar chart: Train Accu, Test Accu, and mAP (%, axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

46 / 57

MODELNET10 PERFORMANCE

[Bar chart: Total Training Time in hours (axis 0.00 to 3.00) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

[Bar chart: Inference Time Per Batch in seconds (axis 0 to 0.07) for the same models]

47 / 57

MODELNET40 PERFORMANCE

[Bar chart: Total Training Time in hours (axis 0.00 to 8.00) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

[Bar chart: Inference Time Per Batch in seconds (axis 0 to 0.07) for the same models]

48 / 57

SHAPENET CORE V2 ACCURACY

• ShapeNet is a fairly large dataset

• 90 GB of data

• Only tested 3 NN models

• Point MP

• Depth-Only MVCNN

• Voxel CNN

Iter: 600

[Bar chart: Train Accu, Test Accu, and mAP (%, axis 0 to 100) for PC MP, PX MVCNN, and VX CNN]

49 / 57

TRAIN ACCURACY GRAPH

[Line charts, one panel per model: train accuracy (axis 0 to 1) against an x-axis running from 50 to 15410. Panels: PC MLP1, PC MLPs, PC CNN1, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

50 / 57

TEST ACCURACY GRAPH

[Line charts, one panel per model: test accuracy (axis 0 to 1) against an x-axis running from 0 to 4000. Panels: PC MLP1, PC MLPs, PC CNN1, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

51 / 57

CROSS ENTROPY CONVERGENCE GRAPH

[Line charts, one panel per model: cross entropy (axis 0 to 2.5) against an x-axis running from 50 to 15410. Panels: PC MLP1, PC MLPs, PC CNN1, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]

52 / 57

CONCLUSION

53 / 57

CONCLUSION

• Voxel ResNet and Voxel CNN provide the best results on 3D object classification

• Unordered vs. Ordered: Unordered data (point clouds) can be learned, but with lower accuracy and higher computation cost than ordered data

• Projection vs. Voxelization: Voxelization provides better results at similar computation cost (converges faster and is more precise)

• Strict comparisons with previous methods are not done yet

• Sampling and jittering methods are different (cannot directly compare yet)

Models

54 / 57

CONCLUSION

• ModelNet10 & ModelNet40

• Some categories have too few training and testing objects

• This lowers classification accuracy

• ModelNet40, 7 categories (# of training objects > 400)

• Achieved 98% accuracy

• Partial dataset from ModelNet40 (Categories that have more than 400 3D objects)

• The # of training 3D objects in a category matters

• ShapeNetCore.v2

• # of 3D objects is large enough, but many objects are duplicates
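Building a partial dataset that keeps only well-populated categories, as done for the 7-category ModelNet40 subset, can be sketched as follows. The helper names are hypothetical and the label counts are toy values, not ModelNet's real per-category counts.

```python
from collections import Counter

def categories_with_enough_data(train_labels, min_train):
    """Return categories that have more than min_train training objects."""
    counts = Counter(train_labels)
    return sorted(c for c, n in counts.items() if n > min_train)

def filter_dataset(samples, keep):
    """Keep only (object, label) pairs whose label is in the kept categories."""
    keep = set(keep)
    return [(x, label) for x, label in samples if label in keep]

# Toy labels: 5 chairs, 3 sofas, 1 cup.
train_labels = ["chair"] * 5 + ["sofa"] * 3 + ["cup"]
samples = list(enumerate(train_labels))

keep = categories_with_enough_data(train_labels, min_train=2)
subset = filter_dataset(samples, keep)
```

The threshold should be computed from the training split only and then applied to both splits, so that the test set covers exactly the categories the model was trained on.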

Dataset Evaluation

55 / 57

FUTURE WORK

• Running entire procedures in CUDA

• Using NVIDIA GVDB to gain the advantages of sparse voxels

• Strict comparisons with previous methods

• PointNet, VRN Ensemble, etc

• Extending methods to captured dataset (e.g. KITTI)

• Train set is too small

• Need to investigate whether Generative Adversarial Networks (GAN) can alleviate this problem or not

56 / 57

REFERENCE

• [1] He et al., 2017, Mask R-CNN

• [2] Zhou and Tuzel, 2017, VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

• [3] Su et al., 2015, Multi-view convolutional neural networks for 3d shape recognition

• [4] Krizhevsky et al., 2012, ImageNet Classification with Deep Convolutional Neural Networks

• [5] Qi et al., 2017, PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

• [6] Maturana and Scherer, 2015, VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition

• [7] Brock et al., 2016, Generative and Discriminative Voxel Modeling with Convolutional Neural Networks

• [8] He et al., 2016, Deep residual learning for image recognition

• [9] Goodfellow et al., 2014, Generative adversarial nets

• [10] Girshick, 2015, Fast R-CNN

• [11] Ren et al., 2015, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

• [12] Chang et al., 2015, ShapeNet: An Information-Rich 3D Model Repository

• [13] Wu et al., 2015, 3D ShapeNets: A Deep Representation for Volumetric Shapes

• [14] Wu et al., 2014, 3D ShapeNets for 2.5D Object Recognition and Next-Best-View Prediction

57 / 57