Tutorial: Using Convolutional Neural Networks to Detect Object Keypoints

rnd.azoft.com

In this tutorial, you will learn how to build a convolutional neural network model that detects object keypoints in images. We've chosen license plates as the object to detect. However, the tutorial can also serve as a guide for detecting other objects.

Overview

About the project

Choosing images for neural network training

Labeling the keypoints

Data augmentation

Packing a dataset in HDF5

Training a convolutional neural network

Conclusion

Related links

1. Choosing images for neural network training

The training dataset should contain from a few hundred to a few thousand original (non-augmented) images. The more, the better.

2. Labeling the keypoints

If the keypoints in the original images are not labeled, you need to label them. This means saving the keypoint coordinates in a *.txt or *.csv file. Each keypoint has two values: one for the horizontal axis and one for the vertical axis.

Remark: If you label several keypoints, always label them in the same order. For example, when labeling a license plate, you might mark the upper-left corner of the plate as the first point, the upper-right corner as the second, the lower-left corner as the third, and the lower-right corner as the fourth. Keep this order for every image.
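One possible way to store the labels is sketched below: one CSV row per image, with the x and y values of each keypoint written in a fixed order. The keypoint names, the column layout, and the file name `plate_0001.jpg` are illustrative assumptions, not something the tutorial prescribes.

```python
import csv
import io

# Fixed keypoint order used for every image (names are hypothetical).
KEYPOINT_ORDER = ["top_left", "top_right", "bottom_left", "bottom_right"]


def write_labels(stream, labels):
    """Write one row per image: file name, then x, y for each keypoint in order."""
    writer = csv.writer(stream)
    header = ["image"]
    for name in KEYPOINT_ORDER:
        header += [name + "_x", name + "_y"]
    writer.writerow(header)
    for image_name, points in labels.items():
        row = [image_name]
        for name in KEYPOINT_ORDER:
            x, y = points[name]
            row += [x, y]
        writer.writerow(row)


def read_labels(stream):
    """Read the rows back as {image: [(x, y), ...]} in KEYPOINT_ORDER."""
    result = {}
    for row in csv.DictReader(stream):
        result[row["image"]] = [
            (float(row[name + "_x"]), float(row[name + "_y"]))
            for name in KEYPOINT_ORDER
        ]
    return result


# Example with made-up coordinates, using an in-memory buffer instead of a file.
labels = {
    "plate_0001.jpg": {
        "top_left": (34, 50),
        "top_right": (120, 48),
        "bottom_left": (36, 78),
        "bottom_right": (122, 75),
    }
}
buffer = io.StringIO()
write_labels(buffer, labels)
buffer.seek(0)
loaded = read_labels(buffer)
```

Because the order lives in a single constant, every reader and writer of the label file agrees on which value belongs to which corner.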

3. Data augmentation

For effective training, you need a dataset of several thousand to tens of thousands of images.
If the initial dataset is too small, augment the images.

Remark: Before starting augmentation, split your dataset into training and control (validation) parts. This guarantees that images produced by augmenting one picture end up in only one of the two parts, never in both. If you skip this step, you will hardly be able to tell when the model starts to overfit.
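The split described in the remark can be sketched as follows. The function name, the 20% control fraction, and the file names are assumptions chosen for illustration; the important point is that only original images are split, and augmentation is applied to each part afterwards.

```python
import random


def split_originals(image_names, control_fraction=0.2, seed=0):
    """Split ORIGINAL images into training and control lists before augmenting."""
    names = sorted(image_names)          # fixed starting order for reproducibility
    random.Random(seed).shuffle(names)   # seeded shuffle, independent of global state
    n_control = max(1, int(len(names) * control_fraction))
    return names[n_control:], names[:n_control]


# Hypothetical file names standing in for the original images.
originals = ["img_%03d.jpg" % i for i in range(10)]
train, control = split_originals(originals)
```

Augmented copies are then generated separately from `train` and from `control`, so no variant of a control image ever reaches the training part.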

The following transformations can be used for augmentation:

Rotations relative to the center

Perspective distortion

Resize

Shifts

Salt-and-pepper noise

Blurring and sharpening

Erosion and dilation
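Geometric transformations (rotations, shifts, resizing, perspective distortion) move the keypoints, so the labels have to be transformed together with the image. The sketch below covers only the label side for two of the listed transformations, using plain Python; the function names are hypothetical, and the image itself would have to be warped with the same parameters using an image library of your choice. Pixel-level transformations such as noise or blurring leave the coordinates unchanged.

```python
import math


def rotate_keypoints(points, angle_deg, width, height):
    """Rotate (x, y) points about the image center by angle_deg.

    With the y axis pointing down (image coordinates), a positive angle
    appears clockwise on screen. The image must be rotated the same way.
    """
    cx, cy = width / 2.0, height / 2.0
    angle = math.radians(angle_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + cos_a * dx - sin_a * dy,
                        cy + sin_a * dx + cos_a * dy))
    return rotated


def shift_keypoints(points, dx, dy):
    """Translate all points by (dx, dy) pixels."""
    return [(x + dx, y + dy) for x, y in points]


# Made-up plate corners in a 100x100 image.
corners = [(100.0, 50.0), (50.0, 50.0)]
rotated = rotate_keypoints(corners, 90.0, 100, 100)
shifted = shift_keypoints(corners, 5.0, -3.0)
```

After a transformation it is worth checking that all keypoints still fall inside the image; samples where a corner leaves the frame are usually discarded or re-generated.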

4. Packing a dataset in HDF5

To feed the dataset to the Caffe framework, pack it into the HDF5 file format.
Normalize pixel values to the range 0 to 1 and coordinate values to the range -1 to 1.

Remark: If you applied augmentation to the initial images, the images in the HDF5 file must be stored in random order.
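A minimal sketch of the normalization and shuffling steps is shown below. It assumes 8-bit pixel values (0 to 255) and maps a coordinate to [-1, 1] relative to the image center; the tutorial does not state the exact formula, so treat both as assumptions. The helper names are hypothetical.

```python
import random


def normalize_pixel(value):
    """Map an 8-bit pixel value (0..255) to the range [0, 1]."""
    return value / 255.0


def normalize_coordinate(value, size):
    """Map a pixel coordinate in [0, size] to [-1, 1] (0 is the image center)."""
    half = size / 2.0
    return (value - half) / half


def shuffle_together(images, labels, seed=0):
    """Shuffle images and labels with the same permutation so pairs stay aligned."""
    order = list(range(len(images)))
    random.Random(seed).shuffle(order)
    return [images[i] for i in order], [labels[i] for i in order]


# Placeholder values: names stand in for image arrays, numbers for label rows.
images, labels = shuffle_together(["a", "b", "c", "d"], [1, 2, 3, 4])
```

Shuffling matters because augmented variants of one picture are generated consecutively; without it, a training batch could consist almost entirely of near-identical images.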

After packing the dataset in HDF5, check the resulting file with the HDF5 Viewer utility. Pixel values must lie between 0 and 1, coordinates between -1 and 1, and the images must not be distorted.
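The packing and the range check can also be scripted. The sketch below assumes the `h5py` and `numpy` packages are installed and uses random arrays in place of real images; the array shapes, the temporary output directory, and the file names are illustrative. The dataset names `data` and `label` are chosen to match the `top` names of an HDF5Data input layer, and the text file lists the paths of the HDF5 files, which is what such a layer takes as its `source`.

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic stand-in for a real dataset: 16 grayscale 64x64 images,
# each with 4 keypoints (8 coordinate values).
rng = np.random.default_rng(0)
images = rng.random((16, 1, 64, 64), dtype=np.float32)            # values in [0, 1)
labels = rng.uniform(-1.0, 1.0, size=(16, 8)).astype(np.float32)  # values in [-1, 1)

out_dir = tempfile.mkdtemp()
h5_path = os.path.join(out_dir, "regression_train.h5")
list_path = os.path.join(out_dir, "regression_train.txt")

# Write images and labels as float32 datasets named after the layer tops.
with h5py.File(h5_path, "w") as f:
    f.create_dataset("data", data=images)
    f.create_dataset("label", data=labels)

# The list file contains one HDF5 file path per line.
with open(list_path, "w") as f:
    f.write(h5_path + "\n")

# Read the file back and check the value ranges.
with h5py.File(h5_path, "r") as f:
    data = f["data"][:]
    label = f["label"][:]

pixels_ok = bool(data.min() >= 0.0 and data.max() <= 1.0)
coords_ok = bool(label.min() >= -1.0 and label.max() <= 1.0)
```

A range check like this does not replace looking at a few images in a viewer, since only a visual inspection reveals distorted or transposed images.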

5. Training the convolutional neural network

We recommend starting the training with the Adam optimization method.
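For readers unfamiliar with Adam, the update rule can be illustrated on a toy problem. The sketch below is not Caffe code: it minimizes a one-dimensional quadratic in plain Python, with the commonly used default values for `beta1`, `beta2`, and `eps`. The learning rate, step count, and function name are assumptions chosen for the example.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function given its gradient, using the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Because each step is scaled by the running gradient magnitude, Adam tends to be less sensitive to the initial learning rate than plain stochastic gradient descent, which makes it a convenient starting point.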

The input layer should look like this:

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"    # images
  top: "label"   # keypoint coordinates
  hdf5_data_param {
    # text file listing the HDF5 files of the training set
    source: "/home/user/caffe/examples/regression/regression_train.txt"
    batch_size: 256
  }
}

The number of outputs of the output layer must equal the number of coordinate values. EuclideanLoss is a good choice for the loss layer.

layer {
  name: "ipout"
  type: "InnerProduct"
  bottom: "ip01"
  top: "ipout"
  inner_product_param {
    num_output: 8   # 4 keypoints x 2 coordinates
    weight_filler { type: "msra" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "ipout"   # predicted coordinates
  bottom: "label"   # ground-truth coordinates
  top: "loss"
}
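To make the loss and the output format concrete, here is a small plain-Python sketch. It computes the Euclidean loss the way Caffe defines it (sum of squared differences divided by twice the batch size) and converts the eight normalized outputs back to pixel coordinates. The conversion assumes coordinates were normalized relative to the image center, as an inverse of `(value - size / 2) / (size / 2)`; that formula and the helper names are assumptions.

```python
def euclidean_loss(predictions, labels):
    """Sum of squared differences over the batch, divided by 2 * batch size."""
    total = 0.0
    for pred, target in zip(predictions, labels):
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / (2.0 * len(predictions))


def to_pixels(output, width, height):
    """Convert normalized outputs (x1, y1, ..., x4, y4) to pixel coordinates."""
    points = []
    for i in range(0, len(output), 2):
        x = (output[i] + 1.0) * width / 2.0
        y = (output[i + 1] + 1.0) * height / 2.0
        points.append((x, y))
    return points


# A batch with one sample and two output values, using made-up numbers.
loss = euclidean_loss([[0.0, 0.0]], [[3.0, 4.0]])

# Eight network outputs for a 200x100 image, again made up.
corners = to_pixels([-1.0, -1.0, 1.0, 1.0, 0.0, 0.0, -1.0, 1.0], 200, 100)
```

The points come out in the same order in which the keypoints were labeled, which is why a consistent labeling order is essential.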

Conclusion

Training an accurate and fast neural network in just a few attempts is difficult. We made about 20 trials before we got an acceptable result. If you have questions about the idea, the experiment implementation, or the code, we'll be glad to answer them in the comments below.

Related links

We used these works as the basis for the experiment:

Using convolutional neural nets to detect facial keypoints tutorial

Caffe-regression examples

Kaggle face keypoint detection at GitHub

Read the Detailed Convolutional Neural Networks for Object Detection Project Overview