
Aerial Video Surveillance System for Small-Scale UAV

Environment Monitoring

Danilo Avola1, Gian Luca Foresti1, Niki Martinel1, Christian Micheloni1, Daniele Pannone2, Claudio Piciarelli1

1Department of Mathematics, Computer Science and Physics, University of Udine, Italy
2Department of Computer Science, Sapienza University, Italy

{danilo.avola, gianluca.foresti, niki.martinel, christian.micheloni, claudio.piciarelli}@uniud.it
[email protected]

Abstract

Change detection algorithms are commonly used to detect novelties for surveillance purposes in public and private places equipped with static or Pan-Tilt-Zoom (PTZ) cameras. Often, these techniques are also used as a prerequisite to support more complex algorithms, including event recognition, object classification, person re-identification, and many others. With regard to small-scale Unmanned Aerial Vehicles (UAVs) flying at low altitude, change detection techniques require further investigation. In fact, most of the works currently available in the literature process video sequences acquired at very high altitude for large-scale operations, such as vegetation monitoring, mapping of buildings, and so on. In a wide range of application contexts that require, for example, frequent monitoring or high spatial resolution for detecting small objects, video sequences acquired at high altitude are not suitable. This paper presents a change detection system based on histogram equalization and the RGB-Local Binary Pattern (RGB-LBP) operator for monitoring wide areas with small-scale UAVs at low altitude. Extensive experiments, performed on challenging video sequences of the public UAV Mosaicking and Change Detection (UMCD) dataset and measured with a set of well-known statistical metrics, show the robustness of the proposed pipeline. Finally, a performance analysis of the proposed algorithm is also provided.

1. Introduction

Nowadays, video surveillance systems are widely used for controlling restricted areas and streets and for preventing acts of crime [13, 21]. The simplest approach in surveillance monitoring is to perform change detection [16, 23, 25], namely to find differences between two images of the same area acquired at different time instants. Recently, in order to expand the application fields of change detection, and thanks to their low cost and ease of use, researchers have moved towards UAVs. UAV change detection algorithms are usually used for large-scale operations such as vegetation monitoring and mapping of buildings [6]. For contexts in which a frequent or continuous check of an area of interest is required, such as tracking [7, 30] and search and rescue [4, 24], small-scale UAVs are used, thanks to their rapid deployment time and their capability to perform low-altitude flights. Moreover, in order to increase the precision of these tasks and to obtain a rapid overview of the situation, a geo-referenced mosaic of the area may be built [28, 29]. Such geo-referenced mosaics can be exploited to improve the speed and to reduce the false positives of change detection algorithms applied to images acquired by small-scale UAVs at low altitude [18].

In this paper, a novel robust and real-time change detection system for low-altitude flights is proposed. The pipeline takes as input a geo-referenced mosaic and a video stream, with its associated GPS stream, sent by a small-scale UAV during a second reconnaissance flight over the area of interest. Change detection is then performed between the frames of the video stream and the corresponding part of the geo-referenced mosaic, extracted by comparing the GPS coordinates. The proposed system uses a novel pipeline comprising sliding-window techniques, the RGB-LBP operator, and histogram similarity. Unlike other change detection algorithms, whose aim is to extract the exact silhouette of the found novelties, the scope of this work is to identify the change (e.g., people, vehicles) in the highest possible number of frames. This choice is due to the fact that, once the change has been found, advanced algorithms can be used for its classification (e.g., [12, 14]). In addition, the proposed system has to deal with some well-known problems. Firstly, it does not use orthorectified images.

978-1-5386-2939-0/17/$31.00 © 2017 IEEE. IEEE AVSS 2017, August 2017, Lecce, Italy.


Figure 1. Logical architecture of the proposed system.

This means that tall objects (e.g., trees, buildings) may introduce a perspective error that influences the detection of changes. Secondly, flying at high altitude mitigates several factors, such as noise and alignment errors; at low altitude, a misalignment between two images can irreversibly compromise the detection due to the generated image artifacts. Moreover, at high altitude, tasks like surveillance [15, 19], search and rescue [4, 20, 24], and tracking [7, 8, 30] cannot be performed, while the proposed system is designed to handle such situations.

The remainder of this paper is organized as follows. Section 2 gives an overview of UAV change detection algorithms. In Section 3, the pipeline of the proposed system is discussed through a running example. In Section 4, the evaluation of the proposed system on a recently released dataset and comparisons with state-of-the-art algorithms are shown. Finally, Section 5 concludes the paper.

2. Related Work

Usually, works concerning change detection in aerial images make use of high-altitude acquisitions to avoid parallax errors. This is due to the type of changes that must be detected; in general, UAVs are used for tasks such as vegetation monitoring, building construction, and precision agriculture. Among these works, an unsupervised change detection algorithm based on cross-sharpening and multitemporal image segmentation is proposed in [26]. The algorithm aims to minimize the effects of local displacement due to different sensor positions or acquisition angles in order to increase the detection accuracy. In [27], an object-oriented change detection method is proposed. The method is twofold: first, a segmentation of the objects is performed, then pattern recognition algorithms are used to obtain the object information. In [6], a coarse-to-fine point cloud registration method and the RGB-D map acquired by a UAV are used to build a building change detection framework. Large illumination changes are handled by the generation of an accurate 3D geometry model, while the impact of vegetation growth is reduced through an innovative 2D-3D joint feature-based classification method. Durable and permanent changes in urban areas are detected in the method of [9] through the use of multitemporal and polarimetric SAR data. Seasonal and mobile changes are handled by a monotonic temporal change evolution model function, and the urban changes are shown through high-resolution maps generated by SAR. The authors of [10] propose a novel unsupervised object-based change detection approach using VHR imagery in a dynamic urban environment at the building level, combining the concept of multi-temporal objects with an appropriate object-based feature representation. The influence of deviating viewing geometries of optical VHR satellite systems is quantified across different platforms.

3. Change Detection at Low-Altitudes

In this section, the pipeline of the system shown in Figure 1 is described through a running example. As input, a geo-referenced mosaic of the area of interest is required. For clarity, we will call T the train image, i.e., the reference image extracted from the mosaic, and Q the query image, i.e., the image in which we want to detect changes. The pipeline outputs an image O, in which the changes are highlighted, and a list L = {R1, ..., Rn} of bounding boxes. For each Ri, the sub-image Q(Ri) contains at least one change with respect to the same sub-image T(Ri).



Figure 2. Running example on T and Q of the proposed system. Figures (a) and (b) show the histogram equalization. Figures (c) and (d) show the micro-differences removal. Figures (e) and (f) show, respectively, the binary mask obtained from the image difference and the isolation removal. Finally, images (g) and (h) show the final mask and its application on Q.

3.1. Image Alignment

The first step of the proposed pipeline is the alignment between T and Q. A first coarse alignment is performed by using the GPS coordinates sent by the UAV to extract the corresponding part of the geo-referenced mosaic. This avoids comparing the frame received in real-time from the UAV with the whole mosaic. The second step of the alignment consists in using features robust to rotations, scale changes, and translations to further align T and Q. In the pipeline, we chose A-KAZE [1] features, since they are computed faster than SURF [3] and SIFT [11], and they are detected and described with better performance than these methods, including ORB [22].
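As a sketch of the coarse GPS-based step, the following Python function (our own illustration, not code from the paper) maps a GPS fix onto mosaic pixel coordinates, assuming a north-up mosaic in which pixel position varies linearly with latitude and longitude; the subsequent A-KAZE feature refinement is omitted here.

```python
def gps_to_mosaic_crop(lat, lon, geo_bounds, mosaic_shape, frame_shape):
    """Return the top-left corner of the frame-sized mosaic crop whose centre
    corresponds to the given GPS fix. Assumes a north-up mosaic and a linear
    mapping between GPS coordinates and pixels (our assumption)."""
    lat_top, lon_left, lat_bottom, lon_right = geo_bounds
    h, w = mosaic_shape
    fh, fw = frame_shape
    # Linear interpolation from GPS coordinates to pixel coordinates.
    row = (lat - lat_top) / (lat_bottom - lat_top) * h
    col = (lon - lon_left) / (lon_right - lon_left) * w
    top = int(round(row - fh / 2))
    left = int(round(col - fw / 2))
    # Clamp so the crop stays fully inside the mosaic.
    top = min(max(top, 0), h - fh)
    left = min(max(left, 0), w - fw)
    return top, left
```

The crop T = mosaic[top:top+fh, left:left+fw] would then be refined against Q with the feature-based alignment described above.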

3.2. Pre-Processing

Pre-processing is performed to reduce the number of false positives/negatives when computing the difference between images.

The first pre-processing operation is histogram equalization, which is performed to reduce the illumination differences between T and Q. Figures 2(a)(b) show the result of this operation.
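Histogram equalization of one 8-bit channel can be sketched with the classic CDF-based lookup table; this NumPy version is our illustration, not the authors' implementation.

```python
import numpy as np

def equalize_channel(ch):
    """Classic histogram equalization of one 8-bit channel via the CDF."""
    hist = np.bincount(ch.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # first non-zero CDF value
    # Map grey levels so the cumulative distribution becomes roughly linear;
    # max(..., 1) guards against a constant image.
    lut = np.clip(np.round((cdf - cdf_min) / max(ch.size - cdf_min, 1) * 255),
                  0, 255).astype(np.uint8)
    return lut[ch]
```

Applied to each channel of T and Q, this stretches a low-contrast frame to span the full grey range before the difference step.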

The second pre-processing step is the micro-differences removal. This process removes all the small differences between T and Q, such as rippling water and grass moved by the wind. In order to remove these false changes, a local operator has been implemented that corrects the color of a pixel using the average color of the neighborhood pixels. This is done with a sliding window Wa: for each pixel p of T and Q, Wa is translated so that its center corresponds to the coordinates of p. Each time Wa is moved, the averages of the RGB channels within the window are computed and stored in the RGB channels of p. Figures 2(c)(d) depict the obtained results.
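A minimal NumPy sketch of the micro-differences removal operator described above (function name and edge padding are ours); it replaces each pixel with the window mean, per RGB channel:

```python
import numpy as np

def micro_diff_removal(img, win=3):
    """Replace each pixel with the mean of the win x win window around it,
    per RGB channel, to suppress tiny changes such as moving grass.
    For an even win (the paper uses Wa = 10x10) the window is
    off-centre by half a pixel."""
    pad = win // 2
    padded = np.pad(img.astype(np.float64),
                    ((pad, pad), (pad, pad), (0, 0)), mode='edge')
    out = np.empty_like(img, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + win, j:j + win].mean(axis=(0, 1))
    return out.astype(np.uint8)
```

A production version would use a separable box filter rather than explicit loops; the loop form mirrors the sliding-window description in the text.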

3.3. Difference between Images

At this point, the difference between T and Q is computed. In detail, the difference is performed by using a threshold difference and the RGB-LBP operator. Before applying the threshold difference, both T and Q are converted to grayscale; we call these images Tgray and Qgray. Then, the threshold difference is computed according to the following condition:

Di,j = { 0  if |Tgrayi,j - Qgrayi,j| < Tdiff
         1  otherwise }    (1)

where Tdiff is the threshold value used to consider the pixels of Tgray and Qgray different, and Di,j is the pixel resulting from this difference. The value of Tdiff is chosen according to the illumination difference between T and Q: the more the illumination differs, the higher Tdiff should be. This is due to the fact that the difference between two distant values produces a high result, easily exceeding a low threshold.
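The threshold difference of Eq. (1) is a one-liner in NumPy; the following sketch (ours) assumes 8-bit grayscale inputs:

```python
import numpy as np

def threshold_difference(t_gray, q_gray, t_diff=40):
    """Eq. (1): D[i,j] = 0 where |Tgray - Qgray| < t_diff, 1 otherwise.
    t_diff = 40 matches the setting reported in Table 1."""
    # Promote to int16 so the subtraction of uint8 arrays cannot wrap around.
    diff = np.abs(t_gray.astype(np.int16) - q_gray.astype(np.int16))
    return (diff >= t_diff).astype(np.uint8)
```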

If Di,j is 0, no further action is performed. Otherwise, the RGB-LBP is used to check whether the pixel (i, j) is a real change. Given an image I, three binary strings, one for each colour channel, are computed. Let SR, SG, and SB be these strings; they are computed in the following way. For each pixel pc in I:

SChannelpc(i) = { 0  if I(Wl)j,k[Channel] < pc[Channel]
                  1  otherwise }    (2)

where k = ((i - 1) mod WidthWl) + 1, j = ⌊(i - 1)/WidthWl⌋ + 1, and Wl is the window of neighborhood pixels of pc. The three binary strings are computed for both T and Q, and then they are compared by using the Hamming distance. First, we ensure that the strings of T and Q are of the


same length. If they are not, the strings are considered different. Otherwise, we proceed with the Hamming distance computation. Also in this case a threshold is used, since it is nearly impossible to obtain two identical binary strings from two different images. The value of this threshold has been chosen with the same criteria as Tdiff.

Let c be the number of differing bits between two compared strings. If c > TH · |s|, where TH is the Hamming distance threshold and |s| is the length of a binary string, then the strings are considered different. If the pixels are considered changed by both the threshold difference and the RGB-LBP methods, it is assumed that a real change has occurred between T and Q. As a result of these steps, we obtain a binary mask Mdiff, as shown in Figure 2(e).
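The RGB-LBP strings of Eq. (2) and the thresholded Hamming comparison can be sketched as follows (our NumPy illustration; the pixel (i, j) is assumed far enough from the border for the full window to fit):

```python
import numpy as np

def rgb_lbp_strings(img, i, j, win=15):
    """Eq. (2): one binary string per RGB channel, obtained by thresholding
    every pixel of the win x win neighbourhood against the centre pixel.
    win = 15 matches Wl in Table 1."""
    half = win // 2
    pc = img[i, j]
    patch = img[i - half:i + half + 1, j - half:j + half + 1]
    return [(patch[:, :, c].ravel() >= pc[c]).astype(np.uint8) for c in range(3)]

def strings_differ(s1, s2, th=0.3):
    """Strings of unequal length are different outright; otherwise they are
    different when more than th * |s| bits disagree (Hamming distance test)."""
    if len(s1) != len(s2):
        return True
    return int(np.sum(np.asarray(s1) != np.asarray(s2))) > th * len(s1)
```

A pixel is kept as a change only when both the grayscale threshold test and strings_differ agree for all three channel strings.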

3.4. Post-Processing

The last part of the pipeline consists of post-processing operations, which are used to remove any remaining false positives. The first post-processing step consists in removing the noise from Mdiff. In Mdiff there could be white pixels surrounded by a large amount of black pixels and vice versa; we call these pixels isolations. Usually, isolations are false positives due to small differences not removed during the pre-processing step. To remove isolations, a sliding-window technique is used. In detail, a new binary mask N is built, and each pixel Ni,j is set in the following way:

Ni,j = { 0    if nblack > Tiso · (WidthWi · HeightWi)
         255  otherwise }    (3)

where nblack is the number of black pixels surrounding Ni,j and Tiso ∈ [0, 1] is a threshold value. In Figure 2(f), the result of this operation is shown.
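Eq. (3) translates directly into a sliding-window pass; this NumPy sketch (ours) uses edge padding at the borders, an assumption the paper does not specify:

```python
import numpy as np

def remove_isolations(mask, win=5, t_iso=0.25):
    """Eq. (3): clear a pixel when the number of black pixels in its win x win
    window exceeds t_iso * win * win; set it to 255 otherwise. The paper uses
    Wi = 37x37 and Tiso = 0.25 (Table 1); win is small here for illustration."""
    pad = win // 2
    padded = np.pad(mask, pad, mode='edge')  # border handling: our assumption
    out = np.empty_like(mask)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            n_black = int(np.sum(padded[i:i + win, j:j + win] == 0))
            out[i, j] = 0 if n_black > t_iso * win * win else 255
    return out
```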

The second post-processing step consists in removing small areas. Once isolations are removed, the minimal bounding boxes are computed. A minimal bounding box contains a set of contiguous white pixels in N, and together they form the set L = {R1, ..., Rn}. Let Tarea be a threshold in the interval [0, 1], chosen with respect to the size of the acquired frame: the higher the spatial resolution of the image, the higher this value should be. All the bounding boxes Ri ∈ L whose ratio between the area of Ri and the area of N is less than Tarea are removed from L, and all the pixels of N(Ri) are set to black. This parametrization is due to the fact that small areas in big images have a different weight than small areas in small images.
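The small-area removal step can be sketched with a flood-fill labelling of white regions (our illustration; an implementation could equally use a connected-components routine from an image library):

```python
import numpy as np

def remove_small_areas(mask, t_area=0.000648):
    """Label 4-connected white regions, compute each region's minimal bounding
    box, and blank regions whose box-area / image-area ratio is below t_area
    (default matches Tarea in Table 1). Returns the cleaned mask and the
    surviving boxes as (top, left, bottom, right) tuples."""
    h, w = mask.shape
    visited = np.zeros((h, w), dtype=bool)
    boxes = []
    for si in range(h):
        for sj in range(w):
            if mask[si, sj] == 255 and not visited[si, sj]:
                # Iterative flood fill collecting the region's pixels.
                stack, pts = [(si, sj)], []
                visited[si, sj] = True
                while stack:
                    i, j = stack.pop()
                    pts.append((i, j))
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < h and 0 <= nj < w
                                and mask[ni, nj] == 255 and not visited[ni, nj]):
                            visited[ni, nj] = True
                            stack.append((ni, nj))
                rows = [p[0] for p in pts]
                cols = [p[1] for p in pts]
                box = (min(rows), min(cols), max(rows), max(cols))
                area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
                if area / (h * w) < t_area:
                    for i, j in pts:       # drop too-small regions
                        mask[i, j] = 0
                else:
                    boxes.append(box)
    return mask, boxes
```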

The last post-processing step consists in computing the histogram similarity, in order to remove further false positives. First, the image region contained in each bounding box Ri is extracted from both T and Q, obtaining T[Ri] and Q[Ri], respectively. Then, T[Ri] and Q[Ri] are converted to grayscale and their histograms, HTi and HQi,

Table 1. Parameter settings used in the experiments.

Parameter   Value
Wa          10x10
Wl          15x15
Tdiff       40
TH          0.3
Wi          37x37
Tiso        0.25
Tarea       0.000648
TB          0.3

are computed. Finally, the two histograms are compared by using the Bhattacharyya distance [5] B, defined as follows:

B(h1, h2) = sqrt( 1 - (1 / sqrt(h̄1 · h̄2 · β²)) · Σi sqrt( h1(i) · h2(i) ) )    (4)

where β is the number of bins and h̄k = (1/β) Σi hk(i). The Bhattacharyya distance yields a similarity value B ∈ [0, 1]: the closer B is to 0, the more similar the histograms. If B is less than a threshold TB, all the pixels in N[Ri] are set to 0. The best threshold value TB for B has been found empirically during the experiments.

In Figure 2(g), the result of this last post-processing operation is shown.

The last step of the pipeline consists in applying the mask N to the image Q, highlighting the detected changes. In Figure 2(h), the final result is depicted.
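The Bhattacharyya comparison of Eq. (4) can be sketched in NumPy as follows (our illustration, mirroring the normalized form of the distance):

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Eq. (4): similarity in [0, 1]; 0 means identical histograms."""
    h1 = np.asarray(h1, dtype=np.float64)
    h2 = np.asarray(h2, dtype=np.float64)
    beta = len(h1)                 # number of bins
    h1_bar = h1.sum() / beta
    h2_bar = h2.sum() / beta
    s = np.sum(np.sqrt(h1 * h2))
    # max(0, .) guards against tiny negative values from floating-point error.
    return float(np.sqrt(max(0.0, 1.0 - s / np.sqrt(h1_bar * h2_bar * beta ** 2))))
```

In the pipeline, a bounding box whose grayscale histograms score below TB would be discarded as a false positive.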

4. Experiments

In this section, the experiments performed on UAV images are presented. The hardware configuration used is a PC with an AMD Ryzen 7 1700 CPU (3.0 GHz, 3.7 GHz turbo boost), 16 GB of DDR4 RAM, a 500 GB SSD, and an Nvidia GTX 970 GPU. As testing images, we used the UAV Mosaicking and Change Detection (UMCD) dataset [2], which contains videos of paths in several environments (i.e., dirt, countryside, urban) acquired by a small-scale UAV. Each path has 2 related videos: the first contains the area of interest, from which it is possible to create a geo-referenced mosaic, while the second shows the same area with some changes (e.g., the presence of a box, a car, or a person) that must be detected. We used this dataset since, to our knowledge, it is the only UAV dataset created for both mosaicking and change detection.

Concerning the algorithm parameters, our settings are reported in Table 1.

If the parameter values are chosen too small, there could be a high number of false positives; conversely, large values could remove important details, such as parts of the change that we want to detect. In the experiments, the parameters are set so as to always detect a change, at the expense of precision.


4.1. Detection Results

As metrics for measuring the detected changes, we used the Precision (P), Recall (R), Accuracy (A), and F1-Score (F1). In our tests, we assume that the query and the train images are already aligned with both the GPS and the feature-based methods. The tests have been performed on 13871 images (i.e., the frames of the UMCD dataset videos), and in Table 2 the numbers of true/false positives and negatives are shown.

As can be seen, the system has a good detection rate and a low number of false detections, thanks to the several operators used in the pipeline. In Figure 3, an example of a false positive is shown. The false detection occurred because the object was acquired from different perspectives, due to its dimensions. When an object is sufficiently small, the UAV acquires only one face of it (i.e., the top). This is not true in our case, where the low altitude makes the object large enough that more than one face is acquired.


Figure 3. Example of a false positive due to the different perspectives of the object within the red square in images (a) and (b).

In order to provide a general comparison, Table 4 reports a per-pixel comparison with state-of-the-art works. In Table 3, the Precision, Recall, Accuracy, and

Table 2. Number of true/false positives and negatives obtained during the experiments.

Metric   Value
FP       884
FN       41
TP       3167
TN       9779

F1-Score values obtained are shown. As expected, the per-pixel precision value is not high (i.e., 67.15%). This is due to the fact that, as mentioned before, the system aims to detect the whole change in order to provide a reliable input to more complex algorithms, such as classification ones [17], so a higher accuracy value is preferable.

4.2. Performance Evaluation

The proposed system has been implemented to support both single- and multi-thread execution. This twofold

Table 3. Precision, Recall, Accuracy and F1-Score values obtained with the proposed system.

Metric      Value
Precision   78.18%
Recall      98.72%
Accuracy    93.33%
F1          87.26%

Table 4. Per-pixel comparison with the state-of-the-art.

Change Detection System   Precision   Recall   Accuracy   F1
B. Wang et al. [26]       48%         76%      91.27%     -
Q. Wang et al. [27]       83.66%      -        -          -
Proposed                  67.15%      95.88%   99.54%     78.98%

implementation allowed us to simulate the behaviour of the pipeline on an embedded system, since most small-scale UAVs have a CPU computational power comparable to a single-threaded desktop CPU. In detail, the multithreaded steps are the micro-differences removal, the image difference, and the noise removal from the binary mask. On average, a single change detection took 0.47 seconds with the multi-thread implementation and 2.883 seconds with the single-thread implementation. With these performances, the system can be used for real-time applications in both embedded and desktop-based systems.

5. Conclusions

In this paper, a novel change detection system for small-scale UAVs flying at low altitude has been proposed. The system pipeline is based on the RGB-LBP operator and histogram comparison. The aim is to detect the changes in the highest possible number of frames, so that the output can be fed to more complex algorithms in order to classify the changes. Moreover, the proposed system works without orthorectified images, and is thus robust to parallax errors. Finally, the performance measured during the experiments suggests that the system is suitable for real-time applications.

Acknowledgments

This work was partially supported by Regione Friuli Venezia-Giulia under the "Proactive Vision for advanced UAV systems for the protection of mobile units, control of territory and environmental prevention (SUPReME)" FVG L.R. 20/2015 project.

References

[1] P. F. Alcantarilla, J. Nuevo, and A. Bartoli. Fast explicit diffusion for accelerated features in nonlinear scale spaces. In British Machine Vision Conference (BMVC), 2013.

[2] D. Avola, G. L. Foresti, N. Martinel, D. Pannone, and C. Piciarelli. The UMCD Dataset. ArXiv e-prints, Apr. 2017.

[3] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346-359, 2008.

[4] M. B. Bejiga, A. Zeggada, A. Nouffidj, and F. Melgani. A convolutional neural network approach for assisting avalanche search and rescue operations with UAV imagery. Remote Sensing, 9(2):1-22, 2017.

[5] A. Bhattacharyya. On a measure of divergence between two multinomial populations. Sankhyā: The Indian Journal of Statistics (1933-1960), 7(4):401-406, 1946.

[6] B. Chen, Z. Chen, L. Deng, Y. Duan, and J. Zhou. Building change detection with RGB-D map generated from UAV images. Neurocomputing, 208:350-364, 2016.

[7] C. Fu, R. Duan, D. Kircali, and E. Kayacan. Onboard robust visual tracking for UAVs using a reliable global-local object model. Sensors, 16(9):1-22, 2016.

[8] A. Gaszczak, T. P. Breckon, and J. Han. Real-time people and vehicle detection from UAV imagery. In Proceedings of SPIE: Intelligent Robots and Computer Vision XXVIII: Algorithms and Techniques, volume 7878, pages 1-13, 2011.

[9] D. J. Kim, S. Hensley, S. H. Yun, and M. Neumann. Detection of durable and permanent changes in urban areas using multitemporal polarimetric UAVSAR data. IEEE Geoscience and Remote Sensing Letters, 13(2):267-271, 2016.

[10] T. Leichtle, C. Geiß, M. Wurm, T. Lakes, and H. Taubenböck. Unsupervised change detection in VHR remote sensing imagery: an object-based clustering approach in a dynamic urban environment. International Journal of Applied Earth Observation and Geoinformation, 54:15-27, 2017.

[11] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.

[12] N. Martinel, C. Micheloni, and G. L. Foresti. The evolution of neural learning systems: a novel architecture combining the strengths of NTs, CNNs, and ELMs. IEEE Systems, Man, and Cybernetics Magazine, 1(3):17-26, Jul. 2015.

[13] N. Martinel, C. Micheloni, and C. Piciarelli. Pre-emptive camera activation for video surveillance HCI. In International Conference on Image Analysis and Processing, pages 189-198, Ravenna, Italy, Sep. 2011.

[14] N. Martinel, C. Piciarelli, and C. Micheloni. A supervised extreme learning committee for food recognition. Computer Vision and Image Understanding, 148:67-86, 2016.

[15] X. Meng, W. Wang, and B. Leong. SkyStitch: a cooperative multi-UAV-based real-time video surveillance system with stitching. In Proceedings of the 23rd ACM International Conference on Multimedia, pages 261-270, 2015.

[16] M. Michael, C. Feist, F. Schuller, and M. Tschentscher. Fast change detection for camera-based surveillance systems. In 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pages 2481-2486, 2016.

[17] C. Micheloni, A. Rani, S. Kumar, and G. L. Foresti. A balanced neural tree for pattern classification. Neural Networks, 27:81-90, 2012.

[18] C. Piciarelli, C. Micheloni, N. Martinel, M. Vernier, and G. L. Foresti. Outdoor environment monitoring with unmanned aerial vehicles. In International Conference on Image Analysis and Processing, pages 279-287. Springer, 2013.

[19] A. Price, J. Pyke, D. Ashiri, and T. Cornall. Real time object detection for an unmanned aerial vehicle using an FPGA based vision system. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 2854-2859, 2006.

[20] J. Qi, D. Song, H. Shang, N. Wang, C. Hua, C. Wu, X. Qi, and J. Han. Search and rescue rotary-wing UAV and its application to the Lushan Ms 7.0 earthquake. Journal of Field Robotics, 33(3):290-321, 2016.

[21] P. Remagnino, S. A. Velastin, G. L. Foresti, and M. Trivedi. Novel concepts and challenges for the next generation of video surveillance systems. Machine Vision and Applications, 18(3):135-137, 2007.

[22] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. ORB: an efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, pages 2564-2571, 2011.

[23] A. Shimada, H. Nagahara, and R. I. Taniguchi. Change detection on light field for active video surveillance. In 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 1-6, 2015.

[24] J. Sun, B. Li, Y. Jiang, and C.-Y. Wen. A camera-based target detection and positioning UAV system for search and rescue (SAR) purposes. Sensors, 16(11):1-24, 2016.

[25] G. Tzanidou, P. Climent-Pérez, G. Hummel, M. Schmitt, P. Stütz, D. N. Monekosso, and P. Remagnino. Telemetry assisted frame registration and background subtraction in low-altitude UAV videos. In 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 1-6, 2015.

[26] B. Wang, S. Choi, Y. Byun, S. Lee, and J. Choi. Object-based change detection of very high resolution satellite imagery using the cross-sharpening of multitemporal data. IEEE Geoscience and Remote Sensing Letters, 12(5):1151-1155, 2015.

[27] Q. Wang, X. Zhang, Y. Wang, G. Chen, and F. Dan. The Design and Development of Object-Oriented UAV Image Change Detection System, pages 33-42. Springer Berlin Heidelberg, 2013.

[28] D. Wischounig-Strucl and B. Rinner. Resource aware and incremental mosaics of wide areas from small-scale UAVs. Machine Vision and Applications, 26(7):885-904, 2015.

[29] S. Yahyanejad, D. Wischounig-Strucl, M. Quaritsch, and B. Rinner. Incremental mosaicking of images from autonomous, small-scale UAVs. In 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance, pages 329-336, 2010.

[30] M. Zhang and H. H. T. Liu. Cooperative tracking a moving target using multiple fixed-wing UAVs. Journal of Intelligent & Robotic Systems, 81(3):505-529, 2016.