

Conservation Drones for

Animal Monitoring

C A M I E L R O E L V E R S C H O O R

10017321

A thesis submitted in conformity with the requirements for the degree of

MSc. in Artificial Intelligence

Master Thesis, Track: Intelligent Systems

Credits: 12 EC

Supervisors:

D R . J A N V A N G E M E R T

Informatics Institute, Faculty of Science, University of Amsterdam

Science Park 904, 1098 XH Amsterdam

D R S . G E R A L D P O P P I N G A

Defense Systems, Aerospace Systems, National Aerospace Laboratory

Anthony Fokkerweg 2, 1059 CM Amsterdam

Defended on 21st of January, 2016


A B S T R A C T

This thesis investigates automatic monitoring of animal distribution and abundance for nature conservation. Traditionally, these conservation tasks are performed on foot, by car or by manned aircraft, which is expensive, slow and labour-intensive. This thesis investigates the combination of drones with automatic object detection techniques as a viable alternative to manual animal surveying. As no controlled data is publicly available, an animal conservation dataset is recorded with a quadcopter drone. Subsequently, two nature conservation tasks, (i) animal detection and (ii) animal counting, are evaluated using three object detection methods that are well-suited for on-board detection. The evaluation results show that automatic object detection techniques are promising for nature conservation tasks, but also that object detection methods designed for human-scale images are not directly applicable to drone imagery.

The results of this thesis were published in September 2014 at the European Conference on Computer Vision, in the workshop Computer Vision in Vehicle Technology:

J. C. van Gemert, C. R. Verschoor, P. Mettes, H. K. Epema, L. P. Koh, and S. Wich. Nature conservation drones for automatic localization and counting of animals. September 2014.


A C K N O W L E D G E M E N T S

I would like to thank my supervisors, Jan van Gemert and Gerald Poppinga, for their support and guidance. Without their input and efforts, the research conducted in this thesis would not be the same. Due to Jan van Gemert's efforts, I was able to publish and present my work, an achievement in my academic career and a valuable experience.

Thanks to Christian Muller's remote piloting skills, I was able to record the dataset presented in this thesis. Furthermore, I am grateful to the National Aerospace Laboratory for providing me with this internship.

Last but not least, I am thankful to my family for their patience, support and encouragement.


C O N T E N T S

Abstract

Acknowledgements

1 Introduction

2 Background
  2.1 Drones
  2.2 Drones for Nature Conservation
  2.3 Organisations

3 Related Research

4 Methodology
  4.1 Evaluation of Nature Conservation methods
  4.2 Pipeline
  4.3 Object detection methods suitable for drones
  4.4 Animal counting
  4.5 Dataset

5 Results

6 Conclusion

Bibliography


C H A P T E R  1

I N T R O D U C T I O N

Have you ever heard of the Formosan clouded leopard (Neofelis nebulosa brachyura), or the Pinta Island tortoise (Chelonoidis nigra abingdonii)? Have you ever seen one of them (see Figure 1.1)? If so, congratulations, as sadly these are animals that recently became officially extinct on mother earth. Extinction is a natural process, and scientists have estimated that up to 98% of all the animal species that have ever lived are now extinct [Fichter and Kest, 2001]. Most of these became extinct before the arrival of humans, over a period of hundreds of millions of years. However, after the arrival of humans, numerous animal species became extinct due to human activities like poaching and overfishing [Vitousek et al., 1997].

Figure 1.1: On the left the Formosan clouded leopard (Neofelis nebulosa brachyura) and on the right the Pinta Island tortoise (Chelonoidis nigra abingdonii), which both recently became officially extinct. Photography by Peter Weimann and Mark Putney.

Nature conservation is the practice of protecting endangered plant and animal species and their habitats. Successful nature conservation requires information on possible threats to plants and animals. These threats can be divided into three categories, namely habitat loss, poaching, and disease. Poaching has become a serious problem: various iconic animals such as the elephant, the rhino, and the tiger are poached at alarming rates that place them at high risk of local or even total extinction. This holds, for instance, for elephants [Bouché et al., 2010, UNEP, 2013] and for rhinos, whose poaching dramatically increased over the past few years [Nellemann et al., 2014].


Figure 1.2: Various images taken from a conservation drone. From left to right: humans, rhinos, and zebras.

For successful nature conservation, accurate monitoring of the distribution and abundance of species over time is one of the essential activities [Buckland et al., 2001, 2004]. Animal monitoring methods mostly include both direct animal counts and indirect counts of animal signs such as nests, dung, and calls. Traditional ground surveys can be expensive, labour-intensive, and nearly impossible to conduct in remote areas. For instance, surveying orangutan populations in Sumatra, Indonesia, costs up to $250,000 for a three-year survey cycle [Koh and Wich, 2012, van Gemert et al., 2014]. For this reason, most surveys are not conducted at the rate required for proper statistical analysis of population trends. Additionally, there remain dozens of remote areas that have never been surveyed. Aerial surveys overcome some of these constraints, but have their own set of limitations: the high cost of buying or renting small planes or helicopters, their lack of availability in remote areas, and the risks involved with flying at low altitude over landscapes in which landing is difficult, such as forests and mountains. Therefore, there is a need for alternative methods for animal surveys.

This is where nature conservation crossed my path: at the time, I was looking for a subject for my Master thesis while developing software for Unmanned Aerial Vehicles (UAVs) at the National Aerospace Laboratory (NLR) in Amsterdam. My passion lies in applying innovative technologies to solve societal problems. Therefore, the combination of nature conservation, artificial intelligence and unmanned aerial vehicles was a perfect fit for the subject of this thesis.

Unmanned aerial vehicles or drones are small flying robots deployable for applications ranging from entertainment to security. Nature conservation is one possible application of drones that has been researched over the past few years. Conservation workers have started using small drones or "conservation drones" both for obtaining information on threats and for determining animal abundance [Jones IV et al., 2006, Koh and Wich, 2012]. Conservation drones are easy to assemble and relatively cheap, which makes them accessible and affordable for many researchers in developing countries. These drones are capable of flying fully autonomous missions to obtain high-resolution still images and videos. The images and videos obtained from such drones can be used to detect large animal species (e.g. elephants, rhinos and whales), animal tracks (e.g. orangutan nests, chimpanzee nests, turtle tracks), and threats to animals (e.g. signs of human activity) [Hodgson et al., 2013, Koski et al., 2009, Mulero-Pázmány et al., 2014, Vermeulen et al., 2013]. Figure 1.2 shows some examples of objects in footage taken from a conservation drone. Almost all current drone systems record data onboard, and this data is visually inspected once the drone has landed. For animal abundance surveys, the amount of data to be processed quickly grows to thousands of photos and hundreds of hours of video. Manually processing this data in search of animals reduces the potential time and cost savings that these systems could yield. Therefore, there is a strong need to automate the detection of relevant objects in still and video images. Previous efforts that combine human labelling and automatic recognition seem encouraging [Chen et al., 2014]; nonetheless, this field is still immature.


For anti-poaching efforts, automated detection of objects in imagery collected by a drone is also a prerequisite. Ideally, drones would perform onboard object detection in order to send only the relevant images, with a high probability of a positive identification of the object of interest (e.g. human, rhino, fire), down to the ground station. Subsequently, rangers can do a manual visual inspection at the ground station in order to take appropriate actions. With this approach the rangers only have to inspect a small subset of all the data collected by the drone, resulting in a major time reduction. This thesis investigates automatic object detection algorithms as a solution towards the detection and counting of animals and humans in images obtained from drones.

The computer vision field has advanced sufficiently to automatically find objects in images with adequate accuracy. Generally, computer vision methods are designed and evaluated on general-purpose objects by employing photographs from the Internet [Krizhevsky et al., 2012]. Therefore, these computer vision methods are skewed towards human photographs, which are typically taken from a height of 1-2 metres and contain human-scale objects. Such objects can safely be assumed to be found by object-saliency methods (so-called "object proposals") tuned to human scale [Girshick et al., 2014, Uijlings et al., 2013], or to consist of observable parts [Felzenszwalb et al., 2010b]. However, drone imagery is typically taken at a higher altitude (10-100 metres), which causes the object of interest to be relatively small. Furthermore, this higher altitude causes the vantage point of drone imagery to be significantly skewed compared to human photographs. Thus, the suitability of current methods that employ individual parts or object saliency can be questioned. It can therefore not be taken for granted that current object detection methods for human-centred imagery find a one-to-one application in conservation drones. Therefore, the goal of this thesis is twofold:

(1) To investigate how current object detection techniques as developed for human-centred imagery scale to drone-centred nature conservation tasks, and

(2) To create an annotated and benchmarked dataset to foster research for nature conservation drones.

Thus, the main research question of this thesis is:

How do current object detection techniques as developed for human-centred imagery scale to drone-centred nature conservation tasks?

This main research question is divided into the following sub-questions:

1. Are object proposal-based object detection methods capable of detecting animals in drone imagery?

2. What is the performance of current object detection techniques on animal detection in single images?

3. What is the performance of current object detection techniques on counting animals in video?

The remainder of this thesis is organised as follows. Chapter 2 gives background information on the usage of drones in nature conservation and on the organisations involved in this thesis. Thereafter, an overview of the related research in nature conservation drones and automatic object detection is given in chapter 3. Subsequently, the methodology employed for detecting and counting animals, as well as the creation of the dataset, is explained in chapter 4. In chapter 5, the experiments and results are presented and discussed. Finally, this thesis is concluded in chapter 6.


C H A P T E R  2

B A C K G R O U N D

Drones are new to the nature conservation field [Jones IV et al., 2006, Koh and Wich, 2012]. For this reason, this chapter elaborates on conservation drones by giving background information on drones and their application to nature conservation. Finally, at the end of this chapter some background information is provided on the organisations involved in writing this thesis.

2.1 Drones

Drones are aircraft without an onboard human pilot. The flight is controlled either via the remote control of a pilot or autonomously by onboard computers. Drones are one of the most hyped technologies of the past decade, promising to revolutionise the way humankind gathers information, transports goods and automates processes [Momont, 2014]. New drone technology has great potential to address major global challenges, from disaster response to the speedy delivery of goods to the world's hardest-to-reach places. IDEO, an international design firm, has classified four generic types of drones that are each distinct in their functionality and behaviour [Hariri et al., 2014]. Each of these generic types can have a number of species that perform specific functions, and combinations of generic types lead to hybrid types. The identified generic types, shown in Figure 2.1, are as follows:

Glider The glider drone flies long distances at both low and high altitudes. The glider drone is good for visual feedback and repetitive tasks, and has the potential to deploy smaller vehicles.

Float The float drone stays within a predetermined region, moves slowly and can be self-powered. The float drone is good for creating communication networks, visual feedback, and cooperating with other drones.

Carrier The carrier drone delivers packages across short distances. The carrier drone is good foraccessing hard to reach places, filling in holes in supply chains, and running errands.

Bug The bug drone is lightweight, small in size and deployed locally to pick up and transmit data. The bug drone is good for gathering data.


Figure 2.1: The glider, float, carrier and bug are generic types of drones distinct in functionalityand behaviour. Illustrations by IDEO.

The applications of these drones range from the military to private consumers. Some examples include inspection surveys for security and monitoring [Allen and Walsh, 2008, Eschmann et al., 2012], disease detection for agriculture [Berni et al., 2009] and home delivery for logistics by companies such as Amazon, DHL and Google. Based on the requirements of the drone application, the appropriate generic type of drone is chosen.

2.2 Drones for Nature Conservation

The types of drones that are specifically useful for nature conservation are the glider and the carrier. Gliders yield long flying times and larger forward speeds to cover more ground. In contrast, carriers yield great control over the position and orientation of the camera, as well as vertical take-off and landing capabilities. Combined with the bird's-eye view of the camera, these drones are perfect for nature conservation. Both types are affordable and easily converted into autonomous drones by equipping them with a highly affordable open-source autopilot system like Ardupilot [Koh and Wich, 2012] or Paparazzi [Gati, 2013]. By combining the autopilot system with an open-source mission planner, the flight path can be programmed for each mission by creating GPS waypoints on a map in the interface. The drone can be programmed to take off and land autonomously, and to circle over a waypoint for a specified number of turns or duration. Flight parameters such as the altitude of each waypoint and the ground speed can be adjusted. The final mission is uploaded to the drone, after which the drone performs the programmed mission autonomously. During flight the mission can be adjusted according to the needs of the user.
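As an illustration of what such a mission looks like as data, the sketch below (plain Python; the coordinates, altitude and ground speed are invented placeholders, not a mission flown for this thesis) represents a mission as a list of GPS waypoints and estimates the track length and flight time with the haversine formula:

```python
import math

# A mission as an ordered list of GPS waypoints: (latitude, longitude, altitude in m).
# All values are placeholders for illustration.
mission = [
    (52.3360, 4.8660, 40.0),   # take-off / first survey corner
    (52.3360, 4.8700, 40.0),
    (52.3340, 4.8700, 40.0),   # simple lawn-mower pattern
    (52.3340, 4.8660, 40.0),
]

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6_371_000.0  # mean Earth radius in metres
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

track = sum(haversine_m(a, b) for a, b in zip(mission, mission[1:]))
ground_speed = 10.0  # m/s, an assumed cruise speed for a small drone
print(f"track length: {track:.0f} m, flight time: {track / ground_speed:.0f} s")
```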


Drones are already employed for various nature conservation tasks. Drone technology is an affordable method, opening up a wide range of applications in nature conservation. Some examples of conservation tasks where drones are already employed are as follows:

Forest monitoring Deforestation is a major contributor to greenhouse gas emissions and biodiversity loss [Baccini et al., 2012]. Forest conversion to plantations such as oil palm, rubber and cacao has resulted in deforestation and forest degradation [Koh and Wilcove, 2008, Koh et al., 2011]. Drones are an inexpensive method for conservation workers to overcome the challenge of accurately assessing and monitoring changes in forest cover [Jones IV et al., 2006, Horcher and Visser, 2004, Koh and Wich, 2012, Paneque-Gálvez et al., 2014]. The drone is employed for terrain mapping in order to monitor land use change. In the photos and videos obtained by the drone, larger crops (i.e. oil palm trees) and small crops (i.e. maize stands) can easily be distinguished. Furthermore, the drone is able to acquire evidence of human activities in the landscape, such as logging, forest trails, and forest fires. The low cost of operating the drone allows conservation workers to survey target areas repeatedly at high frequency to monitor potential land use changes and human activities.

Animal monitoring Accurate monitoring of the distribution and abundance of species over time is one of the activities required for successful nature conservation [Buckland et al., 2001, 2004]. Current wildlife surveys are costly and therefore rarely occur at the frequency needed to statistically evaluate population trends. These costs are high because planes, helicopters, and ships are often used as observation platforms for such studies. Surveys are also conducted on the ground by car or on foot, which is time-consuming and costly. Drones are a new, inexpensive method for conservation workers to overcome the challenge of monitoring animals in enormous nature reserves.

Anti-poaching Wildlife crime is one of the major problems countries in Africa face. Illegal wildlife trade has grown exponentially over the past decade to meet the increasing demand for elephant ivory, rhino horns, and tiger products, especially in Southeast Asia, where people believe these products are status symbols or medicine [Bouché et al., 2010, UNEP, 2013, Nellemann et al., 2014]. The illegal wildlife trade, controlled by dangerous crime syndicates, is trafficked much like drugs or weapons. Conservation workers have to protect enormous nature reserves with limited manpower and resources against poachers that are often armed with high-calibre weapons. A drone helps the conservation worker by providing frequent information on the location of the animals and potential threats.

All these conservation tasks for drones result in large amounts of video data. Manually processing this data in search of animals reduces the potential time and cost savings that these systems could yield. Therefore, there is a strong need to automate the detection of relevant objects in still and video images.

2.3 Organisations

This thesis was conducted in collaboration with multiple organisations that together have all the necessary expertise. The organisations involved are as follows:

University of Amsterdam The Informatics Institute at the University of Amsterdam performs fundamental, applied and spin-off research. We define intelligence as observing and learning: observing the world by video, still pictures, signals and text, and abstracting knowledge or decisions to act from these observations. The University of Amsterdam provided the expertise on computer vision.


National Aerospace Laboratory The National Aerospace Laboratory (NLR) is the independent knowledge enterprise in the Netherlands on aerospace. Its overall mission is making air transport and space exploration safer, more sustainable and more efficient. NLR's multidisciplinary approach focuses on developing new and cost-effective technologies for aviation and space. The NLR provided expertise on small drones.

Dutch UAS Dutch UAS develops computer vision and artificial intelligence algorithms to automatically analyse aerial imagery. Our software helps farms and nature reserves manage their resources by locating resources (e.g. animals, cars, vegetation) and extracting health information (e.g. biomass, disease, temperature). Dutch UAS provided expertise on the use case scenarios.


C H A P T E R  3

R E L A T E D  R E S E A R C H

This chapter gives an overview of the related research on automatic object detection. The various related state-of-the-art algorithms are discussed, including an argumentation for the approach proposed in this thesis.

Large convolutional neural networks [Girshick et al., 2014, Sermanet et al., 2014] are the current state-of-the-art in automatic object detection. These convolutional neural networks are trained with deep learning methods, which attempt to model high-level representations in data. For deep learning, the convolutional neural networks consist of five to ten hidden layers to model these high-level abstractions; such networks are often referred to as deep neural networks. Deep learning techniques that train these deep neural networks are based on backpropagation and gradient descent. Deep neural networks have proven to be highly successful in global image classification [Krizhevsky et al., 2012] by learning discriminative image features bottom-up. The high performance of deep neural networks in the object recognition task, where the class of the object in the image is predicted, also benefits the related object detection task, where next to the class of the object, the location of the object has to be given by a bounding box. Typically, approaches obtain the bounding box by applying advanced object-saliency methods [Alexe et al., 2012, Uijlings et al., 2013] that generate a set of a few thousand bounding box proposals (so-called "object proposals") that have a high likelihood of containing any type of object. This set of class-independent object proposals is the input to the deep neural network that yields state-of-the-art accuracy. Unfortunately, the accuracy of a deep neural network relies heavily on modern computer hardware, including high-end CPUs and parallel GPU implementations. The computational time using modern hardware is 53 seconds per image for R-CNN [Girshick et al., 2014] using the CPU and 13 seconds per image using the GPU, whereas OverFeat [Sermanet et al., 2014] operates at 2 seconds per image on a heavy-weight GPU (see Table 3.1). These hardware requirements are currently not feasible on a light-weight drone, where every gram of weight reduces the flight time. Since fast response times are essential for the nature conservation tasks examined in this thesis, convolutional networks are as of yet computationally too demanding for timely detection results on a drone.

The competitors of deep neural networks are based on the bag-of-words (BOW) model [Uijlings et al., 2013, van Gemert et al., 2009, Vedaldi et al., 2009] or the related Fisher vector [Cinbis et al., 2013, van de Sande et al., 2014]. These methods also use object-saliency methods [Alexe et al., 2012, Uijlings et al., 2013] to limit the search space by generating a bounded set of object proposals. The object proposals are each represented with a histogram of prototype counts of local features, which are for instance sampled at interest points [Everts et al., 2014].


Method                                         Seconds per image
R-CNN CPU [Girshick et al., 2014]              53
R-CNN GPU [Girshick et al., 2014]              13
OverFeat GPU [Sermanet et al., 2014]           2
DPM CPU [Felzenszwalb et al., 2010a]           0.2
Exemplar-SVM CPU [Malisiewicz et al., 2011]    0.9

Table 3.1: Computational performance of multiple object detection methods.

These methods generally yield the best results with larger prototype vocabularies, resulting in feature sizes of over 170,000 [Uijlings et al., 2013] for BOW or over 300,000 for the Fisher vector [van de Sande et al., 2014] per bounding box. Unfortunately, most of the BOW and Fisher vector methods have large memory requirements, which are not feasible on a light-weight drone with limited internal memory. Both the BOW methods and the deep neural networks rely heavily on high-quality object proposals tuned to a human scale. Therefore, this thesis first evaluates the suitability of object proposals for drone imagery.

The best low-memory and CPU-friendly object detection methods use simple and fast image features combined with a cascade of classifiers. The cascade rejects the obvious non-matching candidates at an early stage, so that more computation time is allocated to promising candidates. The seminal boosting method of Viola and Jones [2001] is an example of such a successful method and is widely used in embedded face detection algorithms for consumer electronics such as cameras, phones and tablets. Other implementations of the cascade of classifiers yield impressive speed-ups for a range of object detection methods. Felzenszwalb et al. [2010b] model objects as a composition of parts, often referred to as the Deformable Part-based Model (DPM). The DPM method with a cascade of classifiers, in combination with a coarse-to-fine search, has reduced the computation time to 0.2 seconds per image [Felzenszwalb et al., 2010a, Pedersoli et al., 2011]. Equivalently, the exemplar-SVM approach of Malisiewicz et al. [2011] for object detection can be sped up to 0.9 seconds per image [Li et al., 2014]. Cascade-of-classifiers methods are fast (see Table 3.1) while retaining a reasonable accuracy; therefore, they are the most suitable methods on a drone. For this reason, this thesis focuses on the evaluation of the DPM and exemplar-SVM detection methods.
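To make the cascade idea concrete, the sketch below runs OpenCV's stock Viola-Jones face cascade, which ships with opencv-python; the input path is a placeholder and the parameters are common defaults, not values used in this thesis:

```python
import cv2

# Load the pre-trained Viola-Jones face cascade bundled with opencv-python.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("frame.jpg")                      # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# The cascade rejects most windows after a few cheap feature tests, which is
# why a full multi-scale scan runs in real time on a plain CPU.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                 minSize=(24, 24))
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)
```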

Page 21: Conservation Drones for Animal Monitoring · 2020-03-01 · (1) To investigate how current object detection techniques as developed for human-centred imagery scale to drone-centred

C H A P T E R  4

M E T H O D O L O G Y

This chapter gives an overview of the methodology used in this thesis. First, the two computer vision tasks evaluated for conservation drones are discussed. Thereafter, the pipeline implemented to evaluate these tasks is illustrated. Subsequently, the detection methods and the counting method are explained in detail. Finally, the dataset recorded for this thesis is presented.

4.1 Evaluation of Nature Conservation methods

In this thesis two computer vision tasks are evaluated for conservation drones, namely (i) animal detection and (ii) animal counting. The first task, the automatic detection of animals, results in location information of the animal. Over time this location information will reveal patterns of the animal itself and of its herd. This knowledge is valuable for conservation workers [Kays et al., 2015, Giuggioli and Bartumeus, 2012], in order to identify diseases and protect animals against poachers. The second task, counting the number of animals, gives the animal abundance over time coupled to the detection regions. The animal abundance gives the conservation worker information about the health of the animal populations and about where and how many animals disappear. This is essential information for nature conservation, as it allows the conservation worker to take the appropriate actions.

4.2 Pipeline

The computer vision pipeline implemented in this thesis for the evaluation tasks is visualised in Figure 4.1. The area containing the object of interest (i.e. animals) is surveyed and recorded on video by a camera. The resulting images serve as input for the detector, which analyses the images and yields a bounding box per animal per image. Subsequently, the individual detections in every image are merged together by tracking shared features over time, in order to obtain an automatic estimate of the number of unique animals.
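In outline, the pipeline reduces to the following sketch, where detect_animals and same_animal are hypothetical placeholders for the detection methods of section 4.3 and the merge criterion of section 4.4:

```python
def count_unique_animals(frames, detect_animals, same_animal):
    """Sketch of the pipeline in Figure 4.1: per-frame detection followed by
    merging detections that belong to the same individual animal."""
    groups = []                          # each group: detections of one animal
    for frame in frames:
        for box in detect_animals(frame):
            for group in groups:
                if same_animal(box, group):   # e.g. shared KLT point tracks
                    group.append(box)
                    break
            else:                        # no matching group: a new animal
                groups.append([box])
    return len(groups)                   # estimated number of unique animals
```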


Figure 4.1: Computer vision pipeline. The area containing the object of interest is recorded bya camera. The object of interest is automatically detected, yielding a bounding box per animalper image. Individual detections are merged together by tracking shared features over time toobtain an automatic estimate on the number of animals.

4.3 Object detection methods suitable for drones

Cascade of classifiers methods are fast while retaining a reasonable accuracy. For this reason,this thesis focuses on the evaluation of the deformable part-based model and exemplar-SVMdetection methods.

Deformable part-based model

The deformable part-based model (DPM) introduced by Felzenszwalb et al. [2010b] is an object detection method based on the pictorial structure representation [Fischler and Elschlager, 1973]. In the pictorial representation, an object is modelled as a flexible composition of parts. The parts are connected in a star structure to a coarse root node. The parts and the root node are represented by Histogram of Oriented Gradients (HOG) features (see Figure 4.2). In this model, the quality of a new object proposal is calculated by adding the score of the root node features to the sum of the maxima over the part placements, minus a deformation cost based on the deviation of the parts from their location in the trained model. A DPM is trained on bounding box annotations of the whole object, where the locations of the object parts are not labelled. In this semi-supervised setup, a latent Support Vector Machine (SVM) is applied to train the model, as the part locations are unknown. The training data consists of labelled bounding boxes $x_1, x_2, \ldots, x_n$, where each box has a class label $y_i$ that is either $+1$ or $-1$. A latent SVM scores an example $x$ as

$$f_\beta(x) = \max_{z \in Z(x)} \beta \cdot \Phi(x, z) \qquad (4.1)$$

where $\beta$ is a vector of model parameters, $Z(x)$ is the set of all possible latent variables (object configurations), and $\Phi(x, z)$ is a feature vector. The vector of model parameters $\beta$ is trained by minimising the objective function

$$L(\beta) = \frac{1}{2}\|\beta\|^2 + C \sum_{i=1}^{n} \max(0,\, 1 - y_i f_\beta(x_i)) \qquad (4.2)$$

where $\max(0,\, 1 - y_i f_\beta(x_i))$ is the standard hinge loss and the constant $C$ controls the relative weight of the regularisation term. The hinge loss is the loss function representing the cost of prediction inaccuracy.
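The scoring rule of equation 4.1 can be made concrete for the star model with a toy sketch; the response maps, anchors and deformation weight below are placeholder inputs, not the trained parameters of Felzenszwalb et al. [2010b]:

```python
import numpy as np

def dpm_score(root_score, part_score_maps, anchors, deform_w):
    """Toy version of equation 4.1 for a star-structured DPM.

    root_score      -- root filter response at the candidate location
    part_score_maps -- list of 2-D arrays of part filter responses
    anchors         -- expected (row, col) of each part relative to the root
    deform_w        -- weight of the quadratic deformation penalty
    """
    total = root_score
    for scores, (ar, ac) in zip(part_score_maps, anchors):
        rows, cols = np.indices(scores.shape)
        # the deformation cost grows quadratically with the deviation
        # of a part placement from its anchor position
        deform = deform_w * ((rows - ar) ** 2 + (cols - ac) ** 2)
        total += np.max(scores - deform)   # best placement of this part
    return total
```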


Figure 4.2: On the left, example detections of the deformable part-based model trained on a person. On the right, three visualisations of the DPM person model [Felzenszwalb et al., 2010b]. (a) A coarse root filter visualising the positive weights at different orientations. (b) The higher-resolution part filters visualising the weights for histogram of oriented gradients features. (c) A spatial model for the location of each part relative to the root, visualising the cost of placing the centre of a part at different locations. Images by Felzenszwalb et al. [2010b].

The original DPM approach [Felzenszwalb et al., 2010b] is extended by Khan et al. [2012], who add colour information to the obtained features. The approach adds human-assigned colour names to the HOG feature vector to include both appearance and colour in the DPM. Colour names are linguistic colour labels that humans assign to colours. In computer vision, colour naming involves the assignment of linguistic colour labels to RGB values in an image; this mapping is often learned from a dataset. By including colour features throughout the learning of the DPM, both appearance and colour are included in the model. The introduction of colour leads to models which can differ significantly from models learned on luminance-based appearance only. The colour information provides a significant improvement of 2.5% mean average precision [Khan et al., 2012] over the standard HOG-based approach [Felzenszwalb et al., 2010b] on the PASCAL VOC 2007 dataset.

The DPM approach of Felzenszwalb et al. [2010b] can be sped up substantially through a part-based cascade [Felzenszwalb et al., 2010a]. The cascade orders part hypotheses based on their scores and prunes low-scoring hypotheses, so that extra computation time is spent on promising candidates. Furthermore, the DPM approach can be sped up through the hierarchical coarse-to-fine feature matching approach introduced by Pedersoli et al. [2011]. This speed-up is based on the fact that most of the computational cost of DPM comes from matching each part to the image; part matching can be substantially reduced through a coarse-to-fine inference strategy. These two optimisation methods yield a speed-up of one up to two orders of magnitude, giving detection rates of 0.2 seconds per image [Pedersoli et al., 2011]. These speeds are acceptable for animal monitoring on a low-cost drone.

Exemplar-SVM

The exemplar-SVM introduced by Malisiewicz et al. [2011] is an SVM object detection method combining the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbour approach. The method trains a separate linear SVM classifier for each exemplar in the training set. Every exemplar-SVM is defined by a single positive instance and millions of negatives (see Figure 4.3). The method performs at a similar level as DPM [Felzenszwalb et al., 2010b] on the PASCAL VOC 2007 detection task. The main benefit of this approach is that it creates an explicit association between a detection and a single training exemplar. Since detections have a good alignment to their associated exemplar, any available exemplar meta-data such as pose, geometry, and layout can be transferred directly.

Figure 4.3: Instead of training a single per-category classifier, the exemplar-SVM approach trains a separate linear SVM classifier for each exemplar in the dataset, with a single positive example and millions of negative windows. Negatives come from images not containing any instances of the exemplar's category. Images by Malisiewicz et al. [2011].

The exemplar-SVM approach represents each training exemplar $E$ via a rigid HOG template $x_E$. Furthermore, it creates negative samples by extracting negative windows $N_E$ from images not containing any objects of the exemplar's category. Each exemplar-SVM, $(w_E, b_E)$, tries to separate $x_E$ from all windows in $N_E$ by the largest possible margin in the HOG feature space. Learning the weight vector $w_E$ amounts to optimising the following convex objective:

$$\Omega(w, b) = \|w\|^2 + C_1\, h(w^T x_E + b) + C_2 \sum_{x \in N_E} h(-w^T x - b) \qquad (4.3)$$

where $h(x) = \max(0,\, 1 - x)$ is the hinge loss and $C_1$ and $C_2$ are regularisation parameters. A sigmoid function is fitted on hold-out data in order to calibrate each SVM. This results in comparable SVM outputs between 0 and 1.
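A minimal training sketch, assuming a recent scikit-learn; the feature dimensionality and constants are illustrative placeholders, and the two constants $C_1$ and $C_2$ of equation 4.3 are only approximated here through per-sample weights:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
x_exemplar = rng.normal(size=1860)           # HOG template of one exemplar (dim. illustrative)
negatives = rng.normal(size=(5000, 1860))    # windows from images without the class

X = np.vstack([x_exemplar[None, :], negatives])
y = np.array([1] + [-1] * len(negatives))

# Emulate C1 (single positive) and C2 (negatives) by weighting the one
# positive example much more heavily than each negative window.
weights = np.array([50.0] + [1.0] * len(negatives))
svm = LinearSVC(C=0.01, loss="hinge")
svm.fit(X, y, sample_weight=weights)

# Raw margins; a sigmoid fitted on hold-out data would calibrate these to [0, 1].
score = svm.decision_function(x_exemplar[None, :])
```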

The exemplar-SVM approach of Malisiewicz et al. [2011] can be sped up substantially through a boosting approach [Li et al., 2014]. In the boosting approach, each exemplar is represented as a weak classifier, and a linear combination of these weak classifiers builds a strong classifier. The weak classifiers are iteratively selected to optimise the misclassification rate. This iterative approach performs feature selection using only the best T weak classifiers. By selecting features, the number of exemplars is significantly reduced to only 500 exemplars while yielding state-of-the-art performance [Li et al., 2014]. Furthermore, Li et al. [2014] propose efficient feature sharing across image pyramid scales. These speed-ups result in a detection speed of 0.9 seconds per image, which is similarly acceptable on a drone.

4.4 Animal counting

Unique animals are counted by merging the detections generated by the detection method. A counting algorithm needs to keep track of each unique animal and make sure that every unique animal is counted only once. Animal counting is a challenging task, since the detection method may miss an animal completely or only see the animal in some frames. The counting system is also prone to false positives: when the detection algorithm detects an animal where there is none, the number of counted animals increases. Furthermore, the survey strategy of the drone also affects the performance of the counting method, as animals may appear in and disappear from the drone camera completely. These problems make the animal counting task an interesting challenge for the community. Note that the animal counting task is different from a spatio-temporal localisation task, as for example proposed in Tian et al. [2013], Jain et al. [2014]. In a localisation task the objective is to carefully and completely identify what, when and where each object is in a video. In the animal counting task, the exact position is not as relevant to the conservation worker, who is only interested in how many unique animals are correctly found.

To tackle the animal counting task, this thesis uses a counting algorithm that combines a detection and a tracking method. First of all, the detection method processes every frame, resulting in a set of detections per frame. Subsequently, the counting algorithm determines whether two detections in subsequent frames belong to the same unique animal by tracking the detections over multiple frames, even when one or a few detections are missing. Detections are merged by tracking salient points. The salient points are tracked by employing the KLT tracker of Lucas and Kanade [1981], which uses optical flow to track salient points for a length of $L$ frames. Thereafter, the counting algorithm determines whether two subsequent detection bounding boxes $A$ and $B$ belong to the same unique animal by requiring an intersection-over-union measure $\frac{|A \cap B|}{|A \cup B|} > 0.5$ of the sets of point tracks through $A$ and through $B$, similar to Everingham et al. [2009] (see Figure 4.4).

Figure 4.4: An example of face point tracking by Everingham et al. [2009]. Several trajectories of points tracked on the objects are shown as curves in the video. The tracks that do not intersect the objects are removed for clarity. Image by Everingham et al. [2009].
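A minimal sketch of this merging step with OpenCV's pyramidal Lucas-Kanade tracker; representing the tracks through a box as a Python set of track ids is an assumption made for illustration, not the exact bookkeeping used in this thesis:

```python
import cv2
import numpy as np

def init_points(gray, box):
    """Find salient points inside a detection box to start KLT tracks from."""
    x, y, w, h = box
    mask = np.zeros_like(gray)
    mask[y:y + h, x:x + w] = 255
    return cv2.goodFeaturesToTrack(gray, maxCorners=50, qualityLevel=0.01,
                                   minDistance=3, mask=mask)

def klt_step(prev_gray, next_gray, pts):
    """Advance the point tracks one frame with pyramidal Lucas-Kanade."""
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    return nxt[status.ravel() == 1]

def same_animal(tracks_a, tracks_b):
    """Merge rule of this section: the intersection over union of the sets
    of point tracks through detections A and B must exceed 0.5."""
    if not (tracks_a | tracks_b):
        return False
    return len(tracks_a & tracks_b) / len(tracks_a | tracks_b) > 0.5
```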

4.5 Dataset

Recording nature in a controlled setting is a challenging task. In this thesis, a realistic conservation task is approximated by using a quadcopter drone to record a dataset above the animals of a cattle farm. Figure 4.5(b,c) shows some examples from the recorded dataset. The dataset contains elements that are not common in nature, such as the exact type of animal (cow), the presence of man-made structures, and the lack of camouflage or shelter in the open fields. Nonetheless, the dataset retains important properties that match a realistic conservation scenario: for instance, the use of a quadcopter drone, which is often used in nature because of its manoeuvrability and its ability to take off from dense areas. Moreover, this type of drone gives the opportunity to record under a wide variation of positions, heights, and orientations of the camera. Therefore, the recording setup closely matches what is experienced in nature. Furthermore, the animals are of a similar size and build as many conservation animals like the rhino or the elephant. The dataset provides an excellent first opportunity to evaluate computer vision algorithms for conservation drones.

Figure 4.5: The recorded dataset. (a): the Pelican quadcopter drone used to record the dataset. (b): an example image from the train set. (c): an example image from the test set. Note the skewed vantage point and the tiny animals.


Figure 4.6: Number of animals per frame in each video; the green line shows the number of unique animals per frame and the blue line the cumulative total. The two left columns show the training videos, the right column the test videos.

The dataset was recorded with the Ascending Technologies Pelican quadcopter drone with a mounted GoPro HERO 3: Black Edition action camera. In Figure 4.5(a) the drone is shown performing a survey for the dataset. A 3D-printed custom-made mount was manufactured to attach the camera to the drone. The mount is filled with foam to counter vibration of the camera during flight. The camera recorded video at 1080p (1920 x 1080 pixels) with a medium field of view (55° vertical and 94.4° horizontal) at 60 frames per second.
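These camera parameters determine how small the animals appear in the image. A back-of-the-envelope computation (the 40 m altitude and the 2.5 m animal length are assumed example values, not logged flight parameters) illustrates the ground footprint and resolution:

```python
import math

H_FOV, V_FOV = 94.4, 55.0     # GoPro medium field of view, degrees
W_PX, H_PX = 1920, 1080       # 1080p resolution
altitude = 40.0               # example flying height in metres (assumption)

ground_w = 2 * altitude * math.tan(math.radians(H_FOV / 2))   # ~86 m
ground_h = 2 * altitude * math.tan(math.radians(V_FOV / 2))   # ~42 m
m_per_px = ground_w / W_PX                                    # ~4.5 cm/pixel

# An animal of ~2.5 m then spans roughly 2.5 / m_per_px, about 55 pixels,
# which explains the tiny objects visible in Figure 4.5(c).
print(f"footprint: {ground_w:.0f} x {ground_h:.0f} m, "
      f"{100 * m_per_px:.1f} cm per pixel")
```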

Two separate flights were performed to obtain a set for training and a disjoint set for testing. All animals in the dataset were manually annotated with the Video Annotation Tool from Irvine, California (VATIC) [Vondrick et al., 2013]. Large portions of the videos that did not contain any animals were removed, which resulted in 6 videos obtained from the two separate flights. The first 4 videos, from the first flight, are used as training videos, and the latter 2 videos, from the second flight, are used for testing. In total there are 12,673 frames in the training set and 5,683 frames in the test set. There are 30 unique animals present in the dataset. Figure 4.6 visualises the appearance and disappearance of animals during the flights.


C H A P T E R  5

R E S U L T S

This chapter elaborates on the conducted experiments and the accompanying results. The experiments aim to answer the sub-questions of the main research question stated in the introduction (see chapter 1) of this thesis. First, the proposal quality of object proposal methods on drone imagery is evaluated. Thereafter, the performance of three high-speed object detection methods (DPM, color-DPM, and exemplar-SVM) is evaluated on the constructed dataset. Finally, the quality of animal counting is evaluated, based on frame-based detections in combination with point tracks.

Experiment 1: Evaluating Object Proposal Quality

The current state-of-the-art methods steer away from sliding-window detection by evaluating a limited set of object proposals. By evaluating a few thousand bounding boxes instead of tens or hundreds of thousands, more computationally demanding algorithms can be applied. The object detection search space can be significantly reduced by locating a limited collection of bounding boxes with a high likelihood of containing any type of object. This thesis focuses on the selective search object proposal algorithm of Uijlings et al. [2013], because of the high-quality proposals this algorithm produces. Selective search produces object proposals by merging super-pixels using a range of colour, shape and other super-pixel similarity measures. The super-pixels [Felzenszwalb and Huttenlocher, 2004] are obtained by an over-segmentation. This thesis evaluates both the fast and the quality settings of selective search [Uijlings et al., 2013].
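For reference, selective search is available as a re-implementation in opencv-contrib-python; a minimal sketch, with a placeholder input frame:

```python
import cv2

img = cv2.imread("frame.jpg")   # placeholder video frame
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()     # or switchToSelectiveSearchQuality()
rects = ss.process()                 # (x, y, w, h) object proposals
print(f"{len(rects)} object proposals generated")
```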

Setting    ABO      Recall    Proposals / frame    Time / frame
Fast       0.635    0.873     ca. 18,369           ca. 31 sec.
Quality    0.740    0.976     ca. 64,547           ca. 140 sec.

Table 5.1: Performance of the selective search method of Uijlings et al. [2013] on the drone dataset presented in this thesis. Overlap between a proposal $A$ and a ground-truth box $B$ is measured as $\frac{|A \cap B|}{|A \cup B|}$. Average Best Overlap (ABO) is the best overlapping proposal per frame, averaged over all frames. Recall is measured as the fraction of ground-truth boxes that have an overlap greater than 0.5.

Table 5.1 shows an overview of the results achieved on a set of sampled video frames. The average best overlap and recall scores on the dataset of this thesis differ from the results reported in Uijlings et al. [2013]. For the fast setting of selective search, 87% of the objects are found in the set of proposals and the average best overlap is 63.5%. This contrasts with the results on PASCAL VOC 2007, where the same setting has an ABO of 80.4% and nearly all objects can be found in the proposals. Equivalently, the quality setting of selective search only reaches an ABO of 74%. Next to the low detection and overlap rates, there is a significant increase in the number of generated object proposals and in the proposal generation time per frame: the quality setting generates more than 64,000 proposals and takes nearly two and a half minutes per frame.

In this thesis, we are interested in lightweight solutions for drone imagery. The evaluation time of the selective search algorithm poses serious practical problems, as selective search takes at least 30 seconds to generate a set of proposals with an acceptable recall rate. Furthermore, features have to be extracted and a classifier has to be applied on tens of thousands of proposals. The proposals of selective search therefore do not significantly reduce the search time, and we conclude that object proposal-based detection systems are unsuitable from a computational standpoint.

Experiment 2: Animal detection

Three high-speed object detection methods, DPM, color-DPM and exemplar-SVM, are evaluated on the dataset created for this thesis. All three methods use the hard negative mining procedure to generate a final model for a specific object class. For this reason, a diverse set of negatives is required. Therefore, the setup of the PASCAL VOC challenge is mimicked by replacing the images of the cow class with randomly sampled images from the train and test video for, respectively, the train and test PASCAL VOC dataset. The train images of the other 19 object classes are used for discriminative learning. These classes include people, animals (cats, dogs, horses, sheep), vehicles (aeroplanes, bikes, boats, buses, cars, motorbikes, trains), and indoor objects (bottles, chairs, dining tables, potted plants, sofas, TVs), taken from a human perspective.

The resulting trained models are applied to all the frames in the test set, containing a total of 1,227 ground-truth bounding boxes. The result of this evaluation is a list of bounding box detections ranked by confidence value, which is evaluated with precision and recall.
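For clarity, a sketch of this PASCAL-style evaluation; matching each detection to an unclaimed ground-truth box at an intersection-over-union of at least 0.5 is assumed to have been done beforehand, and the rectangle-rule average precision below is a simplification of the official interpolated measure:

```python
import numpy as np

def average_precision(confidences, is_true_positive, n_ground_truth):
    """Accumulate precision and recall down the confidence-ranked list of
    detections and integrate the resulting precision-recall curve."""
    order = np.argsort(-np.asarray(confidences))
    hits = np.asarray(is_true_positive, dtype=float)[order]
    tp = np.cumsum(hits)            # true positives so far
    fp = np.cumsum(1.0 - hits)      # false positives so far
    recall = tp / n_ground_truth
    precision = tp / (tp + fp)
    # area under the curve with a simple rectangle rule
    return float(np.sum(precision * np.diff(np.concatenate([[0.0], recall]))))
```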

Figure 5.1 shows the precision-recall curves for exemplar-SVM, DPM, and color-DPM. Analysing the graph shows that exemplar-SVM outperforms both other methods in terms of precision and recall. Furthermore, the exemplar-SVM method performs best on the average precision score, scoring 0.66 compared to 0.30 for DPM and 0.26 for color-DPM. These results are surprising when compared to reported results on standard object detection datasets. For instance, Khan et al. [2012] report that color-DPM is preferred over standard DPM, while Felzenszwalb et al. [2010b] report better results than exemplar-SVM.


Figure 5.1: Precision-recall curves for DPM, color-DPM and exemplar SVM on the test imagesof the dataset.



Figure 5.2: Examples of detected bounding boxes for a test frame produced by exemplar-SVM, DPM, and color-DPM methods. The top row shows the 10 highest ranked detectionsper method, while the bottom row shows all positive detections.

Another interesting fact is that the curves of both DPM models in Figure 5.1 reach a final recall of approximately 0.4, while exemplar-SVM reaches a final recall of approximately 0.72. The main reason for this difference lies in the total number of detected objects per method. While there are in total only 1,227 positive instances of the object to be discovered, DPM detects 2,673 and color-DPM detects 4,156 bounding boxes. In contrast, exemplar-SVM detects a total of 40,654 bounding boxes. The recall is bound to be higher with such a large number of object detections. Nevertheless, the high number of object detections does not explain the high precision of the exemplar-SVM method.

There are two reasons why the exemplar-SVM method outperforms the DPM methods. First of all, the joint global and part-based model in DPM does not work on drone images, where the scale of the objects is small. Figure 5.2 shows that individual objects are generally tiny, due to the high altitude of drones. This means that when an animal is visible in a window of 25 by 25 pixels, there is not sufficient gradient information for a reliable global and part-based model. Secondly, the high result of exemplar-SVM is caused by the dataset. Since the evaluation of the dataset is aimed at detecting cows, there is limited discrepancy between the instances to discover during the training and testing phases. As we are in a practical scenario also interested in a limited range of wildlife animals, the use of animal exemplars for detection is beneficial.

Figure 5.2 also shows qualitative detection results of all the methods for a single test frame. The top-ranked detections in the top row look promising for every method. When analysing all detections, the results become cluttered; for exemplar-SVM, it even becomes unclear where the promising locations of objects are. Despite the high number of detections, the methods are capable of ranking the correct bounding boxes highly, but they also tend to fire on high-contrast corner areas and cluttered locations, for instance areas containing white lines or humans in Figure 5.2. The outcome of this experiment indicates that the results yielded on human-scale images are not directly applicable to drone imagery.


Experiment 3: Animal Counting

Lastly, the quality of animal counting based on frame-based detections, in combination with point tracks obtained by a KLT tracker [Lucas and Kanade, 1981], is evaluated. The counting algorithm either stitches a frame-based object detection to an existing group of detections or treats it as a new unique group of detections, where ideally each group of detections represents a single unique animal. The KLT algorithm generates point tracks of length L, where L was chosen from {5, 10, 15, 20} as a range of stable values.

Animal counting is evaluated with precision-recall curves, with some special considerations. Recall is defined over all unique animals. Precision is computed based on the correctness of a stitched group of detections. This thesis considers a count correct if it adheres to the following three strict rules (a sketch of this check follows below):

• A single group of detections does not contain multiple animals.

• All individual detections are animals.

• The found animal is unique and has not been counted before.

With these strict criteria, the precision-recall curve is generated, where stitched groups are sorted based on the number of aggregated detections; a sketch of this correctness check and ranking is given below.
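For concreteness, a minimal Python sketch of the correctness check follows. Here animal_id_of (a ground-truth lookup mapping a detection to the annotated animal it covers, or None for non-animals) is a hypothetical helper, not part of the thesis code.

def count_is_correct(group, animal_id_of, counted_ids):
    # One stitched group counts as one correct animal iff it covers
    # exactly one animal (rule 1), every detection in it is an animal
    # (rule 2), and that animal was not counted before (rule 3).
    ids = {animal_id_of(det) for det in group}
    if None in ids:            # rule 2: a non-animal detection
        return False
    if len(ids) != 1:          # rule 1: multiple animals in one group
        return False
    animal = ids.pop()
    if animal in counted_ids:  # rule 3: already counted
        return False
    counted_ids.add(animal)
    return True

def rank_groups(groups):
    # Sort stitched groups by number of aggregated detections,
    # as used for sweeping the precision-recall curve.
    return sorted(groups, key=len, reverse=True)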

First, the quality of the tracking algorithm is evaluated. To this end, a perfect detector is simulated by sampling ground-truth boxes every 5 frames of the video. The average precision is 0.216; the corresponding precision-recall curves are shown on the left of Figure 5.3. A track length of L = 15 performs best, although the difference with the other lengths is minimal.

Next, the counting results are evaluated using the automatically generated bounding-box detections, with an empirically set threshold of -0.8 to discard false positives. The corresponding precision-recall curves are shown on the right of Figure 5.3. Compared to the curves on the left of Figure 5.3, these curves are lower, and not all animals are found. The average precision is likewise lower, at 0.193.
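The average-precision scores can be reproduced from the ranked groups with the standard non-interpolated AP recipe; the sketch below assumes that variant, as the thesis does not state which AP definition is used, and a detection record with a score attribute is likewise an assumption.

def average_precision(correct_flags, num_unique_animals):
    # correct_flags: per ranked group, True if the count is correct
    # under the three rules above; non-interpolated AP over all animals.
    hits, ap = 0, 0.0
    for rank, correct in enumerate(correct_flags, start=1):
        if correct:
            hits += 1
            ap += hits / rank  # precision at this correct count
    return ap / num_unique_animals

# Detections below the empirical threshold are discarded beforehand:
# detections = [d for d in detections if d.score > -0.8]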

Based on these results, animal counting turns out to be a challenging problem. Even with ground-truth detections there is ample room for improvement, which should be a focus of future research.

[Figure 5.3 panels: two precision-recall plots (Precision vs. Recall), with curves for track lengths L = 5, 10, 15, 20]

Figure 5.3: Left: precision-recall curves for different point-track lengths on the ground truth, where bounding boxes are sampled from the annotations every 5 frames. Right: precision-recall curves for different point-track lengths on the DPM detections with a threshold of -0.8.


6 C O N C L U S I O N

In this thesis, the suitability of current automatic object detection methods, as designed for human-centred objects, for nature conservation on a drone is investigated. Objects in imagery taken from a conservation drone are typically much smaller, and the viewpoint is from above. Two tasks are defined: (i) animal detection and (ii) animal counting, both of which are important for monitoring animal distribution and animal abundance as typically required for successful nature conservation.

These tasks were evaluated by manually recording and annotating a new dataset with a quadcopter drone and conducting three experiments. First, the suitability of proposal-based detection methods was evaluated; these methods are unsuitable from a computational standpoint, due to their computation time and the high number of generated proposals. Second, the animal detection task was benchmarked with three light-weight object detection algorithms. These algorithms are suitable for on-board implementation on a drone, since their detection speed is less than 1 second per image. According to the literature, the color-DPM method should outperform the standard DPM method, which in turn should outperform exemplar-SVM. The results show the exact opposite ordering, indicating that results obtained on human-scale images are not directly applicable to drone imagery. Nevertheless, the detection results are promising, showing that automatic animal conservation with drones is a fruitful combination of aerospace engineering and computer vision. Lastly, the animal counting task was evaluated, building on the detection task with a newly defined evaluation protocol. The results show that counting is a challenging task, and as such an interesting research question.

All in all, the experiments show that computer vision can play an essential role in nature conservation. However, the results of state-of-the-art object detection algorithms such as DPM, color-DPM and exemplar-SVM also show that algorithms designed for human-centred objects do not translate to aerial imagery. Therefore, new algorithms should be designed to detect and count objects in aerial imagery.


B I B L I O G R A P H Y

B. Alexe, T. Deselaers, and V. Ferrari. Measuring the objectness of image windows. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34(11):2189–2202, 2012.

J. Allen and B. Walsh. Enhanced oil spill surveillance, detection and monitoring through the applied technology of unmanned air systems. In International Oil Spill Conference, volume 2008, pages 113–120. American Petroleum Institute, 2008.

A. Baccini, S. J. Goetz, W. S. Walker, N. T. Laporte, M. Sun, D. Sulla-Menashe, J. Hackler, P. S. A. Beck, R. Dubayah, M. A. Friedl, S. Samanta, and R. A. Houghton. Estimated carbon dioxide emissions from tropical deforestation improved by carbon-density maps. Nature Climate Change, 2(3):182–185, 2012.

J. Berni, P. J. Zarco-Tejada, L. Suárez, and E. Fereres. Thermal and narrowband multispectral remote sensing for vegetation monitoring from an unmanned aerial vehicle. Geoscience and Remote Sensing, IEEE Transactions on, 47(3):722–738, 2009.

P. Bouché, P. C. Renaud, P. Lejeune, C. Vermeulen, J. M. Froment, A. Bangara, O. Fiongai, A. Abdoulaye, R. Abakar, and M. Fay. Has the final countdown to wildlife extinction in northern Central African Republic begun? African Journal of Ecology, 48:994–1003, 2010.

S. T. Buckland, D. R. Anderson, K. P. Burnham, J. L. Laake, D. L. Borchers, and L. Thomas. Introduction to Distance Sampling. Oxford University Press, 2001.

S. T. Buckland, D. R. Anderson, K. P. Burnham, J. L. Laake, D. L. Borchers, and L. Thomas. Advanced Distance Sampling: Estimating Abundance of Biological Populations. Oxford University Press, 2004.

Y. Chen, H. Shioi, C. F. Montesinos, L. P. Koh, S. A. Wich, and A. Krause. Active detection via adaptive submodularity. In Proceedings of The 31st International Conference on Machine Learning, pages 55–63, 2014.

R. G. Cinbis, J. Verbeek, and C. Schmid. Segmentation driven object detection with Fisher vectors. In International Conference on Computer Vision (ICCV), 2013.

C. Eschmann, C. M. Kuo, C. H. Kuo, and C. Boller. Unmanned aircraft systems for remote building inspection and monitoring. In 6th European Workshop on Structural Health Monitoring, 2012.

M. Everingham, J. Sivic, and A. Zisserman. Taking the bite out of automated naming of characters in TV video. Image and Vision Computing, 27(5):545–559, 2009.

I. Everts, J. C. van Gemert, and T. Gevers. Evaluation of color spatio-temporal interest points for human action recognition. IEEE Transactions on Image Processing, 23(4):1569–1580, 2014.

P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181, 2004.

P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Cascade object detection with deformable part models. In Computer Vision and Pattern Recognition (CVPR), 2010a.


P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645, 2010b.

G. S. Fichter and K. Kest. Endangered Animals. A Golden Guide from St. Martin's Press. St. Martin's Press, 2001. ISBN 9781582381381.

M. A. Fischler and R. A. Elschlager. The representation and matching of pictorial structures. IEEE Transactions on Computers, 22(1):67–92, 1973.

B. Gati. Open source autopilot for academic research - the Paparazzi system. In American Control Conference (ACC), 2013, pages 1478–1481, 2013.

R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition (CVPR), 2014.

L. Giuggioli and F. Bartumeus. Linking animal movement to site fidelity. Journal of Mathematical Biology, 64(4):647–656, 2012.

B. Hariri, A. Reineck, and J. Won. Drones for good, 2014. URL https://www.ideo.org/projects/drones-for-good/completed. Retrieved 2016-01-04.

A. Hodgson, N. Kelly, and D. Peel. Unmanned aerial vehicles (UAVs) for surveying marine fauna: A dugong case study. 2013.

A. Horcher and R. J. M. Visser. Unmanned aerial vehicles: Applications for natural resource management and monitoring. Proceedings of the Council of Forest Engineering: "Machines and People, The Interface", 2004.

M. Jain, J. C. van Gemert, P. Bouthemy, H. Jegou, and C. G. M. Snoek. Action localization by tubelets from motion. In Computer Vision and Pattern Recognition (CVPR), 2014.

G. P. Jones IV, L. G. Pearlstine, and H. F. Percival. An assessment of small unmanned aerial vehicles for wildlife research. Wildlife Society Bulletin, 34:750–758, 2006.

R. Kays, M. C. Crofoot, W. Jetz, and M. Wikelski. Terrestrial animal tracking as an eye on life and planet. Science, 348(6240):aaa2478, 2015.

F. S. Khan, R. M. Anwer, J. van de Weijer, A. Bagdanov, M. Vanrell, and A. M. Lopez. Color attributes for object detection. In Computer Vision and Pattern Recognition (CVPR), 2012.

L. P. Koh and S. A. Wich. Dawn of drone ecology: low-cost autonomous aerial vehicles for conservation. Tropical Conservation Science, 5(2):121–132, 2012.

L. P. Koh, J. Miettinen, S. C. Liew, and J. Ghazoul. Remotely sensed evidence of tropical peatland conversion to oil palm. Proceedings of the National Academy of Sciences, 108(12):5127–5132, 2011.

L. P. Koh and D. S. Wilcove. Is oil palm agriculture really destroying tropical biodiversity? Conservation Letters, 1(2):60–64, 2008.

W. R. Koski, T. Allen, D. Ireland, G. Buck, P. R. Smith, A. M. Macrander, M. A. Halick, C. Rushing, D. J. Sliwa, and T. L. McDonald. Evaluation of an unmanned airborne system for monitoring marine mammals. Aquatic Mammals, 35(347), 2009.

A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1097–1105, 2012.


H. Li, Z. Lin, J. Brandt, X. Shen, and G. Hua. Efficient boosted exemplar-based face detection. In Computer Vision and Pattern Recognition (CVPR), 2014.

B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence, volume 81, 1981.

T. Malisiewicz, A. Gupta, and A. A. Efros. Ensemble of exemplar-SVMs for object detection and beyond. In International Conference on Computer Vision (ICCV), 2011.

A. Momont. Drones for good. Master's thesis, TU Delft, 2014.

M. Mulero-Pázmány, R. Stolper, L. van Essen, J. J. Negro, and T. Sassen. Remotely piloted aircraft systems as a rhinoceros anti-poaching tool in Africa. PLoS ONE, 9, 2014.

C. Nellemann, R. Henriksen, P. Raxter, N. Ash, and E. Mrema. The environmental crime crisis - threats to sustainable development from illegal exploitation and trade in wildlife and forest resources. A UNEP rapid response assessment. www.grida.no, 2014. United Nations Environment Programme, GRID-Arendal.

J. Paneque-Gálvez, M. K. McCall, B. M. Napoletano, S. A. Wich, and L. P. Koh. Small drones for community-based forest monitoring: An assessment of their feasibility and potential in tropical areas. Forests, 5(6):1481–1507, 2014.

M. Pedersoli, A. Vedaldi, and J. Gonzalez. A coarse-to-fine approach for fast deformable object detection. In Computer Vision and Pattern Recognition (CVPR), 2011.

P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. In International Conference on Learning Representations, 2014.

Y. Tian, R. Sukthankar, and M. Shah. Spatiotemporal deformable part models for action detection. In Computer Vision and Pattern Recognition (CVPR), pages 2642–2649, 2013.

J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders. Selective search for object recognition. International Journal of Computer Vision, 104(2):154–171, 2013.

UNEP. Elephants in the dust - the African elephant crisis. A rapid response assessment. www.grida.no, 2013. United Nations Environment Programme, GRID-Arendal.

K. E. A. van de Sande, C. G. M. Snoek, and A. W. M. Smeulders. Fisher and VLAD with FLAIR. In Computer Vision and Pattern Recognition (CVPR), 2014.

J. C. van Gemert, C. J. Veenman, and J. M. Geusebroek. Episode-constrained cross-validation in video concept retrieval. IEEE Transactions on Multimedia, 11(4):780–786, 2009.

J. C. van Gemert, C. R. Verschoor, P. Mettes, H. K. Epema, L. P. Koh, and S. Wich. Nature conservation drones for automatic localization and counting of animals. September 2014.

A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In International Conference on Computer Vision (ICCV), 2009.

C. Vermeulen, P. Lejeune, J. Lisein, P. Sawadogo, and P. Bouché. Unmanned aerial survey of elephants. PLoS ONE, 8, 2013.

P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition (CVPR), 2001.

P. M. Vitousek, H. A. Mooney, J. Lubchenco, and J. M. Melillo. Human domination of Earth's ecosystems. Science, 277(5325):494–499, 1997.

C. Vondrick, D. Patterson, and D. Ramanan. Efficiently scaling up crowdsourced video annotation. International Journal of Computer Vision, pages 1–21, 2013. ISSN 0920-5691. URL http://dx.doi.org/10.1007/s11263-012-0564-1.