

Accepted Manuscript

Global localization with non-quantized local image features

F.M. Campos, L. Correia, J.M.F. Calado

PII: S0921-8890(12)00071-1
DOI: 10.1016/j.robot.2012.05.015
Reference: ROBOT 1985

To appear in: Robotics and Autonomous Systems

Received date: 19 January 2012
Revised date: 28 March 2012
Accepted date: 11 May 2012

Please cite this article as: F.M. Campos, L. Correia, J.M.F. Calado, Global localization with non-quantized local image features, Robotics and Autonomous Systems (2012), doi:10.1016/j.robot.2012.05.015

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Research Highlights

• Entropy and discriminativity significantly correlate in the NQ representation.
• The NQ representation provides higher robustness in the global localization task.
• Contribution of features is modulated with an entropy-based relevance factor.
• Run time of the localization method can be reduced by an order of magnitude.


Global Localization with Non-Quantized Local Image Features

F. M. Campos a,b,*, L. Correia a, J. M. F. Calado b,c

a LabMAg, Computer Science Department, University of Lisbon, 1749-016 Lisboa, Portugal.
b Mechanical Engineering Department, Instituto Superior de Engenharia de Lisboa, 1959-007 Lisboa, Portugal.
c IDMEC, Instituto Superior Técnico, 1049-001 Lisboa, Portugal.
* Corresponding address: Departamento de Engenharia Mecânica, Instituto Superior de Engenharia de Lisboa, Rua Conselheiro Emídio Navarro, 1, 1959-007 Lisboa. Tel.: +351 218 317 000.
E-mail addresses: [email protected] (F. M. Campos), [email protected] (L. Correia), [email protected] (J. M. F. Calado).

Abstract

In the field of appearance-based robot localization, the mainstream approach uses a quantized representation of local image features. An alternative strategy is the exploitation of raw feature descriptors, thus avoiding approximations due to quantization. In this work, the quantized and non-quantized representations are compared with respect to their discriminativity, in the context of the robot global localization problem. Having demonstrated the advantages of the non-quantized representation, the paper proposes mechanisms to reduce the computational burden this approach would carry, when applied in its simplest form. This reduction is achieved through a hierarchical strategy which gradually discards candidate locations and by exploring two simplifying assumptions about the training data. The potential of the non-quantized representation is exploited by resorting to the entropy-discriminativity relation. The idea behind this approach is that the non-quantized representation facilitates the assessment of the distinctiveness of features, through the entropy measure. Building on this finding, the robustness of the localization system is enhanced by modulating the importance of features according to the entropy measure. Experimental results support the effectiveness of this approach, as well as the validity of the proposed computation reduction methods.

Keywords: Topological localization; Appearance-based methods; Feature selection; Information content; Entropy.

1. Introduction

Self-localization, i.e. the ability of a robot to estimate its position with respect to the environment, is a key requirement in enabling mobile robots to accomplish real world tasks. The availability of a position estimate is relevant to crucial tasks such as map building, path planning, mission monitoring and situation-aware interaction with humans. The two main instances of the self-localization problem are position tracking and global localization, differing essentially in the kind of information that is provided to the robot. Given the initial robot location, position tracking maintains an estimate of



its position, by compensating for small errors which arise, for instance, from odometry readings [1]. This paper addresses global localization, which aims at estimating the robot position in a map, when initial information about its location is non-existent or erroneous. Under these constraints, global localization enables a robot to deal with the initialization and kidnapped robot problems. Currently, the problem of global localization, as addressed in this paper, can be solved by inference methods such as Markov localization [2-4], and particle filter algorithms [5-7], which converge to a correct estimate when afforded a sufficient number of inference steps. Unfortunately, the convergence process is hindered by phenomena such as sensor aliasing and environmental changes, which reduce the discriminativity of incoming sensor signals. Under such factors, more inference steps are necessary, as a greater body of evidence is required to arrive at a correct estimate. The realization that these effects can be reduced by resorting to more accurate and discriminative sensory information has led researchers to focus on more complex sensor models [8], new features or feature combinations [9-11], and features invariant to environmental factors, such as lighting [12]. In order to increase scalability, global localization systems often rely on topological maps, where sensory information is encoded in appearance signatures [13]. Appearances are constructed as compressed representations of sensor data, needing to be as discriminative as possible in order to distinguish between different but similar-looking places. In this respect, vision is most appealing as a sensor modality, given the vast and detailed data streams that it provides. In exploiting these properties, many systems extract local image features, which are encoded in high dimensional vectors, most commonly as SIFT descriptors [14].
These features are typically used in conjunction with the Bag of Words (BoW) model [15], which improves efficiency and compactness by quantizing descriptors into a "vocabulary". This set of descriptors is intended to be representative of a problem and is

obtained by clustering (usually by k-means) data from a training set. In recent years it has been acknowledged that the desired properties of the BoW model come at the cost of some loss in discriminative power, mainly due to a difficulty in finding appropriate clustering algorithms to generate the vocabulary and due to quantization errors [16,17]. In the context of large scale image retrieval, a number of works have been devoted to improving the BoW model, by exploiting soft assignment, Hamming embedding and hierarchical clustering [18-20]. Although these methods somewhat reduce the detrimental impact of quantization, robot localization poses different demands, which justify more accurate solutions. Higher precision methods are preferable and their higher computational costs are acceptable in small to medium-sized environments, where the reference database is smaller than the databases used in large scale image retrieval. In order to achieve better performance, an alternative method, which discards quantization, can be exploited [2,9]. This approach is inherently more accurate, albeit presenting shortcomings in terms of run time and memory requirements.
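To make the quantization step concrete, the following toy sketch (illustrative only, not from the manuscript; `nearest_word` and `bow_histogram` are hypothetical names) assigns raw descriptors to their closest visual word and builds the occurrence histogram that the BoW model uses in place of raw descriptors:

```python
import math

def nearest_word(descriptor, vocabulary):
    """BoW quantization: index of the closest visual word (cluster centre)."""
    def euclid(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(vocabulary)),
               key=lambda k: euclid(descriptor, vocabulary[k]))

def bow_histogram(descriptors, vocabulary):
    """Represent an image as a histogram of visual word occurrences."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist
```

The quantization error discussed above is visible here: two distinct descriptors mapped to the same word become indistinguishable in the histogram.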


In this work we investigate the non-quantized representation as a solution to the global localization problem. In particular we focus on performance gains this representation offers over the BoW model and on the potential to improve efficiency and memory size at a reduced accuracy loss. As a first contribution, this paper presents a comparative evaluation of quantized (Q) and non-quantized (NQ) representations in a robot localization task. To the best of our knowledge, such a study has not been carried out before. We also investigate the potential to improve the NQ method, by exploiting the entropy-discriminativity relation. The idea behind this approach is that the NQ representation, apart from being more discriminative, facilitates the assessment of the distinctiveness of features, through the entropy measure. As a second contribution, we propose modulating the importance of features according to the entropy measure, which is experimentally shown to benefit localization accuracy. This method can be thought of as a smooth selection of features, which assigns relevance to features according to their discriminativity. As a third contribution, we propose two approaches to speed up the NQ method at run time. In the first approach, we propose a hierarchical localization scheme performed at two stages. First, the appearances are compared with a fast computing global image feature, gist, resulting in the elimination of a significant number of candidate places. Secondly, the set of local features is treated in a step-by-step process, during which the candidate places are eliminated on the basis of their assigned probability. In the second approach we capitalize on the specificities of the training data for localization. 
Given that images describing a place are captured in a sequence, two assumptions can be made regarding their features: i) similar descriptors extracted from the sequence are likely to refer to the same visual feature and ii) features that do not find a match in the sequence are not robustly identified by the region of interest detector. Building on these assumptions, we reduce the number of features that need to be retained, thus reducing the memory requirement and, consequently, the number of comparisons performed at run time. This paper presents an in-depth study of the NQ model advanced in [21]. It includes a comprehensive evaluation of the methods under consideration and introduces an additional technique to reduce the computational burden of the NQ method at run time. The rest of the paper is organized as follows: section 2 summarizes recent work relating to the present approach; section 3 outlines the proposed localization method and procedures to reduce computation cost; section 4 analyses the quantized and non-quantized representations with regard to their discriminativity and information content properties; section 5 presents localization results obtained for the two representations and compares them in terms of performance and run time; and section 6 discusses the results in light of the state-of-the-art in appearance based localization. Section 7 draws the conclusions of the paper and outlines directions for future work.


2. Related work

The use of local features in describing appearance raises the question of how to combine them in the computation of image similarity. Some authors have associated image similarity with the number of descriptor matches found between two images [9,22]. In recent years, the most common approach relies on the BoW model, introduced in [15]. In this model, each descriptor is quantized to the closest visual word from a pre-defined vocabulary. Following feature quantization, images may be represented in a compact form, by histograms of visual word occurrence. Despite the fact that the BoW model has been successful in robot localization tasks [23-25], the last few years have identified some shortcomings in this approach. Jurie and Triggs [16] focus on the negative effects of constructing the vocabulary by k-means clustering. In their work they observe that features found in natural images are non-uniformly distributed in the feature space (i.e. some features are much more frequent than others). Due to this fact, cluster centers resulting from k-means clustering will be concentrated in the densest areas of the feature space, leaving low-to-medium density areas under-represented. It was shown that features in these areas are generally highly discriminative, implying that vocabularies constructed by k-means are suboptimal for classification purposes. In a similar vein, Boiman and colleagues [17] quantify the errors due to quantization and conclude that their detrimental impact increases with feature discriminativity. Addressing these issues, previous works devoted to image retrieval have achieved some reduction in the quantization errors, employing methods that carried either higher memory requirements [18], or additional work at run time [19]. There have been several recent works focusing on how the feature detection stage impacts localization performance [11,26-28]. These works typically evaluate the performance of different feature detectors, e.g.
SIFT, SURF, MSER and salience detectors, when applied individually [26,27] or in combination [11,28]. While these detectors, which employ generic rules in the selection of features from the image, have often been studied in this context, task-dependent selection of features has rarely been addressed. In works where this issue has been studied, the modulation of feature contribution results from reformulating the problem after discarding unlikely locations [29], and from the application of information content measures [30], this being the approach we follow in this work. Information theory concepts have proven useful before in several visual classification and robot tasks. In early work [31], the concept of mutual information was used to select the next most informative view in an object classification task. In the same work the concept was applied for the selection of visual features extracted by densely sampling the image in space-scale. In [32], the authors make use of the entropy concept in the selection of the most informative components from a Principal Component Analysis (PCA) image representation, to localize a robot with eigenimages. Another application of the concept of mutual information in object classification is provided in [33]. In that work, images are densely sampled through a fixed size window and the extracted imagettes are projected on a PCA basis. By estimating probabilities through a


Parzen window estimator [34], the authors are able to quantify the information content of features extracted from a test image. In a later work [35], the authors use the same approach to identify the most relevant areas in the image, subsequently to be segmented and encoded in a SIFT descriptor. In contrast to [31,33,35], the present approach extracts SIFT features by sparsely sampling the image through a general region of interest (ROI) detector and then evaluates the information content of these features. In this way, we avoid the computational burden of processing a high number of features resulting from dense sampling, and guarantee the expressiveness of features extracted at different scales. A similar approach has been employed recently in [36]. Their method involves extracting SIFT features through a general ROI detector and quantizing them with a predefined vocabulary. The approach fundamentally differs from ours in the kind of representation used for the features, which relies on quantization. Other localization works discarding quantization can be found [2,9,37,38], which nevertheless contrast with ours in their use of a crisp matching rule and a flat evaluation of feature contribution to similarity. In addition to local image features, the method proposed in this paper also makes use of global features, which provide a means for reducing the processing requirements on the former. Global image features refer to statistical measures of the image, which may include rough spatial information. The latter can be obtained by evenly dividing the image and computing statistics for each window. A category of global image features, which is based on the responses to texture filters, has been coined as gist [39], since it provides a short description of the essence of the scene. In several computer vision works, the gist feature has proven useful in object detection and in predicting human eye saccades [40,41].
The underlying assumption of these works is that scene context information, in the form of gist, provides a strong prior about the presence and location of objects. It may therefore be used to enhance object detection, or produce saliency maps which allow narrowing the search in image space. Some previous works have exploited gist in localization tasks. In [42] a biologically inspired localization system is proposed, where gist is combined with a salience detector in a Monte Carlo localization algorithm. Gist panoramas are introduced in [43] to describe omni-directional image content. Low computation cost is achieved in that work through gist descriptor quantization, while localization performance is improved by a panorama alignment technique. In [44] gist is used with the epitome concept to enhance the translation and scale invariance properties of place representation. In contrast to those works, we are concerned with exploiting gist to improve the efficiency of the localization system. In other words, gist, extracted at an initial step, provides cues to exclude hypotheses being considered in the time consuming analysis of local features. This proves to be an important computation reduction approach.
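The real gist descriptor pools texture filter responses over a spatial grid [39]; as a rough illustration only (our own simplification, not the filter-based descriptor), the sketch below pools a simple intensity statistic per grid window and compares descriptors by Euclidean distance:

```python
import math

def grid_descriptor(image, grid=4):
    """Toy stand-in for a gist-like global feature: divide the image into a
    grid x grid array of windows and compute one statistic (mean intensity)
    per window.  The actual gist feature pools texture filter responses."""
    h, w = len(image), len(image[0])
    desc = []
    for gy in range(grid):
        for gx in range(grid):
            ys = range(gy * h // grid, (gy + 1) * h // grid)
            xs = range(gx * w // grid, (gx + 1) * w // grid)
            cells = [image[y][x] for y in ys for x in xs]
            desc.append(sum(cells) / len(cells))
    return desc

def gist_distance(g1, g2):
    """Euclidean distance between two global descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))
```

Because such a descriptor is a short fixed-length vector, comparing it against all class models is cheap relative to matching thousands of local descriptors, which is what makes it useful as a first filtering stage.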

3. Global localization with the NQ representation

Global localization can be addressed as a recognition problem, where the current sensor data should be ascribed to a place from a map of the environment. In the case of visual


localization, it takes the form of an image query, which should retrieve a place of similar appearance. The result of the query is based on the probability of the places given the image or, more generally, on an appearance similarity measure. Since we cast the localization problem as a recognition task, places in the environment are addressed as classes. Thus, the environment where localization is performed is modeled as a collection of classes, each one corresponding to a distinct place of the environment. In order to build this model, the path taken at the training stage is segmented and each subsequence of sensor readings is used to model a class. The training path is segmented through a mechanism defined as follows: at each moment the robot is assumed to be at a place j, which is described by a gist descriptor gj and a position pj; during the training tour, the current gist and odometry estimated position are compared to those of place j; if either the gist distance or position distance is greater than a predefined threshold a new place j+1 is initialized; otherwise, gj and pj are updated through a running average of the gist and position vectors of the robot poses associated with place j. This section defines the proposed method for similarity computation with the NQ representation and the strategies for reducing the computational burden resulting from the query.
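The segmentation mechanism described above can be sketched as follows (an illustrative rendering, not code from the paper; the threshold parameters and place representation are hypothetical):

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def segment_path(poses, gist_thresh, pos_thresh):
    """Segment a training tour into places.  Each pose is a (gist, position)
    pair; a new place is opened whenever either the gist distance or the
    position distance to the current place model exceeds its threshold,
    otherwise the place's gist and position are updated by a running
    average over the poses assigned to it."""
    places = []
    for gist, pos in poses:
        if places:
            pj = places[-1]
            if (euclid(gist, pj["gist"]) <= gist_thresh
                    and euclid(pos, pj["pos"]) <= pos_thresh):
                pj["n"] += 1
                n = pj["n"]
                pj["gist"] = [g + (x - g) / n for g, x in zip(pj["gist"], gist)]
                pj["pos"] = [p + (x - p) / n for p, x in zip(pj["pos"], pos)]
                continue
        places.append({"gist": list(gist), "pos": list(pos), "n": 1})
    return places
```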

3.1 Computation of similarity

The proposed method essentially consists of extracting SIFT features from the test image, computing their likelihood through an approximation of kernel density estimation, and then evaluating the similarity between the test image and the classes. The similarity computation resorts to the information content of features, so that relevance is given to the more discriminative ones (see Fig. 1). Each step of the process is detailed below. Firstly, a ROI detector is applied in the identification of stable patterns on the test image. The extracted image patches are codified in the form of SIFT descriptors,

resulting in a set of $n_f$ vectors $d_1, d_2, \ldots, d_{n_f}$.

In order to evaluate appearance similarity, the likelihood of the descriptor may be estimated using a kernel density estimator. Representing each class $c_j$ by a collection of $L$ descriptors, $d_1^j, d_2^j, \ldots, d_L^j$, extracted during the training stage, this estimator yields

$$P(d_i \mid c_j) = \frac{1}{L} \sum_{m=1}^{L} K\!\left(dist(d_i, d_m^j)\right) \qquad (1)$$

where $K(\cdot)$ is the Gaussian kernel function, with standard deviation $\sigma$, and $dist$ is the Euclidean distance (commonly applied for SIFT descriptor comparison). As demonstrated in [17], a good approximation of that distribution is achieved by using exclusively the nearest neighbour to descriptor $d_i$. Accordingly, we use

$$P(d_i \mid c_j) = \max_m K\!\left(dist(d_i, d_m^j)\right) \qquad (2)$$

as an approximation of the kernel density estimator.
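A minimal sketch of the nearest-neighbour approximation in eq. (2), assuming a zero-centred Gaussian kernel and Euclidean descriptor distance as stated above (function names are illustrative):

```python
import math

def gaussian_kernel(dist, sigma):
    """Zero-centred Gaussian kernel evaluated at a given distance."""
    return math.exp(-dist ** 2 / (2 * sigma ** 2))

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def likelihood_nn(d_i, class_descriptors, sigma):
    """Eq. (2): nearest-neighbour approximation of the kernel density
    estimate P(d_i | c_j).  Since the Gaussian kernel decreases with
    distance, taking the maximum of K over the class collection is
    equivalent to evaluating K at the smallest descriptor distance."""
    nearest = min(euclid(d_i, d_m) for d_m in class_descriptors)
    return gaussian_kernel(nearest, sigma)
```

The max/min equivalence noted in the comment is why only one distance (the nearest neighbour's) needs to be found per class, which is what makes this approximation attractive computationally.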


[Figure 1: pipeline diagram. Features are extracted from the test image as descriptors $d_i$; their likelihoods $P(d_i \mid c_1), \ldots, P(d_i \mid c_{n_c})$ are estimated against the collections of features of classes $c_1, \ldots, c_{n_c}$; the information content $H(C \mid d_i)$ is computed from $P(C \mid d_i)$; and the per-class similarity $S_j$ is accumulated over all $n_f$ features.]

Fig. 1. Overview of the proposed method.

The amount of information conveyed by descriptor $d_i$ may be measured by the mutual information between the feature and the classes as

$$I(d_i, C) = H(C) - H(C \mid d_i) \qquad (3)$$

As the entropy $H(C)$ is fixed, features conveying more information are those producing lower values of the entropy $H(C \mid d_i)$, given by

$$H(C \mid d_i) = -\sum_j P(c_j \mid d_i) \log P(c_j \mid d_i) \qquad (4)$$

where $P(C \mid d_i)$ is computed by applying Bayes' rule:

$$P(c_j \mid d_i) = \frac{P(d_i \mid c_j)\, P(c_j)}{P(d_i)} \qquad (5)$$

The entropy of this distribution, H(C|di), is interpreted as a measure of the information content of feature di, with a lower entropy implying more information. In eq. (5), a uniform prior for P(C) is used, as no previous knowledge is assumed. In Fig. 2, examples of low and high information content features are represented on a test image. The similarity Sj, between the test image and class cj, was defined as a function that accumulates evidence from the extracted features, but also balances their information content. Accordingly, in our experiments, the best results were obtained when similarity was computed as

$$S_j = \sum_{i=1}^{n_f} P(d_i \mid c_j)\, H(C \mid d_i)^{-\alpha} \qquad (6)$$


Fig. 2. Examples of low and high information content features (best viewed in color). Above, test image depicting low (blue) and high (red) information content features. The bar charts represent probability of features in the 4 closest classes (with respect to gist distance). Below, instances of the training images of the 4 classes considered.

where parameter $\alpha$ is set to adjust the significance of the entropy weight (see section 5 for the values used).
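Equations (4)-(6) can be sketched as follows. This is an illustrative rendering under two assumptions stated up front: a uniform prior $P(C)$ as in eq. (5), and an entropy weight of the form $H(C \mid d_i)^{-\alpha}$, which is our reading of eq. (6) (lower entropy, i.e. a more informative feature, yields a larger weight, consistent with the surrounding text):

```python
import math

def posterior_over_classes(likelihoods):
    """Eq. (5) with a uniform prior P(C): the posterior P(c_j | d_i) is
    simply the normalized likelihood vector."""
    total = sum(likelihoods)
    return [p / total for p in likelihoods]

def entropy(posterior):
    """Eq. (4): H(C | d_i) = -sum_j P(c_j|d_i) log P(c_j|d_i)."""
    return -sum(p * math.log(p) for p in posterior if p > 0)

def similarity(per_feature_likelihoods, alpha):
    """Eq. (6) as assumed here: accumulate P(d_i | c_j) weighted by
    H(C | d_i)**(-alpha), so low-entropy (informative) features
    contribute more to the similarity of every class."""
    n_classes = len(per_feature_likelihoods[0])
    S = [0.0] * n_classes
    for likes in per_feature_likelihoods:
        h = entropy(posterior_over_classes(likes))
        # Guard against zero entropy (arbitrary fallback for this sketch).
        w = h ** (-alpha) if h > 0 else 1.0
        for j, p in enumerate(likes):
            S[j] += p * w
    return S
```

For instance, a feature whose likelihood is peaked on one class receives a larger weight than a feature whose likelihood is spread uniformly over the classes.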

3.2 Computation reduction methods

In typical images, the number of local features may exceed a hundred, implying that the model of a place, trained with a few dozen images, will be represented by millions of features. The magnitude of this number raises the problem of the computation time needed for probability estimation (eq. 2), which requires the pairwise comparison of features. Defining the involved quantities as

$n_f$: number of features in the test image
$n_c$: number of classes
$n_{fc}$: number of features in a class model

the computational complexity of an image query is $O(n_f\, n_c\, n_{fc})$. In typical environments, this expression gives rise to computation times much higher than those of the BoW model, for which the computational complexity is $O(n_f\, n_w)$, where $n_w$ denotes the number of words in the vocabulary. Below, two categories of methods are proposed to speed up the NQ method, by respectively lowering the values of $n_c$ and $n_{fc}$, rather than changing the computational complexity of the algorithm. The first category of methods can be regarded as a hierarchical strategy to exclude classes being considered at different stages of the algorithm. Two methods are outlined next. Gist Selection (GS). In order to achieve a reduction in $n_c$, some classes may be excluded based on an initial test of appearance similarity. Here, this is attained by resorting to the gist feature, which can be extracted at a low computational cost and provides a first


impression of image content. For this purpose, each class model includes a gist descriptor, $g_j$, which is compared with the one extracted from the test image, $g$. The likelihood of $g$ is computed by evaluating the zero-centered Gaussian function (standard deviation $\sigma_g$) at the distance resulting from the comparison. The selection of classes $c_j$ for further processing is based on the condition

$$P(g \mid c_j) \geq \min\left\{\tfrac{1}{2}\big(P(g \mid c_1) + P(g \mid c_2)\big),\ \varepsilon\right\} \qquad (7)$$

where classes $c_1$ and $c_2$ are the ones with the highest values of $P(g \mid c_i)$. The constant $\varepsilon$ is chosen as half the peak value of the Gaussian function used to estimate the likelihood. Progressive Selection (PS). In this method, classes are excluded whilst local features are included in a stepwise computation of similarity. For this purpose, descriptors are randomly ordered and grouped in sets of five descriptors. Similarity computation is performed gradually, with each step being defined as the inclusion of an additional set of features, followed by the updating of class probability. At each step, classes with a probability falling below a varying threshold are excluded from further processing. Let $D$ denote the set of descriptors processed at step $k$ and before, and $n_c^k$ the number of classes remaining at step $k$. Then, defining $P(C \mid D)_k$ as the corresponding probability distribution over classes, those classes $c_j$ for which the following condition holds are excluded:

$$P(c_j \mid D)_k < \frac{0.2}{n_c^k} \qquad (8)$$

Here, the probability distribution over classes is taken as the similarity function from eq. (6), after normalization. In the second category of methods, we achieve a reduction of $n_{fc}$ by exploiting the specificities of the model construction process, which differs from general visual classification tasks. Given that the features in a class are extracted from a video sequence, two assumptions are made: i) similar descriptors extracted from the sequence are likely to refer to the same visual feature and do not need to be replicated in the class model, and ii) features that do not find a match in the sequence refer to image locations that are not robustly identified by the ROI detector. Assuming the probability of detecting these regions on the test image to be low, the corresponding descriptor may be eliminated from the class model, with low expected accuracy loss in the classifier. Following these assumptions, two methods are outlined next for the reduction of $n_{fc}$. Feature Merging (FM). Each feature extracted from a new image is compared with those already included in the class model and, should a match be found, it is merged with its match through averaging. Features for which a match is not found are simply added to the model. Feature matching is determined by comparing the descriptor distance with a selected threshold (in this case, the standard deviation $\sigma$ used in eq. 2).


Feature Deletion (FD). The second method leads to the elimination of features which did not find a match after the application of the FM method. In our experiments, this rule is applied to classes with a number of training images exceeding 8, since classes trained with few images do not allow asserting, with confidence, how robustly a region of interest can be detected. It should be noted that the assumptions underlying these simplifications are specific to this problem, and when extended to general visual classification tasks, the same results cannot be expected. For example, in object classification tasks, the training data includes images from different instances of an object class. In this case, to ensure generality, training images of the same class may differ strongly. Consequently, they may not produce the same feature matching rate that is found in a video sequence of a single place. Additionally, the rarity of a feature cannot be associated with robustness with respect to detection.
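The two class-exclusion rules of the hierarchical strategy can be sketched as follows. Both thresholds are our reading of the reconstructed conditions (eqs. 7 and 8) and should be treated as assumptions: the gist rule keeps a class whose likelihood reaches the smaller of a relative threshold (the mean of the two best likelihoods) and an absolute floor, while the progressive rule drops classes falling below 20% of the uniform probability:

```python
def gist_select(gist_likelihoods, epsilon):
    """Gist Selection, eq. (7) as assumed here: keep class j if
    P(g | c_j) is at least min(mean of the two best likelihoods, epsilon).
    `gist_likelihoods` maps class id -> P(g | c_j)."""
    ranked = sorted(gist_likelihoods.values(), reverse=True)
    thresh = min(0.5 * (ranked[0] + ranked[1]), epsilon)
    return {j for j, p in gist_likelihoods.items() if p >= thresh}

def progressive_step(class_probs):
    """Progressive Selection, eq. (8) as assumed here: after each batch of
    descriptors, drop classes whose normalized probability falls below
    20% of the uniform value 1 / n_c^k."""
    n_ck = len(class_probs)
    floor = 0.2 / n_ck
    return {j: p for j, p in class_probs.items() if p >= floor}
```

In a full query, `gist_select` would run once up front and `progressive_step` once per batch of five descriptors, with the surviving class set shrinking between batches.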

4. Discriminativity and Information Content

This section analyses the quantized and non-quantized representations with respect to their discriminativity and information content properties. In section 5, the two approaches will be evaluated in the context of place recognition. The chosen measure of the discriminativity of a feature $i$ is similar to the one used in [17], and is defined as

$$disc_i = \frac{P(d_i \mid c_t)}{\max_{z,\, z \neq t} P(d_i \mid c_z)} \qquad (9)$$

where $P(d_i \mid c_t)$ is the likelihood of feature $d_i$ in the target class $c_t$, and the denominator is the maximum of its likelihood in all other classes. In the case of the non-quantized representation, the likelihood is estimated as described in section 3.1 (eq. 2). For the estimation of the quantized features' likelihood, a vocabulary of visual words was constructed by applying k-means clustering on the descriptors in the training data. In this case, visual words are used as features in eq. (9), and the probability of each word $w_i$ in class $c_j$ is estimated through Laplace smoothing, as suggested in [45]:

Iv

iji NN

NcwP

1| (10)

Here, Ni is the number of occurrences of word i in cj, Nv is the size of the vocabulary and NI is the number of features extracted from cj. In order to compare the two representations, the probability distribution of discriminativity of each one was estimated by a histogram (Fig. 3). In Fig. 3, a discriminativity greater than 1 corresponds to features contributing to a correct classification, whereas values less than 1 arise from misleading features. As a measure of the expected performance of each representation, the mean values of the distributions were computed. These values, represented in the figure, suggest the superiority of the NQ representation.

Fig. 3. Frequency of features vs. discriminativity (in logarithmic scale) for the Q and NQ representations. Dashed lines indicate the mean values of the distributions.

Fig. 4. Scatter plot of discriminativity (logarithmic scale) vs. entropy for a) the NQ representation and b) the Q representation. The gray lines depict the mean value of discriminativity.

Also, the distribution profiles show that, in the Q representation, the range of discriminativity is narrower than that obtained when no quantization is applied, confirming that the Q representation does not benefit from the differentiating power of highly discriminative features. In the NQ case, the distribution is more spread, towards both infinity (i) and 0 (ii), revealing that although (i) highly discriminative features can be represented, (ii) misleading features can emerge too. In order to assess whether information content can be used to filter out misleading features, in Fig. 4 we plot the discriminativity of features versus their entropy. In the NQ representation (Fig. 4.a) there is a general tendency for features with lower entropy to be more discriminative, yielding a correlation of r = −0.49 (p < 0.0001), while in the Q representation (Fig. 4.b) there is no such trend (r = 0.005, p > 0.05). These results suggest that, with the NQ representation, classification will improve by filtering out the contribution of features that produce higher entropy; however, there is no evidence that such a strategy would benefit the Q representation approach.
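The quantized side of this analysis, eqs. (9) and (10), can be sketched as below. This is an illustrative toy computation, not the authors' code: the per-class word counts are made up, and the function names are assumptions.

```python
import numpy as np

def word_likelihoods(counts, vocab_size):
    """Eq. (10): Laplace-smoothed likelihood of each visual word per class.
    counts: (n_classes, vocab_size) array of occurrences Ni of each word."""
    n_features = counts.sum(axis=1, keepdims=True)      # NI per class
    return (counts + 1.0) / (n_features + vocab_size)   # (Ni + 1) / (NI + Nv)

def discriminativity(P, target):
    """Eq. (9): P(wi|ct) divided by the maximum likelihood over other classes."""
    others = np.delete(P, target, axis=0)
    return P[target] / others.max(axis=0)

# Toy data: 3 classes, 3-word vocabulary, each class dominated by one word.
counts = np.array([[8, 1, 1],
                   [1, 8, 1],
                   [1, 1, 8]], dtype=float)
P = word_likelihoods(counts, vocab_size=3)
disc0 = discriminativity(P, target=0)   # word 0 should be discriminative for class 0
```

Words with disc > 1 support the target class; words with disc < 1 are misleading, matching the histogram interpretation above.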


5. Results

In order to test the methods under analysis, localization experiments were performed on the IDOL dataset [46] and the Freiburg section of the COLD dataset [47]. The two datasets share traits which make them suitable for testing vision-based global localization algorithms: each comprises several video sequences collected in the same environment, along approximately the same path, but at different times and under different illumination conditions. The authors of the datasets empirically identified three lighting conditions, labeled as cloudy, sunny and night. They also included position ground truth to facilitate the evaluation of localization algorithms.

For the purpose of our experiments, all images are processed for extraction of the visual features that feed the classification algorithms. Local features are identified through the scale-space extrema detector and encoded in SIFT descriptors. Following the practice tested in the robot localization work of Valgren and Lilienthal [26], features are extracted at a constant angle, thus eliminating rotation invariance. Additionally, in the construction of the gist descriptor, local binary patterns (LBP) [48] are computed for all pixels of the image, using 8 sample points on a radius of 3 pixels. In order to increase the discriminativity of the gist feature, coarse spatial information is extracted by vertically dividing each image into two sub-images and computing a histogram of LBP occurrences for each. The concatenation of these vectors yields a gist descriptor of dimension 118 [49]. These descriptors are compared through the Chi-Square distance. Three different classifiers are considered in the localization experiments:

i) Q - a Naive Bayes classifier based on the quantized representation [45], with the likelihood given by eq. (10),

ii) NQ - the proposed method (non-quantized representation), defined by eq. (6), with the entropy weighting removed,

iii) NQ+Hweight - the proposed method, with the entropy weighting included.

The parameters used in these methods were empirically set for best performance, taking

the values =100, g=2000, =2 and nw=5500.

In the evaluation of a localization method, the place with the highest score is returned by the classifier, and acceptance of this place is then conditioned on a two-step procedure. First, the set of robot poses that were selected to train the retrieved class is searched for the pose closest to the current one (poses are obtained from the ground-truth values in the dataset). The class is then accepted as correct if the metric and angular distances between the two poses are within the thresholds used to segment the training path, in our experiments 3 meters and 55º, respectively. These parameters define the granularity of the map and were adjusted in order to obtain a concise model of the environment. For each test sequence, the recognition rate is computed as the ratio of correctly classified images to the total number of images in the sequence.
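The LBP-based gist construction and Chi-Square comparison described at the start of this section can be sketched as below. This is a simplified illustration, not the authors' implementation: it samples the 8 neighbours at nearest pixels instead of interpolating on the radius-3 circle, and it histograms raw 256-valued codes (giving a 512-dimensional descriptor) rather than the 59-bin uniform-pattern coding that yields the paper's 118 dimensions.

```python
import numpy as np

def lbp_codes(img, r=3):
    """Simplified LBP: 8 neighbours at radius r, nearest-pixel sampling."""
    c = img[r:-r, r:-r]                                   # center pixels
    offs = [(-r, 0), (-r, r), (0, r), (r, r), (r, 0), (r, -r), (0, -r), (-r, -r)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        nb = img[r + dy: img.shape[0] - r + dy, r + dx: img.shape[1] - r + dx]
        code |= ((nb >= c).astype(np.uint8) << bit)       # one bit per neighbour
    return code

def gist(img):
    """Concatenated LBP histograms of the two vertical halves, normalized."""
    codes = lbp_codes(img)
    half = codes.shape[1] // 2
    h = [np.bincount(part.ravel(), minlength=256).astype(float)
         for part in (codes[:, :half], codes[:, half:])]
    d = np.concatenate(h)
    return d / d.sum()

def chi_square(a, b, eps=1e-10):
    """Chi-Square distance between two normalized histograms."""
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

rng = np.random.default_rng(0)
img1 = rng.integers(0, 256, (40, 60)).astype(float)
img2 = rng.integers(0, 256, (40, 60)).astype(float)
g1, g2 = gist(img1), gist(img2)
```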
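The two-step acceptance test above can be sketched as follows, assuming poses of the form (x, y, heading in degrees); the function names and pose layout are assumptions for the example, not the authors' code.

```python
import numpy as np

def angular_diff(a, b):
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def accept(current_pose, class_poses, d_max=3.0, a_max=55.0):
    """Accept the retrieved class if the nearest training pose of that class
    is within the metric (3 m) and angular (55 deg) segmentation thresholds."""
    x, y, th = current_pose
    dists = [np.hypot(px - x, py - y) for px, py, _ in class_poses]
    px, py, pth = class_poses[int(np.argmin(dists))]      # closest training pose
    return bool(np.hypot(px - x, py - y) <= d_max and angular_diff(th, pth) <= a_max)

train = [(0.0, 0.0, 10.0), (5.0, 0.0, 200.0)]             # poses of the retrieved class
ok = accept((1.0, 1.0, 40.0), train)    # ~1.41 m, 30 deg: within both thresholds
bad = accept((1.0, 1.0, 120.0), train)  # 110 deg heading difference: rejected
```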



Fig. 5. Recognition rates under different illumination conditions for the a) IDOL and b) COLD-Freiburg datasets.

In the next subsection, the localization methods are compared with regard to their recognition performance, and in subsection 5.2 we assess the effect of the computation reduction methods.

5.1 Recognition performance of the two representations

As the datasets provide data for three illumination conditions, nine different settings can be used to test the localization methods, by changing the training and test data conditions. For each pair of training/test illumination conditions, recognition performance was computed on all the combinations of the video sequences corresponding to that illumination pair. Since 4 videos are available in the IDOL dataset for each condition, this gives 16 test results for combinations of different illumination conditions and 12 test results for equal illumination. The final results, presented in Fig. 5.a, are computed by averaging over the individual results obtained under the same illumination pair.

A global analysis of Fig. 5.a indicates that illumination changes produce qualitatively similar performance variations across methods. As Pronobis et al. [50] pointed out, the cloudy condition seems to be intermediary between sunny and night, as training on it yields correspondingly higher performances when testing under varying illumination. Clearly, the strongest illumination variations occur between night and sunny conditions, followed by changes between cloudy and night conditions. A prominent characteristic of these results is the consistent relation amongst the performances of the three methods: the NQ method outperforms the Q method in all cases, and the NQ+Hweight method shows recognition rates higher than or equal to those of the NQ method. These differences are more pronounced in the presence of strong illumination variations, where the NQ method reaches a maximum increase in performance of 15% over the Q method, and the NQ+Hweight method a maximum increase of 5% over the NQ method.

Fig. 5.b shows place recognition results obtained from the COLD dataset. In this case, the number of sequences available for each condition varies from 3 (for cloudy and night illumination) to 4 (for sunny illumination), implying that the number of combinations ranges from 6 to 12. The performances here are generally higher than those found on the IDOL dataset, probably because a large portion of the environment (the corridor part) is only slightly affected by illumination variations. For most illumination pairs, the relative performances of the methods follow the same pattern found in the IDOL dataset, with the NQ method outperforming the Q method and the NQ+Hweight method being superior to the NQ method. However, the differences in performance are modest, which can be explained by the fact that, in this dataset, when changes in illumination are present they tend to severely affect the image content, due to the auto-exposure mode of the camera.

In order to gain further insight into the similarity distribution produced by each method, we analyze it under a precision-recall formulation. For this purpose, the similarity distributions over classes given by the NQ and NQ+Hweight methods are normalized and treated as probability distributions. The Naive Bayes classifier used in the Q method already produces a probability distribution, so no additional processing of its results was needed. With a probability distribution over classes at hand, the retrieval of classes depends on a probability threshold, varying from 0 to 1. Each plot in Fig. 6 depicts the precision-recall curves for the three methods obtained from tests with the IDOL dataset, using cloudy, sunny and night test data, respectively, and cloudy training data, chosen for being intermediary between the other two. The figures show that, throughout the recall range, the precision of the non-quantized approaches is higher than that of the quantized approach. Furthermore, the precision of the NQ+Hweight method is the highest in almost all cases. Also noteworthy is the fact that the NQ+Hweight method shows significantly larger recall values at a precision of 1, in the cases of sunny and night test data.
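The thresholded-retrieval evaluation above can be sketched as below. This is an illustrative toy, not the authors' evaluation code; the data are made up, and recall is simplified here to the fraction of all test images that are both retrieved and correctly classified.

```python
import numpy as np

def precision_recall(probs, labels, thresholds):
    """Precision-recall points for a top-class retrieval rule.
    probs: (n_images, n_classes) probability distribution per test image.
    An image is retrieved when its top-class probability exceeds the threshold."""
    top = probs.argmax(axis=1)          # retrieved class per image
    conf = probs.max(axis=1)            # its probability
    correct = (top == labels)
    out = []
    for t in thresholds:
        retrieved = conf >= t
        precision = correct[retrieved].mean() if retrieved.sum() else 1.0
        recall = (correct & retrieved).sum() / len(labels)
        out.append((precision, recall))
    return out

# Toy run: 4 images, 2 places; the second image is misclassified.
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]])
labels = np.array([0, 1, 1, 0])
curve = precision_recall(probs, labels, thresholds=[0.0, 0.7])
```

Raising the threshold trades recall for precision, which is what the curves in Fig. 6 trace out.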

5.2 Computation reduction methods

In this subsection we analyze the influence of the computation reduction methods on the performance and computation times of the non-quantized representation approach, through tests performed on the IDOL dataset. All algorithms were programmed in Matlab and executed on a Core 2 Duo CPU running at 2.2 GHz.

Fig. 6. Precision-recall curves obtained for all test conditions and constant (cloudy) training data with the IDOL dataset: a) cloudy, b) sunny and c) night test data.

Fig. 7. Recognition rate and computation time obtained with the application of the computation reduction methods on the IDOL dataset (NQ+Hweight method): a) stable illumination (cloudy-cloudy); b) variable illumination (sunny-night).

Fig. 7 presents the recognition rate and the computation time of the NQ+Hweight method for combinations of the computation reduction methods. Fig. 7.a depicts results obtained under stable illumination conditions (cloudy), while results in Fig. 7.b were taken under large illumination variations (sunny and night data used both as training and test data). The figures present mean values obtained over the possible combinations of sequences under the mentioned illumination conditions. The recognition rate under stable illumination is only slightly affected by the application of the computation reduction methods: when all methods were applied, a decrease of 0.4% in performance was observed. In the case of strong illumination variations, a decrease of 2% in performance, caused by the application of the FD method, is observed; the application of the subsequent methods results in slight gains in performance. With respect to the measured computation times, the first three methods behave similarly in both cases: the reduction in computation time is about 55%, 45% and 40% with the FM, FD and GS methods, respectively. In contrast, results from the PS method seem to be more dependent on the dissimilarity between test and training data: a 58% reduction is achieved under stable illumination, while a 28% reduction was measured under varying illumination. This may be explained by the fact that, when the test and the training data are similar, some features in the models closely resemble those of the test image. As these features are processed, the probability distribution over classes rapidly becomes unbalanced, favouring the correct classes. This allows discarding unlikely classes earlier than under varying illumination. In the worst case, with all the methods applied, we obtain a reduction in computation time of one order of magnitude.

Table 1 synthesizes the results obtained for the Q method and for the NQ+Hweight method with all the proposed speed-up strategies applied. In addition, techniques such as KD-trees and ANN (Approximate Nearest Neighbor) search could be applied to both the Q and NQ representations, reducing the computation time of the corresponding localization methods; however, since our purpose is to analyze the relative run time of the Q and NQ approaches, only the proposed speed-up methods were applied. The methods are evaluated under stable (cloudy training and test data, first row) and variable illumination (sunny and night, used both as training and test data, second row). The results show that, when the computation reduction strategies are applied, the recognition rate of the NQ+Hweight method still compares favourably against the quantized approach. With respect to the computation times, the NQ+Hweight method yields a significantly lower value under stable illumination and a slightly higher value in the case of variable illumination, explained by the behaviour of the PS method under this circumstance.

Table 1
Recognition rate and computation time obtained with the two representations, under stable and varying illumination.

Illumination     Q                                            NQ+Hweight
                 Recognition rate [%]  Computation time [s]   Recognition rate [%]  Computation time [s]
Cloudy-cloudy    94.89                 0.034                  96.28                 0.019
Sunny-night      48.36                 0.034                  64.09                 0.036
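The progressive selection (PS) behaviour discussed above, where the class distribution becomes unbalanced as features are processed and unlikely classes are discarded early, can be sketched as follows. This is an illustrative toy, not the authors' Matlab code: the elimination threshold, the log-likelihood interface and the deterministic input are all assumptions.

```python
import numpy as np

def progressive_selection(log_likelihoods, threshold=0.01):
    """PS sketch: process per-feature class log-likelihoods one feature at a
    time; after each feature, drop classes whose normalized probability falls
    below `threshold`, so later (expensive) evaluations skip them.
    log_likelihoods: (n_features, n_classes) array."""
    n_classes = log_likelihoods.shape[1]
    alive = np.ones(n_classes, dtype=bool)
    scores = np.zeros(n_classes)
    for ll in log_likelihoods:
        scores[alive] += ll[alive]                 # only surviving classes updated
        p = np.exp(scores - scores[alive].max())   # unnormalized posteriors
        p[~alive] = 0.0
        p /= p.sum()
        alive &= (p >= threshold)                  # discard unlikely classes
    return alive, scores

# Deterministic toy: 4 classes, class 0 is consistently one nat more likely,
# so the other classes are progressively eliminated.
ll = np.zeros((10, 4))
ll[:, 0] = 1.0
alive, scores = progressive_selection(ll)
```

When test features match the model well, the distribution unbalances quickly and most classes are dropped after a few features, which is consistent with the larger time reduction observed under stable illumination.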

6. Discussion

In recent years, the use of local features has become a popular approach to describing image appearance in localization studies. Similarly to our approach, some authors devised systems where the number of model features is allowed to grow with the environment size and complexity [2,9,37,38]. However, in spite of the fact that the fixed vocabulary approach has been discarded in those studies, to our knowledge an assessment of the benefits of such an approach had not been performed before. The evaluation carried out in this work substantiates the higher robustness of the non-quantized approach, which may significantly outperform the quantized representation under variable illumination. Furthermore, the analysis of the entropy and discriminativity properties has shown a significant relation between these variables for the non-quantized descriptors, which was not observed in the quantized representation. This finding is supported by the localization results of the NQ+Hweight method, where the entropy-discriminativity relation is used to predict which features are more discriminative.


The analysis of the precision-recall curves suggests further benefits of the non-quantized representation. When comparing the Q and NQ methods, the latter achieves a higher value of recall at a precision of 1, a relation that is even stronger when the NQ+Hweight method is considered. In the context of Simultaneous Localization and Mapping (SLAM), false positives are highly undesired, because the assignment of a wrong loop closure may have ruinous consequences for the topology of the map. Thus, as high recall at low false positive rates is a determining factor in the success of loop closure detectors, we believe this approach to be promising for improving systems that tackle this problem.

Previously, the drawbacks of using the non-quantized representation have been the size of the model of the environment and the computational burden of the localization algorithm. Hierarchical schemes of image analysis can be employed to reduce that burden. In a first step, these schemes usually apply a fast appearance matching test, which allows narrowing the range of candidate locations evaluated in the more expensive similarity computations (see, for instance, [51,52]). Results in this work also confirm the effectiveness of that approach, in this case by resorting to a global image descriptor based on LBP features, which is very simple to compute. In addition, we introduced a progressive selection method, which gradually discards candidate places as the localization algorithm proceeds. Results obtained with this method suggest that it offers a good balance between computation reduction and recognition rate, as it seems to adapt the complexity of the computation to the difficulty of the problem: a significant reduction in computation time was observed under varying illumination, whilst an even larger reduction was achieved under stable illumination.

Computation times of localization algorithms can be further reduced through a simplified model of the environment.
Results in subsection 5.2 have shown that such a simplification can be used without a significant loss in performance: an overall reduction in computation time of about 75% was achieved, with only a 2% decrease in recognition rate being observed under strong illumination variation. The simplifications applied are made possible by the use of the concept of place in the model of the environment, which allows treating features from consecutive images as describing the same entity.

The approach taken in this paper compares the two extreme representations in what can be understood as a spectrum going from pure quantization to non-quantization. In recent years, studies in large-scale image retrieval have shown that it is possible to reduce the performance gap between the two representations through modifications of the BoW model, which carry either higher memory requirements [18] or additional work at run time [19]. Thus, while the approach taken in our work reveals the maximum benefits of the non-quantized representation, alternative solutions exist which should offer performance and computational requirements intermediate between those of the Q and NQ methods. In particular, the problem of localizing in large-scale environments may force one to relax the non-quantization property. In this case, methods such as soft-assignment [18] and Hamming embedding [19] can be advantageously used to achieve an adequate space-speed-accuracy trade-off.


7. Conclusions

In this paper we investigated the performance gap between the quantized and non-quantized representations of features in the global localization problem. It was experimentally demonstrated that the non-quantized representation provides higher robustness, displaying a considerably better performance under strong illumination variations. A method to further improve the non-quantized representation approach was introduced. It builds on the finding that, in this representation, a significant relation exists between the entropy of features and their discriminativity. Exploiting this relation, an image similarity measure was proposed which assigns relevance to features according to their predicted discriminativity. Experimental results obtained with this method support its superiority in the global localization task and suggest that performance gains can also be achieved in the loop closure problem.

Two strategies for reducing the computational requirements of the non-quantized representation approach were also introduced. The first is a hierarchical strategy that excludes classes at different stages of the algorithm. Initially, appearances are compared with a fast-computing global image feature, resulting in the elimination of a significant number of candidate places. Subsequently, the set of local features is processed in a step-by-step fashion, during which places are eliminated according to their current probability. The second strategy aims at a simplification of the models of places, by reducing the number of features in the models. This reduction makes two assumptions about the training data, namely that, given a sufficient number of images of a place, the discarded features are either redundant or inconsistently detected. The simultaneous application of the speed-up methods resulted in computation times lower than or comparable to those of the quantized method in a small-scale environment, with low performance degradation.
Building on our current results, future work will extend our implementation of the NQ representation to the problem of loop closure detection. Furthermore, we plan to integrate appearance information from 3D data in our localization system. With this approach, we aim at exploiting the complementary nature of 3D data and vision, in order to increase localization robustness and alleviate the computation requirements in handling visual data.

References

[1] K.S. Chong, L. Kleeman, Accurate odometry and error modelling for a mobile robot, in: IEEE International Conference on Robotics and Automation, 1997, pp. 2783-2788.

[2] F. Li, X. Yang, J. Kosecka, Global Localization and Relative Positioning Based on Scale-Invariant Keypoints, Robotics and Autonomous Systems. 52 (2005) 27-38.


[3] S. Koenig, R.G. Simmons, Xavier: A Robot Navigation Architecture Based on Partially Observable Markov Decision Process Models, in: D. Kortenkamp, R. Bonasso, R. Murphy (Eds.), Artificial Intelligence Based Mobile Robotics: Case Studies of Successful Robot Systems, MIT Press, 1998, pp. 91-122.

[4] D. Fox, W. Burgard, S. Thrun, Markov Localization for Mobile Robots in Dynamic Environments, Journal of Artificial Intelligence Research. 11 (1999) 391-427.

[5] S. Thrun, D. Fox, W. Burgard, F. Dellaert, Robust Monte Carlo Localization for Mobile Robots, Artificial Intelligence. 128 (2001) 99-141.

[6] A. Milstein, J.N. Sánchez, E.T. Williamson, Robust global localization using clustered particle filtering, in: Proceedings of the National Conference on Artificial Intelligence, 2002, pp. 581-586.

[7] C. Guanghui, N. Matsuhira, J. Hirokawa, H. Ogawa, I. Hagiwara, Mobile robot global localization using particle filters, in: International Conference on Control, Automation and Systems, 2008, pp. 710-713.

[8] M. Cummins, P. Newman, FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance, The International Journal of Robotics Research. 27 (2008) 647-665.

[9] T. Goedemé, M. Nuttin, T. Tuytelaars, L. Van Gool, Omnidirectional Vision Based Topological Navigation, International Journal of Computer Vision. 74 (2007) 219-236.

[10] H. Tamimi, A. Zell, Global visual localization of mobile robots using kernel principal component analysis, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007, pp. 1896-1901.

[11] A. Ramisa, A. Tapus, D. Aldavert, R. Toledo, Robust vision-based robot localization using combinations of local feature region detectors, Autonomous Robots. 27 (2009) 373-385.

[12] G. Steinbauer, H. Bischof, Illumination insensitive robot self-localization using panoramic eigenspaces, in: RoboCup 2004, Lecture Notes in Computer Science, 2005, pp. 84-96.

[13] P. Lamon, A. Tapus, E. Glauser, N. Tomatis, R. Siegwart, Environmental modeling with fingerprint sequences for topological global localization, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2003, pp. 3781-3786.

[14] D.G. Lowe, Object Recognition from Local Scale-Invariant Features, in: IEEE International Conference on Computer Vision, 1999, pp. 1150-1157.


[15] J. Sivic, A. Zisserman, Video Google: a text retrieval approach to object matching in videos, in: IEEE International Conference on Computer Vision, 2003, pp. 1470-1477.

[16] F. Jurie, B. Triggs, Creating efficient codebooks for visual recognition, in: IEEE International Conference on Computer Vision, 2005, pp. 604-610.

[17] O. Boiman, E. Shechtman, M. Irani, In defense of Nearest-Neighbor based image classification, in: IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1-8.

[18] J. Philbin, O. Chum, M. Isard, J. Sivic, A. Zisserman, Lost in quantization: Improving particular object retrieval in large scale image databases, in: IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1-8.

[19] H. Jegou, M. Douze, C. Schmid, Hamming embedding and weak geometric consistency for large scale image search, in: European Conference on Computer Vision, 2008, pp. 304-317.

[20] D. Nister, H. Stewenius, Scalable recognition with a vocabulary tree, in: IEEE Conference Computer Vision and Pattern Recognition, 2006, pp. 2161-2168.

[21] F.M. Campos, L. Correia, J.M.F. Calado, Mobile robot global localization with non-quantized SIFT features, in: The 15th International Conference on Advanced Robotics, 2011, pp. 582-587.

[22] J. Kosecká, F. Li, Vision based topological Markov localization, in: IEEE International Conference on Robotics and Automation, 2004, pp. 1481-1486.

[23] F. Fraundorfer, C. Engels, D. Nister, Topological mapping, localization and navigation using image collections, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007, pp. 3872-3877.

[24] A. Angeli, D. Filliat, S. Doncieux, J. Meyer, A Fast and Incremental Method for Loop-Closure Detection Using Bags of Visual Words, IEEE Transactions on Robotics, Special Issue on Visual Slam. 24 (2008) 1027-1037.

[25] J. Wang, R. Cipolla, H. Zha, Vision-based Global Localization Using a Visual Vocabulary, in: IEEE International Conference on Robotics and Automation, 2005, pp. 4230-4235.

[26] C. Valgren, A.J. Lilienthal, SIFT, SURF & seasons: Appearance-based long-term localization in outdoor environments, Robotics and Autonomous Systems. 58 (2010) 149-156.

[27] A.C. Murillo, J.J. Guerrero, C. Sagues, SURF features for efficient robot localization with omnidirectional images, in: IEEE International Conference on Robotics and Automation, 2007, pp. 3901-3907.


[28] L. Kunze, K. Lingemann, A. Nüchter, J. Hertzberg, Salient Visual Features to Help Close the Loop in 6D SLAM, in: Proceedings of the ICVS Workshop on Computational Attention & Applications, 2007.

[29] H. Kang, A.A. Efros, M. Hebert, T. Kanade, Image matching in large scale indoor environment, in: IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2009, pp. 33-40.

[30] V. Pradeep, G. Medioni, J. Weiland, Visual loop closing using multi-resolution SIFT grids in metric-topological SLAM, in: IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1438-1445.

[31] B. Schiele, J.L. Crowley, Transinformation of object recognition and its application to viewpoint planning, Robotics and Autonomous Systems. 21 (1997) 95-106.

[32] B. Krose, N. Vlassis, R. Bunschoten, Y. Motomura, A probabilistic model for appearance-based robot localization, Image and Vision Computing. 19 (2001) 381-391.

[33] G. Fritz, C. Seifert, L. Paletta, H. Bischof, Attentive object detection using an information theoretic saliency measure, Attention and Performance in Computational Vision. 3368 (2005) 29-41.

[34] E. Parzen, On estimation of a probability density function and mode, The Annals of Mathematical Statistics. 33 (1962) 1065-1076.

[35] G. Fritz, C. Seifert, L. Paletta, H. Bischof, Learning Informative SIFT Descriptors for Attentive Object Recognition, in: 1st Austrian Cognitive Vision Workshop, 2005, pp. 67-74.

[36] E. Fazl-Ersi, J. Elder, J. Tsotsos, Hierarchical appearance-based classifiers for qualitative spatial localization, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009, pp. 3987-3992.

[37] D. Filliat, A visual bag of words method for interactive qualitative localization and mapping, in: IEEE International Conference on Robotics and Automation, 2007, pp. 3921-3926.

[38] C. Valgren, A. Lilienthal, T. Duckett, Incremental Topological Mapping Using Omnidirectional Vision, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006, pp. 3441-3447.

[39] A. Oliva, A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope, International Journal of Computer Vision. 42 (2001) 145-175.

[40] A. Torralba, Contextual Priming for Object Detection, International Journal of Computer Vision. 53 (2003) 169-191.


[41] A. Torralba, A. Oliva, M. Castelhano, J.M. Henderson, Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search, Psychological Review. 113 (2006) 766-786.

[42] C. Siagian, L. Itti, Biologically Inspired Mobile Robot Vision Localization, IEEE Transactions on Robotics. 25 (2009) 861-873.

[43] A.C. Murillo, J. Kosecka, Experiments in place recognition using gist panoramas, in: IEEE Workshop on Omnidirectional Vision, Camera Networks and Non-Classical Cameras, ICCV, 2009, pp. 2196-2203.

[44] K. Ni, A. Kannan, A. Criminisi, J. Winn, Epitomic location recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (2009) 2158-2167.

[45] G. Csurka, C.R. Dance, L. Fan, J. Willamowski, C. Bray, D. Maupertuis, Visual Categorization with Bags of Keypoints, in: ECCV Workshop on Statistical Learning in Computer Vision, 2004, pp. 59-74.

[46] J. Luo, A. Pronobis, B. Caputo, P. Jensfelt, The KTH-IDOL2 Database. Technical Report CVAP304, Kungliga Tekniska Hoegskolan, CVAP/CAS, Stockholm, Sweden, 2006.

[47] A. Pronobis, B. Caputo, COLD: The CoSy Localization Database, The International Journal of Robotics Research. 28 (2009) 588-594.

[48] T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence. 24 (2002) 971-987.

[49] T. Maenpaa, T. Ojala, M. Pietikainen, M. Soriano, Robust texture classification by subsets of local binary patterns, in: International Conference on Pattern Recognition, 2000, pp. 935-938.

[50] A. Pronobis, B. Caputo, P. Jensfelt, H.I. Christensen, A Discriminative Approach to Robust Visual Place Recognition, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006, pp. 3829-3836.

[51] A. Murillo, C. Sagues, J.J. Guerrero, T. Goedemé, T. Tuytelaars, L. Van Gool, From omnidirectional images to hierarchical localization, Robotics and Autonomous Systems. 55 (2007) 372-382.

[52] K. Konolige, J. Bowman, J.D. Chen, P. Mihelich, M. Calonder, V. Lepetit, et al., View-based maps, The International Journal of Robotics Research. 29 (2010) 941-957.


Francisco Mateus Marnoto de Oliveira Campos (born 1975) obtained a degree in Mechanical Engineering from Instituto Superior Técnico of the Technical University of Lisbon in 1998, and a Master's degree in Mechanical Engineering from the same institution in 2003. Since 1999 he has been with the Polytechnic Institute of Lisbon, Instituto Superior de Engenharia de Lisboa (ISEL), initially as an assistant and, since 2005, as an auxiliary professor in the Control Systems Group of the Mechanical Engineering Department. In recent years his work has focused on appearance-based methods for robot localization and navigation.

*Biography of each author


João M. F. Calado (born 1962, Portugal) received a degree in Electrical and Electronics Engineering from the Technical University of Lisbon in 1986, a Diploma of Graduate Studies in Maritime Machinery Control Systems from Escola Náutica Infante D. Henrique (Paço de Arcos, Portugal) in 1990, and a PhD degree in Control Engineering from City University (London, United Kingdom) in 1996. Since 1998 he has been with the Polytechnic Institute of Lisbon, Instituto Superior de Engenharia de Lisboa (ISEL), in a permanent position as Associate Professor and, since June 2009, as Full Professor in the Control Systems Group of the Mechanical Engineering Department. His current main research interests are artificial intelligence techniques applied to FDI/FTC, namely multi-agent architectures, and automatic control. He is also interested in collaborative approaches, process identification, mobile robotics, remote supervision and control, and artificial intelligence techniques applied to image processing.


*Photo of each author
