
Combining Classification and Regression for WiFi Localization of Heterogeneous Robot Teams in Unknown Environments

Benjamin Balaguer, Gorkem Erinc, and Stefano Carpin

Abstract— We consider the problem of team-based robot mapping and localization using wireless signals broadcast from access points embedded in today's urban environments. We map and localize in an unknown environment, where the access points' locations are unspecified and for which training data is a priori unavailable. Our approach is based on a heterogeneous method combining robots with different sensor payloads. The algorithmic design assumes the ability to produce a map in real-time from a sensor-full robot that can quickly be shared by sensor-deprived robot team members. More specifically, we cast WiFi localization as classification and regression problems that we subsequently solve using machine learning techniques. In order to produce a robust system, we take advantage of the spatial and temporal information inherent in robot motion by running Monte Carlo Localization on top of our regression algorithm, greatly improving its effectiveness. A significant number of experiments are performed and presented to prove the accuracy, effectiveness, and practicality of the algorithm.

I. INTRODUCTION

As a result of the evident necessity for robots to localize and map unknown environments, a tremendous amount of research has focused on implementing these fundamental abilities. Localization problems have been extensively studied and a variety of solutions have been proposed, each assuming different sensors, robotic platforms, and scenarios. The increasingly popular trend of employing low-cost multi-robot teams [14], as opposed to a single expensive robot, provides additional constraints and challenges that have received less attention. A tradeoff naturally arises, because reducing the number of sensors will effectively decrease the robots' price while making the localization problem more challenging. We anticipate that team-based robots will require WiFi technology to exchange information with each other. We also foresee robots will continue to supply rough estimations of local movements, via odometry or similar inexpensive low-accuracy sensors. These team-based robots have the advantage of being very affordable. It is clear, however, that these robots would not be practical in unknown environments due to their lack of perception abilities and, as such, we embrace a heterogeneous setup pairing many of these simple robots with a single robot capable of mapping an environment by traditional means (e.g., SLAM using a laser range finder or other sophisticated proximity sensors). Within this scenario, our goal is to produce a map of an unknown environment in real-time using the more capable robot, so that the less sophisticated robots can localize themselves.

Given the sensory constraints imposed on the robots, we exploit wireless signals from Access Points (APs) that have

School of Engineering, University of California, Merced, CA, USA, {bbalaguer,gerinc,scarpin}@ucmerced.edu.

become omnipresent in today's urban infrastructure. Wireless signals are notoriously difficult to work with due to non-linear fading, obstructions, and multipath effects, but a localizer based on WiFi offers its share of advantages. Indeed, WiFi APs are uniquely identifiable, can be used indoors or outdoors, and are already part of the unknown environment to be mapped. Additionally, transmitted signals are available to anyone within range, allowing robots to exploit APs without ever connecting to them. We cast the problem of WiFi localization as a machine learning problem, which we initially solve using classification and regression theory. Then, thanks to a Monte Carlo Localization (MCL) step, we improve the localizer by exploiting the spatial and temporal information inherently encoded within the robots' odometry. In this manuscript we offer the following contributions:

• we implement and contrast six algorithms, three previously published and three new to the WiFi localization literature, that solve the problem cast as classification;

• we propose a novel regression algorithm that builds upon the best classification algorithm;

• we develop an end-to-end WiFi localization algorithm that adds odometry for robustness, implemented as MCL;

• we outline crucial design choices with respect to the algorithm's applicability to unknown environments and real-time performance;

• we evaluate and compare numerous algorithms across different datasets, which is, to the best of our knowledge, the first attempt to unequivocally establish the best method for WiFi localization.

The rest of the paper is organized as follows. Section II highlights previous work on WiFi localization, covering both the classification and regression approaches. In Section III we describe the problem statement and introduce mathematical definitions used throughout the paper. Since each component of our localizer depends on the results of the previous one, we structure the algorithm's description somewhat unorthodoxly. More specifically, we present our first component, classification, in Section IV-A followed directly by its results in Section IV-B. In Section V-A, our findings from classification are exploited for our second component, regression, the results of which are provided in Section V-B. We finalize the algorithm in Section VI-A, presenting the results in Section VI-B. Final remarks and future work conclude the paper in Section VII.


II. RELATED WORK

The utilization of wireless signals transmitted from APs or home-made sensors has enjoyed great interest from the robotics community, in particular for its applicability to the localization problem. In this area, the application of data-driven methods has prevailed thanks to two distinct but popular approaches. On one hand, the modeling approach attempts to understand, through collected data, how the signal propagates under different conditions, and the goal is to generate a signal model that can then be exploited for localization. On the other hand, the mapping approach directly uses collected data by combining spatial coordinates with wireless signal strengths to create maps from which a robot can localize. Since signals are distorted due to "typical wave phenomena like diffraction, scattering, reflection, and absorption" [3], practical implementations of signal modeling are not yet available for unknown environments since they need to be trained in similar conditions to what will be encountered (i.e., they require at least some a priori information about the environment) [21]. Moreover, it has been shown in [11] that a signal strength map should yield better localization results than a parametric model. Consequently, the rest of this section highlights related works regarding the mapping approach, which we employ for our algorithm. Interested readers are referred to [9] for more information on signal modeling techniques.

During a training phase, signal mapping techniques tie a spatial coordinate with a set of observed signal strengths from different APs, essentially creating a signal map. Given a new set of observed signal strengths acquired at an unknown coordinate, the goal is to use the map to retrieve the correct spatial coordinate. A variety of methods have been devised to solve this problem. Two of the earliest solutions used nearest neighbor searches [1] and histograms of signal strengths for each AP [20]. Starting from the observation that histograms of signal strengths are generally normally distributed [8], various other methods have recently been proposed using Gaussian distributions to account for the inevitable variance in received signal strengths [11], [15], [2]. Specifically, for each location and each AP, a Gaussian distribution is derived from training data. An unknown location described by a new set of observed signal strengths is then determined using Bayesian filtering. These methods exploit the inherently available spatial and temporal information of a moving robot through a Hidden Markov Model (HMM) [15] or MCL [2], which, unlike the HMM, requires additional sensory feedback (e.g., odometry). Another recent approach, from the wireless network community, exploits Support Vector Machines by formulating the problem, similarly to what we propose in this manuscript, as a classification instance [19].

For completeness, we briefly mention the works of Duvallet et al. [6], Ferris et al. [7], and Huang et al. [12] that are based on Gaussian processes. Unfortunately, they require a long parameter optimization step that makes them difficult to use in our scenario, where real-time operation is required and computational power limited.

III. SYSTEM SETUP AND PROBLEM DEFINITION

We consider a team of r robots, one of which is a "sensor-full" robot, the mapper, capable of building a map by traditional means (e.g., SLAM, GPS). The remaining r − 1 are "low-cost robots", the localizers, whose only sensors are odometry and a WiFi card. The goal is to develop a system where the mapper creates a WiFi map that the localizers can successfully use to localize, strictly utilizing their limited sensors. For experimental purposes, we use a MobileRobots P3AT equipped with an LMS200 Laser Range Finder (LRF) as our mapper and various iRobot Create platforms as our localizers. The mapper robot uses gmapping [10] to solve the SLAM problem. All of the code runs on a typical consumer laptop, without relying on multiple cores, GPUs, or extensive memory. Our approach is split into a mapping phase, involving the mapper, and a localization phase, involving the localizers. During the mapping phase, the mapper periodically collects WiFi signal strengths from all the APs in range and associates them with the current Cartesian coordinates provided by gmapping. In terms of notation, for every sampled Cartesian position C_p = [X_p, Y_p], an observation Z_p = [z_p^1, z_p^2, ..., z_p^{|a|}] is acquired using a WiFi card, where |a| is the total number of APs seen throughout the environment. Each signal strength z_p^a is measured in dBm, the most commonly provided measurement from hardware WiFi cards, which typically ranges from -90 to -10 dBm with lower values (e.g., -90) indicating weaker signal strengths [13]. We note that since the environment's APs cannot all be seen from a single location, the observation vector Z_p is dynamically increased as the robot moves in the environment and identifies previously unseen APs. Additionally, since some APs cannot be seen from certain locations, we set z_p^a = -100 for any AP a that cannot be seen from location p. In order to obtain an indication of the signal strength noise and increase the algorithm's robustness, we collect multiple observations at each location, resulting in a vector of observations for each location, Z_p = [Z_p^1, Z_p^2, ..., Z_p^{|s|}], where |s| is the total number of observations performed at each location p. The entire data used to build a complete WiFi map in real-time, acquired by the mapper, can then be represented as a matrix T of locations and observations

$$T = \begin{bmatrix} C_1 & Z_1^1 & Z_1^2 & \cdots & Z_1^{|s|} \\ C_2 & Z_2^1 & Z_2^2 & \cdots & Z_2^{|s|} \\ \vdots & \vdots & \vdots & & \vdots \\ C_{|p|} & Z_{|p|}^1 & Z_{|p|}^2 & \cdots & Z_{|p|}^{|s|} \end{bmatrix}$$

where |p| is the total number of positions for which WiFi signals were acquired. Therefore, T is comprised of |s| observations for each of the |p| locations and each of the |a| access points in our map, where each observation is labeled with its Cartesian coordinates acquired from gmapping, C_p. A typical sample of the data collected for the WiFi map is shown in Figure 1.
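To make the data layout concrete, the sketch below shows one way the training matrix T could be represented in software. The class and method names (WifiMap, add_location, as_rows) are hypothetical and not part of the system described in the paper; Python is used only for illustration.

import numpy as np

class WifiMap:
    """Hypothetical container for the training matrix T: |p| locations,
    |s| observations per location, |a| access points (|a| grows as new APs appear)."""

    UNSEEN_DBM = -100  # value assigned to APs that are not visible from a location

    def __init__(self):
        self.ap_index = {}       # AP identifier (e.g., MAC address) -> column index a
        self.locations = []      # C_p = [X_p, Y_p] for each location p
        self.observations = []   # one |s| x (APs seen so far) array per location

    def add_location(self, x, y, scans):
        """scans: list of |s| dicts mapping AP identifier -> signal strength in dBm."""
        for scan in scans:                       # grow the AP index when unseen APs appear
            for ap in scan:
                self.ap_index.setdefault(ap, len(self.ap_index))
        Z = np.full((len(scans), len(self.ap_index)), self.UNSEEN_DBM, dtype=float)
        for s, scan in enumerate(scans):
            for ap, dbm in scan.items():
                Z[s, self.ap_index[ap]] = dbm
        self.locations.append(np.array([x, y], dtype=float))
        self.observations.append(Z)

    def as_rows(self):
        """Yield (C_p, Z_p) pairs, padding every Z_p to the final number of APs."""
        for C, Z in zip(self.locations, self.observations):
            Zp = np.full((Z.shape[0], len(self.ap_index)), self.UNSEEN_DBM, dtype=float)
            Zp[:, :Z.shape[1]] = Z
            yield C, Zp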

Two parameters are evidently important: the number of locations to consider for the WiFi map, |p|, and the number of observations performed at each of those locations, |s|.


Fig. 1. Illustrative representation of the WiFi readings for 7 APs, superimposed on the map. The size of each ring represents the AP's signal strength (the bigger, the better) and each color indicates a unique AP.

We have no control over the parameter |a|, the total number of APs, because it is dictated by the structure of the unknown environment. Evidently, |p| will be proportional to the mapped environment's size and dependent on the measurements' density. We have experimentally found, as have the authors of [2], [15], that recording observations approximately every meter yields accurate maps, but no restrictions are placed on the locations' alignment (i.e., they do not need to be grid-aligned or follow any particular structure). The parameter |s| encompasses an important tradeoff that we analyze in Section IV-B since the higher the |s|, the better the noise model but the longer the mapping process. We conclude this section by mentioning two important observations from analyzing the data collected for this paper. First, we have found signal strength readings to be consistent across different robots having similar hardware, a fact corroborated by [11] that implies the possibility of sharing WiFi maps amongst robots. Second, signal strength measurements taken on different days or at different times were, although different, not significantly so, indicating that WiFi maps acquired at a certain time can still be effective for robots operating in the same environment at a later time.

IV. CLASSIFICATION

A. Description

Figure 1 gives a good qualitative indication that it is possible to differentiate between different locations by only considering signal strengths received from APs. We start developing the WiFi localizer by casting it as a machine learning classification problem. Mathematically, we produce a function f : Z → p from the training data T acquired by the mapper. Function f takes a new observation Z = [z^1, ..., z^{|a|}] acquired by one of the localizers and returns the location p of the robot, from which we can easily look up the Cartesian coordinate, C_p. We note that casting the problem in such a manner is a simplification, since a robot needs to be positioned exactly at one of the locations in the training data in order for f to return the exact coordinate C_p (i.e., classification does not perform interpolation or regression). Solving the classification problem is nevertheless an important step since the effectiveness of various algorithms can be evaluated and applied to build a better localizer, as will be shown in the next sections. Computing f from T can be achieved using different techniques and, to the best of our knowledge, no comparison has been made in the literature to find the best performing algorithm. Consequently, we implement a total of six algorithms, three of which have been published in the WiFi localization literature (Gaussian Model, Support Vector Machine, and Nearest Neighbor Search) and three others that have enjoyed popularity in machine learning tasks (Decision Tree, Random Forest, and Multinomial Logit). We briefly present each algorithm in the context of our problem's definition, the descriptions of which are kept short due to space restrictions, and provide references for interested readers looking for more details.

Decision Tree [5]: a decision tree is a binary tree constructed automatically from the training data T. Each node of the tree corresponds to a decision made on one of the input parameters, z_p^a, that divides the node's data into two new subsets, one for each of the node's sub-trees, in such a way that the same target variables, p, are in the same subsets. The process is iterated in a top-down structure, working from the root (whose data subset is T) down to the leaves, and stops when each node's data subset contains one and only one target variable or when adding new nodes becomes ineffective. Various formulas have been proposed to compute the "best" partitioning of a node's data subset, the most popular of which being the Gini coefficient, the twoing rule, and the information gain. After experimental assessments, we found the choice of partitioning criterion insignificant in terms of localization accuracy and use, arbitrarily, the Gini coefficient. In order to classify a new observation Z, the appropriate parameter z^i is compared at each node, the decision of which dictates which branch is taken - a step that is repeated until a leaf is reached. The target variable p at the leaf is the location of the robot.

Random Forest [4]: random forests are an ensemble of decision trees built to reduce the overfitting behaviors often observed in single decision trees. This is achieved by creating a set of |d| trees, as described in the previous paragraph, each with a different starting dataset, T_d, selected randomly with replacement from the full training data T. An additional difference comes from the node splitting process, which is performed by considering a random set of |q| input parameters as opposed to all of them. The partitioning criterion is still used, but only the |q| randomly selected parameters are considered. In order to classify a new observation Z, it is processed by each of the |d| trees, resulting in |d| output variables p_d, some of which may be the same. In some sense, each decision tree in the forest votes for an output variable and the one with the most votes is chosen as the location of the robot. It is important to note that a lot of information can be extracted from this voting scheme, since the votes can trivially be converted to the robot's probability of being at a particular location, P(p|Z) = V_p / |d|, where V_p is the total number of votes received for location p with Cartesian coordinate C_p. The number of trees |d| encompasses a tradeoff between speed and accuracy. We set |d| = 50 after determining that it yields the highest ratio of accuracy to the number of trees.
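As an illustration of this voting scheme, the sketch below uses scikit-learn's RandomForestClassifier; the library choice and the placeholder data are assumptions, since the paper does not name an implementation.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder training data: (|p| * |s|) observation rows and their location labels.
X_train = np.random.uniform(-90, -10, size=(156 * 3, 48))
y_train = np.repeat(np.arange(156), 3)

# |d| = 50 trees with Gini partitioning, as in the paper.
forest = RandomForestClassifier(n_estimators=50, criterion="gini")
forest.fit(X_train, y_train)

Z = np.random.uniform(-90, -10, size=(1, 48))   # a new observation from a localizer
p_hat = forest.predict(Z)[0]                    # location with the most votes
vote_probs = forest.predict_proba(Z)[0]         # roughly the vote fractions P(p|Z) = V_p / |d|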


Gaussian Model [11], [15], [2]: the Gaussian model technique, proposed in the robotics literature, attempts to model the inherent noise of signal strength readings through a Gaussian distribution. For each location p and for each AP a, the mean and standard deviation of the signal strength readings are computed, yielding µ_p^a and σ_p^a, respectively. This means that a total of |p| × |a| Gaussian distributions are calculated, where all µ_p^a and σ_p^a are computed from |s| values (i.e., the total number of observations performed at each location). The location, p, of a new observation, Z, is derived using the Gaussian's probability density function:

$$\arg\max_p \prod_{a=1}^{|a|} \frac{1}{\sigma_p^a \sqrt{2\pi}} \exp\left(-\frac{(z^a - \mu_p^a)^2}{2(\sigma_p^a)^2}\right)$$
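A direct sketch of this maximum-likelihood rule, assuming the per-location, per-AP means and standard deviations have already been estimated from the |s| readings (the function name is hypothetical):

import numpy as np

def gaussian_model_classify(Z, mu, sigma, eps=1e-6):
    """Z: (|a|,) new observation; mu, sigma: (|p|, |a|) per-location, per-AP
    Gaussian parameters. Returns the location index p maximizing the product of PDFs."""
    sigma = np.maximum(sigma, eps)  # guard against zero variance
    # Equivalent to maximizing the product of Gaussian PDFs, but computed in log space.
    log_pdf = -np.log(sigma * np.sqrt(2 * np.pi)) - (Z - mu) ** 2 / (2 * sigma ** 2)
    return int(np.argmax(log_pdf.sum(axis=1)))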

Support Vector Machine [19]: support vector machines work by constructing a set of hyperplanes in such a way that they perfectly divide two data classes (i.e., they perform binary classification). Generating the hyperplanes is essentially an optimization problem that maximizes the distance between the hyperplanes and the nearest training point of either class. Although initially designed as a linear classifier, the usage of kernels (e.g., polynomial, Gaussian) enables non-linear classification. Since we are looking to divide our training data T into |p| classes, we use a support vector machine variant that works for multiple classes. We chose the one-versus-all approach (we empirically determined it was better than one-versus-one), which consists in building |p| support vector machines that each try to separate one class p from the rest of the other classes. This essentially creates a set of |p| binary classification problems, each of which is solved using the standard support vector machine algorithm with a Gaussian kernel. Once all the support vector machines are trained, we can localize given a new observation Z by evaluating its high-dimensional position with respect to the hyperplanes, for each support vector machine p. The class p is chosen by the support vector machine that classifies Z with the greatest distance from its hyperplanes.
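For reference, a one-versus-all SVM with a Gaussian (RBF) kernel can be sketched with scikit-learn as follows; the library and the placeholder data are assumptions:

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X_train = np.random.uniform(-90, -10, size=(156 * 3, 48))   # placeholder training data
y_train = np.repeat(np.arange(156), 3)
Z = np.random.uniform(-90, -10, size=(1, 48))                # a new observation

# |p| binary SVMs with Gaussian (RBF) kernels, one per location class.
svm = OneVsRestClassifier(SVC(kernel="rbf"))
svm.fit(X_train, y_train)
p_hat = svm.predict(Z)[0]   # class whose SVM places Z furthest on the positive side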

Nearest Neighbor Search [1]: the nearest neighbor search does not require any processing of the training data, T. Instead, the distance between all the points in T and a new observation Z is computed, yielding |p| × |s| distances D_p^s. The location p of the new observation is then selected with the formula arg min_p (D_p^s). The only necessary decision for this algorithm involves the choice of one of the numerous proposed distance formulas (e.g., Euclidean, Mahalanobis, City Block, Chebychev, Cosine). We use the Euclidean distance to rigorously follow the algorithm presented in [1].
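The nearest neighbor rule takes only a few lines; the sketch below assumes the training readings are stacked into one array with one row per reading (the function name is hypothetical):

import numpy as np

def nearest_neighbor_classify(Z, train_obs, train_loc):
    """Z: (|a|,) new observation; train_obs: (|p|*|s|, |a|) stacked readings;
    train_loc: (|p|*|s|,) location index of each reading.
    Returns the location whose reading has the smallest Euclidean distance to Z."""
    dists = np.linalg.norm(train_obs - Z, axis=1)
    return int(train_loc[np.argmin(dists)])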

Multinomial Logit [16]: multinomial logit is an extension of logistic regression allowing multiple classes to be considered. Logistic regression separates two classes linearly by fitting a binomial distribution to the training data (i.e., one class is set as a "success" and the other class as a "failure"). This generalized linear model technique produces a set of |a| + 1 regression coefficients β_i that are used to calculate the probability of a new observation being a "success". In contrast to the multi-class support vector machine, multinomial logit follows a one-against-one methodology where one class p_ref is trained against each of the remaining classes. Consequently, |p| − 1 logistic regressions are trained, yielding |p| − 1 sets of regression coefficients, β_i^p. Once all the regression coefficients are learned, the probability of being at location p given a new observation Z can be calculated as follows:

$$P(p \mid Z) = \frac{\gamma_p}{1 + \sum_{j=1,\, j \neq p_{ref}}^{|p|-1} \exp(Z \cdot \beta_j)}$$

where γ_p = exp(Z · β_p) when p ≠ p_ref and γ_p = 1 when p = p_ref. The location p can then be selected as the maximum probability by utilizing the formula arg max_p P(p|Z).
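Given learned coefficient vectors, evaluating this probability is only a few lines; the sketch below stores one coefficient row per non-reference class and places the reference class last (names and layout are assumptions):

import numpy as np

def multinomial_logit_probs(Z, betas):
    """Z: (|a| + 1,) observation with a leading 1 for the intercept term;
    betas: (|p| - 1, |a| + 1) coefficients, one row per non-reference class.
    Returns P(p|Z) for all |p| classes, with the reference class last."""
    scores = np.exp(betas @ Z)              # exp(Z . beta_j) for each non-reference class
    denom = 1.0 + scores.sum()
    return np.append(scores, 1.0) / denom   # gamma_p / denom, with gamma_ref = 1

Taking np.argmax of the returned vector then gives the predicted location index.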

B. Results

The experimental results of the WiFi localizer cast as a classification problem greatly influenced our algorithmic design choices for the end-to-end algorithm. Therefore, we present the results before continuing with the algorithm's description. Specifically, we gathered a large indoor training dataset at our university, stopping the robot at pre-determined locations approximately 1 meter apart. The entire dataset is comprised of 156 locations (|p| = 156), 20 signal strength readings for each location (|s| = 20), and a total of 48 unique APs (|a| = 48). The results shown in this and subsequent sections are performed by sub-sampling the entire dataset. Specifically, |s| is varied from 1 to 19, essentially representing a percentage of the dataset being used for training, and the remaining data is used for classification. This procedure effectively mimics inevitable differences between data acquired by the mapper and localizers. Moreover, the training and classification data are randomly sampled 50 different times for each experiment in order to remove any potential bias from a single sample. Presented results are averages of those 50 samples and we omit error bars in our graphs since the results' standard deviations were all similar and insignificant. We note, once again, that this process does not represent a real-world scenario since it assumes the mapper and localizers follow the exact same path. It provides, however, a very good platform for comparing the six classification algorithms presented in Section IV-A.

We start by analyzing each algorithm's accuracy as a function of the number of readings taken at each location, |s|, a plot of which is shown in Figure 2. Since our goal is to ultimately be able to map unknown environments in real-time, the initial data acquisition performed by the mapper needs to be efficient. It takes on average 411 ms to get signal strength readings from all the APs in range, so we want to minimize |s|. As expected, Figure 2 shows a tradeoff between the algorithms' localization accuracy and the number of readings used during the training phase. Since we want to limit the time it takes to gather the training data, setting the number of readings per location to 3 (|s| = 3) provides a good compromise between speed and accuracy, especially since the graph shows a horizontal asymptote starting at or around that point for the best algorithms. As such, the rest of the results presented in the paper will be performed using 3 readings per location.


Fig. 2. Average classification error, in meters, for each classification algorithm as a function of increasing number of readings per location (|s|).

On average, when |s| = 3, the Random Forest's classification error is 43.42%, 55.47%, and 57.23% better than the previously published WiFi localization algorithms (Gaussian Model, Support Vector Machine, and Nearest Neighbor Search, respectively). Figure 3 shows the cumulative probability of classifying a location within the error margin indicated on the x-axis. This figure corroborates the findings of Figure 2, showing that the best algorithm is the Random Forest, which had never been exploited in the context of WiFi localization. Moreover, the Random Forest can localize a new observation, Z, to the exact location (i.e., zero margin of error) 88.58% of the time.

Fig. 3. Cumulative probability of correctly localizing as a function of increasing error margin, for each classification algorithm.

In addition to our dataset, we use a publicly available dataset provided by Ladd et al. [15], with data covering 3 floors of the computer science building at Rice University. We process the Rice dataset in the exact same manner as ours, dividing it into 50 random samples of training and classification data for |s| = 3. The average error for each algorithm and each dataset is shown in Figure 4. There are a couple of interesting observations that can be made from these results. Firstly, the trend is the same for both datasets. In other words, the list of the algorithms from best to worst average accuracy (i.e., Random Forest, Gaussian Model, Support Vector Machine, Nearest Neighbor Search, Decision Tree, and Multinomial Logit) is the same regardless of which dataset is used. Secondly, the Rice dataset uses signal-to-interference ratios for its observations as opposed to signal strengths. Although both measures are loosely related, this indicates that the classification algorithms are both general and robust. These important findings indicate that the proposed methods are data- and environment-independent and that we are not overfitting our particular dataset. Please note that the overall higher average classification error observed in the Rice dataset comes from sparser location samples: 3.33 meters apart on average as opposed to 0.91 meters for our dataset.

Fig. 4. Average classification error, in meters, for each classification algorithm executed on the UC Merced and Rice University datasets.

We conclude this section by mentioning that, when recording 3 readings per location, all of the algorithms can be trained in less than 30 seconds, apart from the Support Vector Machine, which takes 90 seconds. In addition, the training process can be parallelized for all the algorithms, potentially further decreasing the time needed to compute the WiFi map. Put differently, once the mapper has acquired its training data, the map can be built in less than 30 seconds, which is very fast relative to the time it takes to explore the environment. Once the WiFi map is created and given to the localizers, they can localize in less than 200 ms. Needless to say, the presented algorithms are very effective from a computational standpoint, even in unknown environments where map building needs to be performed in real-time.

V. REGRESSION

A. Description

Although the results of the classification algorithms are very encouraging, they do not depict a real-world scenario, where locations explored by the mapper, upon which the WiFi map will be created, will surely be different than those explored by the localizers. Consequently, we re-cast the WiFi localizer as a regression problem, where some inference is performed to generate Cartesian coordinates for locations that are not encompassed in the training data. Although typical off-the-shelf regression algorithms (e.g., Neural Network, Radial Basis Functions, Support Vector Regression) might seem like a good choice initially, they get corrupted by the nature of our training data, the majority of which depicts unseen APs (i.e., z_p^a = -100). In addition, it would be wise to exploit the good results exhibited by the


classification algorithms. We consequently design our own regression algorithm that builds upon the best classification algorithm: the Random Forest. In addition to providing the best classification results, the Random Forest is appealing due to its voting scheme, which can be interpreted as P(p|Z) for each p ∈ T.

Our regression algorithm is based on a Gaussian Mixture Model (GMM) [17] described in Algorithm 1. The GMM is introduced to non-linearly propagate, in the two-dimensional Cartesian space, results acquired from the Random Forest. More specifically, we build a GMM comprised of |p| mixture components (line 1). Each mixture component is constructed from a Gaussian distribution with a mean µ(p) (line 2), corresponding to the Cartesian coordinates in the training data, covariance Σ(p) (line 3), and mixture weights φ(p) that are acquired directly from the Random Forest's voting scheme (line 4). Line 4 highlights the key difference between the classification and regression methodologies, which arises from the fact that taking the mode of the Random Forest's results, as is done in classification, discards valuable information that is instead exploited in our regression algorithm. In some sense, the mixture weights are proportional to the Random Forest's belief of being at location p given the observation Z (i.e., φ(p) = P(p|Z)).

Algorithm 1 Construct-GMM(Z)
1: for p ← 1 to |p| do
2:   µ(p) ← C_p
3:   Σ(p) ← σ²I
4:   φ(p) ← P(p|Z) ← Random-Forest-Predict(Z)
5: end for
6: return gmm ← Build-GMM(µ, Σ, φ)

Algorithm 2 provides pseudo-code for the rest of the regression algorithm. Once the GMM is built (line 1), a couple of approaches are available, the most popular of which consists in taking the weighted mean of the model or drawing samples from the GMM with probability φ(p) and computing the samples' weighted mean. Instead of sampling from the GMM, our regression algorithm uses a k Nearest Neighbor Search (line 2) to provide k samples that are not only dependent on the observation Z, but also come from a different model than the Random Forest. This is a crucial step that adds robustness to the algorithm by combining two of the presented classification algorithms. In other words, where and when one algorithm might fail, the other might succeed. The choice of the Nearest Neighbor Search for this step (as opposed to the Gaussian Model or the Support Vector Machine) comes from the fact that |s| times more samples can be drawn from it (a total of |s| × |p|, as opposed to |p|). The returned Cartesian coordinate (line 3) is finally calculated as the weighted mean of the k Nearest Neighbors, where the weight of each Nearest Neighbor is set from the Probability Distribution Function (PDF) of the GMM.

Algorithm 2 Regression(T, Z)
1: gmm ← Construct-GMM(Z) // See Algorithm 1
2: nn ← k-NN(T, Z, k)
3: C ← (Σ_{i=1}^{k} nn_i × PDF(gmm, nn_i)) / (Σ_{i=1}^{k} PDF(gmm, nn_i))
4: return C

The entire regression algorithm requires two parameters to be set, σ (Algorithm 1) and k (Algorithm 2). σ dictates how much the Gaussian components influence each other and should be approximately set to the distance between WiFi readings in the training data. k should be as high as possible in order to provide many samples, yet low enough not to incorporate "neighbors" that are too far away. We have found setting k to 25% of all the observations in T (i.e., k = 0.25 × |s| × |p|) to be a good solution for this tradeoff.
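Under the assumption that only a multivariate normal PDF is needed from a library (scipy is used here for that purpose), Algorithms 1 and 2 could be sketched as follows; the function names, the use of scikit-learn's predict_proba for the vote fractions, and the data layout are assumptions made for illustration:

import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, means, weights, sigma):
    """PDF of a GMM with isotropic covariances sigma^2 * I, evaluated at the 2D point x."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=sigma ** 2 * np.eye(2))
               for m, w in zip(means, weights))

def regression(Z, forest, train_obs, train_xy, coords, sigma=1.0, k_frac=0.25):
    """Estimate a Cartesian coordinate for observation Z (Algorithms 1 and 2).
    coords: (|p|, 2) Cartesian coordinates C_p of the training locations;
    train_obs, train_xy: stacked training readings and their coordinates."""
    # Algorithm 1: mixture weights phi(p) taken from the random forest votes P(p|Z).
    weights = forest.predict_proba(Z.reshape(1, -1))[0]
    # Algorithm 2: k nearest neighbors of Z, weighted by the GMM PDF at their coordinates.
    k = max(1, int(k_frac * len(train_obs)))
    nn_idx = np.argsort(np.linalg.norm(train_obs - Z, axis=1))[:k]
    nn_xy = train_xy[nn_idx]
    w = np.array([gmm_pdf(xy, coords, weights, sigma) for xy in nn_xy])
    return (nn_xy * w[:, None]).sum(axis=0) / w.sum()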

Given a new observation Z, the regression essentially maps a three-dimensional surface to the X-Y Cartesian space of the environment, where the higher the surface's Z-value, the more probable the X-Y location. A representative snapshot of the process is shown in Figure 5, highlighting an important behavior of the presented regression algorithm. Indeed, the algorithm is not only capable of generating Cartesian coordinates that were not part of the initial training data, but also takes into consideration neighboring votes from the Random Forest classification. In the figure, the highest Random Forest vote (represented by '*') is somewhat isolated and, consequently, its neighbors do not contribute to the overall surface. The region close to the actual robot's location (denoted by '+'), however, is comprised of many local neighbors whose combined votes outweigh those of the Random Forest classification, resulting in a better regression estimation (symbolized by 'x').

Fig. 5. Regression example, showing the PDF of the GMM overlaid on top of a section of the environment's map. The markers +, x, and * represent the ground truth, regression, and random forest classification locations, respectively.

B. Results

In order to evaluate the accuracy of the regression algorithm, it is necessary to gather new data that mimics a localizer exploring the environment and localizing with the regression algorithm trained on T. Consequently, we perform 10 new runs (N_i), assuming no prior knowledge of the locations covered by T. In other words, while the mapper data (T) and the localizer data (N) approximately cover


the same environment, they are not acquired at the same Cartesian coordinates. As such, this experiment provides a direct measure of the regression algorithm's strength. For evaluation purposes, we manually record the robot's ground truth position at discrete locations so that we can quantitatively evaluate the regression algorithm. Figure 6 shows the average accuracy (from the 10 runs and 50 random samples used for training) of the 4 best classification algorithms compared against the regression algorithm. Not surprisingly, the regression algorithm works very well, outperforming the Random Forest, Gaussian Model, Support Vector Machine, and Nearest Neighbor Search classification algorithms by 29.77%, 37.95%, 34.07%, and 28.23%, respectively. We conclude this section by mentioning that the regression algorithm only takes 131 ms to localize.

Fig. 6. Average regression error, in meters, comparing the 4 best classification algorithms (left) and the regression algorithm (right).

VI. MONTE CARLO LOCALIZATION

A. Description

Although the regression algorithm clearly improves the WiFi localizer's accuracy for real-world scenarios, further improvements can be obtained by taking into account the spatial and temporal constraints implicitly imposed by robot motion. In other words, the classification and regression algorithms discussed so far solve the localization problem without taking into consideration the robot's previous location, or, more precisely, the probability distribution of the robot's previous location. Specifically, we use an MCL algorithm, built from the "standard" implementation presented in [18], with a couple of modifications. The motion model, which exploits the translational and rotational velocities of the robot, is exactly the same as in [18]. The measurement model is, evidently, different since it needs to take into account the aforementioned WiFi localizer regression. The pseudo-code is presented in Algorithm 3 and shows the effectiveness of our GMM implementation (line 1), which can seamlessly transition from regression to MCL. Indeed, the measurement model needs to assign each particle (line 2) the probability of being at the particle's state given the sensor measurement Z. Thanks to the GMM, the particle's weight is easily retrieved by using the PDF (line 3).

Algorithm 3 Measurement-Model(Z)
1: gmm ← Construct-GMM(Z) // See Algorithm 1
2: for m ← 1 to |m| do
3:   m.weight ← PDF(gmm, m.state(X, Y))
4: end for

In addition to the measurement model, we modify the particles' initialization procedure. Instead of randomly or uniformly sampling the particles' state and giving each particle an equal weight, we force the robot to perform a measurement reading, Z, and initialize the particles using that measurement, as shown in Algorithm 4. We start by constructing the GMM (line 1) and, for each particle in our filter (line 2), sample a data point from the GMM that will serve as the particle's X, Y state (line 3). Since WiFi signal strengths cannot infer the robot's rotation, we randomly sample the particle's θ state (line 4). The particles' weight is calculated from the GMM's PDF (line 5). In order to add robustness to the previously mentioned problem with robots' rotations, we pick the |v| best particles (line 7) and add y new particles that are the same as v but have y different θ states, each randomly sampled between 0 and 2π (line 8). Since we have augmented the total number of particles, we finalize our particle initialization by keeping the best |m| particles (line 9).

Algorithm 4 Initialize-Particles(Z)
1: gmm ← Construct-GMM(Z) // See Algorithm 1
2: for m ← 1 to |m| do
3:   m.state(X, Y) ← sample(gmm)
4:   m.state(θ) ← rand(0 to 2π)
5:   m.weight ← PDF(gmm, m.state(X, Y))
6: end for
7: for all m with the top |v| weights do
8:   Add y particles n_y where n_y ← m, and n_y.state(θ) ← rand_y(0 to 2π)
9: end for
10: Select the |m| particles with highest weights
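A compact sketch of the particle initialization and measurement update (Algorithms 4 and 3), reusing the hypothetical gmm_pdf helper and random forest vote fractions from the regression sketch above; the θ-resampling refinement for the top |v| particles is omitted for brevity:

import numpy as np
# gmm_pdf(x, means, weights, sigma) is the hypothetical helper defined in the regression sketch.

def initialize_particles(Z, forest, coords, sigma, num_particles=5000):
    """Algorithm 4 (simplified): sample particle positions from the GMM built on the
    first observation Z; headings are drawn uniformly since WiFi gives no orientation."""
    weights = forest.predict_proba(Z.reshape(1, -1))[0]        # phi(p) = P(p|Z)
    comp = np.random.choice(len(coords), size=num_particles, p=weights / weights.sum())
    xy = coords[comp] + np.random.randn(num_particles, 2) * sigma
    theta = np.random.uniform(0, 2 * np.pi, size=num_particles)
    w = np.array([gmm_pdf(p, coords, weights, sigma) for p in xy])
    return xy, theta, w / w.sum()

def measurement_model(Z, particles_xy, forest, coords, sigma):
    """Algorithm 3: weight each particle by the GMM PDF evaluated at its (X, Y) state."""
    weights = forest.predict_proba(Z.reshape(1, -1))[0]
    w = np.array([gmm_pdf(p, coords, weights, sigma) for p in particles_xy])
    return w / w.sum()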

B. Results

In order to evaluate the MCL, we take advantage of the same experimental data used for the regression results. Namely, the algorithm is trained on T and we localize the robot for each of the 10 additional robot runs N_i. The MCL updates 5000 particles (i.e., |m| = 5000), where the computation takes, on average, 2.1 ms and 145.6 ms for the motion model sampling and measurement model, respectively. In order for the MCL to output a Cartesian coordinate, we take the weighted average of all particles, the result of which can be checked against ground truth. Due to the random nature of the MCL, each of the 10 runs is localized 100 different times. The average error over the 100 trials of all 10 runs is 0.61 meters, with a low average standard deviation of 0.0049 m. Comparing these results to


an offline LRF SLAM algorithm [10], which produces an average error of 0.27 m, we observe a compelling tradeoff between accuracy and sensor payload (or price) when using the proposed WiFi localizer. An illustrative example of a partial run is shown in Figure 7. We encourage readers to watch the accompanying video, which visually shows the outcome of the MCL, running on a robot, for every iteration. Interested readers should also visit our website (https://robotics.ucmerced.edu/Robotics/IROS2012/), where they will have access to the raw data, trained classifiers, and code presented in this paper.

Fig. 7. Localized run showing the output of LRF SLAM (straight line), the WiFi localizer (dotted line), and Ground Truth (circles).

VII. CONCLUSIONS

We have presented a hybrid algorithm, mixing classification and regression methods, capable of localizing, with sub-meter accuracy, robots with a minimal sensor payload consisting of a WiFi card and odometry. In fact, the end-to-end WiFi localizer not only compares favorably to previously published algorithms, but its performance is also competitive with the full LRF SLAM solution. The evaluation and comparison of our WiFi localizer against both published and new algorithms, along with experiments performed on different datasets, provide compelling evidence regarding the robustness of our approach. Moreover, the algorithm is fast enough to ensure real-time mapping and localization.

There are a few interesting future directions to be considered. First, we would like to modify our mapping algorithm so that it can be built incrementally, allowing the mapper to provide the localizers with partial maps or sections of the environment. Random Forests would still be a viable option in that scenario, although they would require some modifications to increase the number of trees when new data is available and to prune older trees that, over time, will not encompass enough information. Second, we believe that WiFi localization, thanks to its low sensor requirement commonly found in robots, would be a great middle layer to merge heterogeneous maps together. Indeed, WiFi localization could solve, for example, the difficult problem of merging a Cartesian grid map with a topological or visual map. Finally, if needed, the algorithms presented can be made faster by parallelizing the training process.

REFERENCES

[1] P. Bahl and V. Padmanabhan. RADAR: An in-building RF-based user location and tracking system. In IEEE Int. Conference on Computer Communications, pages 775–784, 2000.
[2] J. Biswas and M. Veloso. WiFi localization and navigation for autonomous indoor mobile robots. In IEEE International Conference on Robotics and Automation, pages 4379–4384, 2010.
[3] W. Braun and U. Dersch. A physical mobile radio channel model. IEEE Transactions on Vehicular Technology, 40(2):472–482, 1991.
[4] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[5] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. CRC Press, Boca Raton, FL, 1984.
[6] F. Duvallet and A. Tews. WiFi position estimation in industrial environments using Gaussian processes. In IEEE/RSJ Int. Conference on Intelligent Robots and Systems, pages 2216–2221, 2008.
[7] B. Ferris, D. Fox, and N. Lawrence. WiFi-SLAM using Gaussian process latent variable models. In Int. Joint Conferences on Artificial Intelligence, pages 2480–2485, 2007.
[8] J. Fink, N. Michael, A. Kushleyev, and V. Kumar. Experimental characterization of radio signal propagation in indoor environments with application to estimation and control. In IEEE/RSJ Int. Conference on Intelligent Robots and Systems, pages 2834–2839, 2009.
[9] A. Goldsmith. Wireless Communications. Cambridge Press, 2005.
[10] G. Grisetti, C. Stachniss, and W. Burgard. Improved techniques for grid mapping with Rao-Blackwellized particle filters. IEEE Transactions on Robotics, 23(1):34–46, 2006.
[11] A. Howard, S. Siddiqi, and G. Sukhatme. An experimental study of localization using wireless ethernet. In Int. Conference on Field and Service Robotics, 2003.
[12] J. Huang, D. Millman, M. Quigley, D. Stavens, S. Thrun, and A. Aggarwal. Efficient, generalized indoor WiFi GraphSLAM. In IEEE Int. Conference on Robotics and Automation, pages 1038–1043, 2011.
[13] Wild Packets Inc. Converting signal strength percentage to dBm values. Technical Report 20021217-M-WP007, 2002.
[14] D. Koutsonikolas, S. Das, Y. Hu, Y. Lu, and C. Lee. CoCoA: Coordinated cooperative localization for mobile multi-robot ad hoc networks. Ad Hoc & Sensor Wireless Networks, 3(4):331–352, 2007.
[15] A. Ladd, K. Bekris, A. Rudys, D. Wallach, and L. Kavraki. On the feasibility of using wireless ethernet for indoor localization. IEEE Transactions on Robotics and Automation, 20(3):555–559, 2004.
[16] P. McCullagh and J. Nelder. Generalized Linear Models. Chapman & Hall, New York, NY, 1990.
[17] G. McLachlan and D. Peel. Finite Mixture Models. John Wiley & Sons, Inc., Hoboken, NJ, 2000.
[18] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics, chapters 5, 8. The MIT Press, Cambridge, Massachusetts, 2005.
[19] D. Tran and T. Nguyen. Localization in wireless sensor networks based on support vector machines. IEEE Transactions on Parallel and Distributed Systems, 19(7):981–994, 2008.
[20] M. Youssef, A. Agrawala, A. Shankar, and S. Noh. A probabilistic clustering-based indoor location determination system. Technical Report CS-TR-4350, University of Maryland, College Park, 2002.
[21] S. Zickler and M. Veloso. RSS-based relative localization and tethering for moving robots in unknown environments. In IEEE Int. Conference on Robotics and Automation, pages 5466–5471, 2010.