
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS 1

Prediction of Taxi Destinations Using a Novel Data Embedding Method and Ensemble Learning

Xiaocai Zhang, Zhixun Zhao, Yi Zheng, and Jinyan Li

Abstract: The accurate and timely prediction of taxi destinations is of great importance for location-based service applications. Over the last few decades, the popularization of vehicle navigation systems has brought the era of big data to the taxi industry. Existing destination prediction approaches are mainly based on various Markov chain models or trip matching ideas, which require geographical information and may encounter the problem of data sparsity. Other machine learning prediction models have yet to deliver satisfactory results. In this paper, we first propose a novel and efficient data embedding method for pre-processing time-related features. The key idea is to embed the data into a two-dimensional space before feature selection. Second, we propose a novel data-driven ensemble learning approach for destination prediction. This approach combines the respective strengths of support vector regression and deep learning at different segments of the whole trajectory. Experiments on two real data sets demonstrate that the proposed ensemble learning model achieves superior performance for taxi destination prediction. Comparisons also confirm the effectiveness of the proposed data embedding method in the deep learning model.

Index Terms: Taxi, destination prediction, support vector regression (SVR), deep learning, ensemble learning.

I. INTRODUCTION

TAXIS play an important role in modern transport systems all over the world, especially in large cities [1]–[3].

Over the past few decades, the global positioning system (GPS) has been widely used in a rapidly increasing number of applications, such as vehicle-based and smartphone-based navigation systems, which are broadly operated in the road networks of cities. Taxi drivers use GPS-based navigation systems to help them reach unfamiliar locations. With the GPS receiver embedded in the vehicle navigation device or in the smartphone of the taxi driver or rider, past location data can be recorded. Such a huge amount of movement data can be utilized in many location-based services (LBSs) [4], such as targeted advertising, scenic spot recommendation, routing, online traffic condition broadcasting and location-based social networks, which can be implemented in recommendation systems.

Manuscript received December 7, 2017; revised May 7, 2018 and July 3, 2018; accepted December 14, 2018. This work was supported by the China Scholarship Council. The Associate Editor for this paper was P. Ioannou. (Corresponding author: Jinyan Li.)

The authors are with the Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW 2007, Australia.

Digital Object Identifier 10.1109/TITS.2018.2888587

The term LBSs denotes applications that integrate geographic locations with the general notion of services [5]. With the accurate geographic information provided by navigation systems, the use of various LBSs has become an important part of people’s daily life [6]. According to a survey of employees at a large software company, 68% of the searches happened in transit, while 39% of them sought information related to or near the destination and 12% sought information on the route to the destination [7]. In the taxi industry, several applications can benefit from accurate trip destination prediction. For example, targeted advertising associated with the near-future drop-off location of passengers, such as recommendations for shops, restaurants or hotels, can be delivered through recommendation systems. Such strategies are also applicable to the recommendation of sightseeing places [8]. Compared with the existing advertising mode in the taxi industry, this approach is both more pertinent and more efficient. Meanwhile, real-time prediction of trip destinations could also be useful to taxi booking platforms like Uber or DiDi when users change their preset drop-off locations during trips. The method could also be significant in other fields, such as personal trip destination prediction. Furthermore, in maritime transport, it could support prediction of the estimated time of arrival of vessels, which is usually set manually, along with a speed, in the automatic identification system by the seafarer before departure and is not fully reliable [9].

In this paper, we attempt to unite two machine learning methods for destination prediction that learn features from limited prior knowledge. More specifically, an ensemble learning model (ELM) based on support vector regression (SVR) and deep learning (deep belief network [10]) is proposed. In general, each of these two models performs better than the others at different segments of the whole trajectory. For the deep learning architecture, we propose a novel data embedding technique called circular fuzzy embedding (CFE) for feature representation, which maps high-dimensional data into a two-dimensional plane. Finally, experiments conducted on two independent real-world data sets show that our proposed ensemble learning model for destination prediction has superior performance compared with the existing methods.

The rest of this paper is structured as follows. Section II reviews the related work on destination prediction.

1524-9050 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

https://orcid.org/0000-0002-3783-6560

https://orcid.org/0000-0003-1833-7413



In Section III, we elaborate the CFE method and the ELM for destination prediction. Section IV presents the experiments and an analysis of the results. Section V concludes this paper.

II. RELATED WORK

The analysis of taxi trajectory data sets has been considered by several studies in data mining, machine learning and intelligent transportation systems. Recommendation and prediction systems are among these popular topics. Liu and Qu [11] propose a dynamic congestion condition prediction model using a topic-aware Gaussian process, on top of which an adaptive routing recommendation algorithm can be applied. Ding et al. [2] develop the HUNTS system, which recommends trajectories to taxi drivers to enhance passenger pick-up probability and profit. Liu and Wang [12] develop a community detection technique based on mobility trajectories, and then propose an online recommendation method based on trajectory communities to improve service.

Location prediction of a mobile agent has been a traditional domain of trajectory analysis. A common and simple approach is trip matching: if an on-going trip (query trip) matches part of a popular trajectory from the historical trajectories, then the destination of this popular route is predicted as the destination of the query trip [8]. The trip matching mechanism may not be efficient with big historical data. Other common models are based on various Markov or hidden Markov chain models [13]–[17]. Ashbrook and Starner first introduced a Markov chain model to predict the most likely next location in 2003. This model consists of nodes, each of which represents a location and serves as a state of the Markov process. The probabilities can then be derived from the large volume of historical locations that have been visited. The transition between two states represents the probability of the user travelling between these two locations, which can be trained on the historical trajectories [13]. Simmons et al. [15] use a hidden Markov model (HMM) to predict the route and destination of the driver based on an on-line observation of their GPS position. Variants of the Markov chain model [18] and the HMM [16] have also been proposed to predict the next location or route of drivers based on their past GPS log data. However, those approaches always need extra geographical information, such as a map database that provides a road graph consisting of road intersections and the links between intersections [15]. They may also lead to a data sparsity problem in practice, as the historical trajectory data cannot cover all possible query trajectories [8], i.e., the query trajectory does not match any historical trajectory or the probability of the transition between two locations approximates zero. Apart from trip matching and Markov chain based models, Le et al. [19] apply algorithmic game theory and statistical learning theory to predict bundles of locations to visit from reference point data, which are generated via trajectory clustering and a hidden Markov model.

In addition, some machine learning algorithms have also been applied in location prediction to detect the flow patterns within certain cities. An artificial neural network with a shallow structure is applied to taxi destination prediction in [20]. The inputs of this model are the initial and last points of the historical trajectory prefix, integrated with some metadata such as client ID, taxi ID, stand ID and time information. The outputs are the clusters of the corresponding destinations. However, such a model might not be applicable to some cities, as features such as client ID or taxi stand ID may not be recorded in their taxi operation systems. In addition, several machine learning models such as decision trees [21], [22], bootstrapped decision trees, decision trees with pruning [21], naive Bayes [22] and reinforcement learning [23] have also been applied in location prediction.

Fig. 1. Circle constructed for feature of hour of day.

III. METHODOLOGY

In this section, we first introduce a novel data embedding technique called circular fuzzy embedding (CFE) for representing time-related features before feature learning. It maps higher-dimensional features into a two-dimensional plane, and fuzzy membership is introduced to avoid instability between adjacent sectors. Then, an ensemble learning model (ELM) based on SVR and a deep belief network (DBN) is proposed for destination prediction. SVR is an efficient supervised learning method that has been applied widely in pattern recognition. The basic principle is to transform the training data from the input space into a high-dimensional feature space with a transformation function, and then to find an optimal regression function by minimizing the regression risk [24], [25]. A DBN is a stack of restricted Boltzmann machines, each of which has one layer of hidden units and one layer of visible units, where unsupervised pre-training is employed before fine-tuning [26], [27]. A restricted Boltzmann machine is an undirected graphical model in which visible units are connected to hidden units by undirected weighted connections [28], while there are no visible-visible or hidden-hidden connections [29].

A. Circular Fuzzy Embedding (CFE)

Fig. 2. (a) Circle constructed for feature of day type. (b) Circle constructed for feature of week of year.

In the transportation field, some types of time-related data are generally discrete, such as date, day of the week and day type. The most common method for processing discrete data in machine learning is the One-hot embedding technique, which converts discrete features into binary vectors. For example, suppose we have a three-category feature comprising “Holiday”, “Weekday” and “Weekend”. An observation of “Holiday” is converted into the binary vector (1, 0, 0). Similarly, (0, 1, 0) and (0, 0, 1) correspond to the categories “Weekday” and “Weekend”, respectively. However, this embedding technique has certain drawbacks:

1) It may lead to data sparsity and the curse of dimensionality [30];

2) It occupies a large amount of memory if the number of categories is huge;

3) It slows down the training of a network when the number of categories is large;

4) It takes little account of similarities between observations. Once the categories are turned into binary vectors, the similarity between any two distinct observations is the same.
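Drawback 4 can be checked numerically. The following minimal sketch is illustrative only: the helper names `one_hot` and `euclidean` are hypothetical and do not come from the paper. It one-hot encodes the three day types and measures the Euclidean distance between every pair of distinct categories.

```python
import math
from itertools import combinations

def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

categories = ["Holiday", "Weekday", "Weekend"]
vectors = {name: one_hot(i, len(categories)) for i, name in enumerate(categories)}

# Distance between every pair of distinct categories.
pair_distances = {
    (a, b): euclidean(vectors[a], vectors[b])
    for a, b in combinations(categories, 2)
}
```

Every entry of `pair_distances` equals the square root of 2, so the encoding cannot express that some categories are closer to each other than others.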

To avoid such problems, we develop the CFE technique for time-related data embedding. The CFE technique is inspired by Word2vec, a model used to generate word embeddings [31].

The first step is to construct a circle C centred on (0, 0) to embed all categories, where C ⊂ R². A circle is chosen because it not only gives each category a unique identity but also makes the similarities between categories easy to measure. Continuous variables are first converted into discrete sectors. For instance, we embed the hour-of-day variable evenly on a circle. First, we divide a whole day (24 hours) into 12 discrete and disjoint ranges, from (23, 1] to (21, 23], as demonstrated in Fig. 1. Let R denote the radius and θ the angle in radians between each category and the X axis; each category can then be mapped into a two-dimensional space. Compared with the twelve-dimensional space required by One-hot embedding, this reduces the dimensionality significantly. In addition, it also represents the similarities between categories. In the field of transportation, traffic in adjacent time sectors is more likely to have similar patterns [32], [33]. For example, in Fig. 1, the travelling pattern in the range (23, 1] is more likely to be similar to that in (1, 3] or (21, 23] than to that in (11, 13]. Similarly, as shown in Fig. 2, we embed the features of day type and week of year (52 weeks) on two other circles, respectively. In Fig. 2 (a), we address the issue that the differences between the categories “Holiday”, “Weekday” and “Weekend” are not the same.

As shown in Fig. 1, different time sectors are embedded into the two-dimensional space with unique vectors. However, times close to the boundary between adjacent sectors may be embedded with quite different vectors, even though there is in fact little difference between them. For instance, the time range (1, 5] is divided into two sectors, (1, 3] and (3, 5], each spanning two hours. Sector (1, 3] is associated with the embedding vector Q2 and sector (3, 5] with Q3,

$$Q_2 = \left(R\cos(\pi/3),\ R\sin(\pi/3)\right) \tag{1}$$

$$Q_3 = \left(R\cos(\pi/6),\ R\sin(\pi/6)\right) \tag{2}$$

Suppose we have the time points 02:59:00 and 03:01:00. They are embedded into two entirely different vectors in different sectors, although they are almost the same time. To avoid such an unstable situation, we introduce the membership function of fuzzy set theory [34]. The membership function (FV) of hour of day (h) is illustrated in Fig. 3, where V1, V2, ..., V12 denote the time sectors (23, 1], (1, 3], ..., (21, 23], respectively.

The membership functions (FV1, FV2) for the time sectors (23, 1] and (1, 3] can be written as

$$F_{V_1}(h) = \begin{cases} h - 22.5, & 22.5 \le h < 23.5 \\ 1, & 0 \le h < 0.5 \ \text{or}\ 23.5 \le h < 24 \\ 1.5 - h, & 0.5 \le h < 1.5 \\ 0, & \text{otherwise} \end{cases} \tag{3}$$



Fig. 3. Linear combined membership function of hour of day feature.

$$F_{V_2}(h) = \begin{cases} h - 0.5, & 0.5 \le h < 1.5 \\ 1, & 1.5 \le h < 2.5 \\ 3.5 - h, & 2.5 \le h < 3.5 \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

In the same way, the membership functions for the remaining sectors can be derived; they are not listed in this paper. Then, we can obtain the final embedding vector B1(h) of the hour-of-day feature,

$$B_1(h) = \sum_{i=1}^{12} F_{V_i}(h) \cdot Q_i \tag{5}$$

where Qi represents the ith embedding vector before introducing fuzzy membership, and FVi(h) denotes the membership of time point h in sector Vi. With the CFE method described above, the embedding vectors of the day type feature, B2, and the week of year feature, B3, can be derived in the same way.
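The hour-of-day embedding of Eqs. (1) to (5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: R = 1 is chosen arbitrarily, the sector vectors are assumed to run clockwise from π/2 in steps of π/6 (a layout that reproduces Q2 and Q3 of Eqs. (1) and (2), but is otherwise inferred), and the trapezoidal membership is generalised from Eqs. (3) and (4) to all twelve sectors. All function names are hypothetical.

```python
import math

R = 1.0                      # circle radius (assumed; the paper keeps R symbolic)
N_SECTORS = 12               # two-hour sectors V1..V12
SECTOR_WIDTH = 24 / N_SECTORS

def sector_vector(i):
    """Q_i for i = 1..12. Assumed clockwise layout starting at pi/2, which
    places Q2 at pi/3 and Q3 at pi/6 as in Eqs. (1)-(2)."""
    theta = math.pi / 2 - (i - 1) * math.pi / 6
    return (R * math.cos(theta), R * math.sin(theta))

def membership(i, h):
    """Trapezoidal membership F_{V_i}(h), generalised from Eqs. (3)-(4)."""
    centre = (i - 1) * SECTOR_WIDTH      # V1 is centred on 0 h, V2 on 2 h, ...
    d = abs(h - centre) % 24
    d = min(d, 24 - d)                   # circular distance in hours
    if d <= 0.5:
        return 1.0                       # flat top of the trapezoid
    if d < 1.5:
        return 1.5 - d                   # linear ramp shared with the neighbour
    return 0.0

def embed_hour(h):
    """B1(h) = sum_i F_{V_i}(h) * Q_i, as in Eq. (5)."""
    x = y = 0.0
    for i in range(1, N_SECTORS + 1):
        f = membership(i, h)
        qx, qy = sector_vector(i)
        x += f * qx
        y += f * qy
    return (x, y)

before = embed_hour(2 + 59 / 60)   # 02:59
after = embed_hour(3 + 1 / 60)     # 03:01
gap = math.dist(before, after)     # distance between the two embeddings
hard_gap = math.dist(sector_vector(2), sector_vector(3))  # without fuzziness
```

With the fuzzy membership, `gap` is much smaller than `hard_gap`, which is the stabilising effect described for the 02:59:00 and 03:01:00 example.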

B. Ensemble Learning Model (ELM)

The key idea of the ELM is to construct a knowledge base and to apply different models from the knowledge base under different conditions to obtain superior predictions. In this paper, the knowledge base contains the SVR and DBN models, as each performs better than other machine learning models in a different portion of the whole taxi trajectory. Specifically, DBN (classification) gives the better prediction when the taxi is in the initial part of the whole trip; otherwise, SVR does better. The detailed experimental results can be found in Fig. 7, Table IV and Table V. Therefore, the key to the ensemble is detecting which portion of the whole on-going trajectory the taxi is currently in, so that the best model can be selected accurately.

Fig. 4 gives a detailed description of the ELM. Define X and Y as the input trajectory data and targets for training the SVR and DBN models. X∗ denotes the input trajectories for prediction and O is the predicted output. Step 2 trains each model with parameters θSVR and θDBN, respectively. Steps 3 to 13 are the key component of our proposed ELM: they construct the training data and corresponding labels for the segment estimation classifier, which provides the basis for model selection. In steps 3 to 8, the input trajectory prefix X is extracted at the prior proportions (from 10% to 95%, in increments of 5%) of the whole trajectory. Steps 9 to 11 set the corresponding target Label = 1 only if the prior proportion of the extracted trajectory exceeds λ, a constant parameter based on the performance of SVR and DBN; otherwise, Label = 0. Steps 14 to 19 elaborate the mechanism of the proposed ELM. In step 15, the k-nearest neighbor (kNN) algorithm, a lazy supervised learning approach, is applied to the query trajectory X∗ to predict the segment proportion of the current position, as it is simple and performs well with large training data. Finally, either SVR or DBN from the knowledge base is applied based on the value of ρ. Fig. 5 gives the brief process of the ELM.
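The labelling and dispatch logic described above can be sketched as follows. This is a simplified illustration, not the pseudo code of Fig. 4: the function names are hypothetical, the kNN is a plain majority vote, the features fed to the kNN are left to a caller-supplied `featurize` function because the text does not specify them, and ρ is assumed to be the label returned by the kNN, with ρ = 1 selecting SVR and ρ = 0 selecting DBN.

```python
import math
from collections import Counter

def build_segment_training_set(trajectories, lam, featurize):
    """Cut every trajectory at 10%, 15%, ..., 95% of its length and label each
    prefix 1 if its proportion exceeds lam, otherwise 0 (steps 3 to 13)."""
    samples, labels = [], []
    for trajectory in trajectories:
        for percent in range(10, 100, 5):
            cut = max(1, len(trajectory) * percent // 100)
            samples.append(featurize(trajectory[:cut]))
            labels.append(1 if percent / 100 > lam else 0)
    return samples, labels

def knn_label(samples, labels, query, k):
    """Majority vote among the k training samples nearest to the query."""
    order = sorted(range(len(samples)), key=lambda i: math.dist(samples[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def elm_predict(prefix, samples, labels, k, featurize, svr_predict, dbn_predict):
    """Estimate the segment label rho, then dispatch to SVR (rho = 1) or DBN (rho = 0)."""
    rho = knn_label(samples, labels, featurize(prefix), k)
    return svr_predict(prefix) if rho == 1 else dbn_predict(prefix)

# Toy usage with stand-in models and a deliberately crude feature (prefix length).
toy_trips = [[(float(i), 0.0) for i in range(20)] for _ in range(3)]
length_feature = lambda prefix: (float(len(prefix)),)
samples, labels = build_segment_training_set(toy_trips, lam=0.30, featurize=length_feature)
early = elm_predict(toy_trips[0][:3], samples, labels, 3, length_feature,
                    svr_predict=lambda p: "SVR", dbn_predict=lambda p: "DBN")
late = elm_predict(toy_trips[0][:15], samples, labels, 3, length_feature,
                   svr_predict=lambda p: "SVR", dbn_predict=lambda p: "DBN")
```

In the toy run, the three-point prefix is routed to the DBN stand-in and the fifteen-point prefix to the SVR stand-in. Real trajectories differ in length, so prefix length alone would not be a usable feature in practice.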

C. Input and Output

As the ELM is derived from SVR and DBN, we need to determine the inputs and outputs for training each model. The inputs of the SVR model are the initial m points and last m points (excluding the final destination) of the trajectory prefix, which gives a total of 2m points or 4m numerical values. When the prefix of the trajectory contains fewer than 2m points, the initial and last m points overlap. When the prefix contains fewer than m points, the first or last point is repeated. The outputs are the predicted longitude and latitude values, so the model in fact acts as a regression function.
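The construction of the 4m input values can be illustrated with a short sketch. The function name `prefix_features` is hypothetical, and the padding direction for very short prefixes (repeat the last point to fill the initial block, repeat the first point to fill the final block) is one plausible reading of the rule above rather than a detail stated in the text.

```python
def prefix_features(prefix, m=5):
    """Flatten the first m and last m (lon, lat) points of a trajectory prefix
    into a list of 4m numbers."""
    if not prefix:
        raise ValueError("prefix must contain at least one point")
    head = list(prefix[:m])       # initial m points (fewer if the prefix is short)
    tail = list(prefix[-m:])      # last m points; overlaps head when len < 2m
    # Fewer than m points: pad by repeating the last point (head block)
    # or the first point (tail block). This direction is an assumption.
    head += [prefix[-1]] * (m - len(head))
    tail = [prefix[0]] * (m - len(tail)) + tail
    return [coord for point in head + tail for coord in point]

seven_points = [(float(i), float(-i)) for i in range(7)]
features = prefix_features(seven_points, m=5)
```

For the seven-point prefix, the initial block covers points 0 to 4 and the final block covers points 2 to 6, so three points appear in both blocks.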

Fig. 6 shows the inputs and outputs of the constructed DBN model. First, we apply the k-means clustering algorithm to partition the trip destinations of the training data into n clusters, and denote the centre of the ith cluster as ci, 1 ≤ i ≤ n. The inputs of the DBN model are the 2m points of the trajectory prefix integrated with the time-related embedding vectors B1 ∼ B3, which have been derived with our proposed CFE technique. The outputs are the probabilities pi corresponding to the ith cluster, which can be implemented with a Softmax layer on the top layer,

$$p_i = \frac{\exp(e_i)}{\sum_{j=1}^{n} \exp(e_j)} \tag{6}$$

where ei is the ith activation of the previous layer. Then, the predicted destination can be calculated with Eq. (7).

$$y = \sum_{i=1}^{n} p_i c_i \tag{7}$$
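Eqs. (6) and (7) amount to a softmax followed by a probability-weighted average of the cluster centres. The sketch below shows this output stage only; the function names are hypothetical, the activations and centres are made-up toy values, and the subtraction of the maximum activation is a standard numerical-stability step that is not mentioned in the text.

```python
import math

def softmax(activations):
    """Eq. (6): turn the last-layer activations e_i into probabilities p_i."""
    shift = max(activations)              # subtract the max for numerical stability
    exps = [math.exp(e - shift) for e in activations]
    total = sum(exps)
    return [v / total for v in exps]

def expected_destination(activations, centres):
    """Eq. (7): probability-weighted average of the cluster centres c_i."""
    probs = softmax(activations)
    first = sum(p * c[0] for p, c in zip(probs, centres))
    second = sum(p * c[1] for p, c in zip(probs, centres))
    return (first, second)

# Toy example: four cluster centres on the corners of a square.
centres = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
uniform = expected_destination([0.0, 0.0, 0.0, 0.0], centres)
peaked = expected_destination([0.0, 50.0, 0.0, 0.0], centres)
```

With equal activations the prediction is the centroid of the four centres, while a single dominant activation pulls the prediction onto the corresponding centre.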

IV. EXPERIMENTS AND RESULTS

To validate the effectiveness of our proposed ensemble learning model, we conduct experiments on two real data sets: the Porto data set and the Chengdu data set. Our experiments were carried out on a server with an Intel Xeon E5-2680 v2 CPU at 2.8 GHz. Some models are subdivided into classification and regression variants, identified by the postfixes “C” and “R”, respectively. To examine the effects of the proposed ELM within different segments of the whole trajectory, we extract the initial 10% ∼ 90% (in increments of 10%) of the whole trajectories for both validation and testing sets.

Fig. 4. Pseudo code of ELM.

A. Data Sets

1) The Porto Data Set: This is a real-world large-scale data set of taxis operating in Porto, Portugal [35]. It was collected from 442 taxis running from 1st July 2013 to 30th June 2014. Each observation contains a list of GPS coordinates with longitude and latitude, the corresponding timestamp and the day type (holiday, workday or weekend). The last item of the list represents the destination of the trajectory, while the first one corresponds to the trip’s pickup location. The time interval between adjacent GPS coordinates is 15 seconds.

Fig. 5. Flow chart of ELM.

Fig. 6. Architecture of DBN model.

TABLE I

SIZES OF TRAINING, TESTING AND VALIDATION SETS

2) The Chengdu Data Set: This is also a real-world data set, collected from more than 14 thousand taxis in the city of Chengdu, China [36]. The period is from 3rd to 30th August 2014. Each observation comprises the taxi identity, GPS coordinates, activity (carrying a passenger or not) and the corresponding timestamp. The GPS coordinates of each taxi are recorded every 10 seconds.

B. Indices of Evaluation

Fig. 7. Determination of λ with validation sets. (a) On the Porto data set. (b) On the Chengdu data set.

Fig. 8. Network error with epoch in DBN. (a) On the Porto data set. (b) On the Chengdu data set.

To evaluate the performance of the proposed models, we use two indices: mean absolute error (MAE) and root mean square error (RMSE). In addition, for the overall performance evaluation, we define the average MAE (AMAE) and average RMSE (ARMSE). These indices are defined as

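$$d_{ij} = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\phi_{ij} - \phi_i}{2}\right) + \cos\phi_{ij}\cos\phi_i\,\sin^2\left(\frac{\lambda_{ij} - \lambda_i}{2}\right)}\right) \tag{8}$$

$$\mathrm{MAE}_j = \frac{1}{n}\sum_{i=1}^{n} d_{ij} \tag{9}$$

$$\mathrm{RMSE}_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} d_{ij}^2} \tag{10}$$

$$\mathrm{AMAE} = \frac{\sum_{j=1}^{m} \mathrm{MAE}_j}{m} \tag{11}$$

$$\mathrm{ARMSE} = \frac{\sum_{j=1}^{m} \mathrm{RMSE}_j}{m} \tag{12}$$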

where dij denotes the Haversine distance between the predicted GPS coordinate (ϕij, λij) and the real coordinate (ϕi, λi), and r is the radius of the Earth, which we set to r = 6371 km.
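The evaluation indices can be computed with the standard library alone. The following sketch implements the Haversine distance of Eq. (8) and the MAE and RMSE of Eqs. (9) and (10) for one segment; the function names are hypothetical, coordinates are assumed to be given in decimal degrees, and the clamp to 1.0 before `asin` is a defensive addition against floating-point rounding.

```python
import math

EARTH_RADIUS_KM = 6371.0   # r in Eq. (8)

def haversine_km(lat_pred, lon_pred, lat_true, lon_true):
    """Eq. (8): great-circle distance in km between two points given in degrees."""
    phi_p, phi_t = math.radians(lat_pred), math.radians(lat_true)
    d_phi = phi_p - phi_t
    d_lam = math.radians(lon_pred - lon_true)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi_p) * math.cos(phi_t) * math.sin(d_lam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def mae(distances):
    """Eq. (9): mean of the distances for one segment."""
    return sum(distances) / len(distances)

def rmse(distances):
    """Eq. (10): root of the mean squared distance for one segment."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

# A quarter of the equator, from longitude 0 to longitude 90.
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
```

AMAE and ARMSE of Eqs. (11) and (12) then follow by applying `mae` and `rmse` to each segment and averaging the per-segment values.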

C. Experimental Settings

1) Data Pre-Processing: As the time interval in the Chengdu data set is only 10 seconds, we convert it to 20 seconds by extracting every second coordinate from the original trajectories. After dropping some abnormal trajectories, we obtain a Porto data set with 1566798 complete trajectories and a Chengdu data set with 792781 trajectories in total. In practice, the destination of the current trajectory is always predicted from previous trajectories. Therefore, we arrange these data sets in increasing order of trip start time, and take the initial 80% and the last 20% as the training and testing sets, respectively. As these data sets are very large, training on the whole data is quite challenging. We randomly select 30% of the candidates for training and 10% of the training data as the validation set, which is used to tune the parameters. Table I gives the sample sizes of the training, testing and validation sets.
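The pre-processing steps can be sketched as below. This is an illustration under assumptions rather than the authors' pipeline: each trip is represented as a dictionary with a hypothetical `start_time` field, the 20-second interval is obtained by keeping every second point, and the validation set is taken as 10% of the randomly selected 30% subset, which is only one reading of the description above.

```python
import random

def downsample(points, step=2):
    """Keep every `step`-th GPS point, e.g. 10 s sampling becomes 20 s."""
    return points[::step]

def chronological_split(trips, train_pct=80):
    """Sort trips by start time; the earliest train_pct% train, the rest test."""
    ordered = sorted(trips, key=lambda trip: trip["start_time"])
    cut = len(ordered) * train_pct // 100
    return ordered[:cut], ordered[cut:]

def subsample_training(train, keep_pct=30, val_pct=10, seed=0):
    """Randomly keep keep_pct% of the training trips, then hold out val_pct%
    of the kept trips for validation (assumed interpretation)."""
    rng = random.Random(seed)
    kept = rng.sample(train, len(train) * keep_pct // 100)
    n_val = len(kept) * val_pct // 100
    return kept[n_val:], kept[:n_val]

# Toy usage: 1000 trips whose start times arrive in shuffled order.
toy_trips = [{"start_time": t} for t in range(1000)]
random.Random(1).shuffle(toy_trips)
train, test = chronological_split(toy_trips)
fit, validation = subsample_training(train)
```

Splitting by time rather than at random ensures that every test trip starts after all training trips, which mirrors the practical setting in which only past trajectories are available.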

2) Architecture of the Learning Model: The parameters m, θSVR, θDBN and λ should be set before training. In our experiment, we take m = 5 for both models.



TABLE II

EFFECT OF DIFFERENT STRUCTURES ON VALIDATION SET (ON THE PORTO DATA SET)

TABLE III

EFFECT OF DIFFERENT STRUCTURES ON VALIDATION SET (ON THE CHENGDU DATA SET)

TABLE IV

PERFORMANCE COMPARISON BETWEEN DIFFERENT PREDICTORS (ON THE PORTO DATA SET)

TABLE V

PERFORMANCE COMPARISON BETWEEN DIFFERENT PREDICTORS (ON THE CHENGDU DATA SET)

To determine θSVR, after trying several combinations of parameters on the validation set, we finally choose the radial basis function as the kernel, and set the constant C = 100 and ε = 0.1 for the ε-insensitive loss function. For θDBN, the determination of parameters such as the number of layers, the number of neurons in the output layer and the number of nodes in each hidden layer is elaborated later. For λ in the ELM, the MAE of different segments in the Porto data set with SVR and DBN is plotted in Fig. 7 (a). It shows that the overall performance of SVR without CFE is better, while the DBN-C model with CFE performs best within the initial 30% of the whole trajectory. The two perform similarly when the extracted percentage is around 30%; therefore, the parameter λ in our ELM can be set to 0.30. In a similar way, we set λ = 0.25 for the Chengdu data set.

3) Structure of DBN: To find the best architecture for the DBN model, we test the performance of several different architectures on the validation sets and choose the structure with the best performance. Three parameters need to be set. The first is the number of neurons in the output layer. It is chosen from {1800, 2000, 2200, 2400, 2600, 2800} for the Porto data set and from {600, 800, 1000, 1200, 1400, 1600} for the Chengdu data set. The second is the number of layers in the network, for which values from two to seven are tested. The third is the number of nodes in each layer; for simplicity, it is set to be the same for every layer, and it is chosen from 200 to 1000 for Porto and from 50 to 500 for Chengdu.

Based on the AMAE and ARMSE on the validation sets, the best DBN structure for the Porto data set can be found from Table II: layer size = 2, nodes in layer = 500 and cluster number in output layer = 2400. From Table III, the best structure for Chengdu is: layer size = 3, nodes in layer = 100 and cluster number in output layer = 800.

D. Results

When applying the kNN classifier in the ELM, we set k = 23 for the Porto data set and k = 25 for the Chengdu data set. Fig. 9 shows the accuracy of the model for estimating the current segment, which acts as a classifier that selects the better-performing model from the knowledge base. From Fig. 9, we can see that the overall accuracy of the constructed kNN classifier is good, and that it is relatively low only when the percentage of the whole trajectory is around λ. However, this matters little, as the performances of the SVR and DBN-C models are similar in that situation.

Fig. 9. Accuracy of model for current segment estimation.

We compare the performance of our proposed ELM with the SVR, DBN, artificial neural network (ANN) [14], kNN and naive Bayes (NB) models. Among these models, some are subdivided into classification and regression models, with the postfixes “C” and “R”, respectively. In addition, to evaluate the performance of our proposed CFE technique, each model is tested both with and without the CFE technique.



TABLE VI

PERFORMANCE COMPARISON BETWEEN CFE AND THE ONE-HOT EMBEDDING IN DEEP LEARNING MODELS (ON THE PORTO DATA SET)

TABLE VII

PERFORMANCE COMPARISON BETWEEN CFE AND THE ONE-HOT EMBEDDING IN DEEP LEARNING MODELS (ON THE CHENGDU DATA SET)

TABLE VIII

TIME COSTS BY THE ELM MODEL

Generally, as shown in Tables IV and V, all the prediction models become more accurate as the taxi gets closer to the destination. ELM improves the overall performance and performs best among all the models. SVR outperforms DBN-C when the trajectory completion proportion approaches 1, since SVR performs a regression task that learns the real numerical destinations from the training data, while DBN-C performs a classification task that learns the probability distributions of destination clusters. However, DBN-C performs best at the initial proportions, as the pre-trained DBN-C has better generative ability [37]. Moreover, it can learn invariant features and generate invariant representations from training data, which are insensitive to some transformations and exhibit better classification invariance [38], [39]. As ELM is mostly derived from SVR, compared with SVR it improves the overall accuracy and significantly enhances the performance in the initial proportion of the whole trajectory. Compared with the DBN, ANN, kNN and NB models, ELM increases the overall performance significantly. Besides, the benefit of the proposed CFE technique is evident when it is applied in the deep learning model (DBN).

To evaluate the effectiveness and efficiency of our proposed CFE technique in feature representation, we conduct an experiment comparing it with the most commonly used One-hot embedding technique (denoted as One-Hot-E) in our deep learning models. As Table VI and Table VII show, CFE requires a much lower dimensionality for the feature representation than One-Hot-E (26 versus 48 in these experiments), which yields a network with fewer parameters to be learned from training data. As a result, CFE requires less time for training the whole network, especially with complex networks (like deep learning), as confirmed in both the Porto and Chengdu taxi experiments. In addition, CFE employs a fuzzy membership and can address the similarity issues between observations well, leading to better prediction performance than the similarity-equal One-Hot-E. The proposed ELM is an ensemble of the SVR and DBN models, which need to be trained before being fed into the segment detection classifier. Table VIII reports the time cost of model training and trip destination prediction for the ELM method.

Fig. 10. Comparison of ELM and other models with a case study.

TABLE IX

DISTANCES BETWEEN THE REAL AND PREDICTED DESTINATIONS WITH DIFFERENT MODELS (UNIT: km)

As presented in Fig. 10, assume that a taxi in Porto picked up passengers at a location at a previous timestamp; the pickup location, the current position of the taxi and the real destination of this trip are each marked in Fig. 10. The input of our proposed ELM model is the selected 2m points (the initial m points counted from the pickup location and the last m points up to the current location, with m = 5 in this experiment) between the pickup location and the current location, and the output is the geographical location of the next stop. The destinations predicted by the ELM, NB, ANN-R, ANN-C, DBN-R, SVR and kNN-R models are marked individually in Fig. 10. The distance of each predicted destination from the real destination is shown in Table IX. ELM yields a predicted coordinate that is closer to the real drop-off location of this trip, so that various position-based LBSs can be offered to taxi riders while in service. Compared with the coordinates predicted by the other models, the surroundings of the coordinate predicted by ELM are more relevant to those of the real destination, which makes it a reasonable and efficient basis for giving guidance to taxi riders. If the predicted destination is too far from the real one, it could cause misunderstanding and confusion for users.

V. CONCLUSION

In this paper, we proposed a data-driven ensemble learning approach for the taxi destination prediction problem, which combines the advantages of the SVR and DBN models in dealing with different segments of the trajectories. A novel data embedding technique called CFE was applied in the deep learning model. We evaluated the individual and overall prediction performances and made comparisons with the SVR, DBN, ANN, kNN and naive Bayes models. From our experiments on two real-world taxi GPS trajectory data sets collected from two different cities, we demonstrated that our ensemble learning approach performs better than the other models in terms of overall performance. In general, it yields more accurate predictions as the taxi gets closer to the drop-off location. Experiments also showed the effectiveness of our proposed CFE technique in deep learning. For future work, it would be interesting to investigate other algorithms for estimating the current segment of the whole trajectory. Furthermore, prediction of the arrival time would also be of great significance for this problem.

REFERENCES

[1] L. Bidasca and E. Townsend, “Making taxis safer: Managing road risksfor taxi drivers, their passengers and other road users,” in Proc. Eur.Transp. Safety Council, Brussels, Belgium, 2016, pp. 5–6.

[2] Y. Ding, S. Liu, J. Pu, and L. M. Ni, “Hunts: A trajectory recommendation system for effective and efficient hunting of taxi passengers,” in Proc. IEEE 14th Int. Conf. Mobile Data Manage., Jun. 2013, pp. 107–116.

[3] S. Liu, L. M. Ni, and R. Krishnan, “Fraud detection from taxis’ driving behaviors,” IEEE Trans. Veh. Technol., vol. 63, no. 1, pp. 464–472, Jan. 2014.

[4] M. Xu, D. Wang, and J. Li, “Destpre: A data-driven approach to destination prediction for taxi rides,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput., 2016, pp. 729–739.

[5] J. Schiller and A. Voisard, Location-Based Services. San Francisco, CA, USA: Morgan Kaufmann, 2004, pp. 1–4.

[6] T. Peng, Q. Liu, and G. Wang, “Enhanced location privacy preserving scheme in location-based services,” IEEE Syst. J., vol. 11, no. 1, pp. 219–230, Mar. 2017.

[7] J. Teevan, A. Karlson, S. Amini, A. J. Brush, and J. Krumm, “Understanding the importance of location, time, and people in mobile local search behavior,” in Proc. 13th Int. Conf. Human Comput. Interact. Mobile Devices Services, 2011, pp. 77–80.

[8] A. Y. Xue, R. Zhang, Y. Zheng, X. Xie, J. Huang, and Z. Xu, “Destination prediction by sub-trajectory synthesis and privacy protection against such prediction,” in Proc. IEEE 29th Int. Conf. Data Eng., Apr. 2013, pp. 254–265.

[9] G. Pallotta, M. Vespe, and K. Bryan, “Vessel pattern knowledge discovery from AIS data: A framework for anomaly detection and route prediction,” Entropy, vol. 15, no. 6, pp. 2218–2245, Jun. 2013.

[10] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Comput., vol. 18, no. 7, pp. 1527–1554, May 2006.

[11] S. Liu and Q. Qu, “Dynamic collective routing using crowdsourcing data,” Transp. Res. B Methodol., vol. 93, pp. 450–469, Nov. 2016.

[12] S. Liu and S. Wang, “Trajectory community discovery and recommendation by multi-source diffusion modeling,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 4, pp. 898–911, Apr. 2017.

[13] D. Ashbrook and T. Starner, “Using GPS to learn significant locations and predict movement across multiple users,” Pers. Ubiquitous Comput., vol. 7, no. 5, pp. 275–286, Sep. 2003.

[14] M. Li, A. Ahmed, and A. J. Smola, “Inferring movement trajectories from GPS snippets,” in Proc. 8th ACM Int. Conf. Web Search Data Mining, 2015, pp. 325–334.

[15] R. Simmons, B. Browning, Y. Zhang, and V. Sadekar, “Learning to predict driver route and destination intent,” in Proc. IEEE Intell. Transp. Syst. Conf., Sep. 2006, pp. 127–132.

[16] J. A. Alvarez-Garcia, J. A. Ortega, L. Gonzalez-Abril, and F. Velasco, “Trip destination prediction based on past GPS log using a Hidden Markov model,” Expert Syst. Appl., vol. 37, no. 12, pp. 8166–8171, Dec. 2010.

[17] B. D. Ziebart, A. L. Maas, A. K. Dey, and J. A. Bagnell, “Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior,” in Proc. 10th Int. Conf. Ubiquitous Comput., 2008, pp. 322–331.

[18] S. Gambs, M.-O. Killijian, and M. N. del P. Cortez, “Next place prediction using mobility Markov chains,” in Proc. 1st Workshop Meas., Privacy, Mobility, 2012, p. 3.

[19] T. V. Le, S. Liu, H. C. Lau, and R. Krishnan, “Predicting bundles of spatial locations from learning revealed preference data,” in Proc. Int. Conf. Auton. Agents Multiagent Syst., 2015, pp. 1121–1129.



[20] A. De Brébisson, É. Simon, A. Auvolat, P. Vincent, and Y. Bengio. (2015). “Artificial neural networks applied to taxi destination prediction.” [Online]. Available: https://arxiv.org/abs/1508.00021

[21] C. Manasseh and R. Sengupta, “Predicting driver destination using machine learning techniques,” in Proc. 16th Int. IEEE Conf. Intell. Transp. Syst., Oct. 2013, pp. 142–147.

[22] V. Costa, T. Fontes, P. M. Costa, and T. G. Dias, “Prediction of journey destination in urban public transport,” in Proc. 17th Portuguese Conf. Artif. Intell., 2015, pp. 169–180.

[23] T. V. Le, S. Liu, and H. C. Lau, “Reinforcement learning framework for modeling spatial sequential decisions under uncertainty,” in Proc. Int. Conf. Auton. Agents Multiagent Syst., 2016, pp. 1449–1450.

[24] C.-H. Wu, J.-M. Ho, and D. T. Lee, “Travel-time prediction with support vector regression,” IEEE Trans. Intell. Transp. Syst., vol. 5, no. 4, pp. 276–281, Dec. 2004.

[25] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statist. Comput., vol. 14, no. 3, pp. 199–222, Aug. 2004.

[26] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for traffic flow prediction: Deep belief networks with multitask learning,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 5, pp. 2191–2201, Oct. 2014.

[27] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Comput., vol. 14, no. 8, pp. 1771–1800, Mar. 2002.

[28] Y. W. Teh and G. E. Hinton, “Rate-coded restricted Boltzmann machines for face recognition,” in Proc. NIPS, 2001, pp. 908–914.

[29] A.-R. Mohamed, G. E. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 1, pp. 14–22, Jan. 2012.

[30] P. Wang, B. Xu, J. Xu, G. Tian, C.-L. Liu, and H. Hao, “Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification,” Neurocomputing, vol. 174, pp. 806–814, Jan. 2016.

[31] T. Mikolov, K. Chen, G. Corrado, and J. Dean. (2013). “Efficient estimation of word representations in vector space.” [Online]. Available: https://arxiv.org/abs/1301.3781

[32] C. Peng, X. Jin, K.-C. Wong, M. Shi, and P. Liò, “Collective human mobility pattern from taxi trips in urban area,” PLoS ONE, vol. 7, no. 4, p. e34487, Apr. 2012.

[33] X. Liu, L. Gong, Y. Liu, and Y. Gong, “Revealing travel patterns and city structure with taxi trip data,” J. Transport Geogr., vol. 43, pp. 78–90, Feb. 2015.

[34] L. A. Zadeh, “Fuzzy sets,” Inf. Control, vol. 8, no. 3, pp. 338–353, Jun. 1965.

[35] Kaggle. Accessed: May 30, 2017. [Online]. Available: https://www.kaggle.com

[36] DataCastle. Accessed: Sep. 30, 2017. [Online]. Available: http://www.dcjingsai.com

[37] N. Gao, L. Gao, Q. Gao, and H. Wang, “An intrusion detection model based on deep belief networks,” in Proc. 2nd Int. Conf. Adv. Cloud Big Data, 2014, pp. 247–252.

[38] N. Ji, J. Zhang, C. Zhang, and L. Wang, “Discriminative restricted Boltzmann machine for invariant pattern recognition with linear transformations,” Pattern Recognit. Lett., vol. 45, pp. 172–180, Aug. 2014.

[39] O. E. David and N. S. Netanyahu, “DeepSign: Deep learning for automatic malware signature generation and classification,” in Proc. IJCNN, 2015, pp. 1–8.

Xiaocai Zhang received the B.E. degree in navigation technology and the M.E. degree in traffic and transport engineering from Dalian Maritime University, Dalian, China, in 2013 and 2016, respectively. He is currently pursuing the Ph.D. degree in data analytics with the Advanced Analytics Institute, University of Technology Sydney, Australia.

His research interests include data mining, machine learning, and intelligent transportation systems.

Zhixun Zhao received the B.E. degree in electronic science and technology from the University of Electronic Science and Technology of China, Chengdu, China, in 2013, and the M.E. degree in microelectronics technology from the National University of Defense Technology, Changsha, China, in 2015. He is currently pursuing the Ph.D. degree in data analytics and bioinformatics with the Advanced Analytics Institute, University of Technology Sydney, Sydney, Australia.

His research interests include data mining, machine learning, and applications in bioinformatics.

Yi Zheng received the B.E. degree in computer science and technology from the University of Electronic Science and Technology of China, Chengdu, China, in 2012. He is currently pursuing the Ph.D. degree in data mining with the Advanced Analytics Institute, University of Technology Sydney, Sydney, Australia.

His research interests include data mining, text mining, and medical data analysis.

Jinyan Li received the bachelor’s degree in science (applied mathematics) from the National University of Defense Technology, China, the master’s degree in engineering (computer engineering) from the Hebei University of Technology, China, and the Ph.D. degree (computer science) from the University of Melbourne, Australia. He joined the University of Technology Sydney, in 2011, after ten years of research and teaching work in Singapore with the Institute for Infocomm Research, Nanyang Technological University, and National University of Singapore. He is currently a Professor of data science with the Advanced Analytics Institute, and a core member at the Centre for Health Technologies, Faculty of Engineering and Information Technology, University of Technology Sydney, Australia. He is also the Bioinformatics Program Leader.

He is widely known for his pioneering and theoretical research work on emerging patterns that has spawned numerous follow-up research interests in data mining, machine learning, and bioinformatics, making an enduring contribution to these fields.
