Sadilek Krumm Far Out AAAI 12


    Far Out: Predicting Long-Term Human Mobility

    Adam Sadilek

Department of Computer Science
University of Rochester
Rochester, NY
[email protected]

John Krumm
Microsoft Research
One Microsoft Way
Redmond, WA
[email protected]

    Abstract

Much work has been done on predicting where one is going to be in the immediate future, typically within the next hour. By contrast, we address the open problem of predicting human mobility far into the future, on a scale of months and years. We propose an efficient nonparametric method that extracts significant and robust patterns in location data, learns their associations with contextual features (such as day of week), and subsequently leverages this information to predict the most likely location at any given time in the future. The entire process is formulated in a principled way as an eigendecomposition problem. Evaluation on a massive dataset with more than 32,000 days' worth of GPS data across 703 diverse subjects shows that our model predicts the correct location with high accuracy, even years into the future. This result opens a number of interesting avenues for future research and applications.

    Introduction

Where are you going to be 285 days from now at 2PM? This work explores how accurately such questions can be answered across a large sample of people. We propose a novel model of long-term human mobility that extracts significant patterns shaping people's lives, and helps us understand large amounts of data by visualizing the patterns in a meaningful way. But perhaps most importantly, we show that our system, Far Out, predicts people's location with high accuracy, even far into the future, up to multiple years.

Such predictions have a number of interesting applications at various scales of the target population size. We will give a few examples here. Focusing on one individual at a time, we can provide better reminders, search results, and advertisements by considering all the locations the person is likely to be close to in the future (e.g., "Need a haircut? In 4 days, you will be within 100 meters of a salon that will have a $5 special at that time."). At the social scale (people you know), we can leverage Far Out's predictions to suggest a convenient place and time for everybody to meet, even when they are dispersed throughout the world. We also envision a peer-to-peer package delivery system, which would rely heavily on a reliable set of exchange locations,

Adam performed this work while at Microsoft Research. Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: This screenshot of our visualization tool shows mobility patterns of one of our subjects living in the Seattle metropolitan area. The colored triangular cells represent a probability distribution of the person's location given an hour of a day and day type.

where people are likely to meet in the future. Far Out can provide these. Finally, at the population scale, Far Out is the first step towards bottom-up modeling of the evolution of an entire metropolitan area. By capturing long-term mobility of individuals, emergent patterns, such as traffic congestion, spread of disease, and demand for electricity or other resources, can be predicted a long time ahead as well. These applications motivate the predictive aspect of Far Out, but as we will see, the patterns it finds are also useful for gaining insight into people's activities and detecting unusual behavior. Researchers have recently argued for a comprehensive scientific approach to urban planning, and long-term modeling and prediction of human mobility is certainly an essential element of such a paradigm (Bettencourt and West 2010).

Techniques that work quite well for short-term prediction, such as hidden Markov models and random walk-based formalisms, are of little help for long-term inference. Both classes of models make strong independence assumptions about the domain, and one often postulates that a person's location at time t only depends on her location at time t−1. Such models give increasingly poor predictions as they are forced to evolve the system further into the future (Musolesi and Mascolo 2009). Although one can improve the performance by conditioning on a larger context


[Figure 2 plot: histogram of log10(Area [km²]) and log10(Distance [km]) on the horizontal axis versus number of subjects on the vertical axis.]

Figure 2: The distribution of the bounding rectangular geographical areas and longest geodesic distances covered by individual subjects.

and structure the models hierarchically, learning and inference quickly become intractable or even infeasible due to computational challenges and lack of training data.

While your location in the distant future is in general highly independent of your recent location, as we will see, it is likely to be a good predictor of your location exactly one week from now. Therefore, we view long-term prediction as a process that identifies strong motifs and regularities in subjects' historical data, models their evolution over time, and estimates future locations by projecting the patterns into the future. Far Out implements all three stages of this process.

    The Data

We evaluate our models on a large dataset consisting of 703 subjects of two types: people (n = 307) and vehicles (n = 396). The people include paid and unpaid volunteers who carried consumer-grade GPS loggers while going about their daily lives. Vehicles consist of commercial shuttles, paratransit vans, and personal vehicles of our volunteers, and had the same GPS unit installed on their dashboard. While some of the shuttles follow a relatively fixed schedule, most of them are available on demand and, along with the paratransit vans, flexibly cover the entire Seattle metropolitan area.

Since this work focuses on long-term prediction, we need to consider only datasets that span extensive time periods, which are rare. The number of contiguous days available to us varies across subjects from 7 to 1247 (μ = 45.9, σ = 117.8). Overall, our dataset contains 32,268 days' worth of location data. Fig. 2 shows the distribution of the area (bounding rectangle) covered by our subjects. We observe high variance in the area across subjects, ranging from 30 to more than 10⁸ km². To put these numbers in perspective, the surface area of the entire earth is 5.2 × 10⁸ km².

    Methodology and Models

Our models leverage Fourier analysis to find significant periodicities in human mobility, and principal component analysis (PCA) to extract strong meaningful patterns from location data, which are subsequently leveraged for prediction.

To enable Fourier analysis, we represent each GPS reading, consisting of a latitude, longitude pair for each time t, as a complex number z_t = latitude_t + (longitude_t)i. This allows us to perform Fourier analysis jointly over both spatial dimensions of the data, thereby extracting significant periods in a principled way. We can map a function f from the time domain to the frequency domain via the discrete Fourier transform (DFT) given by

    Z_k = F_t{z_t}(k) = Σ_{t=0}^{T−1} z_t e^{−(2πkt/T)i}    (1)

where z_0, . . . , z_{T−1} is a sequence of complex numbers representing a subject's location over T seconds. We refer the reader to (Brigham and Morrow 1967) for more details on the DFT.
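The joint transform is easy to reproduce; the sketch below (with an entirely synthetic, hypothetical track, not the paper's data) uses NumPy's FFT on the complex signal z_t = lat_t + lon_t·i and recovers the dominant daily period from its power spectrum:

```python
import numpy as np

# Hypothetical toy track: lat/lon sampled hourly for 4 weeks, with a
# square-wave daily routine plus a little GPS noise.
rng = np.random.default_rng(0)
hours = np.arange(24 * 28)
routine = 0.05 * (hours % 24 > 8)
lat = 47.6 + routine + 0.001 * rng.standard_normal(hours.size)
lon = -122.3 + routine + 0.001 * rng.standard_normal(hours.size)

# Encode both spatial dimensions as one complex signal: z_t = lat_t + lon_t * i
z = lat + 1j * lon

# DFT over the joint signal; peaks in the power spectrum mark significant periods.
Z = np.fft.fft(z - z.mean())             # subtract the mean (DC component) first
power = np.abs(Z) ** 2
freqs = np.fft.fftfreq(z.size, d=1.0)    # cycles per hour

# The strongest frequency corresponds to the dominant period in hours.
k = np.argmax(power)
period_hours = 1.0 / abs(freqs[k])
print(round(period_hours))               # → 24
```

As expected for a daily routine, the dominant period comes out at 24 hours.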

PCA is a dimensionality reduction technique that transforms the original data into a new basis, where the basis vectors are, in turn, aligned with the directions of the highest remaining variance of the data. PCA can be performed by eigendecomposition of the data covariance matrix, or by applying singular value decomposition (SVD) directly to the data matrix. Our implementation uses the latter approach, as it is more numerically stable. PCA has a probabilistic interpretation as a latent variable model, which endows our model with all the practical advantages stemming from this relationship, such as efficient learning and dealing with missing data (Tipping and Bishop 1999). For a thorough treatment of PCA, see (Jolliffe 2002).
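A minimal sketch of the SVD route to PCA (our illustration, not the authors' code): center each column, take the SVD, and read the principal axes off the right singular vectors.

```python
import numpy as np

def pca(D, n_components):
    """PCA via SVD of the centered data matrix (rows = observations)."""
    mu = D.mean(axis=0)
    X = D - mu                           # center each column
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    components = Vt[:n_components]       # principal axes, one per row
    explained = (S ** 2) / (len(D) - 1)  # eigenvalues of the covariance matrix
    return components, explained[:n_components], mu

# Toy data with one dominant direction of variance.
rng = np.random.default_rng(1)
D = rng.standard_normal((100, 5)) @ np.diag([3.0, 1.0, 0.5, 0.1, 0.1])
comps, var, mu = pca(D, 2)
print(comps.shape)                       # → (2, 5)
```

SVD avoids explicitly forming the covariance matrix, which is the numerical-stability advantage the text refers to.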

We consider continuous (GPS coordinates) as well as discrete (occupancy grid) data, and our models work with both modalities without any modification to the mathematics or the algorithms. In both cases we represent each day as a vector of features. In the continuous representation, we have a 56-element vector shown in Fig. 3. The first 24 elements capture the subject's median latitude for each hour of the day, the next 24 elements correspond to the median longitude, the following 7 elements encode the day of week (in 1-out-of-7 binary code, since it is a categorical variable), and the final element is 1 if the day is a national holiday in the subject's current locale (e.g., Christmas, Thanksgiving) and 0 otherwise. This representation helps us capture the dependence between the subject's location and the hour of the day, day of week, and whether or not the day is a holiday. The continuous representation is best suited for predicting a subject's single, approximate location for a given time, possibly for finding nearby people or points of interest. This representation is not probabilistic, unlike the discretized representation we describe next.
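Assembling such a vector could look like the following sketch. The data layout and helper names are our own; note also that the paper handles missing data through PCA's probabilistic interpretation, whereas this toy version simply leaves hours without fixes at zero.

```python
import numpy as np

def day_vector(readings, weekday, holiday):
    """Build the 56-element continuous day vector.

    readings: list of (hour, lat, lon) GPS fixes for one day (illustrative layout).
    """
    lat_med = np.zeros(24)
    lon_med = np.zeros(24)
    for h in range(24):
        fixes = [(la, lo) for (hh, la, lo) in readings if hh == h]
        if fixes:  # hours with no fixes stay 0 in this sketch
            lat_med[h] = np.median([la for la, _ in fixes])
            lon_med[h] = np.median([lo for _, lo in fixes])
    dow = np.zeros(7)
    dow[weekday] = 1.0                   # 1-out-of-7 code for the day of week
    return np.concatenate([lat_med, lon_med, dow, [float(holiday)]])

v = day_vector([(9, 47.62, -122.35), (9, 47.63, -122.34)], weekday=2, holiday=False)
print(v.shape)                           # → (56,)
```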

In the discretized condition, we divide the surface of the globe into equilateral triangular cells of uniform size (side length of 400 meters), and assign each GPS reading to the nearest cell. We then induce an empirical probability distribution over the ten most frequently visited cells and one other location that absorbs all GPS readings outside of the top ten cells. Our analysis shows that the 10 discrete locations capture the vast majority of an individual's mobility, and each such cell can often be semantically labeled as home, work, favorite restaurant, etc.

Fig. 1 shows the occupancy probability distribution over the cells for one of our subjects, given by

    Pr(C = c | T = t, W = w) = count(c, t, w) / Σ_{c′∈C} count(c′, t, w)    (2)

where C, T, and W are random variables representing cells,



Figure 3: Our continuous vector representation of a day d consists of the median latitude and longitude for each hour of the day (00:00 through 23:59), binary encoding of the day of week, and a binary feature signifying whether a national holiday falls on d.


Figure 4: Our cell-based vector representation of a day d encodes the probability distribution over dominant cells conditioned on the time within d, and the same day-of-week and holiday information as the continuous representation (last 8 elements).

time of day, and day type, respectively. C is the set of all cells.

We construct a feature vector for each day from this probability distribution as shown in Fig. 4, where the first 11 elements model the occupancy probability for the 11 discrete places between 00:00 and 00:59 of the day, the next 11 elements capture 01:00 through 01:59, etc. The final 8 elements are identical to those in the continuous representation. The discretized representation sacrifices the potential precision of the continuous representation for a richer representation of uncertainty. It does not constrain the subject's location to a single location or cell, but instead represents the fact that the subject could be in one of several cells, with some uncertainty for each one.
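Building this 272-element vector (24 hours × 11 cells + 8 calendar elements) from per-hour cell counts, per Eq. 2, can be sketched as follows; the `visits` data structure and cell-id convention are our illustration.

```python
from collections import Counter
import numpy as np

def cell_day_vector(visits, weekday, holiday):
    """Build the 272-element cell-based day vector.

    visits: dict mapping hour -> list of cell ids seen that hour; ids 0..9 are
    the ten dominant cells, 10 is the absorbing "other" location (illustrative).
    """
    blocks = []
    for h in range(24):
        counts = Counter(visits.get(h, []))
        total = sum(counts.values())
        p = np.zeros(11)
        for c, n in counts.items():
            p[c] = n / total             # empirical Pr(C = c | T = t, W = w), Eq. 2
        blocks.append(p)                 # hours with no fixes stay all-zero here
    dow = np.zeros(7)
    dow[weekday] = 1.0
    blocks.append(np.concatenate([dow, [float(holiday)]]))
    return np.concatenate(blocks)

v = cell_day_vector({8: [0, 0, 1], 9: [1]}, weekday=0, holiday=False)
print(v.shape)                           # → (272,)
```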

The decision to divide the data into 24-hour segments is not arbitrary. Applying DFT to the raw GPS data as described above shows that most of the energy is concentrated in periods shorter than or equal to 24 hours.

Now we turn our attention to the eigenanalysis of the subjects' location, which provides further insights into the data. Each subject is represented by a matrix D, where each row is a day (either in the continuous or the cell form). Prior to computing PCA, we apply the Mercator cylindrical projection to the GPS data and normalize each column of observations by subtracting out its mean μ and dividing by its standard deviation σ. Normalizing with the mean and standard deviation scales the data so values in each column are in approximately the same range, which in turn prevents any columns from dominating the principal components.

Applying SVD, we effectively find a set of eigenvectors of D's covariance matrix, which we call eigendays (Fig. 5). A few top eigendays with the largest eigenvalues induce a subspace, onto which a day can be projected, and that captures most of the variance in the data. For virtually all subjects, ten eigendays are enough to reconstruct their entire location log with more than 90% accuracy. In other words, we can accurately compress an arbitrary day d into only n ≪ |d| weights w_1, . . . , w_n that induce a weighted sum over a common set of the ten most dominant eigendays E_i:

    d ≈ (Σ_{i=1}^{n} w_i E_i) diag(σ) + μ.    (3)

This applies to both continuous and discretized data. The reason for this is that human mobility is relatively regular, and there is a large amount of redundancy in the raw representation of people's location. Note that unlike most other approaches, such as Markov models, PCA captures long-term correlations in the data. In our case, this means patterns in location over an entire day, as well as joint correlations among additional attributes (day of week, holiday) and the locations.
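The compression and reconstruction of Eq. 3 can be sketched on synthetic low-rank "days" (all data, and the choice of n, are illustrative):

```python
import numpy as np

# Synthetic low-rank data: 50 day vectors mixing 3 latent patterns plus noise.
rng = np.random.default_rng(2)
patterns = rng.standard_normal((3, 56))
D = rng.standard_normal((50, 3)) @ patterns + 0.001 * rng.standard_normal((50, 56))

# Normalize columns, then SVD: the right singular vectors are the eigendays.
mu, sigma = D.mean(axis=0), D.std(axis=0)
X = (D - mu) / sigma
E = np.linalg.svd(X, full_matrices=False)[2]

n = 10                                   # keep the top-n eigendays
W = X @ E[:n].T                          # weights w_1..w_n for every day

# Eq. 3: d ~= (sum_i w_i E_i) diag(sigma) + mu
D_hat = (W @ E[:n]) * sigma + mu
err = np.abs(D_hat - D).max()
print(err < 0.01)                        # → True
```

Because the underlying structure is low-rank, ten eigendays reconstruct every day down to the noise level, mirroring the >90% reconstruction accuracy reported in the text.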

Our eigenanalysis shows that there are strong correlations among a subject's latitudes and longitudes over time, and also correlations between other features, such as the day of week, and raw location. Let's take eigenday #2 (E_2) in Fig. 5 as an example. From the last 8 elements, we see that PCA automatically grouped holidays, weekends, and Tuesdays within this eigenday. The location pattern for days that fit these criteria is shown in the first 48 elements. In particular, E_2 makes it evident that this person spends her evenings and nights (from 16:00 to 24:00) at a particular constant location in the North-West corner of her data, which turns out to be her home.

The last 8 elements of each eigenday can be viewed as indicators that show how strongly the location patterns in the rest of the corresponding eigenday exhibit themselves on a given day-of-week and holiday combination. For instance, E_3 is dominant on Saturdays, E_7 on Fridays, and E_10 on Tuesdays that are not holidays (compare with E_2).

Fig. 6 shows the top ten eigendays for the cell-based representation. Now we see patterns in terms of probability distributions over significant cells. For instance, this subject exhibits a strong baseline behavior (E_1) on all days (and especially nonworking days) except for Tuesdays, which are captured in E_2. Note that the complex patterns in cell occupancy as well as the associated day types can be directly read off the eigendays.

Our eigenday decomposition is also useful for detection of anomalous behavior. Given a set of eigendays and their typical weights computed from training data, we can compute how much a new day deviates from the subspace formed by the historical eigendays. The larger the deviation, the more atypical the day is. We leave this opportunity for future work.
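One plausible realization of this idea (our sketch, since the paper leaves it to future work) scores a day by its residual distance from the subspace spanned by the historical eigendays:

```python
import numpy as np

# Illustrative training set: 30 near-identical "typical" days.
rng = np.random.default_rng(3)
pattern = np.sin(np.linspace(0.0, 2.0 * np.pi, 56))
D = pattern + 0.01 * rng.standard_normal((30, 56))

mu, sigma = D.mean(axis=0), D.std(axis=0)
X = (D - mu) / sigma
E = np.linalg.svd(X, full_matrices=False)[2][:2]     # top-2 historical eigendays

def anomaly_score(day):
    """Distance from the normalized day to the eigenday subspace."""
    x = (day - mu) / sigma
    return np.linalg.norm(x - (x @ E.T) @ E)         # residual after projection

typical = pattern + 0.01 * rng.standard_normal(56)
unusual = pattern[::-1] + 0.01 * rng.standard_normal(56)  # routine reversed in time
print(anomaly_score(unusual) > anomaly_score(typical))    # → True
```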

So far we have been focusing on the descriptive aspect of our models: what types of patterns they extract and how we can interpret them. Now we turn to the predictive power of Far Out.

    Predictive Models

We consider three general types of models for long-term location prediction. Each type works with both continuous (raw GPS) as well as discretized (triangular cells) data, and all our models are directly applied to both types of data without any modification of the learning process. Furthermore, while we experiment with two observed features (day


[Figure 5 panels: color-coded latitude/longitude and calendar patterns for Eigenday #1 through Eigenday #10.]

Figure 5: Visualization of the top ten most dominant eigendays (E_1 through E_10). The leftmost 48 elements of each eigenday correspond to the latitude and longitude over the 24 hours of a day, latitude plotted in the top rows, longitude in the bottom. The next 7 binary slots capture the seven days of a week, and the last element models holidays versus regular days (cf. Fig. 3). The patterns in the GPS as well as the calendar features are color-coded using the mapping shown below each eigenday.

[Figure 6 panels: cell occupancy and calendar patterns for Eigenday #1 through Eigenday #6.]

Figure 6: Visualization of the top six most dominant eigendays (E_1 through E_6). The larger matrix within an eigenday shows cell occupancy patterns over the 24 hours of a day. Patterns in the calendar segment of each eigenday are shown below each matrix (cf. Fig. 4).

of week and holiday), our models can handle an arbitrary number of additional features, such as season, predicted weather, social and political features, known traffic conditions, information extracted from the power spectrum of an individual, and other calendar features (e.g., "Is this the second Thursday of a month?"; "Does a concert or a conference take place?"). In the course of eigendecomposition, Far Out automatically eliminates insignificant and redundant features.

Mean Day Baseline Model For the continuous GPS representation, the baseline model calculates the average latitude and longitude for each hour of day for each day type. In the discrete case, we use the mode of cell IDs instead of the average. To make a prediction for a query with certain observed features o, this model simply retrieves all days that match o from the training data, and outputs their mean or mode. Although simple, this baseline is quite powerful, especially on large datasets such as ours. It virtually eliminates all random noise for repeatedly visited places. Additionally, since the spatial distribution of sporadic and unpredictable trips is largely symmetric over long periods of time, the errors these trips would have caused tend to be averaged out by this model (e.g., a spontaneous trip Seattle-Las Vegas is balanced by an isolated flight Seattle-Alaska).
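For the continuous case, this baseline reduces to a grouped average; a sketch with an illustrative data layout:

```python
from collections import defaultdict
import numpy as np

def fit_baseline(days):
    """Mean-day baseline: average hourly lat/lon per day type.

    days: list of (day_type, 24x2 array of hourly lat/lon) pairs (illustrative).
    """
    buckets = defaultdict(list)
    for day_type, track in days:
        buckets[day_type].append(track)
    return {dt: np.mean(tracks, axis=0) for dt, tracks in buckets.items()}

days = [("Mon", np.full((24, 2), 47.6)), ("Mon", np.full((24, 2), 47.8))]
model = fit_baseline(days)
print(model["Mon"][14])                  # predicted (lat, lon) for 14:00 on a Monday
```

For the discrete case, `np.mean` would be replaced by the mode of the cell IDs in each bucket.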

Projected Eigendays Model First, we learn all principal components (a.k.a. eigendays) from the training data as described above. This results in an n × n matrix P, with eigendays as columns, where n is the dimensionality of the original representation of each day (either 56 or 272).

At testing time, we want to find a fitting vector of weights w, such that the observed part of the query can be represented as a weighted sum of the corresponding elements of the principal components in matrix P. More specifically, this model predicts a subject's location at a particular time t_q in the future by the following process. First, we extract observed features from t_q, such as which day of week t_q corresponds to. The observed feature values are then written into a query vector q. Now we project q onto the eigenday space using only the observed elements of the eigendays. This yields a weight for each eigenday that captures how

    dominant that eigenday is given the observed feature values:

    w = (q − μ) diag(σ⁻¹) P_c    (4)

where q is a row vector of length m (the number of observed elements in the query vector), P_c is an m × c matrix (c is the number of principal components considered), and w is a row vector of length c. Since we implement PCA in the space of normalized variables, we need to normalize the query vector as well. This is achieved by subtracting the mean μ and performing component-wise division by the standard deviation σ of each column.

Note that finding an optimal set of weights can be viewed as solving (for w) a system of linear equations given by

    w P_c^T = (q − μ) diag(σ⁻¹).    (5)

However, under most circumstances, such a system is ill-conditioned, which leads to undesirable numerical sensitivity and subsequently poor results. The system is either over- or under-determined, except when c = m. Furthermore, P_c^T may be singular.

Theorem 1. The projected eigendays model learns weights by performing a least-squares fit.

Proof. If P has linearly independent rows, a generalized inverse (e.g., Moore-Penrose) is given by P⁺ = P^T (P P^T)⁻¹ (Ben-Israel and Greville 2003). In our case, P ∈ R^{m×c} and by definition forms an orthonormal basis. Therefore P P^T is an identity matrix, and it follows that P⁺ = P^T. It is known that the pseudoinverse provides a least-squares solution to a system of linear equations (Penrose 1956). Thus, Equations 4 and 5 are theoretically equivalent, but the former formulation is significantly more elegant, efficient, and numerically stable.

Using Eq. 3, the inferred weights are subsequently used to generate the prediction (either continuous GPS or a probability distribution over cells) for time t_q. Note that both training
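End to end, the prediction step can be sketched on a toy 3-element day vector (two location elements plus one observed calendar bit, a stand-in for the paper's 56/272-element vectors); here we solve the system of Eq. 5 with a pseudoinverse, which Theorem 1 identifies with the projection of Eq. 4:

```python
import numpy as np

# Synthetic training days: location depends on a single "day type" bit.
rng = np.random.default_rng(4)
rows = []
for _ in range(40):
    bit = rng.integers(0, 2)
    loc = (47.6 if bit == 0 else 47.8) + 0.001 * rng.standard_normal(2)
    rows.append(np.array([loc[0], loc[1], float(bit)]))
D = np.vstack(rows)

mu, sigma = D.mean(axis=0), D.std(axis=0)
X = (D - mu) / sigma
E = np.linalg.svd(X, full_matrices=False)[2]         # eigendays as rows

c, obs = 1, [2]                                      # top eigenday; only the bit observed
q = np.array([1.0])                                  # query: a "day type 1" day
qn = (q - mu[obs]) / sigma[obs]                      # normalize the observed elements
w = qn @ np.linalg.pinv(E[:c, obs])                  # least-squares weights (Eq. 5)
pred = (w @ E[:c]) * sigma + mu                      # de-normalize via Eq. 3
print(abs(pred[0] - 47.8) < 0.05)                    # → True
```

Given only the calendar bit, the model recovers the latitude associated with that day type.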


[Figure 7 plot: error per prediction (km) versus the number of principal components (eigendays) used, for the Mean Baseline, Projected Eigendays, and Segregated Eigendays models, each with and without drift adjustment.]

Figure 7: Comparison in terms of absolute prediction error over all subjects as we vary the number of eigendays we leverage.

and testing are efficient (O(cdm), where d is the number of days) and completely nonparametric, which makes Far Out very easy to apply to other domains with different features.

Segregated Eigendays Model While the last two models induced a single set of eigendays, this model learns a separate library of eigendays for each day type, e.g., eigen-holiday-Mondays, over only the location elements of the day vectors d. Prediction is made using Eq. 3, where the weights are proportional to the variance each eigenday explains in the training data.

    Adapting to Pattern Drift

Since our models operate in a space of normalized variables, we can adapt to the drift of mean and variance of each subject's locations, which does occur over extended periods of time. The basic idea is to weight more recent training data more heavily than older data when de-normalizing a prediction (see Eq. 3). We achieve this by imposing a linear decay when learning μ and σ from the training data.
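A sketch of such linearly decayed statistics (the decay schedule below is our illustration; the paper only specifies that the decay is linear):

```python
import numpy as np

def drifted_stats(D):
    """Weighted mu and sigma with linearly decaying weight on older rows."""
    n = len(D)
    w = np.linspace(1.0, 2.0, n)         # rows are chronological; older rows weigh less
    w = w / w.sum()
    mu = w @ D
    var = w @ (D - mu) ** 2
    return mu, np.sqrt(var)

D = np.linspace(0.0, 1.0, 11)[:, None]   # a subject slowly "drifting" over time
mu, sigma = drifted_stats(D)
print(mu[0] > D.mean())                  # → True: mu leans toward recent data
```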

    Experiments and Results

In this section, we evaluate our approach, compare the performance of the proposed models, and discuss insights gained. Unless noted otherwise, for each subject, we always train on the first half of her data (chronologically) and test on the remaining half.

First, let's look at the predictions in the continuous GPS form, where the crucial metric is the median absolute error in distance. Fig. 7 shows the error averaged over all subjects as a function of the number of eigendays leveraged. We show our three model types, both with and without addressing pattern drift. We see that the segregated eigendays model is not significantly better than the baseline. One reason is that it considers each day type in isolation and therefore cannot capture complex motifs spanning multiple days. Additionally, it has to estimate a larger number of parameters than the unified models, which negatively impacts its

[Figure 8 bar chart: average accuracy (%) for Cars (n=301), Pocket Loggers (n=62), Shuttles (n=97), and Paratransit (n=243), comparing the drift-adjusted Projected Eigendays model, the drift-adjusted Most Frequent baseline, and a Random baseline.]

Figure 8: Accuracy of cell-based predictions varies across subject types, but the projected eigendays model outperforms its alternatives by a significant margin.

performance, especially for subjects with smaller amounts of training data.

By considering only the strongest eigendays, we extract the dominant and, in a sense, most dependable patterns, and filter out the volatile, random, and less significant signals. This effect is especially strong in the projected model. Finally, we see that modeling pattern drift systematically reduces the error by approximately 27%.

Now we focus on the evaluation of the same models, but this time they operate on the cell representation. We additionally consider a trivial random baseline that guesses the possible discrete locations uniformly at random. Our eigenday-based models predict based on maximum likelihood:

    ĉ_{t,w} = argmax_c Pr(C = c | T = t, W = w).

For the sake of brevity, we will focus on the projected eigendays model adapted to pattern drift (with results averaged over c, the number of eigendays used), as our evaluation on the cell-based representation yields the same ordering in model quality as in Fig. 7.

In Fig. 8, we see that the eigenday model clearly dominates both baselines, achieving up to 93% accuracy. Personal cars are about as predictable as pocket loggers (84%), and paratransit vans are significantly harder (77%), as they don't have any fixed schedule or circadian rhythms.

Since we evaluate on a dataset that encompasses long periods of time, we have a unique opportunity to explore how the test error varies as we make predictions progressively further into the future and increase the amount of training data. Fig. 9 shows these complex relationships for one of our subjects, with a total of 162 weeks of recorded data. By contrast, virtually all work to date has concentrated on the first column of "pixels" on the left-hand side of the plot. This is the region of short-term predictions, hours or days into the future.

We see that the projected eigenday model systematically outperforms the baseline and produces a low test error for predictions spanning the entire 81-week testing period (cf. Figs. 9a and 9b). In general, as we increase the amount of


[Figure 9 panels: (a) Mean Baseline, (b) PCA Cumulative, (c) PCA Separate.]

Figure 9: How test error varies depending on how far into the future we predict and how much training data we use. Each plot shows the prediction error, in km, as a function of the amount of training data in weeks (vertical axes), and how many weeks into the future the models predict (horizontal axes). Plots (a) and (b) visualize cumulative error, where a pixel with coordinates (x, y) represents the average error over testing weeks 1 through x, when learning on training weeks 1 through y. Plot (c) shows, on a log scale, the error for each pair of weeks separately, where we train only on week y and test on x.

training data, the error decreases, especially for extremely long-term predictions.

Fig. 9c shows that not all weeks are created equal. There are several unusual and therefore difficult weeks (e.g., test week #38), but in general our approach achieves high accuracy even for predictions 80 weeks into the future. Subsequent work can take advantage of the hindsight afforded by Fig. 9, and eliminate anomalous or confusing time periods (e.g., week #30) from the training set.

Finally, decomposition of the prediction error along day types shows that for human subjects, weekends are most difficult to predict, whereas work days are least entropic. While this is to be expected, we notice a more interesting pattern, where the further away a day is from a nonworking day, the more predictable it is. For instance, Wednesdays in a regular week are the easiest, Fridays and Mondays are harder, and weekends are most difficult. This motif is evident across all human subjects and across a number of metrics, including location entropy, KL divergence, and accuracy (cell-based representation), as well as absolute error (continuous data). Shuttles and paratransit exhibit the exact inverse of this pattern.

    Related Work

There is ample previous work on building models of short-term mobility, both individual and aggregate, descriptive as well as predictive. But there is a gap in modeling and predicting long-term mobility, which is our contribution (see Table 1).

Recent research has shown that surprisingly rich models of human behavior can be learned from GPS data alone, for example (Ashbrook and Starner 2003; Liao, Fox, and Kautz 2005; Krumm and Horvitz 2006; Ziebart et al. 2008; Sadilek and Kautz 2010). However, previous work focused on making predictions at fixed, and relatively short, time scales. Consequently, questions such as "Where is Karl going to be in the next hour?" can often be answered with high accuracy. By contrast, this work explores the predictability of people's mobility at various temporal scales, and specifically far into the future. While short-term prediction is often sufficient for routing in wireless networks, one of the major applications of location modeling to date, long-term modeling is crucial in ubiquitous computing, infrastructure planning, traffic prediction, and other areas, as discussed in the introduction.

                 Short Term       Long Term
   Descriptive   Previous work    Previous work
   Predictive    Previous work    Only Far Out
   Unified       Previous work    Only Far Out

Table 1: The context of our contributions.

Much effort on the descriptive models has been motivated by the desire to extract patterns of human mobility, and subsequently leverage them in simulations that accurately mimic observed general statistics of real trajectories (Kim, Kotz, and Kim 2006; Gonzalez, Hidalgo, and Barabasi 2008; Lee et al. 2009; Li et al. 2010; Kim and Kotz 2011). However, all these works focus on aggregate behavior and do not address the problem of location prediction, which is the primary focus of this paper.

Virtually all predictive models published to date have addressed only short-term location prediction. Even works with a specific long-term focus have considered only predictions up to hours into the future (Scellato et al. 2011). Furthermore, each proposed approach has been specifically tailored for either continuous or discrete data, but not both. For example, (Eagle and Pentland 2009) consider only four discrete locations and make predictions up to 12 hours into the future. By contrast, this paper presents a general model for short- as well as long-term (scale of months and years) prediction, capable of working with both types of data representation.

    Jeung et al. (2008) evaluate a hybrid location model that invokes two different prediction algorithms, one for queries that are temporally close, and the other for predictions further into the future. However, their approach requires selecting a large number of parameters and metrics. Additionally, Jeung et al. experiment with mostly synthetic data. By contrast, we present a unified and nonparametric mobility model and evaluate on an extensive dataset recorded entirely by real-world sensors.

    The recent surge of online social networks sparked interest in predicting people's location from their online behavior and interactions (Cho, Myers, and Leskovec 2011; Sadilek, Kautz, and Bigham 2012). However, unlike our work, they address short-term prediction on very sparsely sampled location data, where user location is recorded only when she posts a status update.

    In the realm of long-term prediction, (Krumm and Brush 2011) model the probability of being at home at any given hour of a day. We focus on capturing long-term correlations and patterns in the data, and our models handle a large (or even unbounded, in our continuous representation) number of places, not just one's home.

    Conclusions and Future Work

    This work is the first to take on understanding and predicting long-term human mobility in a unified way. We show that it is possible to predict the location of hundreds of diverse subjects even years into the future, and with high accuracy. We propose and evaluate an efficient and nonparametric model based on eigenanalysis, and demonstrate that it systematically outperforms other strong candidates. Since our model operates over continuous, discrete, and probabilistic data representations, it is quite versatile. Additionally, it has high predictive as well as descriptive power, since the eigendays capture meaningful patterns in subjects' lives. As our final contribution, we analyze the difficulty of location prediction on a continuum from short- to long-term, and show that Far Out's performance is not significantly affected by the temporal distance.
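    To make the eigenday idea concrete, here is a minimal sketch of extracting dominant daily patterns from a days-by-features matrix via SVD, as in standard PCA. This is an illustration, not the authors' exact pipeline; the toy data, the 5-cell world, and the choice of k = 10 components are assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: 100 days, each described by 24 hourly location-cell IDs,
    # one-hot encoded over 5 cells -> a 120-dimensional binary vector per day.
    n_days, n_hours, n_cells = 100, 24, 5
    day_cells = rng.integers(0, n_cells, size=(n_days, n_hours))
    X = np.zeros((n_days, n_hours * n_cells))
    for d in range(n_days):
        for h in range(n_hours):
            X[d, h * n_cells + day_cells[d, h]] = 1.0

    # Mean-center and take the SVD; the top right singular vectors are
    # the "eigendays" -- the dominant daily patterns in the data.
    mu = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    k = 10
    eigendays = Vt[:k]             # shape (k, 120)

    # Any day can be summarized by its k projection weights, and
    # approximately reconstructed from them.
    w = (X[0] - mu) @ eigendays.T  # low-dimensional code for day 0
    recon = mu + w @ eigendays     # approximation of day 0
    ```

    Because the eigendays form an orthonormal basis of the top principal subspace, reconstructing a day from its weights can never be worse than predicting the mean day alone.
    
    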

    The cell-based modeling is especially amenable to improvements in future work. Namely, since frequently visited cells have a semantic significance, our probabilistic interpretation can be combined in a Bayesian framework with prior probabilities from large-scale surveys¹ and additional constraints, such as physical limits on mobility, where candidate future locations are strictly constrained by one's current position along with the means of transportation available. Finally, it would be interesting to generalize the eigenday approach with a hierarchy of nested eigen-periods, where each level captures only the longer patterns the previous one couldn't (e.g., eigendays → eigenweeks → eigenmonths → ...).

    Acknowledgements

    We thank Krystof Hoder, Ashish Kapoor, Tivadar Papai, and the anonymous reviewers for their helpful comments.

    References

    Ashbrook, D., and Starner, T. 2003. Using GPS to learn significant locations and predict movement across multiple users. Personal and Ubiquitous Computing 7:275–286.

    Ben-Israel, A., and Greville, T. 2003. Generalized inverses: theory and applications, volume 15. Springer Verlag.

    Bettencourt, L., and West, G. 2010. A unified theory of urban living. Nature 467(7318):912–913.

    Brigham, E., and Morrow, R. 1967. The fast Fourier transform. IEEE Spectrum 4(12):63–70.

    Cho, E.; Myers, S. A.; and Leskovec, J. 2011. Friendship and mobility: User movement in location-based social networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).

    Eagle, N., and Pentland, A. 2009. Eigenbehaviors: Identifying structure in routine. Behavioral Ecology and Sociobiology 63(7):1057–1066.

    ¹ E.g., the American Time Use Survey and the National Household Travel Survey.

    Gonzalez, M.; Hidalgo, C.; and Barabasi, A. 2008. Understanding individual human mobility patterns. Nature 453(7196):779–782.

    Jeung, H.; Liu, Q.; Shen, H.; and Zhou, X. 2008. A hybrid prediction model for moving objects. In Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on, 70–79. IEEE.

    Jolliffe, I. 2002. Principal component analysis. Encyclopedia of Statistics in Behavioral Science.

    Kim, M., and Kotz, D. 2011. Identifying unusual days. Journal of Computing Science and Engineering 5(1):71–84.

    Kim, M.; Kotz, D.; and Kim, S. 2006. Extracting a mobility model from real user traces. In Proc. IEEE Infocom, 1–13. Citeseer.

    Krumm, J., and Brush, A. 2011. Learning time-based presence probabilities. Pervasive Computing 79–96.

    Krumm, J., and Horvitz, E. 2006. Predestination: Inferring destinations from partial trajectories. UbiComp 2006: Ubiquitous Computing 243–260.

    Lee, K.; Hong, S.; Kim, S.; Rhee, I.; and Chong, S. 2009. SLAW: A new mobility model for human walks. In INFOCOM 2009, 855–863. IEEE.

    Li, Z.; Ding, B.; Han, J.; Kays, R.; and Nye, P. 2010. Mining periodic behaviors for moving objects. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1099–1108. ACM.

    Liao, L.; Fox, D.; and Kautz, H. 2005. Location-based activity recognition using relational Markov networks. In IJCAI.

    Musolesi, M., and Mascolo, C. 2009. Mobility models for systems evaluation: a survey.

    Penrose, R. 1956. On best approximate solutions of linear matrix equations. In Proceedings of the Cambridge Philosophical Society, volume 52, 17–19. Cambridge Univ. Press.

    Sadilek, A., and Kautz, H. 2010. Recognizing multi-agent activities from GPS data. In Twenty-Fourth AAAI Conference on Artificial Intelligence.

    Sadilek, A.; Kautz, H.; and Bigham, J. P. 2012. Finding your friends and following them to where you are. In Fifth ACM International Conference on Web Search and Data Mining. (Best Paper Award.)

    Scellato, S.; Musolesi, M.; Mascolo, C.; Latora, V.; and Campbell, A. 2011. NextPlace: A spatio-temporal prediction framework for pervasive systems. Pervasive Computing 152–169.

    Tipping, M., and Bishop, C. 1999. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, Statistical Methodology 611–622.

    Ziebart, B.; Maas, A.; Dey, A.; and Bagnell, J. 2008. Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior. In Proceedings of the 10th International Conference on Ubiquitous Computing, 322–331. ACM.