International Journal of Geographical Information Science

ISSN: 1365-8816 (Print) 1362-3087 (Online) Journal homepage: http://www.tandfonline.com/loi/tgis20

Coupling mobile phone and social media data: a new approach to understanding urban functions and diurnal patterns

Wei Tu, Jinzhou Cao, Yang Yue, Shih-Lung Shaw, Meng Zhou, Zhensheng Wang, Xiaomeng Chang, Yang Xu & Qingquan Li

To cite this article: Wei Tu, Jinzhou Cao, Yang Yue, Shih-Lung Shaw, Meng Zhou, Zhensheng Wang, Xiaomeng Chang, Yang Xu & Qingquan Li (2017): Coupling mobile phone and social media data: a new approach to understanding urban functions and diurnal patterns, International Journal of Geographical Information Science, DOI: 10.1080/13658816.2017.1356464

To link to this article: http://dx.doi.org/10.1080/13658816.2017.1356464

Published online: 31 Jul 2017.

Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=tgis20


ARTICLE

Coupling mobile phone and social media data: a new approach to understanding urban functions and diurnal patterns

Wei Tu (a, c), Jinzhou Cao (b), Yang Yue (a, c), Shih-Lung Shaw (d), Meng Zhou (a, c, e), Zhensheng Wang (a, c), Xiaomeng Chang (a, c), Yang Xu (f) and Qingquan Li (a, b, c)

(a) Shenzhen Key Laboratory of Spatial Information Smart Sensing and Services, School of Architecture and Urban Planning & Research Institute for Smart Cities, Shenzhen University, Shenzhen, China
(b) State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China
(c) Key Laboratory for Geo-Environmental Monitoring of Coastal Zone of the National Administration of Surveying, Mapping and GeoInformation, Shenzhen University, Shenzhen, China
(d) Department of Geography, University of Tennessee, Knoxville, TN, USA
(e) Department of Geography, Hong Kong Baptist University, Hong Kong, China
(f) Senseable City Laboratory, SMART Centre, Singapore, Singapore

ABSTRACT
Understanding urban functions and their relationships with human activities has great implications for smart and sustainable urban development. In this study, we present a novel approach to uncovering urban functions by aggregating human activities inferred from mobile phone positioning and social media data. First, the homes and workplaces (of travelers) are estimated from mobile phone positioning data to annotate the activities conducted at these locations. The remaining activities (such as shopping, schooling, transportation, recreation and entertainment) are labeled using a hidden Markov model with social knowledge learned from social media check-in data over a lengthy period. By aggregating identified human activities, hourly urban functions are inferred, and the diurnal dynamics of those functions are revealed. An empirical analysis was conducted for the case of Shenzhen, China. The results indicate that the proposed approach can capture citywide dynamics of both human activities and urban functions. It also suggests that although many urban areas have been officially labeled with a single land-use type, they may provide different functions over time depending on the types and range of human activities. The study demonstrates that combining different data on human activities could yield an improved understanding of urban functions, which would benefit short-term urban decision-making and long-term urban policy making.

ARTICLE HISTORY
Received 2 January 2017
Accepted 13 July 2017

KEYWORDS
Urban function; human activity; mobile phone position data; social media data; data fusion

CONTACT Yang Yue [email protected]

INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE, 2017 https://doi.org/10.1080/13658816.2017.1356464

© 2017 Informa UK Limited, trading as Taylor & Francis Group

1. Introduction

Urban spaces, where citizens live, move and engage in different activities, are socialized and dynamic. Urban functions are defined as the recognized human uses of urban space, such as residential areas, forests and commercial zones. These functions are important metrics for urban planning and management (Rodrigue et al. 2013). However, urban functions are recognized as macro-static and micro-dynamic (Ratti et al. 2006, Batty et al. 2012, Zhong et al. 2014). From a long-term perspective, urban land is primarily designed to support one type of human activity, e.g. working, shopping, recreation etc. (Liu et al. 2012, Crooks et al. 2015). In a short-term view, urban land parcels may provide different functions hour by hour, as they actually serve a variety of human activities through the day (Tu et al. 2016a). For example, East Asian cities have mixed residential–commercial areas used for housing in the evening but for business during the day. Understanding the detailed diurnal dynamics of these urban functions benefits many urban applications, such as managing traffic congestion, improving public services and promoting smart urban planning (Ahas et al. 2015).

Numerous approaches have been developed to monitor urban land use over long time periods. Field observations and interview questionnaires can produce land-use maps, but they are costly and time consuming (Jiang et al. 2012). Remote sensing is an alternative methodology for capturing the physical characteristics of land use (Gong and Howarth 1990, Hu and Wang 2013). It relies on the image classification process. Although periodic satellite images, such as those from Landsat and SPOT, have been used to monitor land-use change for some time, some challenges remain. One is the lack of social characteristics of urban land (Pei et al. 2014). Commercial zones, educational areas and recreational locations cannot be easily classified from satellite images without additional geographic information. Another challenge is presented by the diurnal dynamic of urban functions. Although many outcomes have been reached by inferring long-term urban land use, it is necessary to investigate urban land’s function and diurnal dynamic to more deeply understand daily urban issues. The main challenge is to acquire urban function snapshots that have a high temporal resolution.

In the era of big data, massive human-tracking data that record individuals’ positions and times are available (Yue et al. 2014, An et al. 2015, Li 2017). These include data from mobile phones (Cao et al. 2016, Yue et al. 2016, Cao et al. 2017), social media (Longley et al. 2015), vehicle GPS (Tang et al. 2016, Tu et al. 2016b, Zhou et al. 2017) etc. These data contain valuable knowledge about the human use of urban space and therefore provide an alternative approach through which urban function can be inferred. For example, taking mobile phone data as a proxy of human activities, Pei et al. (2014) developed a clustering method to classify urban areas by residential, business, commercial, open space and other use. Using social media, trajectory and traffic data, Crooks et al. (2015) presented a bottom-up approach to infer the functions of buildings, streets and neighborhoods. These advanced studies support the feasibility of uncovering urban function through (big) data but leave a gap between massive human-tracking data and urban diurnal dynamics.

Each type of human-tracking data has its own shortcomings. Mobile phone data suffer from the lack of semantic information (Calabrese et al. 2014). Social media data are sparse in space and time (Huang and Wong 2016). Alone, neither of them is able to provide high temporal resolution for urban functions. To accomplish this, multisourced human-tracking data should be fused (Steiger et al. 2015). The aim of this article is to reveal urban functions and their diurnal dynamics by combining mobile phone positioning data with social media check-in data. Unlike the indirect land-use inference approach, this method follows a ‘data-activity-function’ stream to investigate temporally detailed urban functions from the people who define urban spaces by their daily activities. Citywide human activities are first extracted by integrating mobile phone positioning data with social media check-in data. Homes and workplaces are extracted from mobile phone positioning data by simple spatial–temporal rules used to recognize in-home and working activities. Other activities (such as shopping, schooling, transportation, recreation and entertainment) are annotated with an improved hidden Markov model (HMM) that uses social knowledge learned from long-term social media check-in data. By aggregating identified human activities, hourly urban functions are inferred and the diurnal dynamics of those functions are revealed. An experiment in Shenzhen, China, suggests that many urban cells provide different functions at different hours in a day despite their official designation under one type of land use.

The main contributions of this article lie in two aspects. First, a big data-driven framework is developed for understanding urban dynamics, which bridges the gap between raw tracking data, human activities and the diurnal patterns of urban function. The article also sheds light on the data fusion of multisourced human-tracking data for human and urban studies. Second, urban functions and their diurnal dynamics as defined by human activities are observed for the first time. The results demonstrate that urban functions are not only spatially distributed but also change hour by hour. This dynamic knowledge about a city will benefit urban decision-making processes, such as traffic congestion management, smart urban planning and urban governance.

The remainder of this paper is organized as follows: the next section reviews the related literature in the domain of mobile phone data analysis, social media data analysis and urban dynamics. Section 3 introduces the fundamental design, including the study area and dataset used. Section 4 describes the proposed data fusion approach. Section 5 reports the experiment and the analysis of the results. Finally, we conclude the paper in Section 6.

2. Literature review

This section reviews related studies on mobile phone data analysis, social media data analysis and urban dynamics.

2.1. Mobile phone data analysis

Mobile phone data are generated when phones connect to mobile communication networks. Because of the high penetration rate and carry-on usage of mobile devices, mobile phone data are of great value to human and urban research domains (Chirag and Storpera 2015), such as human mobility (Gonzalez et al. 2008, Song et al. 2010, Calabrese et al. 2013, Chen et al. 2016), mobility landscape (Ratti et al. 2006) and urban spatial structure (Louail et al. 2014).

One strand of research using mobile phone data focuses on human activity (Xu et al. 2016). As mobile phones are always at hand, mobile phone data are recognized as good proxies of people’s activities (Tranos and Nijkamp 2015). The position and time of implied human activities can be found through massive mobile phone data. Using call detail records (CDRs), Yuan et al. (2012) extrapolated travel activities and examined activity space in Harbin, China. Xu et al. (2016) identified anchor points (i.e. homes) from mobile phone positioning data and then quantified home-based mobility patterns. These studies yield insights into important aspects of mobile phone data-driven human activities.

However, the absence of semantic content in mobile phone data hampers deeper analysis. Information on types of activities cannot be directly obtained from mobile phone data because of their low spatial resolution (usually more than 200 m). Understanding citywide human activities and the urban dynamics behind them faces great challenges. Although many approaches have been developed for trajectory semantic enrichment, e.g. Bayes activity inferences (Gong et al. 2016) and transportation segmentation (Shin et al. 2015), labeling activity information using mobile phone data is not easy. Recently, combining mobile phone data with urban land-use data, Widhalm et al. (2015) developed a probability approach for extracting daily activities (including in-home activities, working, shopping and leisure) from mobile phone data and analyzed citywide activity patterns. Fusing mobile phone data with data from other sources is another promising approach. Diao et al. (2016) integrated 4 months of mobile phone data and household travel surveys to infer activity information to support a longitudinal investigation of individual activities in Boston. These studies indicate the promise of mobile phone data for collecting activities citywide.

Another research stream focuses on urban land use. Recognizing the similarity of calling patterns in one type of land use, Pei et al. (2014) developed a semi-supervised fuzzy c-means clustering method to classify urban land use and achieved a detection rate of 58.03%. Using time series phone call records, Lenormand et al. (2015) presented a functional network approach to automatically detect four types of land use (residential, business, logistics/industrial and nightlife). These studies rely on mobile phone data over a lengthy period to measure the similarity of urban space and thus produce a static land-use map. However, a direct linkage between mobile phone data, human activity and urban function dynamics is required. Unlike Diao et al. (2016), in this paper, we fused mobile phone positioning data (rather than calls, messaging and web-browsing events) with social media check-in data to infer citywide human activities. Social activity knowledge was transferred from social media check-in data to potential human activities to enrich semantic information regarding activities. Furthermore, the diurnal patterns of urban functions portrayed by the obtained citywide human activities were explored.

2.2. Social media data analysis

Social media data are generated when people post, comment or check-in on social networking websites, such as Twitter, Foursquare and Weibo (Steiger et al. 2015). Social media data contain a great deal of semantic information, including texts, pictures, voices, check-in records etc. In particular, a large amount of social media data are geo-referenced using location-awareness technologies (e.g. GPS, WiFi localization, Bluetooth localization etc.) and therefore provide important opportunities to research society, demography and cities (Wu et al. 2014, Croitoru et al. 2015, Crooks et al. 2015, Longley et al. 2015, Tsou 2015, Zhou et al. 2015). However, biases exist in social media data. For example, the penetration and usage of social media are different for various social media user groups. By using well-designed data filters, long-term social media data can reduce this disadvantage (Longley et al. 2015). Social media data are still able to capture aggregated patterns of human activities, especially social activities (Huang and Wong 2016).

Social media data support studies on human social activity. As human behaviors have their own consistency (Gonzalez et al. 2008), meaningful places (e.g. office, shopping places and tourism destinations) can be mined from long-term social media data. Combining individual Twitter data and land-use data, Huang et al. (2014) developed spatial–temporal clustering methods to infer the locations most visited by individuals on the city-block level and then labeled the daily human activities. Furthermore, based on the home locations, Huang and Wong (2016) inferred the in-home and work activities of social media users. They further linked socioeconomic status and Twitter user groups via urban space by fusing American Community Survey data and Twitter data. Using the 15 million check-in records collected over 1 year in Shanghai, China, Wu et al. (2014) extracted citywide transition activities (or travel demands) and investigated the law of related human movements. These studies verified the valuable knowledge about human social activities behind massive social media data.

Social media data also have the potential to infer urban land use (Crooks et al. 2015). By considering the similarity of tweeting activities in urban regions, Frías-Martínez and Frías-Martínez (2014) developed an unsupervised learning method to automatically differentiate land uses (including business, residential, night life, leisure, weekend and industrial) in urban areas. Zhan et al. (2014) presented an unsupervised clustering method and a supervised learning method to infer urban land use by utilizing a large amount of social media check-in data. Zhou and Zhang (2016) mined Twitter and Foursquare data to extract six types of human activities and detected activity hotspots and their citywide dynamics. These achievements reveal the hidden linkage between human activity patterns and underlying urban form (Crooks et al. 2015). They also provide an alternative approach to capture static land use. However, dynamic patterns of urban functions should be further investigated. Furthermore, by combining social media data with other source data, such as data from geographic information system (GIS), mobile phones and travel surveys, more detailed urban function information can be mined from social media content, which can facilitate human and urban research (Steiger et al. 2015).

2.3. Urban dynamics

Urban dynamics refers to human movements and activities over space and time, as reflected in spatial interactions and changes in urban spatial structure through time (Batty 2009, 2010, Grinberger and Shoval 2015). Human activities play an important role in the manifestation of urban dynamics. Recently, the proliferation of spatial–temporal data through technology such as mobile phone data and social media data has opened a new horizon in understanding human movements and activities in space and time.

Spatial dynamics of human activity and land use in cities have been studied (Tranos and Nijkamp 2015). Using long-term CDRs, spatial variations in the intensity of collective human activities have been observed (Sagl et al. 2014). Reades et al. (2009) proposed an eigen-decomposition method to identify recurring patterns of mobile phone usage and then relate them to residential and business areas. From the temporal view, the MIT SENSEable City Lab portrayed the urban mobility landscape using massive anonymous mobile phone data (Ratti et al. 2006). The intensity of human activities and their changes across space and time were explored. Zhong et al. (2014) used 3-year smart card data to reveal the yearly dynamic in the spatial structure of city hubs, centers and borders in Singapore. These studies address the feasibility of the discovery of urban function dynamics but fail to fill a diurnal pattern gap.


Recently, Ahas et al. (2015) detected spatial and temporal differences in everyday activities in cities. Following the aggregated rhythm, social time (the time use difference) from human activities was delineated instead of the standard solar time. Their results suggest global temporal dynamics in people’s daily lives that cannot be ignored. They also imply that urban lands have different functions at different hours. In this study, we move forward to investigate urban functions and their diurnal dynamics via collective human activities by coupling mobile phone data and social media data. This will enhance our understanding of human activities in the city and the associated urban functions and diurnal patterns.

3. Study area and dataset

3.1. Study area

Shenzhen is China’s first special economic zone with a resident population of 12 million and a mobile population of 4 million as of 2012. Its total area covers 1996 square kilometers (Shenzhen Statistical Yearbook 2013). It has six administrative districts, shown in Figure 1. South Shenzhen, adjacent to Hong Kong, contains the downtown (Futian and Luohu) and high-technology zones (Nanshan). North Shenzhen, including Baoan and northern Longgang, is a developing area with factories, lakes and farms. East Shenzhen, including Yantian and eastern Longgang, has an international port and a nature reserve area with country parks and beaches.

3.2. Dataset

A mobile phone positioning dataset is provided by a major Shenzhen cellular operator that has approximately 5689 cell towers in the city. The dataset contains the positions of 9.2 million phone users (approximately 57.5% of the total population) for a workday in March 2012. The positions of mobile phone users were recorded at half-hour intervals; thus, there are 48 records for each person. Each record has four fields, including a user ID ($i$), a time stamp ($t$), longitude ($x_{it}$) and latitude ($y_{it}$). Spatial resolution is restricted to the cell tower level, which is approximately 100–500 m. Table 1 shows an example of an individual’s records. One row is a positioning record, and no semantic information can be directly obtained. Figure 2(a) illustrates the distribution of records. Following Xu et al. (2016), we divided the study area into a 500-m grid. Ultimately, 2498 grid cells (624.5 km²) contain at least one cell tower, covering 73.3% of the built-up area (851 km²) (China City Statistical Yearbook 2013).

Figure 1. Shenzhen and its six administrative districts.
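The gridding step can be illustrated with a minimal sketch. It assumes a simple equirectangular approximation around a hypothetical south-west origin (`ORIGIN_LON`, `ORIGIN_LAT`); the paper does not state the projection or the grid origin, so these values and the function names are illustrative only.

```python
import math

CELL_SIZE_M = 500.0                      # grid size stated in the text
ORIGIN_LON, ORIGIN_LAT = 113.75, 22.40   # hypothetical grid origin, not from the paper
EARTH_RADIUS_M = 6_371_000.0


def cell_index(lon, lat):
    """Return (col, row) of the 500-m cell containing a point.

    Uses an equirectangular approximation, which is an assumption made
    here for illustration; a projected coordinate system would normally
    be used for a city-scale grid.
    """
    # Metres east and north of the assumed origin.
    x = math.radians(lon - ORIGIN_LON) * EARTH_RADIUS_M * math.cos(math.radians(ORIGIN_LAT))
    y = math.radians(lat - ORIGIN_LAT) * EARTH_RADIUS_M
    return (math.floor(x / CELL_SIZE_M), math.floor(y / CELL_SIZE_M))


def covered_area_km2(n_cells):
    """Area covered by n_cells square cells of CELL_SIZE_M metres."""
    return n_cells * (CELL_SIZE_M / 1000.0) ** 2
```

With 500-m cells, each cell covers 0.25 km², so `covered_area_km2(2498)` reproduces the 624.5 km² reported above.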

The social media dataset in this study includes check-in records reporting people’s activities at points of interest (POIs) with a time stamp (Table 2). It provides semantically rich information about human activities in the city, as the POI suggests what people do in a particular place. The dataset was crawled from Weibo, the largest microblog service provider in China with an open application programming interface (API), and covered the year 2013. Similar to Longley et al. (2015), three data filters were applied to avoid shortcomings in the social media check-in data. (1) Users with more than 1000 records in a year are not used. (2) Users with fewer than three records in a year are omitted. (3) Only pairs of consecutive check-in records with an interval of 1–12 h were used. Finally, 5,680,724 check-in records from approximately 520,000 Sina Weibo users were collected and stored in a geodatabase for further processing. Figure 2(b) displays the distribution of Sina Weibo check-in records. It should be noted that we aggregated this dataset to the same grids used for mobile phone positioning data, which suggests that the data have a similar spatial distribution; the densities of mobile phone data and social check-in records are generally higher in the southern part of Shenzhen.

Table 1. Examples of mobile phone positioning data (the position is marked for privacy).

| User ID | Longitude | Latitude | Time (hh:mm:ss) |
| --- | --- | --- | --- |
| 110103203112413 | 113.93* | 22.52* | 07:25:00 |
| 110103203112413 | 113.88* | 22.57* | 07:55:00 |
| 110103203112413 | 113.88* | 22.57* | 08:26:00 |
| 110103203112413 | 113.88* | 22.57* | 08:56:00 |
| – | – | – | – |
| 110103203112413 | 113.93* | 22.52* | 23:28:00 |

Figure 2. Distribution of human-tracking data. (a) Mobile phone positioning data and (b) Sina Weibo check-in data.

Table 2. Examples of social media check-in data (the position is marked for privacy).

| User ID | Longitude | Latitude | Time | Check-in places |
| --- | --- | --- | --- | --- |
| 75500001 | 113.93* | 22.52* | 28/3/2013 09:30:00 | Vanke city park |
| 7550042 | 113.78* | 22.43* | 15/4/2013 16:55:00 | Tencent |
| – | – | – | – | – |
| 7557746 | 113.73* | 22.54* | 22/11/2013 23:18:00 | Queen club |
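The three check-in filters described for the Weibo data can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the record layout `(user_id, timestamp, poi)`, the function name `filter_checkins` and the inclusive treatment of the 1 h and 12 h bounds are all choices made here.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def filter_checkins(checkins, min_records=3, max_records=1000,
                    min_gap=timedelta(hours=1), max_gap=timedelta(hours=12)):
    """Apply the three filters and return consecutive check-in pairs.

    checkins: iterable of (user_id, timestamp, poi) tuples (assumed layout).
    Returns a list of (user_id, (t0, poi0), (t1, poi1)).
    """
    by_user = defaultdict(list)
    for user_id, ts, poi in checkins:
        by_user[user_id].append((ts, poi))

    pairs = []
    for user_id, recs in by_user.items():
        # Filters (1) and (2): drop overly active and nearly inactive users.
        if not (min_records <= len(recs) <= max_records):
            continue
        recs.sort(key=lambda r: r[0])
        # Filter (3): keep consecutive pairs whose interval is 1-12 h
        # (bounds taken as inclusive, which is an assumption).
        for (t0, p0), (t1, p1) in zip(recs, recs[1:]):
            if min_gap <= t1 - t0 <= max_gap:
                pairs.append((user_id, (t0, p0), (t1, p1)))
    return pairs
```

Returning pairs rather than single records keeps the output directly usable for estimating activity transitions later on.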

In addition to the abovementioned data, additional GIS datasets, including locationsof cell towers, spatial data about water and forests etc., are also used in this study.

4. Methodology

The presented approach infers urban functions and their diurnal dynamics by fusing massive mobile phone positioning data and social media check-in data. Following a ‘data → activity → function’ logic, citywide human activities are first extracted, and urban functions are then discovered. Before the introduction of the detailed methodology, some useful definitions are given.

Definition 1: Activities denote the common activities associated with any person in a day (Zhong et al. 2014), such as housekeeping, working and other social actions to meet human needs. To define an activity $a$, a six-tuple is necessary, comprising user ID ($i$), start time ($t^{s}_{i}$), duration ($t^{d}_{i}$), longitude ($x_i$), latitude ($y_i$) and activity type ($a_i$).

To filter out meaningless activities, the duration must reach a threshold, $t^{d}_{\min}$. In this study, we set the threshold to 1 h because the time granularity of mobile phone positioning data is 30 min. In other words, at least three sequential mobile phone records are needed to verify a possible activity in a place. This is reasonable because the durations of common human activities (e.g. in-home, working, schooling etc.) are more than 1 h.

Definition 2: Social activities refer to nonhome and nonwork activities for human needs. In this research examining social media check-in data, five daily social activities are considered, i.e. $A_S = \{a_{\text{transportation}}, a_{\text{schooling}}, a_{\text{shopping}}, a_{\text{recreation}}, a_{\text{entertainment}}\}$, where $A_S$ denotes the set of social activities.

Here, we must note that transportation activity does not refer to travel within the city but rather to air travel, railway travel and long-distance travel by bus. Recreation denotes leisure activities in outdoor places, such as parks and beaches. Entertainment refers to leisure in indoor spaces, e.g. game centers, bars etc. We label social activity from mobile phone positioning data with the knowledge discovered from social media check-in data.

Definition 3: Urban functions denote people’s common usage of urban space, which include residential, working, commercial or educational functions. Generally, urban functions are labeled with the dominant human activities in the geographical space.

A data fusion framework is presented to discover urban functions by extracting citywide human activities and labeling functions with them, as Figure 3 illustrates. Potential human activities are first detected from highly penetrated time-sequential mobile phone positioning data. In-home and working activities are recognized using temporal rules. Then, by incorporating learned knowledge from long-term social media check-in data, the remaining social activities are labeled with an improved HMM. After that, a geodatabase containing citywide human activities is built. Finally, urban functions are inferred from aggregated local human activities, and the diurnal dynamics are analyzed using hourly urban functions.
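The final aggregation step, together with Definition 3, can be illustrated with a minimal sketch that labels each grid cell and hour with its most frequent activity type. Taking "dominant" to mean "most frequent among activities present in that hour" is an assumption made here; the record layout and the name `hourly_functions` are also hypothetical.

```python
from collections import Counter, defaultdict


def hourly_functions(activities):
    """Label each (cell, hour) with its most frequent activity type.

    activities: iterable of (cell_id, start_min, duration_min, activity_type),
    with times in minutes since midnight (assumed layout).
    Returns {(cell_id, hour): activity_type}.
    """
    counts = defaultdict(Counter)
    for cell, start, duration, kind in activities:
        end = start + duration
        last_hour = min(24, -(-end // 60))   # ceiling division, capped at 24
        # Count the activity once in every hour it overlaps.
        for hour in range(start // 60, last_hour):
            counts[(cell, hour)][kind] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}
```

A cell that is mostly used for in-home activities at night and for working during the day would therefore receive different labels at 03:00 and 10:00, which is the diurnal behavior the framework is designed to expose.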

4.1. Activity detection

Activity detection extracts potential activities from massive time-sequential mobile phone positioning data. Mobile phone records of an individual are first sorted by time and connected as a spatial–temporal trajectory (Figure 4(a)). Then, if two consecutive records are at the same location, in other words, the person does not move, a potential activity is found, such as p1–p2, p6–p7, p9–p11 in Figure 4(b).

Spatial uncertainty exists in mobile phone positioning data because of the low spatial accuracy of cell-tower-based location technology. Consecutive records will jump between adjacent cell towers (p3 in Figure 4(a–c)). Hence, potential activities could be fragmented and shortened. To overcome this shortcoming, a threshold d is used to filter false moves: if the distance from the current point to the previous location is less than d, the move can be omitted and the current point can be merged into previous potential points of activity. As Figure 4(c) shows, the record p3 can be merged into previous potential activity (p1–p2). After processing all mobile phone positioning data person by person, a citywide database of potential human activities without type information is constructed.
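The detection and merging logic described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: coordinates are assumed to be already projected to metres, the value `d=500.0` is a placeholder (the threshold value is not given in this section), and distance is measured to the first record of the current stay rather than to the immediately preceding record, a choice made here to avoid gradual drift.

```python
import math


def detect_stays(records, d=500.0, min_duration_min=60):
    """Extract potential activities (stays) from one person's records.

    records: time-sorted list of (minutes_since_midnight, x_m, y_m).
    Returns a list of (start_min, duration_min, x_m, y_m).
    """
    stays, group = [], []
    # The trailing None is a sentinel that closes the last open group.
    for rec in list(records) + [None]:
        if rec is not None and group and \
                math.hypot(rec[1] - group[0][1], rec[2] - group[0][2]) < d:
            # Within d of the stay's anchor: treat as a false move and merge.
            group.append(rec)
            continue
        if group:
            duration = group[-1][0] - group[0][0]
            # Keep only stays that reach the duration threshold (1 h here).
            if duration >= min_duration_min:
                stays.append((group[0][0], duration, group[0][1], group[0][2]))
        group = [rec] if rec is not None else []
    return stays
```

With half-hour sampling, a 60-minute stay corresponds to three sequential records, which matches the threshold in Definition 1.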

4.2. Recognition of home and work activities

In-home activities and working are two main daily behaviors of adult human beings (Schneider et al. 2013). The regularity of mobile phone positioning data indicates information on homes and workplaces (Xu et al. 2016). Considering the rhythms of human beings, we label a person’s in-home and working activities using the following rules.

Figure 3. The analytical framework of the presented approach.


● Home rule: For an individual, if the total duration in a place occupies more than half of the early morning period [0:00–6:00], this place will be defined as home. All activities located at the home of this person are defined as in-home activities.

● Workplace rule: If the total duration in a place occupies more than half of the daily working period [9:00–12:00] and [14:00–17:00], this place will be defined as the workplace. All activities located in the workplace of this person are defined as working activities.

In total, we labeled the homes of 91.9% of mobile users and the workplaces of 64.8% of users and therefore annotated 75.7% of the potential activities.
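The two rules above can be sketched in a few lines. The stay layout `(start_hour, end_hour, place_id)` and the helper names are assumptions for illustration; only the time windows and the more-than-half criterion come from the text.

```python
def overlap(start, end, lo, hi):
    """Hours of the stay [start, end) that fall inside the window [lo, hi)."""
    return max(0.0, min(end, hi) - max(start, lo))

def label_place(stays, windows):
    """Return the place occupying more than half of the windows, else None.

    stays: list of (start_hour, end_hour, place_id) for one person on one day.
    windows: list of (lo, hi) hour windows.
    """
    total = sum(hi - lo for lo, hi in windows)
    time_at = {}
    for start, end, place in stays:
        spent = sum(overlap(start, end, lo, hi) for lo, hi in windows)
        time_at[place] = time_at.get(place, 0.0) + spent
    for place, spent in time_at.items():
        if spent > total / 2:
            return place
    return None

HOME_WINDOWS = [(0, 6)]             # early morning period
WORK_WINDOWS = [(9, 12), (14, 17)]  # daily working period

# Hypothetical day of one person.
stays = [(0, 7.5, "cell_A"), (8.5, 12, "cell_B"), (13, 18, "cell_B"), (19, 24, "cell_A")]
home = label_place(stays, HOME_WINDOWS)
work = label_place(stays, WORK_WINDOWS)
```

Because the criterion is strictly more than half of the window, at most one place can qualify, so the order in which places are checked does not matter.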

4.3. Learning social activity knowledge from social media data

Social media check-in data contain the spatial–temporal characteristics of human activities. The timeline of social media check-in data also implies the transitions between activities. We mined 1 year of social media check-in data to gain knowledge about social activities. First, each Weibo check-in is labeled with one activity type according to its checked-in POI (Table 3). Then, following the check-in sequence of Weibo users, the probability of daily activity transition was generated. Figure 5 displays the distance and interval distribution of the consecutive potential human activities described in Section 4.1 and those of the Sina Weibo check-in data. This figure indicates that both datasets show a similar decay pattern except at the 2-h interval. Therefore, it is reasonable to transfer social activity knowledge from Sina Weibo check-in data to mobile phone positioning data. Table 4 reports the daily activity transition matrix, which seems skewed toward social activities. As 75.8% of potential activities were labeled in-home and working, this information is helpful for annotating the remaining activities.
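As an illustration of how a daily transition matrix can be derived from labeled check-in sequences, the sketch below counts consecutive pairs and normalizes each row. The row normalization and the toy sequences are assumptions; the text does not state how Table 4 was normalized.

```python
from collections import Counter, defaultdict

def transition_matrix(sequences):
    """Count first -> second activity pairs and normalize each row (assumed)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for first, second in zip(seq, seq[1:]):
            counts[first][second] += 1
    return {
        first: {second: n / sum(row.values()) for second, n in row.items()}
        for first, row in counts.items()
    }

# Hypothetical per-user check-in sequences, already labeled by activity type.
sequences = [
    ["In-home", "Working", "Shopping", "In-home"],
    ["In-home", "Working", "Entertainment"],
    ["In-home", "Schooling", "In-home"],
]
P = transition_matrix(sequences)
```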

Figure 4. Activity detection with mobile phone positioning data.

The normal daily activity transition matrix has been refined in space and time to improve knowledge about social activity. By considering the spatial heterogeneity of human activities, we discovered the spatial-dependent activity transition matrix (Figure 6). According to the location of POI, the number of a type of activity $a$ in a grid cell $i$, $N^{i}_{a}$, is calculated. Following an individual’s timeline, the number of transitions between two types of activities (activity $a$ to activity $b$) and from one place to another ($i$–$j$), $N^{ij}_{ab}$, is also calculated. By dividing $N^{i}_{a}$ by the total number of activities in grid $i$, $\sum_{a \in A} N^{i}_{a}$, the generation probability of one type of activity $a$ at grid $i$, $p(a \mid i)$, is calculated as Equation (1). By dividing $N^{ij}_{ab}$ by the total number of activity transitions between grid $i$ and $j$, $\sum_{a \in A, b \in A} N^{ij}_{ab}$, the transition probability between activities, $p(ab \mid ij)$, is also calculated as Equation (2).

$$p(a \mid i) = \frac{N^{i}_{a}}{\sum_{a \in A} N^{i}_{a}} \tag{1}$$

Table 3. Labeling for social media check-in information types.

| Check-in POIs | Activity type |
| --- | --- |
| Home, residential locations | In-home |
| Office building, government, company, industrial park, hospital, bank, post office | Working |
| Shopping mall, supermarket, store | Shopping |
| Airport, railway station, long-distance bus station, customs ports | Transportation |
| University, primary school, high school, scientific research institution, library | Schooling |
| Park, gym, beach, soccer field, zoo, museum, scenic spots | Recreation |
| Chinese/Western restaurant, tea room, coffee shop, diner, nightclub, pub, bar, theater, beauty salon, Karaoke, bath massage, Internet bar, arcade, temple | Entertainment |

POI: points of interest.

Figure 5. Distribution of potential human activities and consecutive Sina Weibo check-in data.

Table 4. Daily activity transition matrix from social media data. Rows give the first activity; columns give the second activity.

| First activity | In-home | Working | Shopping | Transportation | Schooling | Recreation | Entertainment |
| --- | --- | --- | --- | --- | --- | --- | --- |
| In-home | 0.20 | 0.16 | 0.15 | 0.11 | 0.11 | 0.12 | 0.14 |
| Working | 0.16 | 0.18 | 0.14 | 0.13 | 0.12 | 0.13 | 0.14 |
| Shopping | 0.15 | 0.13 | 0.18 | 0.11 | 0.09 | 0.13 | 0.16 |
| Transportation | 0.11 | 0.14 | 0.13 | 0.31 | 0.09 | 0.14 | 0.11 |
| Schooling | 0.13 | 0.15 | 0.12 | 0.11 | 0.42 | 0.14 | 0.12 |
| Recreation | 0.12 | 0.13 | 0.14 | 0.13 | 0.10 | 0.22 | 0.13 |
| Entertainment | 0.14 | 0.12 | 0.15 | 0.11 | 0.08 | 0.12 | 0.21 |


$$p(ab \mid ij) = \frac{N^{ij}_{ab}}{\sum_{a \in A, b \in A} N^{ij}_{ab}} \tag{2}$$

Considering the temporal rhythm of human activities, we divided the day into three intervals T1 = [0:00–6:00), T2 = [6:00–17:00), T3 = [17:00–24:00). Then, the time-dependent activity transition probabilities were calculated as Equation (3), where $p$ and $q$ denote intervals and $p$ is no later than $q$. Thus, a spatial–temporal activity transition matrix was generated and used later for social activity labeling.

$$p(ab \mid ijpq) = \frac{N^{ijpq}_{ab}}{\sum_{a \in A, b \in A} N^{ijpq}_{ab}} \tag{3}$$
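Equation (3) amounts to counting transitions keyed by origin grid, destination grid and the two time intervals, then normalizing within each key. The sketch below assumes a hypothetical timeline layout `(hour, grid, activity)` per user; the function names and toy data are illustrative only.

```python
from collections import Counter

INTERVALS = [(0, 6), (6, 17), (17, 24)]  # T1, T2, T3 as defined in the text

def interval_of(hour):
    """Index of the interval that contains the given hour."""
    for k, (lo, hi) in enumerate(INTERVALS):
        if lo <= hour < hi:
            return k
    raise ValueError(f"hour out of range: {hour}")

def spatiotemporal_transitions(timelines):
    """Return probs[(i, j, p, q)][(a, b)] following Equation (3).

    timelines: per-user lists of (hour, grid, activity), sorted by hour.
    """
    counts = {}
    for timeline in timelines:
        for (h1, i, a), (h2, j, b) in zip(timeline, timeline[1:]):
            key = (i, j, interval_of(h1), interval_of(h2))
            counts.setdefault(key, Counter())[(a, b)] += 1
    return {
        key: {ab: n / sum(c.values()) for ab, n in c.items()}
        for key, c in counts.items()
    }

# Hypothetical check-in timelines of three users.
timelines = [
    [(8, "g1", "In-home"), (12, "g2", "Shopping")],
    [(9, "g1", "In-home"), (13, "g2", "Working")],
    [(10, "g1", "In-home"), (14, "g2", "Shopping")],
]
probs = spatiotemporal_transitions(timelines)
```

Dropping the two interval indices from the key gives the purely spatial form of Equation (2).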

4.4. Labeling social activities with the HMM

The HMM can recognize unobserved (hidden) states from observed states (Rabiner 1989, Eddy 2004). We used the HMM to infer information about the remaining activity types based on prior knowledge of in-home and work activities (in Section 4.2) and the spatial–temporal activity transition knowledge from social media check-in data (in Section 4.3).

Figure 6. Learning activity knowledge from social media data (a, b, c denote types of activities).

In the HMM, each unobserved state is associated with a probability distribution. Transitions among the hidden states are governed by a set of transition probabilities. In a specific state, an observation can be generated by the associated probability distribution. However, the hidden state is not directly observable to an external observer. Formally, there are five components in the HMM, $\mathrm{HMM} = \{H, O, A, B, \pi\}$.

● $H = (h_1, h_2, \ldots, h_N)$ is the set of hidden states and $N$ is the number of hidden states in the model. Here, we denote activities as hidden states.

● $O = (o_1, o_2, \ldots, o_M)$ is the set of observations and $M$ is the number of distinct observation symbols per state. In this study, we denote the activity locations as the observation $O$. Therefore, $M$ is the number of distinct grids.

● $A = \{a_{ij}\}$ is the state transition probability distribution, $a_{ij} = \mathrm{pr}\{q_{t+1} = S_j \mid q_t = S_i\}$, $1 \le i, j \le N$, where $q_t$ refers to the state at time $t$.


● $B = \{b_j(k)\}$ is the observation probability distribution in each of the states, $b_j(k) = \mathrm{pr}\{v_k \text{ at } t \mid q_t = S_j\}$, $1 \le j \le N$, $1 \le k \le M$.

● $\pi = \{\pi_i\}$ is the initial state distribution, $\pi_i = \mathrm{pr}\{q_1 = S_i\}$, $1 \le i \le N$.

Therefore, for a sequence of observations $(o_1, o_2, \ldots, o_T)$ of length $T$, where each observation $o_t \in O$, $1 \le t \le T$, the HMM finds the most likely hidden state sequence $Q = (q_1, q_2, \ldots, q_T)$, where $q_t \in H$, $1 \le t \le T$.

Figure 7 presents the adoption of the HMM for social activity labeling. The hidden states refer to the activity set. The observation refers to the activity location. The transition between hidden states corresponds to the transition between consecutive activities as a person moves in the city. The observation probability denotes the spatial distribution of a type of activity. The initial state is a person’s first activity in a day. When the first potential activity is detected as an in-home or working activity, the initial state is known. Otherwise, it is randomly selected from the activity set and determined by the following Viterbi algorithm.

The Viterbi algorithm (Viterbi 1967) is used to search for the most likely activity sequence $A = (a_1, a_2, \ldots, a_T)$ for a given sequence of potential activities from Section 4.1. A Viterbi variable is defined as

$$\gamma_{t+1}(j) = \max_{i}\left[\gamma_t(i) \cdot a_{ij}\right] \cdot b_j(O_{t+1}), \quad 1 \le t \le T \tag{4}$$

where $\gamma_t(i)$ is the highest probability along a single potential activity sequence at time $t$, $a_{ij}$ is the state transition probability from $i$ to $j$, and $b_j(O_{t+1})$ is the observation probability at state $j$. To obtain the most likely social activity, $\mu_{t+1}(j)$ is defined as

$$\mu_{t+1}(j) = \arg\max_{i}\left[\gamma_t(i) \cdot a_{ij}\right], \quad 1 \le t \le T \tag{5}$$

Figure 7. Labeling social activities in the hidden Markov model.


It should be noted that the ordinary HMM must be adapted to the variations in human activities. The transition probabilities in Equations (4) and (5) are taken from the spatial–temporal activity transition matrix identified in Section 4.3, so that the temporal fluctuation effect of human activity is alleviated and the obtained human activities are more reliable.
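A compact sketch of the Viterbi recursion in Equations (4) and (5) is given below. It uses a single fixed transition table for brevity, whereas the approach described here would look up a transition probability that depends on the grids and time intervals involved. The `known` argument, which pins steps already labeled as in-home or working, and all probabilities in the example are assumptions for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p, known=None):
    """Most likely hidden-state sequence for a list of observations.

    known maps a step index to a state that is already fixed; that step
    is restricted to the given state.
    """
    known = known or {}

    def allowed(t):
        return [known[t]] if t in known else states

    # gamma[t][s]: highest probability of any path ending in state s at step t.
    gamma = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in allowed(0)}]
    back = [{}]  # back[t][s]: best predecessor of s, the role of mu in Equation (5)
    for t in range(1, len(observations)):
        gamma.append({})
        back.append({})
        for s in allowed(t):
            prev, score = max(
                ((r, gamma[t - 1][r] * trans_p[r][s]) for r in gamma[t - 1]),
                key=lambda pair: pair[1],
            )
            gamma[t][s] = score * emit_p[s].get(observations[t], 0.0)
            back[t][s] = prev
    # Backtrack from the best final state.
    last = max(gamma[-1], key=gamma[-1].get)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical three-activity model over three grids.
states = ["In-home", "Working", "Shopping"]
start_p = {s: 1 / 3 for s in states}
trans_p = {
    "In-home": {"In-home": 0.2, "Working": 0.5, "Shopping": 0.3},
    "Working": {"In-home": 0.3, "Working": 0.3, "Shopping": 0.4},
    "Shopping": {"In-home": 0.6, "Working": 0.2, "Shopping": 0.2},
}
emit_p = {
    "In-home": {"g1": 0.8, "g2": 0.1, "g3": 0.1},
    "Working": {"g1": 0.1, "g2": 0.8, "g3": 0.1},
    "Shopping": {"g1": 0.1, "g2": 0.1, "g3": 0.8},
}
path = viterbi(["g1", "g2", "g3", "g1"], states, start_p, trans_p, emit_p,
               known={0: "In-home"})
```

For long sequences, a practical implementation would typically sum log-probabilities instead of multiplying probabilities, to avoid numerical underflow.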

4.5. Inferring urban functions

Using citywide human activities, the functions of each urban cell can be inferred. First, the proportions of seven types of activities are calculated as variables to characterize urban cells. The proportion of one activity $p_a$ in a cell is calculated as Equation (6), where $N_a$ is the number of activities of type $a$ and $A$ is the activity set. Then, urban cells are classified with a hierarchical cluster analysis (Smith and Dubes 1980). Next, according to the average proportions of the seven types of activities, we assign each cluster of urban cells an urban function according to its dominant activity: the residential function (RF), industrial function (IF), commercial function (CF), educational function (EDF), transportation function (TF), recreational function (REF), entertainment function (ENF) or mixed function (MF). Using hourly human activity information, spatial–temporal high-resolution urban functions are obtained, allowing their diurnal dynamics to be analyzed. To investigate the change in the diversity of urban functions, their entropy is calculated as Equation (7), where $p_i$ is the percentage of urban function $i$ and $F$ is the function set.

$$p_a = \frac{N_a}{\sum_{a \in A} N_a} \tag{6}$$

$$E = \sum_{i \in F} -p_i \log p_i \tag{7}$$
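Equations (6) and (7) can be written out directly, as in the sketch below. The natural logarithm is an assumption, since the base of the logarithm is not stated; the counts are hypothetical, and the hierarchical clustering step is not shown.

```python
from math import log

def activity_proportions(counts):
    """Equation (6): share of each activity type within one cell."""
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()}

def entropy(shares):
    """Equation (7): entropy of a set of function shares (natural log assumed)."""
    return -sum(p * log(p) for p in shares if p > 0)

# Hypothetical activity counts in one urban cell.
cell = activity_proportions({"In-home": 50, "Working": 30, "Shopping": 20})

# Entropy is zero when one function covers every cell and largest when
# all functions are equally common.
single = entropy([1.0])
uniform = entropy([0.25] * 4)
```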

5. Results and analysis

Massive mobile phone positioning data and Sina Weibo check-in data in Section 3 were combined by using a program developed with the Arc Engine 10.3 application programming interface. Urban functions were inferred with the help of the citywide human activities database. This section reports the results of human activities and the diurnal dynamics of urban functions.

5.1. Human activities

In total, 31,669,042 activities were contributed by 9.2 million persons in Shenzhen after fusing mobile phone positioning data and Sina Weibo check-in data. On average, one person conducts 3.41 activities in a day. Table 5 reports the summary of human activities. In-home and working activities are the main daily human activities in the city. There are 14,470,460 in-home activities and 10,459,657 working activities, which account for approximately 45.7% and 33.0% of total activities, respectively. Social activities constitute the small remainder. The most frequent social activity is entertainment (2,408,597, approximately 7.6% of daily activities), while the least frequent social activity is schooling (872,231, approximately 2.8% of daily activities).


The activity results were first validated using the available population data. A correlation analysis was conducted between in-home activities at night and the population within each traffic analysis zone reported by the Shenzhen Yearbook 2013. The Pearson correlation coefficient is 0.92, which indicates that the two are significantly correlated at the $p = 0.01$ level. Therefore, the count for in-home activities is credible.
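For reference, the Pearson coefficient used in this validation can be computed as in the sketch below. The zone-level numbers are invented for illustration and are not the study's data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical night-time in-home activity counts and yearbook populations
# for five traffic analysis zones.
in_home_at_night = [1200, 3400, 560, 7800, 2300]
yearbook_population = [1500, 3900, 700, 8100, 2000]
r = pearson(in_home_at_night, yearbook_population)
```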

To further validate the results, we compared the activity ratios with those derived from the 2010 Shenzhen household travel survey. The comparison indicates these ratios match well for transportation, shopping and recreation activities, with a gap of less than 1.0%. The gap for in-home, working, schooling and entertainment activities appears slightly larger, which could be for the following reasons. One is that the travel survey focuses less on the mobile population, which contributes less to schooling but more to working. The second reason is that the travel survey does not investigate entertainment activity. The third reason is that students in primary and middle school are forbidden to access mobile phones. Because their schooling activities are not detected, the identified schooling activity is lower than that obtained in the travel survey.

Citywide human activities have a significant temporal rhythm. Figure 8(a) displays the fluctuation of in-home and working activities. The number of in-home activities reaches its peak of 7.82 million in the early morning (3:00), falls to 2.53 million in the afternoon (15:00) and recovers to nearly 6.64 million late at night (23:00). Working activities show a reverse trend, beginning with only 0.54 million in the early morning, rising to a peak in the afternoon (4.47 million at 15:00) and then declining to less than 2.0 million at night.

Social activities also display a typical temporal wave. In total, 6,738,923 social activities are conducted each day. As with working activities, social activities begin at their lowest in the early morning (0.22 million at 4:00), reach their first peak in the morning (1.05 million at 10:00) and their second peak at night (1.85 million at 21:00), as Figure 8(a) shows. However, the individual social activities display different rhythms. Figure 8(b) shows the fluctuation of all five types of social activities, demonstrating that entertainment activities have two peaks, one in the morning (10:00) and the other at night (22:00). Transportation reaches its first maximum in the morning and its second peak value at night. The remaining three social activities have only one peak. Both shopping and recreation reach their peaks at night and fall in the early morning. Schooling is quite low in the early morning and increases during the day.

The composition of activities also varies with the time of day. Figure 8(c) displays the hourly activity percentages, suggesting that the combined proportion of in-home and working activities is larger than 75% at any time of day. Social activities account for approximately 12% of activities in the morning and afternoon. However, they account for 21% at 20:00 and reach their peak at 21:00. This fluctuation suggests that urban land provides different functions at different times of day.

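Table 5. Summary of daily activities in Shenzhen, China. Transportation, schooling, shopping, recreation and entertainment are social activities.

| Activities | In-home | Working | Transportation | Schooling | Shopping | Recreation | Entertainment |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Number | 14,470,460 | 10,459,657 | 1,228,703 | 872,231 | 1,428,106 | 801,288 | 2,408,597 |
| Ratio | 45.7% | 33.0% | 3.9% | 2.8% | 4.5% | 2.5% | 7.6% |
| Ratio in Shenzhen household travel survey 2010 | 42.85% | 34.32% | 2.96% | 4.79% | 4.13% | 2.52% | – |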
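The ratios in Table 5 follow directly from the counts, which can be checked with a few lines. The counts below are copied from the table; the variable names are illustrative.

```python
# Activity counts from Table 5.
counts = {
    "In-home": 14_470_460,
    "Working": 10_459_657,
    "Transportation": 1_228_703,
    "Schooling": 872_231,
    "Shopping": 1_428_106,
    "Recreation": 801_288,
    "Entertainment": 2_408_597,
}
total = sum(counts.values())  # 31,669,042, the total reported in the text
ratios = {name: round(100 * n / total, 1) for name, n in counts.items()}
```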


Figure 8. The fluctuation of daily human activities. (a) Human activity per hour, (b) social activity per hour and (c) the proportions of human activities throughout the day.


Figure 9 shows the spatial distribution of human activities. In-home activities are widely distributed in the city (Figure 9(a)), appearing in many urban cells and spatially aggregating in many places, such as downtown and western and central Shenzhen. Unlike in-home activities, working activities occur with the highest density in the downtown area, which is the central business district with many tall buildings (Figure 9(b)). The density of working activities declines from south to north, although northern Shenzhen is an industrial area with low-rise factories and farms. The five types of social activities are also scattered through the city but occur relatively less frequently in most spatial cells, occurring at a high density in only a few cells. For example, cells with high-density shopping activities are located where there are shopping centers (Figure 9(c)).

Figure 9. Spatial distribution of human activities: (a) In-home, (b) working, (c) shopping, (d) schooling, (e) transportation, (f) recreation and (g) entertainment.


Cells of high-density schooling activities have universities and technology institutes, among others (Figure 9(d)). This distribution suggests an even urban function distribution.

5.2. Urban function dynamics

5.2.1. Results for urban functions

Using daily human activity information, 2498 urban cells are classified into eight clusters using the proportions of human activities, as described in Section 4.5. Table 6 describes the average proportions of human activities in each cluster. By comparing these with the global mean proportion of human activity, we classified urban cells into eight functions: RF with 0.501 in-home activities; IF with 0.501 working activities; CF with 0.251 shopping activities; TF with 0.211 transportation activities; EDF with 0.176 schooling activities; REF with 0.244 recreation activities; ENF with 0.318 entertainment activities and MF with 0.211 in-home, 0.374 working and 0.415 social activities. This indicates that most human activities are mixed in all types of urban functional cells because of their fragmented use in this city.

Table 7 reports the summary of urban cells with functions and demonstrates that RF and IF are the main functions provided by this city. Many cells feature RF (1438 cells) and IF (409 cells), as in-home and working activities are the main activities in this city. Social functions compose an important part of urban functions. A few cells feature TF (183), EDF (152) or REF (112), and fewer than 100 cells feature CF (94) or ENF (78); the fewest cells feature MF (32). Although there are fewer schooling activities than shopping and entertainment activities, there are more EDF cells than CF and ENF cells because schooling activities are distributed in universities and schools, which occupy large spaces. However, most commercial activities occur in stores along the roads in Shenzhen, while entertainment activities occur at restaurants and clubs, which are usually small and therefore can be found within residential or industrial function cells.

Figure 10 displays the distribution of urban functions and illustrates that cells with RF are spatially adjacent, forming a few neighborhoods, such as the residential areas in Nanshan, Baoan and Longgang. IF cells are also grouped at the center of Nanshan, the center of Futian, the west part of Baoan and the port of Hong Kong. The remaining six types of urban functions are sparsely scattered throughout the city, surrounding RF and IF areas.

5.2.2. Comparison with land zoning map

Inferred urban functions were compared with the official Shenzhen land zoning map (2010–2020) from the Shenzhen Urban Planning, Lands and Resource Commission to evaluate the obtained results. In total, 58% of urban cells have the same urban function and designed land use, similar to the results of Pei et al. (2014) in Singapore. Table 8 presents the confusion matrix between land use and daily urban functions. It illustrates that RF, IF, CF and EDF present a consistent accuracy level of no less than 50%, while TF, REF, ENF and MF present a consistent accuracy level below 50%, which implies differences between the land zoning map and real urban functions in Shenzhen, China. There are two possible reasons for these differences. One is that some urban cells have changed their functions, following the human usage of urban cells. In fact, Shenzhen is the fastest developing city in China. The usage of land parcels changes greatly. For example, many old factories in south Shenzhen have been changed to residential buildings or commercial parks as a result of urban renewal. The other reason is that function is measured for each urban cell, while land parcels are substantially fragmented and used for different purposes. Aggregating human activities to make urban function inferences may generate errors.
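A comparison of this kind reduces to tabulating pairs of labels per cell. The sketch below computes an overall agreement rate and a row-normalized confusion matrix from two label lists; the function name and the six-cell toy data are assumptions for illustration.

```python
from collections import Counter

def agreement(inferred, zoned):
    """Overall agreement and row-normalized confusion matrix.

    inferred, zoned: equal-length lists of function labels, one pair per cell.
    Rows of the matrix are the inferred functions.
    """
    pairs = Counter(zip(inferred, zoned))
    overall = sum(n for (f, z), n in pairs.items() if f == z) / len(inferred)
    row_totals = Counter(inferred)
    columns = sorted(set(zoned))
    rows = {
        f: {z: pairs[(f, z)] / row_totals[f] for z in columns}
        for f in row_totals
    }
    return overall, rows

# Hypothetical labels for six cells.
inferred = ["RF", "RF", "RF", "IF", "IF", "CF"]
zoned = ["RF", "RF", "IF", "IF", "RF", "CF"]
overall, rows = agreement(inferred, zoned)
```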

Table 6. The mean proportion of human activities of different urban functions.

| Urban functions | In-home | Working | Shopping | Transportation | Schooling | Recreation | Entertainment |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Global mean | 0.436 | 0.341 | 0.043 | 0.046 | 0.031 | 0.030 | 0.074 |
| RF | **0.501** | 0.310 | 0.038 | 0.031 | 0.021 | 0.022 | **0.077** |
| IF | 0.279 | **0.501** | **0.046** | **0.053** | 0.031 | 0.026 | 0.064 |
| CF | 0.357 | 0.324 | **0.251** | 0.012 | 0.011 | 0.003 | 0.041 |
| TF | **0.452** | 0.283 | 0.011 | **0.211** | 0.009 | 0.004 | 0.031 |
| EDF | **0.464** | 0.296 | 0.014 | 0.006 | **0.176** | 0.018 | 0.026 |
| REF | 0.379 | 0.297 | 0.006 | 0.024 | 0.012 | **0.244** | 0.038 |
| ENF | 0.271 | 0.338 | 0.026 | 0.011 | 0.016 | 0.021 | **0.318** |
| MF | 0.211 | **0.374** | **0.110** | **0.080** | **0.044** | **0.085** | **0.095** |

RF: Residential functions; IF: industrial functions; CF: commercial functions; TF: transportation functions; EDF: education functions; REF: recreational functions; ENF: entertainment functions; MF: mixed functions. The bold texts indicate the proportion more than the global mean.

Table 7. Characteristics of urban functions.

| Functions | RF | IF | CF | TF | EDF | REF | ENF | MF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of cells | 1438 | 409 | 94 | 183 | 152 | 112 | 78 | 32 |
| Ratio | 57.6% | 16.4% | 3.8% | 7.3% | 6.1% | 4.5% | 3.1% | 1.3% |

RF: Residential functions; IF: industrial functions; CF: commercial functions; TF: transportation functions; EDF: education functions; ENF: entertainment functions; MF: mixed functions; REF: Recreational functions.

Figure 10. Distribution of urban functions in Shenzhen.

5.2.3. Dynamics of urban functions

Using the proportions of human activities per hour, high temporal resolution urban functions were inferred. Figure 11 displays the change in urban functions provided by the city and shows that urban functions are quite different at different times of day. Following the rhythms of human activities, the RF begins at a high percentage (approximately 75%), falls in the morning, increases at noon, falls again in the afternoon and finally recovers at night. The IF shows an almost reverse trend. It begins with a low percentage (less than 10%), increases in the morning and declines at night. The remaining six types of functions also vary with human activity rhythms. The percentage of EDF is low at night but high during the day, reaching peaks at 10:00 and 17:00. The MF is almost nonexistent at night but occupies approximately 2–5% of total cells during the day. However, REF is more frequent at night and in the afternoon but less so in the morning, which may be a result of the lower value of other social activities, as Figure 10 shows.

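Table 8. The comparison of urban functions with land use. Rows give the functions inferred in this study; columns give the official land-use map.

| This study | RF (%) | IF (%) | CF (%) | TF (%) | EDF (%) | REF (%) | ENF (%) | MF (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RF | **58** | 15 | 10 | 4 | 5 | 3 | 3 | 1 |
| IF | 17 | **61** | 5 | 5 | 5 | 3 | 4 | 1 |
| CF | 14 | 19 | **50** | 2 | 7 | 2 | 4 | 1 |
| TF | 4 | 21 | 12 | **46** | 7 | 2 | 7 | 1 |
| EDF | 18 | 12 | 7 | 1 | **51** | 3 | 7 | 1 |
| REF | 5 | 16 | 15 | 4 | 6 | **46** | 6 | 0 |
| ENF | 9 | 24 | 13 | 4 | 6 | 3 | **41** | 0 |
| MF | 25 | 9 | 13 | 6 | 0 | 6 | 3 | **38** |

RF: Residential functions; IF: industrial functions; CF: commercial functions; TF: transportation functions; EDF: education functions; ENF: entertainment functions; MF: mixed functions; REF: Recreational functions. The bold value indicates the ratio of the same function.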

Figure 11. The rhythm of urban functions in Shenzhen, China.


Taking four places as examples, we examined the reliability of the inferred human activities and the associated urban function dynamics. Figure 12(a) displays the result for Yingrenshi village, a residential community. It illustrates that this cell has 8000 in-home activities during the night and about 2000 during the day; therefore, its function is dominated by the RF for the whole day. Figure 12(b) reports the observed activities in and the hourly function of Shenzhen North high-speed railway station. This figure indicates that human activities increase from night to day in this place, but transportation activity rises very sharply, reaching a peak of 2452 at 10:00, which is in accordance with the schedule of the high-speed railway. This diurnal activity dynamic leads to a transformation from the RF to the TF. Figure 12(c) illustrates the activities and function dynamics at a cell near Shenzhen University. The schooling and working activities rise significantly in this cell, which hosts the highest number of schooling activities (2854 per cell) during the day; its function is labeled educational in this time period. Workplaces in the central business district are indicated in Figure 12(d), which displays the change in in-home, working, recreation and entertainment activities. It indicates that working activities number no more than 200 at night, reach a peak of 7952 at 14:00 and fall to a minimum of 100 at night. The associated function transforms from residential to industrial and back to residential. In summary, both activity and function results in these places suggest the effectiveness of the proposed data fusion approach.

Figure 12. The functions of four places: (a) a residential community, (b) Shenzhen North high-speed railway station, (c) Shenzhen University and (d) residential–working mixed area.

Figure 13 displays the entropy of urban functions, which illustrates that the entropy, and hence the diversity of urban functions, is quite low at night, as the main provided function is residential. In the daytime, the entropy increases as the diversity of urban functions rises. The maximum entropy occurs at 19:00, which indicates that the city provides the greatest variety of functions at that time. Then, entropy begins to decline as people go back to their homes and the city returns to being a residential space.

To further investigate the diurnal dynamics of urban functions, the function transitions at the same cell in typical time periods are analyzed. Figure 14 shows the observed transitions between hourly urban functions and suggests that many cells change from RF to IF and MF in the period [7:00–10:00] as people leave their homes for work. Therefore, RFs provided by urban land are weakened, but both IFs and MFs are strengthened.

In the afternoon period [15:00–18:00] when people are off duty and return home, many cells with IFs change to provide RFs (Figure 15). However, half of the cells still represent IFs, as work continues until 20:00. The number of ENF cells increases because many entertainment venues, such as bars and clubs, open at that time. MF urban cells also increase as people conduct quite different activities after work.

With respect to the night period (22:00–1:00), many people have returned home. Accordingly, the number of cells with non-RFs (including IFs, TFs, recreation functions and ENFs) declines, but the number of cells with RFs significantly increases (Figure 16). Notably, there are still a few cells with IFs, as some factories operate on three shifts for 24 h.
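Transitions of this kind can be tabulated by comparing the function label of each cell at the start and end of a period. The sketch below assumes a hypothetical mapping from cell identifiers to function labels at two hours; the names and the four-cell example are illustrative only.

```python
from collections import Counter

def function_transitions(before, after):
    """Count (function before, function after) pairs over cells present in both maps."""
    return Counter((before[c], after[c]) for c in before if c in after)

# Hypothetical function labels of four cells at 7:00 and at 10:00.
at_7 = {"c1": "RF", "c2": "RF", "c3": "RF", "c4": "IF"}
at_10 = {"c1": "IF", "c2": "MF", "c3": "RF", "c4": "IF"}
morning = function_transitions(at_7, at_10)
```

Pairs whose two labels differ are the cells that change function within the period; pairs with identical labels are the cells that keep it.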

Figure 13. The entropy of urban functions in Shenzhen, China.

6. Discussion and conclusion

The city is a complex system of human activities and their interactions with space and its associated dynamics. Understanding urban functions and their dynamics is essential for the planning, managing and governing of a city. Understanding how urban functions vary across space and time is very challenging. In the era of big data, massive human-tracking data are available (e.g. mobile phone data, GPS data and social media data) and open a new horizon for urban function inference. However, each single type of human-tracking data has its own drawbacks. For example, mobile phone data lack semantic information, while social media data are sparse in space and time. Multisourced human-tracking data should be fused to discover urban function dynamics.

Figure 14. The change in urban functions from 7:00 to 10:00.

Figure 15. The change in urban functions from 15:00 to 18:00.


Following a ‘data-activity-function’ philosophy, this article develops a data fusion approach to uncover urban functions and their diurnal dynamics by coupling mobile phone positioning data and social media data. Citywide detailed spatial–temporal human activities are first recovered from multisourced spatial–temporal data. In-home and working activities are identified from massive mobile phone positioning data. The remaining five social activities are annotated with their type information (such as travel, schooling, shopping, recreation and entertainment) by the HMM inference with the knowledge learned from social media check-in data. Then, the percentage of human activities in each urban cell is calculated. By clustering spatial cells with similar compositions of human activities, urban functions are recognized. Their hourly dynamics are analyzed with the obtained spatial–temporal high-resolution human activity data.

The experiment was conducted in Shenzhen, China. The obtained results demonstrate that both human activities and urban functions change in spatial and temporal dimensions. Urban cells with residential or IFs are clustered in the city. However, urban cells with essential social functions, such as commercial, educational and ENFs, are sparsely scattered throughout the city. Compared to the land-zoning map, 58% of urban cells have the same inferred urban functions, which implies that there are differences between land-use planning and real urban functions in Shenzhen, China.
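The comparison with the land-zoning map amounts to a per-cell agreement rate. The sketch below shows one way such a rate could be computed; the function name `agreement_rate`, the cell identifiers and the labels are hypothetical, and the toy values are not the study's data.

```python
def agreement_rate(inferred, zoning):
    """Share of cells whose inferred function equals the zoned land use.

    Only cells present in both mappings are compared.
    """
    shared = inferred.keys() & zoning.keys()
    if not shared:
        raise ValueError("no cells in common")
    matches = sum(1 for cell in shared if inferred[cell] == zoning[cell])
    return matches / len(shared)


# Toy labels for five urban cells (illustrative only).
inferred_functions = {
    "c1": "residential", "c2": "industrial", "c3": "commercial",
    "c4": "residential", "c5": "educational",
}
zoning_map = {
    "c1": "residential", "c2": "industrial", "c3": "residential",
    "c4": "residential", "c5": "industrial",
}

rate = agreement_rate(inferred_functions, zoning_map)
```

Three of the five toy cells match, so `rate` is 0.6 here; the cells that disagree are the ones where planned land use and observed function diverge.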

The results also reveal that urban functions change hour by hour. Following the cycle of human activities, the functions provided by urban cells undergo significant changes among residential, industrial and social functions. This study presents the first description of this typical diurnal dynamic, providing an image of the transitions of human activities and the associated urban functions. It suggests that although many urban cells have officially been labeled with one type of land use, they may provide different functions at different times of day as they host different human activities.
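A change in urban functions between two times of day can be summarized by counting, for every cell, the pair of functions it holds at the two times. The following sketch assumes a function label is already available per cell and per hour; the names `function_transitions` and `changed_cells` and the toy snapshots are hypothetical.

```python
from collections import Counter


def function_transitions(before, after):
    """Count (function_before, function_after) pairs over cells in both snapshots."""
    shared = before.keys() & after.keys()
    return Counter((before[c], after[c]) for c in shared)


def changed_cells(before, after):
    """Sorted ids of cells whose function differs between the two snapshots."""
    shared = before.keys() & after.keys()
    return sorted(c for c in shared if before[c] != after[c])


# Toy snapshots of inferred functions at two hours (illustrative only).
functions_0700 = {
    "c1": "residential", "c2": "residential",
    "c3": "industrial", "c4": "residential",
}
functions_1000 = {
    "c1": "residential", "c2": "industrial",
    "c3": "industrial", "c4": "commercial",
}

transitions = function_transitions(functions_0700, functions_1000)
changed = changed_cells(functions_0700, functions_1000)
```

Here `transitions` acts as a sparse transition matrix between function types, and `changed` lists the cells that would be highlighted on a change map.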

Figure 16. The change in urban functions from 22:00 to 1:00.


The proposed study develops an alternative approach to investigating urban functions and massive human-tracking data via human activities. The obtained results bridge the gap between big data, human activity analysis and urban dynamics. They also offer a deeper understanding of urban land, which will benefit short-term urban decision-making such as traffic management and emergency response.

This study has several limitations. First, due to constraints in data availability, the mobile phone positioning data used in this study cover only one working day. As many studies have investigated the consistency of human activities based on mobile phone data (Sevtsuk and Ratti 2010, Song et al. 2010), we believe that a 1-day mobile phone dataset is adequate for uncovering urban function dynamics. The results can be further improved by analyzing mobile phone positioning data over a longer period. Second, due to the sparsity of social media data, a few differences between human activities from mobile phone data and social media data were observed (see Figure 5). Additionally, the social media check-in data were collected in 2014, whereas the mobile phone positioning data were collected on 1 day in 2012. Therefore, biases are inevitably introduced into the learned social activity characteristics. In the future, multisourced big geographical data, such as POIs, geo-tagged photos and travel survey data, should be integrated to enrich the knowledge obtained from social media check-in data. The reliability of the inferred human activities and, subsequently, urban functions may thereby be improved. Third, as this is a pioneering work regarding citywide human activities, the reliability of the results was not directly examined. Alternative approaches to harvesting high-resolution spatial and temporal data on citywide human activities are expected.

Acknowledgments

We would like to thank three anonymous reviewers for their valuable comments on this article.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This work was supported by the National Natural Science Foundation of China: [Grant Numbers 41401444, 41671387 and 91546106]; Shenzhen Scientific Research and Development Funding Program: [Grant Number CXZZ20150504141623042]; Nature Science Foundation of Shenzhen University: [Grant Number 2016065] and the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources: [Grant Numbers KF-2016-02-009 and KF-2016-02-010].

ORCID

Wei Tu http://orcid.org/0000-0002-0255-4037

Meng Zhou http://orcid.org/0000-0001-5172-9967


References

Ahas, R., et al., 2015. Everyday space–time geographies: using mobile phone-based sensor data to monitor urban activity in Harbin, Paris, and Tallinn. International Journal of Geographical Information Science, 29 (11), 2017–2039. doi:10.1080/13658816.2015.1063151

An, L., et al., 2015. Space–time analysis: concepts, quantitative methods, and future directions. Annals of the Association of American Geographers, 105 (5), 891–914. doi:10.1080/00045608.2015.1064510

Batty, M., 2010. The pulse of the city. Environment and Planning B: Planning and Design, 37 (4), 575–577. doi:10.1068/b3704ed

Batty, M., et al., 2012. Smart cities of the future. The European Physical Journal Special Topics, 214, 481–518. doi:10.1140/epjst/e2012-01703-3

Batty, M., 2009. Urban modeling. In: R. Kitchin and N. Thrift, eds. International encyclopedia of human geography. Oxford: Elsevier.

Calabrese, F., et al., 2013. Understanding individual mobility patterns from urban sensing data: a mobile phone trace example. Transportation Research Part C: Emerging Technologies, 26, 301–313. doi:10.1016/j.trc.2012.09.009

Calabrese, F., Ferrari, L., and Blondel, V.D., 2014. Urban sensing using mobile phone network data: a survey of research. ACM Computing Surveys, 47 (2), 1–20. doi:10.1145/2658850

Cao, J., et al., 2017. Spatio-temporal analysis of aggregated human activities based on massive mobile phone tracking data. Journal of Geo-Information Science, 19 (4), 467–474.

Cao, R., et al., 2016. Comparison of urban human movements inferring from multi-source spatial-temporal data. International archive of photogrammetry and remote sensing, and spatial information science, XLI-B2, 471–476.

Chen, C., et al., 2016. The promises of big data and small data for travel behavior (aka human mobility) analysis. Transportation Research Part C: Emerging Technologies, 68, 285–299. doi:10.1016/j.trc.2016.04.005

China City Statistical Yearbook, 2013. Beijing: China statistical press.

Chirag, R. and Storpera, M., 2015. The digital skin of cities: urban theory and research in the age of the sensored and metered city, ubiquitous computing and big data. Cambridge Journal of Regions, Economy and Society, 25 (1), 27–42.

Croitoru, A., et al., 2015. Linking cyber and physical spaces through community detection and clustering in social media feeds. Computers Environment and Urban Systems, 53, 47–64. doi:10.1016/j.compenvurbsys.2014.11.002

Crooks, A., et al., 2015. Crowdsourcing urban form and function. International Journal of Geographical Information Science, 29 (5), 720–741. doi:10.1080/13658816.2014.977905

Diao, M., et al., 2016. Inferring individual daily activities from mobile phone traces: a Boston example. Environment and Planning B: Planning and Design, 43 (5), 920–940. doi:10.1177/0265813515600896

Eddy, S.R., 2004. What is a hidden Markov model? Nature Biotechnology, 22 (10), 1315–1316. doi:10.1038/nbt1004-1315

Frías-Martínez, V. and Frías-Martínez, E., 2014. Spectral clustering for sensing urban land use using Twitter activity. Engineering Applications of Artificial Intelligence, 35, 237–245. doi:10.1016/j.engappai.2014.06.019

Gong, L., et al., 2016. Inferring trip purposes and uncovering travel patterns from taxi trajectory data. Cartography and Geographic Information Science, 43 (2), 103–114. doi:10.1080/15230406.2015.1014424

Gong, P. and Howarth, P., 1990. The use of structural information for improving land-cover classification accuracies at the rural-urban fringe. Photogrammetric Engineering and Remote Sensing, 56 (1), 67–73.

Gonzalez, M.C., Hidalgo, C.A., and Barabasi, A.L., 2008. Understanding individual human mobility patterns. Nature, 453 (7196), 779–782. doi:10.1038/nature06958

Grinberger, A.Y. and Shoval, N., 2015. A temporal-contextual analysis of urban dynamics using location-based data. International Journal of Geographical Information Science, 29 (11), 1969–1987. doi:10.1080/13658816.2015.1049951


Hu, S.G. and Wang, L., 2013. Automated urban land-use classification with remote sensing. International Journal of Remote Sensing, 34 (3), 790–803. doi:10.1080/01431161.2012.714510

Huang, Q., Cao, G., and Wang, C., 2014. From where do tweets originate?: a GIS approach for user location inference. ACM SIGSpatial International Workshop on Location-Based Social Networks, Nov 2014, Dallas, Texas, USA. New York: ACM, 1–8.

Huang, Q. and Wong, D.W.S., 2016. Activity patterns, socioeconomic status and urban spatial structure: what can social media data tell us? International Journal of Geographical Information Science, 30 (9), 1873–1898. doi:10.1080/13658816.2016.1145225

Jiang, S., Ferreira, J., and Gonzalez, M.C., 2012. Clustering daily patterns of human activities in the city. Data Mining and Knowledge Discovery, 25 (3), 478–510. doi:10.1007/s10618-012-0264-z

Lenormand, M., et al., 2015. Comparing and modelling land use organization in cities. Royal Society Open Science, 2 (12), 150449. doi:10.1098/rsos.150449

Li, Q., 2017. From geomatics to urban informatics. Geomatics and Information Science of Wuhan University, 42 (1), 1–6.

Liu, Y., et al., 2012. Urban land uses and traffic ‘source-sink areas’: evidence from GPS-enabled taxi data in Shanghai. Landscape and Urban Planning, 106 (1), 73–87. doi:10.1016/j.landurbplan.2012.02.012

Longley, P.A., Adnan, M., and Lansley, G., 2015. The geotemporal demographics of twitter usage. Environment and Planning A, 47 (2), 465–484. doi:10.1068/a130122p

Louail, T., et al., 2014. From mobile phone data to the spatial structure of cities. Scientific Reports, 4, 5276. doi:10.1038/srep05276

Pei, T., et al., 2014. A new insight into land use classification based on aggregated mobile phone data. International Journal of Geographical Information Science, 28 (9), 1988–2007. doi:10.1080/13658816.2014.913794

Rabiner, L.R., 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77 (2), 257–286. doi:10.1109/5.18626

Ratti, C., et al., 2006. Mobile landscapes: using location data from cell phones for urban analysis. Environment and Planning B: Planning and Design, 33 (5), 727–748. doi:10.1068/b32047

Reades, J., Calabrese, F., and Ratti, C., 2009. Eigenplaces: analysing cities using the space-time structure of the mobile phone network. Environment and Planning B: Planning and Design, 36 (5), 824–836. doi:10.1068/b34133t

Rodrigue, J.-P., Comtois, C., and Slack, B., 2013. The geography of transport systems. London: Routledge.

Sagl, G., Delmelle, E., and Delmelle, E., 2014. Mapping collective human activity in an urban environment based on mobile phone data. Cartography and Geographic Information Science, 41 (3), 272–285. doi:10.1080/15230406.2014.888958

Schneider, C.M., et al., 2013. Unravelling daily human mobility motifs. Journal of the Royal Society Interface, 10 (84), 20130246. doi:10.1098/rsif.2013.0246

Sevtsuk, A. and Ratti, C., 2010. Does urban mobility have a daily routine? Learning from the aggregate data of mobile networks. Journal of Urban Technology, 17 (1), 41–60. doi:10.1080/10630731003597322

Shenzhen Statistical Yearbook, 2013. Shenzhen Statistical Yearbook. Available from: http://www.sztj.gov.cn/nj2013/start%202013.htm. [Accessed 13 October 2016].

Shin, D., et al., 2015. Urban sensing: using smartphones for transportation mode classification. Computers, Environment and Urban Systems, 53, 76–86. doi:10.1016/j.compenvurbsys.2014.07.011

Smith, S.P. and Dubes, R., 1980. Stability of a hierarchical clustering. Pattern Recognition, 12 (3), 177–187. doi:10.1016/0031-3203(80)90042-4

Song, C., et al., 2010. Limits of predictability in human mobility. Science, 327 (5968), 1018–1021. doi:10.1126/science.1177170

Steiger, E., De Albuquerque, J.P., and Zipf, A., 2015. An advanced systematic literature review on spatiotemporal analyses of twitter data. Transactions in GIS, 19 (6), 809–834. doi:10.1111/tgis.12132


Tang, L.L., et al., 2016. A network Kernel density estimation for linear features in space–time analysis of big trace data. International Journal of Geographical Information Science, 30 (9), 1717–1737. doi:10.1080/13658816.2015.1119279

Tranos, E. and Nijkamp, P., 2015. Mobile phone usage in complex urban systems: a space–time, aggregated human activity study. Journal of Geographical Systems, 17 (2), 157–185. doi:10.1007/s10109-015-0211-9

Tsou, M.-H., 2015. Research challenges and opportunities in mapping social media and Big Data. Cartography and Geographic Information Science, 42, 70–74. doi:10.1080/15230406.2015.1059251

Tu, W., et al., 2016a. Optimizing the locations of electric taxi charging stations: a spatial–temporal demand coverage approach. Transportation Research Part C: Emerging Technologies, 65, 172–189. doi:10.1016/j.trc.2015.10.004

Tu, W., et al., 2016b. Spatial-temporal variation of public transit ridership using smart card data and mobile phone data. Association of American Geographers Annual Meeting 2016, April 2016, San Francisco, USA. Washington: AAG.

Viterbi, A., 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13 (2), 260–269. doi:10.1109/TIT.1967.1054010

Widhalm, P., et al., 2015. Discovering urban activity patterns in cell phone data. Transportation, 42 (4), 597–623. doi:10.1007/s11116-015-9598-x

Wu, L., et al., 2014. Intra-urban human mobility and activity transition: evidence from social media check-in data. PLoS ONE, 9 (5), e97010–e97010. doi:10.1371/journal.pone.0097010

Xu, Y., et al., 2016. Another tale of two cities: understanding human activity space using actively tracked cellphone location data. Annals of the American Association of Geographers, 106 (2), 489–502.

Yuan, Y., Raubal, M., and Liu, Y., 2012. Correlating mobile phone usage and travel behavior – a case study of Harbin, China. Computers, Environment and Urban Systems, 36 (2), 118–130. doi:10.1016/j.compenvurbsys.2011.07.003

Yue, Y., et al., 2014. Zooming into individuals to understand the collective: a review of trajectory-based travel behaviour studies. Travel Behaviour and Society, 1 (2), 719–723. doi:10.1016/j.tbs.2013.12.002

Yue, Y., et al., 2016. Measurements of POI-based mixed use and their relationships with neighbourhood vibrancy. International Journal of Geographical Information Science, doi:10.1080/13658816.2016.1220561

Zhan, X., Ukkusuri, S.V., and Zhu, F., 2014. Inferring urban land use using large-scale social media check-in data. Networks and Spatial Economics, 14 (3–4), 647–667. doi:10.1007/s11067-014-9264-4

Zhong, C., et al., 2014. Detecting the dynamics of urban structure through spatial network analysis. International Journal of Geographical Information Science, 28 (11), 2178–2199. doi:10.1080/13658816.2014.914521

Zhou, M., et al., 2015. Inferring individual physical locations with social friendships. In: Proceedings of geoinformatics 2015. Wuhan, China: IEEE, 1–6.

Zhou, M., et al., 2017. Impacts of weather on public transport ridership: results from mining data from different sources. Transportation Research Part C: Emerging Technologies, 75, 17–29. doi:10.1016/j.trc.2016.12.001

Zhou, X.L. and Zhang, L., 2016. Crowdsourcing functions of the living city from Twitter and Foursquare data. Cartography and Geographic Information Science, 43 (5), 393–404. doi:10.1080/15230406.2015.1128852
