Analysis of Automated Fare Collection Data from Montevideo, Uruguay for Planning Purposes
by
Catalina Parada Hernandez
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Department of Civil & Mineral Engineering University of Toronto
© Copyright by Catalina Parada Hernandez 2018
Abstract
Automated Fare Collection (AFC) and smartcard systems have been rapidly adopted by transit
systems all over the world. The datasets produced by these systems are extensive and can be
analyzed for planning purposes and system evaluation. This thesis analyzes the AFC data from
the bus transit system in Montevideo, Uruguay and proposes methods to reconstruct itineraries,
identify the alighting locations of transactions, and understand travel behaviour of smartcard
users. The methods used successfully build itineraries for 97% of bus runs, identify 87.7% of
alighting locations, and recreate 67.5% of complete trip chains of smartcard users.
The complete trip chain data is then used to pair smartcards with individuals from the
Montevideo travel survey. As the trip chain and travel survey datasets have different parameters,
spatial and temporal windows were used to enable matching. Only 10 to 15% of individuals from
the survey could be paired with smartcards. These findings have important implications for
incorporating AFC datasets in transportation planning and evaluating transit systems.
Acknowledgements
This thesis is the result of many hours of work, failures followed by successes, and unexpected
challenges. While the scope of the work and the obstacles changed, the people supporting me did
not. One wise woman once said, “behind every successful woman there is a tribe” and I could
not relate more.
The closest members of my tribe are, of course, my family. I would like to thank my parents,
Yenny and Ricardo, and my brothers, German and Sergio, for always being a text or call away
and willing to offer advice or simply listen to me talk about anything from research to social
injustice. You have supported me in so many different ways, and I would not be here or be the
person I am without you, literally.
To my supervisor and mentor Eric J. Miller, my sincere gratitude for trusting me with this pilot
project in Montevideo. You have become a mentor, providing me with advice in several facets of
life, and with unending support to my sometimes-unusual ideas and approaches.
I would also like to thank the funding and data providers, and some collaborators of the project –
Corporación Andina de Fomento (CAF), Diego Hernandez, Antonio Mauttone, and Verónica
Orellano – for their invaluable support in accessing and interpreting the data.
I am also very grateful to the newer members of my tribe, my friends and colleagues, for their
unofficial guidance and advice, and for creating such a welcoming and friendly environment.
Special thanks to my partner Donelle and closest friends Laura, Chris, Alex, Brittany, and
Gozde. Donelle, for knowing me better than anyone else and thus providing me with food
incentives, strategies, and support to always do my best. Laura and Alex, for all the picnics,
weekend adventures, and workouts; Chris, for showing me where to get past exams and free
pizza, and for helping me navigate through the University’s complicated structure. Gozde, for
being my Turkish sister, and Brittany, for organizing the tennis and volleyball games.
Finally, I am grateful for the feline support of rescue cats Ariel and Mia, who kept me company
during the writing process and reminded me of the important things in life, such as petting them.
Table of Contents
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Introduction
1.1 Study Objective
1.2 Study Motivation
1.3 Thesis Structure
Literature Review
2.1 Analysis of Smartcard Data for Estimating Trip Destinations
2.2 Analysis of Smartcard Data for Transit Operations and Performance Measures
2.3 Transit User Regularity
2.4 Integration of Smartcard Data with Travel Surveys
Data
3.1 Data Description
3.1.1 Boarding Records
3.1.2 Bus lines and branches
3.1.3 MHMS
3.2 Data Analysis for Monday, August 15
Data Preparation
4.1 Preliminary query (Query #1)
4.2 Method for invalid bus runs
4.3 Query for OD estimation and incorporation strategies (Query #2)
4.4 Data cleaning MHMS
Building Itineraries from Boarding Transactions
5.1 Method
5.2 Results
Origin and Destination Estimation
6.1 Method
6.1.1 Incorporation of smartcard transactions without alighting location identified and single riders
6.1.2 Incorporation of no-card users
6.2 Results
6.2.1 Analysis for smartcard users
6.2.2 Analysis for no-card users
6.2.3 Spatial analysis of travel behaviour
Analysis of Travel Survey Riders and Smartcard Users
7.1 Method
7.2 Results
7.2.1 Comparison of MHMS with smartcard data
7.2.2 Pairing MHMS individuals with STM cards
Discussion and Conclusions
Limitations and Future Work
References
Appendices
Appendix A - STM Card Types
Appendix B - Results for all days
Appendix C - Details of algorithm
List of Tables
Table 3-1 Boarding records and descriptive statistics
Table 3-2 Boardings per STM card type for Monday, August 15th
Table 3-3 Bus branches and bus UIDs
Table 3-4 Trips and legs of trips in the MHMS
Table 4-1 Query criteria for smartcard data
Table 4-2 Smartcard data - Query #1
Table 4-3 Bus run classification
Table 4-4 Query criteria for OD estimation
Table 4-5 Smartcard data - Query #2
Table 4-6 No-card data - Query #2
Table 4-7 Query criteria for MHMS
Table 4-8 MHMS - Query results
Table 5-1 Sample itinerary built from passenger transactions
Table 5-2 Itinerary building strategies for special cases
Table 5-3 Passenger service time ranges
Table 5-4 Characterization of stops with service time over 10 seconds
Table 6-1 Criteria for OD incorporation of Smartcard users
Table 6-2 OD estimation results
Table 6-3 Alighting estimation rate based on time period
Table 6-4 Alighting estimation rate based on bus run
Table 6-5 Trip chains based on STM card type
Table 6-6 Assignment of alighting location to transactions with missing alighting location
Table 6-7 Assignment of alighting location to single riders
Table 7-1 Comparison of legs and trips for MHMS and STM data
Table 7-2 Strategies to compute boarding and alighting times
Table 7-3 MHMS identification of individuals for different temporal windows
Table 7-4 Weekly analysis of MHMS and STM card pairs
List of Figures
Figure 1-1 Organization of the thesis
Figure 2-1 Trip destinations and legs of trips
Figure 3-1 Census segments served by STM
Figure 3-2 Transit trip data collected in MHMS (Montevideo et al., 2016)
Figure 3-3 Temporal distribution of STM card transactions
Figure 3-4 Temporal distribution of transactions without card
Figure 3-5 Transactions for STM (left) and no-card (right) per time period
Figure 3-6 Transactions for STM cards
Figure 3-7 Transfers per STM card
Figure 4-1 Valid bus run assigned to invalid run
Figure 4-2 Frequency of validation of invalid bus runs
Figure 5-1 Temporal distribution of dwell times
Figure 5-2 Dwell time (minutes) vs. Passenger boardings
Figure 5-3 Stops with high dwell time
Figure 5-4 Example of itinerary
Figure 6-1 Schematic example of transactions for a smartcard user
Figure 6-2 STM cards with similar transactions on the other weekdays for transactions with unknown alighting location (left) and single riders (right)
Figure 6-3 Alighting location assignment to no-card passengers
Figure 6-4 Bus loading profile for all passengers
Figure 6-5 AM Trip origins
Figure 6-6 AM Trip destinations
Figure 6-7 AM boardings for no-card users
Figure 6-8 AM alightings for no-card users
Figure 6-9 PM Trip origins
Figure 6-10 PM Trip destinations
Figure 6-11 AM Transfers
Figure 6-12 PM Transfers
Figure 7-1 Histograms of bus trips. Queried data (left) and using all MHMS data retrieved from CAF et al. (2017) (right)
Figure 7-2 Histogram of legs and trips for MHMS and STM data
Figure 7-3 Comparison of trip frequency for MHMS and STM data
List of Abbreviations
AFC – Automated Fare Collection
AMMON – Metropolitan Area of Montevideo (Translated from Spanish: Area Metropolitana de
Montevideo)
MHMS – Montevideo Household Mobility Survey
OD – Origin and Destination
STM – Metropolitan System of Transportation (Translated from Spanish: Sistema de Transporte
Metropolitano)
Introduction
1.1 Study Objective
The objective of this study is to analyze the capabilities of smartcard data in transit systems for
planning purposes and system operation metrics. These capabilities are explored from a
methodological perspective for the specific context of Montevideo, Uruguay, using data that has
not been analyzed for planning purposes before. There are three main objectives of this study:
1. Estimate the alighting locations of boarding transactions for smartcard and non-smartcard
passengers.
2. Build itineraries and evaluate operations metrics at bus stops.
3. Analyze and compare the smartcard data with the transit riders from the 2016
Montevideo Home Mobility Survey (MHMS).
1.2 Study Motivation
Automated Fare Collection (AFC) and smartcard systems have been rapidly adopted by transit
systems all over the world. These systems benefit the transit operators and the passengers alike.
Even though these systems were built to facilitate fare collection (Trépanier, Tranchant, &
Chapleau, 2007) and for user convenience, the data stored can be used for multiple purposes.
AFC systems passively and continuously collect details of transit transactions, creating large
datasets of transit trips. The smartcard data is of particular interest as the transactions for each
card can be identified throughout days and longer periods of time. Moreover, the smartcard data
can be useful to transportation planners due to the large sample size of transit riders that use
smartcards (Hickman, 2017). This data has a variety of uses, including serving short- and
long-term planning strategies, and complementing transit system operation, development, and
evaluation strategies (Schmöcker, Kurauchi, & Shimamoto, 2017).
This new method for collecting data has been explored by researchers for several transit systems.
Working with this data for planning and operational purposes presents general challenges, such as
inconsistencies in network data (Hemily, 2015), the absence of user demographic characteristics
and trip purpose (Schmöcker et al., 2017), and errors in the AFC systems that are reflected in the
quality of the data. In addition, because transit systems differ in the data collected, the
characteristics of the network, and the sources of data available, each system poses its own
particular challenges for working with and analyzing AFC and smartcard data.
This study acknowledges the general and the particular challenges for the transit system Sistema
de Transporte Metropolitano (STM) in Montevideo. Using exclusively AFC data and the
network characteristics, this study proposes strategies to deal with the data limitations and
methods to process and analyze the AFC data for the STM. The methods proposed create
information for planning purposes and evaluation of the system and transit network.
1.3 Thesis Structure
The remainder of this thesis is organized into eight chapters, described as follows. Chapter 2
summarizes previous work with smartcard data for planning and system operation purposes, and
for integration with household travel surveys. Chapter 3 then presents a quantitative description
of the data available for this study and a thorough analysis of the data for a single day to
understand travel behaviour characteristics.
Chapter 4 describes the data preparation (cleaning and validation) procedures, which consist of
selecting Montevideo Home Mobility Survey (MHMS) transit riders and applying two queries
to the boarding records. This data is used in Chapters 5, 6, and 7. The relationships between the
data and an overall description of Chapter 4 and the subsequent chapters are shown in Figure
1-1. Chapter 5 contains the method to build itineraries, identifying stops with high dwell times
and presenting a sample of the built itineraries. Chapter 6 contains the method to identify
alighting locations for smartcard users based on their daily transactions and the strategies to
estimate the alighting location of particular cases of smartcards (single transactions or with
incomplete trips) and of no-card users. In addition, this chapter presents the results of these
procedures spatially and temporally.
This chapter is followed by Chapter 7, in which the results for smartcard users with complete
trips are compared to those of transit riders in the MHMS. This chapter also contains a method to
pair MHMS individuals with smartcards based on the location and time of their transit trips.
Lastly, Chapter 8 discusses the results and presents the conclusions and Chapter 9 mentions the
limitations of the study and future work.
Figure 1-1 Organization of the thesis
Literature Review
This literature review presents various studies that process smartcard data and serve as guides for
developing and applying methods to the smartcard data in Montevideo. The first part of the
review describes strategies to identify the destinations of public transit users based on their
smartcard transactions. The next part includes works that evaluate the transit system’s
performance and compute operational metrics, highlighting the added value of using smartcard
data. The third presents strategies that have been proposed to quantify and analyze transit
ridership regularity, and the last part describes the efforts to integrate daily smartcard
transactions with the public transit travel reported in travel surveys.
2.1 Analysis of Smartcard Data for Estimating Trip Destinations
One of the most interesting applications of smartcard data for transportation planning is the
determination of Origin-Destination (OD) matrices for public transit. For public transit systems
where passengers only validate their card while boarding (tap-on systems), researchers have
proposed methods for estimating alighting locations using the subsequent transactions of
passengers for a given day.
The earliest methods were developed by Barry, Newhouser, Rahbee, & Sayeda (2002) and
Trépanier et al. (2007). Several other researchers have adjusted these methods to improve the
alighting estimation, incorporate other sources of data such as AVL (Automated Vehicle
Location), and account for multi-modal transit systems, including Seaborn, Attanucci, & Wilson
(2009), Gordon (2012), and Munizaga & Palma (2012). There are some common
assumptions in these methods, which are outlined by Hickman (2017) as follows:
▪ The destination of the last trip leg of a passenger’s daily trips is the same as the origin
of the first trip leg of the day.
▪ Passengers generally take the most direct walking paths between services, as
measured by time, distance, or some generalized time or cost.
▪ Passengers do not take other modes of transportation between transit trips.
▪ Passengers take the next service available after arriving at a stop.
Trépanier et al. (2007) presented a formal model to estimate the alighting stops of individuals
for a bus system. The model determines the alighting stop for a passenger by identifying the stop
of the route that is closest to the boarding stop on the subsequent route the passenger takes, as
illustrated in Figure 2-1.
Figure 2-1 Trip destinations and legs of trips
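The nearest-stop rule described above can be sketched in a few lines. This is a minimal
illustration and not the implementation of Trépanier et al. (2007): the function name
`infer_alightings`, the data layout, and the sample coordinates are assumptions made for this
example. It also assumes projected coordinates in metres, straight-line distance, that only stops
downstream of the boarding stop are candidates, and that the last leg of the day returns towards
the first boarding stop (the first assumption listed by Hickman (2017)).

```python
from math import hypot

def infer_alightings(boardings, route_stops, stop_xy):
    """Infer one alighting stop per boarding for a single card and day.

    boardings:   list of (route_id, boarding_stop), in time order
    route_stops: route_id -> ordered list of stops served by that route
    stop_xy:     stop -> (x, y) in metres (hypothetical projected coordinates)
    """
    # A single daily transaction has no subsequent boarding to anchor on.
    if len(boardings) < 2:
        return [None] * len(boardings)

    result = []
    for i, (route, board_stop) in enumerate(boardings):
        # Next boarding of the day; the last leg wraps around to the first.
        _, target = boardings[(i + 1) % len(boardings)]
        tx, ty = stop_xy[target]
        stops = route_stops[route]
        # Assumption: the passenger can only alight after the boarding stop.
        candidates = stops[stops.index(board_stop) + 1:]
        if not candidates:
            result.append(None)
            continue
        result.append(min(
            candidates,
            key=lambda s: hypot(stop_xy[s][0] - tx, stop_xy[s][1] - ty),
        ))
    return result

# Hypothetical two-route network and one card with two boardings.
stop_xy = {"A": (0, 0), "B": (1000, 0), "C": (2000, 0),
           "D": (2000, 100), "E": (1000, 100), "F": (0, 100)}
route_stops = {"R1": ["A", "B", "C"], "R2": ["D", "E", "F"]}
day = [("R1", "A"), ("R2", "D")]
inferred = infer_alightings(day, route_stops, stop_xy)
```

In practice a maximum walking distance would also be applied, so that a leg is left without an
alighting stop when even the closest candidate is implausibly far from the next boarding.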
This method estimated 66% of alighting locations. It was further developed by Munizaga &
Palma (2012) to be implemented on multimodal transit systems and create OD matrices. The
major contributions are estimating the alighting location by minimizing the generalized time
(on-board and walking time) instead of the distance between the alighting stop and the next
boarding, and building OD matrices with this data. The matrices can be aggregated at any level
because the boarding and alighting data are at the disaggregate stop level.
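Because each estimated trip carries a boarding stop and an alighting stop, aggregation to any
zoning system reduces to a lookup and a count. The sketch below illustrates this; the function
name `od_matrix`, the stop identifiers, and the stop-to-zone mapping are hypothetical and are
not taken from the cited studies.

```python
from collections import Counter

def od_matrix(trips, zone_of):
    """Aggregate stop-level trips into zone-to-zone counts.

    trips:   iterable of (boarding_stop, alighting_stop)
    zone_of: stop -> zone label (any zoning system can be swapped in)
    """
    return Counter((zone_of[o], zone_of[d]) for o, d in trips)

# Hypothetical stop-level trips and zoning.
trips = [("s1", "s3"), ("s2", "s3"), ("s3", "s1"), ("s1", "s4")]
zone_of = {"s1": "Z1", "s2": "Z1", "s3": "Z2", "s4": "Z2"}
od = od_matrix(trips, zone_of)
```

Changing the level of aggregation only requires a different `zone_of` mapping; the stop-level
trips themselves are left untouched.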
This method was later validated using three data sources: the smartcard data used in the method,
an OD survey for metro users, and a group of volunteers (Munizaga, Devillaine, Navarrete, &
Silva, 2014). This validation revealed that the proposed method correctly estimates 84.2% of
alighting locations and distinguishes 90% of the legs of trips from trips.
Based on these results, Munizaga et al. (2014) propose four improvements to the methodology:
allowing a walking distance greater than 1 kilometre between the alighting location and the next
boarding, considering the start of a day at the time period with the lowest transactions (4:00:00
a.m. for this case) instead of midnight, estimating the alighting location for single-day
transactions by using the subsequent day's trips, and recognizing separate trips by comparing the
Euclidean distance between the boarding and alighting stops with the on-board distance travelled.
This last proposition is similar to the proposition by Robinson, Narayanan, Toh, & Pereira (2014)
to compute a “directness” ratio between the Euclidean distance and the on-board distance. This
ratio makes it possible to identify trips that were previously treated as legs of the same trip
but are in fact separate trips.
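A directness ratio of this kind can be written as the straight-line distance divided by the
on-board distance. The sketch below is illustrative only: the function name, the projected
coordinates in metres, and in particular the 0.5 cut-off are assumptions for this example, not
values reported by Robinson et al. (2014) or Munizaga et al. (2014).

```python
from math import hypot

def directness(origin_xy, dest_xy, onboard_m):
    """Ratio of Euclidean distance to on-board distance (both in metres).

    Values near 1 indicate a direct journey; values near 0 indicate that the
    passenger travelled far on board but ended up close to the origin.
    """
    if onboard_m <= 0:
        return 0.0
    straight = hypot(dest_xy[0] - origin_xy[0], dest_xy[1] - origin_xy[1])
    return straight / onboard_m

# Hypothetical cut-off below which linked legs are treated as separate trips.
MIN_DIRECTNESS = 0.5

direct = directness((0, 0), (3000, 4000), 6000)    # 5000 m straight, 6000 m on board
circuitous = directness((0, 0), (300, 400), 8000)  # 500 m straight, 8000 m on board
split_circuitous = circuitous < MIN_DIRECTNESS
```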
In addition to these suggestions, other researchers have incorporated into their methods distance
thresholds between potential alighting stops and the next boarding stop, and time thresholds to
identify transfers. Some of these researchers include Gordon (2012); Nassir, Khani, Lee, Noh, &
Hickman (2011); and Seaborn et al. (2009).
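A transfer time threshold can be applied as a simple rule: a boarding that follows the previous
estimated alighting within the threshold is treated as a transfer, otherwise it starts a new trip.
The following sketch assumes times in minutes after midnight and uses a 30-minute threshold
purely as an illustrative default; the function name and data layout are hypothetical.

```python
def link_legs(legs, max_transfer_min=30):
    """Group legs into trips using a transfer time threshold.

    legs: list of (boarding_minute, alighting_minute), sorted by boarding time
    Returns a list of trips, each trip being a list of legs.
    """
    trips = []
    for leg in legs:
        # Gap between this boarding and the previous leg's estimated alighting.
        if trips and leg[0] - trips[-1][-1][1] <= max_transfer_min:
            trips[-1].append(leg)   # short gap: treat as a transfer
        else:
            trips.append([leg])     # long gap: an activity took place
    return trips

# Hypothetical card: two morning legs 10 minutes apart, one evening leg.
legs = [(420, 445), (455, 480), (1020, 1050)]
linked = link_legs(legs)
```

A distance threshold can be added in the same way, by also requiring that the alighting stop and
the next boarding stop lie within an allowable walking distance.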
While many different assumptions were used, little effort was made to validate them until
Alsger, Mesbah, Ferreira, & Safi (2015). These researchers tested the transfer time threshold,
allowable walking distance, and last trip destination assumptions by applying the OD methods to
a dataset with tap-on and tap-off data. Alsger et al. (2015) found that increasing the transfer
time threshold from 15 to 90 minutes had little impact on the estimated alightings, and that
more than 90% of passengers walked less than 10 minutes to their transfer stops but spent most
of the transfer time waiting. Also, 88% of the passengers returned to a stop within 800 metres
of the first boarding location.
Further research by Alsger, Assemi, Mesbah, & Ferreira (2016) focused on the accuracy of OD
matrices estimated from smartcard data: a 30-minute allowed transfer time provided more accurate
OD matrices, and accuracy did not improve with walking distance thresholds beyond 800 metres.
However, they note that the actual destinations do not necessarily match the estimated ones due
to individual passenger behaviours and the use of other modes of transportation.
While the methods to estimate the alighting location have been validated and improved, another
obstacle is determining the alighting location for passengers who record only one transaction on
a given day. For smartcards with single daily transactions, Trépanier et al. (2007) inspect
previous trips of the card that have a similar boarding location and time and for which the
alighting location can be identified, and assign that alighting stop to the single trip.
Furthermore, He & Trépanier (2015) propose a kernel density estimation to compute a
spatial-temporal probability using historical boarding and alighting records to assign
destinations to unlinked trips.
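The historical lookup for single daily transactions can be illustrated as follows. This is a
simplified sketch and not the procedure of Trépanier et al. (2007): "similar boarding location"
is reduced here to the same stop, "similar time" to a hypothetical 60-minute window, and the
most frequent historical alighting stop is chosen. All names and sample records are assumptions.

```python
from collections import Counter

def alighting_from_history(single, history, max_minutes=60):
    """Assign an alighting stop to a single transaction from the card's past trips.

    single:  (boarding_stop, minute_of_day) of the unlinked transaction
    history: list of (boarding_stop, minute_of_day, alighting_stop or None)
    """
    stop, minute = single
    matches = [
        alight for s, m, alight in history
        if alight is not None and s == stop and abs(m - minute) <= max_minutes
    ]
    if not matches:
        return None  # no comparable past trip with a known alighting stop
    return Counter(matches).most_common(1)[0][0]

# Hypothetical history for one card.
history = [("A", 480, "X"), ("A", 490, "X"), ("A", 470, "Y"),
           ("A", 1000, "Z"), ("B", 480, "W"), ("A", 485, None)]
assigned = alighting_from_history(("A", 475), history)
```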
2.2 Analysis of Smartcard Data for Transit Operations and Performance Measures
Data about passenger boarding and alighting at the stop and route level, obtained from the
estimation procedures previously discussed, can be used for a myriad of operations and
performance measures. Some of these include recreating bus trajectories (Fourie et al., 2017),
creating load profiles of individual buses and bus routes (Trépanier et al., 2007; Beltrán et
al., 2011), analyzing on-route travel times and distances
(Trépanier & Morency, 2017), identifying spatiotemporal demand variations of bus routes, and
recognizing transfer points, volumes, and transfer times for passengers (Jang, 2010). These
measures can be aggregated at any spatial and temporal level to monitor, evaluate, and/or
propose improvements to the transit network.
Fourie et al. (2017) propose using smartcard data to reconstruct bus trajectories, and compute
travel and dwell times. The transactions at each stop are clustered to determine dwell times and
travel times between stops. For stops without transactions, Fourie et al. (2017) obtain the time
by interpolating between known stops before and after that stop. Unusual smartcard records due to
glitches in smartcard readers and late tap-ons are disregarded. Furthermore, the on-board travel
times and transfer times derived from the reconstructed bus trajectories and itineraries are
similar to those obtained in MATSim simulations.
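The interpolation step for stops without transactions can be sketched as below. Interpolating
linearly by distance along the route is an assumption made for this example; Fourie et al.
(2017) may weight the interpolation differently. The function name, the stop identifiers, and
the sample values are hypothetical, and cumulative distances are assumed to be strictly
increasing.

```python
def fill_stop_times(stops, known):
    """Estimate passing times at stops that have no transactions.

    stops: ordered list of (stop, cumulative_distance_m) along the bus run
    known: stop -> time in seconds, for stops with observed transactions
    Stops before the first or after the last known stop stay None.
    """
    times = [known.get(stop) for stop, _ in stops]
    known_idx = [i for i, t in enumerate(times) if t is not None]
    # Walk over consecutive pairs of known stops and fill the gaps between them.
    for a, b in zip(known_idx, known_idx[1:]):
        dist_a, dist_b = stops[a][1], stops[b][1]
        for i in range(a + 1, b):
            frac = (stops[i][1] - dist_a) / (dist_b - dist_a)
            times[i] = times[a] + frac * (times[b] - times[a])
    return {stop: t for (stop, _), t in zip(stops, times)}

# Hypothetical run: S2 has no transactions, S4 lies beyond the last known stop.
run = [("S1", 0), ("S2", 400), ("S3", 1000), ("S4", 1500)]
filled = fill_stop_times(run, {"S1": 0, "S3": 200})
```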
Trépanier et al. (2007) and Beltrán et al. (2011) show examples of load profiles built from the
alighting stop estimation results, which can be useful for transit operators. In addition to load
profiles, Trépanier & Morency (2017) compute Key Performance Indicators (KPI) using
smartcard data. These KPI include bus speeds, average trip time and duration,
passenger-kilometres and passenger-hours, schedule adherence, and others. Trépanier & Morency
(2017) highlight that these KPI from smartcard data provide advantageous measurements because
they come from the empirical demand and can be computed for every transit vehicle and the
different smartcard users.
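A load profile follows directly from boardings and estimated alightings per stop: the load after
each stop is the running sum of boardings minus alightings. The sketch below uses hypothetical
counts for a single bus run; the function name and data layout are assumptions for illustration.

```python
def load_profile(stops, boardings, alightings):
    """Passenger load on departure from each stop of one bus run.

    stops:      ordered list of stops served by the run
    boardings:  stop -> number of boardings
    alightings: stop -> number of (estimated) alightings
    """
    load, profile = 0, []
    for stop in stops:
        load += boardings.get(stop, 0) - alightings.get(stop, 0)
        profile.append((stop, load))
    return profile

# Hypothetical counts for a four-stop run.
profile = load_profile(
    ["S1", "S2", "S3", "S4"],
    boardings={"S1": 10, "S2": 5},
    alightings={"S2": 3, "S3": 7, "S4": 5},
)
max_load = max(load for _, load in profile)
```

A negative load at any stop would signal inconsistent alighting estimates, which makes the
profile a useful check on the estimation as well as an operational output.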
Analysis of smartcard data can also help identify passenger travel times, Level of Service (LOS),
and locations with high transfer volumes and times (Jang, 2010). This valuable information can
reduce the need for other data collection efforts and help to identify the routes, locations, and
areas that need improvements. Additionally, the travel times can be used as inputs for mode
choice models (Jang, 2010).
The integration of smartcard data with other sources of data, such as AVL (Automated Vehicle
Location), can also provide valuable operational metrics. Using smartcard data with scheduling
and AVL data, it is possible to compute commercial speeds (Beltrán et al., 2011; Trépanier,
Morency, & Agard, 2009), identify headway variation (Beltrán et al., 2011), and assess schedule
adherence (Trépanier et al., 2009).
Note that these measures can be computed with confidence for transactions using smartcards, but
cannot be applied for passengers without cards without understanding their travel behaviour first.
Smartcard users could have very different travel behaviours than no-card users depending on the
fare structure and incentives available to smartcard users. As the incentives differ among
transportation systems (Schmöcker et al., 2017), the travel behaviour for smartcard and non-card
passengers should be compared or studied independently to prevent obtaining biased results
(Park, Kim, & Lim, 2008).
2.3 Transit User Regularity
Transit users might make regular trips that can be analyzed over long periods of time. The travel
behaviour and regularity is of interest to identify regular users, propose incentives to regular
passengers, and detect frequent places and times of travel. Smartcard data presents an
unparalleled data source to understand transit regularity as it is continuously collected for all
days and all smartcard users (Hickman, 2017).
The first study to use smartcard data for user regularity was Trepanier et al. (2007); this study
measured regularity for each user using monthly transactions and identifying similar transactions
(on the same route and around a similar time). A measure of distance and time is used to
determine the regularity of users across the month of transactions. Following this study, another
one measured transit user regularity for different card users for a 10-month period (Morency,
Trépanier, & Agard, 2007).
To measure user regularity, researchers use either data mining algorithms or spatial-temporal
windows. For data mining, there are supervised and unsupervised algorithms that identify spatial
and temporal clusters for different card types (Morency et al., 2007) and for individual travel
patterns (Kieu, Bhaskar, & Chung, 2015; Ma, Liu, Wen, Wang, & Wu, 2017). Characterizing
individual travel patterns allows researchers to classify passengers into different user categories
based on the regularity of their travel (Kieu et al., 2015) and to identify their residence and
workplace based on spatial and temporal considerations (Ma et al., 2017).
The temporal windows of unsupervised algorithms are not controlled and vary depending on the
cluster. On the other hand, supervised ones have predefined temporal windows of 1 hour
(Morency et al., 2007) and 30 minutes (Ma et al., 2017). The spatial windows of these studies
have been handled with unsupervised algorithms and specified by considering neighbouring
stops or transactions that occur on the same transit route.
The research works that have discretely specified spatial windows have done so to identify trip
attractors and locations of residence, work, and study (Chu & Chapleau, 2010; Zou, Yao, Zhao,
Wei, & Ren, 2016). Chu & Chapleau (2010) defined a spatial window of 500 metres to identify
residence and study locations and Zou et al. (2016) used a window of 1,000 metres to detect home
location and trip purpose. These studies also consider other travel behaviour factors such as the
time of travel and duration of activities to identify the trip purpose.
2.4 Integration of Smartcard Data with Travel Surveys
Household travel surveys and travel diaries can be integrated with the smartcard data to identify
the travel behaviour of individuals and extract demographic characteristics and trip purpose of
smartcard users. Hickman (2017) highlights the need for integrating smartcard data with
household surveys, as only a few authors have integrated smartcard data with surveys and travel
diaries.
These two data sources are inherently different: smartcard data is passively collected and
contains details for all transactions in public transit, while household surveys contain
the trips reported by individuals from a sample of households, along with their trip purposes. One would
assume that the reported public transit trips can be identified in the smartcard dataset using
common attributes, such as boarding time, location, and service taken. Some researchers have
evaluated the information provided in surveys about public transit usage.
Spurr, Chu, Chapleau, & Piché (2015) proposed matching smartcard data with household travel
survey data using spatiotemporal windows based on the boarding times and locations of daily
transactions, as well as line numbers and subway stations. The dimensions of the windows are not
clearly defined and are variable, as they are adjusted to find a match between a survey respondent
and a smartcard. With this approach and a sample of survey responses, the daily journeys of
50% of survey respondents who declared using public transit could be paired with at least one
smartcard. The 50% of paired journeys comprise three matching scenarios: exact matches, partial
matches with underreporting of trips, and matches with typical daily travel patterns instead of the
day asked about in the survey.
These results are fairly similar to those obtained by Riegel (2013). The difference is that
Riegel (2013) obtained the smartcard IDs of volunteer survey respondents
and could pair exact survey responses to the transactions of a specific smartcard. In that study,
there were only 44% exact matches between reported daily trips and the smartcard data for the
card IDs.
Another application of integrating smartcard data with travel surveys was explored by Kusakabe
& Asakura (2017). These researchers estimate trip purposes for rail smartcard data by combining
this data with survey data using a Naïve Bayes classifier. The integration of data sets was based
on behavioural attributes, which include boarding and alighting times and locations. Even though
the datasets have different spatial and temporal accuracy, these are handled by approximation to
the closest hour. This method correctly identified over 80% of the commuting and home trips
but only over 20% of leisure trips; this is expected as leisure trips are less common and often
underreported in surveys.
Data
The data was provided and facilitated by the Smart Cities Technology group and the Intendencia
de Montevideo, the governmental agency that monitors, coordinates, and integrates the public
transportation system in the Metropolitan Area of Montevideo (AMMON), Uruguay. The
integrated transportation system STM serves Montevideo and the surrounding urban areas in
blue as shown in Figure 3-1.
Figure 3-1 Census segments served by STM
The system is composed of buses from four different operators: Coetc, Comesa, Cutcsa and
Ucot. It has 144 bus lines with 107 different destinations, and 4,835 stops (Montevideo, 2018).
There are four main components of the data:
1. Boarding records (tap-ons): Seven consecutive days of passenger boarding records,
including the five weekdays and a weekend from August 15th to August 21st, 2016, were
provided for this analysis. These records belong to smartcard (STM card) and no-card
passengers recorded by the system.1
2. Bus lines and branches: Information about bus routes including the direction and order of
stops. Each bus run, or trajectory in one direction, is labeled with a unique identification
number that can be paired with this data to obtain the run’s line and branch.
3. Stops: Number, coordinates, and description of the closest intersection from the stop.
A fifth additional source of data is the 2016 Montevideo Home Mobility Survey (MHMS). This
is a household survey that collects trips by individuals from a sample of households in the
AMMON. The trips by bus are of interest in this study and the survey results can be used to
evaluate the OD method results.
3.1 Data Description
This section provides qualitative and quantitative descriptions of the five data sources used in
this study. It begins with an overall description of the boarding transactions for the seven days
and an explanation of the differences between trips made with STM cards and without them. It then
presents a description of the bus lines and branches and the MHMS dataset. The section closes
with an in-depth analysis of the data for Monday, August 15 processed for overall understanding
of travel patterns and temporal distribution of transactions.
3.1.1 Boarding Records
The passenger boarding records correspond to smartcard and non-smartcard users during a
complete week (Monday-Sunday). The total boarding records for smartcards is 5,077,674 and for
no cards is 2,371,815, representing a 68% to 32% split.
1 The term smartcard is used interchangeably with STM when referring to boarding transactions and passengers.
Table 3-1 shows the volumes and some descriptive statistics for the boarding records. The day
start is considered here at 3 A.M. as the lowest volume of transactions occurs at this time, as will
be shown in section 3.2. For smartcards, the weekday average is 868,811 with Thursday having
the highest volume of 872,844 records and Friday having the lowest volume with
863,231. The weekend has low volumes with 454,576 records on Saturday and 279,043 on
Sunday. For records with no cards, the weekday average is 395,697 with Monday having the
highest volume of 404,217 records. The weekend has significantly lower volumes with 239,686
on Saturday and 153,644 on Sunday.
Table 3-1 Boarding records and descriptive statistics
Boarding Records              Smartcard      No-card
Weekdays
Monday                          869,898      404,217
Tuesday                         870,437      392,990
Wednesday                       867,645      388,127
Thursday                        872,844      395,517
Friday                          863,231      397,634
Weekday total                 4,344,055    1,978,485
Average                         868,811      395,697
Standard deviation                3,243        5,310
Weekend
Saturday                        454,576      239,686
Sunday                          279,043      153,644
Weekend total                   733,619      393,330
Average                         366,810      196,665
Standard deviation               87,766       43,021
Week total                    5,077,674    2,371,815
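The weekday statistics in Table 3-1 can be recomputed from the daily volumes. The short sketch below does so with the Python standard library; the variable names are illustrative, and the use of the population standard deviation (dividing by n) is an assumption that appears to agree with the tabulated values to within rounding.

```python
from statistics import mean, pstdev

# Weekday boardings (Monday to Friday) copied from Table 3-1.
smartcard = [869_898, 870_437, 867_645, 872_844, 863_231]
no_card = [404_217, 392_990, 388_127, 395_517, 397_634]

for label, values in (("Smartcard", smartcard), ("No-card", no_card)):
    total = sum(values)
    average = mean(values)
    # Assumption: population standard deviation (divides by n, not n - 1).
    spread = pstdev(values)
    print(f"{label}: total={total:,} average={average:,.0f} sd={spread:,.1f}")
```

The same calls applied to the Saturday and Sunday volumes give the weekend rows.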
There are several differences for passengers that use a smartcard and those who do not.
Smartcard users benefit from being able to transfer between buses within 1 or 2 hours, depending
on the trip type they choose, and they also pay reduced fares. Smartcard users can use their card
for people they travel with and benefit from fares and transfers between buses, as long as they
travel together. This is a unique characteristic of the Montevideo system, as most transportation
systems with smartcards permit only one card per person. On the other hand, passengers without
cards cannot make transfers and pay higher fares than the users that have smartcards.
The passengers that do not have cards pay the fare as they board the bus and the system records
the time of boarding, ticket number, boarding stop, bus run unique identification number and bus
destination, fare details, and number of passengers. The users that have smartcards tap their STM
card on readers that are mounted on the buses and the system records the number of the card,
time of boarding, boarding stop, bus run unique identification number and bus destination, fare
details, card type and fare discount if applicable, ordinal of trip, and whether the tap is
considered a transfer (ordinal of trip>1) or a new trip (ordinal of trip=1).
Furthermore, the system records the transactions considered as part of the same trip (trip with 2
or more trip legs) and assigns them a common trip ID. This information is essential to understand
the method proposed in section 6.1.
For smartcard users the fare discounts are associated with the different card types. These types
distinguish ordinary users from other user groups that benefit from reduced or subsidized trip
fares (see Appendix A).
Table 3-2 shows the boarding records for each smartcard type on Monday August 15. Note the
high percentage of boarding records made by students (Student A and Student Free).
Table 3-2 Boardings per STM card type for Monday, August 15th
STM card type Boardings Percentage
Standard 397,034 45.8%
Student A 170,134 19.6%
Student B 21,448 2.5%
Student Free 142,712 16.5%
Retired A 44,317 5.1%
Retired B 16,235 1.9%
Social Work 29,330 3.4%
Prepaid 23,608 2.7%
Others 21,651 2.5%
3.1.2 Bus lines and branches
This data can be paired to the bus runs of the 7-day period for which there are passenger records.
Each bus run has a unique identification number (UID) and this UID is attached to the passenger
transactions when they board the bus. In addition to the UID, bus runs have the code of the line
and branch they are serving. In theory, the branches for the UIDs can be paired with the
branches in this dataset; however, the dataset is missing branches. Table 3-3 shows the share of
bus branches and UIDs that are in the dataset and those that are not, considered in this study as
valid and invalid runs, respectively.
Table 3-3 Bus branches and bus UIDs
Condition Bus branches UIDs
Valid 527 90,780
Invalid 640 27,084
Total 1,167 117,864
Note that even though 55% of the branches are missing, 77% of the bus runs (UIDs) operate on
valid branches. For the valid branches, the sequence of stops and characteristics of the branch are
known. The remaining branches are missing and their characteristics could not be obtained from
the data provider. This data issue occurred due to outdated data and errors in the digitization of
the bus branches.
The branches that do not appear on the database are problematic as their sequences of stops are
unknown. However, having the boarding records of people who boarded the UIDs running these
bus routes, a method is proposed in section 4.2 to validate the invalid runs by matching them
with valid runs.
3.1.3 MHMS
The data was collected during the period of August-October 2016 in the Metropolitan Area of
Montevideo (AMMON) (CAF et al., 2017). The size of this survey represents a 0.34% sample of
the households in the AMMON with 2,230 households interviewed. For detailed information
about the survey please refer to Montevideo et al. (2016), CAF et al. (2017), and Miller, Parada
Hernandez, & Habib (2017).
This study uses the trips of the MHMS made by bus. Of the total trips, 3,166 (2,136 in
Montevideo) representing 25.2% (28%) of trips in the AMMON (Montevideo) are made by bus
and they correspond mainly to home, work, and school trips. There are 3,844 (2,599 in
Montevideo) legs of trips corresponding to these trips. The data collected for the trips and legs of
trips is summarized on Figure 3-2.
Figure 3-2 Transit trip data collected in MHMS (Montevideo et al., 2016)
It is important to note that currently the STM does not serve the entire AMMON; it serves
Montevideo and some surrounding areas coloured in blue in Figure 3-1. This map shows the
level of data aggregation used by the MHMS. These units are called census segments, which are
groups of blocks (INE, 2009). The average census segment contains 12 blocks and the segments
in urban areas, such as the ones in urban Montevideo in the south, are smaller, containing an
average of 6 blocks. The average area of the census segments served by the STM is 551,532
square metres.
Figure 3-1 shows the 1,133 census segments served by the STM, of which 1,063 are in
Montevideo. Furthermore, Table 3-4 describes the bus trips and legs of trips collected by the
MHMS which occur in the census segments served by STM. Around 76% of the bus transactions
occur in the census segments served by the STM.
Table 3-4 Trips and legs of trips in the MHMS
Locations                                  Occurrences in STM              Occurrences in AMMON
                                           Census segments  Individuals    Census segments  Individuals
Legs of trip (Boardings and alightings)    640              6,441          835              7,661
Trips (Origins and destinations)           665              5,108          871              6,296
[Figure 3-2 diagram content: each trip i records origin and destination, departure and arrival time, travel time, purpose, and weekly frequency. Each trip leg i,j of trip i records boarding and alighting location, walking distance to/from stop, wait time at stop, and bus line.]
3.2 Data Analysis for Monday, August 15
For the subsequent parts of this study, the data corresponding to Monday, August 15 is used to
show the implementation of the methods and results. The results and statistics are similar across
all weekdays, therefore presenting these metrics for one day is sufficient. Appendix B contains
some of the most important results for all days, and later sections refer to this
appendix for the results of the other days.
This section provides a thorough analysis of data for Monday August 15. This is done with the
aim of providing an in-depth description and validation of daily data, testing procedures and
assumptions, and developing methods that can be used for any other day. For this selected day,
there are 1,267,798 records with a split of 68% to 32%, corresponding to 869,898 and 404,217
STM card and no-card records respectively. Moreover, the smartcard records correspond to
302,516 STM cards with an average of 2.86 transactions per card.
Data is processed for overall understanding of travel patterns and temporal distribution of trips.
Smartcard and no-card data is processed separately to identify differences in travel patterns;
moreover, the smartcard users can be analyzed according to the card type. The temporal
distributions in Figure 3-3 and Figure 3-4 aggregated by 30-minute intervals reveal interesting
and different travel patterns for smartcard and no-card transactions.
There are three evident peak times for STM cards between 7 a.m. and 8 a.m., 1 p.m. and 2 p.m.,
and 5:30 p.m. and 6:30 p.m. Interestingly, the midday peak exceeds the morning and evening
peak volumes and the volumes after this peak are similar or higher than morning volumes until 7
p.m.
Figure 3-3 Temporal distribution of STM card transactions
[Figure: transactions (axis from 0 to 40,000) by time of the day in 30-minute intervals, 0:30 to 23:30]
On the other hand, for no-card transactions there are two evident peaks between 8 a.m. and 9
a.m., and 5:30 p.m. and 6:30 p.m. There is no noticeable midday peak; instead there are high
transaction volumes starting at 12:30 p.m. until the evening peak. The volumes at midday and
evening times are relatively higher than morning ones.
Figure 3-4 Temporal distribution of transactions without card
These distributions are compared for statistical similarity using the Kolmogorov-Smirnov test.
Using a 90% confidence level, the hypothesis that the distributions are similar can be
rejected (D_n = 0.74).
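As an illustration of the test statistic, the sketch below computes a two-sample Kolmogorov-Smirnov distance from counts in equal-width time bins, together with the usual large-sample critical value. This section does not state how the reported statistic was obtained, so the function names, the binned formulation, and the toy counts are assumptions and not the procedure used in this study.

```python
import math
from itertools import accumulate


def ks_statistic_binned(counts_a, counts_b):
    """Largest gap between two empirical CDFs built from the same time bins."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both series must use the same bins")
    total_a, total_b = sum(counts_a), sum(counts_b)
    cdf_a = [c / total_a for c in accumulate(counts_a)]
    cdf_b = [c / total_b for c in accumulate(counts_b)]
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def ks_critical_value(n, m, alpha=0.10):
    """Large-sample two-sample critical value; alpha=0.10 matches a 90% level."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))


# Toy counts for four bins (made-up numbers, not thesis data).
toy_card = [10, 40, 30, 20]
toy_no_card = [30, 30, 20, 20]
d = ks_statistic_binned(toy_card, toy_no_card)
reject = d > ks_critical_value(sum(toy_card), sum(toy_no_card))
print(f"D = {d:.2f}, reject similarity: {reject}")
```

With bin counts, the distance is evaluated only at bin edges, so it is a lower bound on the statistic that would be obtained from individual transaction times.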
In addition to the temporal travel pattern analysis, for STM card users the daily transactions and
transfers per card can be identified. Figure 3-6 shows the transactions per card. Just above half
of the cards (53.7%) have one or two transactions per day and 99.6% of the cards make nine or fewer
transactions on this day.
From the previous analysis, the transactions are aggregated into four time periods that
differentiate volumes between the peaks: AM from 4 a.m. to 11 a.m., Midday from 11 a.m. to
3:30 p.m., PM from 3:30 p.m. to 10 p.m., and Overnight from 10 p.m. to 4 a.m. The midday
period is short (4.5 hours) compared to the other three, to prevent including typical morning
home-to-work and evening work-to-home trips. Even though it is short, Figure 3-5 illustrates
that almost a third of daily transactions occur during Midday.
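The four time periods defined above can be assigned to each transaction with a small helper such as the one below. It is a sketch only: the function name is illustrative, and treating each period as including its start time and excluding its end time is an assumption, since the text does not say how boundary times are handled.

```python
from datetime import time


def time_period(t):
    """Return the time period of a boarding time (start inclusive, end exclusive)."""
    if time(4, 0) <= t < time(11, 0):
        return "AM"
    if time(11, 0) <= t < time(15, 30):
        return "Midday"
    if time(15, 30) <= t < time(22, 0):
        return "PM"
    # Everything else falls between 10 p.m. and 4 a.m.
    return "Overnight"


print(time_period(time(13, 15)))
```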
The total number of passengers boarding buses with STM cards is 884,018 and these are shown
per time period in Figure 3-5 with the highest volume occurring during the PM period, followed
by the Midday. Note the number of passengers exceeds by 17,549 the number of STM cards
boarding records. As previously discussed, this occurs as smartcard users can use their cards for
the trips of other individuals they are traveling with.
Figure 3-5 Transactions for STM (left) and no-card (right) per time period
The total passenger transactions without STM cards is 411,156 and the shares among the time
periods are shown in Figure 3-5. The highest volume occurs during the PM hours followed by
AM volumes.
Figure 3-6 displays the number of smartcards by number of transactions per day. Most
smartcards have two transactions and 85% of them have more than one transaction. For the cards with
more than one transaction, the transfers per card are shown in Figure 3-7. Of these users, 94.3%
transfer one or two times per day and 99.6% of the cards make four transfers or fewer.
Figure 3-6 Transactions for STM cards
[Figure: STM cards in thousands (axis from 0 to 120) by number of transactions per card, 1 to >=13]
Figure 3-7 Transfers per STM card
[Figure: STM cards in thousands (axis from 0 to 160) by transfers per trip, 1 to 11]
Data Preparation
The boarding records, bus lines and branches, and MHMS data need to undergo a process that
removes invalid records and prepares the data for further analysis. The sections in this chapter
explain the query processes for the boarding records, the recovery of data from bus branches, and
the process of selecting the MHMS trips that occur within the study area.
4.1 Preliminary query (Query #1)
The purpose of this query is to keep the passengers who have normal travel behaviour and
remove the null transactions. Due to the differences between no-card and smartcard records, the
cleaning process differs. For non-smartcard records, the only records that can be removed are
those that are null. There are 0.4% null, leaving 2,362,786 boarding records.
The smartcard data is queried based on the travel patterns identified and described in section 3.2.
The query criteria are included in Table 4-1 and the queried transactions for each day in Table
4-2. The transactions account for over 98% of the smartcard transactions and over 99% of no-
card transactions.
Table 4-1 Query criteria for smartcard data
Query Criteria Value
Void No
Transfers per passenger <5
Passenger number <5
Transactions per day <8
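The criteria in Table 4-1 could be applied to one day of smartcard records along the following lines. This is a hedged sketch: the field names (card_id, void, trip_ordinal, passengers) are hypothetical, the number of transfers is taken as the trip ordinal minus one, and dropping every record of a card that reaches the daily limit is an assumption about how the criterion is enforced.

```python
from collections import Counter

MAX_TRANSFERS = 5    # Table 4-1: transfers per passenger < 5
MAX_PASSENGERS = 5   # Table 4-1: passenger number < 5
MAX_DAILY_TAPS = 8   # Table 4-1: transactions per day < 8


def query_1(records):
    """Return the smartcard records of one day that satisfy Table 4-1."""
    taps_per_card = Counter(r["card_id"] for r in records)
    kept = []
    for r in records:
        transfers = r["trip_ordinal"] - 1  # assumption: ordinal 1 is a new trip
        if (not r["void"]
                and transfers < MAX_TRANSFERS
                and r["passengers"] < MAX_PASSENGERS
                and taps_per_card[r["card_id"]] < MAX_DAILY_TAPS):
            kept.append(r)
    return kept


# Toy records with hypothetical field names (not thesis data).
toy_records = [
    {"card_id": "A", "void": False, "trip_ordinal": 1, "passengers": 1},
    {"card_id": "A", "void": False, "trip_ordinal": 2, "passengers": 1},
    {"card_id": "B", "void": True, "trip_ordinal": 1, "passengers": 1},
    {"card_id": "C", "void": False, "trip_ordinal": 1, "passengers": 6},
]
kept_records = query_1(toy_records)
print(len(kept_records), "of", len(toy_records), "records kept")
```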
Table 4-2 Smartcard data - Query #1
Date (Day start 3 am) Smartcard No-card
Monday Aug 15 859,721 402,712
Tuesday Aug 16 860,653 391,580
Wednesday Aug 17 858,106 386,752
Thursday Aug 18 864,528 394,020
Friday Aug 19 852,441 396,131
Saturday Aug 20 450,810 238,652
Sunday Aug 21 277,264 152,941
4.2 Method for invalid bus runs
As indicated in section 3.1.2, the bus lines and branches dataset does not contain all of the
branches. This is one of the challenges of working with transit data as datasets with network
characteristics, routes, and stops are not updated on an ongoing basis (Hemily, 2015).
Over 50% of the branches covered by buses are missing. A method was developed to validate the
invalid runs by matching them with valid runs. The goal of this method is to determine if the
invalid bus runs can be matched with any valid run that contains all the stops in the invalid run.
This is done by identifying the stops where passengers board for each invalid run and
determining which of the valid runs contain all the stops. Figure 4-1 shows the stops associated
with an invalid bus run and the valid run that can be assigned to it. As there is one run that
contains all the stops from the invalid run, the characteristics of the valid run (line, branch, and
stop sequence) are assigned to the invalid one.
Figure 4-1 Valid bus run assigned to invalid run
The requirement of pairing an invalid run with only one valid run allows one to identify with
certainty the most likely route travelled by a bus, and discard the runs for which passengers
board in a few stops that are common to many valid runs.
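The matching rule described above can be sketched as a set-containment check. The function and variable names are illustrative, and the stop sequences in the example are made up; only the rule itself (assign a valid run when exactly one of them contains every observed boarding stop) comes from the text.

```python
def match_invalid_run(boarding_stops, valid_runs):
    """Return the id of the only valid run containing every boarding stop, else None."""
    observed = set(boarding_stops)
    if not observed:
        return None
    candidates = [run_id for run_id, stops in valid_runs.items()
                  if observed <= set(stops)]
    # Zero candidates or several candidates both leave the run unresolved.
    return candidates[0] if len(candidates) == 1 else None


# Toy stop sequences keyed by a hypothetical branch id.
toy_valid_runs = {
    "branch_1": [2521, 6153, 2522, 2523],
    "branch_2": [2521, 2522, 3000],
    "branch_3": [4000, 4001],
}
print(match_invalid_run([2521, 2523], toy_valid_runs))   # covered by one run only
print(match_invalid_run([2521, 2522], toy_valid_runs))   # common to two runs
```

Returning None when several runs qualify mirrors the decision to discard runs whose boarding stops are common to many valid runs.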
This method is applied to 640 invalid bus runs and for 243 (38%) of them a valid bus run could
be identified as shown in Table 4-3. The remaining runs could not be identified due to either few
boarding records common to many runs, or boarding records at stops that do not coincide with
any valid run.
Table 4-3 Bus run classification
Classification Bus runs
Valid 527
Invalid, validated 243
Invalid, not fixed 397
Total 1,167
An analysis of the validation procedure is shown in Figure 4-2. This reveals that most invalid
runs can be validated on only one day: on that day the boarding stops match those of a single
other run, whereas on the other days the stops of the invalid run coincide with many more runs. The
matched runs were inspected to confirm that they have more than 10 stops and belong to the same bus line as the
run they were matched with.
Without having any other data about the bus network, this was considered the most efficient
approach to recover data and be able to use transactions that occur on invalid runs. The
implications of including validated runs and passengers on these runs are explained in further
chapters, by comparing the results between valid and validated runs.
Figure 4-2 Frequency of validation of invalid bus runs
4.3 Query for OD estimation and incorporation strategies (Query 2)
After applying the method for invalid bus runs, a more rigorous query is needed to account for
the available network and bus data. This second query is applied to smartcard and no-card
transactions; the criteria are presented in Table 4-4.
Note that the only query criterion applied exclusively to smartcard records is the one that
requires multiple daily transactions; this is because the OD algorithm in section 1.1.6 needs the
subsequent daily transactions of a user to estimate the destination of the previous transactions.
The single riders (either smartcard or no-card) are later incorporated using the results from the
OD method. These procedures are explained in detail in section 1.1.6.
Table 4-4 Query criteria for OD estimation
Query Criteria                  Value             Applied to…
Transactions per day            >1                Smartcard data for OD algorithm
Transactions on bus runs        Valid/Validated   Smartcard (OD algorithm and single riders) and no-card
Boarding on recognized stops    Yes               Smartcard (OD algorithm and single riders) and no-card
Table 4-5 shows the query results for the smartcard transactions on Monday, August 15 and
Table 4-6 the results for the no-card transactions of the whole week.
Table 4-5 Smartcard data - Query #2
Condition No. of Cards No. of Transactions
Initial 303,917 869,898
Initial query results 298,993 (98.4%) 859,721 (98.8%)
Single ride per day 44,780 (14.7%) 44,780 (5.15%)
Valid bus runs 281,894 (92.8%) 661,522 (76.1%)
Validated bus runs 58,665 (19.3%) 69,001 (7.9%)
Invalid bus runs (Not validated) 104,054 (34.2%) 138,917 (16.0%)
Invalid boarding stops 8,762 (2.9%) 9,114 (1.0%)
Total of records at valid boarding stops on valid and validated runs
Single rider 37,925 (12.4%) 37,925 (4.4%)
Cards for OD algorithm 155,694 (51.2%) 441,752 (50.8%)
Table 4-5 shows that the issue with the invalid bus runs unfortunately removes a
significant number of users from the OD estimation algorithm. Interestingly, only 16% of
transactions are on invalid runs but 34% of users have at least one transaction on an invalid run
and are removed. The smartcard users that have a single transaction per day also account for a big
share of the users removed from the OD estimation, but only represent 4.4% of the valid and validated
transactions. In contrast, due to no-card passengers not being able to make transfers, 84.7% of
the transactions can be used as shown in Table 4-6.
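The card-level effect described above, where one transaction on an invalid run removes the whole card, can be sketched as follows. The field names (card_id, uid, stop_id) and the two lookup sets are hypothetical, and the rule that a card needs all of its transactions to be usable is an interpretation of the text, not a confirmed detail of the scripts used in this study.

```python
from collections import defaultdict


def cards_for_od(records, usable_uids, known_stops):
    """Keep cards with more than one tap, all on usable runs and recognized stops."""
    by_card = defaultdict(list)
    for r in records:
        by_card[r["card_id"]].append(r)
    selected = {}
    for card, taps in by_card.items():
        all_usable = all(t["uid"] in usable_uids and t["stop_id"] in known_stops
                         for t in taps)
        if len(taps) > 1 and all_usable:
            selected[card] = taps
    return selected


# Toy data: UIDs 1 and 2 run on valid or validated branches, UID 3 does not.
toy_taps = [
    {"card_id": "A", "uid": 1, "stop_id": 2521},
    {"card_id": "A", "uid": 2, "stop_id": 2535},
    {"card_id": "B", "uid": 1, "stop_id": 2521},
    {"card_id": "B", "uid": 3, "stop_id": 2522},
    {"card_id": "C", "uid": 2, "stop_id": 2523},
]
od_cards = cards_for_od(toy_taps, usable_uids={1, 2}, known_stops={2521, 2522, 2523, 2535})
print(sorted(od_cards))
```

In the toy data, card B loses both of its transactions because one of them is on an unusable run, which is why the share of removed cards can exceed the share of removed transactions.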
The results for the query of smartcard data for the other days are included in Appendix B and
show very similar percentages for the number of cards and transactions that meet each condition.
Table 4-6 No-card data - Query #2
Condition No. of Transactions
Initial 2,362,786
Valid bus runs 1,865,586 (78.9%)
Validated bus runs 157,392 (6.6%)
Invalid bus runs (Not validated) 339,808 (14.4%)
Invalid boarding stops 25,618 (1.1%)
Total of records with valid and validated runs
Records 2,001,653 (84.7%)
4.4 Data cleaning MHMS
There are 1,333 census segments served by the STM and the bus trips of the MHMS used in this
study are those that occur within these segments. The process of data cleaning for the bus trips
and legs of trips consists of applying the criteria outlined in Table 4-7.
Table 4-7 Query criteria for MHMS
Query Condition Value
Trip origins and destinations Both served by STM
Board and alight for legs of trips Both served by STM
Line number Valid (Bus lines and branches dataset) or validated
These criteria are applied to the total 3,166 trips and 3,844 legs of trips made by transit. The
results are included in Table 4-8 and include 84% of legs and 81% of trips from the survey.
Table 4-8 MHMS - Query results
Condition Occurrences
Trip location served by STM 2,266
Trip location served by STM 2,803 corresponding to 2,294 individuals
Correct bus line number for legs 2,624
Individuals for which all trips and legs have valid location and line number 1,007 individuals corresponding to 2,572 legs and 2,150 trips
Building Itineraries from Boarding Transactions
Schedules are used to determine the time buses arrive at a certain location, which in turn
can be used to estimate the alighting times for passengers. In the absence of schedule data,
the itineraries can be created using the data available: the passenger boarding records and
the characteristics of the bus routes (lines and branches). This chapter contains the method
used to combine these data to create itineraries, identifying outliers on transaction records
and stops with high dwell times, and then presents the findings of the method and a sample
itinerary.
The method consists of a sequence of Python 3.6 scripts that output the results and
the itineraries for each day in text and CSV files.
5.1 Method
Each bus run has a unique identification number (UID) that is attached to the passenger
transactions when they board the bus. Both smartcard and no-card records that pass the
preliminary query criteria (Query #1) have a UID and the boarding location and time,
regardless of run validity. The records are grouped by UID and stop number to obtain the
dwell times2 and average boarding times, considered as the arrival times in the itineraries.
The dwell times are analyzed before using the average boarding times to create the
itineraries.
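The grouping step can be sketched as below. The record layout (a tuple of UID, stop number, and boarding timestamp) and the function name are assumptions made for illustration; the flow time is taken as the time between the first and last boarding at a stop, and the arrival time as the average boarding time.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def stop_summaries(records):
    """Map (uid, stop_id) to (flow time, average boarding time)."""
    grouped = defaultdict(list)
    for uid, stop_id, tapped_at in records:
        grouped[(uid, stop_id)].append(tapped_at)
    summary = {}
    for key, times in grouped.items():
        first, last = min(times), max(times)
        # Average the offsets from the first tap to avoid averaging datetimes directly.
        mean_offset = sum((t - first for t in times), timedelta()) / len(times)
        summary[key] = (last - first, first + mean_offset)
    return summary


# Toy boardings for one hypothetical run (UID 1) at two stops.
toy_boardings = [
    (1, 2521, datetime(2016, 8, 15, 16, 13, 20)),
    (1, 2521, datetime(2016, 8, 15, 16, 13, 26)),
    (1, 2521, datetime(2016, 8, 15, 16, 13, 32)),
    (1, 2522, datetime(2016, 8, 15, 16, 15, 17)),
]
toy_summary = stop_summaries(toy_boardings)
print(toy_summary[(1, 2521)])
```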
It is necessary to compare the passenger flow time with an acceptable service time to
identify bus stops with erratic dwelling times and prevent them from creating inaccurate
itineraries. Robinson et al. (2014) highlight that late tap-ons can cause severe impacts on
smartcard analysis. Usually the smartcard systems only allow tap-ons when the transit unit
is close to a stop, and malfunctions of the system are one of the main causes of erroneous
smartcard data. For the Montevideo system, the passengers get a receipt with the time,
location, and trip type they purchase. It is believed that the driver verifies and may correct
the location but there could be errors in the transaction times.
2 The term dwell time is used to refer to the passenger boarding flow time, disregarding the doors opening
and closing times. The flow time is computed as the time between the first and last passenger of all the
passengers boarding at a stop (tapping the STM card or paying the fare with cash).
To identify these errors, outliers are identified for the clustered transactions at each stop
using the Interquartile Range (IQR), which measures statistical dispersion between the first
and third quartiles. The boarding transactions out of range3 are outliers and disregarded for
computing the bus arrival time at a stop.
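A minimal sketch of the IQR rule for the boarding times at one stop is given below, with times expressed as seconds since midnight. The quartile convention (the "inclusive" method of statistics.quantiles, available from Python 3.8), the minimum group size, and the function name are assumptions; other quartile definitions give slightly different fences. The fences are treated as inclusive so that a group of identical boarding times is kept.

```python
from statistics import quantiles


def split_outliers(seconds):
    """Split boarding times at one stop into (kept, outliers) using 1.5 * IQR fences."""
    if len(seconds) < 4:
        # Assumption: too few boardings to estimate quartiles, so keep everything.
        return list(seconds), []
    q1, _, q3 = quantiles(seconds, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [s for s in seconds if low <= s <= high]
    outliers = [s for s in seconds if not low <= s <= high]
    return kept, outliers


# Toy boarding times (seconds); the last tap arrives long after the others.
toy_times = [100, 102, 104, 106, 108, 400]
toy_kept, toy_outliers = split_outliers(toy_times)
print(toy_kept, toy_outliers)
```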
The next step to identify unusually high dwelling times is to compute the passenger service
time for stops that are geocoded and are part of valid or validated bus runs. The service
time is the time per passenger boarding and is used to obtain typical dwell times (Transit
Capacity and Quality of Service Manual, 2003). The stops with high dwelling times are
those with service times that exceed a critical service time, as is explored in section 5.2, and
are disregarded for computing the bus arrival times. This dwell time analysis excludes
terminals and the first and last stop of each route, as the time spent there is terminal time
and is established by the operator.
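The service time check can be sketched as follows. The function name and record layout are illustrative, the critical service time is left as a parameter because its value is not given in this section, and the 10 seconds used in the example is only a placeholder. Intermediate terminals are not handled in this sketch.

```python
def stops_with_high_dwell(stop_rows, last_ordinal, critical_service_time):
    """Return the stop ordinals whose service time exceeds the critical value.

    stop_rows holds (stop_ordinal, flow_seconds, boardings) for one bus run.
    """
    flagged = []
    for ordinal, flow_seconds, boardings in stop_rows:
        if ordinal in (1, last_ordinal) or boardings == 0:
            continue  # first and last stops are excluded; no boardings, no service time
        service_time = flow_seconds / boardings  # seconds per boarding passenger
        if service_time > critical_service_time:
            flagged.append(ordinal)
    return flagged


# Toy run with five stops; 10.0 seconds is a placeholder threshold.
toy_rows = [(1, 300, 2), (2, 20, 5), (3, 90, 3), (4, 10, 0), (5, 200, 1)]
toy_flagged = stops_with_high_dwell(toy_rows, last_ordinal=5, critical_service_time=10.0)
print(toy_flagged)
```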
Note that only boarding transactions are recorded, therefore the time of arrival at a stop is
used as the alighting time. Moreover, as passengers do not board at every stop on a route,
the itineraries created from passenger records need to incorporate all the stops on a bus
route. The stop arrival times for the UIDs are joined with the sequence of stops on
the bus route, ordered by the stop ordinal. Table 5-1 shows an example of the built
itinerary for a bus route, in which the stops without arrival times (shown with an arrival time of 0) are highlighted in blue.
3 Range for identifying outliers: $(Q_1 - 1.5 \times \mathrm{IQR},\ Q_3 + 1.5 \times \mathrm{IQR})$, where $Q_1$ and $Q_3$ are the first and third quartiles.
Table 5-1 Sample itinerary built from passenger transactions
UID Branch Stop ID Arrival time Interpolated time Stop Ordinal
1.653E+10 1763 2521 4:13:26 PM 4:13:26 PM 1
1.653E+10 1763 6153 0 4:14:22 PM 2
1.653E+10 1763 2522 4:15:17 PM 4:15:17 PM 3
1.653E+10 1763 2523 4:15:34 PM 4:15:34 PM 4
1.653E+10 1763 2524 4:16:40 PM 4:16:40 PM 5
1.653E+10 1763 2022 4:17:25 PM 4:17:25 PM 6
1.653E+10 1763 2023 4:18:29 PM 4:18:29 PM 7
1.653E+10 1763 2525 4:19:39 PM 4:19:39 PM 8
1.653E+10 1763 2526 0 4:20:32 PM 9
1.653E+10 1763 2527 4:21:24 PM 4:21:24 PM 10
1.653E+10 1763 2528 0 4:21:55 PM 11
1.653E+10 1763 2529 0 4:22:25 PM 12
1.653E+10 1763 2530 0 4:22:56 PM 13
1.653E+10 1763 2531 0 4:23:26 PM 14
1.653E+10 1763 2532 0 4:23:57 PM 15
1.653E+10 1763 2533 0 4:24:28 PM 16
1.653E+10 1763 2534 0 4:24:58 PM 17
1.653E+10 1763 2535 4:25:29 PM 4:25:29 PM 18
The arrival times for these stops are calculated using simple interpolation between the
previous and subsequent stops that have arrival times, as shown in the column “Interpolated
time” in Table 5-1. This is a common approach used in the literature to reconstruct
bus trajectories and obtain arrival times (Fourie et al., 2017).
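The interpolation can be sketched in Python as follows. The sketch assumes equal time steps per stop ordinal between two known arrival times, which is consistent with the values in Table 5-1, and it represents times as seconds since midnight; the function names are illustrative and not taken from the thesis scripts.

```python
def hms(hours, minutes, seconds):
    """Convert a clock time to seconds since midnight."""
    return hours * 3600 + minutes * 60 + seconds


def interpolate_arrivals(arrivals):
    """Fill None entries that lie between two known arrival times.

    arrivals: seconds since midnight (or None), ordered by stop ordinal.
    Gaps before the first or after the last known time are left as None.
    """
    filled = list(arrivals)
    known = [i for i, t in enumerate(arrivals) if t is not None]
    for left, right in zip(known, known[1:]):
        step = (arrivals[right] - arrivals[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = round(arrivals[left] + step * (i - left))
    return filled


# Stop ordinals 10 to 18 of Table 5-1: only the first and last times are known.
run = [hms(16, 21, 24)] + [None] * 7 + [hms(16, 25, 29)]
itinerary = interpolate_arrivals(run)
```

Applied to ordinals 10 to 18 of Table 5-1, the sketch yields 4:21:55 PM for ordinal 11 and 4:24:58 PM for ordinal 17, matching the interpolated column of the table.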
Interpolation is only applied between the first and last stops for which an arrival time is
available from the boarding records. This interpolation technique is adapted from the
technique used by Fourie et al. (2017) to reconstruct bus trajectories using smartcard and
GPS data. Moreover, Table 5-2 outlines the strategies to incorporate stops with specific
characteristics.
Table 5-2 Itinerary building strategies for special cases
Condition | Strategy
Stops with unknown stop ordinal | Cannot be used to interpolate the schedule. They are incorporated into the finalized schedule based on the arrival time.
Intermediate stops of a bus route with high dwell time | Disregard the arrival time from boarding records; estimate the arrival time using interpolation.
Stops at terminals or first/last stop of bus route: intermediate terminal | Consider the first and last transaction times as different arrival and departure times.
Stops at terminals or first/last stop of bus route: first stop | The last transaction time is considered the departure time (last passenger that boards).
Stops at terminals or first/last stop of bus route: last stop | If known⁴, the first transaction time is considered the arrival time (first passenger that boards).
Computing itinerary for stops after last boarding record
Passengers do not usually board at the last stops of a route; however, they are likely to
alight there. Therefore, it is key to be able to forecast the bus arrival times for the stops after the
last stop for which passenger records are available.
The arrival times are interpolated with confidence for intermediate stops for
which the previous and subsequent stop arrival times are known. After the last stop with boarding
records, the arrival times are forecast using the time step of the immediately preceding
interpolation for that bus route. This is done to reflect the vehicle speed and
driving conditions at the previous stops.
4 This occurs when passengers board a bus before the driver signals the start of a new bus run, departing from
the last stop of the previous bus run.
This method relies on three main assumptions:
▪ The time and speed between stops after the last stop with boarding records are
assumed to be the same as the time and speed between previous stops.
▪ Traffic conditions do not change.
▪ The distance between stops is relatively similar.
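Under these assumptions, the forecast amounts to repeating the last available time step. The sketch below is one possible reading: it takes the step as the average time per stop between the last two stops with a known time, which equals the preceding interpolation step when the list has already been interpolated. The function name is hypothetical.

```python
def forecast_tail(arrivals):
    """Extend arrival times past the last known stop by repeating the last time step.

    arrivals: seconds since midnight (or None), ordered by stop ordinal.
    Returns a new list; entries before the last known time are left untouched.
    """
    filled = list(arrivals)
    known = [i for i, t in enumerate(arrivals) if t is not None]
    if len(known) < 2:
        return filled  # no previous step to reuse
    prev, last = known[-2], known[-1]
    step = (arrivals[last] - arrivals[prev]) / (last - prev)
    for i in range(last + 1, len(arrivals)):
        filled[i] = round(arrivals[last] + step * (i - last))
    return filled
```

For instance, a run with arrival times of 100, 130, and 160 seconds followed by two stops without records would be extended to 190 and 220 seconds.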
5.2 Results
The queried boarding records for smartcard and no-card passengers from the preliminary
query (Query #1) are used to build the itineraries. The boardings are grouped for each bus
run at each stop to compute arrival times; but before doing so, these occurrences⁵ are
analyzed to identify outliers and compute dwell and service times.
The outliers are identified using the IQR; they represent 1% of the transactions for
each day and are removed. The service times are assessed against the Transit Capacity and
Quality of Service Manual (2003); the recommended passenger service time is “3.5
seconds per passenger with smartcard and 4.0 seconds per passenger that pays with change”
(p. 4-5). Using 3.75 seconds per passenger, which assumes that half of the passengers are smartcard
users, 70% of the occurrences with more than one passenger exceed the allowed time.
This can be partly due to the unusual AFC system in Montevideo, which provides users with a
small receipt after boarding and likely increases the service time.
The recommended passenger service time does not seem appropriate for this transit system
and there is not an identifiable indicator of occurrences with high dwell times, as they occur
on all bus branches, at stops all over the network, and throughout the day as shown in
Figure 5-1.
5 The term “occurrence” refers to each group of boardings at a stop for a bus run.
Figure 5-1 Temporal distribution of dwell times
Plotting dwell times against passenger boardings gives a better indication of the passenger
service time for this system. The graph in Figure 5-2 indicates that increasing passenger
volumes lead to longer dwell times, as expected, and the slope quantifies this
relationship as the service time per passenger. The intercept of the trendline is set to zero because
the passenger boarding volume is the only explanatory variable and the dwell time for zero
boardings should be zero.
Figure 5-2 Dwell time (minutes) vs. Passenger boardings
The R² value of the trendline indicates a weak fit, but in the absence of other
alternatives to determine an acceptable service time, the slope is used. The value is 0.1853
minutes (11.1 seconds) per passenger, which is approximated to 10 seconds per passenger.
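A trendline forced through the origin has a closed-form least-squares slope, the sum of x·y divided by the sum of x². The sketch below shows that calculation; it assumes an ordinary least-squares fit, which the thesis does not state explicitly, and the sample data are invented for illustration.

```python
def slope_through_origin(boardings, dwell_minutes):
    """Least-squares slope of dwell time on boardings with the intercept fixed at zero."""
    sxy = sum(x * y for x, y in zip(boardings, dwell_minutes))
    sxx = sum(x * x for x in boardings)
    if sxx == 0:
        raise ValueError("at least one stop visit with boardings is required")
    return sxy / sxx


# Hypothetical stop visits: (boardings, dwell time in minutes).
slope = slope_through_origin([1, 2, 4], [0.2, 0.4, 0.8])

# Converting the slope reported in the text from minutes to seconds.
reported_seconds = 0.1853 * 60  # about 11.1 seconds per passenger
```

The slope is in minutes per passenger, so multiplying by 60 gives the service time in seconds that is compared with the 10-second threshold.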
Table 5-3 shows the number of passengers and occurrences in different service time ranges,
followed by the percentages for valid and validated runs out of the 14,647 and 1,523 runs,
respectively. There are minor differences between valid and validated runs in the shares of
passengers for each range, and the shares of occurrences are rather similar.
For all bus runs, 89.8% of passengers and 93.9% of occurrences are within the 10-second
threshold and are used to build the itineraries. The arrival time for these occurrences is the
average boarding time from the transactions.
Table 5-3 Passenger service time ranges
Passenger service time (seconds) Passengers Occurrences
0 152,300 (14.31%) 152,297 (40.33%)
0 to 5 468,845 (44.06%) 131,534 (34.83%)
5 to 10 333,969 (31.38%) 70,728 (18.73%)
10 to 15 52,067 (4.89%) 12,261 (3.25%)
15 to 20 19,073 (1.79%) 4,383 (1.16%)
20 to 25 11,619 (1.09%) 2,229 (0.59%)
More than 25 26,307 (2.47%) 4,220 (1.12%)
Passenger service time (seconds) Passengers (valid runs) Passengers (validated runs) Occurrences (valid runs) Occurrences (validated runs)
0 14.35% 13.89% 40.33% 40.32%
0 to 5 43.82% 46.46% 34.72% 35.96%
5 to 10 31.61% 28.35% 18.83% 17.68%
10 to 15 4.88% 5.05% 3.26% 3.12%
15 to 20 1.80% 1.72% 1.16% 1.19%
20 to 25 1.11% 0.95% 0.59% 0.62%
More than 25 2.36% 3.58% 1.12% 1.10%
Of the 23,093 occurrences with passenger service time over 10 seconds, Table 5-4 shows
that 85.49% (5.21% of all occurrences) occur at stops that are neither terminals nor the first
or last stops of bus routes. These stops and the stops with unknown ordinal are considered
stops with unusually high dwell time, and their arrival times are disregarded. Refer to
Table 5-2 for details about the estimation of arrival times for the occurrences that fall under
each category.
Table 5-4 Characterization of stops with service time over 10 seconds
Stop category Occurrences Passengers
Stop with unknown ordinal 618 (2.68%) 3,327 (3.05%)
Stop neither first or last of a route nor terminal 19,743 (85.49%) 84,132 (77.14%)
Stop at intermediate terminals 273 (1.18%) 21,607 (19.81%, combined with the row below)
First or last stop of bus route including terminal 2,459 (10.65%) (combined with the row above)
The stops with unusually high dwell time are analyzed spatially to identify the locations
where they occur. These correspond to 2,260 stops (48.0% of all stops), shown in Figure
5-3, and they occur especially along major corridors and in downtown (inset map). There
are also a few stops with high dwell times on the outskirts of and outside Montevideo.
Figure 5-3 Stops with high dwell time
Having identified the stops with high dwell times, itineraries are built for the bus runs. The
itineraries are built for over 97% of the daily bus runs. The remaining 3% cannot be built
because all passengers of the bus run boarded at a single stop or because all stops of the bus run
had high dwell times.
Figure 5-4 shows an example of the itinerary for five buses serving bus line 19, branch
number 205. The times highlighted in blue correspond to stops with no passenger boardings
and those in grey, to stops with high dwell times. The arrival times for these cells were
interpolated using the arrival times from the previous and subsequent stops that are not
highlighted. Note that the arrival times at the last stops of the runs, highlighted in yellow, are forecast using the
time step of the interpolation immediately before.
The forecasting of arrival times reduces the share of unknown arrival times at stops after the last stop
with boarding records from 24.1% to 6.0% for
weekdays and from 26.3% to 6.0% for the weekend.
Figure 5-4 Example of itinerary
Bus line 19, Branch 205, Monday, August 15
Run Run 1 Run 2 Run 3 Run 4 Run 5
Start time 5:23:15 5:56:48 6:19:47 6:26:06 6:42:01
Stop ordinal Stop Arrival time (Run 1) Arrival time (Run 2) Arrival time (Run 3) Arrival time (Run 4) Arrival time (Run 5)
1 3079 - - - - -
2 3017 05:31:44 06:10:43 06:21:35 06:32:48 06:46:28
3 3019 05:33:21 06:11:36 06:22:09 06:33:33 06:47:17
4 3020 05:33:51 06:12:29 06:22:44 06:34:10 06:48:07
5 3021 05:34:22 06:13:15 06:23:19 06:34:53 06:48:43
6 3022 05:35:04 06:14:00 06:23:54 06:35:32 06:49:19
7 3023 05:35:47 06:15:02 06:24:31 06:36:21 06:50:11
8 3024 05:36:52 06:15:44 06:25:10 06:37:45 06:50:52
9 3025 05:37:58 06:16:27 06:25:50 06:38:11 06:51:32
10 3026 05:39:06 06:17:09 06:26:55 06:39:03 06:52:13
… … … … … …
60 3733 06:16:16 06:59:23 07:08:07 07:22:21 07:37:15
61 3734 06:17:00 07:00:10 07:08:52 07:23:21 07:38:06
62 3735 06:17:43 07:00:58 07:09:38 07:24:20 07:39:07
63 563 06:18:27 07:01:46 07:11:06 07:26:00 07:40:59
64 564 06:19:12 07:02:35 07:11:37 07:26:42 07:41:23
65 565 06:19:58 07:03:30 07:12:13 07:27:25 07:42:02
66 4578 06:20:44 07:04:28 07:12:50 07:28:08 07:42:41
67 566 06:21:30 07:05:27 07:13:27 07:28:51 07:43:20
68 3924 06:22:16 07:06:25 07:14:04 07:29:33 07:44:00
69 570 06:23:02 07:07:24 07:14:40 07:30:16 07:45:00
70 3925 06:23:48 07:08:22 07:15:17 07:30:59 07:46:00
71 4580 06:24:34 07:09:21 07:15:54 07:31:42 07:47:00
72 4615 06:25:20 07:10:19 07:16:35 07:32:29 07:48:06
73 4909 06:26:05 07:11:18 07:17:15 07:33:16 07:49:12
74 4040 06:26:51 07:12:16 07:17:56 07:34:03 07:50:19
75 4041 06:27:37 07:13:15 07:18:36 07:34:50 07:51:25
76 4763 06:28:23 07:14:13 07:19:17 07:35:37 07:52:31
77 4764 06:29:09 07:15:12 07:19:57 07:36:24 07:53:37
78 4765 06:29:55 07:16:10 07:20:38 07:37:11 07:54:43
79 5086 06:30:41 07:17:09 07:21:19 07:37:59 07:55:50
80 4766 06:31:27 07:18:07 07:21:59 07:38:46 07:56:56
81 4767 06:32:13 07:19:06 07:22:40 07:39:33 07:58:02
Origin and Destination Estimation
The STM has a tap-on scheme that validates and records the passenger transactions when
boarding. While the data collected contains the times and locations of the boardings, the
alighting locations and times are unknown. This chapter presents a method to identify the
alighting locations of passengers, understand their individual travel behaviour, and observe
the O-D flows of all transit users. The method and results in this chapter distinguish
between smartcard and no-card users, as the data and approaches for each are different.
The methods are implemented as a set of Python 3.6 scripts that output the results as text
and CSV files.
6.1 Method
The method has three goals: (1) estimate the alighting locations and times of transit
transactions (from STM and no-card users); (2) identify the origin and destination of trips for
STM users; and (3) compute travel behaviour metrics for STM users, such as travel times and transfer walking
distance, location, and time. These are similar to the goals proposed by
Trepanier et al. (2007) and M. A. Munizaga & Palma (2012), except for the incorporation
of no-card users. Therefore, the method for smartcard users is similar to the methods
proposed by these researchers and includes the improvements proposed by A. Alsger et al.
(2016).
The no-card transactions constitute a significant share of records and this study explores
integrating them into the OD estimation. The following paragraphs describe in detail the
method for smartcard users and are followed by two subsections: the first explains the
incorporation of smartcard transactions to which the method cannot be applied, and the
second explains the incorporation of no-card transactions.
First, some terms are defined to help understand the goals of this method:
▪ A trip is defined as the travel from an origin (e.g. home) to a destination for a
specific purpose (e.g. work).
▪ Trips can have one or multiple legs, identified by the transfers between bus services,
and can have a walking and waiting time portion on the transfers.
▪ The daily trips made by a smartcard user that start and end around the same location
constitute a tour.
The STM card transactions can be either trips or legs of trips. These are differentiated by
the trip ordinal and the trip ID fields assigned by the system. Transactions that are trips
have unique trip IDs that are not shared with any other transaction, while transactions
that are legs of trips share a trip ID with the other legs (transactions) of the same trip and
are numbered chronologically by trip ordinal, starting at 1 for the first leg.
Figure 6-1 shows a schematic example of trips, legs of trips, and a tour for a smartcard,
where the variables and indices refer to:
$n$ = trip number (the first trip is $n = 1$)
$l$ = leg of trip number
$O_n$ = origin of trip $n$
$D_n$ = destination of trip $n$
$a_{n,l}$ = alighting location for trip $n$, leg of trip $l$
$b_{n,l}$ = boarding location for trip $n$, leg of trip $l$
$d$ = walking distance between stops
$\rightarrow$ = direction of travel and sequence of stops for a bus run
Figure 6-1 Schematic example of transactions for a smartcard user
From Figure 6-1 one can infer the data needed to estimate the alighting location for
transactions: the boarding location of the transactions; whether they are trips or legs of
trips; the direction and stop sequence of the routes that correspond to the transactions; and the
road and sidewalk network and the geographic location of the stops, to obtain the walking
distance between alighting and boarding stops. Additionally, the time of alighting can be
retrieved from the bus route itineraries. For technical details of the algorithm refer to
Appendix C.
The method is an algorithm that integrates and organizes these data sources for the
transactions of each STM card. For each transaction of a card, the algorithm determines which of
the subsequent stops of the bus route is closest to the next transaction’s boarding stop. The
closest stop is taken as the estimated alighting stop. For the last transaction of the day, the
algorithm uses the first boarding stop of the day to estimate the alighting stop.
Once the alighting stop is estimated, the algorithm retrieves the time of
arrival of the bus at this stop from the itinerary.
The algorithm estimates the alighting location based on the following considerations:
▪ The alighting location must be different from the boarding stop and must be a stop
subsequent to the boarding stop.
▪ The maximum walking distance allowed is 1,000 metres. This is calculated using
the network characteristics and the ArcGIS Network Analyst tool.
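The core of the estimation can be sketched as a search over the stops downstream of the boarding stop. This is a simplified illustration, not the algorithm of Appendix C: it assumes projected stop coordinates in metres and uses straight-line distance as a stand-in for the network walking distance computed with ArcGIS Network Analyst. The function and parameter names are hypothetical.

```python
from math import hypot


def estimate_alighting(route_stops, boarding_index, target_xy, max_walk_m=1000.0):
    """Pick the downstream stop closest to the next boarding location.

    route_stops: list of (stop_id, x, y) in route order, coordinates in metres.
    boarding_index: position of the boarding stop in route_stops.
    target_xy: (x, y) of the card's next boarding stop, or of the first boarding
               stop of the day when this is the last transaction of the day.
    Returns (stop_id, distance_m), or None when no candidate is within range.
    """
    best = None
    # Only stops after the boarding stop are candidates.
    for stop_id, x, y in route_stops[boarding_index + 1:]:
        dist = hypot(x - target_xy[0], y - target_xy[1])  # straight-line stand-in
        if dist <= max_walk_m and (best is None or dist < best[1]):
            best = (stop_id, dist)
    return best


# Hypothetical route with stops every 500 m along a straight corridor.
route = [("A", 0, 0), ("B", 500, 0), ("C", 1000, 0), ("D", 1500, 0)]
estimate = estimate_alighting(route, 0, (1100, 0))
```

In this hypothetical example a passenger boarding at stop A whose next boarding is 100 m from stop C is assigned C as the alighting stop; a next boarding more than 1,000 m from every downstream stop returns no estimate.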
After all the transactions of an STM card are processed, the algorithm identifies the origins,
destinations, and transfer locations for the trips as well as travel and transfer times. To
identify the origins, destinations, and transfer locations, the algorithm takes into
consideration the trip ordinal and the trip ID fields of each transaction, but does not rely
solely on these, as passengers can pay one fare and make more than one trip. Many transit
systems allow passengers to pay one fare and use the transit system within a period of time
(Hickman, 2017), and in Montevideo, passengers can choose between a 1-hour or a 2-hour
fare to use the system.
The system does not distinguish between a trip and two legs of a trip if they occur within the
validity period of the chosen fare. However, with this algorithm it is possible to capture some of the trips
that are made by a passenger on one fare. The two considerations used to capture trips are:
▪ The passenger transaction is on the same line as the previous transaction. Taking the
same line in the same direction indicates that the passenger had two destinations on
the same path; taking the same line in the opposite direction indicates that the
passenger went to a destination and returned.
▪ The transfer time is longer than 30 minutes.
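The two considerations can be expressed as a simple rule for splitting a card's legs into trips. The sketch assumes that either condition alone starts a new trip and that the legs are already in chronological order with estimated alighting times; it ignores the trip ID and trip ordinal fields that the thesis algorithm also uses. The field names are hypothetical.

```python
def split_into_trips(legs, max_transfer_min=30.0):
    """Group one card's chronologically ordered legs into trips.

    legs: list of dicts with 'line', 'board_min' and 'alight_min'
          (minutes since midnight).
    A new trip starts when a leg uses the same line as the previous leg or when
    the time between alighting and the next boarding exceeds max_transfer_min.
    """
    trips = []
    for leg in legs:
        if trips:
            prev = trips[-1][-1]
            same_line = leg["line"] == prev["line"]
            long_wait = leg["board_min"] - prev["alight_min"] > max_transfer_min
            if not (same_line or long_wait):
                trips[-1].append(leg)  # continues the current trip as a transfer
                continue
        trips.append([leg])
    return trips


# Hypothetical day: a two-leg trip, a return on the same line, and a late trip.
day = [
    {"line": "19", "board_min": 480, "alight_min": 500},
    {"line": "64", "board_min": 510, "alight_min": 530},
    {"line": "64", "board_min": 1020, "alight_min": 1040},
    {"line": "19", "board_min": 1100, "alight_min": 1120},
]
trips = split_into_trips(day)
```

In this hypothetical day the first two legs form one trip with a 10-minute transfer, while the third and fourth legs each start a new trip.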
Having identified trips and legs of trips, and therefore the locations of boardings, alightings,
transfers, origins, and destinations, it is possible to compute travel behaviour
metrics. The on-board time can be computed for each transaction, the travel time for every trip,
and the transfer time for every trip with more than one leg.
6.1.1 Incorporation of smartcard transactions without alighting location identified and single riders
The OD estimation for smartcard users requires users to have more than one transaction per
day. For some users, the alighting locations for their transactions could be estimated for one
day but not for others. Other smartcard holders make single transactions on certain days
and several transactions on other days, to which the OD algorithm can be applied. These
two types of users are the focus of this section.
The results from the OD method for the entire week can be used to assign alighting
locations to the transactions for which the alighting could not be estimated or which were
single daily transactions. This is done by observing the transactions for each smartcard and
identifying similar transactions on other days for which alightings could be estimated. To
assess whether transactions are similar, the criteria in Table 6-1 for spatial-temporal windows are
proposed.
Table 6-1 Criteria for OD incorporation of Smartcard users
Condition Value
Temporal Window 1 hour
Spatial Window 1 kilometre
Bus lines Same line
There is no consensus in the literature on the temporal and spatial windows for
smartcard regularity, as mentioned in section 2.3. The proposed spatial and temporal
windows are similar to the wider ones used in the literature. The spatial window accounts
for different boarding locations within walkable distance and the temporal window relaxes
the timing requirements, allowing different start times of trips. The condition that the
passenger boards the same line excludes trips on other lines that are similar or
that travel between similar areas in the city, but ensures that the trip direction is the same.
It is important to note that the alighting locations assigned using this method might vary
between days. To account for this, the most likely alighting location is selected as the one with
the highest frequency. Weekdays and the weekend are considered separately due to the
differences in passenger transactions and expected travel behaviour.
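The matching criteria of Table 6-1 and the frequency rule can be sketched as follows. The sketch assumes projected coordinates in metres with straight-line distance for the spatial window, boarding times in minutes since midnight, and a history that already contains only transactions of the same card and the same day type (weekday or weekend); the field names are hypothetical.

```python
from collections import Counter
from math import hypot


def infer_alighting(transaction, history, max_minutes=60.0, max_metres=1000.0):
    """Return the most frequent alighting stop among similar transactions.

    transaction, history items: dicts with 'line', 'board_min', 'x', 'y';
    history items also carry 'alight_stop'. Returns None if nothing is similar.
    """
    votes = Counter(
        h["alight_stop"]
        for h in history
        if h["line"] == transaction["line"]
        and abs(h["board_min"] - transaction["board_min"]) <= max_minutes
        and hypot(h["x"] - transaction["x"], h["y"] - transaction["y"]) <= max_metres
    )
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical card: a single Monday boarding and its history on other weekdays.
monday = {"line": "19", "board_min": 485, "x": 0, "y": 0}
other_days = [
    {"line": "19", "board_min": 480, "x": 50, "y": 0, "alight_stop": "S10"},
    {"line": "19", "board_min": 500, "x": 0, "y": 80, "alight_stop": "S10"},
    {"line": "19", "board_min": 470, "x": 20, "y": 0, "alight_stop": "S12"},
    {"line": "64", "board_min": 485, "x": 0, "y": 0, "alight_stop": "S99"},
]
assigned = infer_alighting(monday, other_days)
```

In this hypothetical example three transactions on line 19 fall within both windows, and the stop observed twice is assigned. When two stops are equally frequent, this sketch returns the one encountered first, which is an arbitrary tie-break not specified in the text.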
6.1.2 Incorporation of no-card users
The no-card users have not been considered in OD strategies as their transactions cannot
be linked throughout the day or across days. The transactions made without a card are
individually stored on the system with distinct ticket IDs. As mentioned in section 3.1.1,
these users have different characteristics than smartcard users: they pay higher fares and are unable
to make transfers.
These passengers have not been integrated in ODs in the literature as their behaviour could
be different from that of smartcard users (Schmöcker et al., 2017). However, these
transactions constitute a significant share of records and this study explores the integration
of these users into the OD methods.
The no-card transactions are integrated by using the travel patterns of smartcard users. The
travel patterns of smartcard users are analyzed by branch and time of the day (AM, Midday,
PM, and Overnight) for the weekdays and weekend separately. For each bus branch, the
stops are classified and assigned weights corresponding to the volumes of alightings. The
no-card passengers are assigned an alighting stop based on the weights for each stop.
For instance, for a bus branch in the AM period 21% of smartcard passengers alight at a
given stop in downtown. Thus 21% of the no-card passengers are assigned that stop as their
alighting location. The alighting volumes are rounded to the nearest whole number and then
balanced.
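The proportional assignment can be sketched as below. The text does not specify how the rounded volumes are balanced, so the balancing rule here (adjusting the stops whose rounding moved furthest in the direction that needs correcting) is an assumption, as are the function name and the example shares.

```python
def allocate_alightings(shares, n_passengers):
    """Distribute n_passengers over stops in proportion to the given shares.

    shares: dict stop_id -> smartcard alighting share (need not sum to 1).
    Counts are rounded to whole passengers and then balanced so that they
    add up to n_passengers.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must contain a positive total")
    exact = {s: n_passengers * w / total for s, w in shares.items()}
    counts = {s: round(v) for s, v in exact.items()}
    diff = n_passengers - sum(counts.values())
    # Assumed balancing rule: correct the stops with the largest rounding error
    # in the direction that is needed (add when short, remove when over).
    order = sorted(exact, key=lambda s: exact[s] - counts[s], reverse=diff > 0)
    for s in order[:abs(diff)]:
        counts[s] += 1 if diff > 0 else -1
    return counts


# Hypothetical branch and period: 21% of smartcard alightings at a downtown stop.
assigned = allocate_alightings({"downtown": 0.21, "other": 0.79}, 100)
```

With 100 hypothetical no-card passengers, 21 are assigned to the downtown stop and 79 to the remaining stops, mirroring the 21% example in the text.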
This approach has a deterministic nature and assumes similar behaviour between smartcard
and no-card passengers; however, it takes into consideration the weekly behaviour of
smartcard users and identifies the stops with high alighting volumes, which are likely to be
trip attractors for all passengers.
6.2 Results
The boardings and alightings of smartcard and no-card transactions and the origins and
destinations (OD) of smartcard users are the focus of this section. Due to the differences in
methodologies used for the STM and no-card users, the results are presented in two
subsections. The results for both types of users are then compared spatially in a third subsection.
6.2.1 Analysis for smartcard users
The algorithm to estimate alighting locations and ODs for smartcard users is implemented
for all weekdays and the weekend. The average alighting estimation rate for weekdays is 87.7%,
with a lower rate for the transactions on Friday. The rate for the weekend is significantly
lower, which can be attributed to the different and irregular travel behaviour expected on
weekends. The results and statistics for each day are included in Table 6-2.
The on-board and travel times are computed using the boarding times of the transactions and
the alighting times extracted from the itinerary. Around 0.5% of alighting times could not
be retrieved due to bus runs with all passengers boarding at a single stop or bus runs where
all stops had high dwell times. The on-board and travel times are very similar for all
weekdays and the weekend, and the walking distance between alightings and subsequent
boarding locations is longer for weekdays than for the weekend.
Table 6-2 OD estimation results
Result indicator | Monday, August 15 | Tuesday, August 16 | Wednesday, August 17 | Thursday, August 18 | Friday, August 19 | Saturday, August 20 | Sunday, August 21
Original transactions | 441,751 | 440,622 | 446,587 | 445,577 | 434,019 | 226,341 | 142,318
Original cards | 155,693 | 155,310 | 157,251 | 156,951 | 152,870 | 83,660 | 53,728
Transactions with alighting location identified | 387,940 (87.8%) | 387,764 (88.0%) | 392,321 (87.8%) | 391,171 (87.8%) | 377,384 (87.0%) | 192,094 (84.9%) | 119,153 (83.7%)
Average walking distance between alighting and next boarding (m) | 176.85 | 177.81 | 179.08 | 179.31 | 180.92 | 171.50 | 171.72
Average walking distance disregarding 0 metre distances (m) | 209.86 | 210.52 | 211.82 | 212.21 | 214.40 | 201.28 | 201.71
Transactions with complete trip chains | 304,397 (68.9%) | 305,325 (69.3%) | 307,351 (68.8%) | 306,251 (68.7%) | 288,064 (66.4%) | 142,580 (63.0%) | 87,824 (61.7%)
Cards with complete trip chains | 105,655 (67.9%) | 106,065 (68.3%) | 106,673 (67.8%) | 106,220 (67.7%) | 100,375 (65.7%) | 51,875 (62.0%) | 32,171 (59.9%)
Average on-board time for trip chains (min) | 18.7 | 18.7 | 18.8 | 18.8 | 18.6 | 17.2 | 18.4
Average trip time for trip chains (min) | 30.4 | 30.1 | 30.1 | 30.0 | 30.1 | 29.8 | 29.9
The alighting location estimation results can be analyzed for different times of the day, types of bus
runs, and cardholder types. The results for Monday, August 15 are analyzed based on these
three categories as follows.
First, Table 6-3 shows the estimation rate for the four time periods. The transactions in the PM
and Overnight periods have a lower alighting estimation success rate. This could be due to
some passengers not returning to the origin location of their first trip of the day and to
passengers having unusual travel behaviour in the overnight hours, which
Trepanier et al. (2007) identify as a reason for a low success rate.
Table 6-3 Alighting estimation rate based on time period
Time Period Alighting location estimated No estimation
AM (4 a.m. to 11 a.m.) 115,313 (88.94%) 14,337 (11.06%)
Midday (11 a.m. to 3:30 p.m.) 123,827 (89.15%) 15,077 (10.85%)
PM (3:30 p.m. to 10 p.m.) 136,276 (86.22%) 21,786 (13.79%)
Overnight (10 p.m. to 4 a.m.) 12,524 (82.75%) 2,611 (17.25%)
Second, the transactions are characterized based on the type of bus run where they occur
and the results are included in Table 6-4. A small percentage of transactions
occur on validated runs, and they have a lower alighting estimation rate. Recall that validated
runs are assigned a valid run based on where passengers board; however, these runs could
take a different direction or cover additional stops for passengers to alight. Therefore, a
lower alighting estimation rate is reasonable for validated runs.
Table 6-4 Alighting estimation rate based on bus run
Transaction type Transactions Alighting location estimated
Normal (no need to validate runs) 403,637 (91.4%) 355,385 (88.0%)
Corrected (with validated runs) 38,114 (8.6%) 32,555 (85.4%)
Third, the transactions are categorized based on the STM card holder types. Instead of
analyzing the alighting estimation rate, the trip chains are compared to identify the users
with traceable trips. The trip chains are compared to the boardings per card type (Table 3-2)
and shown in Table 6-5.
The share of boardings per card type differs from the share of complete trip chains.
Standard users account for 45.8% of boardings, but only 39.4% of the complete trip
chains. In contrast, for students (particularly Student A and
Student Free) and for retired cardholders, the trip chain share is 1 to 3 percentage points higher than
their share of boardings. These differences indicate more traceable travel patterns
for students and retired users, which suggests that they make all legs and trips of their daily travel by transit,
and less traceable patterns for standard users, which suggests that these users are more likely
to use multiple modes (e.g. car, taxi, car-pooling) in their daily travel.
Table 6-5 Trip chains based on STM card type
STM card type Boardings Complete trip chains
Standard 397,034 (45.80%) 120,043 (39.40%)
Student A 170,134 (19.60%) 65,090 (21.40%)
Student B 21,448 (2.50%) 7,786 (2.60%)
Student Free 142,712 (16.5%) 59,041 (19.4%)
Retired A 44,317 (5.10%) 19,355 (6.40%)
Retired B 16,235 (1.90%) 7,305 (2.40%)
Social Work 29,330 (3.40%) 10,263 (3.40%)
Prepaid 23,608 (2.70%) 9,078 (3.00%)
Others 21,651 (2.50%) 6,436 (2.10%)
The next step is to integrate the smartcard users with transactions for which the alighting
location could not be identified and/or who have single transactions on other days. This is
done by observing the transactions for each smartcard and identifying similar transactions
on other days for which the alighting stops were estimated.
First, the algorithm analyzes the output of the OD estimation and identifies the smartcard
users with transactions for which the alighting stop could not be estimated. The alighting
location for these transactions is assigned as the location of similar transactions of those
users during the other weekdays.
The alighting locations that could be assigned represent an average of 13.3% of the
transactions with unknown alightings for the weekdays, but they add only 1.64% to the
total transactions (Original transactions in Table 6-2). The incorporation for weekends is
significantly lower and the results are shown in Table 6-6.
Table 6-6 Assignment of alighting location to transactions with missing alighting location
Day | Transactions with assigned alighting location | Percentage of the transactions without estimated alighting | Percentage of the original transactions
Monday | 7,282 | 13.53% | 1.65%
Tuesday | 7,056 | 13.35% | 1.60%
Wednesday | 7,378 | 13.60% | 1.65%
Thursday | 7,366 | 13.54% | 1.65%
Friday | 7,080 | 12.50% | 1.63%
Saturday | 341 | 1.00% | 0.15%
Sunday | 387 | 1.67% | 0.27%
Second, and similarly to the previous step, the
algorithm identifies smartcard users that have a single transaction on at least one day. Their
transactions on other days are analyzed and compared in terms of spatial and temporal
similarity. If the transactions are similar, the alighting locations are assigned to the single
transactions.
For the single STM users, Table 6-7 shows the single riders and those with an assigned
alighting location. The percentages of estimated alighting are very similar for all weekdays
and significantly lower for weekends.
Table 6-7 Assignment of alighting location to single riders
Day Single riders Estimated Alighting
Monday 37,925 7,456 (19.6%)
Tuesday 36,778 7,547 (20.5%)
Wednesday 37,101 7,602 (20.5%)
Thursday 37,701 7,500 (19.9%)
Friday 38,942 7,516 (19.3%)
Saturday 29,935 366 (1.2%)
Sunday 23,362 529 (2.3%)
Figure 6-2 shows the number of STM cards from Monday (with unknown alighting
location) that have similar boardings on the other weekdays. The cards that have
transactions with similar boardings on more than one day are assigned the alighting
location with the highest frequency.
Figure 6-2 STM cards with similar transactions on the other weekdays for
transactions with unknown alighting location (left) and single riders (right)
6.2.2 Analysis for no-card users
The no-card transactions are integrated into this study by using the travel patterns of
smartcard users. The alightings are analyzed for each bus branch during the week for the
four time periods; the percentages of alightings at each stop are computed and applied to
the no-card transactions.
Figure 6-3 shows an example of the transactions for branch number 205 during
the AM period, with the boardings and alightings of STM and no-card passengers. The
alightings for smartcard users are estimated using the OD method and the incorporation
methods discussed in section 6.2.1. On the other hand, the alightings for no-card users are
determined using the weekly percentage of alightings for branch 205 in the AM period.
Note the residual passengers (STM users) at the stop “Unknown”; the alighting location
for them could not be estimated.
The incorporation of no-card users also makes it possible to compute bus load profiles. Figure 6-4
shows the loading profile of one of the morning bus runs for branch number 205. Load
profiles can be studied for individual buses, and loads can be analyzed for different bus lines, time
periods, and corridors.
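As an illustration, a load profile can be obtained as the running sum of boardings minus alightings along the stop sequence. The sketch below uses hypothetical counts for a five-stop run and assumes both lists are ordered by stop ordinal; it is not the script used in this study.

```python
from itertools import accumulate


def load_profile(boardings, alightings):
    """Passengers on board after each stop, with stops in run order."""
    net = [b - a for b, a in zip(boardings, alightings)]
    return list(accumulate(net))


# Hypothetical bus run: all passengers (smartcard and no-card) per stop.
boardings = [5, 3, 0, 2, 0]
alightings = [0, 1, 2, 3, 4]
profile = load_profile(boardings, alightings)
```

The load returns to zero at the last stop only when every boarding has an assigned alighting; passengers with an unknown alighting location would remain as a residual.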
Figure 6-3 Alighting location assignment to no-card passengers
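Bus line 19; Branch 205; Total smartcard AM passengers: 1127; Total no-card AM passengers: 760
Stop | Stop ordinal | Smartcard boardings | Smartcard alightings | Smartcard share of alightings | No-card boardings | No-card alightings
3079 | 1 | 1 | - | - | 2 | 0
3017 | 2 | 30 | - | - | 21 | 0
3019 | 3 | 8 | 1 | 0.10% | 2 | 0
3020 | 4 | 21 | - | - | 15 | 0
3021 | 5 | 21 | 1 | 0.10% | 4 | 1
3022 | 6 | 5 | 1 | 0.10% | 4 | 1
3023 | 7 | 43 | 2 | 0.19% | 55 | 1
3024 | 8 | 8 | 2 | 0.19% | 6 | 1
3025 | 9 | 14 | 1 | 0.10% | 15 | 1
3026 | 10 | 20 | 7 | 0.68% | 12 | 5
3027 | 11 | 14 | 1 | 0.10% | 7 | 1
3028 | 12 | 33 | 1 | 0.10% | 6 | 1
3029 | 13 | 12 | 10 | 0.97% | 7 | 7
3524 | 14 | 59 | 86 | 8.36% | 36 | 65
3493 | 15 | 1 | 1 | 0.10% | 2 | 1
… | … | … | … | … | … | …
565 | 65 | 2 | 9 | 0.87% | 3 | 7
4578 | 66 | - | 5 | 0.49% | 0 | 4
566 | 67 | - | 24 | 2.33% | 1 | 18
3924 | 68 | - | 3 | 0.29% | 4 | 2
570 | 69 | - | 1 | 0.10% | 0 | 1
3925 | 70 | 1 | 9 | 0.87% | 1 | 7
4580 | 71 | 16 | 21 | 2.04% | 4 | 16
4615 | 72 | - | 10 | 0.97% | 0 | 7
4909 | 73 | - | 18 | 1.75% | 0 | 13
4040 | 74 | 18 | 13 | 1.26% | 15 | 10
4041 | 75 | 1 | 57 | 5.54% | 3 | 42
4763 | 76 | - | 4 | 0.39% | 0 | 3
4764 | 77 | - | 18 | 1.75% | 0 | 13
4765 | 78 | - | 2 | 0.19% | 0 | 1
5086 | 79 | - | 10 | 0.97% | 0 | 7
4767 | 81 | - | 15 | 1.46% | 0 | 11
Unknown | 99 | - | - | - | - | -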
Figure 6-4 Bus loading profile for all passengers
[Chart not reproduced. Horizontal axis: Stops. Vertical axes: Passenger boardings and alightings; Passengers on-board. Series: Boardings, Alightings, Load.]
6.2.3 Spatial analysis of travel behaviour
The origins, destinations, and transfers from STM and no-card transactions can be
visualized at any level of spatiotemporal aggregation. In this study, the transactions are
aggregated per census segment and transfers are analyzed at the disaggregate stop level.
The following maps, produced with ArcMap 10.2.2, depict the smartcard and no-card
travel behaviour on Monday. The maps contain an inset map for the downtown area and are
followed by a short description of the observed travel behaviour and transfer locations.
▪ Figure 6-5 AM Trip origins
▪ Figure 6-6 AM Trip destinations
▪ Figure 6-7 AM boardings for no-card users
▪ Figure 6-8 AM alightings for no-card users
▪ Figure 6-9 PM Trip origins
▪ Figure 6-10 PM Trip destinations
▪ Figure 6-11 AM Transfers
▪ Figure 6-12 PM Transfers
Figure 6-5 AM Trip origins
Figure 6-6 AM Trip destinations
Figure 6-7 AM boardings for no-card users
Figure 6-8 AM alightings for no-card users
Figure 6-9 PM Trip origins
Figure 6-10 PM Trip destinations
Figure 6-11 AM Transfers
Figure 6-12 PM Transfers
The origins of trips in the AM period occur around the urban periphery and in the highly urbanized
areas in the northeast and northwest of Montevideo, as shown in Figure 6-5. There are also
many trips that originate in the downtown. Figure 6-6 shows that the destinations of these trips
are concentrated in a few census segments, particularly in or close to downtown. There are also some clusters of
census segments in the east and northeast part of the city with moderate volumes of trip
destinations.
Note that the trip destinations exceed the trip origins in the downtown. Destination volumes
in the downtown are high, between 145 and 900 person-trips per census segment,
while trip origin volumes are between 145 and 264 person-trips.
The boardings and alightings of no-card users in the AM, shown in Figure 6-7 and Figure 6-8,
are similar to the trip origins and destinations of smartcard users. The boardings occur across the
city and suburban areas, while the alightings occur mainly in the downtown area. There are
also a few census segments in the northeast with numerous alightings.
The origins of trips in the PM period, shown in Figure 6-9, occur mainly in the downtown, and there are
several segments with high volumes in the west and a single segment on the east side of the city.
This is interesting as the area in the west is rural. An inspection using Google Maps reveals that
there are multiple hotels, industrial parks, sports complexes, and farms in the west segments and
that the airport is in the east segment. The location of these places explains the high volumes of trip
origins as employees return home from work. Conversely, Figure 6-10 depicts the destinations of
the PM trips, which are distributed all over the city, similar to the AM trip origins.
The transfers during both the AM and PM time periods, shown in Figure 6-11 and Figure 6-12,
occur at specific locations: along major roads, in the downtown area, and at terminals and major stops.
The roads with the most transfers run from downtown to the west and to the northeast. As
expected, there are many transfers at the terminals, identified on the maps as yellow triangles.
Additionally, there are a few stops with high transfer volumes in the periphery.
The AM and PM origins and destinations are similar to those in Alsger et al. (2015).
The morning origins and evening destinations are spread throughout the city, and the morning
destinations coincide with the evening origins. The latter pinpoint hot spots in the Central
Business District (CBD) and a few hot spots in specific areas of the city.
Analysis of Travel Survey Riders and Smartcard Users
In this chapter the transit riders from the MHMS are compared with the STM card users. This
comparison is done at an aggregate level and then at a disaggregate level to identify the
individuals from the survey with STM cards. The sections of this chapter describe the method
and present the results.
7.1 Method
The OD estimates for smartcard users, particularly the trip chains, are of interest as they can be
joined with the transit trips in the MHMS. Even though these two datasets are different, they are
compared based on aggregate metrics such as legs and trips per person, and at a disaggregate
level by pairing the survey individuals with smartcard users using the trips’ locations and times.
The comparison at the disaggregate level takes into consideration the differences between the datasets. For
the MHMS, the locations are provided at the census segment level and the boarding and alighting
times are not provided but can be calculated. The boarding time is computed as the start time of
the trip plus the walking time from the origin of the trip to the bus stop (derived from the
reported walking distance) and the wait time. The alighting time is similarly computed by subtracting the walking time from the
reported end of the trip. For the estimated smartcard trip chains, the boarding and alighting
locations are provided at the stop level and the times to the closest second.
Moreover, the MHMS data is spatially assigned to the closest census segment, and individuals
report the start and end of their trips to the closest 5- or 10-minute mark. In a
similar study, Spurr et al. (2015) noted that survey respondents tend to round times
to the nearest half or quarter hour. This approximation of time and distance introduces bias
when joining the datasets and is handled by using spatial and temporal windows.
To account for differences in spatial and temporal precision, this study uses spatial and temporal
windows. The spatial window considers neighbouring census segments, as the census segment is the smallest level
of aggregation. This window accounts for trips reported at a stop located at a
corner of one segment that is shared with three other segments. The temporal window is adjusted
based on the matching rate. The matching process verifies that the matching smartcard has the
lowest spatial and temporal differences to the MHMS individual, similar to the method
proposed by Spurr et al. (2015). In their study, the precision of the spatial and temporal windows
is adjusted if there are many smartcard matches for one individual.
The method is applied for each weekday of the available data (August 15-August 19) as the
MHMS data was collected on days between August and October of 2016, but the specific dates
are unknown.
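The window-based pairing described in this section can be sketched as follows. The sketch is hypothetical throughout: the record layout, the segment and card identifiers, and the neighbour lookup are illustrative, only the boarding is matched, and ranking candidates first by spatial closeness and then by time difference is an assumption about how the lowest spatial and temporal differences could be combined.

```python
from datetime import datetime, timedelta


def best_match(survey_trip, card_trips, neighbours, window_minutes):
    """Return the card ID closest in space and time to the survey trip, or None."""
    window = timedelta(minutes=window_minutes)
    segment = survey_trip["segment"]
    # Spatial window: the reported census segment plus its neighbouring segments.
    allowed = {segment} | neighbours.get(segment, set())
    candidates = []
    for card in card_trips:
        gap = abs(card["board_time"] - survey_trip["board_time"])
        if card["segment"] in allowed and gap <= window:
            # 0 = same census segment, 1 = neighbouring segment.
            spatial_rank = 0 if card["segment"] == segment else 1
            candidates.append((spatial_rank, gap, card["card_id"]))
    return min(candidates)[2] if candidates else None


# Hypothetical data for one survey individual and three smartcards.
neighbours = {"S1": {"S2"}}
survey_trip = {"segment": "S1", "board_time": datetime(2016, 8, 15, 8, 0)}
card_trips = [
    {"card_id": "c1", "segment": "S2", "board_time": datetime(2016, 8, 15, 8, 5)},
    {"card_id": "c2", "segment": "S1", "board_time": datetime(2016, 8, 15, 8, 12)},
    {"card_id": "c3", "segment": "S9", "board_time": datetime(2016, 8, 15, 8, 1)},
]
match_20 = best_match(survey_trip, card_trips, neighbours, window_minutes=20)
match_10 = best_match(survey_trip, card_trips, neighbours, window_minutes=10)
```

Widening the temporal window from 10 to 20 minutes changes the selected card in this example, which illustrates why the window is tuned against the matching rate.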
7.2 Results
This section has two subsections: The first one compares the MHMS with smartcard data at an
aggregate level and the second describes the results for identifying individuals with transit trips
on the MHMS with smartcards that have complete trip chains (recall the trip data collected by
the survey shown in Figure 3-2).
7.2.1 Comparison of MHMS with smartcard data
The report by CAF et al. (2017) provides preliminary analyses of the data collected. While most of
the analyses about the STM cover mode share, opinions about the system, and user
characteristics, there is also a histogram of the travel time distribution.
Figure 7-1 compares histograms of the travel time distributions of bus trips to verify that the
queried data is representative. In addition, the average trip time for bus users in CAF et al. (2017) is
46 minutes and the average trip time in the queried data is 46.2 minutes.
Figure 7-1 Histograms of bus trips. Queried data (left) and using all MHMS data retrieved
from CAF et al. (2017) (right)
It is also necessary to compare the queried data to the smartcard data. The data is compared in
Table 7-1 and graphically in Figure 7-2. For the smartcard data, the bus trips per person are
computed using the single cards from Query #2 and the trip chains, and the legs per trip using the
trip chains. Note that the trip chains correspond to 67.9% of STM users; thus 67.9% of the results
from Query #2 are used.
Table 7-1 Comparison of legs and trips for MHMS and STM data
Comparison condition | MHMS | STM data | Notes
Bus trips per person (percentage)
1 | 15.09 | 21.23 | Using 67.9% of single cards (Query #2)
2 | 66.53 | 60.51
3 | 10.53 | 11.72
4 | 6.85 | 5.61
>4 | 0.99 | 0.92
Average | 2.13 | 2.04
Total | 2,150 | 248,605 | 222,816 correspond to trip chains
Legs per trip (percentage)
1 | 79.78 | 71.50
2 | 17.73 | 26.85
3 | 2.33 | 1.33
4 | 0.15 | 0.30
>4 | 0 | 0.01
Average | 1.25 | 1.30
Total | 2,572 | 304,397
Figure 7-2 Histogram of legs and trips for MHMS and STM data
The average trips per person and legs per trip are similar for the queried data and the smartcard
users. There are significant differences in the cells highlighted in blue: the share of single trips
in the smartcard data is higher than in the survey, but the share of two trips
is lower. Conversely, one-legged trips represent a higher percentage in the survey but
two-legged ones a lower percentage. These differences range from 6 to 9 percentage points, but interestingly they are
not as large as those in Spurr et al. (2015), who compared the travel survey responses with the
smartcard data in Montreal.
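The differences discussed above can be reproduced directly from the percentages in Table 7-1. The short sketch below is only a check of the arithmetic; the dictionary layout and the function name are illustrative.

```python
# Percentages copied from Table 7-1 (MHMS vs. STM data).
mhms_trips = {"1": 15.09, "2": 66.53, "3": 10.53, "4": 6.85, ">4": 0.99}
stm_trips = {"1": 21.23, "2": 60.51, "3": 11.72, "4": 5.61, ">4": 0.92}
mhms_legs = {"1": 79.78, "2": 17.73, "3": 2.33, "4": 0.15, ">4": 0.0}
stm_legs = {"1": 71.50, "2": 26.85, "3": 1.33, "4": 0.30, ">4": 0.01}


def point_differences(survey, smartcard):
    """STM share minus MHMS share, in percentage points, per category."""
    return {key: round(smartcard[key] - survey[key], 2) for key in survey}


trip_diff = point_differences(mhms_trips, stm_trips)
leg_diff = point_differences(mhms_legs, stm_legs)
```

The largest gaps are in the one- and two-trip and the one- and two-leg categories, in opposite directions within each pair.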
An interesting piece of information collected by the MHMS is the trip frequency per week (see footnote 6). The
reported frequency is compared in Figure 7-3 with the STM cards that have trip chains on at least one
weekday. The frequency reported in the survey is considerably different from the frequency
observed for STM users. MHMS individuals reported that over half of their trips are
made only once per week, but only 39% of STM cards do not have similar trip chains on the other
weekdays. Moreover, the reported trips with a 5-day frequency have a higher percentage than the
ones in the STM data. Note that these differences can be attributed to only considering users with
complete trip chains and to variations in travel behaviour (e.g. boarding and alighting at different
locations farther than one kilometre apart, taking different bus lines, and traveling at different
times of the day).
6 Individuals are asked “How often (in days per week) do you make this same trip?”(p. 33) Translated from Spanish
from Montevideo et al. (2016)
Figure 7-3 Comparison of trip frequency for MHMS and STM data
Having compared the travel behaviour of MHMS individuals with the smartcard trip chains, the
MHMS individuals are identified among the smartcards. The MHMS trips are reported for the day the
individuals were interviewed, hence the comparison to smartcards is done for the five weekdays
separately. The 143 individuals that only have a one-legged trip are removed because their
transactions cannot be matched with smartcards from the OD methods. The remaining 864
individuals are used for the identification.
7.2.2 Pairing MHMS individuals with STM cards
The identification of individuals in the STM dataset uses spatial and temporal windows. While
the locations of the MHMS trips are known, the boarding and alighting times are not. The start
and end times of the trips are reported with the walking distance to and from the stops, and the
waiting time before boarding. Table 7-2 includes the strategies for computing alighting and
boarding times.
Table 7-2. Strategies to compute boarding and alighting times
Considerations | Strategy | Notes
Walking time (MHMS reports walking distance as number of blocks) | Assume walking speed 1.05 m/s (3.5 ft/s) | 3.5 ft/s recommended by the FHWA (2009)
Walking distance not reported | Assume 2 blocks | Average distance is 1.3 blocks for board and alight
Waiting time not reported | Assume 10 minutes | Average waiting time is 10 minutes
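The strategies in Table 7-2 can be illustrated with the sketch below. A traveller who leaves at the reported trip start reaches the stop after the walk and boards after the wait, so both are added to the start time, while the final walk is subtracted from the reported trip end. The walking speed and the defaults come from Table 7-2; the block length of 100 m and all function names are assumptions made only for this illustration.

```python
from datetime import datetime, timedelta

WALK_SPEED_M_S = 1.05    # Table 7-2
DEFAULT_BLOCKS = 2       # Table 7-2, used when the walking distance is not reported
DEFAULT_WAIT_MIN = 10    # Table 7-2, used when the waiting time is not reported
BLOCK_LENGTH_M = 100.0   # hypothetical block length, not stated in the source


def walking_time(blocks=None):
    """Walking time for a distance reported in blocks (default if missing)."""
    blocks = DEFAULT_BLOCKS if blocks is None else blocks
    return timedelta(seconds=blocks * BLOCK_LENGTH_M / WALK_SPEED_M_S)


def boarding_time(trip_start, blocks_to_stop=None, wait_minutes=None):
    """Trip start plus the walk to the stop plus the wait at the stop."""
    wait = DEFAULT_WAIT_MIN if wait_minutes is None else wait_minutes
    return trip_start + walking_time(blocks_to_stop) + timedelta(minutes=wait)


def alighting_time(trip_end, blocks_from_stop=None):
    """Trip end minus the walk from the stop to the destination."""
    return trip_end - walking_time(blocks_from_stop)


# Hypothetical survey trip with no walking distance or waiting time reported.
trip_start = datetime(2016, 8, 15, 8, 0)
trip_end = datetime(2016, 8, 15, 8, 50)
boarded = boarding_time(trip_start)
alighted = alighting_time(trip_end)
```

Under the assumed block length, the default two-block walk takes a little over three minutes, so the default boarding time falls roughly 13 minutes after the reported trip start.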
The method is applied for each weekday and two ways of identifying MHMS individuals are
proposed: the first identifies individuals based on their boarding and alighting times and locations,
and the second only on their boarding times and locations. The results are shown in
Table 7-3, where the column “Increase rate” shows the percentage increase in individual
identification when only the boardings are matched.
Table 7-3 MHMS identification of individuals for different temporal windows
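Day | Time window (minutes) | Board and alight | Board only | Increase rate
Monday | 20 | 39 | 57 | 46.15%
Monday | 30 | 60 | 72 | 20.00%
Monday | 40 | 78 | 87 | 11.54%
Monday | 60 | 92 | 137 | 48.91%
Tuesday | 20 | 28 | 41 | 46.43%
Tuesday | 30 | 59 | 62 | 5.08%
Tuesday | 40 | 78 | 92 | 17.95%
Tuesday | 60 | 94 | 133 | 41.49%
Wednesday | 20 | 42 | 61 | 45.24%
Wednesday | 30 | 69 | 71 | 2.90%
Wednesday | 40 | 88 | 96 | 9.09%
Wednesday | 60 | 107 | 146 | 36.45%
Thursday | 20 | 31 | 42 | 35.48%
Thursday | 30 | 62 | 74 | 19.35%
Thursday | 40 | 89 | 89 | 0.00%
Thursday | 60 | 106 | 130 | 22.64%
Friday | 20 | 30 | 43 | 43.33%
Friday | 30 | 66 | 78 | 18.18%
Friday | 40 | 82 | 85 | 3.66%
Friday | 60 | 97 | 126 | 29.90%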
The number of individuals identified increases as the time window is expanded but the match
rate of the 864 individuals is low: less than 10% with the 20 and 30-minute windows, and around
15% with the 60-minute one.
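For reference, the two rates used with Table 7-3 can be computed as in the sketch below. The function names are illustrative; the inputs are values taken from the table and the 864 individuals used for the identification.

```python
SURVEY_INDIVIDUALS = 864  # MHMS individuals used for the identification


def increase_rate(board_and_alight, board_only):
    """Percentage increase in matches when only the boardings are matched."""
    return 100.0 * (board_only - board_and_alight) / board_and_alight


def match_rate(matched, total=SURVEY_INDIVIDUALS):
    """Percentage of survey individuals paired with a smartcard."""
    return 100.0 * matched / total


monday_20_increase = round(increase_rate(39, 57), 2)   # Monday, 20-minute window
monday_30_match = round(match_rate(60), 1)             # Monday, 30 min, board and alight
wednesday_60_match = round(match_rate(107), 1)         # Wednesday, 60 min, board and alight
```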
The days with the most matches are Monday and Wednesday, which makes them the most likely
days on which individuals were interviewed (assuming they were interviewed on the same day).
Another interesting observation is the decline of the increase rate at the 40-minute window for
most days.
Using this time window, the MHMS individuals and smartcard pairs are observed across the
week, identifying the number of days where MHMS individuals are paired with the same STM
card. Table 7-4 shows the results for this analysis and includes the pairs for which the reported
frequency matches the observed pair frequency.
Table 7-4 Weekly analysis of MHMS and STM card pairs
Number of days | Pairs (Board and alight) | Pairs (Board only)
1 | 103 | 130
2 | 14 | 72
3 | 14 | 9
4 | 13 | 3
5 | 27 | 1
Total | 171 | 215
Match between reported frequency and pair | 42 | 81
Matching only on the boarding data captures more pairs, but 94% of these pairs are
observed on only one or two days. Meanwhile, for the board-and-alight matching, 32% of the pairs are observed
on more than two days. This difference is striking and suggests that considering both boarding and
alighting information could help identify regular transit riders. Furthermore, the low number of
pairs repeated across days when using only the boarding information can be attributed to the matching process,
which pairs the STM card with the most similar transactions for each day but does not take into
account the matched pairs of the other days.
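The shares quoted above follow from the counts in Table 7-4, as the short check below shows. The dictionaries copy the table; the function name is illustrative.

```python
# Pairs by number of days observed, copied from Table 7-4.
board_and_alight = {1: 103, 2: 14, 3: 14, 4: 13, 5: 27}
board_only = {1: 130, 2: 72, 3: 9, 4: 3, 5: 1}


def share_of_pairs(pairs, days):
    """Percentage of all pairs that are observed on the given numbers of days."""
    return 100.0 * sum(pairs[d] for d in days) / sum(pairs.values())


board_only_one_or_two = round(share_of_pairs(board_only, [1, 2]), 1)
both_more_than_two = round(share_of_pairs(board_and_alight, [3, 4, 5]), 1)
```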
Discussion and Conclusions
The previous chapters have provided a look into the potential of analyzing AFC data for planning
purposes, analysis of the STM transit system, and integration with travel survey data. The
methods in this study can provide metrics and results for these objectives, as shown in the
results and examples, but they have strengths and weaknesses that need further discussion.
Moreover, some of these weaknesses arise from the collection methods and data availability, and are
reflected in all methods and their results. The challenges of using the STM dataset are
described in the following paragraphs before diving into each method.
The AFC system of the STM collects high quality data for passenger transactions, which include
the bus runs and the boarding locations and times. The location and bus run are not usually
collected by bus AFC systems and this is an advantage reflected in all the methods: from
computing dwell times of each occurrence to identifying the origins and destinations of trips. In
contrast, the STM network data (bus and branches dataset) was incomplete: only 66% of the bus runs
were valid or validated. This caused the removal of half of the smartcard transactions and cards
and 15% of the no-card transactions for the OD procedures.
It is reasonable to believe that applying the method to a complete AFC dataset with updated
network data, or having AVL data to reconstruct the trajectories of bus runs, would provide
results similar to the ones in this study. This is because the transactions on missing bus runs are
similar to those on valid runs, and the boardings on invalid runs do not have distinguishable
spatial or temporal characteristics.
With the challenges of the datasets explored, the following paragraphs present the strengths and
weaknesses of each method and potential reasons for the results obtained. These are introduced
by briefly describing the relevant assumptions.
In Chapter 5, the itineraries are built considering a threshold of 10 seconds for acceptable
passenger service time. This threshold was determined by considering the boarding transactions
only. The alightings, bus characteristics, and bus loads usually included in dwell time
calculations were disregarded. Including these factors requires validating the OD methods and
having additional information about the characteristics of the buses.
Even though the 10-second threshold has limitations, it is rigorous and captures 92% of the
occurrences and 95% of passengers. The high dwell times of the remaining occurrences could be
attributed to the unaccounted alighting volumes, to boardings that occur during red traffic lights
and/or at stops close to schools or popular locations, or to passengers with physical disabilities.
Moreover, the bus runs with built itineraries represent 97% of all the runs. The remaining 3%
could not be estimated due to passengers boarding at only one stop or all boarding occurrences
having high dwell times that could not be considered. The bus runs with built itineraries are used
to determine the alighting times in the OD method for smartcards, and this was successfully done
for 95.5% of the transactions.
In Chapter 6, the method to estimate the alighting locations of transactions for smartcard users
has similar assumptions to methods previously proposed (refer to section 2.1), but the estimation
results in this study are significantly higher: 88% for weekdays and 84% for weekends. This can
be attributed to several reasons: details of transactions collected (exact stop location and bus
route), passenger behaviour and transit culture, and characteristics of the urban environment.
The passenger behaviour and transit culture are reflected in the high percentage of passengers with
complete trip chains: 67.5% for weekdays and 61% for weekends. These passengers and trip
chains are important for future applications to infer trip purposes and analyze user regularity,
mentioned in Chapter 9.
The user regularity is briefly explored in Chapter 6 to determine the alighting location of
smartcards that have single transactions in one or more days and transactions for which alighting
could not be estimated using the OD method. Similar transactions are considered as those that
occur on the same bus line and happen within the specified spatial and temporal windows.
A weakness of these considerations is not accounting for bus lines and branches that travel
between the same areas or on parallel streets in Montevideo. This can be a reason for the low
success rate of 20% for single transactions and 13.3% for the transactions with missing alighting
location. Another reason could be irregular travel behaviour, but this could only be captured by
analyzing data for longer periods of time such as weeks or months.
Chapter 6 also includes no-card riders, which is something that had not been explored before.
Researchers recommend not including no-card passengers in smartcard studies, but given that
these passengers account for 32% of the transactions, they are included in this study. The
strategy proposed to assign the alighting locations of these users’ transactions assumes that their
travel behaviour is similar to that of smartcard users. The implications of this assumption fall outside
the scope of this study and ways to study them are included in Chapter 9.
Lastly, Chapter 7 has the most challenges of all chapters as the STM and MHMS datasets are
inherently different and the dates of collection for the MHMS are unknown (households are
interviewed door by door on different days). The travel patterns of STM
users and MHMS individuals are similar to a certain extent.
Conversely, the pairing process between MHMS and STM cards has a very low matching rate
(10-15%). This low percentage was expected as similar studies have found only 40-50% of pairs
using high quality data and even the card IDs of households interviewed.
This study has continued to reinforce the potential of smartcard data as a powerful source of data
for transit studies. The methods proposed in this study use and incorporate the data sources
available, taking into consideration the data limitations. Even though the methods and their
assumptions have limitations and weaknesses, the results reveal the usefulness of these methods
for processing AFC data for transit planning purposes and computing and evaluating the system
operations and the transit network.
A final consideration for the methods is the time efficiency. The Python scripts were run on an
Intel Core i7 @ 3.40GHz machine with 16GB of RAM, and the script for the OD method of
smartcards with subsequent transactions had the longest processing time, at 30 minutes per day of
transactions. The second longest was for pairing the MHMS individuals with STM users, at 5
minutes per day. The rest of the scripts took less than 1 minute to process.
Limitations and Future Work
There are limitations in terms of the amount and quality of the data available, and several
improvements that can be made to enhance the analysis and processing of data, and validate the
results. Overcoming the limitations and implementing the improvements would help to obtain
more accurate and comprehensive results and a better understanding of travel behaviour. These
are mentioned in the following bullet points in order of relevance, beginning with the most
important ones.
▪ The OD method was applied to only 50% of the smartcard transactions available, mainly
due to invalid bus runs (Refer to sections 3.1.1 and 4.3 for details). While using the
method to validate bus runs was the most effective way to incorporate more data, this
highlights the need for transit agencies to update their network datasets.
▪ The weekly transactions and travel behaviour of smartcards are used to incorporate
smartcards with single transactions and transactions for which alighting could not be
estimated. Using a week of data reveals some level of travel behaviour regularity for the
weekdays but not for weekends. Access to more data would provide better insights into
travel patterns for both weekdays and weekends.
▪ Validation of the results must be done to assess the proposed methods and results. The
itineraries can be validated using the schedule data (recently standardized and digitized)
and AVL data. These sources of data can also be used to measure on-time performance
and adherence to schedules. The OD methods and incorporation of single riders can be
validated using Automated Passenger Counter (APC) systems and by conducting on-board
surveys or collecting the transaction receipts when passengers alight.
▪ There is a significant share of the transit riders that do not use STM cards. This study
assumed that these riders behave similarly to smartcard users; however, this
assumption cannot be verified or rejected until the behaviour of these users is studied
empirically. This might not be necessary if all riders are required to use smartcards,
which is a recent trend adopted by transit agencies.
▪ The OD method can be improved by incorporating some of the additional
recommendations in Munizaga et al. (2014), described in section 2.3. This will likely
improve the alighting estimation and trip differentiation methods.
▪ The process of building itineraries provided a glimpse into dwell time models. A full model
could be developed using the boarding and alighting volumes, such as the one proposed
by Sun, Tirachini, Axhausen, Erath, & Lee (2014). These researchers model dwell time,
as defined in this study, using the boardings and alightings from smartcard data.
▪ There are two aspects to improve the identification of similar transactions by smartcard
users. First, transactions are only considered similar if the user rides the same bus line; by
considering bus lines that cover comparable paths or areas in the city, more transactions
could be regarded as similar. Second, a sensitivity analysis could be done using different
spatial-temporal windows.
▪ The low matching rate for identifying MHMS individuals with STM cards could be
increased by collecting additional data in the survey: the date households are surveyed
and the card IDs of individuals.
▪ This study presents some examples of the results and data analysis that can be done. The
transit agency could use the results, or use the methods to produce additional results, to
analyze specific bus lines, time periods, sectors of the city and/or the travel behaviour of
the different STM card types.
In addition to the future work proposed for the methods and results of this study, there are a
myriad of other applications for smartcard data analysis. These include understanding travel
patterns and ridership for different seasons and changes in the network or during disruptions,
identifying locations or routes with high volumes to implement changes, analyzing transit
assignment, and integrating into agent-based transportation models. Furthermore, some of the
limitations of the passively collected smartcard data can be overcome by inferring trip purpose
and collecting demographic data from users, or by actively collecting this through transit
surveys.
References
Alsger, A. A., Mesbah, M., Ferreira, L., & Safi, H. (2015). Use of Smart Card Fare Data to
Estimate Public Transport Origin–Destination Matrix. Transportation Research Record:
Journal of the Transportation Research Board, 2535, 88–96. https://doi.org/10.3141/2535-10
Alsger, A., Assemi, B., Mesbah, M., & Ferreira, L. (2016). Validating and improving public
transport origin-destination estimation algorithm using smart card fare data. Transportation
Research Part C: Emerging Technologies, 68, 490–506.
https://doi.org/10.1016/j.trc.2016.05.004
Barry, J., Newhouser, R., Rahbee, A., & Sayeda, S. (2002). Origin and Destination Estimation in
New York City with Automated Fare System Data. Transportation Research Record:
Journal of the Transportation Research Board, 1817(2), 183–187.
https://doi.org/10.3141/1817-24
Beltrán, P., Cortés, C. E., Gschwender, A., Ibarra, R., Munizaga, M., Palma, C., … Zúñiga, M.
(2011). Obtención de información valiosa a partir de datos de Transantiago.
CAF, Montevideo, I. de, Canelones, I. de, Jose, I. de S., Publicas, M. de T. y O., Republica, U.
de la, & Uruguay, P. (2017). Principales Resultados e Indicadores - Encuesta de Movilidad
en el Área Metropolitana de Montevideo 2016. Montevideo.
Chu, K. K., & Chapleau, R. (2010). Augmenting Transit Trip Characterization and Travel
Behavior Comprehension. Transportation Research Record, 2183, 29–40.
https://doi.org/10.3141/2183-04
FHWA. (2009). Chapter 4F - MUTCD 2009 Edition - FHWA. Retrieved March 1, 2018, from
https://mutcd.fhwa.dot.gov/htm/2009/part4/part4e.htm
Fourie, P., Erath, A., Ordonez, S. A., Charikov, A., & Axhausen, K. W. (2017). Using Smartcard Data
for Agent-Based Transport Simulation. In F. Kurauchi & J.-D. Schmocker (Eds.), Public
Transport Planning with Smart Card Data (pp. 133–160). Boca Raton.
Gordon, J. B. (2012). Intermodal passenger flows on London’s public transport
network: automated inference of full passenger journeys using fare-transaction and vehicle-location
data. Retrieved from https://dspace.mit.edu/handle/1721.1/78242#files-area
He, L., & Trépanier, M. (2015). Estimating the Destination of Unlinked Trips in Public
Transportation Smart Card Fare Collection Systems. Transportation Research Board 94th
Annual Meeting. https://doi.org/10.3141/2535-11
Hemily, B. (2015). The Use of Transit ITS Data for Planning and Management , and Its
Challenges ; a Discussion Paper.
Hickman, M. (2017). Transit Origin-Destination Estimation. In F. Kurauchi & J.-D. Schmocker
(Eds.), Public Transport Planning with Smart Card Data (pp. 15–35). Boca Raton: CRC
Press.
INE. (2009). Encuesta Continua de Hogares (Vol. 2). Montevideo. Retrieved from
http://www.marispolymersecuador.com/DOCS/FichasTecnicas/MARISEAL 250° .pdf
Jang, W. (2010). Travel Time and Transfer Analysis Using Transit Smart Card Data.
Transportation Research Record: Journal of the Transportation Research Board, 2144,
142–149. https://doi.org/10.3141/2144-16
Kieu, L. M., Bhaskar, A., & Chung, E. (2015). Passenger segmentation using smart card data.
IEEE Transactions on Intelligent Transportation Systems, 16(3), 1537–1548.
https://doi.org/10.1109/TITS.2014.2368998
Kusakabe, T., & Asakura, Y. (2017). Combination of Smart Card Data with Person Trip Survey
Data. In F. Kurauchi & J.-D. Schmocker (Eds.), Public Transport Planning with Smart
Card Data (pp. 73–92). Boca Raton: CRC Press.
Ma, X., Liu, C., Wen, H., Wang, Y., & Wu, Y. J. (2017). Understanding commuting patterns
using transit smart card data. Journal of Transport Geography, 58, 135–145.
https://doi.org/10.1016/j.jtrangeo.2016.12.001
Metropolitano, S. de T. (2015). ESPECIFICACIÓN OPERATIVA Y DEL SISTEMA FASE I :
TRANSPORTE COLECTIVO URBANO. Montevideo.
Miller, E. J., Parada Hernandez, C., & Habib, K. M. N. (2017). Report-02 REVIEW OF THE
MONTEVIDEO HOME MOBILITY SURVEY.
Montevideo, I. de. (2018). Sistema de Transporte Metropolitano (STM). Retrieved March 1,
2018, from http://www.montevideo.gub.uy/transito-y-transporte/el-sistema
Montevideo, I. de, Jose, I. de S., Canaria, C., MTOP, Transporte, C. M. de, CAF, & Uruguay, P.
(2016). Encuesta de movilidad metropolitana. Retrieved from
http://www.consorciozaragoza.es/content/encuesta-de-movilidad-metropolitana
Morency, C., Trépanier, M., & Agard, B. (2007). Measuring transit use variability with smart-
card data. Transport Policy, 14(3), 193–203. https://doi.org/10.1016/j.tranpol.2007.01.001
Munizaga, M. A., & Palma, C. (2012). Estimation of a disaggregate multimodal public transport
Origin-Destination matrix from passive smartcard data from Santiago, Chile.
Transportation Research Part C: Emerging Technologies, 24, 9–18.
https://doi.org/10.1016/j.trc.2012.01.007
Munizaga, M., Devillaine, F., Navarrete, C., & Silva, D. (2014). Validating travel behavior
estimated from smartcard data. Transportation Research Part C: Emerging Technologies,
44, 70–79. https://doi.org/10.1016/j.trc.2014.03.008
Nassir, N., Khani, A., Lee, S., Noh, H., & Hickman, M. (2011). Transit Stop-Level Origin-
Destination Estimation Through Use of Transit Schedule and Automated Data Collection
System. Transportation Research Record: Journal of the Transportation Research Board,
2263, 140–150. https://doi.org/10.3141/2263-16
Park, J. Y., Kim, D.-J., & Lim, Y. (2008). Use of Smart Card Data to Define Public Transit Use
in Seoul, South Korea. Transportation Research Record: Journal of the Transportation
Research Board, 2063, 3–9. https://doi.org/10.3141/2063-01
Riegel, L. (2013). Utilizing Automatically Collected Smart Card Data to Enhance Travel
Demand Surveys. MIT theses. Massachusetts Institute of Technology.
Robinson, S., Narayanan, B., Toh, N., & Pereira, F. (2014). Methods for pre-processing
smartcard data to improve data quality. Transportation Research Part C: Emerging
Technologies, 49, 43–58. https://doi.org/10.1016/j.trc.2014.10.006
Schmöcker, J.-D., Kurauchi, F., & Shimamoto, H. (2017). An Overview on Opportunities and
Challenges of Smart Card Data Analysis. In F. Kurauchi & J.-D. Schmöcker (Eds.), Public
Transport Planning with Smart Card Data (pp. 2–12). Boca Raton: CRC Press.
https://doi.org/978-1-4987-2659-7
Seaborn, C., Attanucci, J., & Wilson, N. H. M. (2009). Using Smart Card Fare Payment Data To
Analyze Multi-Modal Public Transport Journeys in London. Transportation Research
Record: Journal of the Transportation Research Board, 2121, 55–62.
https://doi.org/10.3141/2121-06
Spurr, T., Chu, A., Chapleau, R., & Piché, D. (2015). A smart card transaction “travel diary” to
assess the accuracy of the Montréal household travel survey. Transportation Research
Procedia, 11, 350–364. https://doi.org/10.1016/j.trpro.2015.12.030
Sun, L., Tirachini, A., Axhausen, K. W., Erath, A., & Lee, D. H. (2014). Models of bus boarding
and alighting dynamics. Transportation Research Part A: Policy and Practice, 69, 447–
460. https://doi.org/10.1016/j.tra.2014.09.007
Transit Capacity and Quality of Service Manual. (2003). Washington, D.C.
Trepanier, M., & Morency, C. (2017). Evaluation of Bus Service Key Performance Indicators
Using Smartcard Data. In F. Kurauchi & J.-D. Schmöcker (Eds.), Public Transport Planning
with Smart Card Data (pp. 181–196). Boca Raton: CRC Press.
https://doi.org/10.1007/s11947-009-0181-3
Trépanier, M., Morency, C., & Agard, B. (2009). Calculation of Transit Performance Measures
Using Smartcard Data. Journal of Public Transportation, 12(1), 79–96.
Trepanier, M., Tranchant, N., & Chapleau, R. (2007). Individual Trip Destination Estimation in a
Transit Smart Card Automated Fare Collection System. Journal of Intelligent Transportation
Systems, 11(1), 1–14. https://doi.org/10.1080/15472450601122256
Zou, Q., Yao, X., Zhao, P., Wei, H., & Ren, H. (2016). Detecting home location and trip
purposes for cardholders by mining smart card transaction data in Beijing subway.
Transportation, (3), 1–26. https://doi.org/10.1007/s11116-016-9756-9
Algorithms in this thesis were developed using Python from Python Software Foundation.
Python Language Reference, version 2.7. Available at http://www.python.org
Maps throughout this thesis were created using ArcGIS® software by Esri. ArcGIS® and
ArcMap™ are the intellectual property of Esri and are used herein under license. Copyright ©
Esri. All rights reserved. For more information about Esri® software, please visit www.esri.com.
Appendices
Appendix A - STM Card Types
Card types and description for STM users, retrieved from Metropolitano (2015) (p.25)
| Group | Group Description | User Code | User Description |
|---|---|---|---|
| 1 | Normal | 01/11 | Normal |
| 2 | Student | 21/121 | Student A |
| | | 22/122 | Student B |
| | | 23/123 | Student FREE |
| 3 | Retired | 31/131 | Retired A |
| | | 32/132 | Retired B |
| 4 | Social Work | 41 | Special schools |
| | | 42 | Social benefits |
| 5 | Conventions organisms | 51 | Entity with quotes |
| | | 52 | Employee with quotes |
| | | 53 | Entity without quote validation |
| | | 54 | Quote without quote validation |
| | | 320/07 | Ministry of National Defense (Special characteristics) |
| 6 | Prepaid | 61 | Employee of authorized private companies and public organizations |
| 7 | Linked | 71 | Employee with quotes |
| | | 72 | Retired |
| | | 73 | Investor without quotes |
| | | 74 | Relative of employee/investor |
| | | 75 | Employee of transport system |
Appendix B - Results for all days
Results from Query # 2 for OD estimation and incorporation of single riders
▪ Tuesday August 16
| Condition | No. of Cards | No. of Transactions |
|---|---|---|
| Initial | 303,511 | 870,437 |
| Initial query results | 298,871 | 860,653 (98.9%) |
| Cards that do not meet query criteria | 4,640 (1.5%) | 9,574 (1.1%) |
| Cards with single ride per day | 43,466 (14.3%) | |
| Transactions with valid bus run | | 662,402 (76.1%) |
| Transactions with validated bus run | | 67,894 (7.8%) |
| Transactions with invalid bus run (cannot be validated) | | 140,140 (16.1%) |
| Transactions with invalid boarding stops | | 9,156 (1.1%) |
| Totals | | |
| Single rider | 36,779 (12.11%) | 36,779 (4.22%) |
| Cards for algorithm (estimate alighting) | 155,310 (51.2%) | 440,622 (50.6%) |
▪ Wednesday August 17
| Condition | No. of Cards | No. of Transactions |
|---|---|---|
| Initial | 298,681 | 867,645 |
| Initial query results | | 858,106 (98.9%) |
| Cards that do not meet query criteria | 4,625 (1.5%) | 9,102 (1.0%) |
| Cards with single ride per day | 43,787 (14.7%) | |
| Transactions with valid bus run | | 662,325 (76.3%) |
| Transactions with validated bus run | | 68,686 (7.9%) |
| Transactions with invalid bus run (cannot be validated) | | 136,197 (15.7%) |
| Transactions with invalid boarding stops | | 9,104 (1.0%) |
| Totals | | |
| Single rider | 37,101 (12.42%) | 37,101 (4.28%) |
| Cards for algorithm (estimate alighting) | 157,251 (52.6%) | 446,587 (51.5%) |
▪ Thursday August 18
| Condition | No. of Cards | No. of Transactions |
|---|---|---|
| Initial | 305,377 | 872,844 |
| Initial query results | | 864,528 (99.0%) |
| Cards that do not meet query criteria | 4,585 (1.5%) | 8,871 (1.0%) |
| Cards with single ride per day | 44,499 (14.6%) | |
| Transactions with valid bus run | | 664,418 (76.1%) |
| Transactions with validated bus run | | 69,513 (8.0%) |
| Transactions with invalid bus run (cannot be validated) | | 138,468 (15.9%) |
| Transactions with invalid boarding stops | | 9,349 (1.1%) |
| Totals | | |
| Single rider | 33,701 (11.04%) | 33,701 (3.86%) |
| Cards for algorithm (estimate alighting) | 156,951 (51.4%) | 445,577 (51.0%) |
▪ Friday August 19
| Condition | No. of Cards | No. of Transactions |
|---|---|---|
| Initial | 301,848 | 863,231 |
| Initial query results | | 852,441 (98.8%) |
| Cards that do not meet query criteria | 5,156 (1.7%) | 10,283 (1.2%) |
| Cards with single ride per day | 46,037 (15.3%) | |
| Transactions with valid bus run | | 656,800 (76.1%) |
| Transactions with validated bus run | | 676,882 (78.4%) |
| Transactions with invalid bus run (cannot be validated) | | 138,242 (16.0%) |
| Transactions with invalid boarding stops | | 8,714 (1.0%) |
| Totals | | |
| Single rider | 38,942 (12.90%) | 38,942 (4.51%) |
| Cards for algorithm (estimate alighting) | 152,870 (50.6%) | 434,019 (50.3%) |
▪ Saturday August 20
| Condition | No. of Cards | No. of Transactions |
|---|---|---|
| Initial | 454,123 | 454,576 |
| Initial query results | | 450,810 (99.2%) |
| Cards that do not meet query criteria | 1,834 | 3,313 (0.7%) |
| Cards with single ride per day | 35,866 | |
| Transactions with valid bus run | | 346,345 (76.2%) |
| Transactions with validated bus run | | 31,392 (6.9%) |
| Transactions with invalid bus run (cannot be validated) | | 76,386 (16.8%) |
| Transactions with invalid boarding stops | | 4,462 (1.0%) |
| Totals | | |
| Single rider | 29,935 (6.59%) | 29,935 (6.58%) |
| Cards for algorithm (estimate alighting) | 83,660 (18.42%) | 226,341 (49.8%) |
▪ Sunday August 21
| Condition | No. of Cards | No. of Transactions |
|---|---|---|
| Initial | 278,781 | 279,043 |
| Initial query results | | 277,264 (99.4%) |
| Cards that do not meet query criteria | 851 | 1,517 (0.5%) |
| Cards with single ride per day | 27,671 | |
| Transactions with valid bus run | | 215,790 (77.3%) |
| Transactions with validated bus run | | 17,406 (6.2%) |
| Transactions with invalid bus run (cannot be validated) | | 45,585 (16.3%) |
| Transactions with invalid boarding stops | | 2,509 (0.9%) |
| Totals | | |
| Single rider | 12,622 (4.27%) | 12,622 (4.52%) |
| Cards for algorithm (estimate alighting) | 53,728 (19.27%) | 142,318 (51.0%) |
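The weekday percentages reported above appear to be shares of the corresponding "Initial" count (cards or transactions). The short sketch below checks that reading against four rows of the Tuesday August 16 table. The function name `share` and the dictionary layout are illustrative choices, not part of the thesis code, and the choice of denominator is an assumption inferred from the figures.

```python
# Counts copied from the Tuesday August 16 table.
initial_cards = 303_511
initial_transactions = 870_437


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Each entry pairs a reported count with the assumed denominator.
rows = {
    "Cards that do not meet query criteria": (4_640, initial_cards),
    "Cards with single ride per day": (43_466, initial_cards),
    "Transactions with valid bus run": (662_402, initial_transactions),
    "Transactions with invalid bus run": (140_140, initial_transactions),
}

shares = {label: share(count, total) for label, (count, total) in rows.items()}
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

Under this assumption the four computed shares are 1.5%, 14.3%, 76.1% and 16.1%, matching the values reported for that day.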
Results from itinerary - Comparison and fixing stops with passenger service time over 10
seconds
▪ Tuesday August 16
| Condition | Occurrences | Passengers |
|---|---|---|
| Stop with unknown ordinal | 624 | 4,031 |
| Stop neither first nor last of a route nor terminal | 19,653 | 82,398 |
| Stop at intermediate terminals | 292 | 22,123 (combined with the row below) |
| First or last stop of bus route, including terminal | 2,406 | (see row above) |
▪ Wednesday August 17
| Condition | Occurrences | Passengers |
|---|---|---|
| Stop with unknown ordinal | 608 | 3,113 |
| Stop neither first nor last of a route nor terminal | 19,545 | 83,072 |
| Stop at intermediate terminals | 286 | 22,596 (combined with the row below) |
| First or last stop of bus route, including terminal | 2,437 | (see row above) |
▪ Thursday August 18
| Condition | Occurrences | Passengers |
|---|---|---|
| Stop with unknown ordinal | 632 | 3,871 |
| Stop neither first nor last of a route nor terminal | 19,971 | 85,792 |
| Stop at intermediate terminals | 330 | 22,784 (combined with the row below) |
| First or last stop of bus route, including terminal | 2,530 | (see row above) |
▪ Friday August 19
| Condition | Occurrences | Passengers |
|---|---|---|
| Stop with unknown ordinal | 594 | 4,189 |
| Stop neither first nor last of a route nor terminal | 19,591 | 84,066 |
| Stop at intermediate terminals | 312 | 23,154 (combined with the row below) |
| First or last stop of bus route, including terminal | 2,516 | (see row above) |
▪ Saturday August 20
| Condition | Occurrences | Passengers |
|---|---|---|
| Stop with unknown ordinal | 347 | 2,059 |
| Stop neither first nor last of a route nor terminal | 9,744 | 39,672 |
| Stop at intermediate terminals | 208 | 12,822 (combined with the row below) |
| First or last stop of bus route, including terminal | 1,498 | (see row above) |
▪ Sunday August 21
| Condition | Occurrences | Passengers |
|---|---|---|
| Stop with unknown ordinal | 263 | 1,952 |
| Stop neither first nor last of a route nor terminal | 6,773 | 27,822 |
| Stop at intermediate terminals | 132 | 7,194 (combined with the row below) |
| First or last stop of bus route, including terminal | 1,052 | (see row above) |
Appendix C - Details of algorithm
Recall the definition of variables and indices:
𝑛 = trip number (the first trip is 𝑛 = 1)
𝑙 = leg number within trip 𝑛
𝑂ₙ = origin of trip 𝑛
𝐷ₙ = destination of trip 𝑛
𝑎ₙ,ₗ = alighting location for leg 𝑙 of trip 𝑛
𝑏ₙ,ₗ = boarding location for leg 𝑙 of trip 𝑛
𝑑 = walking distance between stops
→ = direction of travel and sequence of stops for a bus run
The transactions are identified by the index 𝑖, with the first transaction labelled 𝑖 = 1. The
algorithm consists of two parts. The steps for each part are outlined as follows:
First part: Estimation of alighting location and time
1. Identify all the transactions for a smartcard and organize them chronologically. Label the
transactions 𝑖, 𝑖 + 1, …, 𝑘, starting with 𝑖 = 1 and with 𝑘 ≤ 9.
2. For transaction 𝑖, retrieve the bus UID and match with the corresponding valid bus run to
obtain the sequence of stops following the boarding location.
3. Pair the boarding location of the next boarding transaction (𝑖 + 1) with the stop from
step 2 that minimizes the walking distance (𝑑) to it, and label that stop as the alighting
stop (𝑎ₙ,ₗ) for transaction 𝑖. This minimizes the passenger's walking distance between the
alighting stop and the next boarding.7
4. Retrieve the time of arrival for the alighting stop (𝑎ₙ,ₗ) from the bus UID itinerary.
5. Repeat steps 2 through 4 for transaction 𝑖 = 𝑖 + 1 until reaching transaction 𝑘*.
*For transaction 𝑘, which is the last transaction of the day, use the boarding location for the first
transaction of the day (𝑖 = 1) as the next boarding location for step 3.
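The first part can be sketched in Python as follows. This is a minimal illustration rather than the thesis implementation: the function name `estimate_alightings`, the dictionary layouts, and the use of straight-line distance on projected coordinates as a stand-in for walking distance are all assumptions. The 1,000 metre limit is the maximum walking distance given in footnote 7. Cards with a single transaction are not handled here, and each boarding stop is assumed to appear on its valid bus run.

```python
import math

MAX_WALK_M = 1000.0  # maximum walking distance stated in footnote 7


def distance_m(a, b):
    """Straight-line distance in metres between two projected (x, y) points.

    Assumption: a stand-in for the walking distance used in the thesis.
    """
    return math.hypot(a[0] - b[0], a[1] - b[1])


def estimate_alightings(transactions, bus_runs, stop_xy, max_walk_m=MAX_WALK_M):
    """Return one (alighting_stop, arrival_time) tuple, or None, per transaction.

    transactions: dicts with "time", "bus_run" and "boarding_stop" (hypothetical layout)
    bus_runs: bus run id -> ordered list of (stop_id, arrival_time)
    stop_xy: stop_id -> (x, y) in metres
    """
    ordered = sorted(transactions, key=lambda t: t["time"])  # step 1
    if len(ordered) < 2:
        return [None] * len(ordered)  # single riders are out of scope here
    results = []
    for i, txn in enumerate(ordered):
        # Step 2: stops served by this bus run after the boarding stop.
        run = bus_runs[txn["bus_run"]]
        stop_ids = [stop for stop, _ in run]
        downstream = run[stop_ids.index(txn["boarding_stop"]) + 1:]
        # Step 3: next boarding; the last transaction wraps around to the first.
        target = stop_xy[ordered[(i + 1) % len(ordered)]["boarding_stop"]]
        best = min(
            downstream,
            key=lambda item: distance_m(stop_xy[item[0]], target),
            default=None,
        )
        if best is None or distance_m(stop_xy[best[0]], target) > max_walk_m:
            results.append(None)  # no downstream stop within walking range
        else:
            results.append(best)  # step 4: alighting stop and its arrival time
    return results


# Usage with made-up stops (coordinates in metres, times in minutes after midnight).
stop_xy = {"A": (0, 0), "B": (500, 0), "C": (1500, 0), "D": (1500, 400)}
bus_runs = {
    "run1": [("A", 480), ("B", 485), ("C", 492)],
    "run2": [("D", 1020), ("B", 1030), ("A", 1036)],
}
transactions = [
    {"time": 478, "bus_run": "run1", "boarding_stop": "A"},
    {"time": 1018, "bus_run": "run2", "boarding_stop": "D"},
]
print(estimate_alightings(transactions, bus_runs, stop_xy))
```

In this toy example the first ride alights at stop C, which is 400 m from the next boarding at D, and the last ride of the day alights at stop A, the first boarding location of the day.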
Second part: Estimation of trip origin and destination
1. Set variables 𝑛 = 1, 𝑙 = 1, count = 0.
2. Identify trip IDs for the transactions for a smartcard:
   a. If transaction 𝑖 has a unique trip ID:
      i. Assign label 𝑛.
      ii. 𝑂ₙ = boarding stop of transaction 𝑖
      iii. 𝐷ₙ = alighting stop of transaction 𝑖
   b. If transaction 𝑖 shares its trip ID with transaction 𝑖 + 1:
      i. Retrieve and count the subsequent transactions with the shared trip ID and assign
         them label 𝑛. Assign label 𝑙 to the first transaction, 𝑙 + 1 to the second,
         and so on until all transactions are labeled.
      ii. 𝑂ₙ = boarding stop of transaction 𝑖
      iii. 𝑎ₙ,ₗ = alighting stop of transaction 𝑖 (note that this alighting is not the trip
         destination, as this is the first leg of trip 𝑛)
      iv. If the transaction labeled 𝑙 + 1 is the last transaction with shared 𝑛:
         1. 𝑏ₙ,ₗ₊₁ = boarding stop
         2. 𝐷ₙ = alighting stop
      v. If the transaction labeled 𝑙 + 1 is not the last transaction with shared 𝑛:
         1. 𝑏ₙ,ₗ₊₁ = boarding stop (transfer boarding stop for leg 𝑙 + 1)
         2. 𝑎ₙ,ₗ₊₁ = alighting stop (transfer alighting stop for leg 𝑙 + 1)
      vi. Repeat steps iv and v for subsequent transactions with shared 𝑛. Update
         𝑙 = 𝑙 + 1.
3. Set variables 𝑛 = 𝑛 + 1, 𝑙 = 𝑙 + 1. Repeat step 2 for transaction 𝑖 = 𝑖 + 1.
4. Repeat steps 1 and 2 for the next smartcard.
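The second part amounts to grouping consecutive transactions that share a trip ID into one trip and reading the origin from the first leg and the destination from the last. A minimal Python sketch is given below. The function name `build_trips` and the dictionary keys are hypothetical, the alighting stops are assumed to come from the first part, and the leg counter restarts at 1 for every trip, which is a simplifying assumption of this sketch rather than a literal transcription of step 3.

```python
from itertools import groupby


def build_trips(transactions):
    """Group one card's chronological transactions into trips with numbered legs.

    transactions: dicts with "trip_id", "boarding_stop" and "alighting_stop"
    (hypothetical layout), already sorted by time.
    """
    trips = []
    grouped = groupby(transactions, key=lambda t: t["trip_id"])
    for n, (_, group) in enumerate(grouped, start=1):
        # One leg per transaction sharing the trip ID.
        legs = [
            {"leg": leg, "b": t["boarding_stop"], "a": t["alighting_stop"]}
            for leg, t in enumerate(group, start=1)
        ]
        trips.append({
            "n": n,
            "O": legs[0]["b"],   # origin: boarding stop of the first leg
            "D": legs[-1]["a"],  # destination: alighting stop of the last leg
            "legs": legs,        # intermediate b/a values are transfer stops
        })
    return trips


# Usage with made-up stops: a two-leg trip followed by a single-leg return trip.
card_transactions = [
    {"trip_id": 101, "boarding_stop": "A", "alighting_stop": "B"},
    {"trip_id": 101, "boarding_stop": "B2", "alighting_stop": "C"},
    {"trip_id": 102, "boarding_stop": "C", "alighting_stop": "A"},
]
trips = build_trips(card_transactions)
for trip in trips:
    print(trip["n"], trip["O"], trip["D"], len(trip["legs"]))
```

Because `groupby` only merges adjacent items, the input must already be in chronological order, which matches step 1 of the first part.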
7 The pairing process can be done by minimizing the distance between alighting and boarding
stops (Trepanier et al., 2007) or the generalized time (M. A. Munizaga & Palma, 2012). The
method proposed here considers minimizing the walking distance between stops and sets a
maximum walking distance of 1,000 metres.