BOBBY JAFARI
TRANSPORTATION MODE CHOICE MODELING
In this paper, a pared-down version of the National Household Travel Survey (NHTS) is used to build a
model that explains the influence of various independent variables on individuals' choice of travel
mode. The modeling approach is “Discrete Choice Modeling”, which is one of the most common types of
“micro-level modeling”. For this work, I used multinomial logistic regression, estimated in SPSS, to
measure the influence of independent variables such as income, population density, and age on the
travel mode choice of individuals.
Data from NHTS
First, I made a simple frequency table of mode choice as a reference for all 544,329 individuals in the
sample. It shows “SOV” as the dominant mode among the individuals. Table 1 shows the result for
travel mode.
MODE                              %
Single Occupancy Vehicle (SOV)    51.0
High Occupancy Vehicle (HOV)      38.4
Bus                               0.7
Rail                              0.3
Bike                              0.8
Walk                              8.9
Total                             100.0%
N (Trips)                         544,329
Table 1. Travel Mode Choice for All Trip Purposes, Respondents Aged 18-65, 2009 NHTS
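A frequency table of this kind can be sketched in a few lines. The snippet below is only an illustration of the calculation: the function name `mode_shares` and the eight-trip sample are hypothetical and are not NHTS data or the SPSS procedure used for Table 1.

```python
from collections import Counter

def mode_shares(modes):
    """Return {mode: percent of trips}, rounded to one decimal place."""
    counts = Counter(modes)
    total = sum(counts.values())
    return {mode: round(100.0 * n / total, 1) for mode, n in counts.items()}

# Hypothetical toy sample of trip records (NOT NHTS data).
sample_trips = ["SOV", "SOV", "HOV", "Walk", "SOV", "HOV", "Bus", "SOV"]
shares = mode_shares(sample_trips)
print(shares)
```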
Since income is always one of the most important determinants of mode choice, in the next step I
divided the households (HHs) into 3 income groups:
HHs making less than $20K,
HHs making between $20K and $60K, and
HHs making more than $60K.
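The grouping can be checked against the income midpoints reported in Table 2. The sketch below assigns each midpoint to one of the three groups and adds up the respondents; it reproduces group sizes of 39,078, 164,018, and 341,233, which together make up the 544,329 observations. The function name `income_group` and the handling of values exactly at $20K or $60K are assumptions made for illustration; the paper does not state how boundary values were treated.

```python
# (midpoint in dollars, number of respondents) pairs copied from Table 2.
TABLE2 = [
    (2_500, 4_818), (7_500, 8_438), (12_500, 11_483), (17_500, 14_339),
    (22_500, 13_247), (27_500, 20_957), (32_500, 15_318), (37_500, 25_337),
    (42_500, 14_638), (47_500, 28_764), (52_500, 15_024), (57_500, 30_733),
    (62_500, 13_679), (67_500, 29_154), (72_500, 15_178), (77_500, 31_781),
    (90_000, 70_768), (200_000, 180_673),
]

def income_group(income):
    """Map a household income in dollars to one of the three groups.

    Assumption: incomes of exactly $20K or $60K go to the higher group.
    No Table 2 midpoint falls on a boundary, so the totals are unaffected.
    """
    if income < 20_000:
        return "<$20K"
    if income < 60_000:
        return "$20K-$60K"
    return ">$60K"

# Sum the respondents of every midpoint range that falls in each group.
group_totals = {}
for midpoint, count in TABLE2:
    group = income_group(midpoint)
    group_totals[group] = group_totals.get(group, 0) + count

print(group_totals)
```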
Dividing household income into 3 groups gives a clear picture of the number of households in each
group as well as their behavior with respect to travel mode choice. Table 2 shows the income-range
midpoints of the respondents and the number of individuals in each range, and Table 3 shows the
travel mode choice frequency by the household income groups above. “SOV” remains the dominant
travel mode choice for respondents in all 3 groups but, as expected, the percentage of “SOV”
increases as household income increases.
MIDPOINT    FREQUENCY    %
$2,500      4,818        0.9
$7,500      8,438        1.6
$12,500     11,483       2.1
$17,500     14,339       2.6
$22,500     13,247       2.4
$27,500     20,957       3.9
$32,500     15,318       2.8
$37,500     25,337       4.7
$42,500     14,638       2.7
$47,500     28,764       5.3
$52,500     15,024       2.8
$57,500     30,733       5.6
$62,500     13,679       2.5
$67,500     29,154       5.4
$72,500     15,178       2.8
$77,500     31,781       5.8
$90,000     70,768       13.0
$200,000    180,673      33.2
Total       544,329      100.0
Table 2. Income Range Midpoints of All Respondents, Aged 18-65, 2009 NHTS
MODE     FREQUENCY        %        FREQUENCY              %        FREQUENCY        %
         (Income<$20K)             ($20K<Income<$60K)              (Income>$60K)
SOV      16,048           41.1     84,499                 51.5     176,862          51.8
HOV      15,394           39.4     63,648                 38.8     130,196          38.2
Bus      1,399            3.6      1,157                  0.7      1,122            0.3
Rail     208              0.5      444                    0.3      963              0.3
Bike     462              1.2      1,213                  0.7      2,530            0.7
Walk     5,567            14.2     13,057                 8.0      29,560           8.7
TOTAL    39,078           100.0    164,018                100.0    341,233          100.0
Table 3. Travel Mode Choice Frequency by Income for All Respondents, Aged 18-65, 2009 NHTS
As a visual aid, I made a stacked bar chart of the 3 income groups to show how individuals in each
group use each travel mode. One notable pattern is that the bus share decreases significantly as
income increases, which indicates that the bus is unattractive to middle- and higher-income groups.
Figure 1 shows the stacked bar chart of travel mode choice by income.
[Stacked bar chart “Travel mode choice by ln_income”: x-axis, income group (<$20K, $20K - $60K, >$60K); y-axis, 0% to 100%; series: SOV, HOV, Bus, Rail, Bike, Walk]
Figure 1. Travel mode choice by income range, for all respondents, aged 18-65, 2009 NHTS
To show the effect of income on travel mode choice accurately in the model, I took the natural
logarithm of income. Income has a non-linear effect on mode choice: as income increases
(particularly in the higher income ranges), each additional dollar has less effect on the dependent
variable (travel mode choice in this model). Table 4 shows the estimated effects of income on the odds
of an individual choosing each of the various modes of travel. In this model, I selected “SOV”, the
dominant (largest) category, as the reference category against which the other modes are compared.
With “SOV” as the reference, we can see how the explanatory variables shift individuals between SOV
and each of the other travel modes. Please note that the low pseudo-R2 reflects the very limited
explanatory power of log-transformed income on its own, as ln_income is the only independent variable
in this model.
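For reference, the model form described here can be written out explicitly. What follows is the standard multinomial logit specification with SOV as the reference category. The symbols \alpha_j (intercept) and \beta_j (coefficient on log income) are notation introduced here for illustration and do not appear in the SPSS output, where they correspond to the B column and \exp(\beta_j) corresponds to EXP(B).

```latex
% Log-odds of mode j versus the reference mode (SOV)
\ln\frac{P(\text{mode}=j)}{P(\text{mode}=\text{SOV})}
  = \alpha_j + \beta_j \,\ln(\text{income}),
  \qquad j \in \{\text{HOV},\ \text{Bus},\ \text{Rail},\ \text{Bike},\ \text{Walk}\}

% Implied choice probabilities (the SOV term contributes the 1 in the denominator)
P(\text{mode}=j)
  = \frac{\exp\!\big(\alpha_j + \beta_j \ln(\text{income})\big)}
         {1 + \sum_{k \neq \text{SOV}} \exp\!\big(\alpha_k + \beta_k \ln(\text{income})\big)}
```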
TRAVEL MODE (REFERENCE CATEGORY IS SOV)    B          SIG      EXP(B)
HOV
  Intercept                                0.424      0.000
  ln (HH income)                           -0.063     0.000    0.939
Bus
  Intercept                                5.301      0.000
  ln (HH income)                           -0.887     0.000    0.412
Rail
  Intercept                                -3.241     0.000
  ln (HH income)                           -0.170     0.000    0.844
Bike
  Intercept                                -2.485     0.000
  ln (HH income)                           -0.152     0.000    0.859
Walk
  Intercept                                -0.206     0.001
  ln (HH income)                           -0.138     0.000    0.871
N                                          544,329
Pseudo-R2 (Nagelkerke)                     0.08
Table 4. Estimated Effects of Log-transformed Income on Mode Choice, Multinomial Logistic
Regression Model, All Trip Purposes, Respondents Aged 18-65, 2009 NHTS
Looking at the “odds-ratio” column of the model, EXP(B), we see that in general an increase in income
decreases the odds of choosing every other travel mode relative to the “SOV” outcome. The further
EXP(B) falls below 1, the stronger the negative effect of income on the odds of the respective travel
mode relative to SOV. So, based on the model, an increase in an individual's income has the biggest
effect on the bus mode relative to SOV; in other words, bus riding is the most income-sensitive of all
the travel modes: a one-unit increase in ln(income) multiplies the odds of taking the bus rather than
SOV by 0.412. In the above model, the effects of income on all travel mode choices are statistically
significant, since the values in the “Sig” column (p-values) are all less than 0.05. It should be
noted that the above model is very simple and naïve, since it considers only the effect of income on
travel mode choice.
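Because income enters the model in logs, a one-unit change in ln(income) corresponds to multiplying income by e (about 2.72), which is not an intuitive step. The sketch below recomputes EXP(B) from the Table 4 coefficients and, as an added illustration that is not part of the original analysis, converts each coefficient into the change in odds when income doubles. The helper names `odds_ratio` and `odds_multiplier` are hypothetical.

```python
import math

# ln(HH income) coefficients (B column) from Table 4; SOV is the reference mode.
B_LN_INCOME = {
    "HOV": -0.063,
    "Bus": -0.887,
    "Rail": -0.170,
    "Bike": -0.152,
    "Walk": -0.138,
}

def odds_ratio(b):
    """EXP(B): odds multiplier for a one-unit increase in ln(income)."""
    return math.exp(b)

def odds_multiplier(b, income_factor):
    """Odds multiplier (vs. SOV) when income is multiplied by income_factor.

    ln(income * f) = ln(income) + ln(f), so the odds change by
    exp(b * ln(f)) = f ** b.
    """
    return income_factor ** b

for mode, b in B_LN_INCOME.items():
    print(f"{mode:5s} EXP(B) = {odds_ratio(b):.3f}   "
          f"doubling income: odds x {odds_multiplier(b, 2):.3f}")
```

Read this way, the bus coefficient implies that doubling household income roughly halves the odds of choosing the bus over SOV, while the same change barely moves the odds for HOV.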
As mentioned above, I introduced more independent variables to the model in order to observe the
effect of each on an individual's travel mode choice and to strengthen the model. The next variable
is “Trip purpose”, which has 4 categories: “work”, “shopping”, “family/personal”, and “social”.
Adding trip purpose shows how this variable influences travel mode choice, controlling for
log-transformed income. Table 5 shows the travel mode frequencies by trip purpose for all
respondents. As the table shows, the large majority of work trips (82.1%) are made by SOV rather than
by the other modes of travel. Figure 2 shows the stacked bar chart of Table 5.
MODE     FREQUENCY    %        FREQUENCY     %        FREQUENCY            %        FREQUENCY    %
         (WORK)                (SHOPPING)             (FAMILY/PERSONAL)             (SOCIAL)
SOV      101,647      82.1     63,778        51.3     67,408               43.4     44,576       31.6
HOV      16,013       12.9     53,423        43.0     71,085               45.8     68,717       48.7
Bus      1,408        1.1      703           0.6      915                  0.6      652          0.5
Rail     782          0.6      167           0.1      357                  0.2      309          0.2
Bike     1,066        0.9      543           0.4      516                  0.3      2,080        1.5
Walk     2,840        2.3      5,717         4.6      14,908               9.6      24,719       17.5
TOTAL    123,756      100.0    124,331       100.0    155,189              100.0    141,053      100.0
Table 5. Travel mode choice frequency by Trip purpose for all respondents, Aged 18-65, 2009 NHTS
[Stacked bar chart “Travel mode choice by trip purpose”: x-axis, trip purpose (Work, Shopping, Family/personal, Social); y-axis, 0% to 100%; series: SOV, HOV, Bus, Rail, Bike, Walk]
Figure 2. Travel mode choice by trip purpose, for all respondents, aged 18-65, 2009 NHTS
After adding trip purpose, the new estimate shows an improved pseudo-R2 compared with the previous
model. Table 6 shows the results of the model.
TRAVEL MODE (REFERENCE CATEGORY IS SOV)    B          SIG      EXP(B)
HOV
  ln_income                                -0.052     0.000    0.950
  Trip Purpose (omitted: social)
    Work                                   -2.281     0.000    0.102
    Shopping                               -0.618     0.000    0.539
    Family/Personal                        -0.384     0.000    0.681
  Intercept                                1.018      0.000
Bus
  ln_income                                -0.919     0.000    0.399
  Trip Purpose (omitted: social)
    Work                                   -0.008     0.860    0.992
    Shopping                               -0.478     0.000    0.620
    Family/Personal                        -0.172     0.001    0.842
  Intercept                                5.793      0.000
Rail
  ln_income                                -0.206     0.000    0.814
  Trip Purpose (omitted: social)
    Work                                   0.103      0.125    1.109
    Shopping                               -1.009     0.000    0.365
    Family/Personal                        -0.286     0.000    0.751
  Intercept                                -2.659     0.000
Bike
  ln_income                                -0.191     0.000    0.827
  Trip Purpose (omitted: social)
    Work                                   -1.494     0.000    0.225
    Shopping                               -1.734     0.000    0.177
    Family/Personal                        -1.823     0.000    0.161
  Intercept                                -0.920     0.000
Walk
  ln_income                                -0.158     0.000    0.854
  Trip Purpose (omitted: social)
    Work                                   -2.989     0.000    0.050
    Shopping                               -1.849     0.000    0.157
    Family/Personal                        -0.932     0.000    0.394
  Intercept                                1.191      0.000
N                                          544,329
Pseudo-R2 (Nagelkerke)                     0.186
Table 6. Estimated Effects of Log-transformed Income and Trip Purpose on Mode Choice, Multinomial Logistic Regression Model, All Trip Purposes, Respondents Aged 18-65, 2009 NHTS
The results of the new estimate and the improvement in pseudo-R2 from 0.08 to 0.186 indicate the
importance of the trip purpose variable for individuals' travel mode choice. It should be noted that
even after adding “trip purpose” as an additional predictor, the pseudo-R2, though improved, is
still low. Furthermore, pseudo-R2 by itself should not be treated as a very reliable measure. Putting
trip purpose into 4 bins (work, shopping, family/personal, and social) lets us analyze the variable
in more detail. Based on the new model, relative to social trips (the omitted category), work,
shopping, and family/personal trips generally have lower odds of using non-SOV modes rather than
SOV; the exceptions are work trips by bus and by rail, whose coefficients are not statistically
significant. Among all the modes, biking and walking show the greatest sensitivity to trip purpose in
their odds of being chosen relative to SOV. It is worth mentioning that for work trips the odds of
HOV and walk relative to SOV drop sharply (EXP(B) of 0.102 and 0.050), which indicates the
importance of SOV as the dominant mode of travel to work among the respondents.
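The coefficients in Table 6 can be turned back into predicted mode probabilities, which makes the effect of trip purpose easier to see than the odds ratios alone. The following is a minimal sketch under two assumptions that the paper does not state explicitly: that ln_income is the natural log of household income in dollars, and that the standard multinomial logit probability formula applies. The function name `mode_probabilities` and the $50,000 income are hypothetical choices for illustration.

```python
import math

# Coefficients copied from Table 6 (reference mode: SOV; omitted purpose: social).
TABLE6 = {
    "HOV":  {"intercept": 1.018,  "ln_income": -0.052, "work": -2.281, "shopping": -0.618, "family_personal": -0.384},
    "Bus":  {"intercept": 5.793,  "ln_income": -0.919, "work": -0.008, "shopping": -0.478, "family_personal": -0.172},
    "Rail": {"intercept": -2.659, "ln_income": -0.206, "work": 0.103,  "shopping": -1.009, "family_personal": -0.286},
    "Bike": {"intercept": -0.920, "ln_income": -0.191, "work": -1.494, "shopping": -1.734, "family_personal": -1.823},
    "Walk": {"intercept": 1.191,  "ln_income": -0.158, "work": -2.989, "shopping": -1.849, "family_personal": -0.932},
}
PURPOSES = ("work", "shopping", "family_personal", "social")

def mode_probabilities(income, purpose):
    """Predicted probability of each mode for a given income and trip purpose.

    Assumption: income is in dollars and enters the model as ln(income).
    """
    if purpose not in PURPOSES:
        raise ValueError(f"unknown trip purpose: {purpose!r}")
    ln_income = math.log(income)
    # The reference mode (SOV) has a linear predictor of 0 by construction.
    predictors = {"SOV": 0.0}
    for mode, coef in TABLE6.items():
        predictors[mode] = (
            coef["intercept"]
            + coef["ln_income"] * ln_income
            + coef.get(purpose, 0.0)  # "social" is omitted, so it adds 0
        )
    denominator = sum(math.exp(v) for v in predictors.values())
    return {mode: math.exp(v) / denominator for mode, v in predictors.items()}

work = mode_probabilities(50_000, "work")
social = mode_probabilities(50_000, "social")
for mode in work:
    print(f"{mode:5s} work = {work[mode]:.3f}   social = {social[mode]:.3f}")
```

At this illustrative income, the sketch gives an SOV probability of roughly 0.8 for work trips and roughly 0.3 for social trips, which is in line with the SOV shares reported in Table 5.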
In the next step, I added another independent variable, population density, to my model. As in the
previous step, to refine my estimate I made 4 dummy variables, dividing the population density
variable into 4 categories as follows:
Population density < 1,000 per square mile,
1,000 < Population density < 2,500,
2,500 < Population density < 10,000, and
Population density of 10,000 or more.
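Dummy coding of this kind can be sketched as follows. One category has to be left out of the regression as the baseline; the model tables in this paper omit the highest-density category (pop > 10,000), so the sketch does the same. The function and variable names are hypothetical, and the treatment of values that fall exactly on a boundary (1,000, 2,500, or 10,000) is an assumption, since the category definitions above do not say where such values go.

```python
def density_dummies(density_per_sq_mile):
    """Return 0/1 indicators for the three non-omitted density categories.

    Densities of 10,000 or more per square mile are the omitted (baseline)
    category, so all three indicators are 0 for them.
    Assumption: a boundary value is assigned to the higher category.
    """
    d = density_per_sq_mile
    return {
        "pop_lt_1000": int(d < 1_000),
        "pop_1000_2500": int(1_000 <= d < 2_500),
        "pop_2500_10000": int(2_500 <= d < 10_000),
    }

# Hypothetical densities, one per category.
for density in (400, 1_800, 6_000, 25_000):
    print(density, density_dummies(density))
```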
Since both ln_income and trip purpose improved the predictive capability of the estimate, it would
have been unwise to remove them from the model. So this time, SPSS was used to estimate a new model
of mode choice as predicted by ln_income, trip purpose, and population density. Before making the
new estimate, I made a frequency table for the population density dummy variables. Table 7 shows the
travel mode choice frequency by population density for all respondents.
As Table 7 shows, the SOV share stays between about 51% and 52% below 10,000 persons per square mile
and falls to 38.6% in the densest category, although SOV remains the most popular travel mode choice
in every category. Figure 3 shows the stacked bar chart of travel mode choice by population density
category.
MODE     FREQUENCY    %        FREQUENCY        %        FREQUENCY         %        FREQUENCY    %
         (pop<1K)              (1K<pop<2.5K)             (2.5K<pop<10K)             (pop>10K)
SOV      94,452       51.9     66,318           52.5     105,129           51.0     11,510       38.6
HOV      73,328       40.3     48,807           38.7     77,582            37.6     9,521        31.9
Bus      342          0.2      409              0.3      1,624             0.8      1,303        4.4
Rail     221          0.1      133              0.1      488               0.2      773          2.6
Bike     1,088        0.6      825              0.7      1,939             0.9      353          1.2
Walk     12,484       6.9      9,779            7.7      19,548            9.5      6,373        21.4
TOTAL    181,915      100.0    126,271          100.0    206,310           100.0    29,833       100.0
Table 7. Travel mode choice frequency by population density for all respondents, Aged 18-65, 2009 NHTS
[Stacked bar chart “Travel mode choice by population density”: x-axis, population density per square mile (pop<1,000; 1,000<pop<2,500; 2,500<pop<10,000; pop>10,000); y-axis, 0% to 100%; series: SOV, HOV, Bus, Rail, Bike, Walk]
Figure 3. Travel mode choice by population density per square mile for all respondents, aged 18-65, 2009 NHTS
After adding the population density categories in SPSS, I ran the model, and once again the pseudo-R2
improves, from 0.186 to 0.208. Table 8 shows the results of the new model in detail.
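Since the comparison between models rests on the Nagelkerke pseudo-R2, it may help to see how that statistic is built from the log-likelihoods of the intercept-only (null) model and the fitted model. The sketch below uses the standard Cox-Snell and Nagelkerke definitions. The log-likelihood values in the example are hypothetical, because the paper reports only the resulting pseudo-R2 values and not the log-likelihoods.

```python
import math

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R2 from two log-likelihoods and the sample size.

    Cox-Snell R2 = 1 - exp(2 * (ll_null - ll_model) / n)
    Its maximum attainable value is 1 - exp(2 * ll_null / n);
    Nagelkerke rescales Cox-Snell by that maximum so the range is 0 to 1.
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

# Hypothetical log-likelihoods for a sample of 1,000 observations.
example_r2 = nagelkerke_r2(ll_null=-1000.0, ll_model=-900.0, n=1000)
print(f"Nagelkerke pseudo-R2: {example_r2:.3f}")
```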
From the results in Table 8, we can conclude that adding population density has improved the
predictive power of our model. What is interesting to note is how sensitive the bus and rail modes
are, relative to SOV, to the population density variable. As Table 8 shows, the odds of using bus and
rail relative to SOV decrease significantly as the population per square mile decreases, which shows
the dominance of SOV as the travel mode choice for respondents in rural areas. Conversely, in urban
regions with a density above 10,000 per square mile (the omitted category), the odds of using bus and
rail relative to SOV are the highest. After bus and rail, walk has the lowest odds of being chosen
relative to SOV in the least populated areas (EXP(B) = 0.230), controlling for the other variables
in the model.
TRAVEL MODE (REFERENCE CATEGORY IS SOV)                    B          SIG      EXP(B)
HOV
  ln_income                                                -0.049     0.000    0.952
  Trip purpose (omitted: Social)
    Work                                                   -2.284     0.000    0.102
    Shopping                                               -0.620     0.000    0.538
    Family/Personal                                        -0.384     0.000    0.681
  Population density per sq. mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -0.071     0.000    0.932
    1,000 < Pop < 2,500                                    -0.156     0.000    0.856
    2,500 < Pop < 10,000                                   -0.141     0.000    0.869
  Intercept                                                1.101      0.000
Bus
  ln_income                                                -0.818     0.000    0.441
  Trip purpose (omitted: Social)
    Work                                                   -0.068     0.164    0.935
    Shopping                                               -0.555     0.000    0.574
    Family/Personal                                        -0.230     0.000    0.795
  Population density per sq. mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -3.200     0.000    0.041
    1,000 < Pop < 2,500                                    -2.586     0.000    0.075
    2,500 < Pop < 10,000                                   -1.788     0.000    0.167
  Intercept                                                6.706      0.000
Rail
  ln_income                                                -0.073     0.008    0.930
  Trip purpose (omitted: Social)
    Work                                                   0.013      0.848    1.013
    Shopping                                               -1.109     0.000    0.330
    Family/Personal                                        -0.359     0.000    0.698
  Population density per sq. mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -3.364     0.000    0.035
    1,000 < Pop < 2,500                                    -3.505     0.000    0.030
    2,500 < Pop < 10,000                                   -2.673     0.000    0.069
  Intercept                                                -1.629     0.000
Bike
  ln_income                                                -0.172     0.000    0.842
  Trip purpose (omitted: Social)
    Work                                                   -1.502     0.000    0.223
    Shopping                                               -1.745     0.000    0.175
    Family/Personal                                        -1.835     0.000    0.160
  Population density per sq. mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -1.011     0.000    0.364
    1,000 < Pop < 2,500                                    -0.937     0.000    0.392
    2,500 < Pop < 10,000                                   -0.541     0.000    0.582
  Intercept                                                -0.388     0.048
Walk
  ln_income                                                -0.127     0.000    0.881
  Trip purpose (omitted: Social)
    Work                                                   -3.007     0.000    0.049
    Shopping                                               -1.872     0.000    0.154
    Family/Personal                                        -0.950     0.000    0.387
  Population density per sq. mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -1.468     0.000    0.230
    1,000 < Pop < 2,500                                    -1.394     0.000    0.248
    2,500 < Pop < 10,000                                   -1.147     0.000    0.318
  Intercept                                                2.058      0.000
N                                                          544,329
Pseudo-R2 (Nagelkerke)                                     0.208
Table 8. Estimated Effects of Log-transformed Income, Trip Purpose and Population Density on Mode Choice, Multinomial Logistic Regression Model, All Trip Purposes, Respondents
Aged 18-65, 2009 NHTS
In the next step, to see whether the model's predictive power would improve further, I added the
“age” variable to the list of previous variables in order to find the influence of respondents' age
on choosing the other travel modes relative to SOV. As in the previous steps, to better understand
the impact of age, I divided the age variable into 4 bins, as follows:
Respondent age 18 (minimum age of sample population) to 21,
Respondent age 21 to 35,
Respondent age 35 to 55, and finally
Respondent age 55 to 65 (maximum age of sample population).
I selected the 18-to-21 bin to find the influence of age on the choice of each travel mode relative
to SOV for respondents who acquired their driver's licenses either recently or within the last 3 to 4
years. The rest of the age categories I made are quite typical. Once again, prior to running the
model, I made a frequency table of travel mode choice by age category for all respondents. Table 9
shows the frequency distribution, and the stacked bar chart that follows it plots the same data.
MODE     FREQUENCY      %        FREQUENCY      %        FREQUENCY      %        FREQUENCY       %
         (18<age<21)             (21<age<35)             (35<age<55)             (55<age<=65)
SOV      7,740          47.5     29,113         43.9     134,621        50.3     90,450          55.1
HOV      6,751          41.4     29,517         44.5     105,453        39.4     56,383          34.4
Bus      255            1.6      620            0.9      1,683          0.6      904             0.6
Rail     44             0.3      286            0.4      728            0.3      453             0.3
Bike     162            1.0      487            0.7      2,230          0.8      1,063           0.6
Walk     1,347          8.3      6,287          9.5      23,083         8.6      14,798          9.0
TOTAL    16,299         100.0    66,310         100.0    267,798        100.0    164,051         100.0
Table 9. Travel mode choice frequency by age for all respondents, Aged 18-65, 2009 NHTS
[Stacked bar chart “Travel mode choice by age”: x-axis, age group (18<age<21, 21<age<35, 35<age<55, 55<age<65); y-axis, 0% to 100%; series: SOV, HOV, Bus, Rail, Bike, Walk]
Figure 4. Travel mode by age for all respondents, aged 18-65, 2009 NHTS
After dividing the age variable into the 4 categories above, I ran the model; Table 10 shows the
results. Again, the pseudo-R2 improves, from 0.208 to 0.217, which indicates an improvement in the
model's predictive power. Looking at the age categories in Table 10, we notice that most of the
values in the EXP(B) column are greater than 1. We can interpret this outcome using the term
“relative risk”. Based on SPSS, the ratio of the probability of choosing one outcome category to the
probability of choosing the reference category is called the relative risk (or odds, as used
earlier). The B coefficients for the 18-to-21 group are 0.298 (HOV), 0.912 (bus), and 0.537 (bike), so
moving from the oldest age category (55<age<65) to the youngest (18<age<21) multiplies the odds of
choosing these modes rather than SOV by 1.347, 2.490, and 1.712, respectively. In general, as the age
of the respondents goes up, the use of SOV increases relative to the other modes of travel, with the
one exception of the 21-to-35 category, in which SOV use is lower than in the younger 18-to-21 group.
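Every age coefficient in Table 10 is measured against the omitted oldest group, so comparing two non-omitted groups with each other (for example 21 to 35 against 18 to 21) requires taking the difference of their B values before exponentiating. The sketch below does this for the three modes discussed above. The helper name `odds_ratio_between` and the group labels are hypothetical, and this re-basing step is an added illustration rather than part of the original SPSS output.

```python
import math

# Age coefficients (B column) from Table 10; the oldest group is omitted (B = 0).
AGE_B = {
    "HOV":  {"18-21": 0.298, "21-35": 0.664, "35-55": 0.345},
    "Bus":  {"18-21": 0.912, "21-35": 0.396, "35-55": 0.239},
    "Bike": {"18-21": 0.537, "21-35": 0.377, "35-55": 0.457},
}

def odds_ratio_between(mode, group, baseline):
    """Odds ratio (mode vs. SOV) of one age group relative to another.

    Any label not listed in AGE_B[mode], such as "55-65", is treated as the
    omitted category with a coefficient of 0.
    """
    coefs = AGE_B[mode]
    return math.exp(coefs.get(group, 0.0) - coefs.get(baseline, 0.0))

for mode in AGE_B:
    vs_oldest = odds_ratio_between(mode, "18-21", "55-65")
    vs_youngest = odds_ratio_between(mode, "21-35", "18-21")
    print(f"{mode:4s} 18-21 vs 55-65: {vs_oldest:.3f}   21-35 vs 18-21: {vs_youngest:.3f}")
```

For HOV the second ratio is above 1, which is consistent with the observation that the 21-to-35 group relies less on SOV than the 18-to-21 group; for bus and bike it is below 1.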
TRAVEL MODE (REFERENCE CATEGORY IS SOV)                    B          SIG      EXP(B)
HOV
  ln_income                                                -0.054     0.000    0.947
  Trip purpose (omitted: Social)
    Work                                                   -2.333     0.000    0.097
    Shopping                                               -0.616     0.000    0.540
    Family/Personal                                        -0.403     0.000    0.668
  Population density per sq. mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -0.034     0.029    0.966
    1,000 < Pop < 2,500                                    -0.125     0.000    0.883
    2,500 < Pop < 10,000                                   -0.120     0.000    0.887
  Age of individual (omitted: 56+)
    18 < Age < 21                                          0.298      0.000    1.347
    21 < Age < 35                                          0.664      0.000    1.943
    35 < Age < 55                                          0.345      0.000    1.413
  Intercept                                                0.867      0.000
Bus
  ln_income                                                -0.810     0.000    0.445
  Trip purpose (omitted: Social)
    Work                                                   -0.088     0.072    0.916
    Shopping                                               -0.524     0.000    0.592
    Family/Personal                                        -0.236     0.000    0.790
  Population density per sq. mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -3.173     0.000    0.042
    1,000 < Pop < 2,500                                    -2.555     0.000    0.078
    2,500 < Pop < 10,000                                   -1.767     0.000    0.171
  Age of individual (omitted: 56+)
    18 < Age < 21                                          0.912      0.000    2.490
    21 < Age < 35                                          0.396      0.000    1.486
    35 < Age < 55                                          0.239      0.000    1.270
  Intercept                                                6.377      0.000
Rail
  ln_income                                                -0.064     0.020    0.938
  Trip purpose (omitted: Social)
    Work                                                   0.000      0.997    1.000
    Shopping                                               -1.105     0.000    0.331
    Family/Personal                                        -0.357     0.000    0.700
  Population density per sq. mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -3.345     0.000    0.035
    1,000 < Pop < 2,500                                    -3.490     0.000    0.031
    2,500 < Pop < 10,000                                   -2.662     0.000    0.070
  Age of individual (omitted: 56+)
    18 < Age < 21                                          -0.067     0.642    0.935
    21 < Age < 35                                          0.405      0.000    1.499
    35 < Age < 55                                          -0.020     0.743    0.981
  Intercept                                                -1.780     0.000
Bike
  ln_income                                                -0.187     0.000    0.829
  Trip purpose (omitted: Social)
    Work                                                   -1.545     0.000    0.213
    Shopping                                               -1.742     0.000    0.175
    Family/Personal                                        -1.859     0.000    0.156
  Population density per sq. mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -0.991     0.000    0.371
    1,000 < Pop < 2,500                                    -0.918     0.000    0.399
    2,500 < Pop < 10,000                                   -0.531     0.000    0.588
  Age of individual (omitted: 56+)
    18 < Age < 21                                          0.537      0.000    1.712
    21 < Age < 35                                          0.377      0.000    1.458
    35 < Age < 55                                          0.457      0.000    1.580
  Intercept                                                -0.530     0.007
Walk
  ln_income                                                -0.131     0.000    0.877
  Trip purpose (omitted: Social)
    Work                                                   -3.039     0.000    0.048
    Shopping                                               -1.875     0.000    0.153
    Family/Personal                                        -0.962     0.000    0.382
  Population density per sq. mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -1.450     0.000    0.235
    1,000 < Pop < 2,500                                    -1.379     0.000    0.252
    2,500 < Pop < 10,000                                   -1.137     0.000    0.321
  Age of individual (omitted: 56+)
    18 < Age < 21                                          -0.092     0.002    0.912
    21 < Age < 35                                          0.375      0.000    1.456
    35 < Age < 55                                          0.195      0.000    1.215
  Intercept                                                1.964      0.000
N                                                          544,329
Pseudo-R2 (Nagelkerke)                                     0.217
Table 10. Estimated Effects of Log-transformed Income, Trip Purpose, Population Density and Age on Mode Choice, Multinomial Logistic Regression Model, All Trip Purposes, Respondents
Aged 18-65, 2009 NHTS