Transportation mode choice modeling

BOBBY JAFARI

In this paper, a pared-down version of the National Household Travel Survey (NHTS) is used to build a model that explains the influence of various independent variables on individuals' choice of travel mode. The modeling approach is discrete choice modeling, one of the most common types of micro-level modeling. I used multinomial logistic regression, estimated in SPSS, to measure the influence of independent variables such as income, population density, and age on the choice of travel mode.

Data from NHTS

First, I made a simple frequency table of mode choice as a reference for all 544,329 individuals in the sample. It shows single occupancy vehicle (SOV) as the dominant mode. Table 1 shows the result for travel mode.

MODE                                  %
Single Occupancy Vehicle (SOV)     51.0
High Occupancy Vehicle (HOV)       38.4
Bus                                 0.7
Rail                                0.3
Bike                                0.8
Walk                                8.9
Total                             100.0
N (Trips)                       544,329

Table 1. Travel Mode Choice for All Trip Purposes, Respondents Aged 18-65, 2009 NHTS

Since income is one of the most important determinants of mode choice, in the next step I divided the households (HHs) into 3 income groups:

HHs making less than $20K,

HHs making between $20K and $60K, and

HHs making more than $60K.

Dividing household income into 3 groups gives a clear picture of the number of respondents in each group as well as their travel mode choices. Table 2 shows the income-range midpoints of the respondents and the number of individuals in each range, and Table 3 shows travel mode choice frequency by the household income groups above. SOV remains the dominant travel mode within all 3 groups but, as expected, its share rises with income, from 41.1% in the lowest income group to above 51% in the two higher groups.

MIDPOINT    FREQUENCY        %
$2,500          4,818      0.9
$7,500          8,438      1.6
$12,500        11,483      2.1
$17,500        14,339      2.6
$22,500        13,247      2.4
$27,500        20,957      3.9
$32,500        15,318      2.8
$37,500        25,337      4.7
$42,500        14,638      2.7
$47,500        28,764      5.3
$52,500        15,024      2.8
$57,500        30,733      5.6
$62,500        13,679      2.5
$67,500        29,154      5.4
$72,500        15,178      2.8
$77,500        31,781      5.8
$90,000        70,768     13.0
$200,000      180,673     33.2
Total         544,329    100.0

Table 2. Income Midpoints Range of All Respondents, Aged 18-65, 2009 NHTS
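The grouping step can be checked with a short sketch. The midpoints and frequencies below are copied from Table 2; the function names and group labels are illustrative choices, not part of the original SPSS workflow. Summing the Table 2 frequencies by group reproduces the group sizes 39,078, 164,018, and 341,233 that appear in Table 3.

```python
# Midpoint of each income range and number of respondents, copied from Table 2.
INCOME_MIDPOINTS = [
    (2_500, 4_818), (7_500, 8_438), (12_500, 11_483), (17_500, 14_339),
    (22_500, 13_247), (27_500, 20_957), (32_500, 15_318), (37_500, 25_337),
    (42_500, 14_638), (47_500, 28_764), (52_500, 15_024), (57_500, 30_733),
    (62_500, 13_679), (67_500, 29_154), (72_500, 15_178), (77_500, 31_781),
    (90_000, 70_768), (200_000, 180_673),
]


def income_group(midpoint):
    """Map an income midpoint (dollars) to one of the 3 groups used in the paper."""
    if midpoint < 20_000:
        return "<$20K"
    if midpoint < 60_000:
        return "$20K-$60K"
    return ">$60K"


def group_totals(rows):
    """Sum respondent counts within each income group."""
    totals = {}
    for midpoint, count in rows:
        label = income_group(midpoint)
        totals[label] = totals.get(label, 0) + count
    return totals


if __name__ == "__main__":
    for label, count in group_totals(INCOME_MIDPOINTS).items():
        print(f"{label:>10}: {count:,}")
```

No midpoint in Table 2 falls exactly on a group boundary, so the choice of strict or non-strict inequalities at $20K and $60K does not affect these totals.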

MODE      Income < $20K          $20K < Income < $60K     Income > $60K
          FREQUENCY      %       FREQUENCY      %         FREQUENCY      %
SOV          16,048   41.1          84,499   51.5           176,862   51.5
HOV          15,394   39.4          63,648   38.8           130,196   38.8
Bus           1,399    3.6           1,157    0.7             1,122    0.7
Rail            208    0.5             444    0.3               963    0.3
Bike            462    1.2           1,213    0.7             2,530    0.7
Walk          5,567   14.2          13,057    8.0            29,560    8.0
TOTAL        39,078  100.0         164,018  100.0           341,233  100.0

Table 3. Travel Mode Choice Frequency by Income for All Respondents, Aged 18-65, 2009 NHTS

As a visual aid, I made a stacked bar chart of the 3 income groups to show how individuals in each group use the travel modes. One notable pattern is that the share of bus trips drops sharply as income increases, from 3.6% in the lowest income group to 0.7% or less in the higher groups, which indicates that the bus is unattractive to middle- and higher-income groups. Figure 1 shows the stacked bar chart of travel mode choice by income.

[Stacked bar chart "Travel mode choice by ln_income": share of SOV, HOV, Bus, Rail, Bike, and Walk (0% to 100%) for the income groups <$20K, $20K - $60K, and >$60K.]

Figure 1. Travel mode choice by income range, for all respondents, aged 18-65, 2009 NHTS

To represent the effect of income on travel mode choice more accurately in the model, I took the natural logarithm of income. Income has a non-linear effect on mode choice: as income increases (particularly in the higher income ranges), each additional dollar has less effect on the dependent variable (travel mode choice in this model). Table 4 shows the estimated effects of income on the odds of an individual choosing each of the various modes of travel. In this model, I selected SOV, the largest category, as the reference category against which the other modes are compared. With SOV as the reference, we can see how the explanatory variables shift individuals between SOV and each of the other travel modes. Please note that the low pseudo-R2 reflects the very limited explanatory power of log-transformed income alone, since ln_income is the only independent variable in this model.
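For reference, a multinomial logit model with SOV as the reference category and ln_income as the only predictor can be written in the standard textbook form below. The symbols $\alpha_j$ and $\beta_j$ are notation introduced here for the intercept and slope of mode $j$ (they correspond to the B column that SPSS reports); they are not taken from the source.

```latex
\ln\frac{P(\text{mode}=j)}{P(\text{mode}=\text{SOV})}
  = \alpha_j + \beta_j \,\ln(\text{income}),
  \qquad j \in \{\text{HOV},\ \text{Bus},\ \text{Rail},\ \text{Bike},\ \text{Walk}\}

P(\text{mode}=j)
  = \frac{\exp\!\big(\alpha_j + \beta_j \ln(\text{income})\big)}
         {1 + \sum_{k \neq \text{SOV}} \exp\!\big(\alpha_k + \beta_k \ln(\text{income})\big)},
  \qquad
P(\text{mode}=\text{SOV})
  = \frac{1}{1 + \sum_{k \neq \text{SOV}} \exp\!\big(\alpha_k + \beta_k \ln(\text{income})\big)}
```

Under this form, $\exp(\beta_j)$ is the factor by which the odds of mode $j$ versus SOV change for a one-unit increase in ln_income, which is what the Exp(B) column reports.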

TRAVEL MODE (REFERENCE CATEGORY IS SOV)         B       SIG    EXP(B)

HOV
  Intercept                                  0.424     0.000
  ln (HH income)                            -0.063     0.000     0.939
Bus
  Intercept                                  5.301     0.000
  ln (HH income)                            -0.887     0.000     0.412
Rail
  Intercept                                 -3.241     0.000
  ln (HH income)                            -0.170     0.000     0.844
Bike
  Intercept                                 -2.485     0.000
  ln (HH income)                            -0.152     0.000     0.859
Walk
  Intercept                                 -0.206     0.001
  ln (HH income)                            -0.138     0.000     0.871

N                                          544,329
Pseudo-R2 (Nagelkerke)                        0.08

Table 4. Estimated Effects of Log-transformed Income on Mode Choice, Multinomial Logistic Regression Model, All Trip Purposes, Respondents Aged 18-65, 2009 NHTS
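To make the Table 4 coefficients more concrete, the sketch below turns them into predicted mode shares for a given household income, using the usual multinomial logit formula with SOV as the reference category. The coefficients are copied from Table 4; the function name and the three example incomes are hypothetical and chosen only for illustration.

```python
import math

# (intercept, ln-income slope) for each non-reference mode, copied from Table 4.
# SOV is the reference category, so its linear predictor is fixed at 0.
COEFFICIENTS = {
    "HOV": (0.424, -0.063),
    "Bus": (5.301, -0.887),
    "Rail": (-3.241, -0.170),
    "Bike": (-2.485, -0.152),
    "Walk": (-0.206, -0.138),
}


def mode_probabilities(income):
    """Return predicted mode shares for a household income in dollars."""
    ln_income = math.log(income)
    # exp(linear predictor) is the odds of each mode relative to SOV.
    odds = {"SOV": 1.0}
    for mode, (intercept, slope) in COEFFICIENTS.items():
        odds[mode] = math.exp(intercept + slope * ln_income)
    total = sum(odds.values())
    return {mode: value / total for mode, value in odds.items()}


if __name__ == "__main__":
    for income in (10_000, 50_000, 200_000):  # hypothetical incomes
        shares = mode_probabilities(income)
        print(income, {mode: round(p, 3) for mode, p in shares.items()})
```

Because only income enters this model, the predicted shares vary smoothly with income and cannot reflect trip purpose, density, or age, which the later models add.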

Looking at the odds-ratio column of the model, Exp(B), we see that an increase in income decreases the odds of choosing every other travel mode relative to SOV, since every value is below 1. The further Exp(B) falls below 1, the stronger the negative effect of income on the odds of that mode relative to SOV. Based on the model, an increase in income has the largest effect on the bus mode relative to SOV; in other words, bus riding is the most income-sensitive of all the travel modes: a one-unit increase in ln_income multiplies the odds of choosing the bus rather than SOV by 0.412. The effects of income on all travel mode choices are statistically significant, since all values in the Sig column (p-values) are less than 0.05. It should be noted that this model is very simple and naïve, since it considers only the effect of income on travel mode choice.
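As a quick check on the interpretation, the sketch below recomputes Exp(B) from the B column of Table 4 and also expresses the effect per doubling of income, which is often easier to read than a one-unit change in ln_income (a one-unit change corresponds to multiplying income by e, roughly 2.72). The helper names are illustrative, not from the source.

```python
import math

# ln(HH income) coefficients (B) and reported Exp(B) values, copied from Table 4.
B = {"HOV": -0.063, "Bus": -0.887, "Rail": -0.170, "Bike": -0.152, "Walk": -0.138}
REPORTED_EXP_B = {"HOV": 0.939, "Bus": 0.412, "Rail": 0.844, "Bike": 0.859, "Walk": 0.871}


def odds_ratio_per_unit(b):
    """Odds multiplier for a one-unit increase in ln(income)."""
    return math.exp(b)


def odds_ratio_for_income_factor(b, factor):
    """Odds multiplier when income is multiplied by `factor` (2 means doubling)."""
    return factor ** b  # identical to exp(b * ln(factor))


if __name__ == "__main__":
    for mode, b in B.items():
        per_unit = odds_ratio_per_unit(b)
        per_doubling = odds_ratio_for_income_factor(b, 2)
        print(f"{mode:5s} Exp(B)={per_unit:.3f}  per doubling of income={per_doubling:.3f}")
```

Under these coefficients, doubling household income multiplies the odds of bus versus SOV by about 0.54, while the corresponding factor for HOV stays close to 1.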

As mentioned above, I introduced more independent variables into the model in order to observe the effect of each one on individuals' travel mode choice and to strengthen the model. The next variable is trip purpose, which has 4 categories: “work”, “shopping”, “family/personal”, and “social”. Adding trip purpose shows how this variable influences travel mode choice, controlling for log-transformed income. Table 5 shows travel mode choice frequency by trip purpose for all respondents. As shown in the table, the large majority of work trips (82.1%) are made by SOV rather than by the other modes of travel. Figure 2 shows Table 5 as a stacked bar chart.

MODE      WORK                 SHOPPING             FAMILY/PERSONAL      SOCIAL
          FREQUENCY      %     FREQUENCY      %     FREQUENCY      %     FREQUENCY      %
SOV         101,647   82.1        63,778   51.3        67,408   43.4        44,576   31.6
HOV          16,013   12.9        53,423   43.0        71,085   45.8        68,717   48.7
Bus           1,408    1.1           703    0.6           915    0.6           652    0.5
Rail            782    0.6           167    0.1           357    0.2           309    0.2
Bike          1,066    0.9           543    0.4           516    0.3         2,080    1.5
Walk          2,840    2.3         5,717    4.6        14,908    9.6        24,719   17.5
TOTAL       123,756  100.0       124,331  100.0       155,189  100.0       141,053  100.0

Table 5. Travel mode choice frequency by trip purpose for all respondents, Aged 18-65, 2009 NHTS

[Stacked bar chart "Travel mode choice by trip purpose": share of each travel mode (0% to 100%) for work, shopping, family/personal, and social trips.]

Figure 2. Travel mode choice by trip purpose, for all respondents, aged 18-65, 2009 NHTS

After adding trip purpose, the new estimate shows an improvement in pseudo-R2 compared with the previous model. Table 6 shows the results of the model.

TRAVEL MODE (REFERENCE CATEGORY IS SOV)         B       SIG    EXP(B)

HOV
  ln_income                                 -0.052     0.000     0.950
  Trip Purpose (omitted: social)
    Work                                    -2.281     0.000     0.102
    Shopping                                -0.618     0.000     0.539
    Family/Personal                         -0.384     0.000     0.681
  Intercept                                  1.018     0.000
Bus
  ln_income                                 -0.919     0.000     0.399
  Trip Purpose (omitted: social)
    Work                                    -0.008     0.860     0.992
    Shopping                                -0.478     0.000     0.620
    Family/Personal                         -0.172     0.001     0.842
  Intercept                                  5.793     0.000
Rail
  ln_income                                 -0.206     0.000     0.814
  Trip Purpose (omitted: social)
    Work                                     0.103     0.125     1.109
    Shopping                                -1.009     0.000     0.365
    Family/Personal                         -0.286     0.000     0.751
  Intercept                                 -2.659     0.000
Bike
  ln_income                                 -0.191     0.000     0.827
  Intercept                                 -0.920     0.000
Walk
  ln_income                                 -0.158     0.000     0.854
  Intercept                                  1.191     0.000

N                                          544,329
Pseudo-R2 (Nagelkerke)                       0.186

Table 6. Estimated Effects of Log-transformed Income and Trip Purpose on Mode Choice, Multinomial Logistic Regression Model, All Trip Purposes, Respondents Aged 18-65, 2009 NHTS

The increase in pseudo-R2 from 0.08 to 0.186 indicates the importance of trip purpose for individuals' travel mode choice. It should be noted that, even after adding trip purpose as a predictor, the pseudo-R2 has improved but is still considered low. Furthermore, pseudo-R2 by itself should not be treated as a very reliable measure. Dividing trip purpose into 4 bins (work, shopping, family/personal, and social) lets us analyze the variable in more detail. Based on the new model, we can conclude that, compared with social trips (the omitted category), the other trip purposes generally decrease the odds of using non-SOV modes relative to SOV; the work coefficients for bus and rail are the exception, as they are not statistically significant (p = 0.860 and p = 0.125). Among all the modes, biking and walking show the greatest sensitivity to trip purpose in their odds of being chosen relative to SOV. It is worth mentioning that for work trips the odds of HOV and walk relative to SOV fall sharply (Exp(B) = 0.102 for HOV), which indicates the importance of SOV as the dominant mode of travel to work among the respondents.
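For readers unfamiliar with the statistic, the Nagelkerke pseudo-R2 reported in the tables is conventionally defined as a rescaled Cox and Snell statistic. The definition below is the standard one, given here for reference; $L_0$ (likelihood of the intercept-only model), $L_1$ (likelihood of the fitted model), and $n$ (sample size) are notation introduced here and do not appear in the source.

```latex
R^2_{\text{CS}} = 1 - \left(\frac{L_0}{L_1}\right)^{2/n},
\qquad
R^2_{\text{N}} = \frac{R^2_{\text{CS}}}{1 - L_0^{\,2/n}}
```

The denominator rescales the Cox and Snell value so that its maximum is 1. The result is useful for comparing nested models fitted to the same data, as is done here, but it is not a share of variance explained in the ordinary least squares sense.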

In the next step, I added another predictor, population density, to the model. As in the previous step, to refine the estimate I made 4 dummy variables that divide population density (persons per square mile) into 4 categories, as follows:

Population density < 1,000

1,000 < Population density < 2,500

2,500 < Population density < 10,000, and

Population density > 10,000
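The categories above can be turned into dummy variables along the following lines. This is a minimal sketch rather than the SPSS procedure actually used: the labels and function names are hypothetical, and the treatment of values that fall exactly on a cut point (1,000, 2,500, or 10,000) is an assumption, since the source does not say which side they belong to. The densest category is treated as the omitted reference, matching the regression tables.

```python
# (label, lower bound inclusive, upper bound exclusive), persons per square mile.
# Boundary handling is an assumption; the source does not specify it.
DENSITY_BINS = [
    ("pop_lt_1000", 0, 1_000),
    ("pop_1000_2500", 1_000, 2_500),
    ("pop_2500_10000", 2_500, 10_000),
    ("pop_ge_10000", 10_000, float("inf")),
]
REFERENCE = "pop_ge_10000"  # omitted category in the regression tables


def density_category(density):
    """Return the label of the bin that contains `density`."""
    for label, low, high in DENSITY_BINS:
        if low <= density < high:
            return label
    raise ValueError(f"cannot bin density {density!r}")


def density_dummies(density):
    """Return 0/1 indicators for every category except the reference."""
    category = density_category(density)
    return {
        label: int(label == category)
        for label, _, _ in DENSITY_BINS
        if label != REFERENCE
    }


if __name__ == "__main__":
    for value in (500, 1_800, 6_000, 15_000):  # hypothetical densities
        print(value, density_dummies(value))
```

A respondent in the reference category gets zeros on all three indicators, which is why the coefficients in the regression tables are read as contrasts with areas above 10,000 persons per square mile.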

Since both ln_income and trip purpose improved the predictive capability of the estimate, it would be unwise to remove them from the model. So this time SPSS is used to estimate a new model of mode choice as predicted by ln_income, trip purpose, and population density. Before making the new estimate, I made a frequency table for the population density categories. Table 7 shows travel mode choice frequency by population density for all respondents.

As shown in Table 7, the share of SOV is markedly lower in the densest category (38.6% above 10,000 persons per square mile, compared with 51.0% to 52.5% in the other categories), although SOV remains the most popular travel mode in every category. Figure 3 shows the stacked bar chart of travel mode choice by population density category.

MODE      pop < 1K             1K < pop < 2.5K      2.5K < pop < 10K     pop > 10K
          FREQUENCY      %     FREQUENCY      %     FREQUENCY      %     FREQUENCY      %
SOV          94,452   51.9        66,318   52.5       105,129   51.0        11,510   38.6
HOV          73,328   40.3        48,807   38.7        77,582   37.6         9,521   31.9
Bus             342    0.2           409    0.3         1,624    0.8         1,303    4.4
Rail            221    0.1           133    0.1           488    0.2           773    2.6
Bike          1,088    0.6           825    0.7         1,939    0.9           353    1.2
Walk         12,484    6.9         9,779    7.7        19,548    9.5         6,373   21.4
TOTAL       181,915  100.0       126,271  100.0       206,310  100.0        29,833  100.0

Table 7. Travel mode choice frequency by population density for all respondents, Aged 18-65, 2009 NHTS

[Stacked bar chart "Travel mode choice by population density": share of each travel mode (0% to 100%) for pop<1,000, 1,000<pop<2,500, 2,500<pop<10,000, and pop>10,000.]

Figure 3. Travel mode choice by population density per square mile for all respondents, aged 18-65, 2009 NHTS

After adding the population density categories in SPSS, I ran the model, and once again the pseudo-R2 improves, from 0.186 to 0.208. Table 8 shows the results of the new model in detail.

From the Table 8 results, we can conclude that adding population density has improved the predictive power of the model. What is interesting to note is how sensitive the bus and rail modes, relative to SOV, are to population density. As shown in Table 8, the odds of using bus and rail relative to SOV fall sharply as the population per square mile decreases (Exp(B) = 0.041 for bus and 0.035 for rail in areas below 1,000 persons per square mile, compared with the omitted category above 10,000), which shows the dominance of SOV as the travel mode for respondents in rural areas. Conversely, in urban areas with a density above 10,000 persons per square mile, the odds of using bus and rail relative to SOV are much higher. After bus and rail, walking in the least populated areas has the lowest odds of being chosen relative to SOV, controlling for the other variables in the model.

TRAVEL MODE (REFERENCE CATEGORY IS SOV)                        B       SIG    EXP(B)

HOV
  ln_income                                                -0.049     0.000     0.952
  Trip purpose (omitted: Social)
    Work                                                   -2.284     0.000     0.102
    Shopping                                               -0.620     0.000     0.538
    Family/Personal                                        -0.384     0.000     0.681
  Population density/1,000/sq.mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -0.071     0.000     0.932
    1,000 < Pop < 2,500                                    -0.156     0.000     0.856
    2,500 < Pop < 10,000                                   -0.141     0.000     0.869
  Intercept                                                 1.101     0.000
Bus
  ln_income                                                -0.818     0.000     0.441
  Trip purpose (omitted: Social)
    Work                                                   -0.068     0.164     0.935
    Shopping                                               -0.555     0.000     0.574
    Family/Personal                                        -0.230     0.000     0.795
  Population density/1,000/sq.mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -3.200     0.000     0.041
    1,000 < Pop < 2,500                                    -2.586     0.000     0.075
    2,500 < Pop < 10,000                                   -1.788     0.000     0.167
  Intercept                                                 6.706     0.000
Rail
  ln_income                                                -0.073     0.008     0.930
  Trip purpose (omitted: Social)
    Work                                                    0.013     0.848     1.013
    Shopping                                               -1.109     0.000     0.330
    Family/Personal                                        -0.359     0.000     0.698
  Population density/1,000/sq.mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -3.364     0.000     0.035
    1,000 < Pop < 2,500                                    -3.505     0.000     0.030
    2,500 < Pop < 10,000                                   -2.673     0.000     0.069
  Intercept                                                -1.629     0.000
Bike
  ln_income                                                -0.172     0.000     0.842
  Trip purpose (omitted: Social)
    Work                                                   -1.502     0.000     0.223
    Shopping                                               -1.745     0.000     0.175
    Family/Personal                                        -1.835     0.000     0.160
  Population density/1,000/sq.mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -1.011     0.000     0.364
    1,000 < Pop < 2,500                                    -0.937     0.000     0.392
    2,500 < Pop < 10,000                                   -0.541     0.000     0.582
  Intercept                                                -0.388     0.048
Walk
  ln_income                                                -0.127     0.000     0.881
  Trip purpose (omitted: Social)
    Work                                                   -3.007     0.000     0.049
    Shopping                                               -1.872     0.000     0.154
    Family/Personal                                        -0.950     0.000     0.387
  Population density/1,000/sq.mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -1.468     0.000     0.230
    1,000 < Pop < 2,500                                    -1.394     0.000     0.248
    2,500 < Pop < 10,000                                   -1.147     0.000     0.318
  Intercept                                                 2.058     0.000

N                                                         544,329
Pseudo-R2 (Nagelkerke)                                      0.208

Table 8. Estimated Effects of Log-transformed Income, Trip Purpose and Population Density on Mode Choice, Multinomial Logistic Regression Model, All Trip Purposes, Respondents Aged 18-65, 2009 NHTS
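Because the densest category is the omitted one, every density coefficient in Table 8 compares a lower-density area with areas above 10,000 persons per square mile. The sketch below reverses that comparison to show how much higher the odds of each mode versus SOV are in the densest areas than in the least dense ones. The coefficients are copied from the "Pop < 1,000" rows of Table 8; the function name is illustrative.

```python
import math

# B for "Pop < 1,000" relative to the omitted "pop > 10,000" category (Table 8).
B_LOW_DENSITY = {
    "HOV": -0.071,
    "Bus": -3.200,
    "Rail": -3.364,
    "Bike": -1.011,
    "Walk": -1.468,
}


def flipped_odds_ratio(b):
    """Odds ratio with the comparison reversed: exp(-B), which equals 1 / exp(B)."""
    return math.exp(-b)


if __name__ == "__main__":
    for mode, b in B_LOW_DENSITY.items():
        print(
            f"{mode:5s} least dense vs densest: {math.exp(b):.3f}   "
            f"densest vs least dense: {flipped_odds_ratio(b):.1f}"
        )
```

Read this way, the model implies that the odds of bus versus SOV are roughly 25 times higher in the densest areas than in the least dense ones, holding income and trip purpose constant.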

In the next step, to see whether the model's predictive power would improve further, I added an age variable to the previous variables to find out how respondents' age influences the choice of other travel modes relative to SOV. As in the previous steps, to better understand the impact of age, I divided the age variable into 4 bins, as follows:

Respondent age 18 (minimum age of the sample population) to 21

Respondent age 21 to 35

Respondent age 35 to 55, and finally

Respondent age 55 to 65 (maximum age of the sample population)

I chose the 18 to 21 bin to find out the influence of age on mode choice relative to SOV for respondents who acquired their driver's licenses either recently or within the last 3 to 4 years. The other age categories are quite typical. Once again, before running the model, I made a frequency table of travel mode choice by age category for all respondents. Table 9 shows the frequency distribution, and Figure 4 shows the same table as a stacked bar chart.

MODE      18 < age < 21        21 < age < 35        35 < age < 55        55 < age <= 65
          FREQUENCY      %     FREQUENCY      %     FREQUENCY      %     FREQUENCY      %
SOV           7,740   47.5        29,113   43.9       134,621   50.3        90,450   55.1
HOV           6,751   41.4        29,517   44.5       105,453   39.4        56,383   34.4
Bus             255    1.6           620    0.9         1,683    0.6           904    0.6
Rail             44    0.3           286    0.4           728    0.3           453    0.3
Bike            162    1.0           487    0.7         2,230    0.8         1,063    0.6
Walk          1,347    8.3         6,287    9.5        23,083    8.6        14,798    9.0
TOTAL        16,299  100.0        66,310  100.0       267,798  100.0       164,051  100.0

Table 9. Travel mode choice frequency by age for all respondents, Aged 18-65, 2009 NHTS

[Stacked bar chart "Travel mode choice by age": share of each travel mode (0% to 100%) for 18<age<21, 21<age<35, 35<age<55, and 55<age<65.]

Figure 4. Travel mode by age for all respondents, aged 18-65, 2009 NHTS

After dividing the age variable into the 4 categories above, I ran the model; Table 10 shows the results. Again the pseudo-R2 improves, from 0.208 to 0.217, which indicates an improvement in the model's predictive power. Looking at the age categories in Table 10, we notice that most of the values in the Exp(B) column are greater than 1. We can interpret this outcome using the term “relative risk”. In SPSS, the ratio of the probability of choosing one outcome category to the probability of choosing the reference category is called the relative risk (or odds, as used earlier). So we can conclude that the log-odds of choosing other modes of travel versus SOV increase by 0.298 (for HOV), 0.912 (for bus), and 0.537 (for bike) when moving from the oldest age category (55 to 65, the omitted category) to the youngest (18 to 21); for HOV this corresponds to Exp(B) = 1.347, that is, odds about 35% higher. In general, as the age of the respondents goes up, the use of SOV increases relative to the other modes of travel, with one exception: in the 21 to 35 category, SOV use is lower than in the preceding 18 to 21 group.

TRAVEL MODE (REFERENCE CATEGORY IS SOV)                        B       SIG    EXP(B)

HOV
  ln_income                                                -0.054     0.000     0.947
  Trip purpose (omitted: Social)
    Work                                                   -2.333     0.000     0.097
    Shopping                                               -0.616     0.000     0.540
    Family/Personal                                        -0.403     0.000     0.668
  Population density/1,000/sq.mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -0.034     0.029     0.966
    1,000 < Pop < 2,500                                    -0.125     0.000     0.883
    2,500 < Pop < 10,000                                   -0.120     0.000     0.887
  Age of individual (omitted: +56)
    18 < Age < 21                                           0.298     0.000     1.347
    21 < Age < 35                                           0.664     0.000     1.943
    35 < Age < 55                                           0.345     0.000     1.413
  Intercept                                                 0.867     0.000
Bus
  ln_income                                                -0.810     0.000     0.445
  Trip purpose (omitted: Social)
    Work                                                   -0.088     0.072     0.916
    Shopping                                               -0.524     0.000     0.592
    Family/Personal                                        -0.236     0.000     0.790
  Population density/1,000/sq.mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -3.173     0.000     0.042
    1,000 < Pop < 2,500                                    -2.555     0.000     0.078
    2,500 < Pop < 10,000                                   -1.767     0.000     0.171
  Intercept                                                 6.377     0.000
Rail
  ln_income                                                -0.064     0.020     0.938
  Trip purpose (omitted: Social)
    Work                                                    0.000     0.997     1.000
    Shopping                                               -1.105     0.000     0.331
    Family/Personal                                        -0.357     0.000     0.700
  Population density/1,000/sq.mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -3.345     0.000     0.035
    1,000 < Pop < 2,500                                    -3.490     0.000     0.031
    2,500 < Pop < 10,000                                   -2.662     0.000     0.070
  Age of individual (omitted: +56)
    18 < Age < 21                                          -0.067     0.642     0.935
    21 < Age < 35                                           0.405     0.000     1.499
    35 < Age < 55                                          -0.020     0.743     0.981
  Intercept                                                -1.780     0.000
Bike
  ln_income                                                -0.187     0.000     0.829
  Trip purpose (omitted: Social)
    Work                                                   -1.545     0.000     0.213
    Shopping                                               -1.742     0.000     0.175
    Family/Personal                                        -1.859     0.000     0.156
  Population density/1,000/sq.mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -0.991     0.000     0.371
    1,000 < Pop < 2,500                                    -0.918     0.000     0.399
    2,500 < Pop < 10,000                                   -0.531     0.000     0.588
  Intercept                                                -0.530     0.007
Walk
  ln_income                                                -0.131     0.000     0.877
  Trip purpose (omitted: Social)
    Work                                                   -3.039     0.000     0.048
    Shopping                                               -1.875     0.000     0.153
    Family/Personal                                        -0.962     0.000     0.382
  Population density/1,000/sq.mi. (omitted: pop > 10,000)
    Pop < 1,000                                            -1.450     0.000     0.235
    1,000 < Pop < 2,500                                    -1.379     0.000     0.252
    2,500 < Pop < 10,000                                   -1.137     0.000     0.321
  Age of individual (omitted: +56)
    18 < Age < 21                                          -0.092     0.002     0.912
    21 < Age < 35                                           0.375     0.000     1.456
    35 < Age < 55                                           0.195     0.000     1.215
  Intercept                                                 1.964     0.000

N                                                         544,329
Pseudo-R2 (Nagelkerke)                                      0.217

Table 10. Estimated Effects of Log-transformed Income, Trip Purpose, Population Density and Age on Mode Choice, Multinomial Logistic Regression Model, All Trip Purposes, Respondents Aged 18-65, 2009 NHTS
