Private Personalized Dynamic Ride Sharing

Preeti Goel

Department of Computing and Information Systems, The University of Melbourne

Submitted in total fulfillment of the requirements of the degree of Doctor of Philosophy

Produced on archival quality paper

July, 2016

Abstract

Alleviating metropolitan traffic congestion is one of the major issues faced throughout the world. Falling car occupancy rates indicate that more than 80% of the congestion can be attributed to empty car seats. A highly promising approach to combat traffic congestion is dynamic ride sharing, which aims to utilize the transport capacity of the existing vehicles on roads. Dynamic ride sharing is a service that enables shared vehicle rides (on a one time or recurrent basis) in real time and at short notice. The growing use and popularity of smart phones and GPS enabled devices provides us with the tools required to efficiently implement a location-based service such as dynamic ride sharing and improve car occupancy rates. However, privacy and safety concerns are among the main obstacles faced when encouraging people to use such a service. In this thesis, we present a personalized dynamic ride sharing model built from the ground up with the objective of ensuring privacy, safety and trust for its users. In the first part of the thesis, we develop a model for dynamic ride sharing in which users submit trip intents (rather than exact location and time information) to the service provider to find potential matches. The proposed model highlights the privacy benefits of established pick up or drop off locations and selects these locations randomly from the set of arterial intersections, according to the population densities of suburbs. In the second part, we present models for optimal pick up or drop off location selection to enhance ride sharing such that passengers can be picked up collectively by drivers while retaining privacy guarantees. Finally, in the third part, we produce data that informs the selection of pick up or drop off locations so that ride sharing can accommodate special events and traffic flows that do not follow the population density of suburbs.

Our main contribution in this research is a privacy aware dynamic ride sharing system. We show that it is feasible to combine privacy with convenience while maintaining utility, and that our system enhances opportunities for ride sharing. We present extensive experimental evaluations that validate the effectiveness of our privacy protection models and demonstrate the efficiency of our algorithms. We also demonstrate that dynamic ride sharing can yield substantial reductions in congestion and overall travel distance: for the city of Melbourne, with 11.6 million trips per weekday and an average trip length of 10.2 km, our proposed dynamic ride sharing model would save 35.95 million km per weekday.

Declaration

This is to certify that

1. the thesis comprises only my original work towards the degree of Doctor of Philosophy except where indicated in the Preface,

2. due acknowledgment has been made in the text to all other material used,

3. the thesis is fewer than 100,000 words in length, exclusive of tables, maps, bibliographies and appendices.

Preeti Goel

Acknowledgments

I would like to express my deepest gratitude to my supervisors, Prof. Ramamohanarao (Rao) Kotagiri and Prof. Lars Kulik, for their constant support, encouragement and guidance during my PhD study. I am grateful for their confidence in my capabilities, which always inspired me to produce quality research. I consider myself fortunate to have been part of discussions with them involving different aspects of life and society, and I have benefited immensely from their valuable advice at different stages of my life. The discussions with my advisors helped me develop into a creative and mature researcher. This thesis would not have been possible without their patience, understanding and advice. I would also like to thank Prof. James Bailey for his support and advice as the chair and member of my Advisory Committee.

I would like to thank the University of Melbourne, the CIS Department and, personally, the Head of the Department Prof. Justin Zobel for providing excellent conditions for cutting edge research.

I wish to thank all members of SUM Laboratory for being part of my research journey. I will miss our interesting group meetings and the fun-filled BBQ events.

I am indebted to my parents for their unconditional support, love and affection throughout my life. My father always encouraged me to ask questions, which translated into my passion and love for research. My parents have always been proud of me, which inspires me to work hard towards achieving my goals. Special thanks to my sister Karishma Goel for her belief in me and her encouragement, which cheered me up at some difficult times during my PhD candidature.

My deepest thanks to Nitin, my dear husband, for his support, optimism and love. Nothing has made the importance of family more apparent to me than the arrival of my son Yuven in the final year of my PhD. Love to my little boys, Evaan and Yuven, whose ever-smiling faces have always refreshed me and given me so much joy.

Preface

This thesis contains six chapters. The first two chapters provide an introduction to the problem and the background and related work. The last chapter summarizes and concludes the thesis and proposes future research directions. The remaining chapters cover the core research topics. No part of the thesis has previously been submitted for any degree, nor was any of the work conducted while under employment.

Chapter 3 is based on the following:
Preeti Goel, Lars Kulik, and Ramamohanarao Kotagiri. 2015. Privacy Aware Dynamic Ride Sharing. ACM Transactions on Spatial Algorithms and Systems, Accepted November 2015.

Chapter 4 is based on the following:
Preeti Goel, Lars Kulik, and Ramamohanarao Kotagiri. 2015. Optimal Pick up Point Selection for Effective Ride Sharing. Under Review (IEEE Transactions on Big Data, November 2015).

Chapter 5 is based on the following:
Preeti Goel, Lars Kulik, and Ramamohanarao Kotagiri. Privacy aware trajectory determination in road traffic networks. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’12, pages 406-409, New York, NY, USA, 2012. ACM.

This thesis was prepared in LaTeX. The algorithms included were written in Matlab and Java and run on the Windows 7 operating system.

Contents

1 Introduction
1.1 Motivation
1.1.1 Benefits of Ride Sharing
1.1.2 Features of Ride Sharing
1.2 Challenges
1.2.1 Privacy Issues
1.3 The Importance of Economics for Ride Sharing
1.4 Our Research Problem
1.4.1 Privacy Aware Dynamic Ride Sharing
1.4.2 Optimal Pick up Point Selection for Effective Ride Sharing
1.4.3 Privacy Aware Trajectory Determination for Enhancing Ride Sharing
1.5 Contributions
1.6 Outline
1.7 Papers Resulted from this Thesis

2 Background
2.1 Data Privacy: Why and How
2.1.1 k-Anonymity
2.1.2 l-Diversity and t-Closeness
2.1.3 Differential Privacy
2.2 Location Privacy
2.2.1 Location Privacy: What and Why?
2.2.2 Methods for Ensuring Location Privacy
2.2.2.1 Anonymity
2.2.2.2 Obfuscation
2.2.3 System Architectures for Location Privacy
2.2.3.1 Client Server Architecture
2.2.3.2 Centralized Trusted Third Party Architecture (TTP)
2.2.3.3 Peer to Peer Architecture
2.2.4 People’s Perception of Location Privacy
2.2.5 Location Privacy Laws
2.3 Vehicle Trajectory Privacy
2.3.1 Anonymization (k-anonymity, l-diversity) Based Techniques
2.3.2 Differential Privacy Based Techniques
2.3.3 Metrics
2.3.4 Recording Partial Information
2.3.5 Comparison of Models
2.4 Transport Management
2.4.1 Transport Queries
2.4.2 Origin-Destination (OD) Matrices
2.4.3 Trajectory Estimation
2.4.4 Prediction Models
2.5 Ride Sharing
2.5.1 Definitions and Variants
2.5.2 History and Trends
2.5.3 State of the Art
2.5.4 Privacy and Ride Sharing
2.5.5 Other Shared Vehicle Transport
2.5.5.1 Hitchhiking
2.5.5.2 On Demand Transportation (Vanpools)
2.5.5.3 Taxi Sharing
2.5.5.4 Bike Sharing
2.5.6 Popular Ride Sharing Services

3 Privacy Aware Dynamic Ride Sharing
3.1 Overview
3.2 Fundamental Concepts
3.2.1 Terminology
3.2.2 Concepts
3.2.3 System Goals
3.3 System Architecture
3.3.1 Model 1: eBay Model
3.3.1.1 Procedure
3.3.2 Model 2: Match Maker Model
3.3.2.1 Procedure
3.3.2.2 Privacy based Negotiation
3.3.3 Comparison of Models
3.3.4 Rating
3.4 Attacker Model
3.4.1 Passenger Location Tracking Attack
3.4.2 Driver Trajectory Tracking Attack
3.4.3 Discussion
3.5 Optimization Problem
3.5.1 Driver: Optimal Path Computation
3.5.1.1 Complexity Analysis
3.5.2 Passenger: Optimal Driver Selection
3.5.2.1 Complexity Analysis
3.5.3 Complexity Analysis of eBay model
3.5.4 Discussion
3.6 Experimental Evaluation
3.6.1 Simulation Setting
3.6.2 Simulation Environment
3.6.3 Measurements
3.6.4 Ride Sharing
3.6.4.1 Choice of Parameter k (Number of Passengers Negotiating Per Driver)
3.6.4.2 Effect of Driver to Passenger (D/P) Ratio
3.6.4.3 Effect of User Density
3.6.4.4 Effect of Driver Ellipse Size
3.6.4.5 Travel Time
3.6.4.6 Number of Ride Sharing Rounds
3.6.4.7 Economic Incentive of Ride Sharing vs Taxi
3.6.5 Comparison of Negotiation Strategies
3.6.6 Effect of Ride Sharing Parameters on Negotiation Strategies
3.6.6.1 Effect of k (Number of Passengers Negotiating Per Driver)
3.6.6.2 Effect of DE Size
3.6.6.3 Effect of D/P Ratio
3.7 Related Work
3.7.1 Space Time Prism and Imprecision
3.7.2 Optimal Path Computation
3.7.3 GPS Traces
3.7.4 Game Theoretic Approaches
3.8 Summary

4 Optimal Pick up Point Selection for Effective Ride Sharing
4.1 Overview
4.2.1 Terminology
4.2.2 Concepts
4.2.2.1 Anonymity
4.2.2.2 Maximum Pick up Point Distance (MPD)
4.2.2.3 Coverage
4.2.2.4 Partial Coverage
4.2.2.5 Walkability
4.2.3 Location Privacy
4.2.3.1 Impact on Location Privacy
4.2.4 Goals
4.3 PuP Selection
4.3.1 Greedy Randomized Adaptive Search Procedure
4.3.2 Our Model
4.3.3 Optimal Coverage Solution
4.3.4 Guaranteed k-Anonymity Solution
4.3.5 PuP Ranking Based Solution
4.4 Experiments
4.4.1 Simulation Setup
4.4.1.1 Maximum Pick up point Distance
4.4.1.2 Measurements
4.4.2 Dynamic Ride Sharing without PuP Selection
4.4.3 Optimal Coverage based PuP Selection
4.4.3.1 k-Anonymity vs PuP Selection Strategy
4.4.4 Guaranteed k-Anonymity PuP Selection
4.4.5 Optimal Coverage vs Guaranteed k-Anonymity
4.4.6 Ride Sharing for PuP Selection Strategies
4.4.7 PuP Ranking within Certain Distance of POI
4.5 Related Work
4.5.1 Coverage
4.5.2 Public Transport Access Coverage
4.5.3 Sensor Networks
4.5.4 Facility Location
4.5.5 Urban Computing
4.6 Summary

5 Privacy Aware Trajectory Computation for Effective Ride Sharing
5.1 Overview
5.2 Challenges
5.3.1 Road Network Model
5.3.2 Vehicle Re-identification and Local Transition Matrix
5.3.3 k-Anonymous l-Grouping
5.4 Implementation Techniques
5.4.1 Local Transition Matrix
5.4.1.1 Vehicle Re-identification using Vehicle Image Processing
5.4.1.2 Vehicle Re-identification using RFID Tags
5.4.1.3 Discussion
5.4.2 k-Anonymity
5.5 Trajectory Estimation and OD Matrix Estimation
5.5.1 Trajectory Estimation
5.5.2 OD Matrix Determination
5.6 Aggregate Trip Information based Ride Sharing Solution
5.7 Experimental Evaluation
5.7.1 Dataset
5.7.2 Measurements
5.7.3 Experiments
5.7.3.1 Effect of Number of Vehicles
5.7.3.2 Effect of Increasing the Network Size
5.7.3.3 Effect of Increasing the l-Grouping
5.7.3.4 Effect of Increasing the Trajectory Length
5.7.3.5 k-Anonymity
5.7.3.6 PuP Selection with Traffic Flow Information
5.8 Summary

6 Conclusion and Future Work
6.1 Privacy Aware Dynamic Ride Sharing
6.2 Optimal Pick up Point Selection for Effective Ride Sharing
6.3 Privacy Aware Trajectory Computation for Effective Ride Sharing
6.4 Future Work
6.4.1 Dynamic Partial Ride Sharing
6.4.2 Monetary Model for Multiple Passengers Sharing Rides
6.4.3 Impact of Uncertainty due to Congestion and Other Events
6.4.4 Integrated Public Private Ride Sharing System
6.4.5 Highly Dynamic Ride Sharing System

Bibliography

List of Figures

1.1 Metropolitan Traffic Congestion [1]
1.2 Avoidable Congestion Cost, Australia [2]
1.3 Privacy-Utility Tradeoff
1.4 Privacy Preserving Dynamic Ride Sharing System

2.1 Link Attacks [3]
2.2 A (2, δ) Anonymity Set: Two trajectories with their cylindrical volumes. The central cylindrical 2-anonymous volume with radius δ/2 [4]
2.3 Anonymization: (a) Trajectories tr1 and tr2; (b) Anonymization of tr1 and tr2
2.4 Reconstruction Process: Reconstruction to two new trajectories
2.5 Prefix Tree [5]
2.6 Distance Graph: The p% contemporary is calculated for trajectories $T_1 = (t^1_1, x^1_1, y^1_1) \ldots (t^1_n, x^1_n, y^1_n)$ and $T_2 = (t^2_1, x^2_1, y^2_1) \ldots (t^2_n, x^2_n, y^2_n)$ as the percentage overlap: $p = 100 \times \min(40/\mathrm{Duration}(T_1),\ 40/\mathrm{Duration}(T_2)) = 29\%$. The distance between $T_1$ and $T_2$ in the distance graph is the distance of the location coordinates in the overlap time period.
2.7 Relationship Between the Trajectory k-Anonymity Based Definitions

3.1 Driver Ellipse: F1 (source) and F2 (destination) are the two foci of the ellipse. a and b are the respective major and minor axes. For any point P in the ellipse, PF1 + PF2 < 2a.
3.2 Path Cost: The driver path from src to dest is used to compute the cost ($c^{d_i}_P$).
3.3 Match Maker Model
3.4 Privacy Based Negotiation
3.5 User Connectivity Graph
3.6 Recursive Ellipse Computation
3.7 Relative Reduction in Vehicle km
3.8 Impact of k on Occupancy, Vehicles (km) and Traffic Load Reduction: k is the maximum number of passengers in every driver ellipse with at most 3 passenger vacancies.
3.9 Effect of D/P Ratio: Impact of driver to passenger ratio on occupancy, vehicles (km) and traffic load reduction.
3.10 Effect of User Density: Impact of user density on occupancy, vehicles (km) and traffic load reduction.
3.11 Effect of DE Size: Impact of driver ellipse size on occupancy, vehicles (km) and traffic load reduction.
3.12 Travel Time
3.13 Comparison of Negotiation Strategies
3.14 Effect of Ride Sharing Parameters on Negotiation Strategies

4.1 Maximum PuP Distance (km)
4.2 User Location Estimation based on PuP Selection
4.3 k-Anonymity for PuP (Colored in Red): Number of individuals in its catchment area.
4.4 Equilateral Triangle Network with Circle at Every Vertex. $X = R\sqrt{3}$
4.5 Walkability: All three PuPs are within the maximum pick up point distance for passenger pi. The walkability is the distance to the nearest PuP, which is the distance from pi.to to PuP3.c. However, during the ride sharing process the passenger might be able to negotiate a ride successfully with a driver traveling to PuP1, changing the actual travel distance.
4.6 Voronoi Cells: For the 10 selected PuPs, k-Anonymity of a PuP is the number of individuals in the Voronoi cell of the PuP.
4.7 Population Density for Melbourne SA2’s
4.8 Area of SA2’s and MPD
4.9 Number of PuPs within k-Anonymity Range
4.10 Coverage Pareto Front for Population based PuP Selection
4.11 Walkability Pareto Fronts for Population based PuP Selection
4.12 Walkability and k-Anonymity for a PuP Placement on Pareto Front (k = 10000)
4.13 Ride Sharing Occupancy for Dynamic Ride Sharing and Optimal Coverage Solutions (using selected solution from coverage Pareto front)
4.14 Voronoi Cells for a PuP Placement on Pareto Front (k = 10000)
4.15 Mean k-Anonymity (Voronoi cell) Pareto Front for Population based PuP Selection
4.16 Minimum k-Anonymity (Voronoi cell) Pareto Front for Population based PuP Selection
4.17 k-Anonymity with 2 Closest PuP (user can select any of the 2 closest) Strategy
4.18 Number of Users without a PuP within MPD
4.19 k-Anonymity and Walkability (km) vs PuP Selection Strategy
4.20 Coverage and k-Anonymity Pareto Fronts for Voronoi cell based PuP Selection
4.21 Pareto Front for the Combined Set of Solutions
4.22 Occupancy vs Number of PuPs
4.23 Ride Sharing Occupancy for PuP Selection Strategies
4.24 Average Vehicle Travel Distance per Traveler
4.25 Melbourne Train Network and Selected PuP Placement
4.26 Coverage and k-Anonymity (Mean) for PuP Selection within 1 km of Train Stations

5.1 Local Transition Matrix Mn5
5.2 Local Vehicle Re-identification
5.3 False Path: A false path between node M and R could be detected because of the continuous flow of traffic between the nodes
5.4 Trajectory Estimation between A (source) and F (destination)
5.5 RMSE vs Number of Vehicles
5.6 Performance vs Number of Vehicles
5.7 Performance vs Network Size
5.8 Effect of l-Grouping
5.9 Performance vs Trajectory Length
5.10 System k-Anonymity, k ≥ 7 with Confidence = 80%
5.11 Boundary k-Anonymity vs Performance
5.12 Ride Sharing Occupancy for PuP Selection with Traffic Flow vs Optimal Coverage Solution

List of Tables

2.1 Medical Data
2.2 2-Anonymous Table
2.3 Attacks on 2-Anonymous Table
2.4 Comparative Analysis of Vehicle Trajectory Privacy Models
2.5 Properties of Markov Chains Describing Road Network Properties
2.6 History of Ride Sharing
2.7 Popular Ride Sourcing Services

3.1 Terminology
3.2 Ride Sharing Models
3.3 Melbourne Transport Indicators (VISTA)
3.4 Experiment Settings

4.1 Comparison of Travel Modes
4.2 Terminology

5.1 OD Matrix Row for Source Node A
5.2 Road Networks
5.3 Measurements
5.4 Experiment Settings

Chapter 1

Introduction

1.1 Motivation

Alleviating metropolitan traffic congestion (Figure 1.1) is one of the major issues faced throughout the world. In Australia, the traffic volume has doubled in the last 25 years. It has been estimated that traffic congestion would cost the Australian economy more than $35 billion by 2030 (Figure 1.2). Adding to the congestion are rising car ownership rates, which have increased by almost 40% in the last three decades [6]. On the other hand, car occupancy rates have been falling in most developed countries; for instance, in Australia the average car occupancy is between 1.15 and 1.25 [7]. The low occupancy rates suggest that more than 80% (assuming an average of 5 car seats) of this annual delay cost is generated by empty seats, and that fewer than 2 in 10 commuters share a ride to work.
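
The occupancy argument above can be illustrated with a short calculation. This is only a rough sketch, not the calculation used in the cited sources: it assumes 5 seats per car (as stated above) and exactly one driver per car, and the function names are hypothetical. It reports the share of seats that travel empty for the quoted occupancy range, counting either all seats or passenger seats only.

```python
# Illustrative check of the empty-seat share implied by average car occupancy.
# Assumption (from the text): an average car has 5 seats.
# Assumption (ours): every car carries exactly one driver.
# All names below are hypothetical and chosen for this sketch only.

SEATS_PER_CAR = 5


def empty_seat_share(occupancy: float, seats: int = SEATS_PER_CAR) -> float:
    """Fraction of all seats (driver's seat included) that travel empty."""
    return (seats - occupancy) / seats


def empty_passenger_seat_share(occupancy: float, seats: int = SEATS_PER_CAR) -> float:
    """Fraction of passenger seats (driver's seat excluded) that travel empty."""
    passenger_seats = seats - 1
    passengers = occupancy - 1  # occupants other than the driver
    return (passenger_seats - passengers) / passenger_seats


if __name__ == "__main__":
    for occupancy in (1.15, 1.25):  # occupancy range reported for Australia [7]
        print(
            f"occupancy {occupancy:.2f}: "
            f"{empty_seat_share(occupancy):.0%} of all seats empty, "
            f"{empty_passenger_seat_share(occupancy):.0%} of passenger seats empty"
        )
```

Under these assumptions, roughly three quarters of all seats and well over 90% of passenger seats travel empty, which indicates the scale of unused capacity that ride sharing aims to exploit.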

In the last two decades, the growth in traffic has been countered by measures such as significantly enhancing the road network capacity (building new infrastructure such as tunnels and freeways, and increasing the number of lanes) and improving public transport. However, high costs and shrinking urban space limit the extent to which new road infrastructure can be built in the future. Public transport is also only viable in areas of denser population. Smart cars, which can shorten trips by finding alternate routes, may be a future measure against congestion, but the number of vehicles on the roads remains the same. A key approach to mitigating congestion is therefore transport sharing, which utilizes the vehicles already on the road. This leads to the introduction of ride sharing as a complementary approach to address the growing traffic congestion.

Figure 1.1: Metropolitan Traffic Congestion [1]

Figure 1.2: Avoidable Congestion Cost, Australia [2]

Static ride sharing, in which users share or post their trip information in advance, has been around since the time of World War II (1939-1945). The overhead of prearrangement (where users post their trips in advance) pays off by facilitating longer trips or recurrent commuter routes (rides from home to work and vice versa). The prearrangement requirement means that participants agree to provide their trip details in advance (usually hours or a day or two), which makes static ride sharing attractive for recurrent and scheduled user trips. Although recurrent or static ride sharing can be easier to operate, optimize and implement, it cannot fully exploit the potential opportunities for ride sharing, since a significant amount of travel is inherently spontaneous. In such cases the pre-commitment of time and route can be very limiting. In addition, most static ride sharing systems require users to post their trip information with additional identification, which raises privacy concerns. Car pooling and static high occupancy vehicle lanes are examples of static ride sharing. In static systems, the need for prearrangement of rides and the additional information overhead mean that they provide only limited benefits in increasing the uptake of ride sharing [8].

This motivates the need for dynamic ride sharing to increase the opportunities for shared instant trips, which can increase car occupancy rates. Dynamic ride sharing is a service which facilitates shared rides instantly or at short notice. Drivers are matched to passengers with whom they can share one-time trips while driving to their respective destinations. The growing use and popularity of smart phones and GPS enabled devices provides us with the tools required to efficiently implement dynamic ride sharing. Since dynamic ride sharing offers flexibility to its users, it can exploit the existing vehicles on roads and enjoy economies of scale due to increased participation. Avego¹, Uber² and Lyft³ are classified as modern dynamic ride sharing applications. However, these services are more ride sourcing than ride sharing, since they rely on dedicated drivers rather than individuals sharing rides on the way to their respective destinations. These services do offer the option of sharing rides, but the supply comes in the form of dedicated drivers similar to cabs (the current taxi system can also accommodate ride sharing as an option for passengers).

¹ https://carmacarpool.com/
² https://www.uber.com/
³ https://www.lyft.com/

1.1.1 Benefits of Ride Sharing

Ride sharing offers the following benefits:

• Congestion Reduction: Traffic congestion leads to an increase in average travel times, increased vehicle operating costs (due to frequent stop-starts) and increased driver stress and inconvenience. The aggregate congestion cost estimate for Australia for the 2015 financial year is approximately $16.5 billion [2]. This national metropolitan total comprises approximately $6 billion in private time costs (losses from trip delay and travel time variability), $8 billion in business time costs (trip delay plus variability), $1.5 billion in extra vehicle operating costs, and $1 billion in extra air pollution damage costs. By utilizing the existing vehicle capacity, ride sharing can help bring down congestion, especially in peak traffic hours when ride sharing is most feasible.

• Cost: Other than reducing travel cost by reducing congestion, ride sharing also offers an incentive to drivers and passengers by sharing the trip cost (fuel, toll, parking). Drivers can earn revenue by charging passengers for the trip, and the incentive for passengers is that a ride sharing trip costs less than taking a cab or a private vehicle.

• Travel time: Car travel times are expected to increase by at least 20% (and may even double) in urban Australia by 2030 [9]. In addition to the reduced travel time due to decreased congestion (especially in peak traffic hours), ride sharing vehicles can also take advantage of the existing High Occupancy Vehicle (HOV) lanes.

• Convenience: Ride sharing services give users a travel option other than public transport, a cab or a private vehicle. Some ride sharing services also let users state their preferences for the type of vehicle; for example, Uber has a luxury alternative which provides luxury vehicles for passengers at an added cost. Also, since there are a limited number of parking spots in cities and at popular work destinations, ride sharing can help reduce the stress of finding a parking spot by reducing the number of cars heading for a destination.

• On Demand: Unlike cabs, which are fixed in number for a city, ride sharing vehicles can operate on an on-demand basis. Every transport system goes through periods of peak demand. Public transport systems (including cabs) cannot be designed for these periods of peak demand, and this is when a model like ride sharing can help ease traffic congestion by utilizing the existing vehicles on the roads.

• Environment: The impact of congestion on the environment is catastrophic, and cities are now taking desperate steps to combat growing levels of heavy air pollution. For example, a red alert day was recently declared in the city of Beijing (more than 3 consecutive days of smog) [10], since the poisonous particles in the air were an order of magnitude higher than the safe level set by the World Health Organization. The city came to a halt, with schools being asked to close (or operate with air filtration systems) and restrictions placed on traffic and factories. There have been attempts to reduce the number of cars on the road by limiting them according to the last digit of their number plate (odd and even numbers operating on alternate days) in the cities of Beijing, China and Delhi, India [11]. Ride sharing reduces the number of vehicles on the road, which helps in reducing vehicular air pollution, noise pollution and fuel consumption.

In today’s world, ride sharing is one of the most promising approaches to combat the ever increasing number of cars, which often results in traffic jams during peak hours in cities. Interest in ride sharing has increased with the launch of services such as Avego, coseats, Zimride, Uber and flinc, which target this area. Therefore, ride sharing can be an important step towards significantly reducing congestion problems in the future.

1.1.2 Features of Ride Sharing

We distinguish between static and dynamic ride sharing models. The static model requires the participants to prearrange rides, mostly for recurrent ride sharing, while dynamic (real-time, ad-hoc, on-demand or instant) ride sharing is a service that requires an automated system facilitating one-time (mostly non-recurrent) shared trips for drivers and passengers in real time or on short notice [12]. Static as well as dynamic ride sharing systems can have the following features:

• Cost sharing which provides economic incentives for both drivers and passengers.

• Independent decision making which considers drivers and passengers as private entities (unlike cabs), making independent decisions with respect to ride sharing. However, most ride sharing systems simply consider user preferences to match the best possible candidates without further user choice.

• Optimizing system metrics such as travel time, travel distance or number of passengers.

Dynamic ride sharing services collect user location and time information to facilitate user matching, which raises significant privacy concerns. The collection of such highly precise spatio-temporal data over an extended period of time can lead to inferences about an individual’s beliefs, friends, preferences, religion or health without their knowledge or permission. Lyft and Uber have recently faced user privacy concerns as they collect sensitive user trip information, and their privacy policies have been questioned over the amount of access given to their employees [13, 14]. In addition, there are user safety concerns; for instance, Uber has recently faced widespread backlash and has been banned in a number of cities in India and Europe (Berlin) due to user safety incidents [15, 16].

Further, an important area in ride sharing is personalization. Ride sharing has independent users with individual preferences such as type of vehicle, driver gender, smoking status, travel time, user rating and cost. Trust is the degree to which a system user can rely on the integrity, ability (driving ability) or behavior of another user. It determines whether the two would like to consider each other for a possible ride share. A ride sharing system needs to implement proper authentication and ratings for all users to ensure trust. A ride sharing service that provides users with a personalized experience by incorporating their preferences will be attractive. Social networks can be used to establish trust and accountability for the users. Ride sharing services such as Uber and Lyft require users to provide a rating as a measure of safety and satisfaction. Uber provides different types of services and pricing based on the type of vehicle in its Lux, Exec and UberX models.



This thesis will develop ride sharing models that are built with the objective of privacy and safety for their users while offering personalization.

1.2 Challenges

For a ride sharing service to be successful, it needs to tackle the challenges of designing a personalized ride sharing system which builds user trust and handles user safety and privacy.

In contrast to static ride sharing, dynamic ride sharing has an additional and unique set of challenges:

• On demand trips that can be arranged in real time or on short notice.

• Ability to offer non-recurring one-time trips according to the current traffic conditions.

Ride sharing has different variants, where the drivers can require passenger trips to be either inclusive (both the passenger source and destination are part of the driver’s trip) or partial (either the passenger source or destination or both fall outside the driver trip). Further, the routes can either be the original unchanged driver routes or can include detours to pick up passengers. In addition, the drivers might choose to share rides with a single passenger or with multiple passengers [17].

In contrast to on demand transport systems such as taxi sharing services [18] and dial-a-ride services [19], ride sharing has the added complexity of managing user trust, privacy and safety. In a taxicab service the users place their trust in the central operational body which manages all the taxis and their registered drivers. Taxicabs are also subject to additional safety requirements; for example, some cities have cameras installed in all taxicabs, and the taxicabs are monitored centrally by GPS. A taxi system has accurate information about the taxi (driver) locations and occupancy, which is used to make decisions and divert taxis to pick up additional passengers.

However, in a dynamic ride sharing system both the drivers and passengers are independent users who can dynamically register for trips. In addition, the number of ride sharing vehicles can be several thousand and much larger than the number of taxis. A system wide rating based mechanism is needed to indicate the trustworthiness of each system participant. In addition, privacy is a greater concern as individuals might not want to disclose their identities and exact location and time information to drivers (or vice versa) unless an agreement to share a ride has been reached. The drivers and passengers can accept or decline a ride share offer depending on their preferences. Unlike on demand transport systems, which have a fixed number of drivers running in the system, the number of drivers in a dynamic ride sharing system keeps changing. This makes it difficult to predict driver availability with high probability. Additionally, there can be user preferences (such as female only, non smoker) which can be ensured in a taxicab system but add constraints for choosing candidates in a ride sharing system. So the ride sharing system needs to be personalized to consider user preferences such as rating, vehicle type and non smoker.

Ride sharing with multiple passengers can only be effective if we have highly efficient algorithms computing on demand trips such that the occupancy of vehicles is maximized. One approach to computing on demand trips is to treat the problem as a continuous detour query problem [20], in which the objective is to find the shortest path between two locations with stop-overs in between. However, the constraints in ride sharing are more challenging. In dynamic ride sharing every stop-over can have multiple passengers, and every passenger has a corresponding destination stop-over. Committing to one passenger involves committing to go through another stop-over. In addition, the location and time constraints of the passengers also need to be considered.

1.2.1 Privacy Issues

The ability of any location based application to collect location data raises privacy concerns for individuals. User trip information can be collected, and historical trajectory data is available to the service providers. Analysis of such highly precise spatio-temporal data, when collected over an interval of time, can lead to the identification of individuals, their home/work locations, and their behavior. Highly detailed information about a person’s beliefs, friends, preferences, religion and health can be inferred by analyzing the location data without their knowledge or permission. This information can be used to influence, target or threaten the individual [21, 22]. We believe that concerns about privacy and user safety after recent incidents with popular ride sharing applications such as Uber and Lyft [13, 14, 23] have highlighted the need for privacy when sharing location data.

Data can be inherently imperfect or purposefully degraded for privacy purposes. There can be different kinds of uncertainty:

Imprecision: Imprecision is the limitation on the granularity of the collected data. For spatio-temporal data it can be either spatial or temporal imprecision.

• Spatial Imprecision: A huge volume of data can be collected if data is monitored at every node (street) in the network. However, it may suffice to collect the data only at some important points which give the same semantic meaning to the trajectory. The selection of network nodes can depend on the importance of an edge. For example, major or arterial roads can have more data collection points than minor roads. For ride sharing, the limited number of pick up points reduces the possible points of data collection of an individual’s location information. Another example of spatial imprecision is obfuscation or data masking, in which the data is deliberately degraded to protect an individual’s privacy. For example, instead of giving the exact location on the street, a user provides only an approximate location by giving the name of the suburb they are in.

• Temporal Imprecision: How frequently the data is collected is another factor which controls the amount and quality of data collected. Sparsely collected data can distort the shape of the original trajectory but can lead to higher privacy.
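The two kinds of imprecision above can be illustrated with a minimal sketch. It is not the mechanism used later in this thesis: the grid cell size, the function names `grid_cell` and `downsample`, and the sample coordinates are all assumptions made for illustration. Spatial imprecision is modelled by reporting only the grid cell containing a location, and temporal imprecision by dropping fixes that arrive too soon after the last reported one.

```python
import math

def grid_cell(lat, lon, cell_deg=0.01):
    """Spatial imprecision: report only the index of the grid cell containing the point.

    cell_deg is a hypothetical cell size in degrees; larger cells mean less precision.
    """
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def downsample(trajectory, min_gap_s):
    """Temporal imprecision: keep a fix only if at least min_gap_s seconds
    have passed since the last kept fix. Each fix is a tuple (x, y, t)."""
    kept = []
    for x, y, t in trajectory:
        if not kept or t - kept[-1][2] >= min_gap_s:
            kept.append((x, y, t))
    return kept

# Two nearby points (illustrative coordinates) fall into the same cell.
print(grid_cell(-37.8136, 144.9631), grid_cell(-37.8140, 144.9635))

# A fix every 10 s, reported at most every 30 s.
trajectory = [(0.0, 0.0, t) for t in range(0, 101, 10)]
print([t for _, _, t in downsample(trajectory, 30)])
```

Coarser cells and longer gaps both reduce what can be inferred about an individual, at the cost of a less faithful trajectory.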

Incompleteness: To what extent is the data collected? A vehicle can be completely identified by reading a unique identifier such as its registration number or RFID tag, but this leads to zero privacy. One approach is to use a changed or suppressed identifier for the vehicle, but this can still lead to persistent tracking. Another approach is to collect partial or incomplete information about the vehicle, such as its color or brand (Section 5.4.1.1).

Inaccuracy: If the collected data deviates from its true values, then the data collection is inaccurate. For example, noise in data adds to inaccuracy.

Vagueness: Vagueness is imprecision in the terms used to describe a concept. For example, a distance from a point described as far or near is vague, as there is no set definition for near in metric units and it can mean different distances for different individuals.



Figure 1.3: Privacy-Utility Tradeoff (axes: an individual's privacy against the utility of the data; very high privacy demands lead to low utility)

The research challenge associated with the analysis of the collected data is to ensure adequate privacy for an individual while still meeting the objective of the desired application. Privacy for an individual and efficiency of the requested service are conflicting goals. Figure 1.3 shows the privacy-utility trade-off. As shown in the figure, the privacy-utility curve can be different for different applications. The optimal privacy solution for a desired level of accuracy depends on the query or the system application. Some types of queries (for example, count queries) might be able to get higher privacy guarantees while providing acceptable accuracy. Participatory sensor networks gather sensory data (weather, mobility, environment) from a group of people to retrieve information for applications such as urban mobility, congestion, health and wellness. An individual might participate in a service and volunteer to share personal data with the incentive of getting a personalized service or a better overall service. Companies exchange data-sets to improve or tailor their services to customers.

The aim is to find an optimal solution which satisfies the privacy requirement specific to a user while still extracting useful information for providing the service.



1.3 The Importance of Economics for Ride Sharing

Economics is an important driver of ride sharing: with the right incentives, ride sharing can be much more lucrative than cabs.

Ride sharing drivers might need to take detours to pick up or drop off passengers. The typical costs for ride sharing drivers are fuel, vehicle maintenance, tolls, parking and increased trip time. In addition to these costs, ride sharing drivers might need to pay annual charges or share a percentage of their earnings with the ride sharing application facilitating the service. For example, a driver using the Uber service needs to share around 20% of their revenue with Uber. Drivers charge passengers based on their costs and sometimes also take demand and supply into account (for example, travel at odd hours or at hours of low vehicle availability might incur a higher cost).

The economic incentive for passengers is that the cost is lower than taking a cab. For Melbourne, the cost of taking a taxi is the sum of $3.20 for the flag fall, $2 for the booking fee and $1.617 per km⁴. On the other hand, the cost of using an UberX vehicle is the sum of $2.25 for the flag fall, no booking fee, $0.40 per minute and $1.15 per km⁵.
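The two fare formulas quoted above can be compared directly. The sketch below uses only the rates given in the text; the function names and the example trip (10 km in 15 minutes) are assumptions for illustration, and any fare component not quoted above (such as waiting-time charges) is ignored.

```python
def taxi_fare(km):
    """Melbourne taxi fare from the quoted rates: flag fall + booking fee + per-km charge."""
    return 3.20 + 2.00 + 1.617 * km

def uberx_fare(km, minutes):
    """UberX fare from the quoted rates: flag fall + per-minute charge + per-km charge."""
    return 2.25 + 0.40 * minutes + 1.15 * km

def breakeven_minutes(km):
    """Trip duration at which both quoted fares are equal for a given distance."""
    return (3.20 + 2.00 - 2.25 + (1.617 - 1.15) * km) / 0.40

km, minutes = 10, 15  # hypothetical trip
print(f"taxi:  ${taxi_fare(km):.2f}")
print(f"UberX: ${uberx_fare(km, minutes):.2f}")
print(f"UberX is cheaper below {breakeven_minutes(km):.2f} minutes for {km} km")
```

Because the quoted UberX fare has a per-minute component and the quoted taxi fare does not, the size of the saving under these rates depends on how long the trip takes.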

Compared to a cab driver, ride sharing drivers do not have to buy expensive permits, which is the main reason for their lower operating costs. For instance, in New York City the permits are called medallions, and they have been sold for up to $1 million. The high price of these licenses is the result of the fixed number of permits set against high demand.

Any computational model simulating ride sharing has to incorporate an economic model. This thesis makes contributions to the development of an economic model for ride sharing.

1.4 Our Research Problem

⁴ http://www.taxifare.com.au/rates/australia/melbourne/
⁵ https://www.uber.com/cities/melbourne/

Our research focus is on dynamic or real-time ride sharing, which requires users to share their location and time information to find a ride. We personalize user search by taking their preferences into account and build models considering multiple objectives: privacy and efficiency. Our work is divided into three main areas (Figure 1.4):

1. Privacy aware dynamic ride sharing: In this research, we present a privacy aware personalized ride sharing model. We present ‘Match Maker’, a negotiation-based ride sharing model which hides the exact location data of system participants to preserve privacy. We use the concept of imprecision (not being precise about which of a set of n locations the user is at) and follow the idea of obfuscation, which equates a higher degree of imprecision with a higher degree of privacy.

2. Optimal Pick up Point Selection for Effective Ride Sharing: We present a ride sharing approach where the pick up/drop off locations for passengers are selected from a pre-defined fixed set. We present a scheme that optimally chooses fixed locations of Pick up Points (PuPs) and aims to maximize car occupancy rates while preserving user privacy (and safety through video surveillance). Additionally, if PuPs are fixed such that they remain the same at all times, they might add to congestion and become traffic bottlenecks, so we need to balance traffic by changing PuPs dynamically over the day. Our method enhances privacy as the users do not need to provide their precise home/work locations, or more generally the start and end locations of a trip.

3. Enhancing ride sharing opportunities using aggregate data: Origin-destination matrices capture the spatial and temporal distribution of traffic demand, which can be a vital input for the placement of PuPs. Collecting trajectories allows us to estimate the traffic flows that vehicles take between source-destination pairs. We present a privacy aware model to compute the origin-destination (OD) matrix and the corresponding trajectories by recording partial vehicle information. We measure the impact of OD matrix information for PuP selection on ride sharing.

1.4.1 Privacy Aware Dynamic Ride Sharing

Figure 1.4: Privacy Preserving Dynamic Ride Sharing System (labels: aggregate OD flows; PuPs, which can change to balance load and handle special events; traffic flow updates)

In this research, we present a privacy preserving dynamic ride sharing system. Drivers and passengers register for a ride share dynamically and are matched based on their preferences and ratings. A driver ellipse describes the path area within which a driver can travel to reach the destination within a pre-defined time budget. We use the driver ellipse to select the candidate passengers and compute the optimal path for ride sharing. The dynamic ride sharing problem with time windows has been proved to be NP-hard [24, 25]. We present an effective algorithm with recursive ellipse based search space reduction for finding (semi-)optimal paths.
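The driver ellipse can be sketched as a simple membership test. The sketch below is an illustration under stated assumptions, not the algorithm of Chapter 3: it assumes planar coordinates, straight-line (Euclidean) distances and a time budget already converted into a maximum travel distance, and it treats a passenger as a candidate only if both the pick up and the drop off lie inside the ellipse. The function names are hypothetical.

```python
import math

def in_driver_ellipse(source, dest, point, max_travel_dist):
    """A point lies in the ellipse with foci `source` and `dest` if the detour
    source -> point -> dest is no longer than the driver's travel budget."""
    return math.dist(source, point) + math.dist(point, dest) <= max_travel_dist

def candidate_passengers(source, dest, passengers, max_travel_dist):
    """Keep passengers whose pick up and drop off both lie inside the ellipse.

    `passengers` maps a passenger id to a (pickup, dropoff) pair of points.
    """
    return [pid for pid, (pickup, dropoff) in passengers.items()
            if in_driver_ellipse(source, dest, pickup, max_travel_dist)
            and in_driver_ellipse(source, dest, dropoff, max_travel_dist)]

source, dest = (0.0, 0.0), (10.0, 0.0)
passengers = {
    "a": ((2.0, 1.0), (8.0, 1.0)),   # small detour on both ends
    "b": ((5.0, 5.0), (9.0, 0.0)),   # pick up too far off the route
}
print(candidate_passengers(source, dest, passengers, max_travel_dist=12.0))
```

On a real road network the Euclidean distances would be replaced by network travel times, which is why the ellipse serves only as a filter that shrinks the search space before a path is computed.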

We describe two models to establish trust between drivers and passengers: the eBay model and the Match Maker model. We use the term eBay model for the standard model used by most applications to implement ride sharing, in which a trusted central server (location service provider) has access to the complete identity, location and time constraint information of all participating users. The service provider makes the ride sharing decisions (by matching the nearest or best available driver to a passenger) and communicates the optimal route to the drivers. We propose the Match Maker model, which is stronger in terms of privacy as only imprecise location and time information is shared with the central trusted body. Its role is only to check identities and to form a one-time communication link between the candidate drivers and passengers. The location information is negotiated by starting with a larger negotiation area, which is refined in every iteration for successful negotiations. We follow the idea of obfuscation [26], where the location information is deliberately hidden or degraded by users to protect their privacy.

We present and compare strategies to negotiate the location data between candidate passengers and drivers. We describe the attack model for the system and evaluate the vulnerabilities of the eBay and Match Maker models.



1.4.2 Optimal Pick up Point Selection for Effective Ride Sharing

In this research we address the problem of finding the optimal number and placement of Pick up Points (PuPs) that balances coverage and occupancy rates while preserving user privacy. With our proposed fixed PuPs model, not only do the users have enhanced privacy (since they are k-anonymous) and safety (monitored PuPs), but their chances of ride sharing also increase, as more drivers are likely to visit PuPs with high passenger availability.

Designing a ride sharing system has to cope with two main challenges: maximizing car occupancy rates while addressing the privacy and safety concerns of users. This leads to two fundamental questions about the pick up or drop off points: how many points should there be, and where should they be located? If the number of these points is decreased then privacy increases, but this is likely to increase trip time and reduce occupancy rates. On the other hand, if the number of points is increased then privacy decreases (in particular if we assume that individuals choose their nearest point). There can be two strategies for PuP selection: completely ad hoc and dynamic, or fixed. For example, a cab based system operates under the assumption that users can find a cab at any time and at any required location. However, this strategy can often lead to passengers not finding a ride because no match between a cab and a rider is available. An alternative is fixed pick up locations such as taxi stands, where the chances of finding a waiting cab are much higher. We adopt a similar strategy and present a system which has fixed predefined locations of pick up points (which can be changed on demand or for load balancing) and aims to maximize car occupancy rates.

The main idea of a Pick up Point (PuP) is that every PuP defines a circular catchment area centered at the PuP. The radius of the circle is the maximum distance any individual within the catchment area needs to travel to reach the PuP. A point on the map is 1-covered if it is inside the catchment area of at least one PuP. The aim is to ensure that every point of a city’s area is covered by at least one PuP while minimizing the total number of PuPs.
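The 1-coverage condition can be written as a short check. This is a minimal sketch assuming planar coordinates, a single common radius for all PuPs and a finite set of sample points standing in for the city's area; the function names are hypothetical.

```python
import math

def is_covered(point, pups, radius):
    """A point is 1-covered if it lies within `radius` of at least one PuP."""
    return any(math.dist(point, pup) <= radius for pup in pups)

def uncovered_points(points, pups, radius):
    """Sample points of the map that no PuP catchment area reaches."""
    return [pt for pt in points if not is_covered(pt, pups, radius)]

points = [(0.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
pups = [(1.0, 0.0)]
print(uncovered_points(points, pups, radius=2.0))  # only (10.0, 0.0) is left uncovered
```

A placement is feasible for the coverage objective when `uncovered_points` returns an empty list; the optimization then looks for the smallest such set of PuPs.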

The need for anonymity motivates our coverage model, where an individual is k-anonymous if at least (k−1) other people could also choose the same PuP. Since we use different circle radii for suburbs based on their population density, every PuP covers roughly the same number of people. Therefore, the anonymity achieved by the PuPs for the number of individuals covered remains the same. The opportunities for ride sharing now depend on the number of PuPs available to a user within her walkable distance and the number of drivers likely to travel to these selected PuPs.
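The k-anonymity condition above can be checked by counting how many people could choose each PuP. The sketch below is an illustration only: it assumes planar coordinates and one common radius (whereas the model described above varies the radius with suburb density), and the function names are hypothetical.

```python
import math

def catchment_counts(people, pups, radius):
    """Number of people within `radius` of each PuP (a person may count for several PuPs)."""
    return [sum(1 for person in people if math.dist(person, pup) <= radius)
            for pup in pups]

def anonymity_level(people, pups, radius):
    """Smallest non-empty catchment population. Every covered person is
    k-anonymous for any k up to this value; 0 means nobody is covered."""
    counts = [c for c in catchment_counts(people, pups, radius) if c > 0]
    return min(counts) if counts else 0

people = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
print(anonymity_level(people, [(0.0, 0.0), (10.0, 10.0)], radius=2.0))  # isolated person lowers it
print(anonymity_level(people, [(0.0, 0.0)], radius=2.0))
```

The example shows the tension discussed in the text: adding a PuP for the isolated person improves coverage but lowers the anonymity level, because that person is the only one who could choose it.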

We have implemented GRASP (Greedy Randomized Adaptive Search Procedure), which provides an optimization framework, and tailored it to develop procedures that find Pareto front solutions of optimal PuP placements. We have implemented an optimal coverage solution based on suburb population density, which does not guarantee privacy. In the second solution we guarantee k-anonymity and evaluate its cost on coverage. We compare the impact of different PuP selection strategies on ride sharing.
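For readers unfamiliar with GRASP, the generic scheme (a greedy randomized construction followed by local search, repeated over several iterations) can be sketched on a simplified, single-objective version of the PuP problem: choose as few candidate sites as possible so that all coverable demand points are covered. This is not the tailored multi-objective procedure of Chapter 4; the restricted candidate list parameter `alpha`, the common `radius`, the fixed `seed` and all function names are assumptions made for illustration.

```python
import math
import random

def grasp_pup_selection(points, candidates, radius, iterations=50, alpha=0.3, seed=0):
    """Return indices of candidate sites chosen as PuPs (generic GRASP sketch)."""
    rng = random.Random(seed)
    # For each candidate site, the set of demand points inside its catchment circle.
    cover_sets = [frozenset(i for i, pt in enumerate(points)
                            if math.dist(c, pt) <= radius) for c in candidates]
    coverable = set().union(*cover_sets)
    best = None
    for _ in range(iterations):
        # Construction phase: pick randomly among the near-best candidates.
        uncovered = set(coverable)
        chosen = []
        while uncovered:
            gains = [(len(cs & uncovered), j) for j, cs in enumerate(cover_sets)]
            best_gain = max(g for g, _ in gains)
            rcl = [j for g, j in gains if g > 0 and g >= (1 - alpha) * best_gain]
            pick = rng.choice(rcl)
            chosen.append(pick)
            uncovered -= cover_sets[pick]
        # Local search phase: drop any site whose removal keeps full coverage.
        for j in list(chosen):
            rest = [c for c in chosen if c != j]
            if set().union(*(cover_sets[c] for c in rest)) >= coverable:
                chosen = rest
        if best is None or len(chosen) < len(best):
            best = chosen
    return sorted(best or [])

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
candidates = [(0.5, 0.0), (5.5, 0.0), (0.0, 0.0), (6.0, 0.0)]
print(grasp_pup_selection(points, candidates, radius=1.0))
```

A multi-objective variant would keep every non-dominated solution found across iterations (for example, trading the number of PuPs against anonymity) rather than a single best one.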

1.4.3 Privacy Aware Trajectory Determination for Enhancing Ride Sharing

In this work we propose a model to generate aggregate trip information in the form of origin destination matrices and vehicle trajectories to improve ride sharing by selecting appropriate PuPs.

An important tool used for effective real time traffic management is the origin destination (OD) matrix, which is a “trip table” that displays the number of trips going from each origin to each destination. The objectives of estimating the trip distribution could be to replicate the spatial pattern of trip making, to account for the spatial separation among origins and destinations (in terms of time or cost), or to make strategic planning and management decisions for transportation networks. While the trip distribution tells us the proportion of trips between pre-selected sources and destinations, it does not specify the path taken between them.
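As a small illustration of the trip table described above, the sketch below counts trips per origin-destination pair. It assumes fully known trips purely for exposition (the approach proposed in this thesis avoids collecting such data); the zone names and the function name are hypothetical.

```python
from collections import Counter

def build_od_matrix(trips, zones):
    """Return a matrix whose entry [i][j] is the number of trips
    from zones[i] (origin) to zones[j] (destination)."""
    counts = Counter(trips)
    return [[counts[(origin, dest)] for dest in zones] for origin in zones]

zones = ["A", "B", "C"]
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A")]
for origin, row in zip(zones, build_od_matrix(trips, zones)):
    print(origin, row)
```

The matrix records how many trips connect each pair of zones but, as noted above, nothing about the paths taken between them.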

One way to monitor traffic is to keep track of each vehicle completely using its full identifier (ID) (for example, vehicle registration number or RFID) at every node of its trip. Monitoring each vehicle for the whole trip using the full ID to create the OD matrix or trajectory information will result in 100% accuracy. However, this data can be used to track an individual’s location, which could lead to a risk of location based attacks (e.g. threats to personal safety, location based unsolicited marketing [21]) and compromise privacy.

To estimate the trajectories and create an OD matrix we do not need to gather such precise information for every vehicle. Our approach preserves privacy by gathering only partial information about the vehicles. A partial ID of the vehicle can be created which is used to track the vehicle across the network. However, if the same partial ID is used across the system it becomes a persistent ID, which in turn can also be used to identify a vehicle uniquely.

To avoid the privacy threats associated with the identification of vehicles, we introduce re-identification of vehicles locally, only up to three consecutively traveled nodes, to build the matrix. Vehicle re-identification is the process of matching vehicles from one point on the road network to the next without any unique vehicle identifier. This technique preserves privacy as we do not fully identify the vehicle but use the partial identifier locally in the network.

Our ride sharing PuP selection model utilizes aggregate travel information stored in the form of OD matrices. The origins and destinations represent a partition of the map at some granularity. Since trip patterns differ by time of day and between weekends and weekdays, we have incorporated the trip information in the map to find PuPs which are more effective for a given time. The trip information is encoded using the available OD matrix such that every partition (suburb) is marked with the destinations (sources) of the trips moving from (to) it. The objective is to group travel to similar destinations (sources) together within a PuP coverage, such that it provides a higher chance of ride sharing for people going to similar destinations, who are picked up collectively by drivers.

1.5 Contributions

In this thesis, we make the following contributions:

• We develop an efficient privacy aware model (Match Maker) for dynamic ride sharing. We compute the driver’s ellipse as the set of all locations the driver can visit within a time budget. We use the ellipse to shortlist candidate passengers and develop a recursive ellipse based algorithm which reduces the search space based on the driver’s time constraints and computes the optimal driver path. We develop an efficient local greedy algorithm running at the driver’s side rather than a global optimal solution (NP-hard) computed by the service provider.

• To preserve privacy and safety, we propose the Match Maker model based on negotiation, which hides exact location information for system participants while implementing privacy preserving ride sharing. We present an attacker model and compare the privacy implications. We investigate the impact of different negotiation strategies used by passengers to negotiate their location information with candidate drivers.

• To ensure trust we devise a rating mechanism to incorporate feedback, driver behavior and similarity with passengers. We design a rating mechanism which takes into account the user’s feedback and personalizes the ride sharing experience by matching users according to their preferences.

• We propose a partial coverage model with different circle radii based on suburb densities to select pick up points for the ride sharing model. We develop models enabling privacy (k-anonymity) with optimal coverage. We propose a Voronoi diagram based guaranteed k-anonymity solution and compare it with the optimal coverage solution for effective ride sharing. We investigate the impact of different PuP selection strategies on ride sharing. Our observations reveal that our PuP based ride sharing model can save between 23% and 40% (average of 31.5%) of vehicle km. In Melbourne, with a population of nearly 4.5 million and an average trip length of 10.2 km, this would save 35.95 million km per weekday.

• We design a partial information based technique using local transition matrices to compute the traffic flows without the need for a prior sample or target OD matrix. The method not only computes the traffic demand but also predicts the traffic paths or trajectories. The method preserves the privacy of the vehicle owners by using only partial information to build the local data structures. We introduce the concept of k-anonymous l-grouping. We use this aggregate information to compute PuPs which are more effective for a particular time of day.
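The vehicle-km saving quoted in the PuP contribution above can be sanity-checked by working backwards from the stated numbers. The sketch below is a back-calculation only: the baseline vehicle km and the number of trips are not given in this section, so the derived values are what the quoted figures imply, not values taken from the thesis. The function name is hypothetical.

```python
def implied_baseline(saved_km, saving_fraction, avg_trip_km, population):
    """Work backwards from a quoted saving to the baseline it implies."""
    baseline_km = saved_km / saving_fraction   # total vehicle km before ride sharing
    trips = baseline_km / avg_trip_km          # number of average-length trips
    trips_per_person = trips / population
    return baseline_km, trips, trips_per_person

baseline_km, trips, per_person = implied_baseline(
    saved_km=35.95e6,        # quoted saving per weekday
    saving_fraction=0.315,   # quoted average saving, midpoint of 23% and 40%
    avg_trip_km=10.2,        # quoted average trip length
    population=4.5e6,        # quoted Melbourne population
)
print(f"implied baseline: {baseline_km / 1e6:.2f} million vehicle km per weekday")
print(f"implied trips:    {trips / 1e6:.2f} million per weekday")
print(f"implied trips per person: {per_person:.2f}")
```

Under these assumptions the quoted saving corresponds to roughly 114 million vehicle km and about 11.2 million trips per weekday, or about 2.5 trips per person.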

1.6 Outline

The rest of the thesis is organized as follows:



Chapter 2: We discuss the background on data privacy, location privacy and vehicle trajectory privacy. We survey the existing work related to transport management and ride sharing.

Chapter 3: We propose our Match Maker model for privacy preserving dynamic ride sharing, where the users do not need to trust the central trusted service provider and negotiate among themselves to find possible rides.

Chapter 4: We present our approach to select the optimal locations and number of pick up points for effective ride sharing.

Chapter 5: We design a model to compute the origin destination matrix and estimate trajectories in a privacy preserving manner by using vehicle re-identification and the collection of partial vehicle information. We present our solution for enhancing ride sharing by selecting pick up points based on this aggregate travel information.

Chapter 6: We conclude the thesis and discuss possible future research directions for privacy preserving dynamic ride sharing.

1.7 Papers Resulted from this Thesis

• Preeti Goel, Lars Kulik, and Ramamohanarao Kotagiri. 2015. Privacy Aware Dynamic Ride Sharing. ACM Trans. Spatial Algorithms Syst., Accepted November 2015.

• Preeti Goel, Lars Kulik, and Ramamohanarao Kotagiri. 2015. Optimal Pick up Point Selection for Effective Ride Sharing. Under Review.

• Preeti Goel, Lars Kulik, and Ramamohanarao Kotagiri. Privacy aware trajectorydetermination in road traffic networks. In Proceedings of the 20th InternationalConference on Advances in Geographic Information Systems, SIGSPATIAL ’12,pages 406-409, New York, NY, USA, 2012. ACM.


Chapter 2

Background

Dynamic ride sharing is a location based service where the users’ location and time data is collected to match potential passengers and drivers. The location data (x_i, y_i, t_i) is in the form of a latitude (x_i), longitude (y_i) pair along with time (t_i) information. This data can be considered sensitive when collected over extended periods of time, as it can lead to the estimation of a user’s source/destination stops and eventually of vehicle trajectories. In Section 2.1, we discuss the main data privacy techniques and models which are applicable to location and vehicle trajectory privacy. Section 2.2 discusses location privacy and various location privacy models, architectures and techniques. We also discuss people’s perception of location privacy and the different privacy policies and laws around the world. In Section 2.3, we survey the existing work in the vehicle trajectory privacy domain and compare the approaches according to their applicability to static or dynamic update scenarios and the privacy technique used.

Traffic flow estimations and origin-destination matrices can be used as aggregate data to improve ride sharing. In Section 2.4, we present the different types of queries required for transportation management and survey the existing approaches with a particular focus on estimation of origin-destination matrices, traffic flow patterns and congestion prediction. Finally, in Section 2.5 we discuss the ride sharing variants, definitions, history and trends. We survey the existing work for ride sharing and review the approaches which emphasize privacy. Further, we study the other modes of shared transport and present details of some existing ride sharing services.


2.1 Data Privacy: Why and How

Large volumes of sensitive personal data are collected on a day to day basis by organizations like hospitals, financial institutions, government agencies, social networks and insurance companies. The purpose of the organization can be to extract and publish useful information about the data while protecting the private information of individuals. Data privacy laws, regulations and issues come up when an organization wants to share or publish private data of individuals while at the same time wanting to protect their personally identifiable information.

There are two main research focuses for implementation of data privacy.

• Perturbation: addition of noise

• Anonymization: anonymize data records

The aim is to release or publish the private data such that there can be guarantees to the participating individuals that they cannot be identified while the released data is still useful. Now, we will discuss some of the protection models based on the techniques of perturbation and anonymization.

Records in a data set are uniquely identified by a set of identifiers associated with them. The first step towards anonymization is the removal of these identifiers. For example, if the Name, Phone Number and Address fields are removed from the Medical Data published by a hospital, the data might be assumed to be anonymous. However, this is not sufficient, as link attacks can still be used to uniquely identify individuals [3]. For example, two sets of data can be obtained from different sources, like a Voter Registration List and the anonymous data released by a Medical Organization from which the unique identifiers have been removed. These two can be linked on the basis of their common attributes (Zip Code, Birth Date, Gender) to identify individuals and their medical conditions (Figure 2.1).

Dalenius [27] introduced Quasi Identifiers as the maximal set of attributes which can uniquely identify an individual. Examples are locations (zip code, address, regions), dates (date of birth, admission, discharge), gender, ethnic group etc.

Definition 2.1.1. Quasi Identifier (QI). A set of attributes in a private table which, in combination, can be linked with external information to re-identify the respondents to whom information refers [27][28].
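The link attack described above can be illustrated with a small sketch. The two data sets, the names in them and the function `link_attack` are hypothetical and invented purely for illustration; the sketch only shows how a join on shared quasi-identifiers can re-identify records from which explicit identifiers were removed.

```python
# Hypothetical toy data: all names and values are invented for illustration.
voter_list = [
    {"name": "Alice", "zip": "3025", "birth_date": "1982-03-14", "gender": "F"},
    {"name": "Bob", "zip": "3065", "birth_date": "1962-07-01", "gender": "M"},
    {"name": "Carol", "zip": "3087", "birth_date": "1994-11-23", "gender": "F"},
]

# "Anonymous" release: explicit identifiers removed, quasi-identifiers kept.
medical_release = [
    {"zip": "3025", "birth_date": "1982-03-14", "gender": "F", "condition": "Cancer"},
    {"zip": "3065", "birth_date": "1962-07-01", "gender": "M", "condition": "Flu"},
    {"zip": "3020", "birth_date": "1977-01-30", "gender": "M", "condition": "Heart Disease"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "gender")


def link_attack(public_rows, released_rows, quasi_identifiers):
    """Join two data sets on their shared quasi-identifiers."""
    index = {}
    for row in public_rows:
        key = tuple(row[a] for a in quasi_identifiers)
        index.setdefault(key, []).append(row["name"])

    reidentified = []
    for row in released_rows:
        key = tuple(row[a] for a in quasi_identifiers)
        names = index.get(key, [])
        # A unique match links the released record to exactly one person.
        if len(names) == 1:
            reidentified.append((names[0], row["condition"]))
    return reidentified


matches = link_attack(voter_list, medical_release, QUASI_IDENTIFIERS)
print(matches)
```

In this toy example two of the three released records match exactly one voter, so their sensitive attribute is disclosed even though no name was published.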



Figure 2.1: Link Attacks [3]

2.1.1 k-Anonymity

k-anonymity [3] is a privacy protection model where the information of each individual cannot be distinguished from at least k−1 individuals whose information also appears in the database. k-anonymity provides a “blend in the crowd” approach to privacy [29] in which it is assumed that there exists a table owner who can recognize the attributes as Quasi Identifiers and private or sensitive attributes.

Definition 2.1.2. k-anonymity. Let RT(A_1, ..., A_n) be a table and QI_RT be the quasi-identifiers associated with it. RT is said to satisfy k-anonymity if and only if each sequence of values in RT[QI_RT] appears with at least k occurrences in RT[QI_RT] [3].

It is achieved by two main approaches:

• Generalization: a public attribute value is replaced with a less specific but semantically consistent value. For example: (Age 31 =⇒ [30−35])

• Suppression: suppressing or removing values of individual attributes such that the value is not released at all.

Table 2.2 shows the 2-anonymous version of the medical data in Table 2.1, obtained by generalization. k-anonymity has some limitations; these are discussed in the next section, together with some extensions to the technique.



Zip Code   Age   Gender   Condition
3025       34    Male     Heart Disease
3020       39    Male     Heart Disease
3021       33    Female   Heart Disease
3020       31    Female   Cancer
3065       54    Male     Heart Disease
3065       46    Male     Flu
3087       22    Male     Cancer
3088       25    Male     Flu

Table 2.1: Medical Data

Zip Code   Age    Gender   Condition
302*       3*     Male     Heart Disease
302*       3*     Male     Heart Disease
302*       3*     Female   Heart Disease
302*       3*     Female   Cancer
306*       ≥ 40   Male     Heart Disease
306*       ≥ 40   Male     Flu
308*       2*     Male     Cancer
308*       2*     Male     Flu

Table 2.2: 2-Anonymous Table
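As a minimal sketch, the generalization from Table 2.1 to Table 2.2 and the k-anonymity check of Definition 2.1.2 can be written as follows. The function names and the generalization rule (zip code truncated to three digits, age bucketed by decade, ages of 40 and above merged) are assumptions chosen only to reproduce Table 2.2; they are not a general-purpose anonymization algorithm.

```python
from collections import Counter

# Rows of Table 2.1: (zip code, age, gender, condition).
TABLE_2_1 = [
    ("3025", 34, "Male", "Heart Disease"),
    ("3020", 39, "Male", "Heart Disease"),
    ("3021", 33, "Female", "Heart Disease"),
    ("3020", 31, "Female", "Cancer"),
    ("3065", 54, "Male", "Heart Disease"),
    ("3065", 46, "Male", "Flu"),
    ("3087", 22, "Male", "Cancer"),
    ("3088", 25, "Male", "Flu"),
]


def generalize(row):
    """Generalize the quasi-identifiers (assumed rule that reproduces Table 2.2)."""
    zip_code, age, gender, condition = row
    zip_general = zip_code[:3] + "*"      # 3025 -> 302*
    if age >= 40:
        age_general = "≥ 40"              # 54 -> ≥ 40
    else:
        age_general = f"{age // 10}*"     # 34 -> 3*
    return (zip_general, age_general, gender, condition)


def is_k_anonymous(rows, k, qi_columns=(0, 1, 2)):
    """Every combination of quasi-identifier values must occur at least k times."""
    counts = Counter(tuple(row[i] for i in qi_columns) for row in rows)
    return all(count >= k for count in counts.values())


table_2_2 = [generalize(row) for row in TABLE_2_1]
print(is_k_anonymous(TABLE_2_1, 2))   # raw table: every row is unique
print(is_k_anonymous(table_2_2, 2))   # generalized table: groups of size 2
```

The sensitive attribute (Condition) is deliberately left out of the check, since k-anonymity only constrains the quasi-identifier columns.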

Labels: Homogeneity Attack, Background Knowledge Attack

Zip Code   Age    Gender   Condition
302*       3*     Male     Heart Disease
302*       3*     Male     Heart Disease
302*       3*     Female   Heart Disease
302*       3*     Female   Cancer
306*       ≥ 40   Male     Heart Disease
306*       ≥ 40   Male     Flu
308*       2*     Male     Cancer
308*       2*     Male     Flu

Table 2.3: Attacks on 2-Anonymous Table

2.1.2 l-Diversity and t-Closeness

There are attacks against which k-anonymity fails to provide privacy:

• Homogeneity Attack: It is possible when the sensitive attribute lacks diversity in its values. For example: If the attacker is trying to find information about his neighbor Mark (zip code 3024), who is nearly 33 years old, then by looking at the 2-anonymized Table 2.3 he learns that Mark has Heart Disease.

• Background Knowledge Attack: It is possible when the attacker has background knowledge. For example: If the attacker knows that Mark, his 25 year old neighbor (zip code 3087), is seriously ill and was taken away in an ambulance, then by looking at Table 2.3 he learns that Mark has either Flu or Cancer. Since he knows that Mark is seriously ill and has been in hospital for a long time, he concludes that Mark has Cancer.



Let Q be the Quasi Identifiers in the table T such that, for a record t, t[Q] = q. Let T∗ be the published table and q∗ be the generalized value of q in T∗, so that t∗[Q] = q∗. A q∗-block is the set of tuples in T∗ whose nonsensitive attribute values generalize to q∗ [30]. A block is “well represented” if all possible values of the sensitive attribute occur in the block with roughly the same proportions.

Definition 2.1.3. l-Diversity. A q∗-block is l-diverse if it contains at least l “well represented” values for the sensitive attribute. A table is l-diverse if every q∗-block is l-diverse [30].
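To make the definition concrete, the sketch below checks Table 2.2 under the simplest reading of “well represented”, namely that a block contains at least l distinct sensitive values. This distinct-value reading is an assumption made for illustration; [30] also discusses stricter instantiations. The function names are hypothetical.

```python
from collections import defaultdict

# Rows of Table 2.2: (zip code, age, gender, condition).
TABLE_2_2 = [
    ("302*", "3*", "Male", "Heart Disease"),
    ("302*", "3*", "Male", "Heart Disease"),
    ("302*", "3*", "Female", "Heart Disease"),
    ("302*", "3*", "Female", "Cancer"),
    ("306*", "≥ 40", "Male", "Heart Disease"),
    ("306*", "≥ 40", "Male", "Flu"),
    ("308*", "2*", "Male", "Cancer"),
    ("308*", "2*", "Male", "Flu"),
]


def distinct_values_per_block(rows):
    """Map each q*-block to the number of distinct sensitive values it holds."""
    blocks = defaultdict(set)
    for *quasi_identifiers, sensitive in rows:
        blocks[tuple(quasi_identifiers)].add(sensitive)
    return {block: len(values) for block, values in blocks.items()}


def is_distinct_l_diverse(rows, l):
    """Assumed simplification: l distinct sensitive values in every block."""
    return all(n >= l for n in distinct_values_per_block(rows).values())


diversity = distinct_values_per_block(TABLE_2_2)
homogeneous_blocks = [block for block, n in diversity.items() if n == 1]
print(homogeneous_blocks)
print(is_distinct_l_diverse(TABLE_2_2, 2))
```

The single homogeneous block is the one exploited by the homogeneity attack on Mark: the table is 2-anonymous, yet that block reveals the condition of everyone in it.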

There are some limitations of l-diversity:

• The values of a sensitive attribute can have very different degrees of sensitivity. For example: In medical records indicating the HIV status of patients, the 1% of records that are HIV positive are highly sensitive, while the 99% that are HIV negative are not that sensitive.

• 2-diversity might not be needed for an equivalence class or q∗-block that contains only negative records.

• l-diversity does not protect against probabilistic inference attacks.

• Similarity attack: Since l-diversity does not consider the semantic meanings of sensitive values, there could be blocks with very similar values of the sensitive attribute, hence providing little or no privacy [31].

t-closeness is a privacy definition that requires that the distribution of a sensitive attribute in any quasi identifier group is close to the distribution of the sensitive attribute in the overall table.

Definition 2.1.4. t-closeness. An equivalence class is said to have t-closeness if the distance between the distribution of a sensitive attribute in this class and the distribution of the attribute in the whole table is no more than a threshold t. A table is said to have t-closeness if all equivalence classes have t-closeness [31].
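The following sketch computes, for each equivalence class of Table 2.2, the distance between the class distribution and the overall distribution of the sensitive attribute. The definition above leaves the distance measure open; this sketch assumes the total variation distance (half the L1 distance) as a simple choice for a categorical attribute, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Rows of Table 2.2: (zip code, age, gender, condition).
TABLE_2_2 = [
    ("302*", "3*", "Male", "Heart Disease"),
    ("302*", "3*", "Male", "Heart Disease"),
    ("302*", "3*", "Female", "Heart Disease"),
    ("302*", "3*", "Female", "Cancer"),
    ("306*", "≥ 40", "Male", "Heart Disease"),
    ("306*", "≥ 40", "Male", "Flu"),
    ("308*", "2*", "Male", "Cancer"),
    ("308*", "2*", "Male", "Flu"),
]


def distribution(values):
    """Relative frequency of each value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def total_variation(p, q):
    """Assumed distance measure: half the L1 distance between p and q."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def closeness_per_class(rows):
    """Distance of each equivalence class from the whole-table distribution."""
    overall = distribution(row[-1] for row in rows)
    classes = defaultdict(list)
    for *quasi_identifiers, sensitive in rows:
        classes[tuple(quasi_identifiers)].append(sensitive)
    return {
        qi: total_variation(distribution(values), overall)
        for qi, values in classes.items()
    }


distances = closeness_per_class(TABLE_2_2)
smallest_t = max(distances.values())   # the table has t-closeness for any t >= this
print(distances)
print(smallest_t)
```

Under this assumed distance, the class that contains only Heart Disease records lies furthest from the overall distribution, which matches the intuition that it leaks the most.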



2.1.3 Differential Privacy

The Differential Privacy model claims to provide a stronger privacy guarantee than the models discussed above, as it does not require any assumptions about the data and protects against attackers who know all but one row in the database. It provides privacy guarantees irrespective of the background knowledge and computational power of the attacker. Differential privacy originated as a model for interactive databases and seeks to limit the knowledge that users obtain from query responses. It was later extended to privacy preserving data publishing.

For data publishing, both t-closeness and ε-differential privacy limit disclosure by bounding what an attacker can learn relative to their prior knowledge about the distribution of confidential attributes [32].

The Differential Privacy model [33][34] provides privacy guarantees where the overall information gained by an adversary does not change, whether an individual opts to be present or absent in the data. The objective is: if a disclosure occurs when an individual participates in the database, then the same disclosure also occurs with similar probability (within a small multiplicative factor) even when the individual does not participate.

Differential privacy attempts to break the strong correlation between the output of a function (or algorithm) and its set of inputs, such that a small change in the input data set does not cause huge variations in the output.

Definition 2.1.5. ε-Differential Privacy: A randomized algorithm or function K is ε-differentially private if, for all data sets D_1 and D_2 differing in at most one record, and for all possible anonymized data sets D [35][33][34],

\[ \Pr[K(D_1) = D] \le e^{\varepsilon} \times \Pr[K(D_2) = D] \]

ε is the privacy parameter or privacy budget and it controls the amount of privacy in the system. A lower value of ε implies a stronger privacy guarantee.

Differential privacy is implemented by adding correctly calculated random noise to the output of the function. The Laplace mechanism adds noise drawn from the Laplace distribution to the function and is useful for answering numerically valued queries [36]. For a function f on a database X, the true output is f(X); the Laplace mechanism adds noise such that K = f(X) + Laplace Noise (depending on the sensitivity of f).
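A minimal sketch of the Laplace mechanism for a count query is given below. It assumes the standard calibration in which the noise scale equals the sensitivity of f divided by ε; the function names, the example data and the chosen ε are hypothetical. The Laplace sample is generated as the difference of two independent exponential samples, which uses only the Python standard library.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Return f(X) plus Laplace noise with scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical count query: how many records have the condition "Flu"?
conditions = ["Heart Disease", "Heart Disease", "Heart Disease", "Cancer",
              "Heart Disease", "Flu", "Cancer", "Flu"]
true_count = conditions.count("Flu")

# Adding or removing one record changes a count by at most 1,
# so the sensitivity of this query is taken to be 1.
rng = random.Random(0)
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)
print(true_count, noisy_count)
```

A smaller ε gives a larger noise scale, so the released count is less accurate but the privacy guarantee is stronger.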



If the function K maps the inputs to non-numerical attributes like strings, then the addition of noise does not serve the purpose. The issue of preserving ε-differential privacy for such a function has been addressed by the exponential mechanism [37].

Composition property: If the function K is ε-differentially private, then K^t (t successive applications of K) is tε-differentially private. So, when consecutive queries are executed and each maintains ε-differential privacy, the privacy budget consumed adds up over all the queries.
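The composition property suggests simple bookkeeping: each query is charged its ε against a total budget. The class below is a hypothetical sketch of such an accountant, not part of any particular library.

```python
class PrivacyBudget:
    """Tracks the cumulative epsilon spent under sequential composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Record one epsilon-differentially private query; return what is left."""
        # The small tolerance guards against floating point rounding.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total_epsilon - self.spent


budget = PrivacyBudget(total_epsilon=1.0)
for _ in range(4):
    remaining = budget.charge(0.25)   # t = 4 queries at epsilon = 0.25 each
print(budget.spent, remaining)
```

After four queries at ε = 0.25 the whole budget of 1.0 is used, and a fifth query would be refused.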

The implementation challenges of differentially private systems are:

1. Selection of optimal value of ε: How to reduce ε while still maintaining utility of the results.

2. Dynamic database: Most work on differential privacy assumes static databases stored in a central server.

3. Time series data or highly correlated data: For time series data, the composition property adds up the noise added at each timestamp, generating a huge amount of noise in the series.

4. Differential privacy does not protect against an attacker’s auxiliary information.

2.2 Location Privacy

2.2.1 Location Privacy: What and Why?

With modern GPS aware devices, location data is widely collected and used for applications like traffic monitoring, route planning, friend finding, advertisement deals, emergency services etc. A Location Based Service (LBS) is a service which is offered to users based on their location information. It is a convergence of technologies such as smart phones, internet and GIS that makes LBS possible. It is accessed by users from their mobile device, through the mobile internet available on the device, while sharing their geographical location (and other required information) with the service provider. The quality of service provided by the LBS depends on the type of service, the underlying architecture and the accuracy of the data collected. Highly precise data is collected for location based services, which is either a single location update or continuous location updates depending on the kind of service. Analysis of such highly precise spatio-temporal data, when collected over an interval of time, can lead to identification of individuals, estimation of trajectories (identification of popular source/destination points) and eventually their behavior. A lot of information about a person’s beliefs, preferences, religion and health can be inferred by analyzing the location data.

For example:

• It could be determined how often and for how long an individual visits a doctor (hospital). This information can be used by insurance providers to discriminate or force higher premiums.

• The location of an individual’s home, office, friends and family can be used by people with harmful intent to follow his or her activities and send personal threats.

• Identification of the location of stores/shopping centers an individual visits can lead to the identification of his or her shopping frequency and pattern, and can also indicate the socio-economic status.

Location privacy is the ability of an individual to control the access to his current or past location information [21]. It is the right of an individual or group to control when, how and how much of their personal location data is being shared or revealed to a third party. The loss of location privacy can lead to the following types of issues [21] [22]:

• Inference Attacks: An individual’s locations can be linked together to make a range of inferences about his political views, religion, sexuality, friends and family. This information can be used to influence, target or threaten him.

• Location based advertising (LBA): An individual is spammed with offers depending on the physical location collected using mobile phones. LBA can not only irritate the mobile user with useless messages but also disrupt his relationship to his surroundings [38].

• Threats to personal safety: Stalking or physical attacks.

The challenges to implement location privacy are:



• Location privacy requirements: The location privacy requirements differ for individuals based on their understanding of and need for privacy. They also vary for the same user at different places and at different times.

• Privacy vs utility trade-off: Location information is useful for providing and enhancing services, while unauthorized access to such information can reveal a lot of sensitive information about an individual. The best quality of service that can be achieved for a required level of privacy is also application dependent.

There are three phases of location information extraction:

1. Data Collection: Location data is collected by a service provider or organization. For example: When you request a routing service, your current location is continuously sent to the service provider.

2. Data Analysis: These location updates can be collected and analyzed to enhance the service or for data mining purposes. For example: The route finding service collects the historical movement data of an individual and uses that to determine his driving behavior to suggest personalized routes in the future.

3. Data Publishing: The collected data or some statistics of the data can be published, and that should not leak an individual’s private information.

The selection of the model and the implementation technique also depends on the privacy goals. Users could require:

• User Location Privacy: Users want to hide their location information and in most cases the query content.

• User Query Privacy: Users want to hide their query content but revealing location information is not a concern. For instance, a query regarding the nearest bar location.

• Trajectory Privacy: Users want to avoid revealing enough information (spatio-temporal points) which can be linked together to form a trajectory. We discuss this in Section 2.3.

In [21, 39–41], the authors present detailed surveys of location privacy methods, architectures, threats and attacks.



2.2.2 Methods for Ensuring Location Privacy

Other than privacy laws and policies the methods for ensuring location privacy are:

2.2.2.1 Anonymity

The location information is separated from the identity of the user. One of the simplest techniques to achieve anonymity is to replace the user ID with a pseudonym, thereby separating the identity from the location information. An individual is therefore anonymous but has a persistent identity. However, in the case of location data, pseudonymity does not work, as the location updates recorded under a pseudonym can be linked and reverse geo-coded to infer the identity of the user (identity inferred over a period of time from home/work locations). In [42], the authors proposed "mix zones", the idea of changing pseudonyms for location privacy. Mix zones are areas where users cannot be traced and their identity is mixed with all the other users in the mix zone. In addition, the pseudonym gets changed every time the user enters the mix zone. This prevents an attacker from linking pseudonyms, as the new pseudonym could belong to any user in the mix zone.

An important technique for anonymization is cloaking, which could be spatial or spatio-temporal. In spatial cloaking the user location is represented as a bigger region which includes the exact user location. Temporal cloaking reduces the frequency of temporal information by delaying a query until enough users are present in the region. In [43], the authors present an adaptive quadtree-based spatial cloaking algorithm that decreases the spatial resolution of location information according to the number of other people in the same quadrant. The authors extend the quadtree-based approach to spatio-temporal cloaking, which improves the spatial resolution by decreasing the temporal resolution for the same anonymity constraint.

The cloaking techniques guarantee anonymity by extending the principle of data k-anonymity such that the user’s location information cannot be distinguished from that of k-1 other users. The concept of spatial k-anonymity [44] is based on the principle of k-anonymity for data privacy (Section 2.1.1). The exact user location is replaced with an anonymized spatial region that contains at least k-1 other users. Therefore, an individual cannot be distinguished from at least k-1 other people and an attacker can only pinpoint the user location with 1/k probability.
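A quadtree-style spatial cloaking step can be sketched as follows. This is a simplified illustration in the spirit of the adaptive quadtree idea, not the algorithm of [43]: the function name, the unit-square region and the stopping rule are assumptions. The region is repeatedly split into four quadrants, and the quadrant containing the user is kept as long as it still holds at least k users in total.

```python
def quadtree_cloak(user, others, k, region=(0.0, 0.0, 1.0, 1.0), min_size=1e-6):
    """Return the smallest quadrant (x_min, y_min, x_max, y_max) that contains
    `user` and at least k users in total (the user included), or None."""

    def inside(point, rect):
        x_min, y_min, x_max, y_max = rect
        return x_min <= point[0] < x_max and y_min <= point[1] < y_max

    population = [p for p in others if inside(p, region)]
    if len(population) + 1 < k:
        return None  # even the whole region cannot provide k-anonymity

    while region[2] - region[0] > min_size:
        x_min, y_min, x_max, y_max = region
        x_mid, y_mid = (x_min + x_max) / 2, (y_min + y_max) / 2
        # Select the child quadrant that contains the user.
        child = (
            x_min if user[0] < x_mid else x_mid,
            y_min if user[1] < y_mid else y_mid,
            x_mid if user[0] < x_mid else x_max,
            y_mid if user[1] < y_mid else y_max,
        )
        child_population = [p for p in population if inside(p, child)]
        if len(child_population) + 1 < k:
            break  # subdividing further would violate k-anonymity
        region, population = child, child_population
    return region


# Hypothetical positions in a unit square.
user = (0.1, 0.1)
others = [(0.2, 0.2), (0.3, 0.1), (0.8, 0.8), (0.6, 0.2)]
cloaked = quadtree_cloak(user, others, k=3)
print(cloaked)
```

The service provider would receive only the returned rectangle, so under the assumptions above it cannot tell the user apart from the other users inside it.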



However, anonymity based approaches do not work for applications that require authentication or offer personalization [45].

2.2.2.2 Obfuscation

Location obfuscation is defined as the deliberate degradation of the quality of location information (by being imprecise, inaccurate or vague) in order to protect an individual’s location privacy [26, 46]. The authors discuss the following three methods to degrade the quality of location information:

• Imprecision: The user location information is imprecise, as instead of providing a single location the user provides a bigger set which includes the user location. The larger the set, the higher the achieved level of privacy [26, 43]. In [26], the authors present an imprecision based negotiation model for obfuscating proximity queries (finding the nearest POI). The user protects his or her location privacy by reporting a set of locations which includes the exact location. The algorithm computes the proximate POI and negotiates for more information from the user for the best solution.

• Inaccuracy: The user location is not present in the location information set provided to the service provider. The greater the distance of the user’s real location to this set, the higher the achieved level of privacy [46, 47].

• Vagueness: A user’s description of the location information is vague such that exact boundaries are not defined. For example: use of phrases such as "near-far", "close", "within five minutes of walking" to describe location [48].

Location obfuscation is complementary to the concept of anonymity. Unlike anonymity based techniques, where the identity is anonymized and the user is "one in many", in obfuscation based techniques the user identity is known and the quality of location information is degraded. Obfuscation assumes that the user can control the granularity of location information shared with the service provider. Since obfuscation does not hide identity, it is more suitable for applications which require user authentication. In addition, since the quality of location information has been degraded, it becomes harder to infer the user identity.
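The difference between imprecision and inaccuracy can be sketched in a few lines. The location identifiers and function names are hypothetical, and choosing the reported set uniformly at random is an assumption made only to keep the example short; the cited models choose the set more carefully.

```python
import random


def imprecise_report(true_location, candidate_locations, set_size, rng):
    """Imprecision: report a set of locations that includes the true one."""
    decoys = [loc for loc in candidate_locations if loc != true_location]
    report = rng.sample(decoys, set_size - 1) + [true_location]
    rng.shuffle(report)  # do not reveal the true location by its position
    return report


def inaccurate_report(true_location, candidate_locations, set_size, rng):
    """Inaccuracy: report a set of locations that excludes the true one."""
    decoys = [loc for loc in candidate_locations if loc != true_location]
    return rng.sample(decoys, set_size)


# Hypothetical intersection identifiers.
intersections = ["A", "B", "C", "D", "E", "F", "G", "H"]
rng = random.Random(42)
imprecise = imprecise_report("C", intersections, set_size=4, rng=rng)
inaccurate = inaccurate_report("C", intersections, set_size=4, rng=rng)
print(imprecise, inaccurate)
```

In both cases the user identity is known to the service provider; only the quality of the reported location is degraded.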



2.2.3 System Architectures for Location Privacy

Different system architectures can be applied to implement a privacy preserving LBS using the location privacy methods described above. In [49], the authors discuss the issues of obtaining high quality of service and anonymization for different system architectures for privacy preserving LBS. We will now discuss three different system architectures.

2.2.3.1 Client Server Architecture

This is the centralized architecture where the users communicate directly with the server. Approaches for this architecture are based on inaccurate or false positions. In [50], the authors present a false dummies based approach where the user sends n-1 false dummies or locations along with his or her true location information. The server returns an answer set which contains the answer to the true location as well. The user then computes the exact required answer from the set. In [51], the authors present a landmark based approach where the space is divided into Voronoi cells based on certain POIs. The user sends the location of his or her nearest landmark to the server to get an answer. In these approaches, either the quality of service is low because only approximate solutions are returned [51], or the server load is very high because the server responds for multiple locations for every single query [49].
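The false dummies idea can be sketched as a nearest-POI query. The POI names, coordinates and function names below are hypothetical, and the dummies are drawn uniformly at random, which is a simplification rather than the dummy generation strategy of [50].

```python
import math
import random

# Hypothetical points of interest known to the server.
POIS = {"cafe": (1.0, 1.0), "fuel": (5.0, 5.0), "pharmacy": (9.0, 2.0)}


def nearest_poi(location):
    """Name of the POI closest to the given (x, y) location."""
    return min(POIS, key=lambda name: math.dist(location, POIS[name]))


def server_answer(locations):
    """Server side: answer every location; it cannot tell which one is real."""
    return [nearest_poi(loc) for loc in locations]


def client_query(true_location, n, rng, extent=10.0):
    """Client side: hide the true location among n - 1 random dummies."""
    dummies = [(rng.uniform(0, extent), rng.uniform(0, extent)) for _ in range(n - 1)]
    true_index = rng.randrange(n)
    locations = dummies[:true_index] + [true_location] + dummies[true_index:]
    answers = server_answer(locations)
    # Only the client knows which answer belongs to the true location.
    return answers[true_index], len(locations)


answer, sent = client_query((1.5, 1.2), n=5, rng=random.Random(7))
print(answer, sent)
```

The sketch also makes the cost visible: the server does n times the work of a single query, which is the load problem noted above.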

2.2.3.2 Centralized Trusted Third Party Architecture (TTP)

In this architecture, there is a centralized trusted third party which is responsible for implementing privacy. The trusted party, also known as the location anonymizer, is between the users and the LBS provider. The users communicate their exact location to the trusted third party, which then anonymizes the query and sends it to the query server. The response is sent back to the location anonymizer, which then forwards it to the corresponding user. It is usually assumed that the communication channels between the users and the anonymizer are secure while the communication channels between the anonymizer and the service provider can be public. Many approaches have been designed using this architecture based on cloaking, k-anonymity and obfuscation [26, 43, 52–54].

However, as with any centralized system, this system has the disadvantage of having the location anonymizer as a single point of failure and also a potential bottleneck. Also, the system privacy is zero if the third party itself gets compromised.

2.2.3.3 Peer to Peer Architecture

Peer to peer (P2P) is a decentralized architecture and eliminates the need for a centralized trusted third party to meet privacy requirements. The mobile users communicate directly with each other using P2P communication technologies such as 802.11 and Bluetooth to work cooperatively with other users and form cloaked spatial regions. The querying user discovers peers either via single hop or multi-hop communication to form a cloaked region which satisfies his or her k-anonymity or privacy area requirements. Once a cloaked region is formed and peers are identified, one of the peers is randomly chosen as a query agent. This agent gets the query and cloaked area from the user and communicates with the server on behalf of the user. Finally, the answer set is returned to the real user, who then computes the exact answer from it [55]. In [56, 57], the authors present distributed privacy preserving architectures where the users communicate (over encrypted communication links) with each other through a base station.

P2P architecture does not have the shortcomings of the centralized TTP based architecture. However, it has other design and implementation challenges. In [58], the authors discuss the challenges and research issues in implementing P2P architecture for LBS. The challenges in peer searching include defining the hop distance to find k-1 peers and estimating the trustworthiness of peers. The challenges in forming a spatial cloaked region include movement uncertainty, since the peers keep moving and the cloaked region needs to be adjusted accordingly. Finally, the limitations of the mobile environment, including battery power and network disconnection, also need to be studied.
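The peer discovery step can be sketched as a bounded breadth-first search over radio neighbours. The user identifiers, positions, neighbour lists and the function name are hypothetical, and using the bounding box of the discovered peers as the cloaked region is an assumption made for illustration, not the specific method of [55].

```python
from collections import deque


def form_cloaked_region(user, positions, neighbours, k, max_hops):
    """Search peers hop by hop until k users (querying user included) are found.

    Returns (bounding_box, peers), or None if fewer than k users are
    reachable within max_hops.
    """
    visited = {user}
    queue = deque([(user, 0)])
    while queue and len(visited) < k:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not search beyond the allowed hop distance
        for peer in neighbours.get(node, ()):
            if peer not in visited:
                visited.add(peer)
                queue.append((peer, hops + 1))
                if len(visited) == k:
                    break
    if len(visited) < k:
        return None
    xs = [positions[u][0] for u in visited]
    ys = [positions[u][1] for u in visited]
    return (min(xs), min(ys), max(xs), max(ys)), visited - {user}


# Hypothetical users, positions and radio neighbourhoods.
positions = {"u": (2, 2), "a": (1, 3), "b": (4, 2), "c": (5, 5), "d": (9, 9)}
neighbours = {"u": ["a", "b"], "a": ["u", "c"], "b": ["u"], "c": ["a", "d"], "d": ["c"]}

result = form_cloaked_region("u", positions, neighbours, k=4, max_hops=2)
print(result)
```

One of the returned peers could then be picked at random as the query agent. The example also shows the hop distance challenge: with a single hop only two peers are reachable, so a request for k = 4 fails.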

2.2.4 People’s Perception of Location Privacy

The ability of ride sharing systems, or of any location based application, to collect location data raises privacy concerns for individuals. The user trip information can be collected and historical trajectory data is available to the service providers. Analysis of such highly precise spatio-temporal data, when collected over an interval of time, can lead to identification of individuals, their home/work locations, and their behavior. Highly detailed information about a person’s beliefs, friends, preferences, religion and health can be inferred by analyzing the location data without their knowledge or permission. This information can be used to influence, target or threaten the individual [21, 22].

However, the question remains to what extent people care about location privacy. Although earlier studies [22, 59, 60] seem to indicate that people are less concerned about privacy, more recent work indicates the opposite [61].

In [60], a (fictional) study was conducted with 74 undergraduate computer science students to collect their precise location information (via mobile phone) for one month. It found that the students set a median price of £10 to reveal their personal location tracks for research purposes. The median price doubled if the data was going to be used for commercial purposes. After conducting group interviews with 55 people from different backgrounds, [59] found that people were not worried about their privacy when using location-aware services. However, it noted that “It did not occur to most of the interviewees that they could be located while using the service.” In [22], a GPS survey was conducted in which 219 people from Microsoft provided recorded GPS data for two weeks in return for a 1 in 100 chance of winning a $200 MP3 player. 97 of them were asked if the data could be used outside Microsoft and only 20% denied this. It may be that the implications of sharing location data were not well understood by the participants. However, we believe that concerns about privacy and user safety after recent incidents with popular ride sharing applications such as Uber and Lyft [13, 14, 23] have highlighted the need for privacy when sharing location data.

A recent survey [61] was conducted as part of the Data Privacy Day (held annually on January 28) to understand people’s concerns about location privacy and awareness about the use of location based services. The survey reported that 52% of people expressed strong concern about sharing their location with other people or organizations. A clear majority of respondents expressed concern about sharing their location without consent (84%), having personal information or identity stolen (84%) and overall loss of privacy (83%).

2.2.5 Location Privacy Laws

There are government policies in place in different parts of the world to protect an individual’s location privacy. Data privacy laws set out general rules in data protection directives, but new issues specific to the tracking and collection of GPS data are now being recognized.

Location privacy and its possible impacts are slowly becoming a concern for the general public and governments. Recently, the Location Privacy Act 2012 was proposed in California, which required law enforcement to obtain a warrant before collecting any GPS or location data from a person’s electronic device like a cellphone ¹.

On May 16, 2011, the EU’s Article 29 Data Protection Working Party (WP29) adopted an opinion setting out privacy compliance guidance for geo-location services on smart mobile devices ². The conclusions of the opinion are not law; they become law only if the EU itself or EU member states choose to pursue the recommendations in the opinion. They regard location data as sensitive data and require service providers to obtain prior consent before using the data.

2.3 Vehicle Trajectory Privacy

The location data collected from a moving vehicle provides the movement pattern of the vehicle, which is the vehicle’s trajectory. Such data, when collected over an interval of time, can lead to identification of the driver and the driver’s sensitive location information, and hence to location privacy issues.

Vehicle trajectory privacy has different components:

• Vehicle Identity Privacy: The identity of the vehicle owner should not be disclosed while his trajectory is being monitored. Some techniques use pseudonyms, but if long term tracking of the vehicle is done there are still linking attacks possible which can identify the vehicle owner with high accuracy [22, 62].

• Trajectory sensitive location privacy: The trajectory of the user can be considered to be made of some sensitive locations and safe locations. The sensitive locations could be his home, destination and some important stops in between. Publishing or recording the trajectory data should not lead to identification of these sensitive areas [63].

¹ https://www.eff.org/cases/california-location-privacy-act-2012
² http://ec.europa.eu/justice/policies/privacy/docs/wpdocs/2011/wp185_en.pdf



The privacy of a vehicle can be addressed at two different levels.

• The first one is at the time of recording the vehicle information, by recording anonymous, suppressed or aggregate data [64].

• The second approach is to address the privacy issues at the time of data publishing, which is static trajectory data publishing ([4], [65]).

The challenges in addressing vehicle trajectory privacy are:

• Understand the requirement of applications and the minimum amount of data granularity required to achieve the purpose.

• Address privacy issues such that an optimal balance between privacy and acceptable accuracy can be maintained.

• Define a measure of trajectory privacy which can be compared and modeled for different vehicle privacy components.

There are different traffic monitoring models which use the vast amount of spatial data to model better traffic management solutions, but the issue is to preserve the driver’s privacy. One of the challenges in this research domain is that there is no single definition or measure of trajectory privacy. However, different security mechanisms have been applied for trajectory privacy depending on the objective or the application. This section compares and discusses the existing work and measures of trajectory privacy.

The simplest approach is to record anonymous trajectories by omitting personal identification data from the movement data. This has been used by applications like INRIX ³, which is a provider of traffic information, directions and driver services. However, there are re-identification attacks [22, 62] possible where data mining algorithms exploit unique features in trajectories to identify information like travel behavior and home location, and use reverse geocoding to track individuals.

2.3.1 Anonymization (k-anonymity, l-diversity) Based Techniques

Another privacy concept frequently applied to trajectories is k-anonymity (Section 2.1.1).

3 http://www.inrixtraffic.com/



Figure 2.2: A (2,δ) Anonymity Set: Two trajectories with their cylindrical volumes. The central cylindrical 2-anonymous volume has radius δ/2 [4]

The k-anonymity model makes assumptions about the background knowledge of the attacker, and many privacy attacks on it have been identified. A stronger model based on k-anonymity is l-diversity, in which the table is anonymized such that the sensitive attribute (which the attacker wants to determine) has at least l different values with roughly the same frequency [30].

For a trajectory, which is a sequence of location and timestamp updates, k-anonymity should apply to all points in the trajectory. The dataset has to be anonymized such that every trajectory is indistinguishable from at least k−1 other trajectories.

There are different trajectory anonymity models built using k-anonymity which assume the presence of a central trusted server responsible for the secure anonymization.

In [4], the authors present a clustering based trajectory anonymity algorithm in which a trajectory is not a polyline in 3-dimensional space but a cylindrical volume. They take into account the imprecision introduced by the positioning systems and combine trajectories with overlapping cylinders. They define (k,δ) anonymity, in which a trajectory is indistinguishable from at least k−1 other trajectories which are bound by a cylindrical volume of radius δ (Figure 2.2). This work assumes the presence of a central server which is responsible for the anonymization as a post processing of the trajectories before publishing. Also, since the published trajectories are in the form of aggregate cylinders, further processing may be required for applications which need to work on atomic trajectories.
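As a rough illustration of the (k,δ) anonymity condition, the following Python sketch checks whether a group of trajectories could form an anonymity set. It assumes that all trajectories are sampled at the same timestamps and that two trajectories are co-localized when their positions are at most δ apart at every timestamp (which holds when both lie in a common cylinder of radius δ/2, as in Figure 2.2). The function names and sample data are hypothetical, and this is a membership check only, not the clustering algorithm of [4].

```python
import math
from itertools import combinations

def colocalized(tr_a, tr_b, delta):
    """True if the two trajectories stay within delta of each other at every sample."""
    return all(math.dist(p, q) <= delta for p, q in zip(tr_a, tr_b))

def is_k_delta_anonymity_set(trajectories, k, delta):
    """True if there are at least k trajectories and every pair is co-localized."""
    if len(trajectories) < k:
        return False
    return all(colocalized(a, b, delta) for a, b in combinations(trajectories, 2))

# Hypothetical trajectories: (x, y) positions at shared timestamps.
tr1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
tr2 = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
tr3 = [(0.0, 3.0), (1.0, 3.0), (2.0, 3.0)]
```

Here tr1 and tr2 stay 0.5 apart and so satisfy the check for k = 2 and δ = 1, while adding tr3 breaks it.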

Figure 2.3: Anonymization: (a) Trajectories tr1 and tr2. (b) Anonymization of tr1 and tr2.

Figure 2.4: Reconstruction Process: Reconstruction to two new trajectories.


In [65], the authors define trajectory k-anonymity and present a generalization and reconstruction based approach for trajectory data publishing. Figures 2.3 and 2.4 show the anonymization and the reconstruction process. The anonymization is done using a grouping algorithm in which close trajectories are added into groups until there are k trajectories in the group (the rectangles in the figure represent the anonymized regions). Unmatched points are suppressed in this step (Figure 2.3.b). They reconstruct atomic trajectories by uniformly selecting atomic points from anonymized regions (Figure 2.4).

So far, we have discussed work for trajectory privacy in traffic monitoring ([4], [65]). Mobile devices with internet connectivity (3G, Wi-Fi) are widely used for accessing location based services (LBS) in moving vehicles. Some examples of LBS are turn by turn navigation to an address, alerts (such as traffic jams) and location-based mobile advertising ([66], [38]). When a mobile user needs to send continuous location updates to a service, the sequence of updates becomes a trajectory.

Mokbel and Chow [66] present a survey of the different techniques used for LBS and static trajectory publishing. They differentiate between continuous and snapshot LBS and compare the applicability of specific techniques (Spatial Cloaking, Mix Zones, Path confusion, Dummy trajectories) for them. However, they do not focus on the definition and measures of vehicle trajectory privacy.

Spatial Cloaking is a technique in which the user’s location is cloaked or hidden in a bigger region containing more users. For k-anonymity this means that the reported region also contains k−1 other users. For continuous LBS there are spatial cloaking mechanisms in which the location based service is delayed or the minimum bounding box size is increased to include the k−1 users of the previous timestamp [67].
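The basic idea of spatial cloaking for a single snapshot can be sketched as follows. This is a naive illustration under stated assumptions: the anonymizer knows the exact positions of all users, distances are Euclidean, and the cloaked region is the bounding box of the requester and the k−1 nearest other users. The function name is hypothetical and the sketch does not cover the continuous case handled in [67].

```python
import math

def cloak(user, others, k):
    """Return the bounding box (min_x, min_y, max_x, max_y) covering the user
    and the k-1 nearest other users, or None if fewer than k-1 others exist."""
    if len(others) < k - 1:
        return None
    nearest = sorted(others, key=lambda p: math.dist(user, p))[:k - 1]
    xs = [user[0]] + [p[0] for p in nearest]
    ys = [user[1]] + [p[1] for p in nearest]
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical positions: the requester at the origin and three other users.
box = cloak((0, 0), [(1, 1), (5, 5), (-1, 2)], k=3)
```

With k = 3 the far user at (5, 5) is left out, and the service provider only sees the box spanning the requester and the two nearest users.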

Mix zone is a pseudonym changing technique in which k users enter the mix zone and spend a random time inside it, such that the probability for a user to take any exit remains the same. Therefore, an adversary cannot match the incoming user pseudonyms to the outgoing ones. The concept of mix zone has been extended to vehicular mix zones [68], in which the mix zone region is not an exact rectangle but a set of adaptive segments that protect against attacks exploiting the spatial and temporal constraints of the road network.

Dummy trajectories [50] is a technique in which a user injects fake trajectories (along with his true trajectory) into the system. The fake trajectories are created so as to maximize the confusion of an adversary trying to link the correct location samples of the user.



In [63], the authors present two anonymization definitions for trajectories: K-frequent and K-present trajectory anonymity. The proposed model is for dynamic trajectory updates for location based services. K-present (weak) trajectory anonymity identifies K−1 vehicles that are close to the requester at the time of request and thus could have issued the request (from the viewpoint of the service provider). On the other hand, K-frequent (strong) trajectory anonymity collects the vehicles that are near the requesting vehicle at the time of request and for whom the current route of the requesting trajectory is also frequent. The model assumes the presence of a trusted server that collects the user movement and reconstructs movement patterns for them.

An l-diversity based location privacy model for road networks is presented by Wang et al. [69]. A user’s location is l-diverse if it is k-anonymous and contains at least l different road segments. The model creates a profile $(\delta^u_k, \delta^u_l, \sigma^u_t, \sigma^u_s)$ for a user u, customized to his privacy requirements, where $\delta^u_k$ and $\delta^u_l$ specify the k-anonymity and l-diversity requirements, and $\sigma^u_s$ and $\sigma^u_t$ specify the maximum spatial and temporal tolerances. It is assumed that there exists a trusted third party anonymizer which receives the exact user query and is responsible for anonymizing and deanonymizing the results according to the user’s privacy profile. In XSTAR, the neighboring queries are grouped into cloaking star structures, which are then merged to form super-star structures if necessary to fulfill the privacy requirements while minimizing the communication and computation costs.

In [70], the authors present a suppression based model for trajectory anonymization. They consider the trajectory publishing scenario in which the vehicle ID (and some other sensitive information) is published along with the trajectory data. They propose the (K,C)_L privacy model, which requires that every sub-sequence in the trajectory database is shared by at least K−1 other records (C is the confidence threshold and L is the maximum length of the background knowledge). It is a generalized version of the k-anonymity (C = 100% and L = length(trajectory)) and l-diversity concepts. It can impose different levels of security depending on the chosen values of K, C and L.

In [71], the authors argue that the k-anonymity concept of relational databases cannot be applied directly to moving objects. According to them, the quasi identifiers (QIs) for moving objects cannot be a fixed set of identifiers as in relational micro-data. They present a definition of k-anonymity for moving objects with different sets of QIs and present algorithms to compute anonymization groups satisfying the definition. They also present the idea of information loss, which measures the loss in the probability of accurately determining the position of the moving object over an interval of time. However, this work does not describe a suitable way to provide the quasi identifiers for each moving object or trajectory. In the case of road networks, one way is to ask the vehicle owner to provide the QIs, but a user might not be well equipped to decide the QIs needed to achieve the required amount of privacy. Another way is to use some historical data analysis for the purpose.

Figure 2.5: Prefix Tree [5]

In [72], the authors present a privacy aware framework (PAM) which informs the mobile client about when and how to send location updates to the server. The location updates are sent as a bounding box rather than exact locations. Using these updates the server computes a safe region which contains k users including the mobile client’s rectangle. It implements k-anonymity through collaboration between the user and a central server.

2.3.2 Differential Privacy Based Techniques

In [5], the authors present a differential privacy model for publishing trajectory data. This work assumes that locations are known as discrete spatial areas in a map. A trajectory is an ordered sequence of locations visited by an individual. A prefix tree is created for the trajectory database in which the root represents all trajectories in the database and every other node represents a location prefix together with a count of the trajectories having that prefix (Figure 2.5). A noisy version of this tree is then created which, for a tree of height h, adds Laplace noise to the count at every node in the tree.

Figure 2.6: Distance Graph: The p% contemporary is calculated for trajectories $T_1 = (t^1_1, x^1_1, y^1_1) \ldots (t^1_n, x^1_n, y^1_n)$ and $T_2 = (t^2_1, x^2_1, y^2_1) \ldots (t^2_m, x^2_m, y^2_m)$ as the percentage overlap. In the example, $t^1_1 = 10$, $t^1_n = 100$, $t^2_1 = 60$ and $t^2_m = 200$, so the overlap time is $t^1_n - t^2_1 = 40$ and $p = 100 \times \min(40/\mathrm{Duration}(T_1), 40/\mathrm{Duration}(T_2)) = 29\%$. The distance between T1 and T2 in the distance graph, dist(T1, T2), is the distance of the location coordinates in the overlap time period.

In this algorithm there are two parameters which control the privacy guarantees: ε and the height of the tree h. The algorithm to build the noisy prefix tree starts by fixing the height of the tree h and then adds Laplace noise calibrated to a privacy budget of ε/h at every level of the tree. By the composition property of differential privacy, the total privacy budget consumed is thus limited to ε. The evaluation is done by performing two types of queries: count queries and frequent sequential pattern mining queries. The experiments in this work are done with real public transit data from Montreal where a trajectory is a sequence of locations. The location data is not timestamped; however, the authors claim that the approach can be extended to include timestamps as well.
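The budget split described above can be sketched in Python as follows. This is a simplified illustration and not the algorithm of [5]: it assumes each trajectory contributes to one node per level (sensitivity 1), so each count receives Laplace noise of scale h/ε, and it perturbs only the prefixes that actually occur, whereas a complete differentially private mechanism must also account for prefixes with zero count. The function names and the sample trajectories are hypothetical.

```python
import random
from collections import Counter

def prefix_counts(trajectories, height):
    """Count how many trajectories share each location prefix up to `height`."""
    counts = Counter()
    for traj in trajectories:
        for depth in range(1, min(len(traj), height) + 1):
            counts[tuple(traj[:depth])] += 1
    return counts

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_prefix_counts(trajectories, height, epsilon, seed=None):
    """Perturb every observed prefix count using a per-level budget of epsilon/height."""
    rng = random.Random(seed)
    eps_level = epsilon / height
    exact = prefix_counts(trajectories, height)
    return {prefix: count + laplace(1.0 / eps_level, rng)
            for prefix, count in exact.items()}

# Hypothetical trajectory database over locations L1..L4.
SAMPLE = [
    ["L1", "L2", "L3"],
    ["L1", "L4", "L2"],
    ["L1", "L2"],
    ["L3", "L4", "L1"],
]
```

A larger h spreads the same ε over more levels, so each individual count becomes noisier; this is the trade-off controlled by the two parameters.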

2.3.3 Metrics

In [73], the authors propose a time to confusion metric and present a path cloaking algorithm to meet privacy guarantees. They measure the degree of privacy as the mean time to confusion, that is, the time for which an adversary could correctly follow a trajectory. The points in the GPS trace which satisfy a confusion threshold are published and the rest are cloaked. Their algorithm provides privacy to drivers in low traffic density areas as well.



In [74], the authors propose a distance metric to publish synthetic trajectories which preserve the locations covered by the original trajectories. The proposed distance measure considers both spatial and temporal features of trajectories and can compare trajectories that are partially time wise overlapping or not overlapping at all. They compute p% contemporary trajectories as the ratio of overlap in terms of overlap time, and the distance between them as the distance of the location coordinates in the overlap time period. A distance graph is then created where trajectories are nodes and the p% contemporary trajectories have edges with the distance as the weights on the edges (Figure 2.6). A clustering algorithm is applied to the graph to get the k-connected clusters which represent the anonymization groups. Transformed trajectories are finally published in which each (location, time) triple is swapped with another original trajectory in the anonymization group within some temporal and spatial constraints.
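The percentage overlap can be worked through with a small Python sketch. It assumes each trajectory is summarized by its (start, end) time span; the function name is hypothetical. The spans used are those of the example in Figure 2.6, where T1 runs from t = 10 to t = 100 and T2 from t = 60 to t = 200.

```python
def percent_contemporary(span1, span2):
    """Percentage overlap of two time spans, relative to the longer-lasting trajectory."""
    start1, end1 = span1
    start2, end2 = span2
    overlap = max(0.0, min(end1, end2) - max(start1, start2))
    return 100.0 * min(overlap / (end1 - start1), overlap / (end2 - start2))

# Overlap is 100 - 60 = 40; durations are 90 and 140.
p = percent_contemporary((10, 100), (60, 200))
```

Here p = 100 × min(40/90, 40/140) ≈ 28.6, which rounds to the 29% reported in the figure; trajectories with no common time interval get p = 0.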

2.3.4 Recording Partial Information

We now discuss traffic monitoring models which preserve privacy at the time of recording the trajectory data.

In [64], the authors introduce the concept of Virtual Trip Lines (VTLs). VTLs are geographical markers where the vehicles can provide their location updates. The markers can be positioned on a partial road network graph (higher traffic roads such as highways and arterial roads) while avoiding sensitive areas (like hospitals). The proposed architecture separates the identity and location information by encrypting them with different keys and splitting the vehicle identity authentication and location data processing tasks between separate servers. The user controls his location privacy as he downloads the VTL marker list onto his mobile handset and updates the location information at the markers only. This is a distributed architecture which requires the establishment of an ID Proxy server, a traffic monitoring service provider and a VTL generator to communicate with the user’s mobile phone. VTLs limit the update of location information to the set of pre-decided geographic markers.

Figure 2.7: Relationship Between the Trajectory k-Anonymity Based Definitions (Venn diagram with regions labeled (K,C)_L, k-anonymity, (k,δ), K-Present, K-Frequent and l-diversity)

2.3.5 Comparison of Models

Figure 2.7 uses a Venn diagram to show the relationship between the different k-anonymity based trajectory privacy definitions discussed so far ([70], [63], [4], [65]). As discussed, l-diversity [30] is a subset (stronger privacy) of k-anonymity which anonymizes the data such that the sensitive attribute is also well represented. The (K,C)_L privacy model ([70]) is a super-set, as it can be specialized to k-anonymity and l-diversity by varying the values of K, C and L.

Table 2.4 presents a comparison of all the models discussed above, categorized by the technique used, the trajectory privacy model described, the application area, and whether the privacy issues are addressed at the time of recording the raw data or later when publishing the collected data.

| Paper | Technique | Trajectory Privacy Model | Application Area |
|---|---|---|---|
| [4] | Clustering | (k,δ) anonymity | Static trajectory publishing |
| [65] | Generalization and trajectory reconstruction | k-anonymity | Static trajectory publishing |
| [63] | Safe route computation | K-frequent and K-Present | Dynamic Location Updates, LBS |
| [70] | Suppression based subsequence matching | (K,C)_L | Static trajectory publishing |
| [71] | Generalization | k-anonymity for Moving Objects | Static trajectory publishing |
| [74] | Micro-aggregation | Distance Metric | Static trajectory publishing |
| [69] | Location anonymization model X-Star | k-anonymity, l-diversity | Dynamic Location Updates, LBS |
| [5] | Noisy prefix tree | Differential Privacy | Static trajectory publishing |
| [73] | Cloaking | Metric: Mean Time to confusion | Static trajectory publishing |
| [72] | Mobile data update in a bounding box | k-anonymity | Dynamic Location Updates, Continuous Spatial Queries |
| [64] | Virtual Trip Lines | Record partial information | Dynamic Location Updates, Travel Time Estimates |

Table 2.4: Comparative Analysis of Vehicle Trajectory Privacy Models

2.4 Transport Management

2.4.1 Transport Queries

There are many types of queries that need to be answered for transportation management purposes. Here are a few examples:

1. How many vehicles travel on this road segment in a day, or in particular hours of a day or year?

2. How many vehicles travel on a route in a day, or in particular hours of a day or year?

3. How many vehicles travel from one part of the road network to another (say from a suburb to the city)?

4. Which road segments become congested simultaneously?

5. How does a link affect the road network in terms of usage changes and travel times? This might be needed to schedule maintenance works.

6. What are the routes taken between two points of the road network and how many vehicles take each route?

7. What are the most popular or frequently used routes at a particular time of the day?

8. Which roads have traffic behavior (vehicles/hr at a time of day) differing significantly from normal?

The queries that are important from a user’s point of view are:

1. Find the shortest or fastest path to a destination

2. Find a path to the destination which uses the part of the road network familiar to me

3. Find a path which uses the simplest route to the destination

4. Find a path that avoids congestion along the route while driving to a destination

5. Plan the fastest path to a destination via stops A → B → C → D

There are different types of tools used to answer the transport management queries, the choice of which also depends on the data collected. We will now discuss some of the important ones.

2.4.2 Origin-Destination (OD) Matrices

An important tool used for effective real time traffic management is the origin destination (OD) matrix, which is a “trip table” that displays the number of trips going from each origin to each destination. The objectives of estimating the trip distribution could be to replicate the spatial pattern of trip making, or to account for the spatial separation among origins and destinations (in terms of time or cost), or to make strategic planning and management decisions for transportation networks. While the trip distribution tells the proportion of trips between pre-selected sources and destinations, it does not specify the path taken between them.
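As a minimal sketch of the “trip table”, the following Python code builds an OD matrix from a list of observed trips. It assumes that complete (origin, destination) pairs are available for every trip, which is the easy case; the estimation methods discussed next address the situation where they are not. The zone labels and trips are hypothetical.

```python
from collections import Counter

def od_matrix(trips, zones):
    """Return matrix[i][j] = number of trips from zones[i] to zones[j]."""
    counts = Counter(trips)
    return [[counts[(origin, dest)] for dest in zones] for origin in zones]

# Hypothetical zones and observed trips given as (origin, destination) pairs.
zones = ["A", "B", "C"]
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")]
matrix = od_matrix(trips, zones)
```

Each row sums to the number of trips leaving a zone and each column to the number arriving, but the matrix carries no information about the paths taken in between.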

Traditionally, the existence of a prior or target OD matrix is assumed, which provides guidance in the determination of the OD matrix. The target or prior OD matrix can be an old OD matrix or a result from a sample survey. This is the basis of a number of traffic modeling approaches ([75]), maximum likelihood approaches ([76]), Bayesian approaches ([77]) and generalized least square methods ([78]).

Estimation of OD matrices becomes much easier if there is data for every vehicle trajectory on the road network. New types of traffic data sources such as GPS traces, mobile phone logs, smart car records, road camera recordings and registration plate scanners can be used to collect such data. The authors of [79] estimated the OD matrices and path flows using link counts and a dense network coverage of registration plate scanners. In [80], a likelihood and generalized least square based statistical model is developed for OD estimation using link counts and GPS routing information for a fraction of vehicles on the network. In [81], OD flows are estimated using mobile phone location data collected whenever a user places (or receives) a call, sends (or receives) a message, or connects to the internet.

2.4.3 Trajectory Estimation

While origin-destination matrices report the proportion of trips between pre-selected sources and destinations, they do not specify the path taken between them. A spatial trajectory is a sequence of data points where each data point records location information (geospatial coordinate set) and a time-stamp [82]. The computation of the traffic paths or trajectories is the traffic modeling step in which the trip distribution numbers can be input to road traffic simulators to generate the traffic paths. However, additional traffic information is required, as a large number of trajectory solutions can satisfy every OD-pair value. Trajectories can answer questions about the routes taken between two locations of the road network, the number of vehicles that take a particular route or the frequently used routes at a particular time of the day.

In [83], the authors survey the existing research in trajectory data mining. The work explores the correlations and differences between the different trajectory mining techniques used for trajectory classification, pattern mining and outlier detection.

The frequent traffic flow patterns on a road network are referred to as hot routes, which give an insight into the area’s traffic patterns [84]. The authors propose a density based algorithm, FlowScan, to discover hot routes in a network. The algorithm finds routes in which the edges need not be adjacent but form a sequence with vehicle traffic above a certain threshold. Two edges are considered traffic density-reachable if they are near each other (according to some threshold) and they share common traffic. It is a clustering algorithm which clusters edges which are traffic density-reachable.

| Property | Meaning |
|---|---|
| Kemeny constant | Average travel time on road |
| Perron Eigenvector | Congested roads in the network |
| Mean First Passage Times | Average travel time from origin to destination |

Table 2.5: Properties of Markov Chains Describing Road Network Properties

In [85], the authors estimate the most popular route for a query with a start location and a destination. The authors use the trajectory data to mine the transfer network (transfer nodes and edges) by clustering trajectory points to find intersections based on density and direction constraints. They feed the turning probability of each node in the network into an absorbing Markov chain model to compute the transfer probabilities, which are essentially popularity indicators of nodes in the network. Finally, using these popularity indicators the authors propose an algorithm to find the most popular route between two given locations.

2.4.4 Prediction Models

Xu et al. [86] propose a route planning model in which current traffic data is used to build and predict congestion regions which need to be avoided. They use the current traffic update to find the congested edges, whose current travel speed is much less than the allowed edge speed. These edges are clustered to build congestion regions. They estimate the impact of congested regions on nearby links or edges by considering the turning statistics (turning probability from historical data). The next step is to find the top-k intermediate destinations in the moving direction of the final destination. An algorithm is proposed which finds the final route using these intermediate destinations while monitoring and updating the congested regions.
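The first two steps, finding congested edges and grouping them into regions, can be sketched in Python as follows. This is an illustration under assumptions that are not taken from [86]: an edge counts as congested when its current speed is below a fixed fraction (the hypothetical parameter `ratio`) of the allowed speed, and congested edges are grouped into one region whenever they share an endpoint.

```python
def congested_edges(speeds, ratio=0.5):
    """Edges whose current speed is below `ratio` times the allowed speed.
    `speeds` maps (u, v) -> (current_speed, allowed_speed)."""
    return [edge for edge, (cur, allowed) in speeds.items() if cur < ratio * allowed]

def congestion_regions(edges):
    """Group congested edges that share an endpoint, using union-find over nodes."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for u, v in edges:
        parent[find(u)] = find(v)
    regions = {}
    for u, v in edges:
        regions.setdefault(find(u), []).append((u, v))
    return sorted(sorted(region) for region in regions.values())

# Hypothetical road segments with (current speed, allowed speed) in km/h.
speeds = {("a", "b"): (20, 60), ("b", "c"): (10, 50),
          ("c", "d"): (55, 60), ("x", "y"): (5, 40)}
regions = congestion_regions(congested_edges(speeds))
```

In this example the slow edges a-b and b-c form one region and x-y forms another, while c-d is flowing freely and is ignored.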

In [87], a model based on Google’s PageRank algorithm using Markov chains has been proposed for road networks. The authors use properties of Markov chains to predict the effects of road network modifications and to estimate properties like average travel time. Table 2.5 shows the properties of Markov chains and their utility for road networks.
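To make one of these properties concrete, the sketch below computes the left Perron eigenvector of a row-stochastic transition matrix, which is its stationary distribution, by power iteration. It assumes an ergodic chain so that the iteration converges; the two-state matrix is a hypothetical toy example and not a model from [87].

```python
def stationary_distribution(P, iterations=1000, tol=1e-12):
    """Power iteration for the stationary distribution pi satisfying pi = pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical chain over two road segments; P[i][j] is the probability
# of moving from segment i to segment j in one step.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
```

For this matrix the stationary distribution is (5/6, 1/6): in the long run vehicles spend most of their time on the first segment, which is the sense in which the Perron eigenvector points to congested roads in Table 2.5.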



2.5 Ride Sharing

2.5.1 Definitions and Variants

Ride sharing is the sharing of car journeys such that a driver picks up people and combines his or her route with theirs while going to his or her destination. The drivers intend to take the trip whether or not they can find an appropriate passenger (or multiple passengers) to share the ride with. The monetary objective of ride sharing is not to make a profit but to share costs. The static version of ride sharing requires the users to make a pre-arrangement by posting about their trips in advance. Car pooling is a form of static ride sharing. Dynamic ride sharing, on the other hand, is a service that requires an automated system facilitating one time shared trips for drivers and passengers in real time or at short notice [12]. It is a location based service which is facilitated via smart phones, which provide internet connectivity and report the users’ locations through the phone’s positioning system (GPS).

There are different variants of ride sharing. In [17], the authors present a survey proposing a classification according to the different aspects of existing ride sharing systems.

• On the way or Detour: In on the way ride sharing, drivers only pick up passengers whose source and destination are on their original route. In detour based ride sharing, the drivers take detours to pick up passengers within a certain time or distance limit from the original route.

• Partial or Inclusive: Partial ride sharing also considers passengers whose source or destination is not within the driver’s trip. Therefore, ride sharing with one driver only offers a partial trip to the passenger. Inclusive ride sharing only offers trips to passengers whose source and destination both lie within the driver’s trip.

• Single or Multiple passenger: The drivers can decide to pick up only a single passenger or multiple passengers according to the vehicle capacity.

Our work lies in the dynamic real-time ride sharing category where the drivers take detours to pick up multiple passengers with inclusive passenger trips.

| Time | Incentive | Model | Facilitator |
|---|---|---|---|
| 1939-1945 (World War II) | Rubber | Car pooling clubs | Bulletin boards |
| 1970 (Oil crisis) | Oil | Car pooling, park and ride | Employer sponsored |
| 1990’s | Traffic | Car pooling, park and ride | Employer sponsored |
| 2010 | Traffic, environment, cost | Dynamic ride sharing | GPS, Wi-Fi, smart phones, LBS |

Table 2.6: History of Ride Sharing

2.5.2 History and Trends

In Table 2.6 we present a brief history of ride sharing in the US [8, 88]. Ride sharing started as car pooling clubs at the time of World War II. A ride sharing club program was created so that a car is shared by four people. These efforts were made by the US Office of Civilian Defense to conserve rubber for the war. Factories and companies facilitated the car pooling club matching via bulletin boards. Ride sharing became popular again in the 1970’s due to the energy crisis at that time. It was facilitated by employer sponsored ride sharing, HOV lanes and park and ride facilities (car parks with connections to public transport). During the 1990’s ride sharing resurfaced with telephone based ride matching programs, leading to online ride matching services by the late 1990’s. Since 2005, ride sharing has gained momentum with the use of technology enabled devices. The objectives of ride sharing are now to reduce congestion and environmental impact and to share the travel cost. Several ride sharing applications such as Lyft, Uber and Avego have gained popularity.

2.5.3 State of the Art

Web and mobile based applications such as coseats, Zimride and flinc provide static or pre-arranged ride share services. Users pre-register their trip with identity information (such as phone number or Facebook ID) and the application matches potential ride sharing options. Zimride uses Facebook IDs to create a user profile and computes a detour as the cost to pick up a passenger. The site matches people within a person’s social network who work at the same place or have mutual Facebook friends. These applications make arrangements for static ride sharing and do not have a built-in mechanism to handle the participants’ trust, privacy and safety.

In [17], the authors propose 6 classes of ride sharing according to the different criteria identified. They highlight that the use of ride sharing has decreased by almost 10% in the past 30 years. They identify three major challenges for ride sharing services: design of attractive mechanisms (pricing, incentives), proper ride arrangement (user profile information, multi-hop rides), and building of trust among travelers in online systems. Our work lies in the dynamic real-time ride sharing category where the drivers take detours to pick up multiple passengers with inclusive passenger trips. However, ours is a greedy optimization where the central system is only responsible for starting a negotiation between the possible ride share candidates. This approach is not part of their classification. Our work is focused on privacy, trust and safety of the users, which we believe will help increase the ride sharing uptake.

In [12], the authors present a detailed literature review of the optimization challenges of dynamic ride sharing systems. The ride sharing problem is usually modeled as an optimization to minimize the system wide travel miles [89, 90] or travel time, or to maximize the number of participants [90]. The dynamic ride sharing problem with multiple passenger pick up is NP-hard [24, 25]. In our work, we do not optimize any system wide metrics but present a greedy optimization approach where individual drivers and passengers optimize their path choices according to their own cost, time or other preferences. This gives the users of the system more control over their choices, which we believe will act as an incentive and help increase the uptake of ride sharing.

The authors of [91] highlight the importance of trust, convenience and incentives to increase the uptake of ride sharing. They conducted a survey to understand the concerns of users and reported that trust in co-passengers and the ability to choose the co-passengers are among the important factors. In most of the current ride sharing work the drivers and passengers have to agree to the ride sharing trip suggested to them by the service provider. Our work addresses the trust and privacy concerns of the users while providing a solution where the users choose their ride sharing partners based on their preferences.

Carpooling is one of the variants of ride sharing where the objective is to group people who follow the same route at the same time regularly. Traditionally, car pooling is identified as a mode of grouping people traveling to the same work location from their homes [92]. In [92], the authors highlighted that one of the main reasons carpooling is effective is not reduced congestion or pollution but the economic incentive of sharing the trip costs. Ride sharing by the same individuals over a period of time can lead to static car pooling arrangements between consenting individuals. The identification of overlapping routes for travelers [93] by collecting user trajectories can be used to suggest carpooling. Unlike carpooling, dynamic ride sharing accommodates one time routes with unexpected changes in schedule and no prearrangement.

In [94], the authors present a scalable collective transport system which clusters trips to find users who can be grouped to travel collectively to minimize transportation costs. This form of collective transport promotes cab sharing, car pooling and ride sharing. This work focuses on a scalable algorithm for cab sharing and presents how it can be extended for ride sharing. However, the additional ride sharing constraints of user preferences are not considered. In addition, all the trip details are available to the central trusted server, which then selects a potential driver out of a group of users; the passengers (riders who are willing to travel without a car) who are traveling towards the same location are then bundled to travel together. Further, since ride sharing drivers are independent users they might refuse to accept the offered trip, which is not considered in this work. Our work focuses on the privacy issues that arise from sharing the users’ private trip information with the trusted third party and considers a greedy solution which provides better choice to the individual users.

In [95], the ride sharing algorithm fixes the driver trajectory and matches passengers with stops on the trajectory. The authors present an R-tree based scalable ride sharing solution in which the drivers are matched to passengers who satisfy their social and economic preferences and have their start and destination stops falling exactly on the driver’s trajectory. We believe that restricting the driver to an exact trajectory limits the ride sharing options, as the driver might be able to detour from the original trajectory and travel within the time constraints to pick up passengers and increase revenue.
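The difference between exact on-trajectory matching and detour based matching can be illustrated with a small Python sketch. It assumes straight-line (Euclidean) travel between points and a single passenger, and the function names and the `max_detour` parameter are hypothetical; a real system would use road network distances and time constraints.

```python
import math

def detour_length(origin, destination, pickup, dropoff):
    """Extra distance the driver travels when serving the passenger."""
    direct = math.dist(origin, destination)
    shared = (math.dist(origin, pickup) + math.dist(pickup, dropoff)
              + math.dist(dropoff, destination))
    return shared - direct

def can_serve(origin, destination, pickup, dropoff, max_detour):
    """True if the passenger can be served within the driver's detour limit."""
    return detour_length(origin, destination, pickup, dropoff) <= max_detour

# Hypothetical driver trip from (0, 0) to (10, 0).
on_route = detour_length((0, 0), (10, 0), (2, 0), (8, 0))
off_route = detour_length((0, 0), (10, 0), (3, 4), (7, 4))
```

A passenger whose stops lie on the driver's path adds no distance, while the off-route passenger adds a detour of 4 units; exact matching would reject the second passenger, whereas a driver with a detour limit of 5 could still serve them.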

In [90], the authors present a dynamic ride sharing algorithm based on auctions. The passengers provide their location and time constraints to the system, and the system then computes the list of matching drivers with additional information like their social network status. The passengers place bids with their preferred candidate drivers, and the driver chooses the highest bid. They use social networks such as Facebook to select candidates for ride sharing within the network in order to deal with privacy and trust issues. Although this work presents a dynamic ride sharing solution with global optimization of vehicle kilometers traveled, the solution is for a single driver matched to a single passenger. The time complexity of the optimal single driver single passenger assignment is shown to be $O(N^3)$ using a modified version of the Hungarian algorithm. Since we consider multiple passenger pick up, for which the global optimization is NP-hard, an optimal server side solution is computationally infeasible. In addition, in our system privacy against the server is the main objective, and the server’s knowledge of the individuals’ location and time information is limited.

In [24], the authors propose a dynamic ride sharing system that groups together drivers who have similar routes. In every group, one driver is chosen to give rides to the others, based on the cost computed for the driver and the passengers. The user modeling component of the system is aware of the identity, exact location, time information and preferences of every registered user. The optimization component generates clusters or groups of users that maximize the global transportation objective. The authors prove that this optimization problem is identical to the well known NP-hard set cover problem, and they implement an approximate greedy set-cover algorithm to generate the ride sharing groups. This is a different approach to ride sharing, in which every individual candidate is assumed to be a potential driver or passenger and trips are clustered to obtain ride sharing groups. In addition, complete user trip information, including location and time, is available to the central server.
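The greedy set-cover idea mentioned above can be illustrated with a minimal sketch. This is not the algorithm of [24]: it is the textbook unweighted greedy heuristic, and the group labels, user labels and the function name `greedy_set_cover` are hypothetical. Each candidate group stands for the set of users who could share one ride.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(universe: Set[Hashable],
                     groups: Dict[str, Set[Hashable]]) -> List[str]:
    """Pick groups greedily until every user in `universe` is covered."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Choose the group that covers the most still-uncovered users.
        best = max(groups, key=lambda g: len(groups[g] & uncovered))
        gained = groups[best] & uncovered
        if not gained:
            raise ValueError("remaining users cannot be covered by any group")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical candidate ride groups (label -> users who could share that ride).
candidate_groups = {
    "g1": {"u1", "u2", "u3"},
    "g2": {"u3", "u4"},
    "g3": {"u4", "u5"},
    "g4": {"u1", "u5"},
}
cover = greedy_set_cover({"u1", "u2", "u3", "u4", "u5"}, candidate_groups)
print(cover)
```

A cost-aware variant would instead pick the group with the lowest cost per newly covered user; the unweighted form is used here only to keep the sketch short.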

In [96], the authors propose a ride sharing system, vHike, which can be extended to a dynamic system in the long run. vHike is available only to a closed community of users in which every user has an identity profile. To ensure safety, the course of every agreed ride is tracked to note any unexpected deviations from the route. To build trust, users are encouraged to rate each other after every ride. In vHike, the central trusted body has access to all the sensitive location and time information of every user. User privacy is not considered in this system, as the drivers broadcast their ride information to passengers via Bluetooth. After a passenger starts communicating with the driver, the driver can check the passenger profile to obtain the rating information and decide whether to accept or decline the offer. This approach differs from our work, as we do not limit the users to a community and do not track the course of a ride.



2.5.4 Privacy and Ride Sharing

There are many mobile applications and websites providing ride sharing services, such as coseats⁴, Zimride⁵ and ShareURRide⁶. However, this kind of ride sharing is static: the trips are pre-arranged, and the routes and time constraints are known in advance. Avego⁷, Uber⁸ and Lyft⁹ are modern dynamic ride sharing applications. However, Avego has drawbacks in terms of the decision making options it gives its users, and Uber has recently faced widespread backlash and has been banned in a number of cities in India and Europe (for example, Berlin) due to user safety concerns [15, 16]. Lyft and Uber have recently faced user privacy concerns, as they collect sensitive user trip information and their privacy policies have been questioned over the amount of access given to their employees [13, 14]. User trip information can be collected, and historical trajectory data is available to these service providers. Analysis of such highly precise spatio-temporal data, when collected over an interval of time, can lead to the identification of individuals, their home and work locations, and their behavior.

In [97], the authors study the impact of design choices that incorporate the privacy and security of participants in a ride sharing prototype. They highlight the need for privacy based techniques for implementing privacy preserving ad hoc ride sharing. They first discuss precise tracking versus imprecision, where imprecise information is given to start a conversation and is then refined with a few people on a need to know basis. This is the approach we have implemented in our negotiation based model. They also discuss the impact of having moderators to refer participants versus a referral and reputation based system. We implement a reputation based rating model in which participants are rated by the people they travel with, reflecting the trust a participant builds over time.

In [98], the authors present a survey on dynamic ride sharing and find that most people prefer anonymous locations (for example, parking lots or major intersections) for pick up and drop off rather than giving their exact work or home locations. Our work is motivated by this growing privacy concern, which comes from revealing exact start and destination locations.

⁴ http://www.coseats.com
⁵ https://zimride.com/
⁶ http://www.shareurride.com.au/
⁷ https://carmacarpool.com/
⁸ https://www.uber.com/
⁹ https://www.lyft.com/



2.5.5 Other Shared Vehicle Transport

Shared vehicle transport lies between private and public transport systems. It is a demand driven travel arrangement in which travelers share a vehicle either simultaneously (ride sharing, hitchhiking) or over a period of time (bike sharing).

2.5.5.1 Hitchhiking

Hitchhiking is a form of shared transport where a rider tries to get a ride (lift) with a car or truck driver, who is usually a stranger. It is also known as lifting, thumbing, hitching or autostop. Unlike ride sharing, it is usually free and the passengers do not have much choice. The incentives for a driver are mostly social, such as an exchange of goodwill and an opportunity to travel with a stranger and form new friendships. For the passenger it is considered an adventurous way of traveling, as it involves uncertainty in terms of time and travel experience. Apart from uncertainty, one of the biggest concerns in hitchhiking is safety, which comes with traveling with a stranger. There is no research which focuses exclusively on hitchhiking in terms of computational challenges. If the safety and uncertainty challenges are dealt with using a central system with background checks, registered users, facilitated rides and a cost sharing mechanism to make it attractive, hitchhiking becomes the same as dynamic ride sharing.

Slugging (casual carpooling) is a form of car pooling and hitchhiking which is ad hoc or dynamic; the incentive is time saved through the use of faster HOV lanes, or a toll reduction.

2.5.5.2 On Demand Transportation (Vanpools)

The Dial-a-Ride problem (DARP) is a vehicle transport planning problem for planning the pick up and drop off of passengers. The static DARP consists of service requests placed by customers in advance with time windows, and a fleet of vehicles located at a central depot. The objective is to allocate tours to vehicles such that they service all the customers within their time windows while minimizing the overall distance traveled or the number of vehicles used. The static DARP is known to be NP-hard [99]. In the dynamic DARP, vehicle requests are placed in real time. When a request is accepted, it must be included optimally in the current partial solution [100]. Unlike DARP, in dynamic ride sharing the drivers are independent and not employed by a transportation company, and they make their own greedy choices to pick up passengers. Further, they can join and leave the system dynamically at any point in the road network.

2.5.5.3 Taxi Sharing

Taxi sharing takes two forms. The first is a hired vehicle with fixed routes but without timetables. Departure times are usually governed by the number of occupied seats, and the vehicle might stop anywhere along the route to drop off or pick up passengers. Such services operate on high usage routes and are popular in developing countries. The second form is taxi ride sharing, where a centralized system groups passengers together within the usual taxicab system so that they share the cost. The incentive for the passengers could be cost sharing or availability. The incentive for the taxi system could be cost, through offering rides to a higher number of passengers and reducing the overall miles traveled by the cabs.

Taxi ride-sharing is a special case of DARP. In [18], a dynamic taxi ride-sharing framework is proposed for GPS equipped taxicabs in a city. The road network is divided into grid cells with pre-computed distances between them, and a spatio-temporal index of the taxis in each grid cell is computed. Every passenger query is processed by the central scheduler, which computes the candidate taxis for the passenger using the passenger’s time and location constraints. Once the list of candidate taxis is computed, the insertion feasibility for the passenger is checked for each candidate taxi such that the time constraints of the passengers already traveling in the taxicab are not violated. At any point in time, only one query (passenger) is considered for insertion. Our problem is more challenging, as every driver tries to pick up multiple passengers in every round. In addition, since both drivers and passengers are independent, they can have individual preferences about the ride sharing.

In an investigation of on demand ride sharing with taxis [101], the authors provide monetary constraints for passengers and drivers. The system makes sure that passengers do not pay more than they would pay without a taxi share. In addition, any existing ride share passengers are compensated for the additional trip time due to ride sharing. Overall, the taxi driver makes more money despite the detour. However, in a greedy system such as ours these decisions are made by individual drivers and passengers. During the negotiation (Section 3.5.1), the drivers present the cost, along with other parameters such as trip times and starting times, to the passengers of the chosen path. The passengers are then free to choose a driver according to their own preferences, which could be lowest cost, fastest trip time, earliest starting time or any other personal preference. We show through experiments that passengers who can share a ride are significantly better off in terms of cost than when taking a taxi (Section 3.6.4.7). The drivers ensure for themselves that their chosen path has a non-negative path cost (Section 3.2.2) in the initial computation, so that the chosen detour does not cost more than their original non ride sharing route. Further, since the path once negotiated is not changed while taking the actual route, the passengers are already aware of their trip time and hence there is no further compensation. Currently, the passengers sharing the ride do not split the trip cost. In the future, we would like to investigate a monetary model which can compute the trip cost depending on the number of passengers sharing a trip and share the trip cost proportionally.

2.5.5.4 Bike Sharing

Bike share is a service in which bicycles are available for shared use on a short term basis. Individuals borrow bikes from bike stations and return them, and are charged on the basis of time of use. The service can carry a small cost or even be free, and the incentive is to provide an alternative to motorized transport in urban areas. It is also used to connect passengers to public transport.

Existing research aims at studying the mobility patterns of urban areas to improve the existing bike sharing systems. In [102], the authors study the mobility patterns of users to predict the number of bikes available at any time at a particular station. In [103], the authors derive the activity patterns of users by analyzing the operational data from bike sharing systems to reveal imbalances in the distribution of bikes at stations.

2.5.6 Popular Ride Sharing Services

Real time services such as Uber, Lyft and SideCar are classified as ride sharing services. However, as we differentiated earlier, these services have dedicated drivers rather than drivers who are actually commuting to a destination. This implies that these services are a modern real time replacement for (or competitor to) traditional taxi services and do not actually raise car occupancy rates by utilizing the empty seats in vehicles already on the road. Therefore, we classify them as real time ride sourcing services. They do offer certain benefits, such as on demand availability, lower cost and convenience; however, they do not help in bringing down congestion and travel time. Moreover, they could even add vehicles to those already on the road at peak hours, thereby increasing congestion levels.

In Table 2.7, we compare two of the popular real time ride sourcing services.

Table 2.7: Popular Ride Sourcing Services

| | Uber | Lyft |
|---|---|---|
| Drivers | Dedicated registered drivers | Dedicated registered drivers |
| Shared Ride with Multiple Passengers | UberPool lets people share rides along their way to destination | Lyft Line: share the ride with others going the same way |
| Privacy | Complete user location data collected. Drivers: dedicated, monitored by GPS, so no location privacy. Passengers: reveal complete location and time information to the service provider to get a rideᵃ. Privacy related issues [13, 14]: user data collection and storage; employee access to user information | Complete user location data collected. Drivers: dedicated, monitored by GPS, so no location privacy. Passengers: reveal complete location and time information to the service provider to get a rideᵇ. Privacy related issues: user data collection and storage [23]; employee access to user information |
| Safety | Insurance per trip; background check for drivers; option to anonymize phone numbers. Safety issues: banned in cities such as Berlin and Delhi [15, 16] due to safety issues | Insurance per trip; background check for drivers; vehicle inspections |
| Advance Booking | It is a real time service, so there is no option for advance booking | It is not possible to schedule a Lyft pick up in advance |
| Vehicle Options | Different versions for different standards of cars (UberX, UberBlack) | Lyft and Lyft Plus differ on the number of rides and not the standard |
| Payments | Cashless, through registered online payment information | Cashless, through registered online payment information |
| Rating and Feedback | Passengers and drivers rate each other after every ride | Passengers and drivers rate each other after every ride |
| Geographies | 312+ cities, 57 countries | 59+ US cities |
| Discounts | Referral system, discount codes | Referral program, discount codes |

ᵃ https://newsroom.uber.com/ubers-data-privacy-policy/
ᵇ https://www.lyft.com/privacy

Chapter 3

Privacy Aware Dynamic Ride Sharing

The literature survey (Chapter 2) highlights the privacy concerns of users of current ride sourcing services and shows that the existing research models are not built from the ground up to tackle privacy, safety and trust for the users. In this chapter, we present our next generation dynamic ride sharing model, which provides location privacy to the users, incorporates trust and personalization, and regards both drivers and passengers as independent decision making entities.

• Location Privacy: We present ‘Match Maker’, a negotiation based dynamic ride sharing model which hides the exact location information of system participants from the central service provider. It also limits the information gained by participants, by sharing imprecise location information which is refined with a few selected candidate riders in every negotiation round.

• Trust and Personalization: Trust is the degree to which a system user can rely on the integrity, ability (driving ability) or behavior of another user. It determines whether two candidates would like to consider each other for a possible ride share. We develop a rating mechanism which incorporates positive and negative feedback and also penalizes malicious riders. The candidate matching is personalized according to the user preferences.

• Local Optimization: Unlike in a taxi service, both drivers and passengers are independent clients in a ride sharing system. This implies that drivers can also have their own personal preference (which can change per trip) for the optimal path. A globally optimal solution might not be the preferred solution for a driver. Further, computing a global solution for dynamic ride sharing with time windows is NP-hard [24, 25]. We present an efficient greedy, local, recursive ellipse-based algorithm to compute an optimal driver path.

3.1 Overview

Car occupancy rates (travelers per vehicle) are currently very low in most countries; for example, in Australia the average car occupancy is between 1.15 and 1.25¹. It has been estimated that the average unit cost of congestion (total avoidable congestion cost) for metropolitan Australia is $11.1 billion due to the delays incurred. The low occupancy rates imply that roughly 70% of this $11.1 billion congestion cost (assuming an average of 4 seats per car) is generated by empty seats. Therefore, filling these empty seats will help reduce traffic congestion, thereby reducing emissions and travel costs (fuel and tolls). Ride sharing can be an effective solution to counter the problem of increasing traffic jams at peak hours in cities.

Ride sharing has different variants (Section 2.5.1), in which the drivers can require passenger trips to be either inclusive (both the passenger source and destination are part of the driver trip) or partial (the passenger source, the destination, or both fall outside the driver trip). Further, the routes can either be the original unchanged driver routes or include detours to pick up passengers. In addition, the drivers might choose to share rides with a single passenger or with multiple passengers [17]. We implement dynamic ride sharing where the drivers pick up multiple passengers and take detours as long as the detour is within their travel budget. Our shared rides are inclusive: the passenger source and destination need to be part of the driver’s trip.

As discussed in Section 2.5.4, the existing ride sharing services face significant user privacy and safety concerns. The current services are not built on the premise of privacy; they address privacy issues on top of their existing models. Applications such as flinc and Zimride connect users through social networking (Facebook) and use their network information to filter candidates and address safety concerns. A few existing systems address privacy and safety concerns as a last step before matching drivers to passengers (for example, coseats has a female only version). The significant gap between the abundance of these services and their uptake is due to the privacy and safety concerns of users [104]. To increase the uptake, our model is designed from the ground up to provide privacy and security and to ensure trust for users. We present a negotiation based model for privacy protection in which the drivers and passengers initially disclose imprecise or vague information to the system participants. The information is refined in every negotiation round with fewer selected candidates.

¹ AustRoads (http://algin.net/austroads/site/Index.asp?id=87)

In contrast to on demand transport systems such as taxi sharing services [18] and dial-a-ride services [19], ride sharing has the added complexity of managing user trust, privacy and safety. In a taxicab service, the users place their trust in the central operational body which manages all the taxis and their registered drivers. Such services have additional safety measures; for example, in some cities all taxicabs have cameras installed and are monitored centrally by GPS. A taxi system has accurate information about the taxi (driver) locations and occupancy, which is used to make decisions and divert taxis to pick up additional passengers.

However, in a dynamic ride sharing system both the drivers and the passengers are independent users who can dynamically register for trips. A system wide rating based mechanism is needed to indicate the trustworthiness of each system participant. In addition, privacy is a greater concern, as individuals might not want to disclose their identities and exact location and time information to drivers (or vice versa) unless a ride sharing agreement has been reached. The drivers and passengers can accept or decline a ride share offer depending on their preferences. The algorithm for dynamic ride sharing has to provide multiple driver choices for every passenger trip, as the driver might not agree to the suggested passenger and vice versa. Unlike on demand transport systems, which have a fixed number of drivers running in the system, the number of drivers in a dynamic ride sharing system keeps changing. This makes it difficult to predict driver availability with high probability.

The constraints in ride sharing are more challenging than those of the continuous detour query problem [20], in which the objective is to find the shortest path between two locations with stop-overs in between. In dynamic ride sharing every stop-over can have multiple passengers, and every passenger has a corresponding destination stop-over. Committing to one passenger involves committing to go through another stop-over. In addition, the location and time constraints of the passengers also need to be considered.



In this chapter, we present a privacy preserving dynamic ride sharing system. Drivers and passengers register for a ride share dynamically and are matched based on their preferences and ratings. The driver ellipse describes the path area within which the driver can travel and still reach the destination within the defined time budget. We use the driver ellipse to select the candidate passengers and present an optimal path computation algorithm based on recursive reduction of the ellipse search space.

We describe two models to establish trust between drivers and passengers: the eBay model and the Match Maker model. The models differ in the amount of information revealed to the central trusted body. In the eBay model, the central trusted server has access to complete identity, location and time constraint information for all users. This is the standard model used by most applications to implement ride sharing. The server makes the ride sharing decisions and communicates the most cost-optimal route to the drivers. We propose the Match Maker model, which is stronger in terms of privacy, as only imprecise location and time information is shared with the central trusted body. Its role is only to check identities and form a one time link between the candidate drivers and passengers. The location information is negotiated by starting with a larger negotiation area, which is refined in every iteration for successful negotiations. We follow the idea of obfuscation [26], where the location information is deliberately hidden or degraded by users to protect their privacy. We propose to use hashed location information exchange between the drivers and passengers. The eBay model optimizes global system wide metrics, whereas the Match Maker model computes the greedy solution which is optimal in terms of the negotiating driver and passenger preferences. We present and compare strategies to negotiate the location data. We describe the attack model for the system and evaluate the vulnerabilities of the eBay and Match Maker models.

An optimal solution to the ride sharing problem either minimizes the system wide travel miles [89, 90] or travel time, or maximizes the number of participants [90]. The dynamic ride sharing problem with time windows is NP-hard [24, 25]. In our work, we do not optimize any system wide metrics but present a greedy optimization approach in which individual drivers and passengers optimize their path choices depending on their own preferences for cost, time or travel distance. This offers better choice to the users of the system, which we believe will help in increasing the uptake of ride sharing.

To the best of our knowledge, we are the first to present a system designed to ensure privacy, safety and trust for dynamic ride sharing. In summary, our contributions in this chapter are as follows:

• We propose a recursive driver ellipse based model that reduces the search space based on the driver’s time constraints and computes the optimal driver path.

• To preserve privacy and safety, we propose the Match Maker model and compare it with the eBay model. We present an attacker model and compare the privacy implications.

• To ensure trust, we devise a rating mechanism that incorporates feedback, driver behavior and similarity with passengers.

• To ensure privacy, we present three strategies to negotiate location information between the drivers and passengers.

• We perform extensive experiments on real road networks.

The remainder of this chapter is organized as follows. Section 3.2 describes the fundamental concepts of our system. In Section 3.3, we present the system architecture and describe and compare two models for dynamic ride sharing. In Section 3.4, we present the attacker model for the ride sharing system and compare the models. In Section 3.5, we describe the optimization problem for the drivers and the passengers. We present the experimental results in Section 3.6. The related work is described in Section 3.7. We conclude in Section 3.8.

3.2 Fundamental Concepts

In this section we describe the terminology (Table 3.1), basic concepts and system goals for the ride sharing service.

3.2.1 Terminology

A road network graph is a directed graph $G = (N, E)$, where $N = \{n_1, n_2, \ldots, n_m\}$ is the set of nodes and $E = \{e_1, e_2, \ldots, e_n\}$ is the set of connecting directed edges. The nodes represent the intersections of a real road network, and the edges are the connecting road segments associated with the corresponding pairs of nodes.



Driver. A driver is a person who intends to share a ride while traveling from a start node ($n_s$) to a destination ($n_e$) within a time window ($t_w = [t_s, t_e]$). The number of people that can be picked up is decided by the vehicle capacity ($k$). Every driver has a rating in the system ($r$). Depending on the vehicle and the route, every driver can have a different pre-defined cost per unit distance ($c$) and expected revenue per unit distance ($rev$). The driver preferences ($pr$) denote the preferences for candidate passenger selection and optimal path selection. A driver $d_i$ is represented as an 8-tuple: $d_i = (n_s, n_e, t_w, k, r, c, rev, pr)$, where $n_s, n_e \in N$ and $t_w = [t_s, t_e]$.

Pick up Point (PuP). The Pick up Points (PuPs) are fixed points on the road network where the drivers can pick up or drop off passengers. In our work, the pick up points are road network intersections. The set of possible PuPs is a subset of all nodes in the road network ($\bigcup_{i=1}^{n} PuP_i \subset N$).

Passenger. A passenger is a user who wants a ride to reach a trip destination ($t_d$) from a trip origin ($t_o$). The passenger negotiates with the drivers using PuP start and PuP end nodes ($PuP_s$, $PuP_e$) that are reachable from the trip origin and destination points. The time constraints for the passenger are defined by the earliest time ($t_s$) agreed to reach the start location and the latest time ($t_e$) to be dropped off at the destination. The passenger preferences ($pr$) denote the preferences for driver selection (age, gender, rating, etc.). Every passenger has a rating in the system ($r$). A passenger $p_i$ is represented as $p_i = (PuP_s, PuP_e, t_s, t_e, r, pr)$.
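The driver and passenger tuples defined above map naturally onto simple record types. The following is a minimal sketch, assuming integer node identifiers and times expressed as hours; the class names, the `validate_driver` helper and the sample values are illustrative and not part of the original system.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Driver:
    ns: int                   # start node, an element of N
    ne: int                   # destination node, an element of N
    tw: Tuple[float, float]   # time window [ts, te]
    k: int                    # vehicle capacity
    r: float                  # system rating
    c: float                  # cost per unit distance
    rev: float                # expected revenue per unit distance
    pr: Dict[str, object] = field(default_factory=dict)  # preferences

    @property
    def time_budget(self) -> float:
        """Time budget tb = te - ts (see Table 3.1)."""
        ts, te = self.tw
        return te - ts


@dataclass
class Passenger:
    pup_s: int                # starting PuP node
    pup_e: int                # destination PuP node
    ts: float                 # earliest starting time
    te: float                 # latest reach time
    r: float                  # system rating
    pr: Dict[str, object] = field(default_factory=dict)  # preferences


def validate_driver(d: Driver, nodes: Set[int]) -> bool:
    """Check the side conditions ns, ne in N and a non-empty time window."""
    ts, te = d.tw
    return d.ns in nodes and d.ne in nodes and ts < te and d.k >= 1


# Hypothetical example values.
N = {1, 2, 3, 4, 5, 6, 7}
d1 = Driver(ns=1, ne=7, tw=(8.0, 9.5), k=3, r=4.6, c=0.4, rev=0.9)
p1 = Passenger(pup_s=2, pup_e=6, ts=8.25, te=9.25, r=4.8,
               pr={"min_driver_rating": 4.0})
print(validate_driver(d1, N), d1.time_budget)
```

Keeping the exact trip origin and destination ($t_o$, $t_d$) out of the `Passenger` record mirrors the definition above, in which the passenger negotiates using only the PuP nodes.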

Region. A region is a finite (bounded by a finite diameter) subset of Euclidean space.

Path. A path $P$ is a network path, that is, a sequence of nodes $n_{i_0}, n_{i_1}, \ldots, n_{i_k}$ such that there is an edge connecting $n_{i_j}$ and $n_{i_{j+1}}$ for $j = 0, 1, \ldots, k-1$, and $n_{i_j}, n_{i_{j+1}} \in N$.

3.2.2 Concepts

Reachability of a PuP ($R_{PuP_i}$): Since the pick up points are fixed, we can compute the network Voronoi diagram [105] for the road network. The network Voronoi diagram partitions the road network graph into regions by using the shortest path road distance instead of the Euclidean distance ($d(n_i, n_j)$ is the network road distance between the two nodes). The network Voronoi cell ($NVC_{PuP_i}$) for a pick up point (PuP) is the region (collection of road segments and corresponding nodes) such that every point in this region is closer to this PuP than to any other. The system is flexible enough to include any number of pick up points. One strength of the system is that it can preserve privacy by having fixed pick up points, since a pick up point only discloses an area and does not disclose individual origins.

Table 3.1: Terminology

| Notation | Definition |
|---|---|
| $d_i.n_s$ | Starting node for driver $d_i$ |
| $d_i.n_e$ | Destination node for driver $d_i$ |
| $d_i.t_s$ | Start time for driver $d_i$ |
| $d_i.t_e$ | Final destination reach time for driver $d_i$ |
| $d_i.k$ | Driver maximum vehicle capacity |
| $d_i.r$ | System rating for driver $d_i$ |
| $p_i.r$ | System rating for passenger $p_i$ |
| $d_i.c$ | Cost per unit distance incurred by driver $d_i$ |
| $d_i.rev$ | Revenue per unit distance charged by driver $d_i$ |
| $d_i.pr$ | Driver $d_i$ preference parameters for passenger selection |
| $p_i.pr$ | Passenger $p_i$ preference parameters for driver selection |
| $PuP$ | Pick up point for passenger pick ups/drop offs |
| $p_i.t_o$ | Starting point for passenger $p_i$ |
| $p_i.t_d$ | Destination point for passenger $p_i$ |
| $p_i.PuP_s$ | Starting PuP node for passenger $p_i$ |
| $p_i.PuP_e$ | Destination PuP node for passenger $p_i$ |
| $p_i.t_s$ | Earliest starting time for passenger $p_i$ |
| $p_i.t_e$ | Latest reach time for passenger $p_i$ |
| $t_b$ | Time budget ($t_e - t_s$) |

$$NVC_{PuP_i} = \{\, n \in N \mid d(n, PuP_i) \le d(n, PuP_j)\ \ \forall j \ne i \,\} \qquad (3.1)$$

Every node $n_i$ has a set of corresponding locations, which is conceptually a region defined as its catchment area $C(n_i) = \{l_{i1}, l_{i2}, \ldots, l_{in}\}$. The reachability of a PuP is the sum of the number of locations in its corresponding network Voronoi cell. If a passenger’s start or destination PuP is known, then the reachability of the PuP is the number of locations the passenger can be assumed to be leaving from (or going to).

$$R_{PuP_i} = \{\, C(n) \mid \forall n \in NVC_{PuP_i} \,\} \qquad (3.2)$$
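Equations 3.1 and 3.2 can be illustrated with a small sketch that assigns every node to its nearest PuP by network distance and then counts the catchment locations in each cell. It rests on several assumptions that are not from the original text: the graph is treated as undirected, a multi-source Dijkstra search is used to build the cells, a node that is equidistant from two PuPs is assigned to only one of them, and the node names, weights and catchment lists are made up.

```python
import heapq
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def network_voronoi(graph: Graph, pups: List[Hashable]) -> Dict[Hashable, Hashable]:
    """Map every reachable node to its nearest PuP by network distance."""
    dist = {p: 0.0 for p in pups}
    owner = {p: p for p in pups}
    # The integer counter breaks ties so that node labels are never compared.
    heap = [(0.0, i, p) for i, p in enumerate(pups)]
    heapq.heapify(heap)
    counter = len(pups)
    while heap:
        d, _, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                owner[nbr] = owner[node]
                heapq.heappush(heap, (nd, counter, nbr))
                counter += 1
    return owner


def reachability(owner: Dict[Hashable, Hashable],
                 catchment: Dict[Hashable, List[str]]) -> Dict[Hashable, int]:
    """Count the catchment locations that fall in each PuP's Voronoi cell."""
    counts: Dict[Hashable, int] = {}
    for node, pup in owner.items():
        counts[pup] = counts.get(pup, 0) + len(catchment.get(node, []))
    return counts


def undirected(edges: List[Tuple[str, str, float]]) -> Graph:
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, []).append((v, w))
        g.setdefault(v, []).append((u, w))
    return g


# Hypothetical road network: a chain A - B - C - D - E with PuPs at A and E.
roads = undirected([("A", "B", 1), ("B", "C", 1), ("C", "D", 2), ("D", "E", 1)])
cells = network_voronoi(roads, ["A", "E"])
locations = {"A": ["a1"], "B": ["b1", "b2"], "C": ["c1"], "D": ["d1", "d2", "d3"]}
reach = reachability(cells, locations)
print(cells, reach)
```

In this example a passenger known to start at PuP A could be leaving from any of four locations, which is the sense in which a PuP discloses an area rather than an individual origin.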

Sensitivity: Sensitive data for an individual is data that needs to be protected, such as identity, address and medical condition. The sensitivity of a location ($s_{l_j}$) is the extent to which an individual typically would be concerned about its disclosure. Generally, a residential location is considered to be more sensitive than a commercial complex, which in turn is more sensitive than a parkland. If a sensitivity value (higher for more sensitive) is assigned to every location in the network based on the location type, then the sensitivity of a region can be computed as a function of the sensitivities of the locations in the region. For a pick up point, it is a function of the sensitivities of the locations in its network Voronoi cell. The function can be the mean, maximum, median, minimum, etc., and each one captures a different aspect. In this article, for the sake of simplicity, we choose the function to be the average.

However, the actual assignment of sensitivity values to locations only affects the negotiation process and not the actual trips. When a few highly sensitive points need to be protected, the maximum is a suitable choice. When the points have more or less similar sensitivity, the average is the suitable choice. We capture the idea of different values representing sensitivity by using an assignment function as follows:

$$S_{PuP_i} = \mathrm{mean}\left(S_{NVC_{PuP_i}}\right); \qquad S_A = \{\, S_i \mid i \in A \,\} \qquad (3.3)$$

Privacy: Disclosure of being present in a highly sensitive region would imply thatthe individual’s privacy is at a greater risk than being present in a less sensitive region.

Figure 3.1: Driver Ellipse: $F_1$ (source) and $F_2$ (destination) are the two foci of the ellipse. $a$ and $b$ are the respective semi-major and semi-minor axes. For any point $P$ in the ellipse $PF_1 + PF_2 < 2a$.

One model for privacy is that it is inversely related to sensitivity. Other models are also conceivable (for example $1/S^2$), but for the sake of simplicity we choose the following:

$$P_{PuP_i} = \frac{1}{S_{PuP_i}} \tag{3.4}$$
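The three measures defined so far (reachability, sensitivity and privacy of a pick up point) can be illustrated with a minimal Python sketch. The sensitivity scores per location type and the dictionary layout of a network Voronoi cell are assumptions made only for this illustration; the text fixes neither.

```python
from statistics import mean

# Hypothetical sensitivity scores per location type (higher = more sensitive).
SENSITIVITY = {"residential": 3.0, "commercial": 2.0, "parkland": 1.0}


def reachability(cell_catchments):
    """Total number of locations in the catchment areas of all nodes of the cell."""
    return sum(len(locations) for locations in cell_catchments.values())


def cell_sensitivity(cell_catchments):
    """Mean sensitivity over every location in the cell (Equation 3.3)."""
    scores = [
        SENSITIVITY[kind]
        for locations in cell_catchments.values()
        for kind in locations
    ]
    return mean(scores)


def cell_privacy(cell_catchments):
    """Privacy as the inverse of sensitivity (Equation 3.4)."""
    return 1.0 / cell_sensitivity(cell_catchments)


# A network Voronoi cell with two nodes; each node lists the types of the
# locations in its catchment area (illustrative data only).
cell = {
    "n1": ["residential", "residential", "commercial"],
    "n2": ["parkland"],
}
print(reachability(cell), cell_sensitivity(cell), cell_privacy(cell))
```

Replacing `mean` with `max` in `cell_sensitivity` would give the maximum based variant mentioned above for cells that contain a few highly sensitive locations.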

Driver Path Area Ellipse: An important concept is to determine how far a driver can travel. One way is to compute the detour ellipse based on the extra time the driver decides to allow. We now describe the driver ellipse computation.

The time budget of the driver determines the area the driver can cover while still reaching the destination in time. The driver ellipse (Figure 3.1) is a collection of all the possible trajectories that the driver can take from the start to reach the destination location within the time budget. The start and destination locations are the two foci of the ellipse, and the spare time left after taking the shortest path between the source and destination is used to determine the axes of the ellipse (assuming a constant travel speed). The driver’s space time prism is the set of all points $(x_i, y_i, t_i)$ where $(x_i, y_i)$ is the location at time $t_i$ where the driver can possibly be during a time budget while traveling between the source and destination. Every feasible route is inside this space, but this does not imply that every possible route planned in this space is feasible.

Figure 3.2: Path Cost: The driver path from src to dest is used to compute the cost ($c^{d_i}_P$).

The path area ellipse is the space projection of the driver’s space time prism [106]. The driver ellipse (Figure 3.1) can be represented by $2f$, the focal length or the distance between the driver’s source and destination; $a$, $b$; and the time budget $t_b$, that is $(f, a, b, t_b)$. $a$ and $b$ are the respective semi-major and semi-minor axes. The time budget ($t_b$) is used to compute the extra distance the driver can travel. The extra time $t^d_{max}$ that the driver can travel can be computed by using the shortest path time $t_{sp}$ for the distance $(n_s, n_e)$: $t^d_{max} = t_b - t_{sp}$. Assuming a constant speed $s$, we can compute $(a - f) = (t^d_{max} \times s)/2$. The other axis of the ellipse is $b = \sqrt{a^2 - f^2}$.
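The ellipse parameters can be computed directly from these relations. The following Python sketch assumes planar coordinates and straight-line (Euclidean) travel between source and destination, which is a simplification of the road network; the function names are hypothetical.

```python
import math


def driver_ellipse(source, dest, time_budget, speed):
    """Return (f, a, b) for the driver ellipse, or None if the budget is too small.

    Assumes straight-line travel at constant speed, so the shortest path
    time t_sp is simply the focal distance 2f divided by the speed.
    """
    focal_distance = math.dist(source, dest)   # 2f
    f = focal_distance / 2.0
    t_sp = focal_distance / speed              # shortest path time
    t_extra = time_budget - t_sp               # t^d_max = t_b - t_sp
    if t_extra < 0:
        return None                            # destination not reachable in time
    a = f + (t_extra * speed) / 2.0            # from (a - f) = (t^d_max * s) / 2
    b = math.sqrt(a * a - f * f)
    return f, a, b


def inside_ellipse(point, source, dest, a):
    """Point test PF1 + PF2 <= 2a (the boundary is included here by assumption)."""
    return math.dist(point, source) + math.dist(point, dest) <= 2.0 * a


f, a, b = driver_ellipse((0.0, 0.0), (6.0, 0.0), time_budget=10.0, speed=1.0)
print(f, a, b)                                           # 3.0 5.0 4.0
print(inside_ellipse((3.0, 3.9), (0.0, 0.0), (6.0, 0.0), a))  # True
print(inside_ellipse((3.0, 4.5), (0.0, 0.0), (6.0, 0.0), a))  # False
```

With a focal distance of 6, a speed of 1 and a time budget of 10, the spare time is 4, giving $a = 5$ and $b = 4$.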

Path cost: Path cost ($c_P$) for a driver ($d_i$) is computed as the difference between the cost (fuel, toll) for the driver to travel the path and the revenue generated by picking up passengers up to the driver’s vehicle capacity ($k$):

$$c^{d_i}_P = c_i \times dist(P) - \Big( rev_i \times \sum_{j=1}^{k} dist(p_{ij}) \Big)$$

where $dist(p_{ij})$ is the distance traveled with the $j$-th passenger. For example (Figure 3.2), if the path length is 7 units and the cost per unit distance is 1 then $c_i \times dist(P) = 7$. The two passengers the driver picks up travel for 3 units each. If the revenue per unit distance is 2 then the overall path cost is $7 - 2 \times (3 + 3) = -5$. A negative path cost indicates a positive driver revenue.
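As a quick check of the path cost formula, the sketch below reproduces the arithmetic of the example (path length 7, cost 1 per unit, revenue 2 per unit, two passengers carried for 3 units each as in the computed expression). The function name and the capacity value are hypothetical.

```python
def path_cost(cost_per_unit, path_length, revenue_per_unit,
              passenger_distances, capacity):
    """Path cost = travel cost minus revenue from the carried passengers."""
    if len(passenger_distances) > capacity:
        raise ValueError("more passengers than the vehicle capacity k")
    travel_cost = cost_per_unit * path_length
    revenue = revenue_per_unit * sum(passenger_distances)
    return travel_cost - revenue


# Values taken from the worked example; capacity 4 is an assumed value.
print(path_cost(1, 7, 2, [3, 3], capacity=4))  # -5
```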

3.2.3 System Goals

We aim to achieve a high quality ride sharing service that has a strong privacy protection mechanism for the users. We now list the system goals which need to be addressed for an efficient privacy preserving dynamic ride sharing system.

Privacy: The privacy requirements of a user of a ride sharing service can be to protect one or more of the following: identity, location (start, destination, route), time information, and matching (who travels with whom). In general, two privacy models are feasible for ride sharing:

• Drivers post their trip details in a private manner and the passengers choose.

• Passengers post their needs for a trip and the drivers match them to their route to make a pick up decision.

Privacy in a system can be driver only, passenger only or both. An example of passenger only privacy is taxi ride sharing, where the drivers do not have any privacy concerns. If we have a set of fully public places for pick up and drop off, then the privacy concerns of passengers in such a system might be considered negligible. However, in our system both drivers and passengers are concerned about their privacy. This requirement asks for a solution that can handle privacy while facilitating the information exchange required for ride sharing. Usually, one party volunteers information first to start a negotiation. To overcome this limitation, our system uses a trusted third party (TTP), which collects trip intents from passengers and trip offers from drivers, to start a negotiation. To further enhance the privacy of the users, no clear text information is exchanged with the TTP; only hashed requirements are sent.

Trust, Personalization & Security: Trust is the degree to which a system user can rely on the integrity, ability (driving ability) or behavior of another user. It determines whether the two would like to consider each other for a possible ride share. A passenger can have a number of different preferences for selecting the driver, such as driving ability, gender, feedback reports et cetera, according to which the ride sharing matching is personalized. After every successful trip the passenger and driver can rate each other, which can be used to update an individual’s feedback rating. The system goal is to ensure proper authentication and ratings for all users. In case of malicious drivers or passengers the system should be able to identify and penalize them so that they cannot misuse the system in the future. We propose a rating mechanism based on user preferences in Section 3.3.4.

Optimal matching: The optimal matching problem can be global or local. In the global optimization problem, the objective is to reduce the overall vehicle km or satisfy the ride sharing request for the maximum number of users. A system could prioritize either of the objectives. However, this might not be the optimal solution for every individual driver and passenger independently. In the local solution every driver and passenger makes the greedy optimal choice for themselves, which might not be the optimal achievable solution system wide. The driver passenger matching needs to be optimal for the users. For drivers this implies the maximum revenue for the chosen path. For a passenger this could mean the smallest cost or the fastest trip. Both the passengers and drivers can have different preferences for an optimal solution (see Section 3.5).

3.3 System Architecture

We introduce two models based on the degree of information shared with the Trusted Third Party (TTP) server. For implementing ride sharing, every system handles the following information: driver trips ($d_i = (n_s, n_e, t_b, k, r, c, rev)$), passenger trips ($p_i = (PuP_s, PuP_e, t_s, t_e)$), user identities, ratings, matched trips and cash exchange mechanism details. The strength of a privacy aware system depends on how this information is handled.

In the first model the TTP is a fully trusted server which controls the entire ride sharing process. We call it the “eBay model” because, like eBay, it has registered users who can be drivers or passengers. It is responsible for the cash exchange, maintains ratings for the system users, and does not disclose the exact identity to either party unless a ride sharing agreement is met. This system aims for a global optimal solution for the registered trips in the system. The other extreme to the fully centralized model would be the fully decentralized peer to peer model where drivers and passengers communicate independently (through Bluetooth etc. [96]) and establish ride sharing. The responsibility for cash exchange and ratings is with the individuals. However, in a decentralized system identifying malicious users and maintaining privacy and user safety is a major challenge.

The second model we propose is the “Match Maker” model, which is a hybrid between the two scenarios discussed above. We call it the Match Maker model because in this model the TTP is only responsible for matching the relevant drivers and passengers. It does not make the optimal trip decisions, but acts as a Match Maker based on the hash of PuPs in the regions provided by the drivers and passengers. The TTP knows the identities of the users and is responsible for ratings updates and cash exchange. In contrast to the eBay model, where the ride sharing decisions are made by the TTP, in the Match Maker model the decisions are made by the individual negotiating drivers and passengers. The model generates a greedy solution for the ride sharing problem and the overall system wide travel distance (time) might be higher than that generated by the eBay model. We now discuss the two models in detail.

| | eBay model | Match Maker model |
|---|---|---|
| User Identity | TTP | TTP |
| Location | TTP: exact start and destination information | TTP: location hash |
| Time Window | TTP: exact travel constraints for all users | Information shared only between negotiating passengers and drivers |
| Matching | TTP: maintains the final pairings | TTP: maintains the final pairings |
| Ratings | TTP: maintains and updates | TTP: maintains and updates |
| Optimal Path Computation | TTP (global optimization) | Individual driver |
| Communication | TTP: communicates the optimal route and selected passenger driver pairings | TTP: initial match making based on hashes |
| Cash Exchange | TTP | TTP |

Table 3.2: Ride sharing Models

3.3.1 Model 1: eBay Model

The eBay model is the “Know All and Everyone” model in which the TTP server controls the entire information flow between the drivers and passengers and is responsible for providing a safe and optimal ride sharing service. We describe this model to emphasize the need for a model focused on privacy, trust and security. The TTP’s information access and responsibilities are described in Table 3.2. Since the global optimization objective for ride sharing, to minimize the cost or maximize the number of shared trips, is NP-hard [24, 25], we prune the search space using the concept of the driver ellipse as a heuristic. This heuristic reduces the computations while selecting the best driver route.

3.3.1.1 Procedure

Initialization: All the drivers and passengers register with the TTP with one time verified identity credentials.

For dynamic ride sharing, the drivers register their trip information with the TTP $\langle d_i, d_i.n_s, d_i.n_e, d_i.t_b, d_i.pr \rangle$. The passengers register their trip $\langle p_j, p_j.PuP_s, p_j.PuP_e, p_j.t_s, p_j.t_e, p_j.pr \rangle$.

Figure 3.3: Match Maker Model

The next step is to compute the driver ellipse (Section 3.2.2). A candidate passenger’s start and destination pick up points are inside the driver ellipse. The candidate passenger list is pruned according to the rating constraints. The optimal route is finally calculated and communicated to the driver, and the identity information is exchanged between the negotiated pairs. The cash exchange and the rating updates are handled by the TTP as well.

3.3.2 Model 2: Match Maker Model

The Match Maker model is the “Know Everyone” model; it reduces the implicit trust placed in the TTP (Figure 3.3). The TTP no longer knows the exact locations and time constraints for the users. The TTP’s data access and responsibilities are described in Table 3.2.

3.3.2.1 Procedure

Initialization: To ensure user privacy each user is registered with the TTP using one time verified identity credentials.

The drivers initially register their trip with the TTP, asking for passenger information for their driver ellipse. The drivers and passengers, once authenticated (similar to Kerberos [107] authentication), register their trip with imprecise location information and excluding their time information. The TTP uses this information to match the drivers and passengers and issues an encrypted unique pairing identifier (UID) per matched pair using the TTP’s private key. This unique pairing ID is encrypted with the respective public keys of the matched driver and passenger. Thereafter, the matched drivers and passengers identify the pair information using the UID, which is unknown to any other party. Further, the rating updates are sent as encrypted messages to the TTP including the matched drivers’ and passengers’ UIDs, respectively. This resembles a secure multiparty computation [108] where the parties (matched drivers and passengers) with private inputs (such as rating) wish to compute some joint function of their inputs (such as updates of pair wise rating). The security and correctness of the system is preserved by a trusted external party (the TTP, who issues the UID).

1. TTP generates a UID for the passenger ($P_i$) and the driver ($D_j$) and encrypts it with its private key ($TTP^{pr}$); $UID = TTP^{pr}(P_i, D_j)$.

2. The TTP now sends this UID to the passenger and driver encrypted by their respective public keys ($P^{pub}_i$, $D^{pub}_j$).

3. The driver and passenger decrypt the UID with their private keys ($D^{pr}_j$, $P^{pr}_i$) respectively.

4. For any further updates such as rating the passenger and driver encrypt their messages ($msg$) along with the UID using the TTP’s public key ($TTP^{pub}$); $TTP^{pub}(msg, UID)$.

The PuPs in the ellipse are hashed before being disclosed to the TTP $\langle d_i, hash(R_{d_i}), d_i.pr \rangle$. The passengers register their trip with the hashes of the pick up points in their start and destination regions $\langle p_j, hash(p_j.R_s), hash(p_j.R_e), p_j.pr \rangle$. No further time or location information is disclosed to the TTP. The driver gets hashed PuP information from the passenger, but since the PuPs are inside the driver’s ellipse the location of the hashed PuPs is known.

Similar to our protection of user identities we also need the protection of location identities so that a driver cannot identify a passenger traveling to the same locations based on the passenger’s region hash between two consecutive days. To ensure this, the model includes a hash generator which selects new keys for the pick up points (which are known in advance) every day. After the initial larger hashed regions are disclosed to the TTP and a connection is formed between the drivers and passengers, a one time key (using public key encryption) is exchanged between them. All further refined location data exchange is encrypted with this key, preventing the TTP from learning any more location information about the users. The details of this mechanism are not in the scope of this work.

Figure 3.4: Privacy Based Negotiation (region based selection). The passenger discloses the bigger region (rectangle) in the first round; a passenger who is still in the candidate list refines the region in every iteration.

Based on this information and the ratings for the users the server shares a list of candidate passengers with the drivers and vice versa. The server thus forms a link of communication between the potential matches using a one time identifier. No further location and time information is revealed to the server as all information negotiated between the drivers and passengers is hashed. However, the TTP still knows who is communicating with whom and the final pairs who decide to travel together. At the end of the trip the cash exchange and the rating updates are handled by the TTP.

3.3.2.2 Privacy based Negotiation

We propose an imprecision based negotiation model to ensure that only a few possible candidate drivers learn about the exact passenger locations. The amount of information disclosed to every possible match increases after every successful negotiation.

After the candidate passengers are known to the drivers, the drivers and passengers negotiate with hidden or masked information (by providing a set of points rather than the exact start and destination points). This enables privacy based negotiation. In this first step the potential candidates communicate with each other to find the riders they could possibly travel with according to their location constraints. In the second stage the drivers obtain the exact location and time information of the short listed candidates. This negotiation ensures that the drivers learn the exact location and time details for only those passengers who might be able to share a ride because their source and destination points are inside the driver ellipse.

Initially the driver uses larger regions to represent the start and destination locations. For example, instead of providing home and office locations, the driver provides the suburb regions for the two locations. The larger ellipse is generated by selecting two nodes (from the larger suburb) furthest from the ellipse center as the new focal points. After every iteration the regions are refined to generate smaller ellipses. The passengers also provide larger regions for their start and destination locations to the TTP. Once a negotiation starts, the passengers refine their regions for the matching drivers in every round. Although passengers can increase their privacy by being less precise about their pick up points and providing larger regions to start the negotiation, their chance of finding a shared ride would be reduced. Similarly, a larger ellipse for negotiation would cause the TTP to propose more infeasible candidates for the driver. However, if people are less precise the computational cost for the system remains the same, because the maximum number of potential matches remains the same (as they are bounded) and the maximum number of communication rounds also remains the same. This is because the maximum number of rounds the drivers and passengers can negotiate in depends on their travel budget and not on the size of their initial negotiation regions.

The hashes of all the pick up points in the larger ellipse are provided to the TTP server in the first round. The TTP then matches these hashes to the hashes provided by the passengers and forms a communication link between the drivers and passengers whose hash sets intersect. After every round of successful negotiation the driver and passengers refine their regions and continue the negotiation. At the end of the negotiation every driver (passenger) has a list of candidate passengers (drivers) which is then used to find the optimal path.
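The first round of matching on hashed pick up points can be sketched as follows. This is only an illustration under stated assumptions: a keyed hash (HMAC-SHA256) stands in for the daily re-keyed hash generator, the day key is assumed to be shared by drivers and passengers but not by the TTP, and a passenger is treated as a candidate only when both the start and the destination region intersect the driver's set. None of these choices is fixed by the text, and the function names are hypothetical.

```python
import hashlib
import hmac


def hash_pups(pup_ids, day_key):
    """Return the set of keyed hashes for the given pick up point identifiers."""
    return {
        hmac.new(day_key, pup_id.encode("utf-8"), hashlib.sha256).hexdigest()
        for pup_id in pup_ids
    }


def match_candidates(driver_hashes, passenger_regions):
    """TTP side: passengers whose start and destination hash sets both
    intersect the driver's hash set. Only hashes are visible here."""
    return sorted(
        passenger_id
        for passenger_id, (start_hashes, end_hashes) in passenger_regions.items()
        if driver_hashes & start_hashes and driver_hashes & end_hashes
    )


day_key = b"demo-key-for-one-day"  # hypothetical key from the hash generator
driver_hashes = hash_pups({"PuP1", "PuP2", "PuP3", "PuP4"}, day_key)
passenger_regions = {
    "p1": (hash_pups({"PuP1", "PuP9"}, day_key), hash_pups({"PuP4"}, day_key)),
    "p2": (hash_pups({"PuP8"}, day_key), hash_pups({"PuP4"}, day_key)),
}
print(match_candidates(driver_hashes, passenger_regions))  # ['p1']
```

Because the key changes every day, the hash of the same pick up point differs between days, which is the property the location identity protection relies on.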

Figure 3.4 illustrates the two driver ellipses and passenger regions. In the second iteration the passengers selected in the first iteration refine their regions and continue to negotiate with the driver.

The driver ellipse calculation discussed in Section 3.2.2 involves recalculation of the major and minor axes, and requires linear computation time to determine if a point is inside the driver ellipse (Figure 3.1). Every passenger recomputes the bounding box by halving it (along any one axis) such that the real location is still included in the reduced bounding box. As the number of these ellipse reductions is constant for every driver, the complexity of this step is linear.
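The passenger side refinement can be sketched as a bounding box that is halved in each round while still containing the real location. The text allows halving along any one axis; choosing the longer side, as done below, is an assumption made for the illustration, as are the function name and the box representation.

```python
def halve_box(box, location):
    """Halve a bounding box (min_x, min_y, max_x, max_y) along its longer
    side and keep the half that still contains the real location."""
    min_x, min_y, max_x, max_y = box
    x, y = location
    if not (min_x <= x <= max_x and min_y <= y <= max_y):
        raise ValueError("location must lie inside the bounding box")
    if (max_x - min_x) >= (max_y - min_y):
        mid = (min_x + max_x) / 2.0
        return (min_x, min_y, mid, max_y) if x <= mid else (mid, min_y, max_x, max_y)
    mid = (min_y + max_y) / 2.0
    return (min_x, min_y, max_x, mid) if y <= mid else (min_x, mid, max_x, max_y)


box = (0.0, 0.0, 8.0, 4.0)
real_location = (5.0, 1.0)
for _ in range(3):          # three negotiation rounds
    box = halve_box(box, real_location)
    print(box)
```

Each round halves the area of the disclosed region, so the information revealed to a driver grows only while the passenger remains a candidate.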

We now describe three negotiation strategies for the passengers.

• S1: Source/Destination first. In this strategy either the source or the destination is negotiated first. Only if the driver agrees to the pick up is the other PuP negotiated. This strategy can be beneficial if someone is starting from (or going to) a low privacy risk location such as a shopping complex and traveling to (or from) a sensitive location such as home.

• S2: Tandem approach. In this strategy the source and destination regions are negotiated in tandem with the driver.

• S3: Best first. In this strategy the sensitivity of both regions is computed and, in each iteration, the region with the smaller sensitivity is negotiated with the driver.

3.3.3 Comparison of Models

The eBay model provides the highest privacy as long as the TTP is not compromised. This requires a high level of trust as the TTP has detailed data about the daily trips of individuals. Access to the server data reveals all information about the locations and behavior of the users. The Match Maker model, however, limits the TTP server knowledge by not disclosing the exact time and location information. Therefore, the Match Maker model introduces another layer of privacy protection for the users. A compromised TTP can thus not be used to violate the location privacy of its users.

In the eBay model the TTP does all the computations, so it needs to be a computationally strong server. In the Match Maker model, however, the computations are distributed to every driver client, thereby reducing the load on the TTP server. The time and space complexity is discussed in Sections 3.5.1.1, 3.5.2.1 and 3.5.3, which highlight the reduced computations of the Match Maker model. The latter model is more efficient as computations are offloaded and bounded. Although the decentralized Match Maker model is more efficient, its key advantage is the gain of trust.

Figure 3.5: User Connectivity Graph

3.3.4 Rating

All drivers and passengers have a rating which is a measure of their trustworthiness. Initially the system assigns some basic trust to every user based on the documents submitted. For example, a person with no demerit points on the driver’s license and with a clear police verification gets the highest rating. The ratings are then updated with participation in the system. Only the people who share a ride can provide feedback, and the assumption is that the feedback is genuine.

At the end of every trip the driver and the passenger rate each other. This negative or positive feedback is then used to update their rating. Recent ratings have higher weights than older ones. This is to reward or penalize recent user behavior. If a good driver suddenly starts getting negative feedback then this will show up in the updated ratings. Any non-negative monotonically decreasing function could be used; we have adopted the exponential time decay function used previously for computing feedback trust in [109]. The system stores the number of positive ($x_i$) and negative ($y_i$) ratings for every driver ($D_i$) per day ($t_i$): $R_{D_i} = (x_i, y_i, t_i)$. The decay rate ($\tau$) is the rate at which the older ratings decay with time. A decay rate of $\tau = \ln(2)/30$ will reduce the weight of a rating by half every 30 days. The decay rate can be different for positive and negative feedback. To give negative feedback a larger impact, we maintain $\tau_2 < \tau_1$.

Rating for the last n days is computed as:

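$$X = \sum_{i=1}^{n} \left( x_i \times e^{-\tau_1 (now() - t_i)} \right) \qquad Y = \sum_{i=1}^{n} \left( y_i \times e^{-\tau_2 (now() - t_i)} \right) \tag{3.5}$$

$$R_1 = X / (X + Y) \tag{3.6}$$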
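Equations 3.5 and 3.6 translate into a few lines of Python. In this sketch the feedback is given as (positive count, negative count, day) tuples with days counted as plain numbers; that representation, the function name and the handling of the no-feedback case are assumptions for illustration.

```python
import math


def decayed_rating(daily_counts, now, tau_pos, tau_neg):
    """R1 = X / (X + Y) with exponentially decayed feedback counts.

    daily_counts: iterable of (positive, negative, day) tuples.
    Returns None when there is no feedback at all (an assumed convention).
    """
    x = sum(pos * math.exp(-tau_pos * (now - day)) for pos, _, day in daily_counts)
    y = sum(neg * math.exp(-tau_neg * (now - day)) for _, neg, day in daily_counts)
    if x + y == 0:
        return None
    return x / (x + y)


tau_pos = math.log(2) / 30   # positive feedback halves every 30 days
tau_neg = math.log(2) / 60   # smaller rate, so negative feedback lasts longer
history = [(4, 0, 0), (0, 1, 30)]   # 4 positives on day 0, 1 negative on day 30
print(decayed_rating(history, now=30, tau_pos=tau_pos, tau_neg=tau_neg))
```

On day 30 the four old positive ratings count as $4 \times 0.5 = 2$ and the fresh negative rating counts as 1, so $R_1 = 2/3$.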

Another factor which impacts the driver’s rating is the number of negotiations which failed ($f^{d_i}_p$). If a malicious driver is negotiating with a large number of passengers ($n^{d_i}_p$) to find out their location information but not picking them up, then the driver’s rating will go down.

$$R_2 = f^{d_i}_p / n^{d_i}_p \tag{3.7}$$

A connectivity graph can be constructed for all users using the driver passenger agreement pair information. The TTP gets a pair value returned with the rating for every trip in the form of the one time IDs of the driver and passenger ($[ID_i, ID_j]$). For every driver the TTP stores the set of passengers along with the number of times they have traveled together (Figure 3.5). Using the rating and pairing information the TTP can update its database for every driver: $d_i : \langle (p_j, c_{ij}), (p_k, c_{ik}), \ldots, (p_n, c_{in}) \rangle$. To remove bias from the feedback system the feedback from a person with more frequent trips can be weighted higher. A driver who travels frequently with the same set of people is taken to be trustworthy. Further, a passenger traveling with the same person frequently would achieve higher privacy as compared to disclosing the location details to a large number of drivers by traveling with a new driver every time. Using the connectivity graph we compute a similarity index for every driver passenger pair as the ratio of their connectivity count to the sum of all connectivity counts for the passenger (or, respectively, for the driver).

$$SI_{p_j}(d_i, p_j) = \frac{c_{ij}}{\sum_{drivers} c_{lj}} \qquad SI_{d_i}(d_i, p_j) = \frac{c_{ij}}{\sum_{passengers} c_{il}} \tag{3.8}$$

The rating for a driver $R_{d_i}$ is computed by integrating Equations 3.6, 3.7 and 3.8:

$$R_{d_i} = \alpha_1 (R_1) + \alpha_2 (R_2) + \alpha_3 (SI_{d_i}) : \sum_{i=1}^{3} \alpha_i = 1 \tag{3.9}$$

The rating for a passenger when computed for driver $d_i$ is:

$$R_{p_j} = \alpha (R_1) + (1 - \alpha)(SI_{p_j}) \tag{3.10}$$
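The similarity index and the combined ratings can be sketched as below. The trip counts are stored in a dictionary keyed by (driver, passenger), which is an assumed representation of the connectivity graph; the weights and input values are illustrative only, and the terms are combined exactly as written in Equations 3.9 and 3.10.

```python
def similarity_for_driver(counts, driver, passenger):
    """SI_{d_i}: c_ij divided by the sum of c_il over all passengers of the driver."""
    total = sum(c for (d, _), c in counts.items() if d == driver)
    return counts.get((driver, passenger), 0) / total if total else 0.0


def similarity_for_passenger(counts, driver, passenger):
    """SI_{p_j}: c_ij divided by the sum of c_lj over all drivers of the passenger."""
    total = sum(c for (_, p), c in counts.items() if p == passenger)
    return counts.get((driver, passenger), 0) / total if total else 0.0


def driver_rating(r1, r2, si_driver, alphas):
    """Weighted combination of Equation 3.9; the weights must sum to 1."""
    a1, a2, a3 = alphas
    if abs(a1 + a2 + a3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return a1 * r1 + a2 * r2 + a3 * si_driver


def passenger_rating(r1, si_passenger, alpha):
    """Weighted combination of Equation 3.10."""
    return alpha * r1 + (1.0 - alpha) * si_passenger


# Hypothetical connectivity counts: number of shared trips per pair.
counts = {("d1", "p1"): 3, ("d1", "p2"): 1, ("d2", "p1"): 2}
si_d = similarity_for_driver(counts, "d1", "p1")      # 3 / (3 + 1)
si_p = similarity_for_passenger(counts, "d1", "p1")   # 3 / (3 + 2)
print(si_d, si_p)
print(driver_rating(0.8, 0.1, si_d, (0.5, 0.2, 0.3)))
print(passenger_rating(0.9, si_p, 0.5))
```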

3.4 Attacker Model

We describe the attack types and compare the eBay and Match Maker models. There are three parties involved: drivers, passengers and the trusted third party server.



3.4.1 Passenger Location Tracking Attack

In a passenger location tracking attack, location information is revealed that allows the attacker to find out the passenger’s exact source and destination pick up points. The degree to which the attacker is able to refine the location (larger region or exact PuP) determines the privacy loss.

Exposure for a passenger whose exact PuP has been tracked is determined by the privacy of the network Voronoi cell of the PuP.

In the case of the eBay model the drivers’ and passengers’ information is exchanged only for pairs who agree to travel together. The attacker could be a driver or the TTP server.

• A malicious driver: A malicious driver is one who does not pick up the passengers after obtaining their locations. The reason for participating is to learn the location information of the passengers. However, in this model the location information of at most $n$ ($n < k$) passengers, those optimal for the driver’s route, is known to the malicious driver. A repeated offender can be penalized by reducing the rating (Section 3.3.4) and eventually blocking the driver from the system.

• TTP server: Since the eBay model is the “know all and everyone” model, a compromised server will fail to provide any privacy to the users.

In the case of the Match Maker model only the initial hashed regions are revealed to the TTP. The rest of the negotiations happen between the chosen drivers and passengers. The attacker could be a new or existing driver or the TTP server.

• New Driver: A new driver could request passengers in certain regions from the TTP and negotiate with the passengers to refine their source and destination pick up points without the intention of picking them up. The system can identify such malicious drivers by penalizing them for every failed negotiation and reducing their rating exponentially. In addition, new drivers can be limited to requesting passengers from a smaller number of regions, which decreases in tandem with a reduced rating, thereby affecting fewer passengers.

• Existing driver: If an existing driver with a high rating becomes an attacker, then it might take longer to reduce the rating. But since recent behavior is weighted higher, the driver’s rating will come down eventually.

• TTP: If the TTP server is compromised then the attacker only learns the larger hashed regions for the passengers, which do not pose a privacy risk to the passengers.

3.4.2 Driver Trajectory Tracking Attack

In the driver trajectory tracking attack a part or the entire driver trajectory is revealed. In the eBay model, the TTP has all the information about the agreed driver passenger pairs and their sources and destinations. The attacker could be a passenger or the TTP server.

• Passenger: The passengers only know that their start and destination locations are part of the driver’s trip. Since all decisions are made by the TTP no additional inferences about the driver trajectory can be made by the passengers. They only know that the driver’s trajectory might include or be close to their start and destination locations.

• TTP: A compromised TTP reveals everything about the users. It can be manipulated to make a driver take a certain path depending on the passengers chosen for the optimal route. There is no privacy for the users in this case.

In the Match Maker model, the TTP knows the hashed region requested by the driver. It also knows the user pairs which have reached an agreement for pick up. The attacker could be a group of colluding passengers or the TTP server.

• TTP: A compromised TTP does not reveal any part of the driver trajectory.

• Group of passengers: If a group of malicious passengers colludes and negotiates with the driver, they can know which passengers the driver agrees to pick up and find out the agreed pick up/drop off points. This, however, reveals nothing about the driver’s start and destination locations. The passengers only learn that the driver’s trajectory might include or be close to their start and destination locations.

3.4.3 Discussion

Our ride sharing system can incorporate any rating model. We have discussed the basic attacker model, but we note that there could be other possible threats to such a ride sharing system. For example, a malicious driver could collude with a certain number of passengers to artificially inflate the rating. However, this particular attack could only work on a very small number of passengers, as ride sharing with non colluding passengers will immediately and adversely impact the rating of the malicious driver through a most recent negative rating.

Based on the comparison of the models in terms of privacy and efficiency we determine that the Match Maker model provides stronger privacy. In the rest of the chapter we will present techniques and algorithms to implement the Match Maker model.

3.5 Optimization Problem

In this section we describe the optimization problem for the Match Maker model. In this model there is no global optimization as the drivers and the passengers make their decisions independently in a greedy manner.

A trip ($tr$) is a path that a driver ($d_i$) takes while picking up passengers ($p_{i_1}, p_{i_2}, \ldots, p_{i_r}$) from their start and destination pick up points ($PuP_s$, $PuP_e$).

$$tr_j = (d_i.n_s, d_i.t_s), (PuP_{j_1}, t_{j_1}), (PuP_{j_2}, t_{j_2}), \ldots, (PuP_{j_n}, t_{j_n}), (d_i.n_e, tr.t_e)$$

such that

$$tr.t_e \le d_i.t_e$$

$$t_{j_1} = d_i.t_s + t_{dist(d_i.n_s, PuP_{j_1})}$$

where $t_{dist}$ is the time to travel the distance $dist$.

The duration of the trip is $tr.t_{dur} = [d_i.t_s, tr.t_e]$. The best path ($tr_{sp}$) for the driver ($d_i$) is the shortest path between ($d_i.n_s$, $d_i.n_e$).

The trip cost ($tr.c$) for the driver is the product of the cost per unit distance ($d_i.c$) and the distance traveled by the driver for the trip: $tr.c = d_i.c \times dist(tr)$.

The revenue generated ($tr.rev$) by the trip is the revenue per unit distance ($d_i.rev$) multiplied by the total distance traveled by the passengers in the trip: $tr.rev = d_i.rev \times \sum_{k=1}^{r} dist(p_{i_k})$.

Figure 3.6: Recursive Ellipse Computation

The driver optimization is to achieve the minimum trip cost and trip duration. Every driver can have different preferences for the two objectives.

$$\min(\lambda (tr.c) + (1 - \lambda)\, tr.t_{dur}) : \lambda \in [0, 1] \tag{3.11}$$

subject to

$$tr.t_e \le d_i.t_e$$

$$(tr.c - tr.rev) < tr_{sp}.c$$
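A driver could apply Equation 3.11 to a list of candidate trips as sketched below. The trip dictionaries, their field names and the numeric values are hypothetical, and cost and duration are assumed to be already expressed on comparable scales so that they can be mixed with a single weight.

```python
def driver_objective(trip_cost, trip_duration, lam):
    """Weighted objective of Equation 3.11 for one trip."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return lam * trip_cost + (1.0 - lam) * trip_duration


def is_feasible(trip, driver_deadline, shortest_path_cost):
    """Constraints: arrive by the deadline and have a net cost below the
    cost of driving the shortest path alone."""
    return (trip["end"] <= driver_deadline
            and (trip["cost"] - trip["revenue"]) < shortest_path_cost)


def best_trip(trips, driver_deadline, shortest_path_cost, lam):
    """Feasible trip with the smallest objective value, or None."""
    feasible = [t for t in trips if is_feasible(t, driver_deadline, shortest_path_cost)]
    return min(feasible,
               key=lambda t: driver_objective(t["cost"], t["duration"], lam),
               default=None)


trips = [
    {"name": "A", "cost": 10, "revenue": 6, "duration": 30, "end": 40},
    {"name": "B", "cost": 8, "revenue": 0, "duration": 20, "end": 30},   # net cost too high
    {"name": "C", "cost": 12, "revenue": 8, "duration": 25, "end": 50},  # arrives too late
    {"name": "D", "cost": 9, "revenue": 5, "duration": 35, "end": 44},
]
print(best_trip(trips, driver_deadline=45, shortest_path_cost=7, lam=0.5)["name"])  # A
print(best_trip(trips, driver_deadline=45, shortest_path_cost=7, lam=1.0)["name"])  # D
```

With equal weights trip A wins on the combined score; a driver who only cares about cost ($\lambda = 1$) would pick trip D instead.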

The passengers ($p_{i_1}, p_{i_2}, \ldots, p_{i_r}$) are offered trips by the driver at start times ($tr.t^{p_{i_j}}_s$) for their respective pick up points ($p_{i_j}.PuP_s$). Every passenger has to pay a trip cost ($tr.c^{p_{i_j}}$) based on the distance traveled with the driver and the driver’s charged revenue. The duration ($tr.t^{p_{i_j}}_{dur}$) of the trip for the passenger is the time the passenger travels with the driver. A passenger might prefer the fastest trip, the trip starting at the earliest time, or the one with the lowest cost. Based on the passenger preference for an objective, every passenger can have different weights for them in the objective function shown in Equation 3.12.

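$$\min(\lambda_1 (tr.c^{p_{i_j}}) + \lambda_2 (tr.t^{p_{i_j}}_{dur}) + \lambda_3 (tr.t^{p_{i_j}}_s)) : \sum_{i=1}^{3} \lambda_i = 1 \tag{3.12}$$

subject to

$$tr.t^{p_{i_j}}_{dur} < (p_{i_j}.t_e - p_{i_j}.t_s)$$

$$tr.t^{p_{i_j}}_s \ge p_{i_j}.t_s$$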



3.5.1 Driver: Optimal Path Computation

The driver path ellipse (Section 3.2.2) is used to compute the optimal driver path with the list of passengers that can potentially share a ride on that path. The time constraints of both driver and passengers are satisfied by the chosen path. For a candidate passenger both the start and destination pick up points have to be inside the driver ellipse. The driver ellipse is described by its foci (a driver’s start and destination node), major axis and minor axis. The major and minor axes are computed using the time budget information, assuming a constant speed in the road network. A point is inside the ellipse if the sum of its distances from the two ellipse foci is less than the length of the major axis ($2a$). In our algorithm we assume that after negotiation a driver knows the exact pick up points and time information for the candidate passengers.

The pick up points of the candidate passengers are used to create a directed graph whose edge weights are the travel times of the shortest paths between them. The shortest paths are precomputed, as the pick up points in the network are fixed.

Algorithm 3.1 starts the path computation from the initial driver ellipse using depth first traversal. In lines 4-10, it checks whether the driver's destination has been reached; if the arrival time is within the driver's time budget, the path is added to the list of paths. After every new node that is added to the path, the driver ellipse is reduced recursively by moving the start focus to this node. Every ellipse reduction reduces the number of new pick up points to be considered for the path (Figure 3.6). In line 15 of the algorithm, a new node is considered for the path only if it is inside the current driver ellipse. The original driver ellipse (Figure 3.1) is DE = (F1, F2, a, b, tb); after the driver includes a pick up point (mi) in its path, the time budget reduces by the time the driver takes to travel to that pick up point. We now describe the implementation of the computeNewEllipse function in the algorithm (line 16). Assuming a constant speed s for moving around the network, the time budget becomes tb' = tb − dist(F1, mi)/s. The major and minor axes of the new ellipse are then computed with F1 = mi and time budget tb'. The new ellipse reduces the search space because it is smaller and pick up points outside it are rejected for the path. In lines 17-20, we assume the presence of two functions, computePickUpList and canDropPassList, which return the list of passengers that the driver can pick up from the new node and the list of already picked up passengers that can be dropped there. The new ellipse is explored only if a new passenger can be picked up from this pick up point or an existing one can be dropped. Every passenger that is picked up should have the destination pick up point inside the reduced ellipse. The algorithm is called recursively, with the current reduced ellipse, for every new node that gets added to a path.
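The ellipse reduction step can be illustrated with a short Python sketch. It assumes planar coordinates with Euclidean distances and a major axis equal to the remaining time budget multiplied by the constant speed; the `Ellipse` class and `compute_new_ellipse` are hypothetical names that only mirror the role of computeNewEllipse in Algorithm 3.1.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Ellipse:
    f1: Point     # current start focus (last node on the path)
    f2: Point     # driver destination
    tb: float     # remaining time budget
    speed: float  # assumed constant speed s

    @property
    def major_axis(self) -> float:
        return self.tb * self.speed

    def contains(self, p: Point) -> bool:
        return math.dist(p, self.f1) + math.dist(p, self.f2) < self.major_axis


def compute_new_ellipse(ellipse: Ellipse, m: Point) -> Ellipse:
    """Move the start focus to pick up point m and shrink the time budget."""
    travel_time = math.dist(ellipse.f1, m) / ellipse.speed
    return Ellipse(f1=m, f2=ellipse.f2, tb=ellipse.tb - travel_time,
                   speed=ellipse.speed)
```

A point that was a candidate in the original ellipse can fall outside the reduced one, which is how each added pick up point prunes the search space.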

The cost of all computed paths is calculated, and the drivers select their paths according to their individual preferences. The drivers then offer their chosen trips to each of the passengers. Once an offer has been issued, the driver waits for all responses before making a decision. If not all passengers accept the offer, the driver recalculates the cost of the path with the passengers who accepted. If the resulting path is not feasible, the driver can revoke the offer. Drivers who are not able to find a ride but can still wait can negotiate in the next round of offers.

For the special case where ride sharing is restricted to a single passenger pick up, Algorithm 3.1 can be simplified to pairwise travel time lookups for each candidate passenger within the driver ellipse. In addition, if a driver can only pick up one passenger, the overall privacy should also go up because fewer passengers reveal their imprecise locations to the drivers.

The number of rounds (nr) of negotiation that can take place for a driver (di) depends on the duration of a round (tr) and the extra time in the driver's time budget. If a new negotiation is done every tr units of time, then nr can be calculated from the extra time the driver has beyond the time taken for the shortest path between the source and destination (tsp).

\[
n_r = \frac{d_i.t_e - d_i.t_s - d_i.t_{sp}}{t_r} \tag{3.13}
\]
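A small Python sketch of Equation 3.13 follows. Rounding down to a whole number of rounds and clamping negative slack to zero are assumptions added here for illustration; the equation itself only gives the ratio. The function name is hypothetical.

```python
import math


def negotiation_rounds(t_start, t_end, t_shortest, round_duration):
    """Number of complete negotiation rounds that fit in the driver's spare time."""
    if round_duration <= 0:
        raise ValueError("round duration must be positive")
    slack = t_end - t_start - t_shortest  # extra time in the driver's budget
    return max(0, math.floor(slack / round_duration))
```

For instance, a driver with a 50 minute window and a 40 minute shortest path has 10 spare minutes, which allows three rounds of 3 minutes each.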

3.5.1.1 Complexity Analysis

Algorithm 3.1 uses depth first search to find the best possible paths for every driver. The time and space complexity of depth first search are O(|N| + |E|) and O(|N|) respectively, where |N| is the number of nodes and |E| is the number of edges in the graph. If there are k passengers in the driver ellipse, then in the worst case every passenger's start and destination pick up points are different (no two passengers in the ellipse have common pick up points), resulting in a maximum of 2k + 2 nodes (with 2 nodes for the driver). In the worst case the graph is fully connected and there is a path between all of these 2k + 2 nodes, therefore the number of edges is (2k + 2)(2k + 2 − 1)/2.

\[
|N| = 2k + 2 \tag{3.14}
\]

\[
|E| = \frac{(2k+2)(2k+2-1)}{2} = 2k^2 + 3k + 1 \tag{3.15}
\]

\[
O(|N| + |E|) = O(k^2) \tag{3.16}
\]

The time and space complexity of the depth first search are therefore O(k^2) and O(k) respectively. However, in practice we do not compute all possible paths; we abandon a path when

• the time requirements of a passenger in terms of start time or end time are not met,

• the time to reach a node does not leave enough time for the driver to reach its destination,

• the source and destination points of a passenger are not both inside the feasible reduced ellipse,

• the path cost of the current path is already greater than the path cost of the selected top x paths.
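To make the pruning idea concrete, the following Python sketch enumerates driver paths by depth first traversal and applies only the second rule above (not enough time left to reach the destination). It is a simplified illustration, not Algorithm 3.1 itself: passenger constraints, ellipse checks and cost-based pruning are omitted, and the graph representation (a dictionary of travel times whose direct edges are shortest path times) is an assumption.

```python
import math


def enumerate_paths(travel_time, start, dest, t_start, t_end):
    """Return (path, arrival_time) for every simple path reaching dest by t_end."""
    paths = []

    def dfs(node, now, visited):
        for nxt, dt in travel_time[node].items():
            if nxt in visited:
                continue  # node is already on this path
            arrive = now + dt
            if nxt == dest:
                if arrive <= t_end:
                    paths.append((visited + [nxt], arrive))
                continue
            # Prune: even the direct leg to the destination would arrive too late.
            if arrive + travel_time[nxt].get(dest, math.inf) > t_end:
                continue
            dfs(nxt, arrive, visited + [nxt])

    dfs(start, t_start, [start])
    return paths
```

Because the edge weights are assumed to be shortest path travel times, the direct leg to the destination is a valid lower bound, so the pruning never discards a feasible path.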

Ellipse Reduction. After the addition of every new node the ellipse reduces recursively, thereby making the search space smaller and potentially reducing the number of candidate passengers. We assume that a driver can pick up at most three passengers, so the maximum number of ellipse computations per path is three, which makes it O(1) computations per path. However, there could be $\binom{k}{3}$ combinations of passengers for the driver if every passenger's time constraints can be satisfied. This would mean $\binom{k}{3}$ paths and thus O(k^3) potential computations and O(1) space. However, the ellipse reduction computation remains the same for every pick up point (2k + 2 nodes), so it can be saved and re-used to check path feasibility. Since a single ellipse reduction needs to be computed per pick up point inside the ellipse and there are a maximum of 2k + 2 such nodes, the worst case time and space complexity for this step is O(k).

So, overall the worst case time and space complexity of Algorithm 3.1 are O(k^2) and O(k) respectively. Since every driver can share a ride with a maximum of 3 passengers, we do not want the driver to communicate with every possible candidate passenger, in order to minimize the impact on passenger privacy. Since we restrict the number of passengers (k) to a small number, the algorithm is efficient. We believe that for all practical purposes a small value of k is sufficient, and we show in Section 3.6.4.1 through experiments that larger k values have little impact on occupancy rates.

3.5.2 Passenger: Optimal Driver Selection

After the drivers compute their best paths, they offer every passenger a trip with its cost, start time and duration. Based on their individual preferences, the passengers select a matching driver from the options. If the driver's trip is still feasible after the passenger selection, the driver picks up the passenger at the agreed start time. A passenger might receive y offers from drivers, out of which it selects one and sends a non-acceptance to the rest. However, the selected driver might still not be able to go ahead with the trip if the trip is no longer cost effective (that is, if not all passengers on the offered trip select the driver). If the passenger cannot find a driver, the selection parameters can be changed to renegotiate in the next round, when more drivers might have joined the system.

3.5.2.1 Complexity Analysis

In practice we found that it is important to limit not only the number of passengers communicating with every driver (k) but also the number of drivers every passenger communicates with (l). A passenger will eventually accept a single driver's offer, so limiting k and not l can lead to some passengers missing out on ride sharing who could otherwise have found feasible rides. For simplicity, we assume that k = l. Every passenger looks at the offered trips and selects one based on individual preferences (such as lowest cost or fastest time), resulting in a computation time per passenger that is linear in the number of offers.
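The linear-time passenger selection can be sketched as a single pass over the received offers, scoring each with the weighted objective of Equation 3.12 and skipping offers that violate its constraints. The `Offer` fields and the `select_offer` function are illustrative assumptions; the three terms are combined in raw units, as in the equation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Offer:
    driver_id: str
    cost: float        # trip cost charged to the passenger
    duration: float    # time the passenger travels with the driver
    start_time: float  # offered pick up time


def select_offer(offers: Sequence[Offer], weights: Tuple[float, float, float],
                 t_start: float, t_end: float) -> Optional[Offer]:
    """Pick the feasible offer with the lowest weighted score in one pass."""
    l1, l2, l3 = weights
    if abs(l1 + l2 + l3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    best, best_score = None, float("inf")
    for offer in offers:
        if offer.duration >= t_end - t_start:  # trip too long for the window
            continue
        if offer.start_time < t_start:         # starts before passenger is ready
            continue
        score = l1 * offer.cost + l2 * offer.duration + l3 * offer.start_time
        if score < best_score:
            best, best_score = offer, score
    return best
```

A passenger who only cares about cost would use weights (1, 0, 0), while weights (0, 0, 1) select the earliest feasible start.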

3.5.3 Complexity Analysis of eBay model

The global optimization, which aims to minimize the cost (travel time or travel distance) or maximize the number of participants in the ride sharing system, is NP hard [24, 25]. If the eBay model runs the local optimization and aims to reduce the vehicle km (per driver) by selecting the shortest route with the maximum number of passengers for every driver, then Algorithm 3.1 is executed for every driver by the TTP to select a path. The TTP communicates the best possible matches to the potential drivers and passengers, who have to agree to them. Since every driver talks to every possible passenger (to generate the best possible matches), for n drivers communicating with an average of k passengers each, the resulting complexity is O(n × k^2) time and O(n × k) space. For m passengers, the driver selection with l drivers per passenger would cost O(m) time and O(1) space. As n, m and k are expected to be large for a successful ride sharing system, the eBay model requires a large computational infrastructure to perform even local optimization.

Algorithm 3.1 Optimal Paths Algorithm (CreatePaths)
input : dg - directed graph of pick up points with weights as travel time, visited - stack of nodes visited so far, ellipse - for driver di, time - cur time stack, driver - driver details structure, adjNodes(dg, node) - returns the list of adjacent nodes for node in dg
output: PathArray - array of possible paths, PassListArr - candidate passenger list for each path

for (∀ curNode in adjNodes(dg, visited.top())) do
    if (visited.contains(curNode)) then
        continue /* current node is already visited in this path */
    if (curNode == di.ne) then /* current node is driver destination */
        if (time.top() + dg(visited.top(), curNode) ≤ di.te) then
            Path = getPath(visited, driver.destinationNode)
            PathArray.add(Path)
            PassListArr.add(PassList)
        else
            continue /* not in driver time budget */
    if (time.top() + dg(visited.top(), curNode) < di.te) then
        S = visited.top()
        S1 = curNode
        D = di.ne
        if (inEllipse(S1, ellipse.top())) then
            newEllipse = computeNewEllipse(ellipse.top(), S1)
            pickUpPassList = computePickUpList(curNode, time.top() + dg(visited.top(), curNode), newEllipse) /* driver can pick up passengers from this node and the destination for the passengers is inside the newEllipse */
            dropPassList = canDropPassList(curNode, src_list, time.top() + dg(visited.top(), curNode)) /* driver can drop passengers already picked up */
            if (pickUpPassList.empty() and dropPassList.empty()) then
                continue
            if (noActivityNode(visited, dropPassList, PassList)) then
                continue
            src_list.add(pickUpPassList)
            PassList.add(dropPassList)
            ellipse.push(newEllipse)
            cur_time = time.top() + dg(visited.top(), curNode)
            time.push(cur_time)
            visited.push(curNode)
            [PathArray, PassListArr] = CreatePaths(dg, visited, ellipse, time, driver) /* Recursive Call */
            visited.pop()
            ellipse.pop()
            time.pop()
            src_list.delete(pickUpPassList)

3.5.4 Discussion

Since the local optimization model presents a greedy solution in which the drivers and passengers make their own local choices, it guarantees neither that drivers will be able to pick up the passengers they made offers to nor that every passenger will get a ride (even though they might have offers from multiple drivers). Our model relates to game theoretic approaches with independent selfish agents (drivers and passengers), which we discuss in Section 3.7.4. The complexity analysis of the Match Maker model (Sections 3.5.1.1 and 3.5.2.1) and the eBay model (Section 3.5.3) highlights that the Match Maker model is more efficient, as computations are offloaded and bounded by the parameters k and l. This decentralized model minimizes privacy risks and limits and distributes the number of computations. However, in the decentralized model we are less concerned with computational savings than with the gain in trust. In addition, we will show through experiments (Section 3.6.4.1) for different values of k that a small k is not only good for privacy but also leads to good occupancy rates.

3.6 Experimental Evaluation

We now present our simulation setting and environment setup for the experiments.



3.6.1 Simulation Setting

The Australian Bureau of Statistics (ABS) provides statistics for Australia by dividing it into smaller blocks called Mesh Blocks (see footnote 2). Each mesh block can be hierarchically linked to larger regions called suburbs. For each mesh block the ABS provides the estimated resident population, the number of dwellings, and the type of dwelling (residential, commercial, transport, parkland, water, education, hospital, others). The number of passengers and drivers is generated according to the population and size of each mesh block.

We divide the Melbourne map into these mesh blocks and use the dwelling type to assign a sensitivity to each block. The dwellings are used to assign the number of locations to each region. Using the average sensitivity and the number of locations, we compute the privacy of each region.

The Victorian Integrated Survey of Travel and Activity (VISTA) provides travel data for Victoria, one of the most populous states in Australia, such as total trips per day, average trip length, mode of travel and population [110]. We use these statistics to generate the average number of passengers and drivers. VISTA provides the transport mode share data for the city of Melbourne (Table 3.3), which we use to find the percentage of total trips taken by every mode of travel. The idea is to evaluate the effect of ride sharing on congestion and travel times given that a small number of people who currently do not share rides will use the ride sharing system. We investigate the impact if one fifth of the people traveling in a day as drivers (20% of 52.7%) and one fifth of the people using other modes (20% of 9.4%: bicycle, tram, train and bus) decide to ride share, and generate passenger requests accordingly.
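The arithmetic behind the two participation shares can be checked with a short Python sketch. The mode shares are the ones quoted in the text; the function name and the 20% default are illustrative assumptions.

```python
def ride_share_participation(driver_share=52.7, other_modes_share=9.4, fraction=0.2):
    """Return (drivers, passengers) joining ride sharing, in percent of all trips."""
    drivers = fraction * driver_share          # one fifth of vehicle drivers
    passengers = fraction * other_modes_share  # one fifth of bicycle, tram, train, bus
    return drivers, passengers
```

With the quoted shares this gives roughly 10.54% of all trips as ride sharing drivers and 1.88% as ride sharing passengers.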

3.6.2 Simulation Environment

We perform our experiments on the road network of Melbourne, Australia, which contains 142,473 nodes and 280,475 edges. The map is generated using OpenStreetMap (www.openstreetmap.org). We perform our experiments using MATLAB. We extract the road network nodes and edges from OpenStreetMap and load them in MATLAB while keeping information such as road type (which determines the speed limit), connectivity information and the length of each edge. We have implemented Algorithm 3.1 to compute the proposed ride sharing paths for every driver. In particular, we have used some of the algorithms already included in the MATLAB library, such as graphshortestpath (to find the shortest path via Dijkstra for rider population generation). The pick up points are major road network intersections, which we believe are more likely to minimize the deviation of a driver's travel path. The pick up points are randomly chosen according to the population density of the regions.

Mode of Travel       % of total trips
Vehicle Driver       52.7
Vehicle Passenger    24.4
Walking Only         12.5
Bicycle              1.7
Train                4.3
Tram                 1.7
Bus                  1.7
Others               0.8

Table 3.3: Melbourne Transport Indicators (VISTA)

2 http://www.abs.gov.au/websitedbs/censushome.nsf/home/meshblockcounts

We generate driver and passenger populations on different region sizes of the map.

To make our simulation as realistic as possible we use the VISTA and ABS data sets to generate the number of users for each region in the experiments. The number of passengers and drivers is generated according to the population and size of the region, which are obtained from the ABS mesh block data. The number of drivers originating from a region is determined by the number of dwellings in that region. We generate random destinations for every generated rider. The average travel distance for the experiments is 10.2 km (the average trip length for Melbourne obtained from the VISTA data set) and the range is 5 to 20 km. We assume that every vehicle has a maximum capacity of 4 (including the driver). Since the Melbourne peak time car occupancy rate is 1.2, we generate the number of vacant seats per driver accordingly.

The simulation has a start time and an end time, which define the time period during which we run the simulation (for example, the peak rush hour between 8 AM and 10 AM). All generated drivers and passengers start their trips at a random time during this interval. The end time for a driver is the sum of the time taken to reach the destination using the shortest path (with a randomly chosen overhead of 10-20% to reflect different traffic situations) and the extra time the driver is willing to travel to pick up passengers (this is used to compute the driver ellipse). For a passenger the end time is the sum of the time taken to reach the destination using the shortest path (with a randomly chosen overhead of 10-20% to reflect different traffic situations) and the time the passenger is willing to wait before starting the trip (randomly chosen between 10% and 50% of the passenger's shortest trip time).

In our system each driver charges the passengers a randomly chosen rate between $0.5 and $2 per km, depending on the driver's rating and other specifications such as type of vehicle. We considered the distance rate charged by taxis in peak and off peak hours (see footnote 3) to obtain an upper bound for this cost. A 10.2 km taxi trip costs approximately $19.70 (with a flag fall of $3.20 and a per km cost of $1.617). There is also a cost that every driver incurs per km traveled, which includes the fuel cost, toll cost and maintenance cost of the vehicle. For a proposed driver trip to be viable, the cost of the trip has to be less than the revenue generated by sharing rides with the passengers (path cost in Section 3.2.2). We generate this cost to be less than half of the expected revenue for each driver.
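The taxi fare used as an upper bound, and the viability condition for a driver trip, can be reproduced with a few lines of Python. The function names are hypothetical; the flag fall and per km rate are the values quoted above.

```python
def taxi_fare(distance_km, flag_fall=3.20, rate_per_km=1.617):
    """Taxi fare used as an upper bound on what a ride sharing driver may charge."""
    return flag_fall + rate_per_km * distance_km


def trip_is_viable(trip_cost, revenue):
    """A proposed driver trip is viable only if its cost is below its revenue."""
    return trip_cost < revenue
```

For the average 10.2 km trip the fare evaluates to about 19.69, consistent with the approximate $19.70 stated in the text.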

To simulate the behavior of a system that has been operating for some time, we set the initial rating of all passengers and drivers randomly on a Likert scale between 1 and 5 (5 being the most trusted). Passengers and drivers have a minimum default rating in their preference setting, which is the same as their own initial rating. Beyond this, the passengers choose the driver based on other preferences such as earliest starting time, trip time and trip cost. The proposed trips are ranked according to the preference of every passenger and a trip is selected accordingly. We set the preferences such that 50% of the passengers prefer the lowest cost, 30% the shortest trip time and 20% the earliest starting time.

The experiments are conducted for ten different simulation runs with different populations of drivers and passengers. We assume that every driver is willing to travel a little extra to find possible candidate passengers. We use this spare time to compute the driver path area ellipse (Section 3.2.2); the time budget of the driver determines the driver ellipse size (DE Size). We set the default spare time to 25% of the driver's original travel time, making the default driver ellipse size 1.25.

The driver to passenger (D/P) ratio determines the number of passengers per driver in the system. The lower this ratio (that is, the more passengers per driver), the higher the chance of a driver finding candidate passengers and sharing rides successfully. We set the default D/P ratio to 1/3 so that in the best case every driver can pick up 3 additional passengers and reach the maximum achievable occupancy (4), where every vehicle in the system moves at full capacity.

3 http://www.taxifare.com.au/rates/australia/melbourne/

We vary these parameters (Table 3.4) and measure the effect of changing them on ride sharing.

3.6.3 Measurements

Occupancy (occ): Occupancy is the average number of people traveling in a car. Higher occupancy indicates better utilization of existing resources such as cars and fuel, and a reduced load on the road network. If p is the number of passengers and d is the number of drivers, then the maximum achievable occupancy is 1 + p/d.

Relative reduction in vehicles (in percent) (Vr): Vr measures the percentage reduction in the number of vehicles on the road network due to ride sharing. Vt denotes the total number of vehicles (drivers and passengers) on the road network without any ride sharing. Vrs denotes the number of vehicles with ride sharing, that is, the sum of the number of drivers' vehicles and the number of passengers who do not get a shared ride.

\[
V_r = \frac{V_t - V_{rs}}{V_t} \times 100 \tag{3.17}
\]

Relative reduction in vehicle km (in percent) (VKTr): VKTr measures the percentage reduction of the total vehicle km traveled in the system. VKTt denotes the total vehicle km (of drivers and passengers) on the road network without any ride sharing. VKTrs denotes the vehicle km with ride sharing, that is, the sum of the vehicle km of drivers (whose paths are longer with ride sharing) and the vehicle km of passengers who could not share a ride. In the worst case each passenger travels in an individual vehicle.

\[
VKT_r = \frac{VKT_t - VKT_{rs}}{VKT_t} \times 100 \tag{3.18}
\]

It is important to measure VKTr, as the reduction in the number of vehicles is not a sufficient indicator of the load reduced on the road network. For example, in Figure 3.7, if V1, V2, V3 share a ride with V4, then VKTr = (6 − 3)/6 × 100 = 50%, whereas if V5, V6, V7 share a ride with V8, then VKTr = (4 − 1)/4 × 100 = 75%. In both cases the number of vehicles on the road network after ride sharing is the same. The maximal achievable reduction is given by 1 − 1/occ.

Figure 3.7: Relative Reduction in Vehicle km
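The two relative reduction measures (Equations 3.17 and 3.18) and the worked example of Figure 3.7 can be reproduced with the following Python sketch. The function names are illustrative assumptions; the numbers are the ones used in the example.

```python
def relative_reduction(without_sharing, with_sharing):
    """Percentage reduction, used for both vehicles (3.17) and vehicle km (3.18)."""
    if without_sharing <= 0:
        raise ValueError("baseline must be positive")
    return (without_sharing - with_sharing) / without_sharing * 100.0


def vehicles_with_sharing(num_drivers, num_passengers, num_matched):
    """V_rs: driver vehicles plus passengers who did not get a shared ride."""
    return num_drivers + (num_passengers - num_matched)


def max_reduction_percent(occupancy):
    """Maximal achievable reduction, 1 - 1/occ, expressed in percent."""
    return (1.0 - 1.0 / occupancy) * 100.0
```

In both scenarios of the example one driver carries three passengers, so the vehicle reduction is 75% either way, while the vehicle km reduction is 50% in the first scenario and 75% in the second.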

Relative reduction in traffic load (TLr): Traffic load is a combined indicator of the effect of ride sharing on the road network. It takes into account the reduction in vehicles as well as in vehicle km. Traffic load measures the load on the road network as the product of the number of vehicles and the average trip length, because both factors contribute linearly to the overall traffic load. Shorter trips reduce the traffic load in the same way as fewer vehicles do. The ride sharing traffic load (TLrs) is the product of the number of vehicles (drivers and passengers who could not share a ride) and the remaining fraction of vehicle km (1 − VKTr). Since the traffic load TLt without any ride sharing is the number of vehicles (all passengers and drivers) at 100% vehicle km, we can compute the relative reduction in traffic load (TLr) as:

\[
TL_r = \frac{TL_t - TL_{rs}}{TL_t} \times 100 \tag{3.19}
\]
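A minimal Python sketch of Equation 3.19 follows, under the reading that the traffic load with ride sharing is the number of remaining vehicles multiplied by the remaining fraction of vehicle km, with the vehicle km reduction passed in as a percentage. The function name and argument names are assumptions.

```python
def traffic_load_reduction(v_total, v_sharing, vkt_reduction_percent):
    """Relative reduction in traffic load (percent) from vehicle and vehicle km figures."""
    if v_total <= 0:
        raise ValueError("total number of vehicles must be positive")
    tl_total = v_total * 1.0  # all vehicles at 100% vehicle km
    tl_sharing = v_sharing * (1.0 - vkt_reduction_percent / 100.0)
    return (tl_total - tl_sharing) / tl_total * 100.0
```

For four original vehicles reduced to one, with a 50% reduction in vehicle km, the traffic load reduction evaluates to 87.5%.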

Travel time (TTr): The ride sharing travel time (TTr) is the average travel time per person. With ride sharing we assume that the drivers and passengers who do not get a shared ride continue to travel individually. The average per person travel time TTt without any ride sharing is the average of the travel times of all drivers (shortest route from their source to their destination) and all passengers (assuming every passenger travels individually on the shortest path).

Sensitivity (s): There are different ways to extrapolate from the sensitivity of an individual point to the sensitivity of a region including those points. One way is to compute the maximum sensitivity. However, we choose the average sensitivity, as our regions are quite large and the maximum would distort the region's sensitivity. Therefore, we compute the sensitivity of a region R as the average of the sensitivities of the locations in the region.

\[
s_R = \operatorname{avg}(s_i) \quad \forall s_i \in R
\]

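Privacy: As discussed in Section 3.2.2, the privacy of a location is 1/si. If we take the average sensitivity sR over the N locations in a region, then the privacy of the region is the sum of the per-location privacy values evaluated at the average sensitivity:

\[
P_R = \sum_{i=1}^{N} \frac{1}{s_R} = \frac{N}{s_R}
\]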
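The region sensitivity and privacy definitions, together with the per-round exposure of Equation 3.20, can be written as a short Python sketch. The function names are hypothetical, and the sensitivity values in the usage example are made up purely for illustration.

```python
def region_sensitivity(sensitivities):
    """s_R: average sensitivity over the locations of a region."""
    if not sensitivities:
        raise ValueError("a region needs at least one location")
    return sum(sensitivities) / len(sensitivities)


def region_privacy(sensitivities):
    """P_R = N / s_R for a region with N locations."""
    return len(sensitivities) / region_sensitivity(sensitivities)


def exposure(num_users, privacy):
    """E_i = U_i / P: information revealed to (or learned about) U_i users."""
    return num_users / privacy
```

A region with three locations of sensitivity 1, 2 and 3 has average sensitivity 2 and privacy 1.5; revealing it to three drivers gives an exposure of 2.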

Exposure (E): Exposure is the amount of information exposed about a user, or the knowledge gained about other users, by participating in the system. If the privacy of a region is high (for example, a highly sensitive location hidden amongst many low sensitivity locations), then the passenger's risk of exposure is low. We now discuss the exposure levels from the passenger's and the driver's point of view.

• Driver: How much knowledge does each driver gain about the passengers negotiating with the driver in every negotiation round? If the driver learns about Ui passengers in the ith negotiation round, then the exposure is the average information revealed (1/P) by the Ui passengers.

• Passengers: How much do drivers know about every passenger? If the passenger exposes the region information to Ui drivers in the ith negotiation round, then the exposure is the information revealed (1/P) by the passenger to the Ui drivers.

Ei =Ui/P (3.20)

Cumulative exposure (CE): The actual exposure in round i is computed for the users Ui in that round. However, there are additional users who gained knowledge in previous negotiation rounds. To reflect the overall exposure in every round we also measure and report the cumulative exposure. For example, suppose that in round 1 a passenger provides a larger region with higher privacy (P1) to n1 drivers, and in round 2 provides a smaller region with privacy (P2) to fewer drivers n2 (n2 ≤ n1). The cumulative exposure after round 2 is:

\[
CE_2 = \frac{n_2}{P_2} + \frac{n_1 - n_2}{P_1}
\]

Similarly, for a driver this means that the knowledge the driver gained about the passengers who were rejected in iteration i + 1 also needs to be taken into account. Overall, for a passenger, after k rounds of negotiation with nk drivers in round k and privacy Pk, CE is computed using Equation 3.21. For a driver, after k rounds of negotiation with nk passengers in round k and average privacy Pk, CE is likewise computed using Equation 3.21.

\[
CE_k = E_k + \sum_{i=1}^{k-1} \frac{n_i - n_{i+1}}{P_i} \tag{3.21}
\]
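Equation 3.21 can be evaluated with a short Python sketch that takes the number of users and the privacy value for each round. The list-based interface is an assumption made for illustration, and the numbers in the usage note are invented to show the calculation.

```python
def cumulative_exposure(users_per_round, privacy_per_round):
    """CE_k: exposure in the last round plus exposure to users dropped earlier.

    users_per_round[i] and privacy_per_round[i] describe negotiation round i + 1.
    """
    if not users_per_round or len(users_per_round) != len(privacy_per_round):
        raise ValueError("need one user count and one privacy value per round")
    k = len(users_per_round)
    ce = users_per_round[-1] / privacy_per_round[-1]  # E_k = n_k / P_k
    for i in range(k - 1):
        dropped = users_per_round[i] - users_per_round[i + 1]
        ce += dropped / privacy_per_round[i]          # (n_i - n_{i+1}) / P_i
    return ce
```

For two rounds with 10 and then 4 drivers and privacy values 100 and 20, the result is 4/20 + 6/100 = 0.26, matching the two-round expression for CE2.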

In [18], the authors report the relative distance rate (RDR) to compute the effectiveness of ride sharing in a taxi sharing environment. However, we do not report this measure, as it does not reflect the vehicle km traveled by the passengers who do not get picked up and is therefore not a complete measure for our case.

3.6.4 Ride Sharing

The parameters that impact ride sharing are the driver to passenger (D/P) ratio, the time budget of the driver (which determines the driver ellipse (DE) size), the actual number of drivers and passengers in the system, and the density of users per unit area of the road network. We keep the network size fixed and vary these four parameters to measure the effectiveness of the ride sharing system. The D/P ratio bounds the maximal achievable ride sharing occupancy. The time budget, or DE size, impacts ride sharing by controlling the maximum detour a driver can take to pick up passengers. The actual number of drivers and passengers impacts ride sharing by changing the user density per unit area of the region. Higher density implies more opportunity for ride sharing. In addition, the individual drivers and passengers have their own preference criteria for selecting the best match. The path computation and selection is done by running Algorithm 3.1 for every driver and using their choice parameters as shown in Equations 3.11 and 3.12. We vary the parameters as shown in Table 3.4. The number of passengers and drivers is generated according to the population and size of the region, using the VISTA and ABS data sets.

3.6.4.1 Choice of Parameter k (Number of Passengers Negotiating Per Driver)

We discussed in Section 5.1.1 that a small choice of k (the number of passengers negotiating with every driver) should be sufficient for ride sharing, since every driver can ultimately share a ride with a maximum of 3 passengers. We now present our experiments for the selection of an optimal value of k. We evaluate the impact of k ∈ {5, 10, 20, 40} on the occupancy rate and exposure. The default values are as follows: DE size 1.25, D/P ratio 1/3 and number of drivers 200. As seen in Figure 3.8, the occupancy, the reduction in vehicle km and the reduction in traffic load do not show any significant increase when k increases beyond 10. Thus, we fix k to 10 for our experiments. As discussed in Section 3.5.2.1, we keep k = l for our experiments. Further, our experiments comparing the effect of k on exposure (Figure 3.14(a)) confirm that setting the default value of k to 10 controls the number of computations while maintaining occupancy rates and addressing privacy concerns for the drivers and passengers.

Figure 3.8: Impact of k on Occupancy, Vehicles (km) and Traffic Load Reduction: k is the maximum number of passengers in every driver ellipse with at most 3 passenger vacancies. Panels, each plotted against k: (a) Occupancy, (b) Vehicle km reduction (%), (c) Vehicle reduction (%), (d) Traffic load reduction (%).

No.   DE Size   D/P Ratio   # Drivers
1     1.25      4/1-1/5     200
2     1.25      1/3         50-250
3     1.25-3    1/3         200

Table 3.4: Experiment Settings

In the current model we assume that the driver provides the initial catchment area (a bigger ellipse to start the negotiation) centered around the actual ellipse, which keeps k small. However, if a higher degree of privacy is needed, the ellipse can have any alignment, which may result in larger k values. We will investigate this in our future work.

3.6.4.2 Effect of Driver to Passenger (D/P) Ratio

We evaluate the effect of increasing the number of passengers from 50 to 1000 while keeping the number of drivers fixed at 200 (Figure 3.9). The first row of Table 3.4 shows the experimental setup. As expected, the occupancy increases with the number of passengers (Figure 3.9(a)), because more passengers give the drivers more opportunity to find passengers within their time constraints. With a D/P ratio of 4/1 the theoretical maximum achievable occupancy is 1.25, and with a D/P ratio of 1/3 it is 4. D/P ratios lower than 1/3 cannot achieve an occupancy higher than 4, as we assume that every driver has a maximum of 3 vacant seats. However, the theoretical maximum cannot be achieved in practice, as additional time constraints and user preferences limit the choices.

Figure 3.9: Effect of D/P Ratio: Impact of driver to passenger ratio on occupancy, vehicles (km) and traffic load reduction. Panels, each plotted against the D/P ratio (4/1, 1/1, 1/2, 1/3, 1/5): occupancy, reduction in vehicle km, reduction in vehicles, reduction in traffic load.

Figure 3.10: Effect of User Density: Impact of user density on occupancy, vehicles (km) and traffic load reduction. Panels, each plotted against the number of drivers (D/P ratio = 1/3): occupancy, reduction in vehicle km, reduction in vehicles, reduction in traffic load.

The relative reduction in vehicle km (Figure 3.9(b)) increases initially as the number of passengers increases (increasing the opportunities for ride sharing). However, it starts decreasing for D/P ratios lower than 1/3, since the additional passengers might not find trips with the same number of drivers. Hence, even though the number of passengers increases, the ride sharing opportunities remain nearly the same and the additional passengers just add to the load in the system. This is also reflected in the reduction of the traffic load (Figure 3.9(d)), which is initially higher and then levels out. We also ran the experiments with a fixed number of passengers and a varying number of drivers, which showed similar results.


3.6.4.3 Effect of User Density

In these experiments (Figure 3.10), we study the impact of user density. As shown in row 2 of Table 3.4, the D/P ratio is held constant at 1/3 while we increase the number of drivers and passengers to increase the user density in a fixed region. As expected, the occupancy increases with the user density (Figure 3.10(a)), as there are more opportunities for passenger pick up. We do not increase the number of drivers beyond 250, as the theoretical maximum occupancy (1 + 3/1 = 4) is achieved; increasing the number of users beyond this point results in a flat curve. Similarly, the maximum achievable reduction in vehicle km (Figure 3.10(b)) is 75% for this case, so any further increase in the user density will not reduce the vehicle km further.

3.6.4.4 Effect of Driver Ellipse Size

We now evaluate the impact of the change in driver ellipse size (Figure 3.11). The third row of Table 3.4 shows the experimental setup. As the time budget of the driver increases, the amount of time that the driver can spend on detours increases and so does the size of the driver’s ellipse. With an increase in ellipse size the number of candidate passengers for a driver increases, providing more opportunities for picking up passengers. As expected, the occupancy and the percentage reduction in vehicles increase with larger driver ellipse size. However, a larger increase in occupancy occurs after the time budget increases from a factor of 2 to 3, as with a linear increase in time the ellipse size increases quadratically. This makes many more passengers available for pick up, which increases the occupancy within the system.

The percentage reduction in vehicle km (Figure 3.11(b)) decreases after the increase of the time budget from 2 to 3. Since with a larger time budget the driver can accommodate longer trips to pick up people, the overall length of the trip might be equal to or even longer than the independent trips of the driver and the picked up passengers, causing a decrease in the reduction of vehicle km.
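The growth of the driver ellipse with the time budget can be illustrated with a small geometric sketch. It assumes (our simplification for illustration, not the road-network computation used in the experiments) straight-line distances and that the DE size is a multiplier on the direct source-to-destination distance; the function names are hypothetical.

```python
import math


def in_driver_ellipse(point, source, dest, de_size):
    """True if a stop at `point` keeps the trip within de_size times
    the direct source-to-destination distance (Euclidean assumption)."""
    direct = math.dist(source, dest)
    return math.dist(source, point) + math.dist(point, dest) <= de_size * direct


def ellipse_area(direct, de_size):
    """Area of the ellipse whose foci are `direct` apart and whose
    major axis has length de_size * direct (requires de_size >= 1)."""
    a = de_size * direct / 2                  # semi-major axis
    b = math.sqrt(a * a - (direct / 2) ** 2)  # semi-minor axis
    return math.pi * a * b


# Relative growth of the candidate region as the budget factor grows.
for factor in (1.25, 1.5, 1.75, 2, 3):
    print(factor, ellipse_area(1.0, factor) / ellipse_area(1.0, 1.25))
```

Under these assumptions the area grows roughly with the square of the budget factor, so the step from 2 to 3 adds far more candidate area than the earlier steps.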

3.6.4.5 Travel Time

With ride sharing, the drivers on average take longer routes for ride share as long as the detour is within their travel preferences. As the trip is shared with multiple passengers

[Figure 3.11 panels, plotted against DE Size (1, 1.25, 1.5, 1.75, 2, 3): (a) Occupancy; (b) Reduction in vehicle km; (c) Reduction in vehicles; (d) Reduction in traffic load.]

Figure 3.11: Effect of DE Size: Impact of driver ellipse size on occupancy, vehicles (km) and traffic load reduction.

[Figure 3.12 panels, plotting Travel Time against #Drivers for the series "Non Ride Sharing Time" and "Ride Sharing Time": (a) D/P Ratio = 1/1; (b) D/P Ratio = 1/3.]

Figure 3.12: Travel Time

this may lead to slightly longer routes for the passengers as well. We assume that the remaining passengers and drivers who could not find a match continue to travel to their respective destinations using the shortest path. This implies that the higher the occupancy the higher is the per person travel time as people take longer routes to reach their respective destinations. However, with ride sharing the number of vehicles and the overall traffic load on the road network reduces, which is likely to incur smaller travel times.

We used the link congestion function developed by the US Bureau of Public Roads [111] based on the Frank-Wolfe algorithm for traffic equilibrium.

$$S_a(v_a) = t_a \left(1 + 0.15 \left(\frac{v_a}{c_a}\right)^{4}\right) \qquad (3.22)$$

where,

• $t_a$ = free flow travel time on edge $a$ per unit of time.

• $v_a$ = volume of traffic on edge $a$ per unit of time.

• $c_a$ = capacity of edge $a$ per unit of time, computed as the possible number of cars on the link, calculated using the length of the link divided by the average car length (4 m).

• $S_a(v_a)$ is the average travel time for a vehicle on edge $a$.
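As a minimal sketch of how Equation 3.22 can be applied per edge, the following helper evaluates the congestion function for given volume and capacity. The function and parameter names are hypothetical, and the capacity helper simply follows the link length divided by car length rule stated above.

```python
def edge_capacity(link_length_m, car_length_m=4.0):
    """Capacity of an edge as the number of cars that fit on the link."""
    return link_length_m / car_length_m


def congested_travel_time(free_flow_time, volume, capacity,
                          alpha=0.15, beta=4):
    """Average travel time on an edge according to Equation 3.22."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1 + alpha * (volume / capacity) ** beta)


# Example: a 400 m link with a free flow travel time of 10 time units.
capacity = edge_capacity(400.0)
for volume in (0, 50, 100, 200):
    print(volume, congested_travel_time(10.0, volume, capacity))
```

Because the volume to capacity ratio is raised to the fourth power, the travel time stays close to the free flow value on lightly loaded edges and rises sharply once the volume exceeds the capacity.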

We compute the number of cars on every edge of the road network and change the per edge travel time using the above formula. To demonstrate our results we use the parameter settings in line 3 of Table 3.4; the D/P ratio is set to 1/3 and 1/1 for the region while we increase congestion by increasing the number of drivers and passengers. As expected, the results (Figure 3.12(a)) show that, for the 1/1 case, the ride sharing average travel time goes up initially when the number of cars on the road is not sufficient to cause congestion. However, the ride share travel time goes down as the congestion on the road network increases with increasing drivers and passengers. As expected, a change in the D/P ratio to 1/3 demonstrates a larger difference between the non ride sharing and ride sharing instances. This is explained by the increased opportunity for drivers to ride share and the increase in achievable occupancy to 4, from 2 in the case of 1/1. Although the error bars overlap in the 1/1 case, the clear trend of ride sharing out-performing non ride sharing travel time is shown for the ratio 1/3. The error bars in Figure 3.12(b) indicate the range of travel times observed for ten different simulation runs with different populations of drivers and passengers. The variation in travel distance of the introduced drivers and passengers at every increase in the number of commuters accounts for the different ranges of values for travel time. For lighter traffic the travel time goes up, but since the overall number of vehicles still decreases there are other benefits such as reduced fuel consumption. Our results show that ride sharing is particularly beneficial during rush hours when the traffic load is high.

3.6.4.6 Number of Ride Sharing Rounds

In every round the drivers make ride share offers to the passengers for the best selected driver trip. If the passengers agree (not all passengers in the trip might agree) and the driver’s route is still feasible (in terms of cost) then the driver goes ahead with the ride. Otherwise the drivers and passengers who are not able to find a ride but can still wait might decide to negotiate in the next round of offers. Although we cannot guarantee that drivers will be able to pick up passengers they made offers to and that every passenger will get a ride, our simulations (default parameters for DE size (1.25) and D/P ratio (1/3)) with a fixed number of drivers and passengers (such that no new drivers and passengers join after every negotiation round) show that the average number of rounds per passenger is less than 2 (1.746) and the average number of rounds per driver is a

little over 1 (1.284), which means the number of rounds is generally small on average. In a real scenario the number of rounds depends on the new drivers and passengers who keep joining the ride sharing system and the time budget of the users (people who join early and are more flexible might get a trip earlier).

In this greedy solution the drivers and passengers make their own local choices, which means that the algorithm does not guarantee that drivers will be able to pick up passengers they made offers to and that every passenger will get a ride (even though they might have offers from multiple drivers). In our simulations we find that not all passengers who requested a ride were able to secure a trip, even when feasible trips were offered. On average 19.80% of the passengers did not find shared rides in the system. This is due to the high D/P ratio of 1/3, which assumes that every car is filled with exactly 3 passengers. In practice this may not be feasible due to the time constraints of drivers and passengers. We expect that for lower D/P ratios, e.g. 1/2, the number of passengers who cannot find a shared ride would decrease.

3.6.4.7 Economic Incentive of Ride Sharing vs Taxi

In these experiments we randomly choose a set of 1000 passengers who share a ride and compute how many passengers would not take a taxi given its economic cost. We set the default parameters of DE size to 1.25 and D/P ratio to 1/3.

We use the Melbourne taxi rates⁴ at peak hours to compute the cost of taking a taxi for these passengers along their shortest routes. For Melbourne, the cost of taking a taxi is the sum of $3.20 for the flag fall, $2 for the booking fee (the booking fee is added randomly for the passengers as some might need to book the taxi in advance) and $1.617 per km times the shortest route distance for the passenger. The cost for ride share is the cost published by the drivers when they offer rides to passengers. In our system each driver charges between $0.5 and $2 per km from the passengers, and passengers choose the driver based on their preferences such as rating, starting time, trip time, trip cost and other personal preferences.
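The cost comparison described above can be sketched as follows. The constants are the rates quoted in the text; the function names are hypothetical, and charging the ride share rate over the passenger's shortest route distance is an assumption of this sketch rather than a statement of how the simulation bills passengers.

```python
FLAG_FALL = 3.20          # dollars, charged on every taxi trip
BOOKING_FEE = 2.00        # dollars, charged only when the taxi is booked
TAXI_RATE_PER_KM = 1.617  # dollars per km at peak hours


def taxi_cost(distance_km, booked=False):
    """Taxi fare for a trip along the passenger's shortest route."""
    fee = BOOKING_FEE if booked else 0.0
    return FLAG_FALL + fee + TAXI_RATE_PER_KM * distance_km


def ride_share_cost(distance_km, rate_per_km):
    """Ride share fare; drivers publish a rate between $0.5 and $2 per km."""
    if not 0.5 <= rate_per_km <= 2.0:
        raise ValueError("rate must lie between 0.5 and 2 dollars per km")
    return rate_per_km * distance_km


def saving_fraction(distance_km, rate_per_km, booked=False):
    """Share of the taxi fare saved by ride sharing (negative if dearer)."""
    taxi = taxi_cost(distance_km, booked)
    return (taxi - ride_share_cost(distance_km, rate_per_km)) / taxi


# Illustrative trip only; distance and rate are made-up inputs.
print(saving_fraction(10.0, 0.5, booked=True))
```

Averaging `saving_fraction` over a sample of matched passengers would give an aggregate saving comparable in kind to the one reported in this section.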

We then compare the taxi cost with the cost incurred by the same passengers due to ride share in our system to reach their respective destinations. We observe that on average a passenger doing ride share saves a significant 66.35% as compared to the cost

⁴ http://www.taxifare.com.au/rates/australia/melbourne/

of a taxi. Hence, the cost of taking a taxi is not only high in terms of travel time and congestion, as more vehicles get added to the road network, but also in terms of the real price for the users as compared to ride sharing. This makes ride sharing an economic incentive for both the passengers and drivers if ride sharing is possible. Otherwise, taking a cab or using their own vehicles is still an option.

3.6.5 Comparison of Negotiation Strategies

We compare the three privacy negotiation strategies presented in Section 3.3.2.2. Figure 3.13 presents the results for the information revealed by the passengers. After every round of negotiation, the selected passengers reveal a refined region to the negotiating drivers (smaller than or the same as in the last round). Privacy for a passenger is the minimum of the source and destination privacy achieved in every round of negotiation. In S1 (Figure 3.13(a)), the passenger first negotiates solely using the source and then uses the destination for negotiation. After the source negotiation the negotiating drivers know the exact start pick up point of the passenger; thereafter the passenger’s privacy is very low. As expected, the source first strategy (S1) exposes the most information. Strategy S2 negotiates with source and destinations in tandem while S3 negotiates the region with lower privacy first. We observe that S2 and S3 show similar performance, with S2 performing better than S3 in the first few rounds (Figure 3.13(d)). This is because the regions have similar average sensitivities and so a higher privacy region might also be larger. Although S3 achieves higher privacy in the negotiation rounds, the number of drivers negotiating with the larger passenger regions is also high, causing the cumulative exposure for S2 and S3 to be similar.

3.6.6 Effect of Ride Sharing Parameters on Negotiation Strategies

The negotiation strategy starts the ride sharing process where the drivers and passengers communicate to find the possible ride share candidates. The chosen candidates then go through the ride share process to find the exact candidates they want to share trips with. The ride sharing parameters DE size and D/P ratio impact the amount of information exposed by the drivers and passengers. We now evaluate the impact of these parameters and refer to Table 3.4 for the experimental settings.

[Figure 3.13 panels, plotted against Iterations: (a) Source First Strategy (S1); (b) Tandem Strategy (S2); (c) Best First Strategy (S3), each showing the series Exposure, # Drivers and Privacy; (d) Cumulative Exposure for S1, S2 and S3.]

Figure 3.13: Comparison of Negotiation Strategies

[Figure 3.14 panels, plotting Cumulative Exposure for S1, S2 and S3: (a) Effect of k; (b) Effect of DE Size; (c) Effect of D/P ratio.]

Figure 3.14: Effect of Ride Sharing Parameters on Negotiation Strategies

3.6.6.1 Effect of k (Number of Passengers Negotiating Per Driver)

The number of passengers and drivers who can communicate with each other controls the number of computations and the degree of privacy in the system. The experiments in Section 3.6.4.1 demonstrate that a value of k higher than 10 has negligible impact on occupancy. In these experiments we measure the increase in the cumulative exposure of the three strategies with the increase in k. As seen in Figure 3.14(a), the exposure increases by almost 60% with the increase in k from 10 to 40. This is a significant increase considering that the occupancy increase for the same increase in k is negligible. This confirms our assumption of setting the default value of k to 10, which controls the number of computations. The default value maintains occupancy rates and addresses privacy concerns for the drivers and passengers.

3.6.6.2 Effect of DE Size

The driver ellipse size is determined by the amount of detour each driver is willing to take to find possible candidate passengers. The drivers start negotiating with an ellipse larger than their actual detour size and keep on reducing it in every successive iteration until their actual size is reached and all passengers have revealed their precise location and time information. Although we bound the maximum number of passengers a driver can talk to by parameter k, a larger ellipse would mean that more drivers get a chance to find k such passengers. The third row of Table 3.4 shows the default experimental setup. We vary the DE size from 1.25 to 3 and study the impact on the cumulative exposure of the three strategies. As expected, the exposure increases with the increase of DE size (Figure 3.14(b)). Relative to each other, the three strategies show a similar behavior as discussed in Section 3.6.5.

3.6.6.3 Effect of D/P Ratio

The increase in driver to passenger ratio increases the number of opportunities presented to the drivers to find the candidate passengers for ride share. The first row of Table 3.4 shows the default experimental setup. We vary the D/P ratio from 4/1 to 1/5 and study the impact on the cumulative exposure of the three strategies. As expected, the exposure increases with the increase in D/P ratio as it is possible for the drivers to find more candidate passengers to share rides with. However, since the maximum number

of passengers a driver can communicate with is bounded, the exposure will not keep increasing with the increase in passengers because the maximum number of passengers a driver could find has been reached and further addition of passengers in the system does not affect the exposure. As seen in Figure 3.14(c), the exposure initially increases with the increase in the D/P ratio to 1/2 but then becomes constant for any further increase in passengers. Beyond this point the D/P ratio does not impact the exposure significantly, as k restricts the maximum number of passengers a driver could communicate with.

3.7 Related Work

In this section we discuss the literature relevant to our work. The ride sharing and location privacy literature has been discussed in Chapter 2.

3.7.1 Space Time Prism and Imprecision

The concept of time geography focusing on the spatial and temporal constraints of an individual’s activity to compute the spatial activity prism was first introduced by [112]. The prism defines the area the individual can occupy in space and time if the maximum velocity for moving, the start and destination points, any stops in between and the start and destination time are known. There has been a large body of work incorporating the time geographic concepts into a geographic information system (GIS) environment (e.g., [106, 113, 114]).

We use the 2-D space projection of the space time prism to compute the driver path area ellipse (Section 3.2.2), which is used to shortlist candidate passengers for every driver and compute the optimal path. The ellipse computation is part of the heuristics to recursively eliminate infeasible paths and reduce the search space. Every feasible route is inside this space but this does not imply that every possible route that we plan in this space is feasible.

Our negotiation based Match Maker model (Section 3.3.2.2) uses the concept of imprecision (not being precise about which pick up point the individual is at) and follows the idea of obfuscation [26] where a greater degree of imprecision implies a greater degree of privacy.

There is a large amount of research using and extending the concept of space time prisms for moving object trajectory databases [115–117]. These databases store trajectories of moving objects (vehicles or individuals) where the measured samples have a degree of uncertainty or missing data. Given two measured samples and a maximum speed, the space time prism for the object can be computed. [118] present a model for indexing and propose algorithms for processing spatio-temporal range queries for road network trajectories. In [119], the authors use the uncertain trajectory model for solving the spatio-temporal range queries where the location and time information is uncertain. The focus is on range queries which retrieve moving objects that were inside the given region during a time interval. They use the uncertainty model where the whereabouts in-between two known locations are bounded by beads, and the uncertain trajectory represented as a sequence of beads is called a necklace. They propose pruning strategies which represent beads by vertical cylinders and approximate them to circles to prune beads which certainly do not intersect with the given region. However, in our case both the passengers and drivers are static as they are negotiating potential ride share candidates before actually starting the trip. In every round of negotiation the query is to retrieve the list of passengers (who are not moving) whose start and destination pick up points are within the driver ellipse. In addition, the time information has not been negotiated yet and the search space is limited by the fixed pick up points within the passenger and driver regions. The pruning strategy only needs to consider the overlap of passenger pick up points within the given source and destination regions with the driver’s set of pick up points to find the potential ride share candidates. Therefore, the sophisticated algorithms for spatio-temporal range queries for moving objects presented in [118, 119] are not required to deal with our problem.

[115] extend the concept of space time prisms for road networks. Unlike the unconstrained movement in the 2D plane, the movement is constrained to travel on a road network. They present an algorithm to compute and visualize space-time prisms for road networks and present an alibi query to determine if individuals in two trajectories have possibly traveled together. However, this computation requires application of the Dijkstra algorithm twice on the reduced graph obtained by selecting the edges inside the driver ellipse. For n road edges this would take $O(n^2)$ time. Implementing this solution would help reduce the choices of passengers by limiting the next choices of passengers to those on the road network link, but the added complexity would make it the bottleneck for our algorithm. Further, since the pick up and drop points of the chosen candidates lie on the road network and Algorithm 3.1 uses the reduced network created by the points chosen by k candidates, the road network prism computation does not simplify our existing algorithm.
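To make the cost of the road network prism concrete, the sketch below shows one way the two Dijkstra passes could be organized: one from the driver's source and one towards the destination on the reversed graph, keeping the nodes whose combined distance fits the budget. This is our own illustration, not the algorithm of [115]; the graph representation (a dictionary mapping each node to a list of `(neighbor, weight)` pairs) and all function names are assumptions, and the binary heap used here is an implementation choice.

```python
import heapq


def dijkstra(graph, start):
    """Shortest distances from `start`; graph maps node -> [(nbr, weight)]."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def reverse(graph):
    """Graph with every edge direction flipped."""
    rev = {u: [] for u in graph}
    for u, edges in graph.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev


def network_prism(graph, source, dest, budget):
    """Nodes that can be visited on some source-to-dest route within budget."""
    from_source = dijkstra(graph, source)      # first Dijkstra pass
    to_dest = dijkstra(reverse(graph), dest)   # second pass, reversed edges
    return {v for v in from_source
            if v in to_dest and from_source[v] + to_dest[v] <= budget}


# Toy network: a short route via "a" and a long route via "b".
roads = {"s": [("a", 1.0), ("b", 5.0)], "a": [("t", 1.0)],
         "b": [("t", 5.0)], "t": []}
print(network_prism(roads, "s", "t", budget=3.0))
```

Running such a computation for every driver in every negotiation round is the overhead referred to above.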

3.7.2 Optimal Path Computation

There is a large amount of research done in the area of route planning in the presence of dynamic traffic updates about congestion and travel speeds on roads ([120–124]).

The work investigating time dependent shortest path computation [122] finds the best departure time such that the total travel time to the destination is minimized, where the traffic conditions change from time to time. A time dependent graph is created with varying edge weights depending on the start time-interval. The authors use a Dijkstra based algorithm where they decouple the path selection and time refinement. In [123], the authors solve the time dependent shortest path query based on a bidirectional time-dependent A∗ search which improves the computation time and space requirements. To compute the time dependent shortest paths, time dependent travel times are computed as an edge cost function for every edge using historical traffic data along with real-time traffic information. However, the aim of ride sharing is to fundamentally change traffic patterns, leading to reduced congestion and travel times. Therefore, historic data is of less importance for ride sharing. However, once the ride sharing system has been running for a period of time, the historical data generated using ride sharing can be incorporated for the time dependent path computation. In addition, every driver in ride sharing has a start time window and a time budget using which he or she finds candidate passengers to share rides. Once a path has been selected by a driver, a change in start time could make the current passenger pick up infeasible.

In [125], the authors present a constrained shortest path computation algorithm for the k-stops shortest path. A k-stops query retrieves exactly k intermediate stops in the shortest path. The authors explored a similar smallest ellipse based heuristic according to which the next point in the path is chosen as the one that minimizes the sum of distances between any pair of consecutive points in the path. In [124], the authors present an incremental route planning algorithm in the presence of frequent traffic updates. To avoid excessive recalculation due to the traffic updates, the path is divided by selecting k intermediate destinations on the way to the final destination. The congested regions are computed and their impact propagated to neighboring areas to reduce re-computations. However, unlike this dynamic route planning, the optimal route selection is a static problem in ride sharing where the driver needs to pre-compute a route and negotiate with the passengers to find shared rides. If our approach included dynamic path computation for the selected shared ride, the computation would become much more challenging as the path has to contain pairs of stops which need to be covered and every stop can have multiple pairs (different passengers starting from this point can be traveling to a different destination stop). Committing to one passenger involves committing to go through another stop-over. In addition, the time constraints of the passengers and drivers also need to be considered.

3.7.3 GPS Traces

There is a substantial amount of literature studying the GPS traces, mostly available from taxis, to study people’s movement and activities. In [126], the authors present a survey focusing on the work done using GPS trace analysis for studying the social behavior of a city, operational dynamics (learn and detect abnormal behaviors) and traffic dynamics including traffic flow, real time traffic indicators and travel time estimates. In [127], the authors discuss the different types of trace data available from mobile devices, vehicles’ GPS, floating sensors and smart cards in smart cities. They present the research issues including mobility prediction, human behavior and privacy disclosure and protection during collection, utilization and publication of the trace data. However, analysis of such highly precise spatio-temporal GPS trace data of individuals collected over an interval of time can lead to the inference of highly detailed information about a person’s beliefs, friends, preferences, religion, health etc. without their knowledge or permission [21, 22]. In our privacy based model the TTP does not collect any GPS location information of the ride sharing vehicles, and matches the drivers and passengers based on the hash of their obfuscated location information.

3.7.4 Game Theoretic Approaches

Game theoretic approaches have been used in traffic assignment where the network users are considered selfish agents pursuing their own interests. In network traffic routing the goal is to optimize the performance of a congested network by routing traffic such that the sum of all travel times is minimized. In [128], the authors consider a road network that is unregulated and whose users behave in a selfish manner. They are viewed as independent agents participating in a non-cooperative manner. The authors quantify the inefficiency in such a system and show that if the agents route their traffic selfishly then the network would show a poor performance. They show that the price of anarchy is bounded, such that the routes at Nash equilibrium have total latency bounded by 4/3 of the minimum latency. Further, the game theoretic framework has also been applied to parking slot assignment problems where vehicles (distributed selfish agents) are competing for parking slots [129]. In our work the drivers and passengers are selfish agents who want to select the most efficient way to reach their destination within their choices for cost, time and other individual preferences. We will look at congestion based traffic routing as part of future work to balance the network load once the ride sharing negotiations are done.

3.8 Summary

In this chapter, we presented a dynamic ride sharing model built with the objective of privacy. The proposed Match Maker model was compared to the traditional eBay model in terms of privacy and efficiency. We described an attack model to evaluate the vulnerabilities of the eBay model and our proposed Match Maker model, which proved to be stronger in terms of privacy. We also developed a rating model to improve reliability and to incorporate trust into the system. The experimental results on the Melbourne road network, with user data generated using VISTA and ABS data-sets, demonstrated the effectiveness and the reduced privacy exposure of the privacy aware dynamic ride sharing models. Our proposed ride sharing model saves between 9% and 21% (on average 12%) of vehicle km if drivers are only prepared to accept slight detours from their usual trips. In the city of Melbourne, with 11.6 million trips a weekday and an average trip length of 10.2 km, this would save 14.2 million km per weekday.
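The weekday saving quoted above can be reproduced with a back-of-the-envelope calculation, assuming that the 12% average saving applies uniformly to all weekday trips (a simplification made here only for this check):

```latex
\[
\underbrace{11.6 \times 10^{6}}_{\text{trips per weekday}}
\times \underbrace{10.2\ \text{km}}_{\text{average trip length}}
\times \underbrace{0.12}_{\text{average saving}}
= 14.1984 \times 10^{6}\ \text{km}
\approx 14.2\ \text{million km per weekday}
\]
```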


Chapter 4

Optimal Pick up Point Selection for Effective Ride Sharing

In Chapter 3, we presented the negotiation based Match Maker model for dynamic ride sharing. In this ride sharing model, it was assumed that the pick up points (PuPs) are selected randomly from every suburb based on the population density. To make the system more effective, we present a scheme to choose the optimal number and locations of Pick up Points (PuPs) that aims to maximize car occupancy rates, minimize travel deviations for drivers (so that passengers can be picked up collectively) and increase k-anonymity (guaranteed or optimal) for passengers.

4.1 Overview

Designing a ride sharing system has to cope with two main challenges: maximize car occupancy rates while ensuring privacy and safety of users. This leads to two fundamental questions about the pick up/drop off points: how many points and where are they located? If the number of these points is decreased then privacy increases but it is likely to increase trip time and reduce occupancy rates. On the other hand, if the number of points is increased then privacy decreases (in particular if we assume that individuals choose their nearest point). There can be two strategies for PuP selection: completely ad hoc and dynamic, or fixed. For example, a cab based system operates under the assumption that users can find a cab at any time and at any required location. However,

Table 4.1: Comparison of Travel Modes

Privacy & Safety
• Personal Transport: Private vehicle; safe (no tracking).
• Cabs: Trusted central authority; GPS tracking, CCTV monitoring; registered drivers.
• Public Transport: Trusted central authority; public stops, so no information about rider trip start/destination; no tracking (unless the payment system tracks user information).
• Uber (Ride Share Service): Trip can be tracked to reveal origin or destination; initial background and driving record check for drivers; not responsible for vehicles or drivers; privacy issues with user trip information collection; rating based driver feedback.
• Our model (Ride sharing with PuPs): Trip tracking only reveals origin/destination PuPs; passengers are k-anonymous.

Cost
• Personal Transport: High (fuel, vehicle maintenance); toll; parking.
• Cabs: Fixed per km cost but high; additional booking cost or toll charges; different peak and off-peak rates.
• Public Transport: Fixed shared cost model (lowest cost).
• Uber (Ride Share Service): Variable pricing (traffic, weather, location, type of vehicle); low cost (typically 40% of taxi).
• Our model (Ride sharing with PuPs): Increased riders → lower cost (fewer PuPs increase opportunities for ride sharing); comparable to Uber but changes if service is not run by commercial drivers.

Convenience
• Personal Transport: Point to point; anytime; parking required.
• Cabs: Point to point; difficult to get in very high and low demand times.
• Public Transport: PuP to PuP; fixed schedule; less frequent in non peak hours; lowest.
• Uber (Ride Share Service): Point to point; difficult to get in very high and low demand times; providers can increase or decrease as per demand.
• Our model (Ride sharing with PuPs): PuP to PuP; fewer PuPs in sparse regions (more walking).

Reliability
• Personal Transport: Always available.
• Cabs: Fixed number of cabs; varies for weekend/weekday and time of day; availability depends on location/events.
• Public Transport: High (fixed timetable); low frequency in off-peak hours; may not arrive on time.
• Uber (Ride Share Service): No fixed number; booked on demand (low demand, higher price incentive); varies for weekend/weekday and time of day; rating based.
• Our model (Ride sharing with PuPs): Increased riders (fewer PuPs, smaller driver detours, PuPs closer to public transport hubs (Section 4.3.5)); depends on uptake.

Efficiency (Benefits to society)
• Personal Transport: Usually low occupancy; adds to congestion.
• Cabs: Wasted empty trips at low demand; longer waiting times at high demand.
• Public Transport: Wasted trips with low occupancy; lowest congestion.
• Uber (Ride Share Service): Shared rides (carpooling); decreases congestion.
• Our model (Ride sharing with PuPs): Shared rides (carpooling); decreases congestion.


this strategy can often lead to passengers not finding a ride because a match for a cab and a rider is not available. An alternative is fixed pick up locations such as taxi stands where the chances of finding a waiting cab are much higher. We adopt a similar strategy and present a system which has fixed predefined locations of pick up points (which can be changed on demand or for load balancing) and aims to maximize the car occupancy rates. In this work we address the problem of finding the optimal number and placement of Pick up Points (PuPs) such that it balances optimal coverage and occupancy rates while preserving user privacy.

We address an important area of urban computing by providing an optimal pick up point selection model for ride sharing that tackles urban traffic congestion by utilizing the existing resources while ensuring user privacy and security. In addition, the selected PuP placements can help urban planners to evaluate the impact of choosing different PuPs during different traffic conditions and plan new PuPs with changing urban population densities.

Privacy is a significant concern in today’s society and this is evidenced by the observation that people using cabs sometimes choose to start/stop a few blocks away from their actual source or destination for safety concerns. This gives them some level of location privacy (obfuscation by hiding their real location) but they can still be tracked as their real source/destination is generally within walking range of the point they chose. This is a key motivation for our work. With our proposed fixed PuPs model, users not only have enhanced privacy (since they are k-anonymous) and safety (monitored PuPs), but their chances of ride sharing also increase as more drivers are likely to visit the PuPs with high passenger availability.

In Table 4.1, we present a comparison of the existing transportation modes such as private vehicles, cabs and public transport, as well as ride sourcing applications such as Uber and our proposed model for ride sharing. We compare them on the basis of privacy, safety, efficiency, convenience, cost and reliability. Use of private vehicles offers a convenient, reliable and safe trip but has additional costs of vehicle maintenance, parking inconvenience and a tremendous impact on congestion. Ride sharing offers a reliable mode of transport (with drivers increasing/decreasing according to demand unlike cabs) which can reduce congestion but has significant privacy and safety concerns. On the other hand, public transport provides a private, safe and cost effective mode of transport but lags in convenience and efficiency. The model we propose is similar to public transport in terms of privacy (with PuP to PuP transfers) while being flexible with respect to places, time and ride sharing in tandem with lower cost. This model can be integrated with existing ride sourcing models such as Uber and help to attract more drivers and passengers.

The main idea of a Pick up Point (PuP) is that every PuP defines a circular catchment area centered at the PuP. The radius of the circle is the maximum distance any individual within the catchment area needs to travel to reach the PuP. A point on the map is 1-covered if it is inside the catchment area of at least one PuP. The aim is to ensure that every point of a city’s area is covered by at least one PuP while minimizing the total number of PuPs. 1-coverage is well studied in geometry and mathematics as the optimal way to cover an area with the minimum number of circles of a given radius [130]. However, the geometric version assumes that all circles have the same radius and does not distinguish between the different population densities of the areas being covered. In practice, a PuP can only accommodate a certain number of people, based on the population density of the area it covers. For a road network, we propose circles of different sizes, where the radius is determined by the population density of the region covered. Another challenge is the limitation imposed by the road network: PuPs can only be placed at certain nodes, and not every point on the map is a candidate location.

If there are only two PuPs on the map, then everybody reports to one of them to start and the other one is the destination. This would mean very high privacy, as nothing is learnt about an individual’s start and destination locations, but almost no utility in terms of ride sharing, as very few people might be able to take such trips. On the other hand, if every road network intersection is a possible PuP, then there is little privacy, as individuals’ exact trip locations are known, but it might be very convenient for passengers, as they can find a start or destination PuP very close to their actual locations. However, since the passengers are then scattered throughout the network, few drivers might be able to pick them up, as longer detours would be needed, leading to a lower utility. We aim to find the minimum number of PuPs such that individual privacy is preserved while ride sharing is achieved. Additionally, if PuPs are fixed such that they remain the same at all times, they might add to congestion and become traffic bottlenecks, so we need to balance traffic by changing PuPs dynamically over the day.

The actual road network can be partitioned into sections (e.g. suburbs) with different population densities. An area with a higher population density might need more PuPs to cover it. Every PuP covers a circular region such that people in that circle are most likely to use this PuP as the start or destination point of their shared ride. Our aim is to find the minimum number of these points such that the whole map is at least 1-covered while still achieving the maximum possible coverage for the dense, highly populated regions. 1-coverage implies that every point in the map has at least one PuP which is reachable from it within a given distance (time). Every area of the map has a walkable distance which a resident might have to travel, depending on the population density of the region. Even for the densest regions we do not let the circle radius fall below 1 km, as smaller circles increase the chances of locating people and reduce their location privacy. As expected, people in outer suburbs with lower population density might need to travel further to reach a PuP; a person living in a distant suburb might need to travel 5 km to reach a PuP for effective ride sharing. As shown in Figure 4.1, the walkable distance increases as we move from the city center towards the less populated outer suburbs. This is expected for an effective ride sharing system; otherwise the car occupancy rates are likely to be low.

Figure 4.1: Maximum PuP Distance (km)

The need for anonymity motivates our coverage model, where an individual is k-anonymous if at least (k−1) other people could also choose the same PuP. Since we use different circle radii for suburbs based on their population density, every PuP covers roughly the same number of people. Therefore, the anonymity achieved, measured by the number of individuals covered, is similar across PuPs. The opportunities for ride sharing now depend on the number of PuPs available to a user within her walkable distance and the number of drivers likely to travel to these selected PuPs.

Figure 4.2: User Location Estimation based on PuP Selection

Our problem is a multi objective problem where we aim to maximize the coverage and privacy provided by the system to its users while facilitating ride sharing. A larger number of PuPs would increase the coverage provided to the users but reduce privacy if the reachable PuP information for every user is available. If an adversary is aware of the possible choice of PuPs a user has considered, then the intersection of the catchment areas of these PuPs leads to an estimate of the individual’s location on the map. For example, as shown in Figure 4.2, if a user considers all 4 PuPs then the user’s location can be refined to area A. We assume that drivers have constraints on the time by which they want to reach their intended destination, which defines their travel budget. Within this travel budget they look for possible passengers to share a ride on the trip. A solution with more PuPs gives a passenger more options to choose from but might reduce the ride sharing possibilities, as the driver might need to travel further to pick up passengers, thereby exhausting the driver’s travel budget. Similarly, too few PuPs might also hinder ride sharing, either by requiring drivers and passengers to travel longer distances to reach a PuP or by making it impossible to share a ride within a given time budget. Our aim is to find an optimal number and placement of PuPs that maximizes coverage while facilitating ride sharing with the minimum possible number of PuPs selected.

GRASP (greedy randomized adaptive search procedure) is a multistart metaheuristic technique where in each iteration a feasible solution is constructed and then its neighborhood is investigated in a second phase, called local search, to find a better solution [131]. The best overall solution is saved as the result after every iteration. GRASP has been successfully used for combinatorial optimization problems. GRASP provides us with an optimization framework, and we have tailored it to develop appropriate procedures to find Pareto front solutions of optimal PuP placements. We have implemented an optimal coverage solution based on suburb population density, which does not guarantee privacy. In the second solution we guarantee k-anonymity and show its cost in terms of coverage. We compare the impact of both solutions on ride sharing.

In this chapter we make the following contributions:

• We propose a partial coverage model with different circle radii based on suburb densities.

• We develop models enabling privacy (k-anonymity) with optimal coverage. We propose a Voronoi diagram based guaranteed k-anonymity solution and compare it with the optimal coverage solution for effective ride sharing.

• We conduct extensive experiments to validate the effectiveness and efficiency of our proposed approach.

To the best of our knowledge, we are the first to present a model with fixed predefined locations for ride sharing which guarantees anonymity while ensuring higher chances of ride sharing.

The remainder of this chapter is organized as follows: Section 4.2 describes the fundamental concepts of our system. In Section 4.3 we present the system architecture and describe and compare two models for dynamic ride sharing. We present the experimental results in Section 4.4. The related work is described in Section 4.5. We conclude in Section 4.6.


In Section 3.2, we presented the fundamental concepts for ride sharing. In this section, we extend the terminology (Table 4.2), basic concepts and system goals for optimal PuP selection.

4.2.1 Terminology

A road network is a directed graph G = (N, E), where N = (n1, n2, ..., nm) is the set of nodes and E = (e1, e2, ..., en) is the set of connecting directed edges. The set of nodes in the road network is divided into disjoint partitions (Pi), each representing a region of the road network. Every partition has a known area and population density (pd), which is the number of people per unit of area.

\[ P_i = (N'_i, E'_i) : N'_i \subset N;\; E'_i \subset E \]

Table 4.2: Terminology

Notation    Definition
Pi          Partition/Region of the map
Pi.pd       Population density of partition Pi
Pi.MPD      Maximum PuP Distance for Pi
Pi.dr       Normalized population density of partition Pi
PuPk.r      Radius for PuP PuPk
PuPk.c      Center of PuP PuPk

In contrast to a typical road network, we reduce the graph by collapsing all degree-2 nodes while keeping the original weights (distance/time) between the remaining nodes. Many nodes are created for a single road link to maintain the geometry of the underlying road network. These nodes only reflect the curvature of roads and do not contribute any additional information for network analysis purposes. We collapse these nodes to reduce the network size while preserving the original distance and travel time measurements. For example, a long roundabout has many segments and many nodes to represent its geometry, but in reality it is a 2n node bi-directional network (if n roads are connected to the roundabout). In the reduced network, the nodes represent the intersections of the real road network and the edges are the connecting road segments associated with the corresponding pairs of nodes.

A Pick up Point PuP is a fixed point, c, on the road network where drivers can stop and pick up passengers. In our work, PuPs are road network intersections, so every PuP is a node in the graph. Every PuP has an associated radius, r, which defines the circular catchment area centered at c. The set of possible PuPs is a subset of all nodes in the road network ($\bigcup_{i=1}^{n} PuP_i \subset N$).



Figure 4.3: k-Anonymity for PuP (colored in red): number of individuals in its catchment area. The suburbs meeting at the PuP are labeled A, B, C and D.

4.2.2 Concepts

4.2.2.1 Anonymity

An individual is k-anonymous if at least (k−1) other people could choose the same PuP for ride sharing. We use the concept of spatial k-anonymity [44], where the exact user location is replaced by an anonymized spatial region that contains at least k−1 other users. Therefore, an attacker can only pinpoint the user location with probability 1/k. Every suburb in the map has a known population density. Using circles of a fixed radius would imply that every person needs to travel at most the same distance to reach a PuP, but the number of individuals within the catchment area of a PuP would then differ with the population density of the suburbs, resulting in different k-anonymity levels across suburbs. For sparser regions this could mean that k is very small or, in the worst case, equal to one. The privacy of individuals in sparsely populated regions would therefore be at risk, as they could be easily identified by their selection of PuP. Besides privacy, the motivation for achieving similar levels of k-anonymity across the map is to enable effective ride sharing. Having more PuPs in the sparser regions would mean that few people are likely to travel to each of these locations, and therefore the chance of a passenger being picked up within her time budget is very low. In short, a uniform circle radius would ensure that everyone travels a similar distance to reach a PuP, but it would not help ride sharing and would compromise user privacy.

To achieve similar k-anonymity for every individual, we calculate the circle radius according to the population density of the region. Hence, individuals in sparse regions might need to travel further to reach a PuP than those in denser regions (Figure 4.1). Given the radius of a PuP and the resident population density of the regions it covers, we can compute the k-anonymity achieved by the PuP from the area of the covered regions. In Figure 4.3, the PuP is placed at the intersection of the suburbs A, B, C and D, with radii of 1, 2, 2 and 3 km respectively. The PuP covers 8% of each of the suburbs A, B and C, and 10% of suburb D. If the resident population of each suburb is 10000, then the k-anonymity for the PuP is 3400 ((0.08 + 0.08 + 0.08 + 0.1) × 10000).
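The arithmetic of this example can be reproduced with a minimal sketch. It assumes that residents are spread uniformly within each suburb, so that the covered area fraction equals the covered population fraction; the function name `pup_k_anonymity` and the dictionary layout are illustrative choices and not part of the thesis.

```python
def pup_k_anonymity(covered_fraction, population):
    """Residents inside a PuP's catchment area, summed over the suburbs it touches.

    Assumes a uniform population distribution inside each suburb (an assumption
    made for this sketch), so area fraction == population fraction.
    """
    return sum(covered_fraction[s] * population[s] for s in covered_fraction)


# Values taken from the Figure 4.3 example in the text.
covered_fraction = {"A": 0.08, "B": 0.08, "C": 0.08, "D": 0.10}
population = {suburb: 10_000 for suburb in covered_fraction}

k = pup_k_anonymity(covered_fraction, population)
print(round(k))
```

With real data the fractions would come from intersecting the catchment circle with each suburb polygon.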

4.2.2.2 Maximum Pick up Point Distance (MPD)

The Maximum PuP Distance, MPD, for a partition Pi is the maximum distance an individual in that partition needs to travel to reach a PuP. It defines the circle radius for the partition required to achieve a given k-anonymity. We compute the minimum and maximum population density from the suburb resident population density data and normalize the density of each partition, Pi.dr, to the range [a, b]. We set b to 1 and a to the ratio of the minimum to the maximum population density.

\[ x = \frac{P_i.pd - \min\{P_j.pd \mid j \in P\}}{\max\{P_j.pd \mid j \in P\} - \min\{P_j.pd \mid j \in P\}} \tag{4.1} \]

\[ P_i.dr = (b - a)\,x + a \tag{4.2} \]

In our experiments the maximum population density is 7892 residents per km² and the minimum is 161 residents per km²; therefore Pi.dr is in the range [0.02, 1]. The MPD of a partition is computed as the logarithm of the inverse of its normalized population density. Since the denser suburbs have Pi.dr close to 1, their computed MPD would be at most a few hundred meters. A very small MPD would compromise user privacy, as the PuP selection would confine the user location to a very small area. We therefore fix a minimum radius (MR), which is the minimum MPD for any suburb, such that the computed MPD for any suburb is not less than MR.

\[ P_i.MPD = \max(MR, -\log(P_i.dr)) \tag{4.3} \]

In our studies, the minimum radius (MR) is chosen as 1 km, which usually takes an average person 15 minutes on foot.
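The normalization and the MPD rule can be sketched in a few lines. This is a hedged illustration, not the thesis implementation: it assumes that the sparsest suburb maps to a = min/max and the densest to b = 1 (which matches the reported range [0.02, 1]), and it assumes a natural logarithm, since the text does not state the base. The function names are hypothetical.

```python
import math


def normalized_density(pd, pd_min, pd_max, b=1.0):
    """Map a population density into [a, b], with a = pd_min / pd_max (assumed)."""
    a = pd_min / pd_max
    x = (pd - pd_min) / (pd_max - pd_min)   # Equation 4.1
    return a + (b - a) * x                  # Equation 4.2, sparsest -> a, densest -> b


def mpd_km(pd, pd_min, pd_max, min_radius_km=1.0):
    """Maximum PuP Distance, never smaller than the minimum radius MR (Equation 4.3).

    The natural logarithm is an assumption of this sketch.
    """
    dr = normalized_density(pd, pd_min, pd_max)
    return max(min_radius_km, -math.log(dr))


PD_MIN, PD_MAX = 161.0, 7892.0   # residents per km^2, from the text
for pd in (PD_MIN, 1000.0, PD_MAX):
    print(pd, round(normalized_density(pd, PD_MIN, PD_MAX), 3),
          round(mpd_km(pd, PD_MIN, PD_MAX), 2))
```

Under these assumptions the densest suburb is clamped to MR, while the sparsest suburb receives the largest radius.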



Figure 4.4: Equilateral Triangle Network with Circle at Every Vertex. X = R√3

4.2.2.3 Coverage

It is a known result that the best way to cover a region with circles of a given radius R (such that every point in the region is covered by at least one circle) is to place the centers of the circles on the vertices of an equilateral triangle network covering the entire region, such that every side X of a triangle measures X = R√3. This leads to the minimum number of circles with minimum overlap between the circles [130]. This equilateral triangular arrangement achieves 1-coverage for the region (Figure 4.4).
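The side length X = R√3 can be checked with elementary geometry. The short derivation below is a sketch under the stated setting (equal circles of radius R centered on the vertices of an equilateral triangle of side X); it is added for illustration and is not taken from [130].

```latex
% The point of a triangular cell that is farthest from all three vertices is the
% centroid, which lies at two thirds of the height from each vertex.
\[
  d_{\text{centroid}} \;=\; \frac{2}{3}\cdot\frac{\sqrt{3}}{2}\,X \;=\; \frac{X}{\sqrt{3}}
\]
% The cell is just covered when this distance equals the circle radius R.
\[
  \frac{X}{\sqrt{3}} = R \quad\Longrightarrow\quad X = R\sqrt{3}
\]
```

Any larger spacing would leave the centroid of each triangular cell uncovered.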

Since every partition of the map has a different population density, it needs a circle of a different radius, which implies that individuals in different partitions might need to travel different distances to reach a pick up point.

A point, pi, on the map is 1-covered if it is inside the catchment area of at least one PuP, PuPk, within the maximum pick up point distance MPD defined for the particular partition, Pj, of the map that the point lies in (pi ∈ Pj).

\[ \exists k \in PuP : dist(p_i, PuP_k.c) \le P_j.MPD \]

1-Coverage is achieved for the region when every point on the map is covered by at least one PuP. Coverage for the map is defined as the average of the number of PuPs covering every partition of the map.

4.2.2.4 Partial Coverage

It might not be feasible for a map to be 1-covered due to the limited choice of available PuPs in the road network. An individual might be required to walk further than the MPD to reach the nearest PuP. The partial coverage is computed for all points in the network which are not 1-covered. A point, pi, on the map is partially or x-covered (x ∈ [0,1]) if the distance to the nearest PuP, NstPuP(pi), is more than the maximum pick up point distance MPD defined for the particular partition, Pj, of the map that the point lies in (pi ∈ Pj). The coverage of the suburb is the mean of the partial coverage of all the points in the suburb. The partial coverage of a point pi is computed as:

\[
p_i.x =
\begin{cases}
\dfrac{P_j.MPD}{dist(p_i, NstPuP(p_i))}, & \text{if } dist(p_i, NstPuP(p_i)) > P_j.MPD \\[2ex]
1, & \text{otherwise.}
\end{cases}
\]

Figure 4.5: Walkability: All three PuPs are within the maximum pick up point distance for passenger pi. The walkability is the distance to the nearest PuP, which is the distance from pi.to to PuP3.c. However, during the ride sharing process the passenger might be able to negotiate a ride successfully with a driver traveling to PuP1, changing the actual travel distance.
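The partial coverage rule can be sketched as follows. For simplicity the sketch uses straight-line (Euclidean) distances in km on a plane, whereas the model is defined on road network distances; the function names and the sample coordinates are hypothetical.

```python
import math


def partial_coverage(point, pups, mpd_km):
    """x-coverage of a point: 1 if the nearest PuP is within the MPD, else MPD / distance.

    Euclidean distance is an assumption of this sketch; the thesis works on a road network.
    """
    nearest = min(math.dist(point, pup) for pup in pups)
    return 1.0 if nearest <= mpd_km else mpd_km / nearest


def suburb_coverage(points, pups, mpd_km):
    """Mean partial coverage over all points of a suburb."""
    return sum(partial_coverage(p, pups, mpd_km) for p in points) / len(points)


pups = [(0.0, 0.0), (4.0, 0.0)]                    # hypothetical PuP locations (km)
points = [(0.5, 0.0), (2.0, 0.0), (4.0, 3.0)]      # hypothetical resident locations (km)
print([partial_coverage(p, pups, 1.0) for p in points])
print(suburb_coverage(points, pups, 1.0))
```

A point twice as far away as the MPD contributes 0.5, so the suburb coverage degrades gradually rather than dropping to zero.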

4.2.2.5 Walkability

The walkability for an individual passenger (pi.w) is the travel distance to reach the nearest PuP.

\[ p_i.w = \min(dist(p_i, NstPuP(p_i))) \tag{4.4} \]

where dist(pi, NstPuP(pi)) ≤ Pj.MPD.

However, all the PuPs within the maximum pick up point distance (Pj.MPD) are potential candidates for ride sharing for an individual. The actual travel distance for a person is therefore the distance to the PuP which enables a ride (Figure 4.5).

Figure 4.6: Voronoi Cells: For the 10 selected PuPs, the k-Anonymity of a PuP is the number of individuals in the Voronoi cell of the PuP.

4.2.3 Location Privacy

The privacy of a user is the inability of an adversary to refine the user’s location. In our model we limit the minimum MPD to 1 km so that, even if the suburb is very dense, an adversary should not be able to narrow down the individual’s location to within less than 1 km. A large number of available PuPs increases the choice for the user but affects privacy negatively (if a user always chooses the nearest PuP). If an adversary knows the possible choice of PuPs a user has considered, then the intersection of the catchment areas of these PuPs leads to an estimate of the individual’s location on the map (Figure 4.2). Also, if every individual travels to the nearest PuP for ride sharing, then the effective k-anonymity offered by a PuP reduces to the number of individuals covered by the Voronoi cell of the PuP (Figure 4.6).

4.2.3.1 Impact on Location Privacy

In a ride sharing system an attacker could be:

• A malicious driver pretending to offer a ride to gain access to the travel information of a passenger.
• A malicious passenger or a group of malicious passengers trying to learn a driver route.
• The third party facilitating the ride sharing.
• A third party with access to ride share data published offline.

The location information of a passenger exchanged with a potential driver is the start and destination PuP. Privacy in our system depends on how much the location information of the user can be refined. The k-anonymity achieved by a passenger also depends on the PuP selection strategy. There are different ways in which passengers can choose a PuP.

• Nearest within MPD: Every passenger selects the nearest PuP within the MPD for both the source and destination. The passengers are only willing to walk within the MPD; if there is no PuP within the MPD they take their own vehicle and no ride sharing takes place.
• Nearest: Every passenger selects the nearest PuP for source and destination, even if that means walking further than the MPD.
• Any 2 closest within MPD: Every passenger selects randomly from the 2 closest PuPs within the MPD. If there is only 1 PuP within the MPD then this is the same as the nearest within MPD strategy; otherwise the passenger randomly selects one of the 2 closest PuPs.
• Any 2 closest: Every passenger selects randomly from the 2 closest PuPs to the trip origin, even if that means walking further than the MPD.
• Maximum trip offering: Every passenger selects a PuP (within the MPD) which maximizes the chances of ride sharing. The PuPs which have the maximum number of drivers traveling in the passenger’s direction of travel are selected as source and destination PuPs.
• Random: Every passenger selects a random PuP within the MPD for both the source and destination.
• All: Passengers negotiate with all the possible candidate PuPs within the MPD.

If a PuP is selected within the MPD (nearest, max trip offering, random or any 2 closest within MPD), then the potential area the user is coming from is the catchment area of the selected PuP. If an adversary is aware of the selected PuP only, then the k-anonymity of the PuP is the k-anonymity of the passenger (Figure 4.3). However, if the adversary has additional knowledge, such as the PuP selection strategy and the PuP selections, then the passenger location information can be further refined. If the selection strategy is nearest within MPD, then the passenger location area can be refined to the intersection of the Voronoi cell and the catchment area of the selected PuP. If the additional knowledge is the set of PuPs considered by the passenger, then the intersection of the catchment areas of the considered PuPs can be used to refine the potential area of the passenger (Figure 4.2). In the case of the all strategy, where the passenger negotiates with all the possible PuPs, an adversary can gain knowledge of all the PuPs used for negotiation and use that information to refine the location information of the passenger (Figure 4.2). When the PuP selection is not limited to the MPD, the privacy becomes higher for the nearest and any 2 closest strategies. The k-anonymity of passengers using the nearest strategy is the number of individuals in the Voronoi cell of the PuP. The k-anonymity for the any 2 closest strategy, when only the selected PuP is known, is the number of individuals in the combined area of the Voronoi cell of the selected PuP and all its one hop neighbors. However, if the 2 PuPs considered are known, then the k-anonymity is the number of individuals in the combined area of the Voronoi cells of the two PuPs.
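The difference between catchment-area anonymity and Voronoi-cell anonymity can be illustrated with a small sketch. It assumes Euclidean distances, a single shared catchment radius and a handful of hypothetical resident positions; none of the names or numbers come from the thesis experiments.

```python
import math
from collections import Counter


def nearest_pup(person, pups):
    """Index of the PuP closest to a person (ties go to the lower index)."""
    return min(range(len(pups)), key=lambda i: math.dist(person, pups[i]))


def catchment_k_anonymity(people, pups, radius_km):
    """k per PuP when the adversary only knows the selected PuP."""
    return [sum(math.dist(p, c) <= radius_km for p in people) for c in pups]


def voronoi_k_anonymity(people, pups):
    """k per PuP when the adversary also knows everyone walks to the nearest PuP."""
    counts = Counter(nearest_pup(p, pups) for p in people)
    return [counts.get(i, 0) for i in range(len(pups))]


pups = [(0.0, 0.0), (2.0, 0.0)]                                         # hypothetical
people = [(0.0, 0.0), (0.4, 0.0), (0.9, 0.0), (1.2, 0.0), (2.0, 0.0)]   # hypothetical

print(catchment_k_anonymity(people, pups, radius_km=1.5))
print(voronoi_k_anonymity(people, pups))
```

Because the two catchment circles overlap, each person in the overlap counts towards both PuPs, whereas the Voronoi cells partition the residents; knowledge of the nearest strategy therefore lowers the effective k.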

4.2.4 Goals

Our goal is to find an optimal number and placement of PuPs such that it maximizes coverage while facilitating ride sharing with maximum privacy for users.

• Maximize coverage and k-anonymity: The same number of PuPs can lead to different coverage and anonymity solutions. We select solutions which maximize coverage for the same number of PuPs.
• Maximize coverage for guaranteed k-anonymity: A large number of PuPs would increase the coverage of the map but reduce user privacy (if all the possible candidate PuPs for the passenger are known or every passenger selects the nearest PuP). We guarantee a minimum k-anonymity for every selected PuP and maximize coverage by placing the maximum number of PuPs while guaranteeing this minimum k-anonymity.
• Maximize car occupancy: Different PuP placements can have a different impact on ride sharing. A solution which takes the traffic flow into account by grouping people with similar destinations together is likely to yield a better ride share solution, thus improving car occupancy rates. We take this into account by incorporating the trip information, in the form of an origin destination matrix, into the solution.

A Pareto front or Pareto solution set is the subset of all feasible solutions of the multi objective problem in which no solution is dominated by any other feasible solution: every solution on the front is better than (or at least as good as) every other feasible solution in at least one objective. The solution set is therefore a Pareto front with optimal coverage for different numbers of PuPs. The Pareto front solutions are compared for their impact on ride sharing.

Algorithm 4.1 GRASP
input : maxIterations, seed, α
output: bestSolution

1  bestSolution ← φ
2  for i ← 1 to maxIterations do
3      solution ← GreedyRandomizedConstruction(seed, α)
4      solution ← LocalSearch(solution)
5      bestSolution ← UpdateBestSolution(solution)
6  return bestSolution

4.3 PuP Selection

4.3.1 Greedy Randomized Adaptive Search Procedure

GRASP (Algorithm 4.1) is an optimization framework for combinatorial problems which requires the development of appropriate procedures to solve the particular optimization problem. It is an iterative procedure consisting of two phases [131, 132]. The first phase is a greedy randomized construction phase where a feasible solution is created (Line 3). The second phase is a local search phase where the neighborhood of the solution created in phase one is investigated to find a local minimum (Line 4). The best overall solution is kept after every iteration (Line 5). GRASP is best suited for combinatorial optimization problems where a large number of possible solutions are investigated to find the optimal solution.

Algorithm 4.2 is the pseudo code for the randomized construction phase. It is a greedy randomized algorithm where the parameter α ∈ [0,1] controls the degree of randomness. The algorithm is probabilistic, as it randomly chooses one of the best candidates from the candidate set and not necessarily the best one. With the threshold used in Algorithm 4.2, a value of α = 0 makes the construction completely greedy and α = 1 makes it fully random. The list of best candidates is the restricted candidate list (RCL). In line 8, the RCL is computed as the list of all candidates with cost below a required threshold (according to α). Every iteration will produce a different result, as a candidate is randomly chosen out of this RCL (lines 9 and 10).



Algorithm 4.2 Greedy Randomized Construction
input : seed, α, Initialize(), cost(), Update()
output: solution

 1  solution ← φ
 2  C ← Initialize()                     /* Initialize Candidate set C */
 3  foreach e ∈ C do
 4      c(e) ← cost(e)                   /* Evaluate cost for Candidate set */
 5  while C ≠ φ do
 6      cmin ← min{c(e) | e ∈ C}
 7      cmax ← max{c(e) | e ∈ C}
 8      RCL ← {e ∈ C | c(e) ≤ cmin + α(cmax − cmin)}   /* Create Restricted Candidate List */
 9      x ← Random(RCL)                  /* Select random element from RCL */
10      solution ← solution ∪ x
11      C ← Update(C, solution)          /* Update C according to the current partial solution */
12      foreach e ∈ C do
13          c(e) ← cost(e)               /* Recompute cost for C */
14  return solution
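A minimal Python sketch of the restricted candidate list construction may help to see how α trades greediness against randomness. The callbacks `cost` and `update` stand in for the problem-specific procedures and their signatures are assumptions of this sketch; the toy helpers at the end exist only to make the example runnable.

```python
import random


def greedy_randomized_construction(candidates, cost, update, alpha, rng):
    """Build a solution by repeatedly drawing a random element from the RCL.

    cost(e, solution) and update(remaining, solution) are hypothetical callbacks;
    lower cost is treated as better, as in the threshold cmin + alpha * (cmax - cmin).
    """
    solution = []
    remaining = set(candidates)
    while remaining:
        costs = {e: cost(e, solution) for e in remaining}
        c_min, c_max = min(costs.values()), max(costs.values())
        threshold = c_min + alpha * (c_max - c_min)
        # Sorted only to make the random draw reproducible for a given seed.
        rcl = sorted(e for e in remaining if costs[e] <= threshold)
        x = rng.choice(rcl)
        solution.append(x)
        remaining = update(remaining - {x}, solution)
    return solution


def toy_cost(e, solution):
    """Toy cost for illustration: the candidate's own value."""
    return e


def keep_all(remaining, solution):
    """Toy update for illustration: no candidate becomes infeasible."""
    return remaining


print(greedy_randomized_construction([3, 1, 2], toy_cost, keep_all, 0.0, random.Random(0)))
print(greedy_randomized_construction([3, 1, 2], toy_cost, keep_all, 1.0, random.Random(0)))
```

With α = 0 the RCL contains only the cheapest candidates, so the toy run always returns the candidates in increasing order; with α = 1 every remaining candidate is eligible at each step.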

The solutions generated by a greedy randomized construction algorithm are not guaranteed to be optimal with respect to simple neighborhood definitions. A local search phase therefore attempts to improve the constructed solution by iteratively looking for a better solution in the neighborhood of the current solution. Algorithm 4.3 is the pseudo code for the local search phase of GRASP, which evaluates the neighborhood of the current solution until no better solution can be found. We adapt GRASP for our problem and present the algorithms.

Algorithm 4.3 Local Search
input : solution, Neighborhood()
output: solution

while Optimal(solution) ≠ true do        /* Solution is not locally optimal */
    Find solution′ ∈ Neighborhood(solution) | cost(solution′) < cost(solution)
    solution ← solution′
return solution
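The interplay of the two phases can be sketched generically. The block below is an illustrative skeleton under simple assumptions (a single objective, lower cost is better, first-improvement local search); the toy problem at the end is hypothetical and only serves to exercise the loop.

```python
import random


def local_search(solution, neighborhood, cost):
    """Move to a better neighbor until none exists (first-improvement variant)."""
    improved = True
    while improved:
        improved = False
        for candidate in neighborhood(solution):
            if cost(candidate) < cost(solution):
                solution = candidate
                improved = True
                break
    return solution


def grasp(max_iterations, construct, neighborhood, cost, seed):
    """Multistart loop: construct, improve locally, keep the best solution seen."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_iterations):
        solution = local_search(construct(rng), neighborhood, cost)
        if best is None or cost(solution) < cost(best):
            best = solution
    return best


# Toy problem (hypothetical): find the integer minimizing (x - 7)^2.
def toy_construct(rng):
    return rng.randint(-50, 50)


def toy_neighborhood(x):
    return [x - 1, x + 1]


def toy_cost(x):
    return (x - 7) ** 2


print(grasp(5, toy_construct, toy_neighborhood, toy_cost, seed=1))
```

In the toy problem every start converges to the same optimum; on a real combinatorial problem the restarts matter because local search stops at different local optima.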

A significant amount of work has been done on GRASP, with substantial enhancements exploiting parallelism, path relinking and the proximate optimality principle [131–133]. GRASP has also been extended to tackle multi objective problems such as the multi-objective knapsack problem [134] and the quadratic assignment problem [135].



4.3.2 Our Model

In this section we present the algorithms (Algorithms 4.4, 4.5, 4.6 and 4.7) for the optimal PuP selection problem. Our problem is a multi objective problem where we want to maximize coverage while minimizing the number of PuPs in order to maintain privacy. Instead of a single solution at the end of the procedure, we compute a Pareto front of best possible solutions. In every iteration of the GRASP procedure, we compare the solution with the solutions having the same number of PuPs and keep the one with the best coverage.

Algorithm 4.4 Optimal Coverage GRASP
input : maxIterations, seed, α, computeCoverage()
output: BSS

1  BSS ← φ
2  for i ← 1 to maxIterations do
3      solution ← GreedyRandomizedConstruction(seed, α)
4      value ← computeCoverage(solution)
5      solutionSet ← LocalSearch(solution, value)
6      BSS ← UpdateBestSolutionSet(BSS, solutionSet)

In Algorithm 4.4, line 3 computes the solution in the greedy randomized construction phase, which is a selection of PuPs from the complete set of possible PuPs. The corresponding coverage for this solution is then computed. The local search phase (line 5) iteratively explores other solutions in the neighborhood of the current solution and generates a set of solutions with different numbers of pick up points and different coverage. Every solution in this solution set is compared to the solutions in the existing set of solutions. A solution s is stored in the best solution set BSS only if it is not dominated by any solution selected so far. A solution x1 is said to dominate another solution x2 if both of the conditions (1 and 2) listed below are true:

1. The solution x1 is no worse than x2 in all objectives.

2. The solution x1 is strictly better than x2 in at least one objective.

This implies that s has better coverage than all solutions whose number of PuPs is less than or equal to the number of PuPs in s.

\[ s \in BSS \iff \forall s' \neq s : \big(NumPuPs(s') \le NumPuPs(s)\big) \Rightarrow \big(coverage(s) > coverage(s')\big) \]
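The dominance test and the maintenance of the best solution set can be sketched as follows. For brevity each solution is reduced to a pair (number of PuPs, coverage), which is a simplification made for this sketch; the function names and the sample values are hypothetical.

```python
def dominates(a, b):
    """True if solution a dominates b; each solution is a (num_pups, coverage) pair.

    Fewer PuPs and higher coverage are both treated as better.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def update_best_solution_set(bss, new_solutions):
    """Insert non-dominated solutions and drop any stored solution they dominate."""
    for s in new_solutions:
        if s in bss or any(dominates(t, s) for t in bss):
            continue
        bss = [t for t in bss if not dominates(s, t)] + [s]
    return sorted(bss)


# Hypothetical (num_pups, coverage) pairs produced by successive iterations.
found = [(10, 0.80), (12, 0.85), (12, 0.83), (11, 0.78), (9, 0.80)]
print(update_best_solution_set([], found))
```

In the sample run the solution with 9 PuPs replaces the one with 10 PuPs at equal coverage, and the solution with 12 PuPs survives because it is the only one reaching the higher coverage.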



Algorithm 4.5 Optimal Coverage Greedy Randomized Construction
input : seed, α, possiblePuPs, cost(), mapCovered(), InsideXRadius()
output: solution

 1  solution ← φ
 2  consideredSet ← φ
 3  C ← Random(possiblePuPs)             /* Select a random PuP from possible PuPs */
 4  foreach e ∈ C do
 5      c(e) ← cost(e)                   /* Evaluate cost for Candidate set */
 6  while C ≠ φ do
 7      cmin ← min{c(e) | e ∈ C}
 8      cmax ← max{c(e) | e ∈ C}
 9      RCL ← {e ∈ C | c(e) ≤ cmin + α(cmax − cmin)}
10      x ← Random(RCL)                  /* Select random element */
11      solution = solution ∪ {x}
12      C ← C − {x}
13      C ← C ∪ InsideXRadius(x)         /* Find all PuPs inside X − R radius of PuP x which cover at least one new point */
14      foreach e ∈ C do
15          c(e) ← cost(e)               /* Update cost for Candidate set */
16      consideredSet = consideredSet ∪ C    /* Update considered PuP set */
17      if (C == φ) ∧ (consideredSet ≠ φ) then
18          C ← C − consideredSet
19      if mapCovered() ≠ false then
20          C ← φ                        /* 1-coverage, no more PuPs needed */
21  return solution



The local search phase for our 1-coverage model (Algorithm 4.6) searches the 1-neighborhood of every selected PuP and aims to replace the current PuP with one which increases the coverage. One pass of local search can produce multiple solutions, which are then compared against the solution set already saved as the best solution set in Algorithm 4.4.

Algorithm 4.6 1-Coverage Local Search
input : solution, Neighborhood(), coverage()
output: solutionSet

solutionSet = φ
forall x ∈ solution do
    solution′ = solution − {x}
    RCL = Neighborhood(x)
    forall y ∈ RCL do
        solution′ = solution′ ∪ {y}
        if coverage(solution′) > coverage(solution) then
            solutionSet = solutionSet ∪ solution′
return solutionSet

4.3.3 Optimal Coverage Solution

In the optimal coverage solution the information that is used to find the Pareto front solutions is the population density of every suburb and the number of nodes in the suburb polygon. We limit the candidate PuPs by the k-anonymity achieved by each PuP with the aim to achieve a certain minimum k-anonymity.

A node is covered by a PuP if the PuP is within the MPD of the suburb Pj that the node belongs to. The cost of selecting a PuP x is the number of nodes covered by the PuP.

\[ n_i \in covered(PuP_x) \mid dist(n_i, PuP_x.c) \le P_j.MPD \]

\[ cost(PuP_x) = |covered(PuP_x)| \]

The map is covered (mapCovered function in Algorithm 4.5) if all the nodes in the road network are within the catchment area of at least one selected PuP.

\[ mapCovered() = true \mid \forall n_i : (\exists PuP_x : n_i \in covered(PuP_x)) \]

In the optimal coverage greedy randomized algorithm (Algorithm 4.5), the candidate set C is initialized to a random PuP from the list of all possible pick up points (Line 3). After selecting a PuP, the PuPs in the X − R (X = R√3) neighborhood of the already selected PuPs constitute the candidate set (Line 13). If all the options have been explored and there are still points or regions which need to be 1-covered, then we start with the whole candidate set C again (Lines 16-20). The 1-neighborhood (Algorithm 4.6) of a PuP consists of all PuPs within the X radius of the PuP which were not selected.
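The cost and mapCovered definitions translate directly into set operations. The sketch below assumes Euclidean distances and a per-node MPD lookup (each node inherits the MPD of its suburb); the node identifiers, coordinates and MPD values are hypothetical.

```python
import math


def covered_nodes(pup, nodes, node_mpd):
    """Nodes whose distance to the PuP is within the MPD of the node's own suburb.

    nodes maps node id -> (x, y) in km; node_mpd maps node id -> MPD of its suburb.
    Euclidean distance is an assumption of this sketch.
    """
    center = nodes[pup]
    return {n for n, xy in nodes.items() if math.dist(xy, center) <= node_mpd[n]}


def cost(pup, nodes, node_mpd):
    """Cost of selecting a PuP: the number of nodes it covers."""
    return len(covered_nodes(pup, nodes, node_mpd))


def map_covered(selected, nodes, node_mpd):
    """True if every node is covered by at least one selected PuP."""
    covered = set()
    for pup in selected:
        covered |= covered_nodes(pup, nodes, node_mpd)
    return covered == set(nodes)


# Hypothetical road nodes on a line: three in a dense suburb, two in a sparse one.
nodes = {"n0": (0.0, 0.0), "n1": (1.0, 0.0), "n2": (2.0, 0.0),
         "n3": (5.0, 0.0), "n4": (8.0, 0.0)}
node_mpd = {"n0": 1.0, "n1": 1.0, "n2": 1.0, "n3": 3.0, "n4": 3.0}

print(cost("n1", nodes, node_mpd), cost("n3", nodes, node_mpd))
print(map_covered(["n1"], nodes, node_mpd), map_covered(["n1", "n3"], nodes, node_mpd))
```

Note that coverage is judged from the node's side: a sparse-suburb node 3 km from a PuP counts as covered, while a dense-suburb node at the same distance does not.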


4.3.4 Guaranteed k-Anonymity Solution

This solution guarantees the k-anonymity recorded by every PuP. Assuming people travel to their nearest PuP, the k-anonymity of the Voronoi cell [105] of a PuP determines the k-anonymity achieved by an individual in the cell. In this solution the PuPs are placed such that every PuP has almost the same number of people (with a guaranteed minimum kG) covered by its Voronoi cell. After placing every new PuP, the k-anonymity of each new Voronoi cell is measured, and the PuP is added to the solution if the minimum guaranteed k-anonymity is achieved by every existing PuP in the solution set. PuPs are selected until no more can be added to the solution without violating the k-anonymity condition. The RCL contains the PuPs in the Voronoi cells which have k greater than the guaranteed anonymity (kG). After every iteration a PuP is placed in the largest possible Voronoi cell which ensures kG for the current solution set.

Algorithm 4.7 is the greedy randomized construction algorithm for this solution. The candidate set is initialized to the set of possible PuPs which have k-anonymity (Figure 4.3) greater than the guaranteed k-anonymity (Line 2). The k-anonymity of every PuP in the candidate set is the initial cost of the PuP (Lines 3-4). A PuP is selected at random from the RCL and is included in the solution if, after including this PuP, the Voronoi k-anonymity (Figure 4.14) of every PuP in the solution is greater than or equal to kG (Lines 10-11). The Voronoi cost of every candidate PuP is updated to the ratio of the Voronoi k-anonymity of the PuP and kG (Line 14). This ensures that the PuPs in the larger Voronoi cells have a higher chance of selection in the RCL. Finally, the PuPs in the candidate list which are in Voronoi cells with k-anonymity very close to kG (determined by β, which we set to 1.2) are removed from the candidate list, as selecting them will most likely reduce the size of the current Voronoi cell and make the k-anonymity of the cell smaller than the guaranteed k-anonymity. The 1-neighborhood (Algorithm 4.6) computed in the local search phase for a selected PuP is the closest PuP within its Voronoi cell which was not selected.



Algorithm 4.7 Guaranteed k-anonymity Greedy Randomized Construction
input : seed, α, β, kG, possiblePuPs, InitCost(), VoronoikAnon()
output: solution

solution ← ∅
C ← Initialize(possiblePuPs, kG)        /* Select candidate PuPs from possible PuP set with k > kG */
foreach e ∈ C do
    c(e) ← InitCost(e)                  /* Initialize cost = k-anonymity */
…
    … α(cmax − cmin)}
    x ∈ Random(RCL)                     /* Select random element */
    if MinimumVoronoikAnon(solution, x) ≥ kG then
        solution = solution ∪ {x}
    C ← C − {x}
    foreach e ∈ C do
        c(e) ← UpdateVoronoiCost(e)     /* Update cost for candidate set */
        if c(e) ≤ β then
            C ← C − {e}                 /* Remove element from C if Voronoi k-anonymity of element is very close to kG */
return solution
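The construction phase described for Algorithm 4.7 can be sketched in Python as follows. This is a hedged illustration, not the thesis code: individuals are modeled as points, Voronoi cells are approximated by nearest-PuP assignment with Euclidean distance, the RCL rule (keep candidates in the top part of the cost range) is an assumption about the lines that are not legible above, and all names (`voronoi_k`, `construct`, `init_k`) and the toy data are hypothetical.

```python
import math
import random

def voronoi_k(pups, people):
    """People per Voronoi cell: each person is counted at the nearest PuP."""
    counts = {p: 0 for p in pups}
    for person in people:
        counts[min(pups, key=lambda p: math.dist(p, person))] += 1
    return counts

def construct(possible_pups, init_k, people, k_g, alpha=0.8, beta=1.2, seed=0):
    rng = random.Random(seed)
    solution = []
    # Candidates: PuPs whose initial (catchment) k-anonymity exceeds k_g.
    cost = {e: init_k[e] for e in possible_pups if init_k[e] > k_g}
    while cost:
        c_min, c_max = min(cost.values()), max(cost.values())
        # Assumed RCL rule: candidates in the top (1 - alpha) of the cost range.
        rcl = [e for e in cost if cost[e] >= c_min + alpha * (c_max - c_min)]
        x = rng.choice(rcl)
        trial = solution + [x]
        # Accept x only if every Voronoi cell still holds at least k_g people.
        if min(voronoi_k(trial, people).values()) >= k_g:
            solution = trial
        del cost[x]
        if not solution:
            continue
        cells = voronoi_k(solution, people)
        for e in list(cost):
            # New cost: k-anonymity of the cell containing e, relative to k_g.
            cell = min(solution, key=lambda p: math.dist(p, e))
            cost[e] = cells[cell] / k_g
            if cost[e] <= beta:
                del cost[e]  # splitting this cell would likely drop below k_g
    return solution

people = [(x + 0.5, y + 0.5) for x in range(8) for y in range(5)]  # 40 individuals
candidates = [(1.0, 2.0), (3.0, 2.0), (5.0, 2.0), (7.0, 2.0)]
init_k = {c: 20 for c in candidates}  # hypothetical catchment k-anonymity values
solution = construct(candidates, init_k, people, k_g=10)
```

Whatever PuPs the random choices pick, the acceptance test keeps the invariant that no Voronoi cell of the returned solution holds fewer than `k_g` individuals.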

4.3.5 PuP Ranking Based Solution

In the real world a passenger might prefer to wait at a PuP which offers alternate public transport (such as a nearby train, tram or cab stop) or some entertainment option (cafe or shopping center) for longer waits. Ranking the PuPs according to their closeness to such points of interest (POIs) helps identify popular PuPs which provide access to ride sharing while offering an alternate option. The candidate PuP list is pruned to only include PuPs within x km of the POIs. We compute solutions with PuP selection weighted according to the closeness (1 km walkable distance) to a point of interest. The closer the PuP, the higher the rank. Between two PuPs having similar cost, we choose the one with the higher ranking to create the solution. Since the candidate PuPs are limited by their distance to points of interest, some suburbs might not have a selectable PuP within their MPD, leading to partial coverage.
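The pruning and ranking step can be illustrated with a short Python sketch. It assumes straight-line distances in km and a single distance threshold; the function name `rank_pups_by_poi` and the sample coordinates are hypothetical.

```python
import math

def rank_pups_by_poi(pups, pois, max_km=1.0):
    """Keep PuPs within max_km of some POI and rank them, closest first."""
    ranked = []
    for pup in pups:
        nearest_poi_km = min(math.dist(pup, poi) for poi in pois)
        if nearest_poi_km <= max_km:
            ranked.append((nearest_poi_km, pup))
    ranked.sort()  # smaller distance to a POI means higher rank
    return [pup for _, pup in ranked]

pups = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
pois = [(0.4, 0.0)]  # e.g. one train station
ranked = rank_pups_by_poi(pups, pois)
```

In this toy example the PuP at (3.0, 0.0) is pruned because it is farther than 1 km from the only POI, which is how partial coverage can arise.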



4.4 Experiments

4.4.1 Simulation Setup

We perform our experiments on the road network of Melbourne, Australia, which contains 142,473 nodes and 280,475 edges. The map is generated using OpenStreetMap¹. We perform our experiments using MATLAB. We extract the road network nodes and edges (OpenStreetMap) and keep information such as road type (which determines the speed limit), connectivity and length of each edge. We have implemented GRASP and ride sharing algorithms to compute optimal PuP solutions and evaluate them using ride sharing. In particular, we have used some of the algorithms already implemented in the MATLAB library, such as graphshortestpath (to find shortest paths via Dijkstra for rider population generation).

The Australian Bureau of Statistics (ABS) provides statistics for Australia by dividing it into smaller blocks called Mesh Blocks². Each mesh block can be hierarchically linked to larger regions called statistical area 2 (SA2).

The Melbourne map is partitioned into 147 SA2 areas which represent gazetted suburbs. The population density (residents/sqm) and area (sqm) are known for each suburb. As seen in Figure 4.7(a), the area of an SA2 is inversely related to the population density of the suburb: the denser the suburb, the smaller its area. The SA2's are designed such that each Melbourne suburb has a resident population equal to 10000; with 147 suburbs, we run the experiments for a population size of 1.47 million.

¹ www.openstreetmap.org
² http://www.abs.gov.au/websitedbs/censushome.nsf/home/meshblockcounts

Figure 4.7: Population Density for Melbourne SA2's. (a) Population Density vs Area of SA2 regions; (b) Population Density Histogram

Figure 4.8: Area of SA2's and MPD

The ABS provides estimated resident population, dwellings, and type of dwelling data for each mesh block. The numbers of passengers and drivers for ride sharing are generated according to the population and size of each mesh block, so the driver and passenger population follows the population density of the suburbs. We fix a set of 10 suburbs as the destination suburbs (for example, city suburbs in the morning peak hours) and generate the start points randomly within the source suburbs. We generate 5000 drivers and 15000 passengers. The experiments are conducted for ten different simulation runs with different populations of drivers and passengers. We assume that every driver is willing to travel a little extra to find possible candidate passengers. We use this spare time to compute the driver path area ellipse (Section 3.2.2); the time budget of the driver determines the driver ellipse size. We have set the default spare time to 25% of the driver's original time, making the default driver ellipse size 1.25. The passenger's spare time is computed as twice the time to reach a PuP plus some additional time for detours with drivers while picking up and dropping off other passengers. We calculate the passenger time budget by adding to the shortest path time the time taken to travel the source and destination MPD (assuming 4 km/hr) and some spare time for detours chosen randomly between 25% and 100% of the actual travel time.
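The two time budgets described above amount to simple arithmetic, sketched below in Python. The function names, the use of hours as the unit, and the sample trip values are illustrative assumptions; only the 25% driver spare time, the 4 km/hr walking speed and the 25% to 100% detour range come from the text.

```python
import random

WALK_SPEED_KMH = 4.0  # walking speed assumed in the text

def driver_time_budget(trip_hours, spare_fraction=0.25):
    """Driver budget: original trip time plus the spare fraction (ellipse size 1.25)."""
    return trip_hours * (1.0 + spare_fraction)

def passenger_time_budget(trip_hours, src_mpd_km, dst_mpd_km, rng):
    """Shortest path time + walking the source and destination MPD + random detour time."""
    walk_hours = (src_mpd_km + dst_mpd_km) / WALK_SPEED_KMH
    detour_hours = rng.uniform(0.25, 1.0) * trip_hours
    return trip_hours + walk_hours + detour_hours

rng = random.Random(42)
driver_budget = driver_time_budget(0.4)  # a 24 minute trip
passenger_budget = passenger_time_budget(0.5, 1.0, 1.0, rng)
```

For a 30 minute passenger trip with a 1 km MPD at each end, the budget therefore falls between 1.125 and 1.5 hours, depending on the randomly drawn detour allowance.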

The initial candidate set of PuPs consists of all the arterial intersections, which amounts to 5990 intersections for our section of the Melbourne map. GRASP is run for 100 iterations, and all solutions are compared to compute the Pareto front. The degree of randomness parameter is set to 0.8, so that any of the ranked candidates in the top 20% can be selected.
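One common way to realize such a randomness parameter in GRASP is a value-based restricted candidate list (RCL). The Python sketch below assumes that higher cost is better and that "top 20%" refers to the top 20% of the cost range; both are assumptions, and the function name and sample costs are hypothetical.

```python
def restricted_candidate_list(cost, alpha):
    """Candidates whose cost lies in the top (1 - alpha) share of the cost range."""
    c_min, c_max = min(cost.values()), max(cost.values())
    threshold = c_min + alpha * (c_max - c_min)
    return sorted(e for e in cost if cost[e] >= threshold)

cost = {"p1": 10, "p2": 19, "p3": 20, "p4": 12}
rcl = restricted_candidate_list(cost, alpha=0.8)
```

With alpha = 0.8 and costs ranging from 10 to 20, the threshold is 18, so only `p2` and `p3` remain eligible for random selection; alpha = 1 would make the choice purely greedy and alpha = 0 purely random.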

In our pre-studies we have simulated the impact of further information in the form of origin destination (OD) matrices. An OD matrix is a “trip table” that displays the number of trips going from each origin to each destination in a specific time interval (e.g. morning 7–9am). The origins and destinations represent a partition of the map at some granularity. We encoded the trip information using the available OD matrix such that in every partition (suburb) the trip is marked as the location of the destination/source moving from/to it. We observed that, since the OD matrices are generated according to the population density of individual suburbs, the OD matrix information becomes implicit in our optimizations. We have not reported these results, as we found that the additional OD matrix information did not have any significant impact on ride sharing.

4.4.1.1 Maximum Pick up point Distance

Figure 4.7(b) reports the population density distribution for the 147 suburbs. The maximum population density is 7892 residents/km² and the minimum is 161 residents/km² in the sparsest region. The normalized population density range is computed by assigning 1 to the densest region, which gives 0.0204 (161/7892) for the sparsest region. The maximum pick up point distance, discussed in Section 4.2.2.2, is computed by setting the minimum circle radius (MR) to 1 km. The area of an SA2 is inversely proportional to the population density, which makes it directly proportional to the MPD. As seen in Figure 4.8, the MPD is proportional to the area and drops exponentially as the area of the suburb decreases. Figure 4.1 reports the maximum pick up point distance computation for the 147 suburbs.

4.4.1.2 Measurements

Coverage: A suburb is 1-covered if the distance to the nearest PuP for any point in the suburb is within the suburb MPD. Otherwise, the (partial) coverage is computed as the ratio of the MPD to the distance to the nearest PuP.
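The coverage measure can be written as a small Python function. This is a sketch of the definition only; the function name and the sample distances are hypothetical.

```python
def coverage(nearest_pup_km, mpd_km):
    """1 if the nearest PuP is within the MPD, otherwise the ratio MPD / distance."""
    if nearest_pup_km <= mpd_km:
        return 1.0
    return mpd_km / nearest_pup_km

full = coverage(0.8, 1.0)     # nearest PuP inside the MPD
partial = coverage(2.0, 1.0)  # nearest PuP twice as far as the MPD
```

A point whose nearest PuP is twice as far away as its MPD thus contributes a coverage of 0.5.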

k-anonymity: The number of individuals in the catchment area of the PuP determines the k-anonymity of the PuP. The catchment area is either the MPD area of the PuP (Figure 4.3) or the Voronoi cell area, depending on the PuP selection strategy.

Average Vehicle Travel Distance (VTD): The vehicle travel distance is the average vehicle distance traveled in the system per traveler. The total distance traveled is the sum of the distance traveled by the drivers (including the longer distance traveled by the ride sharing drivers) and the distance traveled independently by the passengers who do not get a ride (assuming they take the shortest path to their destination).
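Read literally, this definition divides the total vehicle kilometres by the number of travelers. The Python sketch below follows that reading; the function name and the toy distances are hypothetical, and it assumes that every unmatched passenger drives alone.

```python
def average_vtd(driver_km, unmatched_passenger_km, n_travelers):
    """Total vehicle km (drivers incl. detours + passengers driving alone) per traveler."""
    return (sum(driver_km) + sum(unmatched_passenger_km)) / n_travelers

# Toy scenario with 6 travelers: 2 drivers and 4 passengers.
# With sharing, one driver detours (15 km) and carries 3 passengers; 1 passenger drives alone.
with_sharing = average_vtd([15.0, 12.0], [10.0], n_travelers=6)
# Without sharing, all 6 travelers drive their own shortest path.
no_sharing = average_vtd([12.0, 12.0], [10.0, 10.0, 10.0, 10.0], n_travelers=6)
saving = 1.0 - with_sharing / no_sharing
```

In this toy scenario the detour adds 3 km for one driver but removes three 10 km solo trips, so the vehicle distance per traveler drops by about 42%.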

4.4.2 Dynamic Ride Sharing without PuP Selection

Dynamic ride sharing without any PuP selection implies that the passengers negotiate their start and destination locations directly with the driver. This is convenient for the passengers, as it minimizes their travel time to the nearest PuPs. However, it significantly impacts privacy, since even those drivers who are unable to share a ride learn the exact passenger location. It also reduces the possibility of ride sharing, as drivers might need to take a longer detour because they have to drive as close as possible to the passengers' pick up locations. Using fixed PuPs allows a driver to pick up passengers in groups from a few safer and possibly monitored locations. Although fixed PuPs might incur slightly longer travel times for passengers to reach their selected PuP, we will show that they significantly improve the system efficacy in terms of ride sharing.

Since the globally optimal ride sharing computation with multiple passengers being picked up by a driver is NP-hard, we limit the solution to a greedy solution where the passengers are aware of the number of drivers crossing every intersection. A driver can pick up a passenger as long as the passenger's start and destination locations are within the driver ellipse. As the driver specifies the spare travel time, we assume that a third party computes the driver ellipse and marks the intersections within the passenger MPD with the number of drivers available on them who can satisfy the passenger's trip requirements. For computational reasons we reduce the number of intersections to the primary-primary road network intersections, of which there are 5990 for our Melbourne map. We set the spare time for every driver to 25% of the actual time required for the driver trip. If every driver communicates with every possible ride share passenger and vice versa, there is a large communication overhead, a large number of computations is needed to find all possible paths for the driver, and the privacy of both the drivers and the passengers is compromised, as a large population gets to know about the start/destination locations. Since a driver can only pick up 3 additional passengers (assuming the car capacity of every driver is 4), we cap the maximum number of drivers and passengers communicating with each other for a trip, for privacy and computational reasons. In our experiments we set this number to 10.

Figure 4.9: Number of PuPs within k-Anonymity Range

Our results show that the occupancy for such a system is 1.18 (Figure 4.13). As expected, the number of shared rides is very small, as the drivers need a longer detour and are not able to pick up passengers within the time constraints of both the drivers and the passengers. Another reason for the smaller chance of a ride share is the more frequent stopping by the drivers to pick up or drop off passengers, where every stop adds a 30 second lag to the drive. However, if we increase the driver ellipse size by increasing the driver spare time to 75% of the driver trip time, the occupancy increases from 1.18 to 2.34, as there is more time for drivers to detour.
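The pick up condition based on the driver ellipse can be illustrated with a short Python sketch. It assumes that the ellipse has the driver's source and destination as foci and that its size scales the direct trip length; this geometric reading of Section 3.2.2, the Euclidean distances, and the function names are assumptions made for illustration.

```python
import math

def in_driver_ellipse(point, src, dst, size=1.25):
    """True if a detour via `point` stays within size * direct trip length."""
    return math.dist(src, point) + math.dist(point, dst) <= size * math.dist(src, dst)

def can_pick_up(p_start, p_end, d_src, d_dst, size=1.25):
    """Both passenger endpoints must lie inside the driver ellipse."""
    return (in_driver_ellipse(p_start, d_src, d_dst, size)
            and in_driver_ellipse(p_end, d_src, d_dst, size))

driver_src, driver_dst = (0.0, 0.0), (10.0, 0.0)
near = in_driver_ellipse((5.0, 1.0), driver_src, driver_dst)            # small detour
far = in_driver_ellipse((5.0, 5.0), driver_src, driver_dst)             # too far at 25%
far_relaxed = in_driver_ellipse((5.0, 5.0), driver_src, driver_dst, size=1.75)
```

Raising the ellipse size from 1.25 to 1.75 admits the farther point, which mirrors the observation that a larger driver spare time makes more trips feasible.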

4.4.3 Optimal Coverage based PuP Selection

In these experiments we select PuP placements based on the population density of the suburb, and the candidate PuPs are shortlisted based on their k-anonymity. The goal in this selection is to maximize coverage such that every point in the map is 1-covered as long as there is a candidate PuP that can cover the point. We evaluate the Voronoi k-anonymity after the selection of PuPs. We run experiments for k = {10000, 15000, 20000}. The PuP solutions are compared for coverage and k-anonymity achieved by the PuP Voronoi cells. As seen in Figure 4.9, 97% of the PuPs have achievable k-anonymity greater than 10000, while only 66.5% of the total PuPs have k-anonymity greater than 15000 and 18% have k-anonymity greater than 20000. Hence, 1-coverage cannot be achieved if we increase k-anonymity to 15000, as there are points on the map which do not have a candidate PuP within their MPD for PuP selection. As expected, the coverage decreases when the candidate PuP list is pruned to only contain PuPs with achievable k-anonymity greater than 20000. Setting k-anonymity lower than 10000 does not achieve any better results for coverage and k-anonymity, as the same set of PuPs remains the candidate set and 1-coverage has already been achieved for k = 10000. Our preliminary experiments with k = 5000 confirm this.

Figure 4.10: Coverage Pareto Front for Population based PuP Selection. Panels: (a) k = 10000, (b) k = 15000, (c) k = 20000; each plots Coverage against # PuPs.

Figure 4.11: Walkability Pareto Fronts for Population based PuP Selection (Walkability against # PuPs for k = 10000, k = 15000 and k = 20000)

As seen in Figure 4.10(c), the partial coverage is 0.86 for k = 20000, which implies that on average individuals need to travel 14% more than their MPD to reach a PuP. 1-coverage is achieved for k = 10000 (Figure 4.10(a)) and k = 15000 (Figure 4.10(b)), and since the goal is to achieve the maximum coverage, denser areas might have more than one PuP within the MPD. As expected, the nearest PuP for a point might be much closer than the MPD for the suburb, keeping the walkability typically significantly smaller than the MPD (Figure 4.12(a)). As seen in Figure 4.11, the walkability Pareto front shows that the average walkability is less than one km for k = 10000 and 15000 because of multiple PuPs for most suburbs. However, once we require higher k-anonymity levels (k = 20000), the ratio between walkability and MPD is less beneficial and can exceed one because of the lower PuP density.

Figure 4.13 shows the occupancy achieved by the optimal coverage solutions for different k-anonymity thresholds. We select one solution from the coverage Pareto front (Figure 4.10) of each of the three k-anonymity thresholds and perform ride sharing for the same set of drivers and passengers. The numbers of PuPs in the selected Pareto front solutions are 172, 170 and 109 for k = 10000, 15000 and 20000 respectively. As the number of PuPs decreases, the ride sharing occupancy drops, as drivers need to make a longer detour to share rides, which makes some of the trips infeasible. As expected, the occupancy decreases for k = 20000, which has the fewest PuPs (Figure 4.13). Also, since coverage is one for k = 10000, passengers are able to reach a PuP within their MPD and need to walk less, leaving more spare time to negotiate the trip with the drivers and resulting in higher occupancy rates.

Figure 4.12: Walkability and k-Anonymity for a PuP Placement on Pareto Front (k = 10000). (a) Walkability vs MPD (distance in km per suburb); (b) k-anonymity Histogram (# PuPs per k-anonymity range)

Figure 4.13: Ride Sharing Occupancy for Dynamic Ride Sharing and Optimal Coverage Solutions (using selected solution from coverage Pareto front). Bars: Dynamic Ride Sharing, Optimal Coverage (k = 10000), Optimal Coverage (k = 15000), Optimal Coverage (k = 20000).

Figure 4.14: Voronoi Cells for a PuP Placement on Pareto front (k = 10000), drawn over the Melbourne map (OpenStreetMap); axes: Longitude (°) and Latitude (°)

4.4.3.1 k-Anonymity vs PuP Selection Strategy

The actual k-anonymity achieved depends on the selection strategy of the users. If any one of the PuPs within the MPD can be selected randomly, then more PuPs do not reduce the k-anonymity, and the k-anonymity achieved is the number of individuals in the circular catchment area of the selected PuP. However, if everyone travels to the nearest PuP, the actual k-anonymity is defined by the Voronoi cell (Figure 4.6) of the PuP, which is lower than the overall required anonymity. The Voronoi cells of one of the PuP placements are shown in Figure 4.14. Figure 4.12(b) shows the Voronoi k-anonymity range achieved for the PuP placement in Figure 4.14. Figures 4.15 and 4.16 report the mean and minimum k-anonymity for the Voronoi cells. Since the number of PuPs and the coverage are nearly the same for k equal to 10000 and 15000, the mean and minimum k-anonymity remain similar. As expected, since the number of PuPs reduces by 35% for k = 20000, the number of people covered by an individual PuP increases, thereby increasing the mean k-anonymity.

Figure 4.15: Mean k-Anonymity (Voronoi cell) Pareto Front for Population based PuP Selection. Panels: (a) k = 10000, (b) k = 15000, (c) k = 20000; each plots k-anonymity against # PuPs.

Figure 4.16: Minimum k-Anonymity (Voronoi cell) Pareto Front for Population based PuP Selection. Panels: (a) k = 10000, (b) k = 15000, (c) k = 20000; each plots k-anonymity against # PuPs.

Figure 4.17: k-Anonymity with 2 Closest PuP (user can select any of the 2 closest) Strategy (k-anonymity against # PuPs for the Nearest PuP and 2 closest PuPs strategies)

However, if the users change the PuP selection strategy to travel to any one of the 2 nearest PuPs and an adversary only knows about the selected PuP, then the location information of the user can only be narrowed down to the combined area of the Voronoi cell of the selected PuP and its one hop neighboring Voronoi cells. Therefore, the k-anonymity achieved is much larger than the required k-anonymity. We compute the k-anonymity of one of the PuP placements (k = 20000) using the nearest and the any two closest PuP strategies (Figure 4.17). As expected, the k-anonymity achieved by the any two closest PuP strategy is much higher (nearly 7 times) than that of the nearest PuP selection strategy. On the downside, the walkability for users increases in the any two closest PuP strategy, as users might need to travel a little longer to reach the second closest PuP. For 15000 users and one PuP placement (k = 20000), the walkability increases from 1.3 km for the nearest strategy to 1.7 km for the any two closest strategy.
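The effect of the selection strategy on the anonymity set can be illustrated with a simplified Python sketch. For each PuP it counts the individuals who could plausibly have chosen it: with m = 1 this is the Voronoi cell population, and with m = 2 it counts everyone for whom the PuP is among their two closest. The latter is a simplification of the one hop neighbor construction described above (it yields a subset of that combined area), and the function name and toy data are hypothetical.

```python
import math

def k_anonymity(pups, people, m=1):
    """Per PuP, the number of people for whom it is among their m closest PuPs."""
    counts = {p: 0 for p in pups}
    for person in people:
        closest = sorted(pups, key=lambda q: math.dist(q, person))[:m]
        for p in closest:
            counts[p] += 1
    return counts

pups = [(0.0, 0.0), (5.0, 0.0), (9.5, 0.0)]
people = [(float(x), 0.0) for x in range(10)]  # 10 individuals along a road
k_nearest = k_anonymity(pups, people, m=1)
k_two_closest = k_anonymity(pups, people, m=2)
```

Even in this small example the smallest anonymity set grows from 2 to 5 when users may pick either of their two closest PuPs.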

If the PuP selection strategy is to choose any of the two closest PuPs, but only within the MPD, then an adversary (with knowledge of the selected PuP) can narrow the location of the user down to the MPD radius circle of the selected PuP (Figure 4.3). Similarly, if the user selects a PuP randomly from all of the PuPs within the MPD, the user location can be narrowed down to the MPD radius circle of the selected PuP. However, if the user decides to negotiate with all of the PuPs within the MPD to increase the chances of finding a ride, and the adversary knows about the multiple PuP selection, then the user location can be narrowed down to the intersection of the MPD radius circles of the PuPs, reducing the k-anonymity (Figure 4.2).

Figure 4.19(a) shows the k-anonymity for the different PuP selection strategies for 15000 users. One solution is selected from the Pareto front for each of the k thresholds of 10000, 15000 and 20000, and the k-anonymity is computed for the 15000 users with the PuP selection. The difference in the k-anonymity between the nearest and the nearest within MPD strategies for k = 20000 is due to partial coverage, as 6833 users do not find a PuP within their MPD (Figure 4.18). The k-anonymity is highest for the any 2 closest PuP selection strategy, as an adversary cannot use the MPD radius circle to compute the area the user could be coming from; the combined area of the PuP Voronoi cell and all its one hop neighbors is much larger than the MPD circle area.

Figure 4.19(b) reports the walkability for the same set of users using the PuP selections. As expected, the walkability is highest for the any 2 closest PuP strategy, where users are willing to travel outside their MPD radius. Walkability is highest for k = 20000, since there are fewer PuPs and users need to travel longer to reach a PuP, as the second closest PuP (and even the closest) might be outside the MPD. Walkability is similar for the strategies where users travel to PuPs within their MPD, other than nearest.
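The reduction in k-anonymity caused by negotiating with several PuPs can be sketched in Python. The sketch assumes a single MPD value for all individuals and Euclidean distances; the function name `anonymity_set` and the toy data are hypothetical.

```python
import math

def anonymity_set(people, known_pups, mpd_km):
    """People who lie within the MPD circle of every PuP known to the adversary."""
    return [p for p in people
            if all(math.dist(p, pup) <= mpd_km for pup in known_pups)]

people = [(float(x), 0.0) for x in range(5)]  # individuals at x = 0..4 km
one_pup = anonymity_set(people, [(1.0, 0.0)], mpd_km=1.5)
both_pups = anonymity_set(people, [(1.0, 0.0), (3.0, 0.0)], mpd_km=1.5)
```

If the adversary knows only one selected PuP, three individuals are indistinguishable; knowing both PuPs shrinks the set to the single individual in the intersection of the two circles.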

Figure 4.18: Number of Users without a PuP within MPD (bar labels: 0 for k = 10000, 129 for k = 15000, 6033 for k = 20000)

Figure 4.19: k-Anonymity and Walkability (km) vs PuP Selection Strategy. Panels: k-anonymity and walkability for k = 10000, 15000 and 20000. Strategies compared: Nearest, Nearest (within MPD), Any 2 closest, Any 2 closest (within MPD), Random (within MPD), All (within MPD).

4.4.4 Guaranteed k-Anonymity PuP Selection

In these experiments we select PuP placements based on the partitioning of the map into Voronoi cells which guarantee k-anonymity above a given threshold. The goal in this selection is to select the maximum number of PuPs such that every Voronoi cell has roughly the same number of individuals as required by the k-anonymity threshold. We evaluate the coverage after the selection of PuPs. We run experiments for k = {10000, 15000, 20000}. The PuP solutions are compared for coverage and k-anonymity achieved by the PuP Voronoi cells. Ideally, each Voronoi cell should have the same number of individuals. However, there might be some cells which, if divided further, would lead to some other cell having fewer than k people, causing some cells to have more than k individuals. Figure 4.20(a) shows the coverage Pareto fronts for the three k thresholds. As expected, the coverage reduces with an increase in k, as there are fewer candidate PuPs to choose from and more people report to a smaller number of PuPs. Figure 4.20(c) shows the minimum k-anonymity, which is very near to the k-anonymity thresholds. Figure 4.20(b) reports the mean k-anonymity, which increases as the number of PuPs decreases, since the Voronoi cells become larger with fewer PuPs.

4.4.5 Optimal Coverage vs Guaranteed k-Anonymity

Sections 4.4.3 and 4.4.4 report and analyze the experiments for optimizing coverage and guaranteed k-anonymity respectively. In Figure 4.21(a) we combine the solutions from the two sections to compute the k-anonymity vs coverage graph. As expected, k-anonymity decreases as coverage increases, since more PuPs are added to increase coverage, which reduces k-anonymity (assuming the nearest PuP selection strategy). Similarly, walkability increases with an increase in k-anonymity, as fewer PuPs mean users have to walk longer to reach the nearest PuP (Figure 4.21(b)).

We compute ride sharing occupancy for the envelope solutions (highlighted in red in Figure 4.21(a)) with the highest coverage on the k-anonymity vs coverage curve. We also add the ride sharing results for dynamic ride sharing with 5990 PuPs, and increase the PuPs for the coverage one solution to 450 and 1500 to measure the impact of the number of PuPs on ride sharing (Figure 4.22). The nearest strategy is used by passengers to select a PuP. As expected, the occupancy initially increases with the number of PuPs up to 170 PuPs, since the coverage improves to 1 up to this point. A further increase in PuPs does not increase occupancy; rather, the occupancy falls as passengers select the nearest PuP, which requires the drivers to take longer detours, making many trips infeasible. Also, the frequent stopping adds a 30 second lag per stop.

Figure 4.20: Coverage and k-Anonymity Pareto Fronts for Voronoi cell based PuP Selection. Panels: (a) Coverage, (b) Mean k-anonymity, (c) Minimum k-anonymity; each plotted against # PuPs for k = 10000, 15000 and 20000.

Figure 4.21: Pareto Front for the Combined Set of Solutions. Panels: (a) k-anonymity vs Coverage, (b) k-anonymity vs Walkability (km).

Figure 4.22: Occupancy vs Number of PuPs

4.4.6 Ride Sharing for PuP Selection Strategies

We have compared different PuP selection strategies for their impact on privacy in Section 4.4.3.1. We now compare them for their impact on ride sharing. We measure occupancy (Figure 4.23(a)) and the number of vehicles reduced (Figure 4.23(b)) in the network after ride sharing. We select a PuP placement for guaranteed k-anonymity of 20000 from the Pareto front solutions (Figure 4.21(a)) and run the experiments for the different strategies. In this PuP placement coverage is not one, and there are passengers who do not have a PuP within their MPD. As a result, the occupancy is higher for strategies in which the passengers travel outside their MPD to reach a PuP, as this increases the number of passengers in the system looking for ride sharing. As expected, the occupancy is highest when the passengers negotiate using all of the possible PuPs within their MPD (Figure 4.23(a)). However, the privacy is lowest for this strategy if the adversary is aware of all the passenger PuP choices (Figure 4.19(a)). Occupancy is comparable for the other PuP selection strategies; however, there is a trade-off with k-anonymity and walkability.

Figure 4.23: Ride Sharing Occupancy for PuP Selection Strategies. Panels: (a) Occupancy, (b) Number of Vehicles Reduced (20000 Vehicles without Ride Sharing). Strategies compared: No PuP Selection, Nearest, Nearest (within MPD), Any 2 closest, Any 2 closest (within MPD), Random (within MPD), Max Trip Offering (within MPD), All (within MPD).

We evaluate the average vehicle travel distance per traveler in the system after ride sharing using the different strategies (Figure 4.24). The total distance traveled is the sum of the distance traveled by the drivers (including the longer distance traveled by the ride sharing drivers) and the distance traveled independently by the passengers who do not get a ride (assuming they take the shortest path to their destination). The total number of travelers in our simulations is 20000 (5000 drivers and 15000 passengers). As expected, the strategies with higher occupancy rates have lower average vehicle distance, as the travelers who ride share reduce the overall distance added to the trip. For example, if three passengers traveling 10 km each travel with a driver traveling 15 km after ride share, then the average travel distance for the 4 travelers is 11.25 km ((10 + 10 + 10 + 15)/4). The average vehicle distance without any ride sharing is 12.266 km for our simulation. This shows that ride sharing with PuP selection saves between 23% and 40% (average of 31.5%) of vehicle km if drivers are willing to take slight detours and passengers are willing to travel to the pick up points. In the city of Melbourne, with 11.6 million trips a weekday and an average trip length of 10.2 km, this would save 35.952 million km per weekday.

Figure 4.24: Average Vehicle Travel Distance per Traveler (km) for each PuP selection strategy: No PuP Selection, Nearest, Nearest (within MPD), Any 2 closest, Any 2 closest (within MPD), Random (within MPD), Max Trip Offering (within MPD), All (within MPD).

4.4.7 PuP Ranking within Certain Distance of POI

In our experiments we set the POIs as the train stations in Melbourne. The selected PuP has to be within 1 km of a train station. This means that every passenger has a backup option to take a train in case the passenger is not able to negotiate a shared ride. Figure 4.25 presents the PuPs limited by their distance to the train station network and one of the selected PuP placements. We vary the k-anonymity of the candidate PuPs from 10000 to 20000 for our experiments. As expected, the choice of PuPs limits the coverage, and only partial coverage is achieved (Figure 4.26(a)), since some suburbs do not have a selectable PuP within their MPD (Figure 4.25). Figure 4.26(b) reports the average k-anonymity achieved with the nearest PuP selection strategy.

177


Figure 4.25: Melbourne Train Network and Selected PuP Placement

(a) Coverage vs Number of PuPs [plot. x-axis: # PuPs, 50 to 200; y-axis: Coverage, 0.75 to 1; series: k = 10000, k = 15000, k = 20000]

(b) k-anonymity (Mean) vs Number of PuPs [plot. x-axis: # PuPs, 50 to 200; y-axis: k Anonymity, 6000 to 18000; series: k = 10000, k = 15000, k = 20000]

Figure 4.26: Coverage and k-Anonymity (Mean) for PuP Selection within 1 km of Train Stations



4.5 Related Work

4.5.1 Coverage

The coverage problem in road networks is related to the geometrical disk covering problem, where we need to find the smallest number of disks of a given radius $R$ required to cover an area. It is a well known result that the triangular arrangement, in which a circle of radius $R$ is placed at every vertex of an equilateral triangle of side $X = R\sqrt{3}$, achieves optimal 1-coverage. Kershner solved the optimal 1-coverage problem in 1939 [130].
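The triangular arrangement can be checked numerically. The following is a minimal sketch, not part of the thesis experiments: the function names `triangular_lattice` and `max_gap`, the rectangular test area and the sample count are all illustrative assumptions. It places disk centres on a triangular lattice of side $R\sqrt{3}$ and samples random points to confirm that none is farther than $R$ from its nearest centre.

```python
import math
import random


def triangular_lattice(width, height, radius):
    """Disk centres on a triangular lattice of side radius * sqrt(3).

    The lattice is extended one cell beyond the rectangle [0, width] x [0, height]
    so that points near the border are also covered.
    """
    side = radius * math.sqrt(3)
    row_height = side * math.sqrt(3) / 2  # height of an equilateral triangle
    centres = []
    row = 0
    y = -row_height
    while y <= height + row_height:
        # Every second row is shifted by half a side.
        offset = side / 2 if row % 2 else 0.0
        x = -side + offset
        while x <= width + side:
            centres.append((x, y))
            x += side
        y += row_height
        row += 1
    return centres


def max_gap(centres, width, height, samples=2000, seed=0):
    """Largest distance from a sampled point to its nearest disk centre."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        px, py = rng.uniform(0, width), rng.uniform(0, height)
        nearest = min(math.hypot(px - cx, py - cy) for cx, cy in centres)
        worst = max(worst, nearest)
    return worst


centres = triangular_lattice(10.0, 10.0, radius=1.0)
gap = max_gap(centres, 10.0, 10.0)
print(gap <= 1.0 + 1e-9)
```

The check relies on the fact that the circumradius of an equilateral triangle of side $R\sqrt{3}$ is exactly $R$, so the worst-placed point (the centre of a triangle) is at distance $R$ from the three surrounding disk centres.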

In our coverage model an individual is considered 1-covered if there is a PuP within the maximum pick up point distance, otherwise the partial coverage is computed. The goal is to select the minimum number of PuPs that maximize the coverage (1-coverage if possible) with certain k-anonymity guarantees. The additional constraints in our model are:

• Road Network Limitations: A PuP cannot be placed on any point in 2-D space but the placement has to be chosen from a selected set of possible locations.

• Privacy: The PuP placement takes care of user privacy in terms of k-anonymity by using different travel radii for different suburbs based on their population density. Hence, coverage means different travel distances for different suburbs.

• Network Load: The selected PuPs need to be changed periodically to balance the road network traffic load. Fixed PuPs can be potential traffic bottlenecks as the drivers and passengers are being directed to them for ride sharing.

• Ride Sharing: Further, the best placement depends on the passenger PuP selection strategy, as the PuP selected for ride sharing might not be the nearest but a random one or the best (with most drivers covering that PuP) possible candidate for ride sharing.

The aim is to select the minimum number of PuPs that maximize the coverage (1-coverage if possible) with certain k-anonymity guarantees while achieving optimal ride sharing.

4.5.2 Public Transport Access Coverage

Public transport access coverage is the opportunity for potential riders to get to a transit service within an acceptable (usually 400 m [136]) walking distance. The conflicting objectives of improving public transport service quality while maintaining access coverage are one of the problems faced by public transport planners. Service quality can be measured as travel time, efficiency, reliability, safety et cetera [137, 138]. Therefore, public transport stop placement has to be such that its efficiency is not much behind private vehicle speeds while maintaining access to public transport. Fewer stops would increase speed but decrease access coverage.

In our problem we need to place ride sharing PuPs such that we can achieve optimal coverage for efficient ride sharing. The service quality is measured using the ride sharing occupancy and travel time. However, in our problem too many PuPs not only impact ride sharing service quality negatively but also reduce privacy. Also, unlike public transport access, our coverage distance depends on the population density of the suburb, since having the same coverage everywhere would not help increase ride sharing rates. Further, privacy is a concern in public transport scheduling, placement, arrival prediction and data collection only if the usage data is published or compromised, since it is being collected by a central trusted authority and not shared with other users [139–141].

4.5.3 Sensor Networks

The coverage based deployment models are widely used in sensor networks to measure the quality of service of a sensor network. The goal is to ensure that every point of interest is within the sensing range of at least one sensor. The sensor coverage problem has been studied and reviewed in many surveys [142]. The applications of sensor networks include fire detection, intrusion detection, battlefield surveillance and industrial monitoring.

In sensor networks the stationary nodes can be placed deterministically [143] or in bulk at random positions [144, 145]. However, PuP placement for ride sharing is a dynamic problem where points need to be selected optimally from a fixed set of points and changed for load balancing or different requirements of privacy and coverage. Replacing a point or adding a new point to the existing set of PuPs has implications in terms of infrastructure (such as CCTV cameras) developed around the PuP. Further, sensor nodes have energy, cost and power consumption constraints which are not a concern for PuP selection.

Some applications like intrusion detection require more than 1-coverage to achieve a higher degree of accuracy [146, 147] or for added reliability (due to sensor errors or failures) [148]. A higher degree of coverage known as k-coverage is desirable for such applications, where every point of interest is covered by at least k sensors [143]. In our problem, k-coverage would imply loss of location privacy (if everyone selects the nearest PuP) as the location of an individual can be reduced to the intersection of the k PuP regions.

In [149], the authors present the partial coverage concept of object detection for sensor networks. The conflicting objectives of object detection quality and network lifetime are balanced using the notion of partial coverage, where sensing fields are partially sensed by active sensors at any time. In [150], the authors present a partial observability (coverage) model where information about observed information flows is used to identify optimal sensor locations. The aim is to infer unobserved links or the links with lower information flow to determine a partial sensor placement with maximal information flow. Our proposed notion of partial coverage implies that an individual needs to walk longer than the MPD of the starting suburb. A suburb might be partially covered not only due to the conflicting objectives of privacy and coverage but also due to the available candidate PuPs in the road network.

Additionally, our coverage model is built to ensure location privacy of users. Privacy issues in sensor networks arise due to the way data is communicated in the sensor networks, for example in military applications [151]. However, in visual sensor networks such as traffic monitoring the way data is collected and stored also poses privacy issues [152].

4.5.4 Facility Location

The facility location problem consists of a set of potential facility (resource) sites where a facility can be opened, and a set of demand points that must be serviced. The goal is to pick a subset of facilities to open, while minimizing the cost of satisfying the demand points with respect to some set of constraints [153, 154].

The single facility location problem (k-center problem) involves minimizing an objective function such as the Euclidean distance of a new facility from a collection of existing facilities. A typical example of a single facility location problem is placing a new warehouse such that its distance from production facilities and retail outlets is minimized [155].

The maximal covering location problem (MCLP) addresses the problem of locating a predefined number of facilities in order to maximize the number of demand points that can be covered within a specified time or distance [156]. It does not require that all demand points are covered. In our problem we locate PuP locations (facilities) from a given set of candidate PuPs such that the overall distance of PuPs to people living in suburbs is within their MPD. The number of PuPs to be selected is not fixed and the objective can be optimal coverage or guaranteed k-anonymity. If we drop all the privacy, network load and ride sharing assumptions then MCLP is a special case of our problem.
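To make the classical MCLP setting concrete, the following is a minimal sketch of a greedy heuristic for it. It is an illustration under stated assumptions, not the PuP selection method of this thesis and not an exact MCLP solver: the function name `greedy_mclp`, the planar Euclidean distance, the single fixed radius and the toy data are all hypothetical. In each round it opens the candidate facility that covers the largest amount of still-uncovered demand.

```python
import math


def greedy_mclp(candidates, demand, radius, p):
    """Greedy heuristic for the maximal covering location problem.

    candidates: {facility_name: (x, y)}
    demand:     {point_name: (x, y, weight)}
    radius:     covering distance (same units as the coordinates)
    p:          number of facilities to open
    Returns the opened facilities (in selection order) and the covered demand points.
    """
    covered = set()
    chosen = []
    for _ in range(p):
        best, best_gain = None, 0.0
        for name, (cx, cy) in candidates.items():
            if name in chosen:
                continue
            # Weight of demand that this facility would newly cover.
            gain = sum(
                w
                for pid, (x, y, w) in demand.items()
                if pid not in covered and math.hypot(x - cx, y - cy) <= radius
            )
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining facility adds coverage
            break
        chosen.append(best)
        cx, cy = candidates[best]
        covered |= {
            pid
            for pid, (x, y, w) in demand.items()
            if math.hypot(x - cx, y - cy) <= radius
        }
    return chosen, covered


# Hypothetical toy instance: three candidate sites, four weighted demand points.
candidates = {"A": (0, 0), "B": (5, 0), "C": (10, 0)}
demand = {"d1": (0, 1, 100), "d2": (1, 0, 50), "d3": (5, 1, 80), "d4": (10, 1, 30)}
chosen, covered = greedy_mclp(candidates, demand, radius=2, p=2)
print(chosen, sorted(covered))
```

With only two facilities allowed, the demand point `d4` stays uncovered, which is the behaviour MCLP permits. The PuP problem described above differs in that the radius varies per suburb (the MPD) and the number of selected points is not fixed.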

In [157], the authors introduce the concept of intermediate or partial coverage to formulate an MCLP solution. In classical MCLP, a demand point is not covered by a facility if it is not within a specified critical distance. In the partial coverage model a demand point can be fully covered within the minimum critical distance or partially covered up to a maximum critical distance. The partial coverage function is defined as a function of the distance of the demand points from the facilities. In our problem, a point on the map is partially covered if the distance to its nearest PuP is more than the MPD of the suburb. We define partial coverage of a point as a function of its distance from the nearest PuP. However, we do not have a notion of zero coverage or a maximum critical distance.

4.5.5 Urban Computing

In [158], the authors introduce the concept of urban computing and its main challenges. They classify the application areas into seven categories. Our work aligns with the urban planning and transportation systems categories. PuP selection can suggest PuP placements to urban planners according to the population densities of existing and newly planned suburbs. Further, the POI data can be utilized by urban planners to select PuPs which ensure a public transport backup or an alternative entertainment option. Ride sharing with PuP selection helps in utilizing existing transport resources efficiently.



4.6 Summary

Existing research on ride sharing systems has largely focused on the reduction of traffic congestion. Our work is the first to explicitly include privacy and safety in the design of a ride sharing system. Our scheme has established pick up points which allow us to monitor travel pick ups and drop offs, providing safety to individuals, and we achieve high levels of privacy as fixed PuPs ensure spatial k-anonymity to the users. Further, our ride sharing system provides a good balance between privacy and trip cost. Our system combines a reduction in average vehicle travel distance per traveler with significantly lower travel costs through ride sharing. In addition, since traffic congestion is nonlinear, a reduction in the number of vehicles on the road network is expected to reduce travel times significantly during peak hours. Ride sharing with PuPs may be a more viable option than public transport and cabs, in particular for passengers who do not own a private vehicle, alleviating stress on existing public transport. In addition, our holistic solution with PuPs ranked on the basis of their closeness to a public transport mode has tremendous potential to increase the uptake of ride sharing by providing an alternative mode of travel. Our proposed ride sharing with PuP selection saves between 23% and 40% (average of 31.5%) of vehicle km if drivers are willing to take slight detours and passengers are willing to travel to the pick up points. In a city the size of Melbourne, with a population of nearly 4.5 million and an average trip length of 10.2 km, this would save 35.95 million km per weekday.


Chapter 5

Privacy Aware Trajectory Computation for Effective Ride Sharing

In Chapter 3 we presented a dynamic ride sharing system, and in Chapter 4 we enhanced the model by optimally selecting the number and locations of pick up points. Different suburbs have different requirements for the number of PuPs based on their population densities. In our experiments we have generated the rider population according to the population density of suburbs. This is a common assumption since traffic flows typically follow the population densities. However, this might not be true for the entire city traffic and might also change for special circumstances such as organized events. In this chapter, we refine the system to take care of such cases by incorporating the traffic flow information into the selection of PuPs for ride sharing.

Traffic demand data is usually collected in the form of origin destination matrices which contain the spatial and temporal distribution of traffic. The estimation of traffic flows requires additional information in the form of trajectories. Since such aggregate data (in the form of trajectories and origin destination matrices) is not always available, we present a privacy aware model to compute the origin-destination (OD) matrix and the corresponding trajectories.


5.1 Overview

An origin destination (OD) matrix is a “trip table” that displays the number of trips for each origin-destination pair. The objectives of estimating the trip distribution could be to replicate the overall spatial pattern of trips, or to account for the spatial separation among origins and destinations (in terms of time or cost), or to make strategic planning and management decisions for a given transportation network. While the trip distribution reveals the proportion of trips between pre-selected sources and destinations, it does not specify the path taken between them.

New types of traffic monitoring such as GPS traces, mobile phone logs, smart card records and road camera recordings provide much more detailed data, which can help to uniquely identify vehicle trajectories and build OD matrices ([159], [160]). However, long term tracking of vehicles with high spatio-temporal precision can lead to the identification of individual vehicles, which is a threat to location privacy (Section 2.2).

We propose a model that computes a local transition matrix for every path between any two chosen network nodes. The result is a trajectory or path estimation (see Figure 5.4) with the corresponding proportion of trips taken on every trajectory. Every node in the network has a local transition matrix which holds the count of vehicles traveling between its neighbor nodes via itself.

To avoid the privacy threats associated with the identification of vehicles, we introduce re-identification of vehicles locally, only up to three consecutively traveled nodes, to build the matrix. Vehicle re-identification is the process of matching vehicles from one point on the road network to the next without any unique vehicle identifier. This technique preserves privacy as we do not fully identify the vehicle but use the partial identifier locally in the network.

The accuracy of a system that only uses local transition matrices as input is typically not high. It only captures the movement of vehicles for up to three nodes, which can lead to the estimation of false paths whenever there is a continuous flow of traffic between two end nodes. To address this we propose the concept of k-anonymous l-grouping, where l-grouping is the average number of groups (classes) into which vehicles can be labeled by the system to build the local transition matrices.

There are several vehicle re-identification techniques, which use ordinary video cameras [161], magnetic sensors [162], or loop detectors [163] to compute the vehicle signature and use that to re-identify the vehicle. In our model vehicles are re-identified and labelled into object classes using the vehicle signature. This signature can be formed using vehicle attributes such as color, type, speed, brand, length or even a partial RFID of the vehicle.

k-anonymity [3] is a privacy protection model where the information of each individual cannot be distinguished from that of at least k−1 other individuals whose information also appears in the system (Section 2.1.1). We record the number of vehicles coming from every source at the destinations of the respective vehicle trajectories. Our model enables k-anonymity for the vehicle drivers as an individual vehicle cannot be singled out within a group of at least k vehicles. The k-anonymity of the system is a function of the number of vehicles and the l-grouping.

We utilize this aggregate information to enhance ride sharing by incorporating it in the optimal PuP selection process. The origins and destinations represent a partition of the map at some granularity. Since trip patterns differ at different times of day and between weekends and weekdays, we have accommodated the trip information in the map to find PuPs which are more effective for that time. The trip information is encoded using the available OD matrix such that in every partition (suburb) it is marked as the location of the destination/source moving from/to it. The objective is to group travel to similar destinations (sources) together within a PuP coverage, such that it can provide a higher chance of ride sharing with people going to similar destinations who are picked up collectively by drivers.

In this work we make the following contributions:

• We develop a technique using local transition matrices to compute the traffic flows without the need of a prior sample or target OD matrix, as is required by a number of Bayesian [164], Generalized Least Squares [165] and Maximum Likelihood [166] based methods.

• Our method preserves the privacy of the vehicle owners by using only partial information to build the local data structures. We introduce the concept of k-anonymous l-grouping. Our technique is also efficient in terms of communication cost since the information is only stored and communicated locally.

• Traditionally, root mean square error (RMSE) ([167], [159]) is used for evaluation, but we argue that RMSE is an insufficient measure for OD matrices as it produces artificially low errors. We introduce the well known statistical measures of precision and recall, along with RMSE, to measure the performance of our technique.

• We demonstrate the results by extensive experiments on real as well as synthetic road networks.

To the best of our knowledge, our approach is the first to locally use partial vehicle data for local data structures and to build a privacy preserving model for trajectory and OD matrix estimation.

5.2 Challenges

One way to monitor traffic is to keep track of each vehicle completely using its full identifier (ID) (for example, vehicle registration number or RFID) at every node of its trip. Monitoring each vehicle for the whole trip using the full ID to create the OD matrix or trajectory information will result in 100% accuracy. However, this data can be used to track an individual’s location, which could lead to a risk of location based attacks (e.g. threats to personal safety, location based unsolicited marketing [21]) and compromise privacy. Also, the communication cost between nodes and the amount of data stored at every node increase significantly with the increase in traffic and in the trajectory length of every vehicle.

One approach to preserve privacy is to maintain the security of the data collected by using restricted access. However, the communication and data maintenance costs are significant for such a model. Consider a road network G = (N, E) with X vehicles and an average trajectory length Y. If K is the average number of bytes transmitted per node and L is the average number of bytes transmitted per vehicle, then the communication cost is the number of bytes transmitted to a central server and the data maintenance cost is the number of bytes stored at the nodes:

Communication cost = O(X · Y · K · L)

Number of bytes stored = O(X · Y · L)

To estimate the trajectories and create an OD matrix we do not need to gather such precise information for every vehicle. Our approach preserves privacy by gathering only partial information about the vehicles. A partial ID of the vehicle can be created which is used to track the vehicle across the network. However, if the same partial ID is used across the system it becomes a persistent ID, which in turn can also be used to identify a vehicle uniquely.

To overcome the challenge of persistent tracking we propose the transformation of the partial ID after every node. This implies that every node re-identifies the vehicle using the partial ID received and then transforms it to avoid persistent tracking.

To reduce the communication costs we propose to re-identify the vehicles locally, only up to three consecutive nodes in the system. Every node communicates the transformed partial ID of the vehicle seen only to its immediate single hop neighbours. The information maintained at every node is the number of vehicles travelling via itself between its neighbouring nodes.

Our local re-identification technique preserves privacy and reduces communication costs but alone cannot achieve high accuracy levels within the system. We propose two techniques to improve the accuracy of the system while preserving privacy and maintaining lower communication costs. The two techniques are l-grouping and k-anonymity, which we discuss in detail in the next section.


In this section we describe a model for road networks and explain the basic concepts and operations of our k-anonymous l-grouping model.

5.3.1 Road Network Model

Definition 5.3.1. Road Network Graph. A road network graph is a directed graph $G = (N, E)$ where $N = \{n_1, n_2, \ldots, n_m\}$ is the set of nodes and $E = \{e_1, e_2, \ldots, e_n\}$ the set of connecting directed edges. The nodes represent the intersections of a real road network and the edges are the connecting road segments associated with the corresponding pair of nodes.

Every node is represented by its longitude and latitude. Each edge in $E$ is an ordered pair $(n_i, n_j)$ of nodes in $N$:

$$e = (n_i, n_j) \mid n_i, n_j \in N$$



Each edge belongs to an edge class which has its corresponding maximum speed limit, length and capacity.

Two nodes are adjacent or neighbours if there is an edge connecting them. The neighbours of a node $n_i$ are the set of nodes which have a connecting edge with $n_i$. We can distinguish between incoming neighbours and outgoing neighbours according to the direction of the connecting edge.

$$\text{in-neighbours}(n_i) = \{n_j \in N \mid e = (n_j, n_i) \in E\}$$

$$\text{out-neighbours}(n_i) = \{n_j \in N \mid e = (n_i, n_j) \in E\}$$
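The definitions above translate directly into a small data structure. The following is a minimal sketch, assuming node identifiers are strings; the class name `RoadNetwork`, its method names and the example coordinates are illustrative and not taken from the thesis.

```python
from collections import defaultdict


class RoadNetwork:
    """Directed road network graph G = (N, E) with in- and out-neighbour lookup."""

    def __init__(self):
        self.coords = {}               # node -> (longitude, latitude)
        self._out = defaultdict(set)   # node -> out-neighbours
        self._in = defaultdict(set)    # node -> in-neighbours

    def add_node(self, node, lon, lat):
        self.coords[node] = (lon, lat)

    def add_edge(self, ni, nj):
        """Add the directed edge e = (ni, nj); both endpoints must be in N."""
        if ni not in self.coords or nj not in self.coords:
            raise KeyError("both endpoints must be nodes of the network")
        self._out[ni].add(nj)
        self._in[nj].add(ni)

    def in_neighbours(self, node):
        return set(self._in.get(node, ()))

    def out_neighbours(self, node):
        return set(self._out.get(node, ()))


# Hypothetical five-node intersection: traffic enters n5 from n1 and n3
# and leaves towards n2 and n4 (coordinates are made up).
net = RoadNetwork()
for i, name in enumerate(["n1", "n2", "n3", "n4", "n5"]):
    net.add_node(name, 144.9 + 0.01 * i, -37.8)
for ni, nj in [("n1", "n5"), ("n3", "n5"), ("n5", "n2"), ("n5", "n4")]:
    net.add_edge(ni, nj)

print(sorted(net.in_neighbours("n5")), sorted(net.out_neighbours("n5")))
```

Keeping both adjacency maps makes the in-degree and out-degree of a node available in constant time, which is what fixes the dimensions of a per-node matrix indexed by in-neighbours and out-neighbours.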

A trajectory $T$ is a path, i.e., a sequence of nodes $n_{i_0}, n_{i_1}, \ldots, n_{i_k}$ such that there is an edge connecting $n_{i_j}$ and $n_{i_{j+1}}$ for $j = 0, 1, \ldots, k-1$ and $n_{i_j}, n_{i_{j+1}} \in N$. The length of the trajectory $T$ is the sum of the lengths of the edges in the trajectory.

5.3.2 Vehicle Re-identification and Local Transition Matrix

To estimate the trajectories and the OD matrix the full ID for every vehicle is not required. Our approach preserves privacy by gathering only partial vehicle data.

Vehicle re-identification is the process of matching vehicles from one point on the road network to the next without identifying the vehicle uniquely. We use vehicle re-identification to build the local transition matrices.

Each node has its associated list of incoming and outgoing neighbours and a localtransition matrix.

Definition 5.3.2. Local Transition Matrix. The matrix $M^{n_k}$ for a node $n_k$ is an in-degree($n_k$) × out-degree($n_k$) matrix. The rows and columns represent the in-neighbours and out-neighbours of the node $n_k$ respectively. Every entry $M^{n_k}_{ij}$ in the matrix is the number of vehicles transitioning from node $n_i$ to $n_j$ via the node $n_k$ (Figure 5.4).

A local transition matrix records the turning data of vehicles at every intersection. This data reduces the large search space of possible trajectories between a pair of nodes to a set of possible flows. To build the local transition matrix each vehicle is re-identified for up to three consecutive travelled nodes. Re-identifying the vehicle for the full trip without changing the identifier at every node could be used to track individual vehicles and result in re-identification attacks [21]. To overcome the challenge of persistent tracking, we propose the transformation of the partial ID after every node. Every node needs to verify the partial ID received to build the local transition matrix and transform it before broadcasting to its neighbours to achieve privacy.

[Figure: road network with nodes n1, n2, n3, n4 and n5, shown with the local transition matrix of n5 (rows: n1, n3; columns: n2, n4)]

$$M^{n_5} = \begin{array}{c|cc} & n_2 & n_4 \\ \hline n_1 & 0 & 3 \\ n_3 & 2 & 0 \end{array}$$

Figure 5.1: Local Transition Matrix $M^{n_5}$
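The counts held in a local transition matrix can be illustrated with a short sketch. This is an illustration only: in the proposed system no node ever sees a full trajectory, and the counts are obtained through local re-identification. Here full paths are used purely as test input to show which three-node windows contribute to which matrix. The function name `local_transition_matrices` and the example paths are assumptions.

```python
from collections import defaultdict


def local_transition_matrices(trajectories):
    """Count, for every node, the vehicles passing from an in-neighbour to an out-neighbour.

    trajectories: iterable of node sequences (one per vehicle).
    Returns {via_node: {(from_node, to_node): count}}.
    """
    matrices = defaultdict(lambda: defaultdict(int))
    for path in trajectories:
        # Slide a window of three consecutive nodes over the path.
        for prev_node, via, next_node in zip(path, path[1:], path[2:]):
            matrices[via][(prev_node, next_node)] += 1
    # Convert to plain dicts so that missing entries are not created on lookup.
    return {via: dict(counts) for via, counts in matrices.items()}


# Hypothetical traffic matching the counts of Figure 5.1:
# three vehicles travel n1 -> n5 -> n4 and two travel n3 -> n5 -> n2.
paths = [["n1", "n5", "n4"]] * 3 + [["n3", "n5", "n2"]] * 2
m = local_transition_matrices(paths)
print(m["n5"])
```

Each entry depends only on a node, one in-neighbour and one out-neighbour, so the storage per node is bounded by in-degree × out-degree regardless of how long the vehicle trips are.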

5.3.3 k-Anonymous l-Grouping

Every vehicle with the same partial ID is considered to be equivalent and part of the same group or class. We now introduce l-grouping, which is a bound on the number of groups formed in the system. It is a global measure.

Definition 5.3.3. l-grouping. l is the average number of groups (classes) into whichvehicles can be labelled by the system.

Higher values of l imply more groups and finer labelling of vehicles, which leads to higher accuracy in re-identifying vehicles. If l is large enough to identify every vehicle, then the highest accuracy is achieved. Selecting l provides a trade-off between accuracy and privacy. For our system, k-anonymity requires the grouping of vehicles such that each vehicle cannot be distinguished from at least k−1 other vehicles.

We use two approaches to determine k-anonymity. We do not keep any data of the classes for the trajectories except at the source and destination. As a result, once a vehicle leaves the source its anonymity is the system anonymity, which is the number of vehicles in the system of the same class (label). We also compute anonymity at the trajectory boundary (source and destination). This depends on the number of vehicles of the same class at the observation intersection (source node) where the vehicle is seen for the first time. A system’s boundary anonymity depends on the number of vehicles in the system and the amount of traffic observed at the source and destination of the trajectory.

Drivers are more concerned about their trajectory privacy. The system anonymity is an upper bound for their trajectory privacy. Given k trajectories, it ensures that there are at least k−1 trajectories with the same label. Boundary privacy is a lower bound, because we only use the end nodes of a trajectory to compute the k-anonymity level. In practice the actual trajectory privacy is higher as the system does not record class data for the nodes in between.

k-anonymous l-grouping. System k-anonymity is a function of the number of vehicles and l, i.e., k = f(#vehicles, l). For example, if there are two nodes in the network, where node n1 classifies the vehicles into 5 groups and node n2 classifies them into 10 groups, then l = 7.5 is the average.
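The two quantities can be computed as follows. This is a minimal sketch under stated assumptions: `l_grouping` follows the averaging in the example above, while `system_anonymity` takes the smallest class size as the system-wide k, which is one plausible reading of "the number of vehicles in the system of the same class" and is not a formula given in the text. Both function names and the colour labels are hypothetical.

```python
from collections import Counter


def l_grouping(groups_per_node):
    """Average number of groups (classes) formed per node."""
    return sum(groups_per_node.values()) / len(groups_per_node)


def system_anonymity(labels):
    """Smallest class size among the observed vehicle labels.

    Assumption: with this k, every vehicle shares its label with at least
    k - 1 other vehicles in the system.
    """
    return min(Counter(labels).values())


# The two-node example from the text: 5 groups at n1 and 10 groups at n2.
print(l_grouping({"n1": 5, "n2": 10}))  # 7.5

# Hypothetical labels of seven vehicles currently in the system.
print(system_anonymity(["red"] * 4 + ["blue"] * 3))  # 3
```

With a fixed number of vehicles, increasing l spreads the vehicles over more classes and so tends to lower k, which is the accuracy and privacy trade-off described above.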

5.4 Implementation Techniques

We now discuss the implementation techniques required to achieve k-anonymous l-grouping.

5.4.1 Local Transition Matrix

To build the local transition matrix each vehicle is re-identified for up to three consecutive nodes travelled. Re-identifying the vehicle for the full trip without changing the identifier at every node could be used to track individual vehicles and result in re-identification attacks [21].

Our model proposes re-identification of vehicles only up to three consecutive nodes. The reasons to re-identify only up to three consecutive nodes are:

• This information is sufficient for the purpose of OD matrix computation.

• The communication cost between the nodes is minimal.

• The model is privacy preserving.

• The information storage requirements at every node are small (in-degree × out-degree).

[Figure: road network with nodes A, B, C, D and E; a vehicle is broadcast as (A, Label-X) and then as (B, Label-Y)]

Figure 5.2: Local Vehicle Re-identification

Vehicle re-identification with the same partial ID is not privacy preserving as it acts like a persistent ID going through the network. To overcome the challenge of persistent tracking, we propose the transformation of the partial ID after every node. Every node needs to verify the partial ID received to build the local transition matrix and transform it before broadcasting it to its neighbours to achieve privacy. For every incoming vehicle, the two tasks performed by a node $n_j$ are:

• Verify: Verify the incoming vehicle’s signature as broadcast by the sending node ($n_s$) and acknowledge (to $n_s$).

• Reclassify: Recompute the vehicle signature and broadcast it to the out-neighbours of $n_j$.

Every time a node detects a vehicle for the first time it increases its source count by one and broadcasts the identifier of the vehicle to its out-neighbours (Figure 5.2). Node A detects a vehicle for the first time, identifies the vehicle as X and updates its source count by 1. Node B identifies incoming vehicle X and sends an acknowledgement to node A, which now updates its local transition matrix. Node B transforms the partial ID of the vehicle from X to Y and broadcasts it to its out-neighbours. Node D acknowledges Y to B, which helps B build its local transition matrix by incrementing the count for $M^B_{AD}$ by one.

There are various techniques that we can use to achieve vehicle re-identification and build the local transition matrices. We now explain two such techniques.

5.4.1.1 Vehicle Re-identification using Vehicle Image Processing

We first discuss a potential technique using Vehicle Image Processing (VIP). A VIP based system uses video or still cameras to capture images of each passing vehicle, and extracts individual vehicle information including vehicle length, width, color, and sometimes, vehicle registration numbers. The extracted vehicle features from both upstream and downstream detection stations are then compared with each other to find the best matches. There is a central look-up table which maps every attribute (colour, type, brand, length etc.) to an attribute ID. Every network intersection (node) has a pre-selected attribute which it uses to label the vehicle. When the node finds a new vehicle it broadcasts to its out-neighbours a tuple of three elements: (Source ID, Attribute ID, Label).

• Verify: The receiving node checks the incoming vehicles for the attribute ID it received. If the label matches, it acknowledges the source and uses this information to build its local transition matrix.

• Reclassify: The receiving node (RID) has its own attribute (AID2) which is different from that of the source node. It uses this attribute to label the same vehicle and broadcasts to its out-neighbours (RID, AID2, Label).

A similar mechanism can be applied to vehicle registration numbers as well by selecting partial information. Only a few selected parts of the registration numbers are broadcast. The attribute ID then also contains information about which parts are selected by the source to label the vehicle.

5.4.1.2 Vehicle Re-identification using RFID Tags

The second technique uses RFID tags to re-identify the vehicles. Many applications in transportation networks use RFID sensors (for example, Electronic Toll Collection (ETC) systems and vehicle identification traffic monitoring systems). Xie et al. introduced the use of partial RFID bits to re-identify a vehicle for answering aggregate queries required by traffic monitoring systems [168].

In our system, every node reads the RFID and sends only a few selected bits as the label of the vehicle. In this technique there is a central look-up table which has the source ID and its corresponding bit selections. This table is available to every node in the network. When the node finds a new vehicle it broadcasts to its neighbours two values: (RFID Bits Selected, Label).

• Verify: The adjacent node checks the RFID bits mentioned in the source packet for every incoming vehicle. If the label matches, it acknowledges the source (using the look-up table to find the source) and uses this information to build its local transition matrix.

• Reclassify: The receiving node has its own RFID bit selections which are different from those of the source node. It uses these bits to label the same vehicle and broadcasts to its neighbours (RFID Bits, Label).
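The verify and reclassify steps for the RFID technique can be sketched as below. This is a minimal illustration under stated assumptions: RFIDs are modelled as integers, a packet is the pair (selected bit positions, label), and the function names, the 10-bit RFID and the bit selections of nodes A and B are all hypothetical. The acknowledgement to the source and the central look-up table are left out.

```python
def label_from_bits(rfid, bit_positions):
    """Partial label: the values of the selected bits (position 0 = least significant)."""
    return tuple((rfid >> pos) & 1 for pos in bit_positions)


def verify(rfid_seen, packet):
    """Check whether an incoming vehicle matches the packet broadcast by the source."""
    bit_positions, label = packet
    return label_from_bits(rfid_seen, bit_positions) == label


def reclassify(rfid_seen, own_bit_positions):
    """Relabel the vehicle with this node's own bit selection before broadcasting."""
    return (own_bit_positions, label_from_bits(rfid_seen, own_bit_positions))


# Hypothetical bit selections for two adjacent nodes.
NODE_A_BITS = (0, 3, 5, 9)
NODE_B_BITS = (1, 2, 7, 8)

vehicle = 0b1011010110
packet_from_a = reclassify(vehicle, NODE_A_BITS)   # broadcast by node A

# Node B verifies the incoming vehicle and then relabels it with its own bits.
if verify(vehicle, packet_from_a):
    packet_from_b = reclassify(vehicle, NODE_B_BITS)

# A vehicle that differs only in a bit A did not select also passes verification.
lookalike = vehicle ^ (1 << 4)
print(verify(lookalike, packet_from_a))  # True
```

The look-alike case shows both sides of the design: vehicles sharing the selected bits are indistinguishable, which gives anonymity, and for the same reason the matching is probabilistic. Because node B uses a different bit selection, the label broadcast by B cannot be linked to the label broadcast by A without reading the tag again.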

5.4.1.3 Discussion

These re-identification techniques preserve the privacy of the vehicle owners by using partial information but are probabilistic and introduce some inaccuracy.

Every node has its corresponding attribute selection which defines a set of classes into which the vehicles are grouped by that node. So the number of groups formed by the node depends on the attribute selected.

For example, in the Vehicle Image Processing technique, if the attribute is colour and the node can identify up to 8 different colours as the classes, then the number of groups formed at this node is 8. The overall number of groups for the system is then the average of the number of groups at every node of the network, which is the l-grouping.

For the RFID technique, the number of groups depends on the number of bits selected by the nodes. If all the nodes select 4 bits out of the RFID, then every node forms 2^4 = 16 groups and the l-grouping value for the system is 16.



[Figure: chain of nodes M, N, O, P, Q, R with traffic flows labelled 3v, 4v and 2v]

Figure 5.3: False Path: A false path between node M and R could be detected because of the continuous flow of traffic between the nodes

5.4.2 k-Anonymity

Trajectory estimation using the local transition matrices is done by capturing the movement of vehicles up to three nodes. This partial tracking can lead to the estimation of false paths whenever there is a continuous flow of traffic between the two end nodes. As shown in Figure 5.3, 3 vehicles are travelling from node M to node O, 4 vehicles from node O to node Q and 2 vehicles from node P to R. The local transition matrix at node N is $M^{N}_{MO} = 3$ and at node Q is $M^{Q}_{PR} = 2$. Since there is no additional information about sources and destinations, the trajectory $(M \to N \to O \dashrightarrow P \to Q \to R)$ with 2 vehicles can be detected. Although this trajectory represents traffic flow, it is a false path for OD matrix estimation. To improve the accuracy of our system we propose to keep additional information at the source and destination nodes of every vehicle’s trip. This information can be:

• The destination node storing the number of vehicles coming from every source.

• Partial Vehicle information stored at the source and destination nodes.

Two techniques to improve the accuracy by keeping additional information at the source and destination nodes are described below.

There are different types of vehicular communications and networking architectures, including Vehicle to Vehicle (V2V), Vehicle to Infrastructure (V2I, e.g. cellular) and Vehicle to Roadside (V2R, e.g. ITS infrastructure).

V2I is privacy invasive, as the system knows the full identity of the vehicle, and the communication costs are also high in this architecture. V2V is not very reliable in sparse traffic, as there could be cases where there is no vehicle close enough to exchange information. Therefore, we propose V2R communication between vehicles and roadside infrastructure at the source and destination nodes of the vehicle’s path. The source can label the vehicle with its own ID, which is then read by the destination to store the number of vehicles that the destination node has seen coming from the particular source. Hence, every destination node maintains a table of source IDs and the corresponding number of vehicles that arrived from each source.

We maintain the source table only at the destination nodes, with simple count information for every source. Hence, we cannot distinguish a vehicle from at least k−1 other vehicles in the source group. The global measure of k-anonymous privacy is computed by taking the average minimum bound for the source-destination counts for the system.
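A minimal sketch of the first technique is given below. The class and function names are hypothetical, and the reading of "average minimum bound" as the average, over destination nodes, of the smallest per-source count is an assumption made for illustration.

```python
from collections import Counter
from typing import Iterable

class DestinationNode:
    """Roadside unit at a trip end that only counts arrivals per source label."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.source_counts: Counter = Counter()

    def record_arrival(self, source_id: str) -> None:
        # Only the source ID written on the vehicle is read; no vehicle identity is stored.
        self.source_counts[source_id] += 1

def system_k(destinations: Iterable[DestinationNode]) -> float:
    """Assumed global measure: mean of each destination's smallest source count."""
    minima = [min(d.source_counts.values()) for d in destinations if d.source_counts]
    return sum(minima) / len(minima) if minima else 0.0

d1, d2 = DestinationNode("F"), DestinationNode("E")
for src in ["A", "A", "A", "B", "B", "B", "B", "B"]:
    d1.record_arrival(src)
for src in ["A"] * 7:
    d2.record_arrival(src)
print(system_k([d1, d2]))  # (3 + 7) / 2
```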

The second technique is to maintain a partial RFID of the vehicle at both the source and the destination. Every node in the network will have a source RFID list and a destination RFID list. The same bits of the RFID are stored across the system. The number of vehicles between two selected nodes will be the intersection of the RFID source list and the RFID destination list at the selected end nodes. The number of bits selected will decide the degree of anonymity of the vehicle owners.
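The list intersection of the second technique can be sketched as follows. Because several vehicles may share the same partial RFID, the sketch uses a multiset intersection; that choice, like the function name, is an assumption rather than something the text specifies.

```python
from collections import Counter
from typing import Iterable

def vehicles_between(source_list: Iterable[int], destination_list: Iterable[int]) -> int:
    """Count partial RFIDs present in both lists, respecting multiplicity."""
    common = Counter(source_list) & Counter(destination_list)  # element-wise minimum of counts
    return sum(common.values())

# Partial RFIDs (the same selected bits everywhere) recorded at the two end nodes.
source_rfids = [3, 3, 5, 9]
destination_rfids = [3, 5, 5, 7]
print(vehicles_between(source_rfids, destination_rfids))
```

With fewer selected bits more vehicles collide on the same partial RFID, which raises anonymity but can overestimate the count.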

5.5 Trajectory Estimation and OD Matrix Estimation

5.5.1 Trajectory Estimation

Our method counts the number of matching vehicles between a selected source and destination using the local transition matrix of every node. We compute the trajectories between any two nodes in the network and the corresponding vehicle count for each trajectory reported.

In Algorithm 5.1 we start from the destination node and keep computing the different trajectories until we reach the source or the path ends. The local transition matrix provides the vehicle count for each link. We keep the minimum vehicle count on the path as the number of vehicles on the trajectory.

The path could end because:

• We reach a connection where the number of vehicles is zero

• The path length computed so far is longer than twice the fastest path length (computed using Dijkstra)

Algorithm 5.1 Trajectory Estimation
input : Source Node n_s, Destination Node n_d, FastestPath(Source, Destination) - quickest path found by Dijkstra between the two nodes, Fastest Path Length is the sum of the length of the edges in the path, NodesObjectClassesMatch - the number of matching vehicles between selected source and destination
output: Trajectories List TL, Count Array CA (number of vehicles on every trajectory found)

    if NodesObjectClassesMatch(n_s, n_d) > 0 then
        for ∀ n_adj in adj(n_d) do
1           create new trajectory t(n_adj, n_d)
2           check local transition matrix M^adj for destination n_d (M^adj_{n_d}, the column vector for n_d)
            for every ith source entry (n_si) in the column vector M^adj_{n_d} do
                if M^adj_{n_si,n_d} > 0 then
                    count = min(count, M^adj_{n_si,n_d})
                    append n_si to trajectory t
                    if count == 0 then
                        remove last added node from t
                        continue
                    if n_si == n_s then
                        sourceCount = n_s.getSourceCount
                        count = min(count, sourceCount)
                        if length(t) > 2 ∗ length(fastest_path(n_s, n_d)) then
                            remove last added node from t
                            continue
                        Add trajectory t to Trajectory list TL, and count to CA
                    set n_d = n_adj
                    set n_adj = n_si
                    GOTO Line 2

Finally, we report the number of vehicles on the trajectory as the minimum of the matching vehicles at the source and destination and the vehicle count computed on the trajectory.

We now illustrate trajectory estimation with the help of an example (Figure 5.4). Suppose the trajectories need to be computed between nodes A (source) and F (destination). In the example shown in Figure 5.4, we have one vehicle between node A and F on the trajectory (A, K, J, F).

We start from destination node F and check the local transition matrix at the adjacent node J for destination F (the column vector $M^{J}_{F}$).

$$M^{J}_{F} = \begin{pmatrix} K & 2 \\ G & 0 \end{pmatrix}$$

This shows two vehicles on the K-J-F link. We now follow this and look at the matrix at K ($M^{K}_{J}$) for intermediate destination J.

$$M^{K}_{J} = \begin{pmatrix} A & 1 \\ B & 1 \\ C & 0 \end{pmatrix}$$

This shows one vehicle on the A-K-J link. Since we have reached the required source, we report the trajectory formed as $(A \to K \to J \to F)$ with the number of vehicles as $\min(M^{K}_{AJ}, M^{J}_{KF})$, which is one. The path on the B-K-J link is rejected because B is a boundary node with no other in-links.
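The backward trace in this example can be sketched in a few lines. This is a simplified recursive illustration, not Algorithm 5.1 itself: the dictionary layout, the function name, the cycle check and the `max_edges` cut-off (standing in for "twice the fastest path length") are assumptions, and the counts are those shown in Figure 5.4.

```python
from typing import Dict, List, Tuple

# local[node][(prev, nxt)] = vehicles observed travelling prev -> node -> nxt.
LocalMatrices = Dict[str, Dict[Tuple[str, str], int]]

def estimate_trajectories(local: LocalMatrices, source: str, dest: str,
                          matching: int, max_edges: int) -> List[Tuple[List[str], int]]:
    """Trace trajectories backwards from dest until source is reached.

    matching  -- vehicles matched between source and dest (upper bound on any count)
    max_edges -- longest trajectory (in edges) that is still followed
    """
    found: List[Tuple[List[str], int]] = []
    if matching <= 0:
        return found

    def extend(path: List[str], count: int) -> None:
        # path is in travel order and ends at dest; prepending one node
        # would give a trajectory with len(path) edges.
        if len(path) > max_edges:
            return
        node, nxt = path[0], path[1]
        for (prev, to), vehicles in local.get(node, {}).items():
            # Skip other destinations, empty links and cycles (the cycle check is an added safeguard).
            if to != nxt or vehicles <= 0 or prev in path:
                continue
            new_count = min(count, vehicles)
            if prev == source:
                found.append(([prev] + path, new_count))
            else:
                extend([prev] + path, new_count)

    for adj, matrix in local.items():
        if any(to == dest and v > 0 for (_, to), v in matrix.items()):
            extend([adj, dest], matching)
    return found

LOCAL: LocalMatrices = {
    "J": {("K", "I"): 0, ("K", "F"): 2, ("G", "I"): 1, ("G", "F"): 0},
    "K": {("A", "I"): 0, ("A", "J"): 1, ("B", "I"): 0, ("B", "J"): 1,
          ("C", "I"): 1, ("C", "J"): 0},
    "I": {("K", "D"): 0, ("K", "E"): 1, ("J", "D"): 1, ("J", "E"): 0},
}

print(estimate_trajectories(LOCAL, "A", "F", matching=1, max_edges=6))
```

The B-K-J branch is explored but yields nothing, because B has no local transition matrix (no in-links), which mirrors the rejection described in the example.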

5.5.2 OD Matrix Determination

The OD matrix computation needs to invoke the trajectory estimation (Algorithm 5.1) for every pair of nodes selected. The OD matrix just needs counts, so the vehicle counts can be summed up for all the trajectories reported between the required source and destination. For example (Figure 5.4), computation of the OD matrix row for node A will require calls to Algorithm 5.1 with the source node A and destination nodes B, C, D, E, F, G, H, I, J, K. Each call will return the trajectories between the pair of nodes, whose counts are summed to compute the OD matrix row (Table 5.1).

[Figure: example road network with single-vehicle (1V) flows and the local transition matrices of nodes I, K and J; rows are source entries, columns are destinations]

M^I      D  E
K        0  1
J        1  0

M^K      I  J
A        0  1
B        0  1
C        1  0

M^J      I  F
K        0  2
G        1  0

Figure 5.4: Trajectory Estimation between A (source) and F (destination)

Table 5.1: OD Matrix Row for Source Node A

Origin   Destination
         A  B  C  D  E  F  G  H  I  J  K
A        0  0  0  0  0  1  0  0  0  1  1
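The row computation can be sketched as a sum over the trajectories returned for each destination. The function name and the `REPORTED` dictionary are hypothetical: the A to F trajectory comes from the worked example, while the A to J and A to K entries are illustrative stand-ins chosen only so that the result reproduces Table 5.1.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Trajectory = Tuple[List[str], int]  # (node sequence, vehicle count)

def od_row(source: str, nodes: Sequence[str],
           trajectories_between: Callable[[str, str], List[Trajectory]]) -> Dict[str, int]:
    """One OD matrix row: sum the vehicle counts of all trajectories per destination."""
    row: Dict[str, int] = {}
    for dest in nodes:
        if dest == source:
            row[dest] = 0
            continue
        row[dest] = sum(count for _, count in trajectories_between(source, dest))
    return row

# Stand-in for calls to the trajectory estimation (hypothetical results, see lead-in).
REPORTED: Dict[Tuple[str, str], List[Trajectory]] = {
    ("A", "F"): [(["A", "K", "J", "F"], 1)],
    ("A", "J"): [(["A", "K", "J"], 1)],
    ("A", "K"): [(["A", "K"], 1)],
}

def lookup(source: str, dest: str) -> List[Trajectory]:
    return REPORTED.get((source, dest), [])

NODES = list("ABCDEFGHIJK")
row_a = od_row("A", NODES, lookup)
print([row_a[n] for n in NODES])
```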

5.6 Aggregate Trip Information based Ride Sharing Solution

The coverage based solution selects more PuPs from the densely populated areas and aims to maximize coverage. This optimization would maximize coverage while minimizing the number of PuPs, but it might not achieve effective ride sharing, as the ride sharing needs at the PuP placements might differ at different hours. Different PuPs might be popular at different times of the day, or under different traffic patterns such as weekday/weekend.

This solution is based on aggregate information of travel stored in the format of origin destination (OD) matrices. The origins and destinations represent a partition of the map at some granularity. Since the trip patterns differ between times of day and between weekdays and weekends, we have accommodated the trip information in the map to find PuPs which are more effective for that time.

The trip information is encoded using the available OD matrix such that in every partition (suburb) the trip is marked as the location of the destination/source moving from/to it. OD(suburb_i, suburb_j) is the vehicle flow from suburb i to suburb j. A possible set of riders R (passengers and drivers) is now computed using this OD matrix to populate the suburbs. For every generated rider r_i, there is a start suburb s_i and a corresponding destination suburb d_i. The start is identified in the destination suburb at d_i and marked with the suburb id of the source, and vice versa for the destination. Every suburb contains a set of source/destination points marked with the id of the corresponding destination/source. The objective is to group travel to similar destinations (sources) together within a PuP catchment, such that it can provide a higher chance of ride sharing with people going to similar destinations who are picked up collectively by drivers. The cost function in Algorithm 4.5 computes the number of rider points covered by the PuP who travel to similar destinations.

The map is considered covered (mapCovered function in Algorithm 4.5) if all the rider points (both sources and destinations) are within the catchment area of at least one selected PuP.
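The coverage test and the grouping idea behind the cost function can be sketched as below. This is not Algorithm 4.5, which is not reproduced in this chapter: the circular catchment with Euclidean distance, the function names, and the choice to score a PuP by its largest group of points sharing the same tagged suburb are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
RiderPoint = Tuple[Point, str]  # (location, id of the suburb at the other end of the trip)

def map_covered(rider_points: Sequence[RiderPoint], pups: Sequence[Point], radius: float) -> bool:
    """True if every rider point lies within the catchment of at least one selected PuP."""
    return all(any(math.dist(p, pup) <= radius for pup in pups) for p, _ in rider_points)

def pup_cost(pup: Point, rider_points: Sequence[RiderPoint], radius: float) -> int:
    """Assumed score: size of the largest group in the catchment tagged with the same suburb."""
    tags = Counter(tag for p, tag in rider_points if math.dist(p, pup) <= radius)
    return max(tags.values()) if tags else 0

riders: List[RiderPoint] = [((0, 0), "S1"), ((1, 0), "S1"), ((0, 1), "S2"), ((10, 10), "S1")]
print(map_covered(riders, [(0, 0)], radius=2))            # the far point is not covered
print(map_covered(riders, [(0, 0), (10, 9)], radius=2))   # a second PuP covers it
print(pup_cost((0, 0), riders, radius=2))                 # two riders share suburb S1
```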

5.7 Experimental Evaluation

5.7.1 Dataset

We use Brinkhoff’s network-based data generator [169] for the simulation of traffic. The underlying road network is generated from real road networks and the simulator populates them with moving objects or vehicles.

The experimentation has been done on the real road networks of the city of Oldenburg, Germany and Melbourne CBD, Australia. To control the network properties we also generate synthetic square grid road networks with the number of nodes varying from 400 to 10000. The Melbourne CBD, Australia map is generated using Open Street Map (www.openstreetmap.org). Table 5.2 displays the number of nodes and edges in these road networks.



Network         Nodes   Edges   Structure
Oldenburg       6105    7035    Loops (Highways & Freeways)
Melbourne CBD   1408    1645    Grid
Grid            1600    3120    Grid

Table 5.2: Road Networks
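The node and edge counts of the synthetic grids are consistent with an n × n square lattice in which each node is linked only to its horizontal and vertical neighbours. That lattice structure is an assumption about the generator, and the function name is hypothetical; the sketch simply checks the arithmetic.

```python
def grid_edges(n: int) -> int:
    """Edges of an n x n lattice: n rows and n columns, each contributing n - 1 links."""
    return 2 * n * (n - 1)

for side in (20, 40, 100):
    print(side * side, grid_edges(side))  # nodes, edges
```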

5.7.2 Measurements

Root Mean Square Error (RMSE) for OD matrix estimation is defined as

$$\mathrm{RMSE}(OD, \widehat{OD}) = \sqrt{\frac{\sum_{i=1}^{X} \sum_{j=1}^{Y} \left(M_{ij} - \hat{M}_{ij}\right)^{2}}{X \times Y}} \qquad (5.1)$$

$OD$ is the original OD matrix $M$ with X source and Y destination nodes. $\widehat{OD}$ is the corresponding OD matrix $\hat{M}$ estimated by the model. RMSE produces artificially low errors for OD matrix estimation and is unable to capture the deviation of the model from reality. Unlike previous approaches based on RMSE alone, we use the well known statistical measures of precision and recall, along with RMSE, to validate the performance of our model.

For OD matrices there are four possible cases:

                     Predicted Value (= 0)          Predicted Value (= 1)
Actual Value (= 0)   True Negative (TN)             False Positive (FP)
                     (Correct absence of result)    (Unexpected result)
Actual Value (= 1)   False Negative (FN)            True Positive (TP)
                     (Missing result)               (Correct presence of result)

Table 5.3: Measurements

Precision, or the positive predictive value, is the proportion of true positives among all the positive results (positive trajectory counts estimated). It is a critical measure of the performance of our model, as it reflects the percentage of positive predictions that are correct.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (5.2)$$



Accuracy is the proportion of correct results (both true positives and true negatives) in the estimation. It is the percentage of the predictions that are correct.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (5.3)$$

Recall, or the true positive rate, is the proportion of actual positives that are reported by the model. It reflects the percentage of positive test cases identified.

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (5.4)$$

In our model, the number of False Negatives reported is zero for all cases, so we only report the precision and accuracy for our experiments.
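The measures above can be computed from an original and an estimated OD matrix as sketched below. The sketch assumes that a cell counts as "positive" whenever its vehicle count is non-zero (the binary reading suggested by Table 5.3) and uses the standard root-mean-square form for RMSE; the function names and the small 3 × 3 matrices are illustrative only.

```python
import math
from typing import List, Tuple

Matrix = List[List[int]]

def rmse(actual: Matrix, estimated: Matrix) -> float:
    """Root of the mean squared cell-wise difference."""
    sq = [(a - e) ** 2 for ra, re_ in zip(actual, estimated) for a, e in zip(ra, re_)]
    return math.sqrt(sum(sq) / len(sq))

def confusion(actual: Matrix, estimated: Matrix) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating any non-zero count as a positive."""
    tp = fp = tn = fn = 0
    for ra, re_ in zip(actual, estimated):
        for a, e in zip(ra, re_):
            if e > 0 and a > 0:
                tp += 1
            elif e > 0:
                fp += 1
            elif a > 0:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0

def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    return (tp + tn) / (tp + fp + tn + fn)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0

actual = [[0, 2, 0], [0, 0, 1], [0, 0, 0]]
estimated = [[0, 2, 1], [0, 0, 1], [0, 0, 0]]  # one false path reported
tp, fp, tn, fn = confusion(actual, estimated)
print(rmse(actual, estimated), precision(tp, fp), accuracy(tp, fp, tn, fn), recall(tp, fn))
```

In this toy case a single false path costs a third of the precision, while the many empty cells keep the accuracy high.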

5.7.3 Experiments

We have different parameters in our model which can be changed depending on the levels of accuracy and privacy desired.

Exp.   # Vehicles   l-grouping   Road Network                                    Trajectory Length
1      5−500        50           Oldenburg, Grid G=(1600,3120), Melbourne CBD    Variable
2      500          50           Grid (Nodes 400−10000)                          Variable
3      250          5−250        Oldenburg, Grid G=(1600,3120), Melbourne CBD    Variable
4      500          50           Grid G=(10000,19800)                            4−50

Table 5.4: Experiment Settings

5.7.3.1 Effect of Number of Vehicles

First, we evaluate the impact of increasing the number of vehicles on a road network. The first row of Table 5.4 shows the experimental setup. We increase the number of vehicles from 5 to 500 on the three road networks.

[Plot: RMSE (y-axis, 0 to 0.03) against # Vehicles (x-axis, 50 to 500) for Melbourne CBD, Oldenburg and Grid G = (1600,3120)]

Figure 5.5: RMSE vs Number of Vehicles

As expected, we observe a decrease in precision and accuracy with the increase in the number of vehicles (Figure 5.6). Three factors impact the performance:

• l-grouping: l can be computed as a percentage of the number of vehicles. The larger this percentage, the better the performance, since there are more groups and finer labelling of vehicles at every node.

• Network size: A larger number of nodes and edges can decrease the overlap of the trajectories for the same number of vehicles. Since overlapping trajectories lead to an increased estimation of false positives, larger networks perform better than smaller road networks.

• Structure of the Road Network: The Oldenburg road network has highways and loops, whereas the other two road networks are grids.

In line with previous approaches, we report the RMSE error measure for all three road networks: the range of RMSE values is relatively small for all three road networks (Figure 5.5). Since RMSE is inversely related to the network size, Oldenburg has the lowest RMSE values. Figure 5.5 shows that RMSE produces unrealistically low error values, so we do not report RMSE results in the following experiments.

In Figure 5.6, Melbourne CBD has the highest precision levels due to its small size and grid like structure. When the number of vehicles increases beyond 300, the precision falls sharply, as l is then less than 20% of the number of vehicles.

Oldenburg has the lowest precision, i.e., the highest number of false positives. However, due to its large network size, the number of true negatives reported for Oldenburg is much higher than for the grid network. As a result, the accuracy of the grid network is worse than that of Oldenburg (Figure 5.6). Accuracy is a holistic measure of performance of the model.

[Plot: Performance (y-axis, 0 to 100) against # Vehicles (x-axis, 50 to 500); series: Precision and Accuracy for Melbourne CBD, Oldenburg and Grid G = (1600,3120)]

Figure 5.6: Performance vs Number of Vehicles

[Plot: Precision and Accuracy (y-axis, 50 to 110) against # Nodes (x-axis, 0 to 10000)]

Figure 5.7: Performance vs Network Size

5.7.3.2 Effect of Increasing the Network Size

We now evaluate the effect of network size on the performance of the model. The second row of Table 5.4 shows the experimental setup. Since the network size and structure can be controlled for the synthetic grid network, we do our experiments for a square grid with size varying from G(400,760) to G(10000,19800). The number of vehicles and l remain constant at 500 and 50 respectively.

Since the network structure and the value of l remain the same, the only influencing factor is the network size. With the increase in network size the overlap of trajectories decreases and the precision increases. The increasing network size also implies that the true negatives (Table 5.3) will increase and therefore result in an increase in accuracy (Equation 5.3). As expected, both the precision and the accuracy increase with the increase in network size (Figure 5.7).

5.7.3.3 Effect of Increasing the l-Grouping

We now evaluate the effect of l-grouping on the performance of the model. The third row of Table 5.4 shows the experimental setup. We increase the value of l from 5 to 250 on the three road networks (Table 5.2).

Higher values of l imply more groups and finer labeling of vehicles, leading to higher accuracy in re-identifying vehicles. If the l-grouping value is high enough to identify each vehicle uniquely, then it is the same as full identification and leads to the highest accuracy achievable. As expected, the accuracy and precision increase with the increase of l (Figure 5.8(b)).

The accuracy is above 95% for l ≥ 25 for all the networks (Figure 5.8(b)). The precision values go above 95% for l > 50 for all the road networks (Figure 5.8(a)). Therefore, we can conclude that l ≥ 20% of the number of vehicles is sufficient for high (above 95%) accuracy and precision for most road networks.

5.7.3.4 Effect of Increasing the Trajectory Length

The trajectories created by the generator are essentially of random length. The average length of the trajectories affects the performance of the model, as longer trajectories will lead to a higher degree of overlap between trajectories.

In this experiment, we evaluate the effect of increasing the average trajectory length on the model performance. The fourth row of Table 5.4 shows the experimental setup. Since we want to increase the trajectory length up to 50, we need a large road network. We create a square grid G(10000,19800) of 100 nodes in each direction for these experiments.

In the real world, the length of the trajectories depends on factors such as the road network, the time of the observations and special events.

Longer trajectories imply a higher degree of overlap between them and a larger number of false paths reported. As expected, we observe a decrease in the precision with the increase in the trajectory length (Figure 5.9). The accuracy remains high because of the large network size.

[Plot (a): Precision (y-axis, 0 to 120) against l-grouping (x-axis, 0 to 250) for Melbourne CBD, Oldenburg and Grid G = (1600,3120)]

(a) Precision vs l-Grouping

[Plot (b): Accuracy (y-axis, 0 to 120) against l-grouping (x-axis, 0 to 250) for Melbourne CBD, Oldenburg and Grid G = (1600,3120)]

(b) Accuracy vs l-Grouping

Figure 5.8: Effect of l-Grouping

[Plot: Precision and Accuracy (y-axis, 82 to 102) against Trajectory Length (x-axis, 0 to 50)]

Figure 5.9: Performance vs Trajectory Length

5.7.3.5 k-Anonymity

We discuss the effectiveness of our approach for privacy protection. k-anonymity of the vehicle drivers depends on: (i) the number of vehicles: if the number of vehicles is very small, the achievable k is small; (ii) l-grouping: lower values of l imply fewer classes of vehicles and thus higher privacy; (iii) the number of nodes in the observed network, or the degree of sparseness. Sparseness is the ratio of the total nodes in the original road network to the total nodes in the observed road network. k-anonymity depends on the degree of sparseness of the observed road network. If we take the full road network with re-identification at every node, then we will observe higher infrastructure costs and lower values of k. k-anonymity is lower for a denser network, because the number of vehicles leaving smaller intersections is very small, i.e., there are smaller numbers of vehicles for the formed groups.

In a real road network only a subset of the nodes face heavy traffic in any one time interval. We only need to monitor those major intersections and build the corresponding sparse network graph.

As explained in Section 5.3.3, we evaluate the System k-anonymity and Boundary k-anonymity. For a group size l = 10 and 100 vehicles, the maximum achievable k value is 10. However, there are other factors that could reduce k-anonymity levels, e.g., there could be a group where the number of observed vehicles is only 5. We measure the system and boundary k to the corresponding most prominent value in the data set (Figures 5.10 and 5.11). As expected, higher values of the lower bound on privacy can be achieved in exchange for lower precision and accuracy levels. We observe that the overall (system) achievable k-anonymity is high (7 in Figure 5.10), whereas the lower bound is relatively low. However, the lower bound of k-anonymity applies only to the user trajectory ends (source & destination), while the k-anonymity levels are still high in between.

[Plot: k-anonymity (y-axis, 0 to 16) for each Vehicle Class ID (Label) (x-axis, 1 to 10)]

Figure 5.10: System k-Anonymity, k ≥ 7 with Confidence = 80%

[Plot: k-anonymity (y-axis, 0 to 12) against Performance (x-axis, 70 to 90); series: Accuracy (Confidence Level = 50%), Accuracy (Confidence Level = 60%), Accuracy (Confidence Level = 80%), Precision (Confidence Level = 50%)]

Figure 5.11: Boundary k-Anonymity vs Performance

Figure 5.12: Ride Sharing Occupancy for PuP Selection with Traffic Flow vs Optimal Coverage Solution
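The relation between the maximum achievable k and a k reported at a confidence level can be sketched as follows. The reading of "k with confidence c" as the largest k reached by at least a fraction c of the vehicle classes is an assumption made for illustration, and the group sizes are invented toy values, not data from the experiments.

```python
import math
from typing import Sequence

def max_achievable_k(n_vehicles: int, l: int) -> int:
    """Upper bound on k when n_vehicles are spread evenly over l groups."""
    return n_vehicles // l

def k_at_confidence(group_sizes: Sequence[int], confidence: float) -> int:
    """Largest k such that at least `confidence` of the groups hold k or more vehicles (assumed reading)."""
    ordered = sorted(group_sizes, reverse=True)
    needed = max(1, math.ceil(confidence * len(ordered) - 1e-9))  # tolerance guards float rounding
    return ordered[needed - 1]

# Hypothetical vehicle counts for l = 10 classes and 100 vehicles in total.
sizes = [15, 13, 12, 11, 10, 10, 9, 7, 8, 5]
print(max_achievable_k(100, 10))      # even split
print(k_at_confidence(sizes, 0.8))    # k reached by 80% of the classes
print(k_at_confidence(sizes, 1.0))    # strict lower bound over all classes
```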

5.7.3.6 PuP Selection with Traffic Flow Information

In our experiments, we limit the OD matrix to 8 suburbs with traffic flows. The PuPs which have higher chances of ride sharing, by being able to group a higher number of individuals traveling collectively, are ranked higher. We vary the k-anonymity of the candidate PuPs from 10000 to 20000 for our experiments and compare the solutions with the optimal coverage solution based on population density (Section 4.4.3). In our experiments, we assume a tight driver ellipse (DE = 1.1) with limited spare time. As expected (Figure 5.12), when the map is 1-covered (in the case of k = 10000), the ride sharing occupancy does not improve with additional information, since the passengers always have a nearby alternate PuP choice and there is not much detour cost added to the driver trip. However, in the case of partial coverage (Figure 5.12), the PuP selection has a higher impact on occupancy, since a PuP selection not aligned with the direction of travel might require the driver to make a longer detour, which makes the trip infeasible and reduces occupancy rates.



5.8 Summary

We have examined the problem of trajectory and OD matrix estimation in a privacy preserving manner. We introduced local re-identification to create local transition matrices for every node of the network. We proposed k-anonymous l-grouping to achieve a trade-off between privacy and accuracy. We have given an algorithm that determines all trajectories between a pair of network nodes and reports the corresponding vehicle counts for them. We used the statistical measures of precision and accuracy to evaluate our model, which we have shown to be more appropriate than RMSE. Our method shows high levels of accuracy and precision achievable for different values of l and k on 3 different road networks.

We emphasize that the reported false positives represent a continuous traffic flow between two nodes but not necessarily a single path. We observe that two trajectories can conform to one larger trajectory, leading to a false path for OD matrix estimation purposes. There are applications (like the positioning of speed cameras) where the estimation of traffic flows is required, and in those cases we do not need to eliminate them. The proposed PuP selection solution with aggregate traffic flow data incorporation improves ride sharing occupancy, specifically in the case of high k-anonymity requirements, when fewer PuPs are selected and each selected PuP has a higher impact in terms of the detour required by drivers.


Chapter 6

Conclusion and Future Work

In this thesis, we have developed the next-generation privacy preserving dynamic ride sharing system that enables users to offer and share one time rides instantly or on short notice. In our first part (Chapter 3), we have presented a privacy aware dynamic ride sharing system and established that it is feasible to combine privacy with convenience while maintaining utility; rather, our system enhances opportunities for ride sharing. We have proposed the Match Maker model, which is a centralized trusted service provider based model for ride sharing with limited user location information. To ensure location privacy, this model does not require the users to disclose their exact location and time information to the service provider for finding a shared ride. In the second part (Chapter 4), we have presented a model to scale up the Match Maker model by choosing optimal pick up points for ride sharing. We have presented models for optimal pick up point selection to enhance ride sharing such that passengers can be picked up collectively by drivers and passengers can have privacy guarantees. Finally, in the third part (Chapter 5), we have managed to produce useful data to design the selection of pick up points such that ride sharing can be designed to take care of special events and traffic flows which do not follow the population density of suburbs. Our privacy preserving origin destination matrix and trajectory estimation research provides us with aggregate data which has improved ride sharing by selecting pick up points according to different traffic flows. We have validated the efficiency and effectiveness of our work through extensive experiments. In Section 6.1, Section 6.2 and Section 6.3, we summarize our privacy aware system for dynamic ride sharing. In Section 6.4, we give an outlook for future directions that can enhance dynamic ride sharing opportunities.

6.1 Privacy Aware Dynamic Ride Sharing

In the first part of this thesis, we have developed a model for dynamic ride sharing that combines negotiation and obfuscation to ensure privacy to the system participants. In Chapter 3, we have proposed the Match Maker model for ride sharing, in which users submit trip intents (obfuscated location information) to the service provider to find potential matches. Our approach eliminates the need for the user to disclose their exact location information to the centralized trusted party and, more importantly, users even limit the information shared with potential candidates, refining their information for only those who are still candidates in the current negotiation round. We have compared the Match Maker model with the standard eBay model (where the central service provider knows the exact user location and time information) in terms of privacy by discussing the possible attack scenarios, and established that the proposed Match Maker model provides stronger privacy.

We have developed an efficient recursive ellipse based dynamic programming algorithm to generate the optimal path for the driver. Since the TTP is not aware of the exact location information, the optimal path computation is done locally at every driver’s end. We have evaluated the algorithm in terms of space and time complexity and proved that the algorithm is efficient if the number of candidates a user is communicating with is bounded. Our experimental studies confirm that bounding the number of potential candidates to 10 controls the number of computations while maintaining occupancy rates and addressing privacy concerns for the drivers and passengers. In addition, our rating mechanism incorporates feedback, penalizes failed commitments and omits biased one time feedback to improve reliability and trustworthiness in the system.

In our experiments, we have measured the impact of different ride sharing conditions such as user density, driver to passenger ratios and driver spare time on occupancy and overall travel time. Our results establish that ride sharing is particularly beneficial during rush hours when the traffic load is high. Our proposed ride sharing model saves between 9−21% (on average 12%) of vehicle km if drivers are only prepared to accept slight detours of their usual trips. In the city of Melbourne, with 11.6 million trips a weekday and an average trip length of 10.2 km, this would save 14.2 million km per weekday.
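The weekday saving quoted above follows from the average saving of 12%, under the simplifying assumption that the total distance travelled is the number of trips multiplied by the average trip length:

```latex
\[
  11.6 \times 10^{6}\ \text{trips} \times 10.2\ \text{km} = 118.32 \times 10^{6}\ \text{km},
  \qquad
  0.12 \times 118.32 \times 10^{6}\ \text{km} \approx 14.2 \times 10^{6}\ \text{km per weekday}.
\]
```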



6.2 Optimal Pick up Point Selection for Effective Ride Sharing

In the second part of this thesis, we have developed a model to select the optimal placement and number of pick up points to enhance ride sharing. In Chapter 3, we highlighted the privacy benefits of established PuPs and selected PuPs randomly from the set of arterial intersections, according to the population density of every suburb. In Chapter 4, we have developed a model for optimal PuP selection that aims to maximize k-anonymity and coverage for the users, while enhancing ride sharing.

We have developed three different solutions based on algorithms tailored using the greedy randomized adaptive search procedure. The solutions differ in their goals: optimize coverage, guarantee k-anonymity, and rank PuPs within a certain distance of POIs. Our holistic solution, with PuPs ranked on the basis of their closeness to a public transport mode, has a tremendous potential to increase the uptake of ride sharing by providing an alternative mode of travel.

In our experiments, we have measured the impact of the different solutions for ride sharing. Our proposed ride sharing with PuP selection saves between 23-40% (average of 31.5%) of vehicle km if drivers are willing to take slight detours and passengers are willing to travel to the pick up points. In a city of the size of Melbourne, with a population of nearly 4.5 million and an average trip length of 10.2 km, this would save 35.95 million km per weekday.

6.3 Privacy Aware Trajectory Computation for Effective Ride Sharing

In the third part of this thesis, we have developed an approach to determine traffic flows (in the form of trajectories and OD matrices) in a privacy preserving manner and established its utility for ride sharing. In Chapter 4, we enhanced the ride sharing model presented in Chapter 3 by optimally selecting the number and locations of pick up points. However, the PuP selection in those solutions is based on the population densities of suburbs; a better selection is possible when the actual traffic flows do not follow the population density, or in the case of special events. In Chapter 5, we have developed a privacy aware method to generate aggregate traffic flow data and presented its utility for ride sharing.

We have introduced the concept of local re-identification to create local transition matrices for every node of the network, such that vehicles are tracked locally only up to three consecutively traveled nodes. This technique preserves privacy, as we do not fully identify the vehicle but use the partial identifier locally in the network. We proposed k-anonymous l-grouping to achieve a trade-off between privacy and accuracy. The proposed algorithm determines all trajectories between a pair of network nodes and reports the corresponding vehicle counts for them. We used the statistical measures of precision and accuracy to evaluate our model, which we have shown to be more appropriate than RMSE.

In our experiments, we have evaluated the performance of our model on 3 different types of road networks. Our method shows that for a group size l = 10 and 100 vehicles, the maximum achievable k-anonymity is 10. We have measured the impact of varying the network parameters, such as the number of vehicles, network size and trajectory length, on the achievable accuracy and precision of the estimated flows. The proposed PuP selection solution with aggregate traffic flow data incorporation improves ride sharing occupancy, specifically in the case of high k-anonymity requirements, when fewer PuPs are selected and each selected PuP has a higher impact in terms of the detour required by drivers.

6.4 Future Work

In the future, we consider the following extensions to our work.

6.4.1 Dynamic Partial Ride Sharing

In this thesis, we have considered inclusive ride sharing (Section 2.5.1). Therefore, ride sharing only offers trips to passengers whose source and destination both lie within the driver’s trip. In the future, we plan to extend our model to consider partial ride sharing, where either the source or the destination of the passenger trip is not part of the driver trip. This would increase the opportunities for ride sharing and allow passengers to combine multiple partial trips for their rides if the drivers cannot pick the passengers up at their sources or drop them at their final destinations.

6.4.2 Monetary Model for Multiple Passengers Sharing Rides

Another important future direction is to investigate a monetary model which can compute the trip cost depending on the number of passengers sharing a trip. Passengers might not want to pay the full fare in case of ride sharing; however, if passengers share rides with a different number of people at different times during the trip, then the trip cost needs to be shared proportionally. Sharing the trip should be an economic incentive for the passenger, which should increase in proportion to the change in the passenger’s trip time or travel distance.
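
As a rough illustration of proportional cost sharing, the sketch below splits the cost of each trip segment equally among the passengers on board during that segment. The equal per-segment split, the function name `split_trip_cost` and the example fares are assumptions made purely for illustration; the actual monetary model is left open here as future work.

```python
def split_trip_cost(segments):
    """Return each passenger's share of the total trip cost.

    `segments` is a list of (segment_cost, passengers_on_board) pairs.
    Assumption: a segment's cost is divided equally among its riders.
    """
    shares = {}
    for cost, riders in segments:
        if not riders:
            continue  # an empty leg is borne by the driver in this sketch
        per_rider = cost / len(riders)
        for rider in riders:
            shares[rider] = shares.get(rider, 0.0) + per_rider
    return shares


# Hypothetical trip: p1 rides the whole way, p2 and p3 join later.
segments = [
    (6.0, ["p1"]),
    (6.0, ["p1", "p2"]),
    (9.0, ["p1", "p2", "p3"]),
]
shares = split_trip_cost(segments)
```

In this example p1 pays for the first segment alone and then shares the later segments, so the shares add up to the total trip cost while each passenger pays less than the solo fare for the segments they share.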

6.4.3 Impact of Uncertainty due to Congestion and Other Events

Our work can be extended to deal with uncertainties in travel time. We aim to investigate congestion-based traffic routing to balance the network load once the ride sharing negotiations are done. Another important evaluation would be to measure the delays in different traffic conditions and study the number of passengers and drivers whose time constraints could not be met. Precise models of traffic congestion prediction based on real and up-to-date data can be integrated with our system to model uncertainties. For example, the driver ellipse computation is based on the driver’s spare time left after considering the travel time required to travel between the driver’s source and destination. Therefore, the ellipse size could vary quite substantially depending on the traffic conditions, which could lead to different amounts of spare time to pick up passengers. Although our algorithm for optimal path computation does not require any major modifications, the ellipse diameter might need changes. One approach is to over-provision the travel time for the worst case such that the ellipse always ensures appropriate travel time for the driver; however, this would have a detrimental effect on ride sharing, as the ellipse would be very tight and would not present sufficient opportunities for ride sharing. Determination of the right ellipse size trade-off is a challenge that needs to be further studied.
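
The sensitivity of the ellipse to traffic conditions can be sketched as follows. The sketch assumes straight-line (Euclidean) distances and a single constant travel speed, which are simplifications introduced only for illustration; the function name and the numbers are hypothetical. A point lies inside the driver ellipse when the detour through it, from source to point to destination, fits within the distance the driver can cover in the available time.

```python
import math


def inside_driver_ellipse(source, destination, point, time_budget, speed):
    """True if a detour via `point` fits within the driver's time budget.

    The ellipse has its foci at `source` and `destination`; its major
    axis equals the distance reachable in `time_budget` at `speed`.
    Assumption: Euclidean distances and one constant speed.
    """
    reach = speed * time_budget
    detour = math.dist(source, point) + math.dist(point, destination)
    return detour <= reach


source, destination = (0.0, 0.0), (10.0, 0.0)
pickup = (5.0, 3.0)

# Same 15-minute budget under two assumed average speeds (km per minute).
free_flow = inside_driver_ellipse(source, destination, pickup, 15, 1.0)
congested = inside_driver_ellipse(source, destination, pickup, 15, 0.7)
```

Under the free-flow speed the pick up point falls inside the ellipse, whereas under the congested speed the same point falls outside. Over-provisioning for the worst case therefore shrinks the set of candidate pick up points.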

6.4.4 Integrated Public Private Ride Sharing System

In the future, we aim to build a unified system which combines public transport, taxi services and ride sourcing services such as Uber with dynamic ride sharing. Since the selected pick up points are public, other existing services such as taxi, Uber and Lyft can also utilize these points for the collective pick up of individuals. Such a system would provide travel guarantees such that passengers who cannot find a matching ride within a specified time have alternative travel options. A system which offers partial rides can also take advantage of this unified model by suggesting partial rides and alternative travel options within the users’ time and cost constraints to ensure feasibility of the requested trips. The system goal would therefore be to ensure the feasibility of trips while maximizing ride sharing.

6.4.5 Highly Dynamic Ride Sharing System

A major challenge for the future is to update the system such that, if a new user joins, the system is able to revise commitments which have already been made in favor of a better trip (lower cost or travel time) made possible by the addition of this new user. The research question which needs to be investigated is whether trip commitments can be evaluated and updated using only the obfuscated location information available to the central service provider, so that user location privacy can still be maintained.

Bibliography

[1] Luke Vandezande. What are the 10 busiest highways in the US? AutoGuide.com, May 2014. URL http://www.autoguide.com/auto-news/2014/05/10-busiest-highways-us.html. [Online; Accessed 18-April-2016].

[2] Department of Infrastructure and Regional Development. Traffic and congestion cost trends for Australian capital cities. Bureau of Infrastructure, Transport and Regional Economics, Australian Government, November 2015. ISBN 978-1-925216-99-8. URL http://bitre.gov.au/publications/2015/is_074.aspx. [Online; Accessed 14-April-2016].

[3] L. Sweeney. k-Anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557–570, 2002.

[4] O. Abul, F. Bonchi, and M. Nanni. Never walk alone: Uncertainty for anonymity in moving objects databases. In IEEE 24th International Conference on Data Engineering, 2008, pages 376–385, Cancun, Mexico, April 2008.

[5] Rui Chen, Benjamin C. M. Fung, and Bipin C. Desai. Differentially private trajectory data publication. CoRR, abs/1112.2020, 2011.

[6] Chris Loader. Trends in car ownership. Charting Transport, April 2016. URL https://chartingtransport.com/2011/08/07/trends-in-car-ownership/. [Online; Accessed 14-April-2016].

[7] Chris Loader. What’s happening with car occupancy? Charting Transport, April 2016. URL https://chartingtransport.com/tag/car-occupancy/. [Online; Accessed 14-April-2016].

[8] Erik Ferguson. The rise and fall of the American carpool: 1970–1990. Transportation, 24(4):349–376, 1997.

[9] Infrastructure Australia. Australian infrastructure audit report executive summary. Australian Government, May 2015. URL http://infrastructureaustralia.gov.au/policy-publications/publications/files/Australian-Infrastructure-Audit-Executive-Summary.pdf. [Online; Accessed 14-April-2016].

[10] Christopher Bodeen. China’s first-ever smog ’red alert’ closes schools, restricts factories and traffic. Canadian Manufacturing, December 2015. URL http://www.canadianmanufacturing.com/sustainability/chinas-first-ever-smog-red-alert-closes-schools-restricts-factories-and-traffic-158760/. [Online; Accessed 14-April-2016].

[11] Express News Service. Curbing pollution: This is how the odd-even formula works in Delhi. The Indian Express, December 2015. URL http://indianexpress.com/article/cities/delhi/in-delhi-odd-numbered-cars-to-run-on-monday-wednesday-and-friday/. [Online; Accessed 14-April-2016].

[12] Niels Agatz, Alan Erera, Martin Savelsbergh, and Xing Wang. Optimization for dynamic ride-sharing: A review. European Journal of Operational Research, 223(2):295–303, 2012.

[13] Arjun Kharpal. Sydney to Paris: Uber’s 5 biggest issues right now. CNBC, December 2014. URL http://www.cnbc.com/id/102268278#. [Online; Accessed 7-January-2015].

[14] Ben Smith. Uber executive suggests digging up dirt on journalists. BuzzFeed News, November 2014. URL http://www.buzzfeed.com/bensmith/uber-executive-suggests-digging-up-dirt-on-journalists#.vbR2wylpG. [Online; Accessed 7-January-2015].

[15] Anand Jatin. Delhi government asks Centre to ban Uber, Ola, Taxi for Sure apps. The Hindu, June 2015. URL http://www.thehindu.com/news/cities/Delhi/delhi-govt-asks-centre-to-ban-uber-ola-taxi-for-sure-apps/article7031897.ece. [Online; Accessed 25-July-2015].

[16] John Kell and Smith Geoffrey. Berlin bans Uber app, citing passenger safety concerns. Fortune, August 2014. URL http://fortune.com/2014/08/14/uber-berlin-band/. [Online; Accessed 25-July-2015].

[17] Masabumi Furuhata, Maged Dessouky, Fernando Ordóñez, Marc-Etienne Brunet, Xiaoqing Wang, and Sven Koenig. Ridesharing: The state-of-the-art and future directions. Transportation Research Part B: Methodological, 57:28–46, 2013.

[18] Shuo Ma, Yu Zheng, and Ouri Wolfson. T-share: A large-scale dynamic taxi ridesharing service. In 29th IEEE International Conference on Data Engineering, ICDE 2013, pages 410–421, 2013.

[19] Andrea Attanasio, Jean-François Cordeau, Gianpaolo Ghiani, and Gilbert Laporte. Parallel tabu search heuristics for the dynamic multi-vehicle dial-a-ride problem. Parallel Computing, 30(3):377–387, 2004.

[20] Sarana Nutanong, Egemen Tanin, Jie Shao, Rui Zhang, and Kotagiri Ramamohanarao. Continuous detour queries in spatial networks. IEEE Transactions on Knowledge and Data Engineering, 24(7):1201–1215, 2012.

[21] Matt Duckham and Lars Kulik. Location privacy and location-aware computing. In J. Drummond, R. Billen, D. Forrest, and E. Joao, editors, Dynamic & Mobile GIS: Investigating Change in Space and Time, chapter 3, pages 34–51. CRC Press, Boca Raton, FL, 2006.

[22] John Krumm. Inference attacks on location tracks. In Anthony LaMarca, Marc Langheinrich, and Khai Truong, editors, Pervasive Computing, volume 4480 of Lecture Notes in Computer Science, pages 127–143. Springer Berlin / Heidelberg, 2007.

[23] Eric Hal Schwartz. Now Lyft is in the hot seat over privacy policy. DC Inno, March 2014. URL http://dcinno.streetwise.co/2014/12/03/lyft-hot-seat-privacy-policy-al-franken/. [Online; Accessed 7-January-2015].

[24] Ece Kamar and Eric Horvitz. Collaboration and shared plans in the open world: Studies of ridesharing. In IJCAI, Proceedings of the 21st International Joint Conference on Artificial Intelligence, volume 9, page 187, 2009.

[25] Douglas Oliveira Santos and Eduardo Candido Xavier. Dynamic taxi and ridesharing: A framework and heuristics for the optimization problem. In IJCAI, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, volume 13, pages 2885–2891, 2013.

[26] Matt Duckham and Lars Kulik. A formal model of obfuscation and negotiation for location privacy. In Hans-W. Gellersen, Roy Want, and Albrecht Schmidt, editors, Pervasive Computing, volume 3468 of Lecture Notes in Computer Science, pages 152–170. Springer Berlin Heidelberg, 2005.

[27] T. Dalenius. Finding a needle in a haystack or identifying anonymous census records. Journal of Official Statistics, 2(3):329–336, 1986.

[28] P. Samarati. Protecting respondents’ identities in microdata release. IEEE Transactions on Knowledge and Data Engineering, 13(6):1010–1027, 2001. ISSN 1041-4347.

[29] Shuchi Chawla, Cynthia Dwork, Frank McSherry, Adam Smith, and Larry Joseph Stockmeyer. Toward privacy in public databases. In TCC, pages 363–385, 2005.

[30] Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrishnan Venkitasubramaniam. L-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1), 2007. ISSN 1556-4681.

[31] N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on, pages 106–115. IEEE, 2007.

[32] Jordi Soria-Comas and Josep Domingo-Ferrer. Differential privacy via t-closeness in data publishing. In Privacy, Security and Trust (PST), 2013 Eleventh Annual International Conference on, pages 27–35. IEEE, 2013.

[33] C. Dwork. Differential privacy: A survey of results. Theory and Applications of Models of Computation, pages 1–19, 2008.

[34] Cynthia Dwork. Differential privacy. In Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener, editors, Automata, Languages and Programming, volume 4052 of Lecture Notes in Computer Science, pages 1–12. Springer Berlin / Heidelberg, 2006.

[35] D. Alhadidi, N. Mohammed, B. Fung, and M. Debbabi. Secure distributed framework for achieving ε-differential privacy. In Privacy Enhancing Technologies, pages 120–139. Springer, 2012.

[36] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third conference on Theory of Cryptography, TCC’06, pages 265–284, Berlin, Heidelberg, 2006. Springer-Verlag.

[37] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’07, pages 94–103, Washington, DC, USA, 2007. IEEE Computer Society.

[38] Stephen B. Wicker. The loss of location privacy in the cellular age. Communications of the ACM, 55(8):60–68, August 2012. ISSN 0001-0782.

[39] John Krumm. A survey of computational location privacy. Personal Ubiquitous Comput., 13(6):391–399, August 2009. ISSN 1617-4909.

[40] Manolis Terrovitis. Privacy preservation in the dissemination of location data. ACM SIGKDD Explorations Newsletter, 13(1):6–18, 2011.

[41] Kang G Shin, Xiaoen Ju, Zhigang Chen, and Xin Hu. Privacy protection for users of location-based services. Wireless Communications, IEEE, 19(1):30–39, 2012.

[42] Alastair R. Beresford and Frank Stajano. Mix Zones: User privacy in location-aware services. In Proc. of the 2nd IEEE Annual Conference on Pervasive Computing and Communications Workshops (PERCOMW04), pages 127–131, 2004.

[43] Marco Gruteser and Dirk Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the 1st international conference on Mobile systems, applications and services, pages 31–42. ACM, 2003.

[44] Gabriel Ghinita, Keliang Zhao, Dimitris Papadias, and Panos Kalnis. A reciprocal framework for spatial k-anonymity. Information Systems, 35(3):299–314, 2010.

[45] Marc Langheinrich. Privacy by design principles of privacy aware ubiquitous systems. In Ubicomp 2001: Ubiquitous Computing, pages 273–291. Springer, 2001.

[46] Matt Duckham and Lars Kulik. Simulation of obfuscation and negotiation for location privacy. In Spatial Information Theory, pages 31–48. Springer, 2005.

[47] Claudio Ardagna, Marco Cremonini, Sabrina De Capitani di Vimercati, Pierangela Samarati, et al. An obfuscation-based approach for protecting location privacy. Dependable and Secure Computing, IEEE Transactions on, 8(1):13–27, 2011.

[48] Michael F Worboys and Eliseo Clementini. Integration of imperfect spatial information. Journal of Visual Languages & Computing, 12(1):61–80, 2001.

[49] Chi-Yin Chow and Mohamed F Mokbel. Privacy in location-based services: a system architecture perspective. Sigspatial Special, 1(2):23–27, 2009.

[50] Hidetoshi Kido, Yutaka Yanagisawa, and Tetsuji Satoh. An anonymous communication technique using dummies for location-based services. In Pervasive Services, 2005. ICPS’05. Proceedings. International Conference on, pages 88–97. IEEE, 2005.

[51] Jason I Hong and James A Landay. An architecture for privacy-sensitive ubiquitous computing. In Proceedings of the 2nd international conference on Mobile systems, applications, and services, pages 177–189. ACM, 2004.

[52] Bhuvan Bamba, Ling Liu, Peter Pesti, and Ting Wang. Supporting anonymous location queries in mobile environments with privacygrid. In Proceedings of the 17th international conference on World Wide Web, pages 237–246. ACM, 2008.

[53] Bugra Gedik and Ling Liu. Protecting location privacy with personalized k-anonymity: Architecture and algorithms. Mobile Computing, IEEE Transactions on, 7(1):1–18, 2008.

[54] Mohamed F Mokbel, Chi-Yin Chow, and Walid G Aref. The new Casper: query processing for location services without compromising privacy. In Proceedings of the 32nd international conference on Very large data bases, pages 763–774. VLDB Endowment, 2006.

[55] Chi-Yin Chow, Mohamed F Mokbel, and Xuan Liu. A peer-to-peer spatial cloaking algorithm for anonymous location-based service. In Proceedings of the 14th annual ACM international symposium on Advances in geographic information systems, pages 171–178. ACM, 2006.

[56] Gabriel Ghinita, Panos Kalnis, and Spiros Skiadopoulos. MOBIHIDE: a mobile peer-to-peer system for anonymous location-based queries. In Advances in Spatial and Temporal Databases, pages 221–238. Springer, 2007.

[57] Gabriel Ghinita, Panos Kalnis, and Spiros Skiadopoulos. PRIVE: anonymous location-based queries in distributed mobile systems. In Proceedings of the 16th international conference on World Wide Web, pages 371–380. ACM, 2007.

[58] Mohamed F Mokbel and Chi-Yin Chow. Challenges in preserving location privacy in peer-to-peer environments. In Web-Age Information Management Workshops, 2006. WAIM’06. Seventh International Conference on, pages 1–1. IEEE, 2006.

[59] Eija Kaasinen. User needs for location-aware mobile services. Personal and Ubiquitous Computing, 7(1):70–79, 2003.

[60] George Danezis, Stephen Lewis, and Ross Anderson. How much is location privacy worth? Fourth Workshop on the Economics of Information Security, March 2005.

[61] Cross-Tab Marketing Services. Location based services usage and perceptions survey. December 2010. URL http://www.microsoft.com/downloads/en/details.aspx?FamilyID=0e52758c-3ab8-49b6-9d84-20cc53c2c308. [Retrieved January, 2015].

[62] Philippe Golle and Kurt Partridge. On the anonymity of home/work location pairs. In Hideyuki Tokuda, Michael Beigl, Adrian Friday, A. Brush, and Yoshito Tobe, editors, Pervasive Computing, volume 5538 of Lecture Notes in Computer Science, pages 390–397. Springer Berlin / Heidelberg, 2009.

[63] Aris Gkoulalas-Divanis, Vassilios S. Verykios, and Mohamed F. Mokbel. Identifying unsafe routes for network-based trajectory privacy. In Proceedings of the SIAM International Conference on Data Mining (SDM), pages 942–953, Sparks, Nevada, USA, 2009.

[64] Baik Hoh, Marco Gruteser, Ryan Herring, Jeff Ban, Daniel Work, Juan-Carlos Herrera, Alexandre M. Bayen, Murali Annavaram, and Quinn Jacobson. Virtual trip lines for distributed privacy-preserving traffic monitoring. In Proceedings of the 6th international conference on Mobile systems, applications, and services, MobiSys ’08, pages 15–28, New York, NY, USA, 2008. ACM.

[65] Mehmet Ercan Nergiz, Maurizio Atzori, and Yucel Saygin. Towards trajectory anonymization: A generalization-based approach. In Proceedings of the SIGSPATIAL ACM GIS 2008 International Workshop on Security and Privacy in GIS and LBS, SPRINGL ’08, pages 52–61, New York, NY, USA, 2008. ACM.

[66] Chi-Yin Chow and Mohamed F. Mokbel. Trajectory privacy in location-based services and data publication. SIGKDD Explorations Newsletter, 13(1):19–29, August 2011. ISSN 1931-0145.

[67] Toby Xu and Ying Cai. Location anonymity in continuous location-based services. In Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems, page 39. ACM, 2007.

[68] Balaji Palanisamy and Ling Liu. MobiMix: Protecting location privacy with mix-zones over road networks. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 494–505. IEEE, 2011.

[69] T. Wang and L. Liu. Privacy-aware mobile services over road networks. Proceedings of the VLDB Endowment, 2(1):1042–1053, 2009.

[70] Rui Chen, Benjamin C.M. Fung, Noman Mohammed, Bipin C. Desai, and Ke Wang. Privacy-preserving trajectory data publishing by local suppression. Information Sciences, in press. ISSN 0020-0255. URL http://www.sciencedirect.com/science/article/pii/S0020025511003677.

[71] R. Yarovoy, F. Bonchi, L.V.S. Lakshmanan, and W.H. Wang. Anonymizing moving objects: how to hide a mob in a crowd? In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pages 72–83. ACM, 2009.

[72] Haibo Hu, Jianliang Xu, and Dik Lun Lee. PAM: An efficient and privacy-aware monitoring framework for continuously moving objects. IEEE Transactions on Knowledge and Data Engineering, 22(3):404–419, March 2010.

[73] Baik Hoh, Marco Gruteser, Hui Xiong, and Ansaf Alrabady. Preserving privacy in GPS traces via uncertainty-aware path cloaking. In Proceedings of the 14th ACM conference on Computer and communications security, CCS ’07, pages 161–171, New York, NY, USA, 2007. ACM.

[74] Josep Domingo-Ferrer, Michal Sramka, and Rolando Trujillo-Rasúa. Privacy-preserving publication of trajectories using microaggregation. In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Security and Privacy in GIS and LBS, SPRINGL ’10, pages 26–33, New York, NY, USA, 2010. ACM.

[75] C.S. Fisk. Trip matrix estimation from link traffic counts: The congested network case. Transportation Research Part B: Methodological, 23(5):331–336, 1989. ISSN 0191-2615. URL http://www.sciencedirect.com/science/article/pii/019126158990009X.

[76] Heinz Spiess. A maximum likelihood model for estimating origin-destination matrices. Transportation Research Part B: Methodological, 21(5):395–412, 1987. ISSN 0191-2615. URL http://www.sciencedirect.com/science/article/pii/0191261587900373.

[77] M.J. Maher. Inferences on trip matrices from observations on link volumes: A Bayesian statistical approach. Transportation Research Part B: Methodological, 17(6):435–447, 1983. ISSN 0191-2615. URL http://www.sciencedirect.com/science/article/pii/0191261583900309.

[78] Ennio Cascetta. Estimation of trip matrices from traffic counts and survey data: A generalized least squares estimator. Transportation Research Part B: Methodological, 18(4–5):289–299, 1984. ISSN 0191-2615. URL http://www.sciencedirect.com/science/article/pii/0191261584900122.

[79] Enrique Castillo, José María Menéndez, and Pilar Jiménez. Trip matrix and path flow reconstruction and estimation based on plate scanning and link observations. Transportation Research Part B: Methodological, 42(5):455–481, 2008. ISSN 0191-2615. URL http://www.sciencedirect.com/science/article/pii/S0191261507000975.

[80] Katharina Parry and Martin L. Hazelton. Estimation of origin destination matrices from link counts and sporadic routing data. Transportation Research Part B: Methodological, 46(1):175–188, 2012. ISSN 0191-2615. URL http://www.sciencedirect.com/science/article/pii/S019126151100138X.

[81] F. Calabrese, G. Di Lorenzo, Liang Liu, and C. Ratti. Estimating origin-destination flows using mobile phone location data. Pervasive Computing, IEEE, 10(4):36–44, 2011. ISSN 1536-1268.

[82] Yu Zheng and Xiaofang Zhou. Computing with spatial trajectories. Springer Science & Business Media, 2011.

[83] Yu Zheng. Trajectory data mining: an overview. ACM Transactions on Intelligent Systems and Technology (TIST), 6(3):29, 2015.

[84] Xiaolei Li, Jiawei Han, Jae-Gil Lee, and Hector Gonzalez. Traffic density-based discovery of hot routes in road networks. In Dimitris Papadias, Donghui Zhang, and George Kollios, editors, Advances in Spatial and Temporal Databases, volume 4605 of Lecture Notes in Computer Science, pages 441–459. Springer Berlin / Heidelberg, 2007.

[85] Z. Chen, H.T. Shen, and X. Zhou. Discovering popular routes from trajectories. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 900–911. IEEE, 2011.

[86] J. Xu, L. Guo, Z. Ding, X. Sun, and C. Liu. Traffic aware route planning in dynamic road networks. In Database Systems for Advanced Applications, pages 576–591. Springer, 2012.

[87] E. Crisostomi, S. Kirkland, and R. Shorten. A Google-like model of road network dynamics and its application to regulation and control. International Journal of Control, 84(3):633–651, 2011.

[88] Nelson D Chan and Susan A Shaheen. Ridesharing in North America: Past, present, and future. Transport Reviews, 32(1):93–112, 2012.

[89] Andrew Amey. Proposed methodology for estimating rideshare viability within an organization: Application to the MIT community. In Transportation Research Board 90th Annual Meeting, number 11-2585, pages 1–16, 2011.

[90] Alexander Kleiner, Bernhard Nebel, and Vittorio A. Ziparo. A mechanism for dynamic ride sharing based on parallel auctions. In IJCAI, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 266–272, 2011.

[91] V. Chaube, A.L. Kavanaugh, and M.A. Perez-Quinones. Leveraging social networks to embed trust in rideshare programs. In 43rd Hawaii International Conference on System Sciences (HICSS), pages 1–8, Jan 2010.

[92] Roger F Teal. Carpooling: who, how and why. Transportation Research Part A: General, 21(3):203–214, 1987.

[93] Hoyoung Jeung, Man Lung Yiu, Xiaofang Zhou, Christian S. Jensen, and Heng Tao Shen. Discovery of convoys in trajectory databases. Proceedings of the VLDB Endowment, 1(1):1068–1080, August 2008. ISSN 2150-8097.

[94] Gyozo Gidofalvi, Torben Bach Pedersen, Tore Risch, and Erik Zeitler. Highly scalable trip grouping for large-scale collective transportation systems. In Proceedings of the 11th international conference on Extending database technology: Advances in database technology, EDBT ’08, pages 678–689, 2008.

[95] Beihong Jin and Jiafeng Hu. Towards scalable processing for a large-scale ride sharing service. In 2012 9th International Conference on Ubiquitous Intelligence & Computing and 9th International Conference on Autonomic & Trusted Computing (UIC/ATC), pages 940–944. IEEE, 2012.

[96] Christoph Stach and Andreas Brodt. vHike - a dynamic ride-sharing service for smartphones. In Mobile Data Management (1), pages 333–336, 2011.

[97] Kenneth Radke, Margot Brereton, Seyed Hadi Mirisaee, Sunil Ghelawat, Colin Boyd, and Juan Manuel González Nieto. Tensions in developing a secure collective information practice - the case of agile ridesharing. In Human-Computer Interaction - INTERACT 2011 - 13th IFIP TC 13 International Conference, pages 524–532, 2011.

[98] Elizabeth Deakin, Karen Frick, and Kevin Shively. Markets for dynamic ridesharing? Case of Berkeley, California. Transportation Research Record: Journal of the Transportation Research Board, (2187):131–137, 2010.

[99] John W Baugh Jr, Gopala Krishna Reddy Kakivaya, and John R Stone. Intractability of the dial-a-ride problem and a multiobjective solution using simulated annealing. Engineering Optimization, 30(2):91–123, 1998.

[100] Gerardo Berbeglia, Jean-François Cordeau, and Gilbert Laporte. A hybrid tabu search and constraint programming algorithm for the dynamic dial-a-ride problem. INFORMS Journal on Computing, 24(3):343–355, 2012.

[101] Shuo Ma, Yu Zheng, and Ouri Wolfson. Real-time city-scale taxi ridesharing. IEEE Transactions on Knowledge and Data Engineering, PP(99):1–1, 2014. ISSN 1041-4347.

[102] Andreas Kaltenbrunner, Rodrigo Meza, Jens Grivolla, Joan Codina, and Rafael Banchs. Urban cycles and mobility patterns: Exploring and predicting trends in a bicycle-based public transport system. Pervasive and Mobile Computing, 6(4):455–466, 2010.

[103] Patrick Vogel, Torsten Greiser, and Dirk Christian Mattfeld. Understanding bike-sharing systems using data mining: Exploring activity patterns. Procedia-Social and Behavioral Sciences, 20:514–523, 2011.

[104] Stephan Hartwig and Michael Buchmann. Empty seats traveling: next-generation ridesharing and its potential to mitigate traffic- and emission problems in the 21st century. Technical Report NRC-TR-2007-003, Nokia Research Center, 2006.

[105] Atsuyuki Okabe, Barry Boots, Kokichi Sugihara, and Sung Nok Chiu. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. Series in Probability and Statistics. John Wiley and Sons, Inc., 2nd edition, 2000.

[106] Tijs Neutens, Nico Weghe, Frank Witlox, and Philippe Maeyer. A three-dimensional network-based space-time prism. Journal of Geographical Systems, 10(1):89–107, 2008. ISSN 1435-5930.

[107] Steven P Miller, B Clifford Neuman, Jeffrey I Schiller, and Jerome H Saltzer. Kerberos authentication and authorization system. In Project Athena Technical Plan. Citeseer, 1987.

[108] Manoj M Prabhakaran and Amit Sahai. Secure Multi-Party Computation, volume 10. IOS Press, 2013.

[109] Gayatri Swamynathan, Ben Y. Zhao, and Kevin C. Almeroth. Decoupling service and feedback trust in a peer-to-peer reputation system. In Parallel and Distributed Processing and Applications - ISPA 2005 Workshops, volume 3759 of Lecture Notes in Computer Science, pages 82–90. Springer Berlin Heidelberg, 2005.

[110] Victorian Integrated Survey of Travel and Activity 2007, Summary of survey results. Technical report, Victorian Integrated Survey of Travel and Activity, Department of Transport, Australia, 2007. URL http://economicdevelopment.vic.gov.au/transport/research-and-data/vista/vista-data-and-publications. [Online; Accessed 18-April-2016].

[111] Bureau of Public Roads. Traffic assignment manual. US Department of Commerce, 1964.

[112] Torsten Hägerstrand. What about people in regional science? Papers in Regional Science, 24(1):7–24, 1970.

[113] Harvey J Miller. Modelling accessibility using space-time prism concepts within geographical information systems. International Journal of Geographical Information Systems, 5(3):287–301, 1991.

[114] Tijs Neutens, Matthias Delafontaine, Darren M. Scott, and Philippe De Maeyer. An analysis of day-to-day variations in individual space-time accessibility. Journal of Transport Geography, 23:81–91, 2012.

[115] Bart Kuijpers and Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science, 23(9):1095–1117, 2009.

[116] Bart Kuijpers and Walied Othman. Trajectory databases: Data models, uncertainty and complete query languages. Journal of Computer and System Sciences, 76(7):538–560, 2010.

[117] Tobias Emrich, Hans-Peter Kriegel, Nikos Mamoulis, Matthias Renz, and Andreas Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, pages 395–404, New York, NY, USA, 2012. ACM.

[118] Kai Zheng, Goce Trajcevski, Xiaofang Zhou, and Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. In Proceedings of the 14th International Conference on Extending Database Technology, EDBT/ICDT ’11, pages 283–294, New York, NY, USA, 2011. ACM.

[119] Goce Trajcevski, Alok Choudhary, Ouri Wolfson, Li Ye, and Gang Li. Uncertainrange queries for necklaces. In Eleventh International Conference on Mobile

Data Management (MDM), 2010, pages 199–208. IEEE, May 2010.

[120] E. Kanoulas, Yang Du, Tian Xia, and Donghui Zhang. Finding fastest paths ona road network with speed patterns. In Proceedings of the 22nd International

Conference on Data Engineering, 2006. ICDE ’06., pages 10–10, April 2006.

[121] Hector Gonzalez, Jiawei Han, Xiaolei Li, Margaret Myslinska, and John Paul Sondag. Adaptive fastest path computation on a road network: A traffic mining approach. In Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB ’07, pages 794–805. VLDB Endowment, 2007.

[122] Bolin Ding, Jeffrey Xu Yu, and Lu Qin. Finding time-dependent shortest paths over large graphs. In Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, EDBT ’08, pages 205–216, New York, NY, USA, 2008. ACM.

[123] Ugur Demiryurek, Farnoush Banaei-Kashani, Cyrus Shahabi, and Anand Ranganathan. Online computation of fastest path in time-dependent spatial networks. In Dieter Pfoser, Yufei Tao, Kyriakos Mouratidis, Mario A. Nascimento, Mohamed Mokbel, Shashi Shekhar, and Yan Huang, editors, Advances in Spatial and Temporal Databases, volume 6849 of Lecture Notes in Computer Science, pages 92–111. Springer Berlin Heidelberg, 2011.

[124] Jiajie Xu, Limin Guo, Zhiming Ding, Xiling Sun, and Chengfei Liu. Traffic aware route planning in dynamic road networks. In Sang-goo Lee, Zhiyong Peng, Xiaofang Zhou, Yang-Sae Moon, Rainer Unland, and Jaesoo Yoo, editors, Database Systems for Advanced Applications, volume 7238 of Lecture Notes in Computer Science, pages 576–591. Springer Berlin Heidelberg, 2012.

[125] Manolis Terrovitis, Spiridon Bakiras, Dimitris Papadias, and Kyriakos Mouratidis. Constrained shortest path computation. In Claudia Bauzer Medeiros, Max J. Egenhofer, and Elisa Bertino, editors, Advances in Spatial and Temporal Databases, volume 3633 of Lecture Notes in Computer Science, pages 181–199. Springer Berlin Heidelberg, 2005.

[126] Pablo Samuel Castro, Daqing Zhang, Chao Chen, Shijian Li, and Gang Pan. From taxi GPS traces to social and community dynamics: A survey. ACM Computing Surveys (CSUR), 46(2):17, 2013.

[127] Gang Pan, Guande Qi, Wangsheng Zhang, Shijian Li, Zhaohui Wu, and L.T. Yang. Trace analysis and mining for smart cities: issues, methods, and applications. IEEE Communications Magazine, 51(6):120–126, June 2013.

[128] Tim Roughgarden and Éva Tardos. How bad is selfish routing? Journal of the ACM (JACM), 49(2):236–259, 2002.

[129] Daniel Ayala, Ouri Wolfson, Bo Xu, Bhaskar Dasgupta, and Jie Lin. Parking slot assignment games. In Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 299–308. ACM, 2011.

[130] Richard Kershner. The number of circles covering a set. American Journal of Mathematics, 61(3):665–671, July 1939.

[131] Paola Festa and Mauricio GC Resende. GRASP: An annotated bibliography. In Essays and surveys in metaheuristics, pages 325–367. Springer, 2002.

[132] Mauricio GC Resende and Celso C Ribeiro. Greedy randomized adaptive search procedures: Advances, hybridizations, and applications. In Handbook of metaheuristics, pages 283–319. Springer, 2010.

[133] Mauricio GC Resende. Greedy randomized adaptive search procedures (GRASP). Encyclopedia of optimization, 2:373–382, 2001.

[134] Dalessandro Soares Vianna and Jose Elias Claudio Arroyo. A GRASP algorithm for the multi-objective knapsack problem. In Computer Science Society, 2004. SCCC 2004. 24th International Conference of the Chilean, pages 69–75. IEEE, 2004.



[135] Hui Li and Dario Landa-Silva. An elitist GRASP metaheuristic for the multi-objective quadratic assignment problem. In Evolutionary Multi-Criterion Optimization, pages 481–494. Springer, 2009.

[136] Queensland Government. Integrated regional transport plan for South East Queensland. Queensland Government, Brisbane, 1997.

[137] Alan T Murray, Rex Davis, Robert J Stimson, and Luis Ferreira. Public transportation access. Transportation Research Part D: Transport and Environment, 3(5):319–328, 1998.

[138] Alan T Murray. Strategic analysis of public transport coverage. Socio-Economic Planning Sciences, 35(3):175–188, 2001.

[139] Changshan Wu and Alan T Murray. Optimizing public transit quality and system access: the multiple-route, maximal covering/shortest-path problem. Environment and Planning B: Planning and Design, 32(2):163–178, 2005.

[140] KW Ogden. Privacy issues in electronic toll collection. Transportation Research Part C: Emerging Technologies, 9(2):123–134, 2001.

[141] Moein Ghasemzadeh, Benjamin CM Fung, Rui Chen, and Anjali Awasthi. Anonymizing trajectory data for passenger flow analysis. Transportation Research Part C: Emerging Technologies, 39:63–79, 2014.

[142] Mihaela Cardei and Jie Wu. Coverage in wireless sensor networks. Handbook of Sensor Networks, 21, 2004.

[143] Parvin Asadzadeh Birjandi, Lars Kulik, and Egemen Tanin. K-coverage in regular deterministic sensor deployments. In Intelligent Sensors, Sensor Networks and Information Processing, 2013 IEEE Eighth International Conference on, pages 521–526. IEEE, 2013.

[144] Ian F Akyildiz, Weilian Su, Yogesh Sankarasubramaniam, and Erdal Cayirci. Wireless sensor networks: A survey. Computer networks, 38(4):393–422, 2002.

[145] Mihaela Cardei and Jie Wu. Energy-efficient coverage problems in wireless ad-hoc sensor networks. Computer communications, 29(4):413–420, 2006.



[146] Dan Li, Kerry D Wong, Yu Hen Hu, and Akbar M Sayeed. Detection, classification, and tracking of targets. Signal Processing Magazine, IEEE, 19(2):17–29, 2002.

[147] Martin Liggins II, David Hall, and James Llinas. Handbook of multisensor data fusion: theory and practice. CRC press, 2008.

[148] Tony Sun, Ling-Jyh Chen, Chih-Chieh Han, and Mario Gerla. Reliable sensor networks for planet exploration. In Networking, Sensing and Control, 2005. Proceedings. 2005 IEEE, pages 816–821. IEEE, 2005.

[149] Shansi Ren, Qun Li, Haining Wang, Xin Chen, and Xiaodong Zhang. Design and analysis of sensing scheduling algorithms under partial coverage for object detection in sensor networks. Parallel and Distributed Systems, IEEE Transactions on, 18(3):334–350, 2007.

[150] Francesco Viti, Marco Rinaldi, Francesco Corman, and Chris MJ Tampère. Assessing partial observability in network sensor location problems. Transportation Research Part B: Methodological, 70:65–89, 2014.

[151] Kiran Mehta, Donggang Liu, and Matthew Wright. Protecting location privacy in sensor networks against a global eavesdropper. IEEE Transactions on Mobile Computing, 11(2):320–336, Feb 2012.

[152] Thomas Winkler and Bernhard Rinner. Security and privacy protection in visual sensor networks: A survey. ACM Computing Surveys, 47(1):2:1–2:42, May 2014. ISSN 0360-0300.

[153] Zvi Drezner. Facility location: a survey of applications and methods. Springer, 1995.

[154] Susan Hesse Owen and Mark S Daskin. Strategic facility location: A review. European Journal of Operational Research, 111(3):423–447, 1998.

[155] S Louis Hakimi. Optimum locations of switching centers and the absolute centers and medians of a graph. Operations research, 12(3):450–459, 1964.



[156] Brian Boffey and Subhash C. Narula. Multiobjective covering and routing problems. In Mark H. Karwan, Jaap Spronk, and Jyrki Wallenius, editors, Essays In Decision Making, pages 342–369. Springer Berlin Heidelberg, 1997.

[157] Orhan Karasakal and Esra K Karasakal. A maximal covering location model in the presence of partial coverage. Computers & Operations Research, 31(9):1515–1526, 2004.

[158] Yu Zheng, Licia Capra, Ouri Wolfson, and Hai Yang. Urban computing: Concepts, methodologies, and applications. ACM Trans. Intell. Syst. Technol., 5(3):38:1–38:55, September 2014. ISSN 2157-6904.

[159] Enrique Castillo, José María Menéndez, and Pilar Jiménez. Trip matrix and path flow reconstruction and estimation based on plate scanning and link observations. Transportation Research Part B: Methodological, 42(5):455–481, 2008. ISSN 0191-2615. URL http://www.sciencedirect.com/science/article/pii/S0191261507000975.

[160] F. Calabrese, G. Di Lorenzo, Liang Liu, and C. Ratti. Estimating origin-destination flows using mobile phone location data. Pervasive Computing, IEEE, 10(4):36–44, 2011. ISSN 1536-1268.

[161] C.C. Sun, G.S. Arr, R.P. Ramachandran, and S.G. Ritchie. Vehicle reidentification using multidetector fusion. IEEE Transactions on Intelligent Transportation Systems, 5(3):155–164, 2004. ISSN 1524-9050.

[162] Karric Kwong, Robert Kavaler, Ram Rajagopal, and Pravin Varaiya. Arterial travel time estimation based on vehicle re-identification using wireless magnetic sensors. Transportation Research Part C: Emerging Technologies, 17(6):586–606, 2009. ISSN 0968-090X. URL http://www.sciencedirect.com/science/article/pii/S0968090X09000266.

[163] Benjamin Coifman. Vehicle re-identification and travel time measurement in real-time on freeways using existing loop detector infrastructure. Transportation Research Record: Journal of the Transportation Research Board, 1643:181–191, 1998. ISSN 0361-1981.



[164] M.J. Maher. Inferences on trip matrices from observations on link volumes: A Bayesian statistical approach. Transportation Research Part B: Methodological, 17(6):435–447, 1983. ISSN 0191-2615. URL http://www.sciencedirect.com/science/article/pii/0191261583900309.

[165] Ennio Cascetta. Estimation of trip matrices from traffic counts and survey data: A generalized least squares estimator. Transportation Research Part B: Methodological, 18(4–5):289–299, 1984. ISSN 0191-2615. URL http://www.sciencedirect.com/science/article/pii/0191261584900122.

[166] Heinz Spiess. A maximum likelihood model for estimating origin-destination matrices. Transportation Research Part B: Methodological, 21(5):395–412, 1987. ISSN 0191-2615. URL http://www.sciencedirect.com/science/article/pii/0191261587900373.

[167] Katharina Parry and Martin L. Hazelton. Estimation of origin destination matrices from link counts and sporadic routing data. Transportation Research Part B: Methodological, 46(1):175–188, 2012. ISSN 0191-2615. URL http://www.sciencedirect.com/science/article/pii/S019126151100138X.

[168] Hairuo Xie, L. Kulik, and E. Tanin. Privacy-aware traffic monitoring. IEEE Transactions on Intelligent Transportation Systems, 11(1):61–70, 2010. ISSN 1524-9050.

[169] Thomas Brinkhoff. A framework for generating network-based moving objects. GeoInformatica, 6:153–180, 2002. ISSN 1384-6175.
