ENSING AND PREDICTING THE PULSE OF THE CITY THROUGH … · Human Computer Interaction Laboratory...

Human Computer Interaction Laboratory

@jonfroehlich Assistant Professor Computer Science

SENSING AND PREDICTING THE PULSE OF THE CITY THROUGH SHARED BICYCLING International Workshop on Spatio-temporal Data Mining for a Better Understanding of Human Mobility:

The Bicycle Sharing System Case Study

December 5th, Paris, France

A quick story re: traveling to this workshop.

I never take Vélib’. The redistribution is completely broken.

— Girl On Train

Parisian Exchange Student from England

bikeshare research stakeholders & benefits

Operators: benefit from more accurate

models of demand for load balancing

End-users: benefit from understanding and

forecasting how system will be used for trip

planning

Urban planners: can use bike models to

improve the bikeability of the city

Social scientists & human geographers:

can study and better understand human

mobility and routine

The social sciences can finally

have access to masses of data

that are of the same order of

magnitude of their older

sisters, the natural sciences

Bruno Latour, 2007

French philosopher and sociologist

my research

this talk

bikeshare data

Station Capacity Data

granularity

Data collection method

(1) Data dump from operator

(2) Scrape bikeshare website

bikeshare data

granularity

bikeshare data

granularity

(3) 3rd-party DIY APIs

bikeshare data

Station Capacity Data O/D Bike Data

granularity

(2) …

Some interesting stats

found with that data

about CabiBikeShare

Some Quick Fun Facts

(1) “Average” trip is downhill (by -1.94 meters)

(2) Last mile usage: four most common trips are short

and seem to cover areas that subway/bus do not

(3) Sixth most common trip is a return trip from

Smithsonian station back to the Smithsonian station

(4) Casual vs. members usage fees: 40.7% casual riders

incur fees vs. 3.3% of members incur fees

Source: http://greatergreaterwashington.org/post/13351/capital-bikeshare-data-already-yields-interesting-facts/

Analysis Enabled by Anonymized O/D Bike Data

[MV Jantzen, http://www.youtube.com/watch?v=-h0rV7tw1Eo]

Washington DC, Capital Bikeshare (CaBi) Flows

MV Jantzen, CaBi Trips, http://youtu.be/-h0rV7tw1Eo

Using heuristics to infer bike flow.

Martin Austwick & Oliver O’Brien, CASA-UCL, Boris Bikes Redux, https://vimeo.com/19982736

The routing is done using OpenStreetMap data & Routing routing scripts optimized for bike usage (i.e., constant speed on all road types, obeying one-way roads and taking advantage of marked cycleway). I’ve tweaked the desirability of road types, so that the trunk and primary roads are only slightly less desirable than quieter routes.

— Oliver O’Brien

UCL CASL http://oliverobrien.co.uk/2011/02/boris-bikes-flow-video-now-with-better-curves

Jo Wood, Experiments in Bicycle Flow Animation, https://vimeo.com/33712288

Fernanda Viegas & Martin Wattenberg, Wind Map, http://hint.fm/wind/

bikeshare data

Station Capacity Data O/D Bike Data O/D User Data

granularity

(2) …

(2) Scrape user account logs

“Working collaboratively with Transport for

London (TfL), customer records reporting a

unique customer identifier, gender and

postcode, have been made available. So too

has a complete set of user journeys [for those

customer ids]”

Human route repetition in car driving.

Route Prediction from Trip Observations

Description Average Median

trip distance (miles) 7.7 4.2

trip time (min) 16.3 11.5

num trips / day 4 3.9

num trips / subject 60.3 50

num days of data /

subject 15.1 13

14,468 trips / 240 subjects

Greater Seattle Area High Level Trip Stats

[Froehlich & Krumm, Route Prediction From Trip Observations, SAE 2008]

1 6 11 16 21 26 31 36 41 46 51

Day of Observation

Percentage of Trips that are Repeat Trips

1 6 11 16 21 26 31 36 41 46 51

Day of Observation

After approximately 1

month of observation,

the number of repeat

trips reaches 50%

Percentage of Trips that are Repeat Trips

1 11 21 31 41 51 61 71 81 91 101 111 121 131 141 151

Route (Ordered By Most Frequently Traveled)

No Repeat Trips

Avg Cumulative Distribution of Trips in Routes

1 11 21 31 41 51 61 71 81 91 101 111 121 131 141 151

Route (Ordered By Most Frequently Traveled)

Average

No Repeat Trips

The most frequently traveled route accounts

for 12% of a driver’s trips

The top ten most frequently traveled routes account for

50% of a driver’s trips

Avg Cumulative Distribution of Trips in Routes

bikeshare data

Event Based Station I/O with cloud

when bike docked/hired

Continuous Use sensors to track

movement

Instrumented Bicycles (e.g., GPS sensor)

Instrumented Users (e.g., mobile phones)

Instrumented Cities (e.g., CCTVs)

bikeshare data

Event Based Station I/O with cloud

when bike docked/hired

Continuous Use sensors to track

movement

Instrumented Bicycles (e.g., GPS sensor)

Instrumented Users (e.g., mobile phones)

Instrumented Cities (e.g., CCTVs)

sensing and

predicting the

movement of a

city via shared

bicycling [Froehlich et al., UrbanSense2008; IJCAI2009]

Summer 2008:

- 373 stations

- 6,000 bicycles

- 150,000 subscribers

bicing barcelona, spain

Data Collection

• Scrape bicing webpage every 2 mins

• Extract:

– station’s geo-location

– # of available bicycles

– # of vacant parking slots

fri sat sun mon tues wed thurs fri

random zero value

erroneous oscillations

after cleansing

station 15: num available bicycles

fri sat sun mon tues wed thurs fri

station 15: num available bicycles

before cleansing

How do we know what to clean?

6 hours

random zero value

and here as well

dataset

cleaned

dataset

stations 390 370

days 25K 22.7K

observations 26.1M 20.2M

parking slots 9831 9315

13 weeks of observations

Aug 27 – Dec 1, 2008

Num checked-out bicycles across all stations

morning

commute

late spanish

evening

commute

evening

commute

sleeping in on

weekends

Two-spike pattern

found in study of London

Underground

[Lathia, Froehlich & Capra, ICDM2010]

Introducing DayViews

Mon Tues Wed Thur Fri

morning

commute

starts @

begins @

return from

lunch/work

at 3:30pm

10am beach goers begin

arriving

morning

commute

starts @

begins @

return from

lunch/work

at 3:30pm

Activity Score: AS(t)=|Bt -Bt-1|

How are Bicing patterns shared across

stations and distributed in the city?

Temporal Clustering

time of day (hrs)

Activity0%

time of day (hrs)

Weekday

activity clusters available bicycle clusters

Applied dendrogram clustering with

dynamic time warping as distance metric

Cluster A2 (N=76) Cluster A3 (N=74) Cluster A4 (N=12)

Cluster A1 (N=207)

Activity Clusters

Cluster A2 (N=76) Cluster A3 (N=74) Cluster A4 (N=12) Cluster A5 (N=1)

Cluster A1 (N=207)

Activity Clusters

Temporal Clustering

time of day (hrs)

Activity0%

time of day (hrs)

Weekday

activity clusters available bicycle clusters

Cluster B2 (N=72) Cluster B1 (N=106)

outgoing

incoming

Available Bicycle Cluster

flat outgoing incoming

Can Bicing station usage be predicted?

Why Care?

• load balancing

• assist urban planners / city

officials about expected activity

• provide new web/mobile

services to bicing users

Uphill Station

(midday)

Downtown Station

(night)

76% of respondents had difficulty finding a bicycle

Downtown Station

(morning)

66% of respondents had difficulty finding a parking slot

Beach Station

(evening)

50% of respondents avoid Bicing when they are

traveling to a place where they must be on time

Station Models

t0 Bt0 =27

PW=10min # o

t0 Bt0 =27

PW=10min

PW=2hr

PredLV=27 PredLV=27

Actual Value=10

Actual Value=27

PredLV=(t0,Bt0,PW)=Bt0 #

Last Value

t0 Bt0 =27

PW=2hr

PredLV=27

Actual Value=10

PredLV=(t0,Bt0,PW)=Bt0 #

Last Value

Station’s

DayView

PredHM=(t0,Bt0,PW)=BTBt0 + PW

Historic Mean

t0 Bt0 =27

PW=10min

PW=2hr

PredHM=23

PredHM=14

Actual Value=10

Actual Value=27

Station’s

DayView

s PredHM=(t0,Bt0,PW)=BTBt0 + PW

Historic Mean

t0 Bt0 =27

PW=2hr

PredHM=14

Actual Value=10

Station’s

DayView

s PredHM=(t0,Bt0,PW)=BTBt0 + PW

Historic Mean

t0 Bt0 =27

PW=10min

PW=2hr

PredHT=27+(25-24)=28

PredHT=27+(15-24)=18

Actual Value=10

Actual Value=27

Station’s

DayView

PredHT=(t0,Bt0,PW)=Bt0+BTBt0 + PW-BTBt0

Historic Trend

t0 Bt0 =27

PW=2hr

PredHT=27+(15-24)=18

Actual Value=10

Station’s

DayView

PredHT=(t0,Bt0,PW)=Bt0+BTBt0 + PW-BTBt0

Historic Trend

PredBN=(t0,Bt0,PW)=Bt0+ delta

• time: discrete observed node

corresponding to hours in the day

• bikes: the # of avail bikes at time t

• PW: the prediction window

• delta: continuous Gaussian var that

represents change in number of

bikes at time t + PW

PW time

Prediction made by adding the value of the delta

node to the most recent observation

Bayesian Network

Prediction Evaluation november

3 weeks of data to build models

1 week of test data

models were fed: • the current time • current # of avail bicycles • each of the six pw values (10,20,30,60,90,120 mins)

Prediction Error Metric

• Absolute difference between the

predicted number of bicycles &

the ground truth observation at

time t0 + PW

• Error is in number of bicycles

– normalized by the station’s size

High Level Results

Stdev of

Random 0.37 0.27

HistoricMean 0.17 0.16

LastValue 0.09 0.14

HistoricTrend 0.09 0.13

Bayesian Network 0.08 0.12

*error is in normalized available bicycles (nab)

0.37 corresponds to roughly 9 bicycles

High Level Results

Stdev of

Random 0.37 0.27

HistoricMean 0.17 0.16

LastValue 0.09 0.14

HistoricTrend 0.09 0.13

Bayesian Network 0.08 0.12

*error is in normalized available bicycles (nab)

0.08 corresponds to roughly 2 bicycles

Prediction vs. Activity Cluster

A5 A4 A3 A2 A1

LastValue HistMean

HistTrend Bayesian

A5 (N=1) A4 (N=12) A3 (N=74) A2 (N=76) A1 (N=207)

0.1 corresponds to ~2.5 bicycles at a station with 25 slots

0 30 60 90 120

Elevation (meters)

station elevation vs. avg % of available bicycles

0 30 60 90 120

Elevation (meters)

station elevation vs. activity score

Topographical Influences?

weather

other mass transit other transit sources

cross city examination

self-sustainable system

promote usage

at current pace: expected arrival: 11 mins

current: 4 empty slots predicted: 6 (11 mins)

Longitudinal

Related Collaborators

Related Publications

NealLathia NuriaOliver JoachimNeumann JohnKrumm LiciaCapra ChrisSmith

Individuals Among Commuters: Building Personalised Transport Information Services From Fare Collection Systems

Neal Lathia, Chris Smith, Jon Froehlich, Licia Capra, Journal of Pervasive and Mobile Computing (PMC) 2012

Mining Public Transport Usage for Personalised Intelligent Transport Systems

Neal Lathia, Jon Froehlich, Licia Capra, Proceedings of ICDM2010

Sensing and Predicting the Pulse of the City Through Shared Bicycling

Jon Froehlich, Joachim Neumann, Nuria Oliver, Proceedings of IJCAI2009

Measuring the Pulse of the City Through Shared Bicycle Programs

Jon Froehlich, Joachim Neumann, Nuria Oliver, Proceedings of UrbanSense2008

Route Prediction From Trip Observations

Jon Froehlich, John Krumm, Proceedings of SAE2008

Download publications here: http://www.cs.umd.edu/~jonf/publications.html

Human Computer Interaction Laboratory

@jonfroehlich Assistant Professor Computer Science

SENSING AND PREDICTING THE PULSE OF THE CITY THROUGH SHARED BICYCLING International Workshop on Spatio-temporal Data Mining for a Better Understanding of Human Mobility:

The Bicycle Sharing System Case Study

Co-organized by GERI Animatic, le Labex Futurs Urbains et l'Ecole des Ponts ParisTech

December 5th, Paris, France

ENSING AND PREDICTING THE PULSE OF THE CITY THROUGH … · Human Computer Interaction Laboratory...

Documents

Predicting FTEs

Predicting Porosity through Simulating Sandstone ...carbcap.geos.ed.ac.uk/rsh-journal-downloads/Predicting-Porosity... · Predicting Porosity through Simulating Sandstone Compaction

PREDICTING THE MEETING ENVIRONMENTS OF … · INTRODUCTION 4 IACConline.org & IACCmeetings.com In 2016, the IACC Meeting Room of the Future™ report began providing a pulse on global

8 Predicting

Running head: PREDICTING ACADEMIC SUCCESS PREDICTING

Pulse to Pulse Stability of an Ampegon Short Pulse Modulator

Weather Radar Development and Application …Pulse-Doppler Radar for Weather Sensing Pulse-Doppler radar is the preferred approach for de-tecting and predicting the thunderstorm phenomena

VARTA pulse / pulse neo - VARTA Storage€¦ · SYSTEMDATEN PULSE / PULSE NEO 3 PULSE / PULSE NEO 6 Batteriekapazität nominal 3,3 kWh 16,5 kWh Max. AC Leistung Laden / Entladen 1,8

Predicting Success

PREDICTING PREDICTABILITY

Predicting 4,5,6

Predicting banking

Predicting Surface Heat Flux and Temperature via a Pulse

Pulse Connect Secure - Pulse Secure

Predicting NBA Player Performancecs229.stanford.edu/proj2012/Wheeler-PredictingNBAPlayerPerformance… · Predicting NBA Player Performance Kevin Wheeler Introduction Predicting the

Pulse and active pulse spectraphotometry

CMSC434 Week 01 | Lecture 01 | Sept 02, 2014 Design ... › class › fall2014 › cmsc434-0101 › CMSC434...Human Computer Interaction Laboratory @jonfroehlich Assistant Professor

CMSC434 Week 01 | Lecture 01 | Jan 28, 2016 Jon Froehlich · Week 01 | Lecture 01 | Jan 28, 2016 Design Activity & Class Intro Jon Froehlich @jonfroehlich ... iterative design process

Reading Predicting

PREDICTING FAILURES_AUGUST2007