Predicting Post-SafeTrack Metro Reliability
GU SCS Data Science Capstone Project September 10, 2016
Micah Melling Drew Wheatley Patrick McGrady
Facts
over 250 million riders annually
118 miles of track
over 13 disruptions per day
Problem Statement
highly publicized safety lapses & deferred maintenance
1 Year timeframe
estimated $60,000,000 price tag
improved safety & reliability?
Hypothesis
The DC Metro System is a pivotal transportation asset for Washington DC and the surrounding regions. The SafeTrack project is meant to increase system safety and reliability. While technical and operational disruptions are inevitable, we believe that available data can provide insight into how frequently Metro riders will experience post-SafeTrack disruptions and ultimately improve their Metro commute expectations.
To quantify the outcome, we will explore several scenarios to provide riders with a clearer picture of their post-SafeTrack commute.
Scenario #1 Improvement
Scenario #2 Improvement
Scenario #3 Improvement
Scenario #4 Improvement
Scenario #5 Improvement
Data Ingestion & Wrangling
System Operations Data: used to determine system behavior under optimal conditions
Disruption Data: historical data used to analyze the frequency and effect of technical and operational disruptions (i.e., delays)
Ridership Data: used in conjunction with the operational datasets to quantify and extrapolate the scope of Metro delays
The Data
[Graphic: train status labels, ON TIME and DELAYED]
Planned Operating Schedule
Data_Source: wmata.com
Data_Scope: Provided operating data under a perfectly efficient system with no delays or disruptions
Disruption Data
Data_Source: opendatadc
Data_Scope: Provided 5 years of daily disruption logs, including cause of disruption and minutes delayed
Planned Operating Schedule and Disruption Data provided a basis for comparing pre- and post-SafeTrack system behavior
[Graphic: train arrival board with columns LN, CAR, DEST, MIN]
The Data
24,335 records between April 2012 and July 2016
All Metro lines represented in the dataset
Description of disruption cause, translated as technical or operational
Delay, in minutes
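The translation of a free-text cause description into "technical" or "operational" could be sketched roughly as below. The slides do not say how the mapping was done, so the keyword list, the record fields, and the sample rows are all hypothetical and serve only to illustrate the idea.

```python
# Hypothetical keyword list: the presentation does not state how causes
# were mapped, so these terms are assumptions for illustration only.
TECHNICAL_KEYWORDS = ("brake", "door", "signal", "track", "mechanical", "equipment")


def classify_cause(description: str) -> str:
    """Label a free-text disruption cause as 'technical' or 'operational'."""
    text = description.lower()
    if any(keyword in text for keyword in TECHNICAL_KEYWORDS):
        return "technical"
    # Anything without a technical keyword is treated as operational.
    return "operational"


# Made-up example rows (not taken from the real disruption logs).
records = [
    {"line": "RD", "cause": "Train door problem", "delay_min": 6},
    {"line": "RD", "cause": "Sick customer", "delay_min": 4},
]

for record in records:
    record["category"] = classify_cause(record["cause"])
    print(record["line"], record["category"], record["delay_min"])
```

A keyword match is the simplest possible rule; real log text would likely need a longer vocabulary and some manual review of unmatched causes.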
Computation & Analysis: Limitations
Accuracy
Location
Station-to-Station
Completeness
Compounding Delays
‘Garbage in, garbage out’ concept
Opted to take a two-pronged approach:
1.) Build data product
2.) Develop simulation based on available data
Computation & Analysis: Methodology
1 Calculated the number of minutes of trips per day on each line.
2 Broke daily delays into five tiers based on severity.
3 Built in compounding delays based on expected train departures.
4 Injected random noise into the system.
[Figure: delay Tiers 1-5 shown for each of Scenarios 1-5]
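The methodology steps above can be pictured with a small simulation sketch. This is not the project's software: the tier boundaries, the exponential delay model, the way delays carry over between departures, and every parameter name and value are assumptions chosen only to show how tiers, compounding delays, a scenario reduction, and random noise might fit together.

```python
import random

# Assumed upper bounds (minutes) for tiers 1-4; anything above is tier 5.
TIER_BOUNDS = [5, 10, 20, 40]


def delay_tier(minutes):
    """Map a delay in minutes to a severity tier from 1 to 5."""
    for tier, bound in enumerate(TIER_BOUNDS, start=1):
        if minutes <= bound:
            return tier
    return 5


def simulate_day(scheduled_minutes, n_departures, headway_min,
                 p_disruption, mean_delay, reduction=0.0,
                 noise_sd=1.0, rng=None):
    """Return simulated total trip minutes for one line on one day.

    reduction scales both the probability and the severity of disruptions,
    standing in for a hypothetical SafeTrack improvement scenario.
    """
    rng = rng or random.Random()
    p = p_disruption * (1.0 - reduction)
    severity = mean_delay * (1.0 - reduction)
    total = float(scheduled_minutes)
    carry = 0.0  # delay still affecting the trains that follow
    for _ in range(n_departures):
        if rng.random() < p:
            # New disruption: draw a delay and add it to the backlog.
            carry += rng.expovariate(1.0 / severity)
        total += carry
        # Each headway lets the line recover part of the backlog.
        carry = max(0.0, carry - headway_min)
    # Random noise applied to the daily total.
    total += rng.gauss(0.0, noise_sd)
    return total


rng = random.Random(42)
baseline = simulate_day(9800, 150, 6, 0.05, 8.0, rng=rng)
improved = simulate_day(9800, 150, 6, 0.05, 8.0, reduction=0.3, rng=rng)
print(round(baseline, 1), round(improved, 1), delay_tier(12))
```

Running many simulated days per scenario and comparing their averages and spread is one way to see whether a given reduction stands out from the noise.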
Results of Simulated Scenario
A Look Under The Hood
[software system demo]
Results
Created visualizations of the various simulations
Analyzed results to determine the shape of the data
Current       9861.402   102.51522
Scenario #1   9868.713   97.10936
Scenario #2   9854.400   108.57028
Scenario #3   9852.256   102.1384
Scenario #4   9850.429   101.7149
Scenario #5   9848.057   104.1241

Current       8121.386   95.954
Scenario #1   8117.496   97.341
Scenario #2   8115.761   99.953
Scenario #3   8114.653   104.407
Scenario #4   8104.47    99.702
Scenario #5   8093.36    98.429

Current       5280.572   100.5566
Scenario #1   5261.651   92.5748
Scenario #2   5262.043   114.093
Scenario #3   5020.293   41.431
Scenario #4   5014.868   41.251
Scenario #5   5013.92    40.980

Current       6762.053   97.839
Scenario #1   6765.053   97.839
Scenario #2   6759.09    103.266
Scenario #3   6562.22    52.973
Scenario #4   6552.85    53.316
Scenario #5   6540.79    48.947

Current       6811.311   108.8495
Scenario #1   6815.311   108.8495
Scenario #2   6816.787   105.2023
Scenario #3   6809.531   108.5713
Scenario #4   6810.966   97.1970
Scenario #5   6809.322   98.0109

Current       11149.5    97.3886
Scenario #1   11159.6    98.4512
Scenario #2   11146.33   99.5911
Scenario #3   11138.77   112.8393
Scenario #4   11132.07   97.0613
Scenario #5   11123.83   101.226
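To read these figures, it can help to express each scenario as a percentage change from the Current row. The sketch below does this for the group whose Current value is 5280.572, using only the first number in each row. The slides do not label the columns, so the meaning of that number is left open here, and the helper name percent_change is an assumption.

```python
# First reported value per row, copied from the group whose
# Current value is 5280.572. The column's meaning is not stated
# in the slides, so no unit is assumed here.
group = {
    "Current": 5280.572,
    "Scenario #1": 5261.651,
    "Scenario #2": 5262.043,
    "Scenario #3": 5020.293,
    "Scenario #4": 5014.868,
    "Scenario #5": 5013.92,
}


def percent_change(baseline, value):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


changes = {
    name: percent_change(group["Current"], value)
    for name, value in group.items()
    if name != "Current"
}

for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

In this group the change stays well under 1% for Scenarios #1 and #2 and moves to roughly 5% from Scenario #3 onward.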
Conclusions
Noticeable improvements in time and probability of delay were not realized until higher scenario parameters were introduced.
Analysis of the results indicates that SafeTrack repairs must reduce disruption severity and probability by roughly 30% - 50% for Metro riders to experience improved trip safety and reliability.
Improvements in Stochastic System
SafeTrack’s improvements may not be noticed if they do not overcome the system’s random noise
Biases & Assumptions
Recognizing biases and stating assumptions is key to data science
Data Quality
The importance of accurate data cannot be overstated
Springboard for Future Work
Our software can be generalized and adapted
Questions?