Gu scs team2 a_metro project_pdf

Predicting Post-Safetrack Metro Reliability

GU SCS Data Science Capstone Project
September 10, 2016

Micah Melling, Drew Wheatley, Patrick McGrady

Facts

over 250 million riders annually

118 miles of track

over 13 disruptions per day

Problem Statement


highly publicized safety lapses & deferred maintenance

1 Year timeframe

estimated $60,000,000 price tag

improved safety & reliability?

Hypothesis

The DC Metro System is a pivotal transportation asset for Washington DC and the surrounding regions. The SafeTrack project is meant to increase system safety and reliability. While technical and operational disruptions are inevitable, we believe that available data can provide insight into how frequently Metro riders will experience post-SafeTrack disruptions and ultimately improve their Metro commute expectations.

Scenario #1 Improvement



To quantify the outcome, we will explore several scenarios to provide riders with a clearer picture of their post-SafeTrack commute.



Data Ingestion & Wrangling

System Operations Data: used to determine system behavior under optimal conditions

Disruption Data: historical data used to analyze the frequency and effect of technical and operational disruptions (i.e., delays)

Ridership Data: used in conjunction with the operational datasets to quantify and extrapolate the scope of Metro delays
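
One way to read "quantify and extrapolate the scope" is to weight each logged delay by the number of riders it touches. The sketch below is illustrative only: the function name, the per-disruption rider counts, and the sample records are assumptions, not values from the project's datasets.

```python
def rider_minutes_lost(delay_minutes, riders_affected):
    """Rider-minutes lost to one disruption (delay length times riders affected)."""
    return delay_minutes * riders_affected


# Hypothetical disruption records: (line code, delay in minutes, estimated riders affected).
disruptions = [
    ("RD", 6, 800),
    ("BL", 12, 450),
    ("RD", 4, 600),
]

# Accumulate rider-minutes lost per line.
totals_by_line = {}
for line, delay, riders in disruptions:
    totals_by_line[line] = totals_by_line.get(line, 0) + rider_minutes_lost(delay, riders)

for line, total in sorted(totals_by_line.items()):
    print(f"{line}: {total} rider-minutes lost")
```

Weighting by riders makes a short delay on a crowded train count for more than a long delay on an empty one, which is the kind of scope a per-train delay log alone cannot show.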

The Data


Planned Operating Schedule

Data_Source: wmata.com

Data_Scope: Provided operating data under a perfectly efficient system with no delays or disruptions

Disruption Data

Data_Source: opendatadc

Data_Scope: Provided 5 years of daily disruption logs, including cause of disruption and minutes delayed

Planned Operating Schedule and Disruption Data provided a basis for comparing pre- and post-SafeTrack system behavior


The Data

24,335 records between April 2012 and July 2016

All Metro lines represented in the dataset

Description of disruption cause, translated as technical or operational

Delay, in minutes

Computation & Analysis: Limitations

Accuracy: ‘Garbage in, garbage out’ concept

Location: Station-to-station

Completeness

Compounding Delays

Opted to take a two-pronged approach:

1.) Build data product

2.) Develop simulation based on available data

Computation & Analysis: Methodology

1 Calculated the number of minutes of trips per day on each line.

2 Broke daily delays into five tiers based on severity.

3 Built in compounding delays based on expected train departures.

4 Injected random noise into the system.

[Figure: delay Tiers 1-5 shown for each of Scenarios 1-5]
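
The four methodology steps can be sketched as a small Monte Carlo simulation. This is a minimal illustration under stated assumptions, not the project's software: the tier probabilities and delay lengths, the headway, the rule that each following train recovers one headway of slack, the Gaussian noise, and every function and parameter name are hypothetical.

```python
import random
import statistics

# Hypothetical tiers: (probability a day falls in the tier, base delay in minutes).
# Placeholder values, not the project's fitted tiers. The first tier is "no delay".
TIERS = [(0.50, 0.0), (0.25, 5.0), (0.15, 15.0), (0.07, 30.0), (0.03, 60.0)]


def compounded_delay(delay, headway):
    """Sum the delay felt by the disrupted train and by each train queued behind it.

    Assumption: each following train recovers one headway of slack.
    """
    total = 0.0
    while delay > 0:
        total += delay
        delay -= headway
    return total


def simulate_day(planned_minutes, headway, reduction, noise_sd, rng):
    """Return simulated trip minutes for one day on one line."""
    scale = 1.0 - reduction
    # Scenario effect on probability: shrink the disrupted tiers and give the
    # freed probability to the no-delay tier.
    weights = [p * scale if d > 0 else p for p, d in TIERS]
    weights[0] += 1.0 - sum(weights)
    _, base_delay = rng.choices(TIERS, weights=weights, k=1)[0]
    # Scenario effect on severity, then compounding across later departures.
    delay = compounded_delay(base_delay * scale, headway)
    # Random noise added on top of the planned schedule and the delay.
    noise = rng.gauss(0.0, noise_sd)
    return planned_minutes + delay + noise


def simulate(planned_minutes, headway, reduction, noise_sd, days, seed):
    """Simulate many days and return (mean, standard deviation) of trip minutes."""
    rng = random.Random(seed)
    totals = [
        simulate_day(planned_minutes, headway, reduction, noise_sd, rng)
        for _ in range(days)
    ]
    return statistics.mean(totals), statistics.stdev(totals)


if __name__ == "__main__":
    for reduction in (0.0, 0.1, 0.3, 0.5):
        mean, sd = simulate(1000.0, 6.0, reduction, 5.0, 5000, seed=1)
        print(f"reduction={reduction:.0%} mean={mean:.1f} sd={sd:.1f}")
```

In this sketch a scenario is a single `reduction` fraction applied to both the probability and the severity of disrupted tiers, so comparing scenarios only requires rerunning `simulate` with a different value.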

Results of Simulated Scenario

A Look Under The Hood

[software system demo]

Results

Created visualizations of the various simulations

Analyzed results to determine the shape of the data

Results

Current      9861.402  102.51522
Scenario #1  9868.713  97.10936
Scenario #2  9854.400  108.57028
Scenario #3  9852.256  102.1384
Scenario #4  9850.429  101.7149
Scenario #5  9848.057  104.1241
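
To make the size of the differences easier to see, the reported figures can be expressed as percent changes against the Current row. The sketch below copies the numbers from the slide; the slide does not label the two columns, so the code refers to them only as the first and second value, and the helper name `percent_change` is an assumption.

```python
# Values copied from the results slide. The slide does not label the columns,
# so they are treated here simply as "first" and "second" reported values.
results = {
    "Current": (9861.402, 102.51522),
    "Scenario #1": (9868.713, 97.10936),
    "Scenario #2": (9854.400, 108.57028),
    "Scenario #3": (9852.256, 102.1384),
    "Scenario #4": (9850.429, 101.7149),
    "Scenario #5": (9848.057, 104.1241),
}


def percent_change(value, baseline):
    """Percent change of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline


base_first, base_second = results["Current"]
for name, (first, second) in results.items():
    if name == "Current":
        continue
    print(
        f"{name}: first value {percent_change(first, base_first):+.3f}%, "
        f"second value {percent_change(second, base_second):+.3f}%"
    )
```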











Conclusions

[Figure: results for Scenario #1 through Scenario #5]

Noticeable improvements in time and probability of delay were not realized until higher scenario parameters were introduced.

Analysis of the results indicates that SafeTrack repairs must reduce disruption severity and probability by roughly 30% - 50% for Metro riders to experience improved trip safety and reliability.
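
A back-of-the-envelope view of why the 30% - 50% range matters: if expected delay is modeled as probability times severity (an assumption made here for illustration, not a formula stated in the slides), cutting both by the same fraction shrinks expected delay by the square of the remaining fraction. The function names and the baseline numbers below are hypothetical.

```python
def expected_delay(probability, severity_minutes):
    """Expected delay per trip under the simple model: probability times severity."""
    return probability * severity_minutes


def scaled_expected_delay(probability, severity_minutes, reduction):
    """Expected delay after reducing both probability and severity by `reduction`."""
    factor = 1.0 - reduction
    return expected_delay(probability * factor, severity_minutes * factor)


# Hypothetical baseline: 20% chance of a disruption that costs 10 minutes.
baseline = expected_delay(0.2, 10.0)
for reduction in (0.3, 0.5):
    remaining = scaled_expected_delay(0.2, 10.0, reduction) / baseline
    print(f"{reduction:.0%} reduction leaves {remaining:.0%} of baseline expected delay")
```

Under this simple model a 30% cut in both inputs leaves about half of the baseline expected delay and a 50% cut leaves a quarter, a change large enough to stand out from day-to-day variation.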

Conclusions

Improvements in Stochastic System: SafeTrack’s improvements may not be noticed if they do not overcome the system’s random noise

Biases & Assumptions: Recognizing biases and stating assumptions is key to data science

Data Quality: The importance of accurate data cannot be overstated

Springboard for Future Work: Our software can be generalized and adapted

Questions??
