Predicting Post-Safetrack Metro Reliability GU SCS Data Science Capstone Project September 10, 2016 Micah Melling Drew Wheatley Patrick McGrady


Page 1: Gu scs team2 a_metro project_pdf

Predicting Post-Safetrack Metro Reliability

GU SCS Data Science Capstone Project September 10, 2016

Micah Melling, Drew Wheatley, Patrick McGrady

Page 2: Gu scs team2 a_metro project_pdf

Problem Statement

Facts

over 250 million riders annually

118 miles of track

over 13 disruptions per day

Page 3: Gu scs team2 a_metro project_pdf

Problem Statement

highly publicized safety lapses & deferred maintenance

1-year timeframe

estimated $60,000,000 price tag

improved safety & reliability?

Page 4: Gu scs team2 a_metro project_pdf

Hypothesis

The DC Metro System is a pivotal transportation asset for Washington DC and the surrounding regions. The SafeTrack project is meant to increase system safety and reliability. While technical and operational disruptions are inevitable, we believe that available data can provide insight into how frequently Metro riders will experience post-SafeTrack disruptions and ultimately improve their Metro commute expectations.

To quantify the outcome, we will explore several scenarios to provide riders with a clearer picture of their post-SafeTrack commute.

Scenario #1 Improvement

Scenario #2 Improvement

Scenario #3 Improvement

Scenario #4 Improvement

Scenario #5 Improvement

Page 5: Gu scs team2 a_metro project_pdf

Data Ingestion & Wrangling

System Operations Data: used to determine system behavior under optimal conditions.

Disruption Data: historical data used to analyze the frequency and effect of technical and operational disruptions (i.e., delays).

Ridership Data: used in conjunction with the operational datasets to quantify and extrapolate the scope of Metro delays.

Page 6: Gu scs team2 a_metro project_pdf

The Data

Planned Operating Schedule

Data_Source: wmata.com

Data_Scope: Provided operating data under a perfectly efficient system with no delays or disruptions

Disruption Data

Data_Source: opendatadc

Data_Scope: Provided 5 years of daily disruption logs, including cause of disruption and minutes delayed

The Planned Operating Schedule and Disruption Data provided a basis for comparing pre- and post-SafeTrack system behavior.

[graphics: ON TIME / DELAYED status indicators and a train arrival board with columns LN, CAR, DEST, MIN]

Page 7: Gu scs team2 a_metro project_pdf

The Data

24,335 records between April 2012 and July 2016

All Metro lines represented in the dataset

Description of disruption cause, translated as technical or operational

Delay, in minutes
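The record layout described on this slide could be modeled roughly as follows. This is only a sketch: the field names, the keyword list used to label a cause as technical or operational, and the three sample rows are all hypothetical, since the slides do not show the actual schema or classification rules.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical keywords: the slides do not say how causes were translated
# into "technical" versus "operational".
TECHNICAL_KEYWORDS = ("brake", "door", "signal", "mechanical", "track")


@dataclass
class Disruption:
    date: str          # ISO date, e.g. "2016-07-01"
    line: str          # Metro line code, e.g. "RD"
    cause: str         # free-text description of the disruption cause
    delay_min: float   # delay, in minutes

    @property
    def category(self) -> str:
        """Label the free-text cause as technical or operational."""
        text = self.cause.lower()
        if any(keyword in text for keyword in TECHNICAL_KEYWORDS):
            return "technical"
        return "operational"


def delay_minutes_by_line_and_category(records):
    """Sum delay minutes for each (line, category) pair."""
    totals = defaultdict(float)
    for record in records:
        totals[(record.line, record.category)] += record.delay_min
    return dict(totals)


# Made-up rows for illustration only; they are not taken from the dataset.
sample = [
    Disruption("2016-07-01", "RD", "Brake problem", 6.0),
    Disruption("2016-07-01", "RD", "Sick customer", 4.0),
    Disruption("2016-07-02", "BL", "Door problem", 5.0),
]
totals = delay_minutes_by_line_and_category(sample)
print(totals)
```

A keyword match is the simplest possible classifier; a real log would likely need a reviewed mapping from cause descriptions to categories.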

Page 8: Gu scs team2 a_metro project_pdf

Computation & Analysis: Limitations

Accuracy

Location

Station-to-Station

‘Garbage in, garbage out’ concept

Opted to take a two-pronged approach:

1.) Build data product

2.) Develop simulation based on available data

Completeness

Compounding Delays

Page 9: Gu scs team2 a_metro project_pdf

Computation & Analysis: Methodology

1 Calculated the number of minutes of trips per day on each line.

2 Broke daily delays into five tiers based on severity.

3 Built in compounding delays based on expected train departures.

4 Injected random noise into the system.

[figure: delay tiers (Tier 1 through Tier 5) shown for Scenarios 1 through 5]
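The four methodology steps might be sketched as a small Monte Carlo simulation like the one below. It is not the team's software: the tier probabilities and mean delays, the `carryover` fraction used for compounding, the exponential delay draw, the Gaussian noise, and every function and parameter name are assumptions chosen for illustration.

```python
import random

# Hypothetical tiers: (probability a trip is disrupted, mean delay in minutes).
# The slides say only that delays were split into five tiers by severity.
TIERS = [(0.05, 2.0), (0.03, 5.0), (0.02, 10.0), (0.01, 20.0), (0.005, 40.0)]


def simulate_day(trip_minutes, trips, tiers, reduction=0.0,
                 carryover=0.25, noise_sd=5.0, rng=None):
    """Return total minutes of trips for one simulated day on one line.

    reduction scales both the probability and the severity of every tier,
    standing in for a SafeTrack improvement scenario (assumed form).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    if rng is None:
        rng = random.Random()
    total_delay = 0.0
    carried = 0.0  # delay inherited from the previous departure
    for _ in range(trips):
        delay = carried
        for prob, mean_minutes in tiers:
            if rng.random() < prob * (1.0 - reduction):
                # Exponential draw with the (reduced) tier mean.
                delay += rng.expovariate(1.0 / (mean_minutes * (1.0 - reduction)))
        total_delay += delay
        carried = delay * carryover  # compounding onto the next train
    scheduled = trips * trip_minutes  # step 1: planned minutes of trips
    return scheduled + total_delay + rng.gauss(0.0, noise_sd)  # step 4: noise


def mean_daily_minutes(days, seed, **kwargs):
    """Average simulate_day over many days with a seeded generator."""
    rng = random.Random(seed)
    return sum(simulate_day(rng=rng, **kwargs) for _ in range(days)) / days


current = mean_daily_minutes(300, seed=1, trip_minutes=50.0, trips=100, tiers=TIERS)
improved = mean_daily_minutes(300, seed=1, trip_minutes=50.0, trips=100,
                              tiers=TIERS, reduction=0.5)
print(f"current: {current:.1f} min/day, 50% reduction: {improved:.1f} min/day")
```

Averaging over many simulated days matters here because a single day's result is dominated by the injected noise and the rare severe tiers, so a small improvement can be invisible in any one run.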

Page 10: Gu scs team2 a_metro project_pdf

Results of Simulated Scenario

Page 11: Gu scs team2 a_metro project_pdf

A Look Under The Hood

[software system demo]

Page 12: Gu scs team2 a_metro project_pdf

Results

Created visualizations of the various simulations

Analyzed results to determine the shape of the data

Page 13: Gu scs team2 a_metro project_pdf

Results

Current      9861.402  102.51522
Scenario #1  9868.713  97.10936
Scenario #2  9854.400  108.57028
Scenario #3  9852.256  102.1384
Scenario #4  9850.429  101.7149
Scenario #5  9848.057  104.1241

Current      8121.386  95.954
Scenario #1  8117.496  97.341
Scenario #2  8115.761  99.953
Scenario #3  8114.653  104.407
Scenario #4  8104.47   99.702
Scenario #5  8093.36   98.429

Current      5280.572  100.5566
Scenario #1  5261.651  92.5748
Scenario #2  5262.043  114.093
Scenario #3  5020.293  41.431
Scenario #4  5014.868  41.251
Scenario #5  5013.92   40.980

Current      6762.053  97.839
Scenario #1  6765.053  97.839
Scenario #2  6759.09   103.266
Scenario #3  6562.22   52.973
Scenario #4  6552.85   53.316
Scenario #5  6540.79   48.947

Current      6811.311  108.8495
Scenario #1  6815.311  108.8495
Scenario #2  6816.787  105.2023
Scenario #3  6809.531  108.5713
Scenario #4  6810.966  97.1970
Scenario #5  6809.322  98.0109

Current      11149.5   97.3886
Scenario #1  11159.6   98.4512
Scenario #2  11146.33  99.5911
Scenario #3  11138.77  112.8393
Scenario #4  11132.07  97.0613
Scenario #5  11123.83  101.226
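One way to read these figures is as a percent change of each scenario relative to Current. The sketch below does that arithmetic for the third group of results, using the first numeric column exactly as it appears on the slide. The slide does not label the columns or the groups, so what the numbers measure is not assumed here, and the helper name `percent_change` is hypothetical.

```python
def percent_change(current, scenario):
    """Percent change of a scenario value relative to the Current value."""
    return 100.0 * (scenario - current) / current


# First numeric column of the third results group, copied from the slide.
current_value = 5280.572
scenario_values = {
    "Scenario #1": 5261.651,
    "Scenario #2": 5262.043,
    "Scenario #3": 5020.293,
    "Scenario #4": 5014.868,
    "Scenario #5": 5013.92,
}

for name, value in scenario_values.items():
    print(f"{name}: {percent_change(current_value, value):+.2f}%")
```

For this group the change stays well under 1% for Scenarios #1 and #2 and moves to roughly 5% from Scenario #3 onward.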

Page 14: Gu scs team2 a_metro project_pdf

Conclusions

Scenario #1 Scenario #2 Scenario #3 Scenario #4 Scenario #5

Noticeable improvements in time and probability of delay were not realized until higher scenario parameters were introduced.

Analysis of the results indicates that SafeTrack repairs must reduce disruption severity and probability by roughly 30% to 50% for Metro riders to experience improved trip safety and reliability.

Page 15: Gu scs team2 a_metro project_pdf

Conclusions

Improvements in Stochastic System: SafeTrack’s improvements may not be noticed if they do not overcome the system’s random noise

Biases & Assumptions: Recognizing biases and stating assumptions is key to data science

Data Quality: The importance of accurate data cannot be overstated

Springboard for Future Work: Our software can be generalized and adapted

Page 16: Gu scs team2 a_metro project_pdf

Questions?