15 July 2016 P2175-P-011 v0.2
Commercially Confidential
Generating Insight from Data
Tailoring Analytic Algorithms and Visualization to Address User Requirements
The Challenge
How do you get from a user with data to one acting on the basis of said data?
We will elaborate by way of example(s)
Data + User → Needs → Analysis → Visualisation → Action!
TfL Data Overview
The Transport for London (TfL) Tube travel data set provides a large, open-access data set we can play around with to demonstrate the process
Over ½ million journeys logged, giving the location and time of the start and end of each
– Needs cleaning (unstarted, unfinished, not applicable etc.)
While nominally about journeys, the data can be re-analysed to give information about:
– Stations
– Lines
Other metadata allows for potentially interesting analyses, such as by user type (elderly pass user, season ticket user etc.)
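The cleaning and re-analysis steps on this slide can be sketched as below. This is a minimal illustration, not the code used for the presentation: the record layout (`start_station`, `end_station`, `start_min`, `end_min`), the placeholder spellings in `INVALID`, and the sample journeys are all assumptions made up for the example.

```python
from collections import Counter

# Assumed placeholder names marking unusable records (hypothetical spellings).
INVALID = {"unstarted", "unfinished", "not applicable", ""}

def clean_journeys(journeys):
    """Keep only journeys whose start and end stations are both real names."""
    def ok(name):
        return name is not None and name.strip().lower() not in INVALID
    return [j for j in journeys
            if ok(j["start_station"]) and ok(j["end_station"])]

def station_counts(journeys):
    """Re-analyse journey records as per-station departure and arrival counts."""
    departures = Counter(j["start_station"] for j in journeys)
    arrivals = Counter(j["end_station"] for j in journeys)
    return departures, arrivals

# Made-up records; times are minutes after midnight.
journeys = [
    {"start_station": "Barnet", "end_station": "Canary Wharf",
     "start_min": 480, "end_min": 530},
    {"start_station": "Unstarted", "end_station": "Canary Wharf",
     "start_min": 0, "end_min": 540},
    {"start_station": "Chorleywood", "end_station": "Unfinished",
     "start_min": 450, "end_min": 0},
    {"start_station": "Canary Wharf", "end_station": "Barnet",
     "start_min": 1050, "end_min": 1100},
]
cleaned = clean_journeys(journeys)
departures, arrivals = station_counts(cleaned)
```

Grouping by station rather than by journey is what turns a journey log into station-level information.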
User: Needs
Consider a potential user. What questions do they want answers to? What information is of use to them?
We consider a user who wants to know about stations – not just in terms of usage but in terms of which stations are similar to other stations. This may be for several reasons:
– Interested in advertising based on likely users;
– Interested in appropriate staffing and rostering of stations;
– Interested in issue tracking, learning and applying lessons to similar stations.
Alternative users might be interested in the traffic flow on lines and how they are affected by station closure:
– Emergency / contingency planning;
– Sophisticated travel advice apps.
Analysis: Station Profiles
We refocus the data set to give a profile of the usage of a station – recording both arrival and departure rates across the working day
Users can already compare the total usage of stations easily, so we scale each of these profiles to have a maximum value of 1.
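A station profile of this kind can be sketched as a histogram of event times scaled to a peak of 1. The hourly bin width, the function names and the sample times are assumptions for illustration. The slide does not say whether arrivals and departures are scaled separately or jointly; this sketch scales a single profile on its own.

```python
def usage_profile(times_min, bin_minutes=60, day_minutes=1440):
    """Count events (arrivals or departures) per time bin across the day."""
    n_bins = day_minutes // bin_minutes
    counts = [0] * n_bins
    for t in times_min:
        counts[(t % day_minutes) // bin_minutes] += 1
    return counts

def scale_to_unit_peak(profile):
    """Scale a profile so that its busiest bin has the value 1."""
    peak = max(profile)
    if peak == 0:
        return [0.0] * len(profile)  # station with no recorded events
    return [c / peak for c in profile]

# Made-up departure times (minutes after midnight) for one station.
departure_times = [480, 485, 490, 1020]
profile = usage_profile(departure_times)
scaled = scale_to_unit_peak(profile)
```

After scaling, a small suburban station and a large one with the same daily rhythm produce the same profile, which is what allows stations to be compared by type rather than by size.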
Analysis: Dissimilarity Metric
User is interested in type of station (e.g. commuter source) but not interested in the precise timing of the commuter rushes
Stations closer to the centre (e.g. Harrow) have later morning departure peaks than stations further out (e.g. Chorleywood)
The reverse is true for the evening arrivals rush
Dissimilarity between stations is determined by the minimum Euclidean distance between arrival and departure profiles, allowing for small timeshifts
Timeshifts must be applied in opposite directions for arrivals and departures
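One way to read this metric: for each small shift s, move one station's departure profile by +s bins and its arrival profile by -s bins, compute the Euclidean distance to the other station over both profiles together, and keep the smallest value. The sketch below makes several assumptions not stated on the slide: shifts are circular over the day, the distance is taken over the concatenated departure and arrival profiles, and `max_shift` is a hypothetical parameter.

```python
import math

def shift(profile, s):
    """Circularly shift a profile by s bins (positive s = later in the day)."""
    n = len(profile)
    s %= n
    return profile[-s:] + profile[:-s] if s else list(profile)

def dissimilarity(dep_a, arr_a, dep_b, arr_b, max_shift=1):
    """Minimum Euclidean distance between two stations over small timeshifts.

    Departures and arrivals of station B are shifted in opposite directions.
    """
    best = math.inf
    for s in range(-max_shift, max_shift + 1):
        dep_shifted = shift(dep_b, s)
        arr_shifted = shift(arr_b, -s)
        d2 = sum((x - y) ** 2 for x, y in zip(dep_a, dep_shifted))
        d2 += sum((x - y) ** 2 for x, y in zip(arr_a, arr_shifted))
        best = min(best, math.sqrt(d2))
    return best

# Toy 6-bin day: the outer station departs one bin earlier in the morning
# and receives its arrivals one bin later in the evening.
inner_dep, inner_arr = [0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0]
outer_dep, outer_arr = [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]

unshifted = dissimilarity(inner_dep, inner_arr, outer_dep, outer_arr, max_shift=0)
shifted = dissimilarity(inner_dep, inner_arr, outer_dep, outer_arr, max_shift=1)
```

In the toy data the two stations differ only in timing, so allowing a one-bin shift reduces their dissimilarity from 2 to 0. Shifting both profiles in the same direction would align one peak while pushing the other further apart.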
Analysis: Automatic Clustering
Agglomerative hierarchical clustering technique was used with group average linkage to merge clusters
Complete dendrogram is easy to calculate; deciding where to split is harder
Splitting into 6 clusters provided useful insight (more clusters are also insightful)
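Agglomerative clustering with group average linkage can be sketched in a few lines: start with every station in its own cluster, then repeatedly merge the two clusters whose members have the smallest mean pairwise dissimilarity. This naive version is for illustration only; the function name and the toy data are assumptions, and it stops at a requested cluster count rather than building the full dendrogram.

```python
def average_linkage_clusters(dist, n_clusters):
    """Merge clusters by group average linkage until n_clusters remain.

    dist is a symmetric matrix (list of lists) of pairwise dissimilarities.
    Returns a list of clusters, each a list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Group average: mean dissimilarity over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters

# Toy data: five points on a line, using absolute difference as dissimilarity.
points = [0, 1, 10, 11, 50]
dist = [[abs(a - b) for b in points] for a in points]
clusters = average_linkage_clusters(dist, 3)
```

For a real data set, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be a more practical choice, since it returns the full merge tree and leaves the split decision to a separate step.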
Analysis: 6 Clusters
Some insights can be gained just from looking at the clusters – e.g. the clusters were labelled by observing their membership
Commuter Source: 168 stations, characterised by a morning departures peak and an evening arrivals peak, mainly located in the suburbs (e.g. Barnet)
Commuter Destination: 44 stations, characterised by a morning arrivals peak and an evening departures peak, mainly central London (e.g. Canary Wharf)
Transit: 44 stations, with peaks as a commuter destination but also keeping high usage throughout the day; includes most rail/tube interchanges (e.g. King's Cross)
Social: 3 stations, with peaks as a commuter destination, but with extra arrivals in the early evening and many departures very late in the evening (e.g. Covent Garden)
Heathrow Terminal 4: Cluster of one whose behaviour is highly variable, dependent upon flights rather than typical work patterns.
Heathrow Terminals 1, 2 & 3: Cluster of one whose behaviour is highly variable, dependent upon flights rather than typical work patterns.
…Text is a poor way of displaying these
Geographic Visualisation
Further insight can be achieved by using an interactive, web-based visualisation tool to show the location, cluster and current usage of each station
Geographic Visualisation
Rush hour becomes startlingly clear, as the size of each station marker is proportional to how busy it is
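The size mapping can be sketched as a simple linear scale. The slide does not say whether "size" means radius or area, nor what pixel range the tool uses; the sketch below assumes radius, and the function name and pixel limits are hypothetical.

```python
def marker_radius(usage, busiest, max_radius_px=30.0, min_radius_px=2.0):
    """Radius proportional to current usage, relative to the busiest station.

    A small minimum radius keeps quiet stations visible on the map.
    """
    if busiest <= 0:
        return min_radius_px
    return max(min_radius_px, max_radius_px * usage / busiest)

# Made-up usage figures for three stations at one moment in time.
radii = [marker_radius(u, busiest=1000) for u in (0, 500, 1000)]
```

If area rather than radius were meant to carry the proportion, the ratio would be square-rooted before scaling.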
Geographic Visualisation
Pan and zoom (inherited from Google Maps) allow a user to focus their interest in an intuitive manner – clicking on a station brings up details on the right
Conclusions
To generate new insight you need to determine users' needs, apply appropriate analytics and display the results with suitable visualisations.
Data analysis without an understanding of the goal may just be empty maths;
Data visualisation on its own may be very pretty, but not useful;
New insights can be generated from analytics + visualisation
Target these to address user needs and you have something useful
Cambridge Consultants is part of the Altran group, a global leader in Innovation. www.Altran.com
www.CambridgeConsultants.com
Cambridge UK, Boston USA, Singapore
Registered No. 1036296 England
The contents of this presentation are commercially confidential and the proprietary information of Cambridge Consultants. © 2016 Cambridge Consultants Ltd. All rights reserved.