This is a 15-minute presentation from the UK Azure User Group Lightning Talk night on how we use Azure at QuantumBlack (QB), a Creative Data Science agency.
QuantumBlack ©2014 Page 1
QuantumBlack
How we use Azure
Azure Lightning Talk
Sam Bourton, CTO
@sambourton
http://www.quantumblack.com
13 May 2014
Agenda
Who we are 2 mins
What we do 2 mins
Examples 2 mins
How we use Azure 3 mins
Pleasures 2 mins
Pains 2 mins
How we want to use Azure 2 mins
= 15 mins!*
What we do
Engineering
Data Science
Design
Machine Learning, Advanced Maths / Stats, Data Analytics
User Interface Design, Information Architecture, Product / Solution Design
Application Development: Web, Enterprise, iPad
Interactive Dashboards
Applied AnalyticsAnalytics Deployment
Data Visualisation
Who we are
Back End Dev: a Back End Developer with excellent SQL database skills (possibly NoSQL), application data architecture, and middle-tier enterprise applications.
Front End Dev: a Front End Web Developer who builds web and iPad applications using JavaScript, single-page application (SPA) architecture, HTML5, JavaScript frameworks and D3.js.
Generalist: a Generalist Engineer, i.e. a strong Enterprise Application Developer with good middle-tier (C#, Java, Cloud) and back-end database skills, plus good web development skills.
Data Architect / Engineer: an experienced Data Architect or Data Engineer with excellent SQL and NoSQL database design and implementation skills, Hadoop, data warehousing, BI, ETL and SSIS, handling and managing large amounts of corporate data. Interested in Big Data tech.
Data Viz: a Data Visualisation Developer or Creative Coder with excellent JavaScript and D3 skills and some design skills, for Data Visualisation-heavy web UIs.
Data Scientist: pure research and algorithms, using Matlab, R and Python.
UI / UX Designer: designers specialise in Visual Design (User Interface), Information Architecture (User Experience), or a mix of both. UX Designers understand the user stories to create wireframes and logical flow; UI Design adds high-fidelity visual styles and polish.
Analytics Engineer: Analytics Engineers/Integrators are either Data specialists (Big Data, NoSQL, SQL) with some Data Science skills (including R, Matlab, Python) and Big Data analytics tech such as Hadoop/HBase, Mahout and Spark, or exceptional Generalist developers with C#, C++, Python and R and a strong background and interest in Maths, Analytics, Data Science and Big Data tech.
Crossrail - Arup & Atkins
Data analytics and visualisation for improving risk management and cost efficiency from monitoring
Crossrail design problem: two key challenges
Risk management
Improve visibility of the data to ensure nothing is missed, understand the issues and make better judgements on responses
Both leadership and engineers are challenged by making sense of the vast volume of data produced by 250,000 sensors across London
Cost efficiency
Improve efficiency through new analytical methods that identify better approaches
Expensive problem: Crossrail spends a significant amount on monitoring per annum
Overview of Approach: Analytics deployed on Crossrail
Anomaly detection
Identify signatures in the data that warrant investigation.
Identify settlement patterns that run counter to expectation
Improve coverage and speed of monitoring regime
Forecasting
Forecast near-term settlement or end of construction settlement
Provide early warning of events
Optimal Sampling
Optimise monitoring programmes for accuracy and/or cost savings.
Evaluate tradeoff between monitoring intensity and accuracy of displacement knowledge over time
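The forecasting and early-warning steps can be illustrated with a short Python sketch. The deck does not say which models AIM uses. Holt's linear (double exponential) smoothing is substituted here as a generic technique, and the equally spaced readings, smoothing constants, alert limit and function names (`holt_forecast`, `first_breach`) are all assumptions for illustration.

```python
from typing import List, Optional, Sequence


def holt_forecast(readings: Sequence[float], alpha: float, beta: float,
                  horizon: int) -> List[float]:
    """Forecast `horizon` steps ahead with Holt's linear (double exponential) smoothing.

    Assumes equally spaced readings, e.g. one settlement reading (mm) per day.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    if not (0 < alpha <= 1 and 0 < beta <= 1):
        raise ValueError("alpha and beta must lie in (0, 1]")
    level = readings[0]
    trend = readings[1] - readings[0]      # initial slope from the first two readings
    for y in readings[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)          # smoothed level
        trend = beta * (level - prev_level) + (1 - beta) * trend   # smoothed trend
    return [level + h * trend for h in range(1, horizon + 1)]


def first_breach(forecast: Sequence[float], limit_mm: float) -> Optional[int]:
    """Return the first forecast step whose settlement magnitude reaches `limit_mm`, else None."""
    for step, value in enumerate(forecast, start=1):
        if abs(value) >= limit_mm:
            return step
    return None
```

For example, `holt_forecast([0, -1, -2, -3], alpha=0.5, beta=0.5, horizon=2)` returns `[-4.0, -5.0]`, and `first_breach([-4.0, -5.0], limit_mm=5.0)` returns `2`, which is the kind of early warning the approach describes. A real deployment would tune the constants per sensor.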
Adaptive Instrumentation and Monitoring (AIM)
Combines powerful analytics with visual interpretation
AIM: tailored to different roles and priorities
How we use Azure (1/2)
• Quickly set up and deploy new projects / POCs / prototypes:
Our DevOps team sets up Dev, QA, Staging and Prod environments with:
1. CI with TeamCity, Mercurial
2. SQL Server / SQL Azure
3. Virtual Machines / Cloud Service
4. Websites
• Quickly (and cheaply / temporarily) stand up Virtual Machines, with software pre-installed or installed by us:
• Oracle
• SQL Server
• Exchange Server
• Linux
How we use Azure (2/2)
• Routing execution of algorithms via Service Bus and AMQP
• Between Worker Roles
• Across C#, Java, Python, Matlab, R
• From Windows -> Linux
• From Cloud -> On-Premise
• Automating virtual machine start-up and shutdown
• Creating multiple instances of analytics worker roles
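The routing idea above, one message stream fanned out to workers by language, can be sketched in Python. This is an in-memory stand-in, not Service Bus or AMQP client code: each runtime gets its own queue, much as a topic can have one filtered subscription per worker pool. `AnalyticsJob`, `JobRouter`, the `runtime` property and the dead-letter behaviour are hypothetical.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import Dict, Iterable, List


@dataclass
class AnalyticsJob:
    """Hypothetical job message: which analytic to run, in which runtime, on what input."""
    analytic: str
    runtime: str                      # e.g. "csharp", "java", "python", "matlab", "r"
    payload: dict = field(default_factory=dict)


class JobRouter:
    """In-memory stand-in for a topic with one filtered subscription per runtime."""

    def __init__(self, runtimes: Iterable[str]) -> None:
        self.subscriptions: Dict[str, Queue] = {r: Queue() for r in runtimes}
        self.dead_letter: Queue = Queue()   # jobs for which no worker pool has subscribed

    def publish(self, job: AnalyticsJob) -> None:
        # Route on a message property, as a subscription filter on "runtime" would.
        target = self.subscriptions.get(job.runtime, self.dead_letter)
        target.put(job)

    def receive_all(self, runtime: str) -> List[AnalyticsJob]:
        # A worker for `runtime` drains only its own subscription.
        pending = self.subscriptions[runtime]
        jobs = []
        while not pending.empty():
            jobs.append(pending.get_nowait())
        return jobs
```

Swapping the in-memory queues for real broker subscriptions would leave the routing decision unchanged: the publisher only sets a property, and each C#, Java, Python, Matlab or R worker reads the subscription for its own runtime.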
Architecture diagram (layers labelled Presentation, Data, Web, Cloud):
Web Browser
Web Application: Web Front-End, Web API, Domain Model, Data Access
Data Warehouse: SQL Server
Archive: Blob Storage
Data Processing Cloud Service: FTP Server, File Processing, File Repository
Analytics Cloud Service: Analytic #1, Analytic #2 ... Analytic #n
Tools: Data Tools, Analytics Tools
Also labelled: Web Application, Analytics, Data Uploads, Live Dashboard, Email Reports
Example: AIM Processing Pipelines
Diagram labels (processing stages and outputs):
New Reading Files: Data Ingestion, Validation, Persistence, Smoothing
Triggers: Settlement, Slope, Deflection Ratio, Cant, Twist
KPIs: Site, Sensor, Asset, Facet
Charts: Settlement History, Sensor Readings, Anomalies, Smoothing, Optimal Sampling, Predicted Sampling, Settlement Profile, Twist, Cant, Rail Profile
New Construction Progress: Data Ingestion, Validation, Persistence, Zones of Influence (Sensors, Excavations / Advances), KPIs (Site, Asset)
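The trigger quantities named in the pipeline have standard engineering definitions, sketched below in Python. These are textbook forms (deflection ratio as used in building damage assessment, cant and twist as used for track geometry), not necessarily the exact formulas AIM implements. The sign conventions, units (mm and m) and function names are assumptions.

```python
from typing import Sequence


def slope(settlement_a_mm: float, settlement_b_mm: float, distance_m: float) -> float:
    """Differential settlement between two points per metre of separation (mm/m)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return (settlement_b_mm - settlement_a_mm) / distance_m


def deflection_ratio(positions_m: Sequence[float], settlements_mm: Sequence[float]) -> float:
    """Largest deviation from the chord joining the end points, divided by the chord length.

    Dimensionless: the deviation is converted from mm to m before dividing.
    Sagging and hogging are not distinguished in this sketch.
    """
    if len(positions_m) != len(settlements_mm) or len(positions_m) < 3:
        raise ValueError("need at least three matching points")
    x0, x1 = positions_m[0], positions_m[-1]
    s0, s1 = settlements_mm[0], settlements_mm[-1]
    length_m = x1 - x0
    if length_m <= 0:
        raise ValueError("positions must increase along the asset")

    def chord(x: float) -> float:
        # Straight line between the two end points' settlements.
        return s0 + (s1 - s0) * (x - x0) / length_m

    max_dev_mm = max(abs(s - chord(x)) for x, s in zip(positions_m, settlements_mm))
    return (max_dev_mm / 1000.0) / length_m


def cant(left_rail_mm: float, right_rail_mm: float) -> float:
    """Height difference between the two rails at one cross-section (mm)."""
    return left_rail_mm - right_rail_mm


def twist(cant_a_mm: float, cant_b_mm: float) -> float:
    """Change in cant between two cross-sections one base length apart (mm over that base)."""
    return cant_b_mm - cant_a_mm
```

For instance, three points at 0, 10 and 20 m with settlements of 0, 10 and 0 mm give a deflection ratio of 0.01 m / 20 m = 0.0005. The base length for twist is left to the caller because the deck does not specify it.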
Run Forecasting: Forecasting Analytics, Forecasting Charts, Forecasting Alerts
Run Sampling: Optimal Sampling Analytics, Sampling Charts
Calibration Analytics: Forecasting Calibration, Sampling Calibration
Data Anomalies: Late Data, Spikes, Stuck Sensor, Reduced Frequency
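The four data-anomaly checks (late data, spikes, stuck sensor, reduced frequency) can be expressed as simple rules. This Python sketch is one possible implementation, not QuantumBlack's. The thresholds, the rolling median / median-absolute-deviation (MAD) spike test, the `min_scale` floor and all function names are assumptions.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Sequence, Tuple


def is_late(readings: Sequence[Tuple[datetime, float]], now: datetime,
            expected_interval: timedelta, grace: float = 1.5) -> bool:
    """Late data: the newest reading is older than `grace` expected intervals."""
    if not readings:
        return True
    return now - readings[-1][0] > expected_interval * grace


def spike_indices(values: Sequence[float], window: int = 5, threshold: float = 5.0,
                  min_scale: float = 0.1) -> List[int]:
    """Spikes: points far from their local median, measured in local MADs.

    `min_scale` (an assumed instrument resolution, in the same units as values)
    stops a zero MAD on flat data from making every small change look huge.
    """
    half = window // 2
    flagged = []
    for i, v in enumerate(values):
        neighbourhood = values[max(0, i - half): i + half + 1]
        centre = median(neighbourhood)
        mad = median(abs(x - centre) for x in neighbourhood)
        if abs(v - centre) / max(mad, min_scale) > threshold:
            flagged.append(i)
    return flagged


def is_stuck(values: Sequence[float], min_run: int = 10, tolerance: float = 0.0) -> bool:
    """Stuck sensor: the last `min_run` values are identical within `tolerance`."""
    if len(values) < min_run:
        return False
    tail = values[-min_run:]
    return max(tail) - min(tail) <= tolerance


def has_reduced_frequency(timestamps: Sequence[datetime], expected_interval: timedelta,
                          factor: float = 2.0) -> bool:
    """Reduced frequency: the median gap between readings exceeds `factor` expected intervals."""
    if len(timestamps) < 2:
        return False
    gaps = [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
    return median(gaps) > factor * expected_interval.total_seconds()
```

A median/MAD test is used rather than mean/standard deviation because a single spike barely moves the median, so the spike cannot hide itself by inflating the scale it is measured against.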
Pleasures
• Very quick, easy and cheap to set up a project in a solid, robust, enterprise-ready environment; we do lots of small projects and Proofs of Concept (POCs)
• Very easy to scale across environments:
Development -> QA -> Staging -> Production
POC -> Prototype -> Alpha -> Beta -> Live
• Temporarily spinning up servers and software to match our clients' environments
• MSDN credits!
• Automation: using PowerShell to start up and shut down VMs
• Routing execution of units of code via Service Bus and AMQP, between different languages and environments
• Email reports with SendGrid
• Simple Storage options – Table, Queue, Blob
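The start-up and shutdown automation mentioned above was done in PowerShell. The Python sketch below shows only the scheduling decision such a job might make, and how much always-on time it avoids. The office-hours policy, the defaults and the function names are hypothetical, and the calls that actually start or deallocate a VM are deliberately left out rather than guessed.

```python
from datetime import datetime, time


def should_be_running(now: datetime, start: time = time(8, 0), stop: time = time(19, 0),
                      weekdays_only: bool = True) -> bool:
    """Hypothetical office-hours policy for a dev/QA VM."""
    if weekdays_only and now.weekday() >= 5:   # 5 = Saturday, 6 = Sunday
        return False
    return start <= now.time() < stop


def running_hours_per_week(start: time = time(8, 0), stop: time = time(19, 0),
                           days: int = 5) -> float:
    """Hours per week the VM is on under the schedule above."""
    minutes = (stop.hour * 60 + stop.minute) - (start.hour * 60 + start.minute)
    if minutes <= 0:
        raise ValueError("stop must be after start")
    return days * minutes / 60


def fraction_of_hours_saved(start: time = time(8, 0), stop: time = time(19, 0),
                            days: int = 5) -> float:
    """Share of the 168 hours in a week that an always-on VM would run but this one does not."""
    return 1 - running_hours_per_week(start, stop, days) / (7 * 24)
```

With the default hypothetical policy (08:00 to 19:00, weekdays only), a VM runs 55 of 168 hours a week, so about two thirds of compute hours are avoided. Storage and licence costs are not modelled.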
Pains
• Transient faults with SQL Azure
• Throttling on SQL Azure
• Matlab MCR on a Cloud Service
• Analytics at Scale
• Distributing processing and analytics pipelines across machines
• And operating systems
• Machine Learning / Maths – no one language to rule them all
• Running the analytics closer to the Data
• Not Azure’s fault! (It would just be really nice if they helped solve it)
• Subscription and costs management – we don’t have the Enterprise Admin
• Costs add up for always-on VMs (and SQL Server / Oracle, etc)
• Integration across C# / Java / Python / Matlab / R
• VMs are quite slow at processing, e.g. disappointing performance with Matlab
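A common mitigation for transient faults and throttling on a cloud database is to retry with capped exponential backoff and jitter. The Python sketch below shows the pattern in isolation. `TransientError`, `with_retries` and the delay defaults are hypothetical, and a real data-access layer would need to map specific driver errors onto the retryable type.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (dropped connection, throttling)."""


def with_retries(operation: Callable[[], T], max_attempts: int = 5,
                 base_delay_s: float = 0.5, max_delay_s: float = 30.0,
                 retry_on: Tuple[Type[BaseException], ...] = (TransientError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, retrying transient failures with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise                      # give up: surface the last error
            # Exponential backoff (0.5 s, 1 s, 2 s, ...) capped at max_delay_s,
            # with full jitter so many workers do not retry in lockstep.
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    # Only reached when max_attempts < 1, because the loop body never runs.
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable. Jitter spreads retries from many worker roles so they do not all hit a throttled database at the same moment.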
How we want to use Azure
• Much easier to run R, Python, Matlab code in the cloud
• At scale and at speed
• Distributing analytics and machine learning across instances
• Better integration and transition across:
• C# / Java / Python / Matlab / R
• Development / Testing / Production
• On-Premise / On-Site / Cloud
• Windows / Linux / MacOS
• Optimisation Solvers: Gurobi, IBM CPLEX
• Our own Virtual Machine image for QB Analytics Services
• More automation