How monitoring can improve the rest of the company
Monitorama EU 2013 @jeff_weinstein
I real-time and batch data analytics
Monitoring can wildly improve the whole company by
sharing data and sharing techniques.
Monitoring Folks
Developers
Business Analysts
Executives & Product
Data Scientists
Data
Apps & Services & Systems
Users
Data
Code & Config
Monitoring
Some problems…
Data Processing
Apps
Systems Logs / Events
Metrics Graphs & Alerts Apps
3rd Party Reports & Queries ETL Analytic
Systems
Monitoring: Streaming
BI: Batch
Data Needs
Logs Metrics Logs Metrics
Streaming Batch
Data
Monitoring
BI
Data Tools Stack
Monitoring
• Ad hoc – sed, grep, awk – ES, LogStash, Splunk, …
• Storage – Hosts, Ganglia, OTSDB – Central syslog server
• Visualization/Reporting – Graphite, RRDTool, 3rd party – Homegrown
• Alerting/Escalation – Nagios, Sensu, PagerDuty, …
Rest of company
• Ad hoc – Excel, SQL, Hive – MapReduce, …
• Storage – Lots o’ databases, Excel – Hadoop, RDBMS…
• Visualization/Reporting – Excel, R, Tableau ... – Dinosaur apps, …
• Alerting/Escalation – nada
Metrics
Views
• Unintelligible generated views
• Too granular for long-term trends
• Lack of historical
• Intolerant to anomalies
Team and incentives
• What team? • Change vs. reliability • Planning • Budget • Churn
Good or bad?
• Specific Tools • Decentralized • Focus • Ownership
• Lost context • Siloed work • Data dark • Misunderstanding
Some fixes
End to End Data Pipeline
✓ Structured logs ✓ (Config) ✓ Measure once ✓ Automatic metrics ✓ API ✓ Graph tools ✓ Glossary ✓ Annotations and tags ✓ Pipeline
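One way to read "measure once" and "automatic metrics" is that a value is logged a single time as a structured event, and metrics are then derived from the log stream instead of being instrumented separately. The sketch below illustrates that reading only; the function name `metrics_from_logs` and the `latency_ms` field are hypothetical and do not come from the talk.

```python
import json
from collections import defaultdict


def metrics_from_logs(lines):
    """Derive count and mean for every numeric field found in JSON log lines."""
    stats = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for line in lines:
        record = json.loads(line)
        for key, value in record.items():
            # Skip non-numeric fields; bool is a subclass of int, so exclude it too.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            stats[key]["count"] += 1
            stats[key]["sum"] += value
    return {
        key: {"count": s["count"], "mean": s["sum"] / s["count"]}
        for key, s in stats.items()
    }


# Hypothetical log lines: the latency is measured once, in the log.
log_lines = [
    '{"event": "checkout", "latency_ms": 100}',
    '{"event": "checkout", "latency_ms": 300}',
]
print(metrics_from_logs(log_lines))
```

A real pipeline would do this on a stream and per time window; the point here is only that the metric needs no second measurement in the application code.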
Structured events
• JSON (or whatever) • (optional) config • Tags per key – Type – Tag: latency, funnel,… – Description – Storage
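A minimal sketch of what a structured event with per-key config could look like. The slide names only the four attributes (type, tag, description, storage); the key names, the config layout, and the `make_event` helper below are assumptions made for illustration.

```python
import json
import time

# Hypothetical per-key config: type, tag, description, and storage hint.
EVENT_CONFIG = {
    "checkout_latency_ms": {
        "type": "float",
        "tag": "latency",
        "description": "Time to complete the checkout request, in milliseconds",
        "storage": "metrics",
    },
    "signup_step": {
        "type": "str",
        "tag": "funnel",
        "description": "Name of the signup funnel step the user reached",
        "storage": "batch",
    },
}

_PY_TYPES = {"float": float, "int": int, "str": str}


def make_event(name, fields, now=None):
    """Check fields against EVENT_CONFIG and return one JSON log line."""
    for key, value in fields.items():
        spec = EVENT_CONFIG.get(key)
        if spec is None:
            raise KeyError(f"unconfigured key: {key}")
        if not isinstance(value, _PY_TYPES[spec["type"]]):
            raise TypeError(f"{key} should be of type {spec['type']}")
    timestamp = time.time() if now is None else now
    return json.dumps({"event": name, "ts": timestamp, **fields}, sort_keys=True)


print(make_event("checkout", {"checkout_latency_ms": 120.5}))
```

Rejecting unconfigured keys is one possible design choice: it keeps every logged key documented, at the cost of requiring a config change before a new field can be emitted.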
Auto: Graphs, Glossary, & Storage
• Graphs and dashboards • * templates • Views and stats • Glossary • Batch analytics • Long term storage
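If each logged key carries a type, tag, and description, a glossary and a first cut of dashboard groupings can be generated rather than written by hand. The sketch below assumes a hypothetical config dictionary of that shape; the function names and keys are illustrative, not the speaker's tooling.

```python
# Hypothetical per-key config, in the shape the "Structured events" slide lists.
CONFIG = {
    "checkout_latency_ms": {
        "type": "float",
        "tag": "latency",
        "description": "Time to complete the checkout request, in milliseconds",
    },
    "search_latency_ms": {
        "type": "float",
        "tag": "latency",
        "description": "Time to answer a search query, in milliseconds",
    },
    "signup_step": {
        "type": "str",
        "tag": "funnel",
        "description": "Name of the signup funnel step the user reached",
    },
}


def build_glossary(config):
    """Return one glossary line per configured key, sorted by key name."""
    return [
        f"{key} ({config[key]['type']}, {config[key]['tag']}): "
        f"{config[key]['description']}"
        for key in sorted(config)
    ]


def group_by_tag(config):
    """Group keys by tag, e.g. to place all latency keys on one dashboard."""
    groups = {}
    for key in sorted(config):
        groups.setdefault(config[key]["tag"], []).append(key)
    return groups


for entry in build_glossary(CONFIG):
    print(entry)
print(group_by_tag(CONFIG))
```

The same config could also drive storage routing and batch jobs, which is how the slide's remaining items would fit this pattern.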
build learn communicate inspire
Developers
• Logging toolkit • Data pipeline
• Pain points • Outage causes
• Deployment practices • Escalation playbook
• Measurement as TDD • Monitor staging env
Business Analysts
• Structured logs • Config for ETL
• Metrics definitions • Slices and visualizations
• Data size and cardinality • Outages and delays
• Flexibility • Visualization and tools
Data Scientists
• Access to (meta)data • Query monitoring
• Statistics and models • New data streams
• Context of data issues • What’s in the logs
• Validate algorithms • Teach stats and models!
Product & Executives
• Curated dashboards • Graph/alert tools
• Learn the business • Prioritize alerts by $
• Incident post mortems • Metrics granularity
• Data driven decisions • Recognize and celebrate
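"Prioritize alerts by $" can be sketched as ranking open alerts by an estimated money impact. The impact model below (affected requests per minute times revenue per request) is an assumption chosen for simplicity, and the `Alert` fields and sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    affected_requests_per_min: float
    revenue_per_request: float  # assumed to be supplied by the business side

    @property
    def dollars_per_min(self):
        """Rough estimate of revenue at risk per minute while the alert is open."""
        return self.affected_requests_per_min * self.revenue_per_request


def prioritize(alerts):
    """Return alerts ordered from highest to lowest estimated impact."""
    return sorted(alerts, key=lambda alert: alert.dollars_per_min, reverse=True)


# Made-up example values, for illustration only.
open_alerts = [
    Alert("search latency high", affected_requests_per_min=1000, revenue_per_request=0.01),
    Alert("checkout errors", affected_requests_per_min=50, revenue_per_request=2.0),
]
for alert in prioritize(open_alerts):
    print(alert.name, alert.dollars_per_min)
```

The noisier alert is not necessarily the costlier one, which is the kind of context product and executive teams can contribute.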
Monitoring can become the data platform and improve all teams
with its techniques.
Icons from The Noun Project: Dmitry Baranovskiy, Benjamin Orlovski, Luis Prado, MikaDo Nguyen, Yarden Gilboa, Javier Cabezas, Icons Pusher, Jeremy Bristol, Blake Thomas, Ritika Khasgiwale, Mayene de Leon, Yorlmar Campos, Sergey Shmid
@jeff_weinstein
Thanks! hiring ;)