How monitoring can improve the rest of the company
Monitorama EU 2013, @jeff_weinstein

I real-time and batch data analytics

Monitoring can wildly improve the whole company by sharing data and sharing techniques.

Diagram: Data, with Monitoring Folks, Developers, Business Analysts, Executives & Product, and Data Scientists.

Diagram: Apps & Services & Systems, Users, Data, Code & Config, and Monitoring.

Some problems…

Data Processing

Monitoring (streaming): Apps, Systems, Logs / Events, Metrics, Graphs & Alerts
BI (batch): Apps, 3rd Party, ETL, Analytic Systems, Reports & Queries

Data Needs

Monitoring: Logs and Metrics, streaming
BI: Logs and Metrics, batch
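
One way to read this slide: both sides consume the same logs and metrics and differ mainly in when the computation runs. A minimal Python sketch, assuming events are dicts with a hypothetical `latency_ms` field: the streaming side updates a running aggregate per event, the batch side recomputes over the stored history, and the two agree.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Streaming side: updated once per event, constant memory."""
    count: int = 0
    total: float = 0.0

    def update(self, value: float) -> None:
        self.count += 1
        self.total += value

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def batch_mean(values: list[float]) -> float:
    """Batch side: recomputed over the full stored history."""
    return sum(values) / len(values) if values else 0.0


# Hypothetical events; `latency_ms` is an illustrative field name.
events = [{"latency_ms": 120.0}, {"latency_ms": 80.0}, {"latency_ms": 100.0}]

stream = RunningStats()
stored: list[float] = []
for event in events:
    stream.update(event["latency_ms"])  # real-time path (monitoring)
    stored.append(event["latency_ms"])  # archived for the batch path (BI)

print(stream.mean, batch_mean(stored))  # 100.0 100.0
```

The point of the sketch is that one event stream can serve both needs if it is kept rather than discarded after alerting.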

Data Tools Stack

Monitoring
• Ad hoc
  – sed, grep, awk
  – ES, LogStash, Splunk, …
• Storage
  – Hosts, Ganglia, OTSDB
  – Central syslog server
• Visualization/Reporting
  – Graphite, RRDTool, 3rd party
  – Homegrown
• Alerting/Escalation
  – Nagios, Sensu, PagerDuty, …

Rest of company
• Ad hoc
  – Excel, SQL, Hive
  – MapReduce, …
• Storage
  – Lots o’ databases, Excel
  – Hadoop, RDBMS, …
• Visualization/Reporting
  – Excel, R, Tableau, …
  – Dinosaur apps, …
• Alerting/Escalation
  – nada

Metrics  

Views

• Unintelligible generated views
• Too granular for long-term trends
• Lack of historical
• Intolerant to anomalies
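
Two of these complaints have common remedies: roll fine-grained points up into coarser buckets for long-term trends, and summarize each bucket with a robust statistic such as the median so a single spike does not dominate. A minimal sketch, assuming points are `(unix_timestamp, value)` pairs and hourly buckets; neither choice comes from the talk.

```python
from collections import defaultdict
from statistics import median


def rollup(points, bucket_seconds=3600):
    """Group (timestamp, value) points into fixed-width buckets and return
    {bucket_start: (mean, median)}, ordered by bucket start."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return {
        start: (sum(vals) / len(vals), median(vals))
        for start, vals in sorted(buckets.items())
    }


# One hour of minute-level points with a single anomalous spike.
points = [(60 * i, 10.0) for i in range(60)]
points[30] = (1800, 1000.0)

for start, (avg, med) in rollup(points).items():
    print(start, round(avg, 1), med)  # 0 26.5 10.0
```

The mean is pulled to 26.5 by one bad minute while the median stays at 10.0, which is the kind of anomaly tolerance a long-term view usually wants.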

Team and incentives

• What team?
• Change vs. reliability
• Planning
• Budget
• Churn

Good or bad?

• Specific Tools
• Decentralized
• Focus
• Ownership

• Lost context
• Siloed work
• Data dark
• Misunderstanding

Some fixes

End-to-End Data Pipeline

✓ Structured logs
✓ (Config)
✓ Measure once
✓ Automatic metrics
✓ API
✓ Graph tools
✓ Glossary
✓ Annotations and tags
✓ Pipeline
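
A minimal sketch of "measure once" plus "automatic metrics", assuming each log line is a JSON object with hypothetical `event` and `latency_ms` keys: every line is parsed exactly once, and per-event counts and numeric series fall out of the same pass, so monitoring and batch consumers read the same numbers.

```python
import json
from collections import Counter, defaultdict

# Hypothetical structured log lines; the field names are illustrative only.
LOG_LINES = [
    '{"event": "checkout", "latency_ms": 210}',
    '{"event": "search", "latency_ms": 35}',
    '{"event": "checkout", "latency_ms": 190}',
]


def derive_metrics(lines):
    """Parse each line once; derive counts and numeric series with no
    per-metric code."""
    counts = Counter()
    series = defaultdict(list)
    for line in lines:
        event = json.loads(line)          # the single parse ("measure once")
        name = event["event"]
        counts[name] += 1                 # automatic per-event count
        for key, value in event.items():  # every numeric field becomes a series
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                series[f"{name}.{key}"].append(value)
    return counts, dict(series)


counts, series = derive_metrics(LOG_LINES)
print(counts["checkout"], series["checkout.latency_ms"])  # 2 [210, 190]
```

Adding a new numeric field to the log line is enough to get a new series; nobody has to write a metric by hand.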

Structured events

• JSON (or whatever)
• (optional) config
• Tags per key
  – Type
  – Tag: latency, funnel, …
  – Description
  – Storage
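
A minimal sketch of the optional per-key config, assuming a Python dict stands in for the config file. The fields `type`, `tag`, `description`, and `storage` mirror the slide's list, but the concrete keys, values, and the `validate` helper are hypothetical.

```python
import json

# Hypothetical per-key config: one entry per field an event may carry.
CONFIG = {
    "latency_ms": {"type": int, "tag": "latency",
                   "description": "Server-side request time in ms",
                   "storage": "metrics"},
    "step": {"type": str, "tag": "funnel",
             "description": "Checkout funnel step name",
             "storage": "warehouse"},
}


def validate(raw: str) -> list[str]:
    """Return the problems found in one JSON event (empty list if none)."""
    event = json.loads(raw)
    problems = []
    for key, value in event.items():
        spec = CONFIG.get(key)
        if spec is None:
            problems.append(f"{key}: not in config")
        elif not isinstance(value, spec["type"]):
            problems.append(f"{key}: expected {spec['type'].__name__}")
    return problems


print(validate('{"latency_ms": 42, "step": "payment"}'))  # []
print(validate('{"latency_ms": "slow", "user": 7}'))
# ['latency_ms: expected int', 'user: not in config']
```

Keeping the config next to the event definition lets the same file drive validation, tagging, and where each key is stored.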

Auto: Graphs, Glossary, & Storage

• Graphs and dashboards
• * templates
• Views and stats
• Glossary
• Batch analytics
• Long-term storage
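
Once each key carries a tag and a description, a glossary and a first set of dashboard panels can be generated instead of hand-built. A minimal sketch, assuming a hypothetical config shape and a hypothetical template per tag (chart kind plus summary statistic); the panel dicts are not any particular dashboard tool's format.

```python
# Hypothetical per-key config: each key carries a tag and a description.
CONFIG = {
    "checkout.latency_ms": {"tag": "latency",
                            "description": "Checkout request time in ms"},
    "checkout.step": {"tag": "funnel",
                      "description": "Checkout funnel step reached"},
}

# Assumed dashboard template per tag (chart kind and summary statistic).
TEMPLATES = {
    "latency": {"chart": "line", "stat": "p95"},
    "funnel": {"chart": "bar", "stat": "count"},
}


def glossary(config):
    """One 'key: description' entry per key, sorted by key."""
    return [f"{key}: {spec['description']}" for key, spec in sorted(config.items())]


def dashboard(config):
    """One panel per key; keys whose tag has no template are skipped."""
    return [
        {"title": spec["description"], "metric": key, **TEMPLATES[spec["tag"]]}
        for key, spec in sorted(config.items())
        if spec["tag"] in TEMPLATES
    ]


for entry in glossary(CONFIG):
    print(entry)
# checkout.latency_ms: Checkout request time in ms
# checkout.step: Checkout funnel step reached

print([panel["chart"] for panel in dashboard(CONFIG)])  # ['line', 'bar']
```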

build, learn, communicate, inspire

Developers

• Logging toolkit
• Data pipeline

• Pain points
• Outage causes

• Deployment practices
• Escalation playbook

• Measurement as TDD
• Monitor staging env
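
A minimal sketch of "measurement as TDD", assuming a hypothetical in-process `Metrics` recorder and a hypothetical `checkout` function: the tests assert that metrics are emitted, so instrumentation is specified up front like any other behavior.

```python
import unittest
from collections import Counter


class Metrics:
    """Hypothetical in-memory recorder standing in for a real metrics client."""

    def __init__(self) -> None:
        self.counters: Counter = Counter()

    def incr(self, name: str, value: int = 1) -> None:
        self.counters[name] += value


def checkout(cart: list[float], metrics: Metrics) -> float:
    """Hypothetical code under test: totals a cart and records what happened."""
    metrics.incr("checkout.completed")
    if not cart:
        metrics.incr("checkout.empty_cart")
    return sum(cart)


class CheckoutMeasurementTest(unittest.TestCase):
    """Written first: these fail until checkout() emits the metrics."""

    def test_emits_completed_metric(self) -> None:
        metrics = Metrics()
        self.assertEqual(checkout([9.5, 0.5], metrics), 10.0)
        self.assertEqual(metrics.counters["checkout.completed"], 1)
        self.assertEqual(metrics.counters["checkout.empty_cart"], 0)

    def test_flags_empty_cart(self) -> None:
        metrics = Metrics()
        checkout([], metrics)
        self.assertEqual(metrics.counters["checkout.empty_cart"], 1)
```

Run it with `python -m unittest` pointed at the file. Running the same checks against a staging environment is one way to cover the "monitor staging env" item.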

Business Analysts

• Structured logs
• Config for ETL

• Metrics definitions
• Slices and visualizations

• Data size and cardinality
• Outages and delays

• Flexibility
• Visualization and tools
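
"Data size and cardinality" can be measured directly from the events. A minimal sketch, assuming events are flat dicts with hashable values and hypothetical field names: per key, count how many events carry it and how many distinct values it takes. High-cardinality keys such as user IDs tend to be expensive as metric tags and better suited to batch storage.

```python
from collections import defaultdict


def key_profile(events):
    """Return {key: (events_with_key, distinct_values)} for flat dict events.
    Values must be hashable."""
    seen = defaultdict(int)
    distinct = defaultdict(set)
    for event in events:
        for key, value in event.items():
            seen[key] += 1
            distinct[key].add(value)
    return {key: (seen[key], len(distinct[key])) for key in seen}


# Hypothetical events; "user_id" stands in for a high-cardinality field.
events = [
    {"country": "DE", "user_id": 101},
    {"country": "DE", "user_id": 102},
    {"country": "FR", "user_id": 103},
]
print(key_profile(events))  # {'country': (3, 2), 'user_id': (3, 3)}
```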

Data Scientists

• Access to (meta)data
• Query monitoring

• Statistics and models
• New data streams

• Context of data issues
• What’s in the logs

• Validate algorithms
• Teach stats and models!
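
As one illustration of a stats technique worth teaching across teams (chosen here for illustration, not named on the slide): flag a point when it sits more than a threshold number of standard deviations from the mean of a trailing window. A minimal sketch; the window size and threshold are arbitrary assumptions.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value is more than `threshold` sample standard
    deviations from the mean of the preceding `window` values.
    `window` must be at least 2 so the standard deviation is defined."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


series = [10, 11, 9, 10, 12, 10, 11, 50, 10, 9]
print(zscore_anomalies(series))  # [7]
```

One known weakness: after a spike, the trailing window's standard deviation is inflated, so the check is less sensitive for the next `window` points. That trade-off is worth explaining when teaching it.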

Product & Executives

• Curated dashboards
• Graph/alert tools

• Learn the business
• Prioritize alerts by $

• Incident post-mortems
• Metrics granularity

• Data-driven decisions
• Recognize and celebrate
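
A minimal sketch of "prioritize alerts by $", assuming each alert can be given a rough revenue-per-minute estimate for the flow it affects; the `Alert` fields and all numbers are hypothetical. Sorting by estimated dollars at risk turns a wall of open alerts into an ordered work queue.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    revenue_per_minute: float  # hypothetical estimate for the affected flow
    fraction_impacted: float   # share of that flow currently failing, 0..1
    minutes_open: float

    @property
    def dollars_at_risk(self) -> float:
        return self.revenue_per_minute * self.fraction_impacted * self.minutes_open


alerts = [
    Alert("search.latency", revenue_per_minute=50, fraction_impacted=0.5, minutes_open=30),
    Alert("checkout.errors", revenue_per_minute=400, fraction_impacted=0.25, minutes_open=10),
    Alert("batch.report_delay", revenue_per_minute=0, fraction_impacted=1.0, minutes_open=120),
]

# Highest estimated impact first.
for alert in sorted(alerts, key=lambda a: a.dollars_at_risk, reverse=True):
    print(f"{alert.name}: ${alert.dollars_at_risk:,.0f}")
# checkout.errors: $1,000
# search.latency: $750
# batch.report_delay: $0
```

A $0 estimate does not mean an alert is unimportant (a delayed report can still block a decision); the dollar figure is one input to priority, not the whole answer.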

Monitoring can become the data platform and improve all teams with its techniques.

Icons from The Noun Project: Dmitry Baranovskiy, Benjamin Orlovski, Luis Prado, MikaDo Nguyen, Yarden Gilboa, Javier Cabezas, Icons Pusher, Jeremy Bristol, Blake Thomas, Ritika Khasgiwale, Mayene de Leon, Yorlmar Campos, Sergey Shmid

@jeff_weinstein  

Thanks! We're hiring ;)