DEV
OPS
Push the hassle from production to developers. Easily.
DevOpsDays Ghent 2016
October 27th
@MartinGoodwell
Dynatrace
About me
Passionate about life, technology, and the people behind both of them.
• Started with Commodore 8-bit (VC-20 and C-64)
• Built null-modem connections for playing Doom and WarCraft
• Built IPX/SPX networks between MS-DOS 5.0 and Windows 3.1
• Did DevOps before it was called that (mainly Java and Web) for about 10 years
• Now at Dynatrace Innovation Lab
• Tech Lead for Microsoft Technologies and Software Architecture
• Find me on Twitter: @MartinGoodwell
Agenda
• The Rules
• Warm-up
• The Ops dilemma (I call it that)
• The second Ops dilemma
• On Monitoring ...
• ... and Logging
• ... and Call Tracing
• ... and Databases
• Commercial offerings
The Rules
• Please, ask or interrupt anytime
• But keep ideas for open space discussions
• Or track me down anytime I'm around
Warm up
• What's your occupation?
• Dev, Ops, BinExec?
• What's your technology stack?
• Node.js
• Go
• Java
• .NET
• Which of you does
• Monitoring
• Logging
• Call-Tracing
• Application performance management/monitoring (APM)
The Ops dilemma (1)
Dev
• Single transaction
• Deal with a specific problem
• No impact on real users and business
• Concentrate on single component
• Deadlines refer to sprints
• Weeks, usually
Ops
• 100s or 1000s of transactions
• No idea what the cause is
• Real user impact
• Lots of moving parts
• Deadlines usually mean SLAs
• Hours, maybe just minutes
The Ops dilemma (2)
Automation
• Continuous {Integration/Deployment/Delivery} pipeline
• triggering unit tests for fast feedback
• Build servers
• Repositories
• Automatic deployments
• Helps devs get stuff into production
• Does nothing for the opposite direction
DevOps is about collaboration.
Collaboration requires documentation.
Automation is implicit documentation.
But there is no automation for
supporting Ops with troubleshooting.
Host metrics
• CPU usage
• Memory usage
• Disk I/O
• Network performance
• No insight into the app's problems and performance
statsd real quick
http://www.slideshare.net/DatadogSlides/dev-opsdays-tokyo2013effectivestatsdmonitoring
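To make "real quick" concrete: a statsd metric is a single line of text, `<name>:<value>|<type>`, sent over UDP (port 8125 by convention). The following is a minimal sketch of a hand-rolled client using only the Python standard library. The function names and the default host are assumptions for illustration, not part of any statsd client library.

```python
import socket


def format_metric(name: str, value: float, metric_type: str) -> bytes:
    """Build one statsd datagram: <name>:<value>|<type>.

    Common types are "c" (counter), "ms" (timer) and "g" (gauge).
    """
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP: the app does not wait for the statsd daemon."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

With these helpers, `send_metric(format_metric("checkout.requests", 1, "c"))` would count one request and `send_metric(format_metric("checkout.duration", 320, "ms"))` would record a 320 ms timing. The metric names are hypothetical.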
Downsides?
• "Polluting" business logic with monitoring code
• Code introspection (e.g. AOP) requires advanced skills
• Not using something like statsd leads to cluttered metrics
• Great for single-component insight
• What about called third parties?
• What about microservices (i.e. distributed transactions)?
• What about calls to databases, queues, etc.?
http://theburningmonk.com/2015/05/a-consistent-approach-to-track-correlation-ids-through-microservices/
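The idea behind correlation IDs can be sketched in a few lines: take the ID from the incoming request (or create one at the edge), attach it to every log line, and pass it on with every downstream call. This is a rough sketch using only the Python standard library. The header name `X-Correlation-Id` and all function names are assumptions for illustration and are not taken from the linked article.

```python
import contextvars
import logging
import uuid
from typing import Optional

# Assumed header name for this sketch; what matters is that every service uses the same one.
CORRELATION_HEADER = "X-Correlation-Id"

# Holds the ID of the request currently being handled ("-" outside any request).
_correlation_id = contextvars.ContextVar("correlation_id", default="-")


def begin_request(incoming_headers: dict) -> str:
    """Reuse the caller's ID if present, otherwise start a new one.

    Uses a plain dict lookup; real HTTP frameworks offer case-insensitive headers.
    """
    cid = incoming_headers.get(CORRELATION_HEADER) or uuid.uuid4().hex
    _correlation_id.set(cid)
    return cid


def outgoing_headers(extra: Optional[dict] = None) -> dict:
    """Headers to attach to every downstream call made for this request."""
    headers = dict(extra or {})
    headers[CORRELATION_HEADER] = _correlation_id.get()
    return headers


class CorrelationFilter(logging.Filter):
    """Stamp every log record with the current correlation ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = _correlation_id.get()
        return True
```

Attaching `CorrelationFilter` to a handler makes `%(correlation_id)s` available in the log format, so Ops can pull all log lines for one user request across services with a single search.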
Logging learnings
• Use a logging server (e.g. the ELK stack)
• directly log as JSON
• at least store as JSON
• Using logging for monitoring is expensive
• log analysis is a real resource hog
• works great for troubleshooting
• works great with limited problem scope
• For Java, use Logback via SLF4J
• to local logfiles
• to Logstash
• to syslog
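"Directly log as JSON" could look like the sketch below. It uses Python's standard `logging` module as a stand-in for Logback, since the principle is the same in any stack: one JSON object per line, so the logging server does not have to parse free text. The field names and the `build_logger` helper are assumptions for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the stack trace inside the same JSON document.
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate plain-text output via the root logger
    return logger
```

A call such as `build_logger("orders").info("order %s placed", 42)` then emits a single line that a log shipper can forward without any grok-style parsing rules.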
Google Dapper paper
• The Dapper paper (2010)
http://research.google.com/pubs/archive/36356.pdf
• OpenTracing
http://opentracing.io/documentation/
• OpenZipkin (by Twitter)
• http://zipkin.io/
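The core data model from the Dapper paper is small: every unit of work is a span, all spans of one request share a trace ID, and each span records its parent's span ID so the call tree can be rebuilt later. The sketch below illustrates just that model in plain Python. It is not the OpenTracing or Zipkin API, and the class and field names are assumptions.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


def _new_id() -> str:
    """Random 64-bit identifier rendered as 16 hex characters."""
    return uuid.uuid4().hex[:16]


@dataclass
class Span:
    name: str
    trace_id: str = field(default_factory=_new_id)  # shared by all spans of one request
    span_id: str = field(default_factory=_new_id)   # unique per unit of work
    parent_id: Optional[str] = None                 # None marks the root span
    start: float = field(default_factory=time.time)
    end: Optional[float] = None

    def child(self, name: str) -> "Span":
        """Start a span for work triggered by this one (e.g. a remote call)."""
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

    def finish(self) -> None:
        """Record the end time; a real tracer would now report the span."""
        self.end = time.time()
```

For a call to another service, the trace ID and the calling span's ID travel with the request so that the callee can create its spans as children.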
https://github.com/spring-cloud/spring-cloud-sleuth
Spring Cloud Sleuth is a distributed tracing solution on top of Spring Cloud
Getting database insight
• Database automation
• e.g. DbMaintain
• https://dbmaintain.github.io/
• Database performance logging
• log4jdbc
• https://github.com/arthurblake/log4jdbc
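The principle behind database performance logging is to put a thin wrapper between the application and the database driver that logs each statement with its execution time. The sketch below shows that idea with Python's built-in `sqlite3` module. It is not log4jdbc itself, and the `TimedConnection` class is a hypothetical name used only for illustration.

```python
import logging
import sqlite3
import time

log = logging.getLogger("sql.timing")


class TimedConnection:
    """Wrap a DB connection and record how long each statement takes."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        self.timings = []  # list of (sql, elapsed_ms) tuples

    def execute(self, sql: str, params=()):
        started = time.perf_counter()
        try:
            return self._conn.execute(sql, params)
        finally:
            # Measures execute() only; fetching rows from the cursor is not included.
            elapsed_ms = (time.perf_counter() - started) * 1000
            self.timings.append((sql, elapsed_ms))
            log.info("%.2f ms  %s", elapsed_ms, sql)
```

Because the wrapper sits at the driver boundary, the business code stays free of timing logic, which addresses the "polluting" concern raised earlier in the deck.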
Let's talk
• From monolith to microservice
• Cloud migration
• Performance optimization
• Team culture