trainline-engineering
Some vitals
• ~40 Environments
• over 1000 servers
• over 100 products
• Windows/.NET
• New Relic .NET agent / Server Monitor
• Automation is key!
Before New Relic
• Application errors logged to disk
• Production support team look at logs
– After production issue identified from customer
reports
– After platform release to check change in patterns
• Ad-hoc and reactive
• Errors difficult to reproduce as usually investigated
hours/days after the event and out of context
Introducing New Relic at thetrainline
• Zero capital outlay, subscription model, up and
running in an hour
• Identified a product: leisure website
• Continuous delivery pipeline with blue/green
deployments to all environments
• Needed solution for continuous monitoring
Introducing New Relic at thetrainline
• New Relic agent / server monitor part of
webserver recipe
• Deployed with high security enabled
• Out of the box
– Near-real time error logging / alerting
– Application / end-user performance
– Deployment markers
– User funnels
Immediate value
• Error rate as a team key performance
indicator
• Drive down error rate through weekly health
checks
• Remediate top three errors by adding directly
to dev team backlog
• Stack traces visible and actionable by
developers without further analysis
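The weekly health check above could be sketched roughly as follows. This is only an illustration: it assumes error events have already been exported as records with a hypothetical `error_class` field, and the function names are made up for the example rather than taken from any New Relic API.

```python
from collections import Counter


def error_rate(error_count, request_count):
    """Errors as a fraction of requests; 0.0 when there was no traffic."""
    if request_count == 0:
        return 0.0
    return error_count / request_count


def top_errors(events, n=3):
    """Return the n most frequent error classes with their counts.

    `events` is an iterable of dicts with a hypothetical "error_class" key.
    """
    counts = Counter(event["error_class"] for event in events)
    return counts.most_common(n)
```

The list returned by `top_errors` is the kind of shortlist a team might copy into its backlog after each health check.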
Taking it further
• Roll out New Relic across all machines in all
environments
– New machines created by Chef automation install
New Relic by default
– Else use SCCM to manage installation
Application/server monitoring built in and
zero effort for dev teams
Taking it further
Custom attributes
• Mimic high security mode in newrelic.config
– Create and deploy Chocolatey package through Chef /
SCCM
• Observations:
– New Relic .NET agent doesn’t check in to verify
highSecurity setting matches once it has started
<highSecurity enabled="true" />
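Because the agent does not re-verify the setting once it has started, one option is to check the deployed file as part of the rollout. The sketch below is an assumption about how such a check might look, not something described in the slides: it parses the text of a newrelic.config file and reports whether a `highSecurity` element has `enabled="true"`. The function name is hypothetical, and the check ignores XML namespaces so it works whether or not the file declares one.

```python
import xml.etree.ElementTree as ET


def high_security_enabled(config_xml):
    """Return True if the config text contains <highSecurity enabled="true" />."""
    root = ET.fromstring(config_xml)
    for element in root.iter():
        # Strip any XML namespace prefix so the tag name can be compared directly.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "highSecurity":
            return element.get("enabled", "").strip().lower() == "true"
    # No highSecurity element at all is treated as "not enabled".
    return False
```

A deployment step could read the file, call `high_security_enabled`, and fail the run when it returns `False`.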
More value…
• Use custom attributes to augment Transaction
and PageView events with more information
to form other business metrics.
• Phoenix’s real-time payments dashboard
– Spread of payment methods
– Effect of payment outages
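As a rough illustration of the "spread of payment methods" metric, the sketch below assumes Transaction events have been exported as dictionaries carrying a hypothetical custom attribute named `paymentMethod`. Both the attribute name and the function are invented for the example; the slides do not say which attributes were actually recorded.

```python
from collections import Counter


def payment_method_spread(events, attribute="paymentMethod"):
    """Return each payment method's share of the events that carry the attribute."""
    methods = [event[attribute] for event in events if attribute in event]
    total = len(methods)
    if total == 0:
        return {}
    return {method: count / total for method, count in Counter(methods).items()}
```

Comparing these shares over successive time windows is one way a sudden drop for a single method, such as during a payment outage, could show up.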
Users of New Relic at thetrainline
• Monitoring/Production Support for near
real time running health of system
• Product owners home in and use funnels to
prioritise product spend
• Developers get rapid feedback on new
features
• Management get a holistic view of the
system through the map feature
What we’d like to see
• JavaScript errors in Insights
• Better JavaScript stack traces
• Per application retention period in Insights
• .NET async support