Data Sheet
Elik Eizenberg, CTO and Co-Founder
Using BigPanda's Features for Root Cause Analysis
Overview
The ever-growing complexity, scale and pace of change of IT environments put a huge burden on IT Ops,
NOC, and DevOps teams, who are tasked with keeping these environments up and running. One of the
biggest challenges is Root Cause Analysis (RCA). When something breaks, we need to determine what
broke it, and we need to do it fast.
Unfortunately, there is no silver bullet for root cause analysis: there is simply no one technique that teams
can master to address all types of incidents and outages. Modern IT environments are just too chaotic to
allow for it. IT professionals should be wary when vendors pitch a single, miraculous technique for RCA.
BigPanda’s philosophy is different. We encourage operations teams to embrace the chaos of modern IT
environments, and design an RCA strategy that relies on a range of techniques. Each technique will be
useful in different contexts, and sometimes multiple techniques will be needed. This data sheet outlines
five BigPanda features that assist in Root Cause Analysis.
BigPanda’s Top-5 Root Cause Analysis Features
Incident Timeline
IT incidents usually materialize in the form of many symptoms across your monitoring systems. There’s
often a very distinct order of events. For example, load issues on a database are followed by transaction
timeouts in your logs, which are then followed by application-level latency alerts.
BigPanda’s Incident Timeline was designed to help you easily make sense of this “cascading” effect, see
how an incident evolved over time, and determine the cause-and-effect relationships between alerts.
Root Cause Changes
Around 85% of incidents are caused by changes.
In other words, in the vast majority of cases, the
root cause of an incident is a change in the IT
environment.
BigPanda aggregates change data from all your
change feeds and tools, including CI/CD, Change
Management and Auditing. It then uses Open-Box
Machine Learning technology to identify the
changes that likely caused the incident.
Note that correlation does not imply causation,
which is why we’ve developed a proprietary Causal
Inference algorithm that can distinguish between
the two cases.
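As a rough illustration of the ideas above, the following Python sketch orders alerts into a timeline and then ranks recent changes as candidate root causes. It is not BigPanda's algorithm: the `Alert` and `Change` records, the `build_timeline` and `rank_changes` functions, and the 60-minute look-back window are all hypothetical, and the ranking is a simple time-proximity heuristic that makes no attempt at causal inference.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical alert record: when it fired, where, and what it said.
    time: datetime
    host: str
    message: str


@dataclass
class Change:
    # Hypothetical change record, e.g. from a CI/CD or change management feed.
    time: datetime
    host: str
    description: str


def build_timeline(alerts):
    """Return alerts ordered from earliest to latest."""
    return sorted(alerts, key=lambda a: a.time)


def rank_changes(alerts, changes, window=timedelta(minutes=60)):
    """Rank changes made shortly before the first alert of an incident.

    Changes on a host that also raised an alert come first; ties are
    broken by how close the change was to the start of the incident.
    """
    timeline = build_timeline(alerts)
    if not timeline:
        return []
    start = timeline[0].time
    affected = {a.host for a in timeline}
    # Only changes inside the look-back window are candidates.
    candidates = [c for c in changes if start - window <= c.time <= start]
    return sorted(
        candidates,
        key=lambda c: (c.host not in affected, start - c.time),
    )
```

Given a database load alert followed by application alerts, a schema migration on the database host twenty minutes earlier would rank above an unrelated load balancer change five minutes earlier, because the database host is part of the incident.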
Dynamic Smart Titles
BigPanda’s Open-Box Machine Learning engine
automatically correlates related alerts into high-level
incidents. It then verbalizes the relationship
between these alerts, to help identify the common
denominator.
For example, if ten VMs experience load issues
because of a problem with one underlying physical
server, BigPanda will automatically display the
problematic physical server's name as the title of the
incident. Or if twenty devices report connectivity
issues due to one faulty upstream network switch,
BigPanda will display the switch's name as the title.
What makes this feature even more powerful is that the incident title is updated in real time as new
alerts are added to the incident. This ensures operators have ongoing situational awareness.
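The "common denominator" idea can be sketched in a few lines of Python. This is an illustrative heuristic only, not BigPanda's engine: the `smart_title` function, the tag names (`host`, `physical_server`, `check`) and the choice to ignore the `check` tag are assumptions made for the example. Each alert is modeled as a dictionary of tags, and the title is the tag value shared by the most alerts.

```python
from collections import Counter


def smart_title(alerts, ignore=("check",)):
    """Build an incident title from the tag value shared by the most alerts.

    `alerts` is a list of dicts mapping tag names to values. Tags listed
    in `ignore` (hypothetical default: the check name) are not considered.
    """
    counts = Counter(
        (key, value)
        for alert in alerts
        for key, value in alert.items()
        if key not in ignore
    )
    if not counts:
        return "Untitled incident"
    (key, value), shared = counts.most_common(1)[0]
    return f"{key}={value} ({shared} of {len(alerts)} alerts)"
```

Because the title is recomputed from the full alert list, calling `smart_title` again whenever a new alert joins the incident keeps the title current.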
Incident Topology
Few things are as difficult as navigating the dependencies
of a modern application, which is composed of
microservices, databases, queues, storage, cloud and
on-prem servers, network devices, etc.
BigPanda makes it easy to understand these
dependencies in the context of a specific incident.
The Topology tab visualizes all the affected components
and their respective relationships.
While other vendors provide topology views, BigPanda
is unique in that it pieces together the topology by
merging information from configuration management
platforms, orchestration tools, APMs and CMDBs into a real-time topology mesh. This gives you unparalleled
visibility across the stack, encompassing application architectures, cloud environments and physical networks.
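One way to picture merging topology from several sources is as a union of dependency edges, filtered down to the components involved in an incident. The Python sketch below is an assumption-laden illustration, not a description of BigPanda's implementation: `merge_topology`, `incident_subgraph` and the `(parent, child)` edge format are hypothetical.

```python
from collections import defaultdict


def merge_topology(*sources):
    """Merge (parent, child) dependency edges from several sources into one graph.

    Each source is an iterable of edges, e.g. exported from a CMDB,
    an orchestration tool or an APM. Duplicate edges collapse naturally.
    """
    graph = defaultdict(set)
    for edges in sources:
        for parent, child in edges:
            graph[parent].add(child)
    return graph


def incident_subgraph(graph, affected):
    """Keep only the edges whose two endpoints are both affected by the incident."""
    affected = set(affected)
    return {
        parent: sorted(children & affected)
        for parent, children in graph.items()
        if parent in affected and children & affected
    }
```

For example, if a CMDB knows which switch a physical server hangs off, and an orchestration tool knows which VMs run on that server, the merged graph links the switch to the VMs even though no single source held the full chain.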
Deep Links
As previously mentioned, there is no silver bullet for Root Cause Analysis. While BigPanda offers users
several powerful RCA features, we also acknowledge that there’s a vast number of RCA techniques offered
by other vendors. Therefore, BigPanda enables operators to dive quickly into any other tool that
might provide additional RCA insight.
The Deep Links feature turns BigPanda into an intelligent gateway for operational context. The
relevant dashboard in a monitoring tool, a related search in a log management tool, or an
appropriate runbook article in a knowledge base: all of these are always just a click away in BigPanda.
Summary
Resolving incidents quickly and reducing MTTR requires that you detect incidents as close as possible to
real time, investigate them and isolate their root cause quickly, and then kick off remediation/orchestration
actions to resolve them. BigPanda provides a suite of root cause analysis features that allow IT Ops, NOC
and DevOps teams to develop an RCA strategy based on their needs and the different contexts in which
they operate.
www.bigpanda.io (650) 562-6555 [email protected]
About BigPanda
BigPanda helps IT Ops, NOC and DevOps teams detect, investigate, and resolve IT incidents and outages,
faster and more easily than ever before.
Powered by Open-Box Machine Learning, BigPanda captures alerts, changes and topology data from all
your disparate tools and uses machine learning to reduce IT noise, detect incidents and outages, and
surface their probable root cause, in real time.
Customers such as Intel, TiVo, Turner Broadcasting and Workday rely on BigPanda to reduce their operating
costs, improve service availability and performance, and de-risk and accelerate their digital transformation
initiatives.