
Copyright © 2015 Splunk Inc.

Next-Generation Data-Driven Monitoring and Analytics Platform with Splunk IT Service Intelligence

Kelvin Yeung, Splunk

(kyeung@splunk.com)

What is IT Operations?


● Keep an Eye on All Systems

● Figure Out When Things Break

● Fix Them

Paradigm Shift in IT Operation Management

                  First Age (~1988)    Next Generation
Amount of Data    Little Data          Big (Lots of) Data
What’s Measured   Faults Only          Everything
Monitor Focus     Components Level     Service (Every) Level
Skill Required    Extreme              Reasonable
Tools             You Name Them!       Splunk ITSI

First Age: Focus on Components


We’ve monitored our cars in the same way


First Age: Focus on Faults

Ignore all the good things, focus on only the bad things.

Just like Dad.

The New Age of IT Operation Management

DATA CENTRIC APPROACH

TECHNOLOGY DRIVEN

NOT GOOD ENOUGH

Just Not Good Enough


● Component-level health without service context (the “Big Picture”)

● The “Big Picture” without correlation to component-level details

Reality Check: Cutting Edge Technologies

[Diagram labels: Identity, VPN, IP Phone, HR, Email, Finance, Web Svr, App Svr, DB, SaaS/PaaS, IaaS; Servers, Storage, Networking, Virtualization, Infrastructure, Applications, Packaged Applications, Custom Applications]

Visibility Across All Layers of Your Application and Technology Stack

Correlations & Drill Down Capability

[Diagram labels: Application-Based Silos (Apps, Servers, Network, Storage); Zones of Virtualization; Private Cloud; Hybrid Cloud]

Traditional Approach: Collect Only Required

Web Server: CPU Utilization, HTTP Status, Memory Consumption, Access Count, Service Up/Down

Data Centric Approach: Collect Everything

Web Server: CPU Utilization, HTTP Status, Memory Consumption, Access Count, Service Up/Down, Bad URL Access

Is the Data Available or Flexible to Monitor?

Data Centric Approach: Big Data Analytics

● Hard-coded thresholds create false positives (for example, treating > 85% CPU utilization as always Not Normal)

● Collect and ingest historical and current data to learn the Normal of your business
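
The contrast between a hard-coded threshold and a learned "Normal" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-hour grouping, the mean plus or minus k standard deviations rule, the function names, and the sample data are all hypothetical and are not taken from the slides or from how Splunk ITSI computes its thresholds.

```python
from statistics import mean, stdev

STATIC_CPU_LIMIT = 85.0  # the hard-coded rule from the slide: > 85% is "Not Normal"


def static_alert(cpu_percent):
    """Hard-coded rule: flag any reading above the fixed limit."""
    return cpu_percent > STATIC_CPU_LIMIT


def learn_baseline(history):
    """Learn a per-hour (mean, stdev) baseline from (hour, cpu_percent) samples."""
    by_hour = {}
    for hour, cpu in history:
        by_hour.setdefault(hour, []).append(cpu)
    # stdev() needs at least two samples per hour
    return {hour: (mean(vals), stdev(vals)) for hour, vals in by_hour.items()}


def baseline_alert(baseline, hour, cpu_percent, k=3.0):
    """Flag a reading more than k standard deviations from that hour's mean."""
    mu, sigma = baseline[hour]
    return abs(cpu_percent - mu) > k * sigma


# Made-up history: a nightly batch job keeps CPU near 90% at 02:00,
# while the same server is nearly idle at 14:00.
history = [(2, v) for v in (88, 90, 92, 89, 91)] + [(14, v) for v in (20, 22, 18, 21, 19)]
baseline = learn_baseline(history)

# 91% at 02:00 is normal for this server, but the static rule fires anyway.
print(static_alert(91), baseline_alert(baseline, 2, 91))   # True False
# 60% at 14:00 is far from normal, but the static rule stays silent.
print(static_alert(60), baseline_alert(baseline, 14, 60))  # False True
```

The first reading is a false positive of the static rule, and the second is a deviation the static rule misses. A real deployment would need far more history and a more robust statistic than this toy baseline.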

The Next Generation

A Few Metrics from Brittle Instrumentation → LOTS of Metrics from Machine Data

Look at “Bad News” Data → Look at Everything (and Behavior)

Focus on Up & Down → Focus on Normal vs Not Normal

Data-driven service insights for root-cause isolation and improved service operations

Turning Machine Data Into Business Value

Index Untapped Data: Any Source, Type, Volume

[Diagram labels, data sources: Online Services, Web Services, Servers, Security, GPS Location, Storage, Desktops, Networks, Packaged Applications, Custom Applications, Messaging, Telecoms, Online Shopping Cart, Web Clickstreams, Databases, Energy Meters, Call Detail Records, Smartphones and Devices, RFID; On-Premises, Private Cloud, Public Cloud]

Ask Any Question

● Application Delivery

● IT Operations

● Security, Compliance, and Fraud

● Business Analytics

● Industrial Data and the Internet of Things

Splunk IT Service Intelligence

Data-driven service monitoring and analytics

● Dynamic Service Models

● Dynamic Thresholds

● Machine Learning

● Root Cause Deep Dive Analysis

● Simplified Incident Workflows

SPLUNK IT SERVICE INTELLIGENCE

Platform for Machine Data

● Time-Series Index

● Schema-on-Read Data Model

● Common Information Model

Achieve Service Visibility Faster

Service Analyzer

High-level view of services and composite health scores

Glass Tables

Personalized visualizations of your services

Deep Dives

Organized view of performance indicators across silos

Multi KPI Alerts

Correlation rules to generate notable events

Notable Events

Easy-to-understand report on results of correlation searches

Anomaly Detection and Adaptive Thresholds

Machine learning to baseline normal operations and identify anomalous behavior
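
To make the ideas of a composite health score and a multi-KPI correlation rule concrete, here is a minimal Python sketch. The slides do not describe how Splunk ITSI actually calculates either, so the severity scale, the importance weights, the scoring formula, the `min_degraded` parameter, and all names below are hypothetical assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    severity: int    # hypothetical scale: 0 = normal, 1 = warning, 2 = critical
    importance: int  # hypothetical weight: higher means the KPI matters more


def health_score(kpis):
    """Importance-weighted score from 0 (all critical) to 100 (all normal)."""
    total_weight = sum(k.importance for k in kpis)
    if total_weight == 0:
        return 100  # nothing to measure, so treat the service as healthy
    penalty = sum(k.importance * k.severity / 2 for k in kpis)
    return round(100 * (1 - penalty / total_weight))


def multi_kpi_alert(kpis, min_degraded=2):
    """Correlation rule: emit a notable event when several KPIs degrade together."""
    degraded = [k.name for k in kpis if k.severity > 0]
    if len(degraded) >= min_degraded:
        return {"event": "notable", "kpis": degraded}
    return None


# Made-up KPIs for a hypothetical web store service.
web_store = [
    Kpi("cpu_utilization", severity=1, importance=2),
    Kpi("http_error_rate", severity=2, importance=5),
    Kpi("access_count", severity=0, importance=3),
]

print(health_score(web_store))     # 40
print(multi_kpi_alert(web_store))  # {'event': 'notable', 'kpis': ['cpu_utilization', 'http_error_rate']}
```

The point of the sketch is the service-level view: one score summarizes many component KPIs, and an event is raised only when KPIs degrade in combination rather than on every single component fault.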


Demo


Paradigm Shift in IT Operation Management

                  First Age (~1988)    Next Generation (Now)
Amount of Data    Little Data          Big (Lots of) Data
What’s Measured   Faults Only          Everything
Monitor Focus     Components Level     Service (Every) Level
Skill Required    Extreme              Reasonable
Tools             You Name Them!       Splunk ITSI


Thank You
