Copyright © 2015 Splunk Inc. IT Service Intelligence The Hands-On Version Juergen Magiera Senior Architect, IT Operations Analytics

SplunkLive! London 2016 Get your service intelligence off to a flying start



Setup Before You Can Play

Download this presentation slide deck: https://splunk.box.com/v/ITSI-HandsOn

Follow the instructions on your paper hand-out to log in to your VM.

Please log in as either
• [email protected]
• [email protected]
• Password is “Changeme1” or “Changeme2”

After logging in, select IT Service Intelligence from the list of apps at the left

ITSI Core Concepts

What is a Service?

Service: Requests, Responses

In ITSI, a Service is a logical group of technology components that a user decides should be monitored together.

It can often be generalized as a “black box” to which we send requests and from which we expect responses.

What is a Service?

Technical Services
DNS: Requests, Responses
Auth: Requests, Responses
Web: Requests, Responses

Services can be lower level (technical) …

What is a Service?

Technical Services
DNS: Requests, Responses
Auth: Requests, Responses
Web: Requests, Responses

Business Services
Customer Transactions: Requests, Responses
Support Desk: Requests, Responses

Services can also be higher level (business) …

What is a Service?

Packet Network
Hypervisor and Hosts
RBM DBs
Storage Tier
API Services
Web Services
Customer Transactions
Mobile
API/Middleware
Partner Portal
DNS

Services can encompass multiple tiers of the IT domain. Services may also depend upon other services.

What is a KPI?

DNS: Requests, Responses
KPI: Number of requests
KPI: Error rate
KPI: Average response time
KPI: Server CPU load
KPI: Server network I/F errors

Customer Transactions: Requests, Responses
KPI: Number of transactions
KPI: Error rate
KPI: Average response time
KPI: Count of Incident Tickets
KPI: Synthetic Transaction Health

KPIs and Health scores constitute the means by which Services are monitored.

Key Performance Indicators (KPIs)

A Key Performance Indicator (KPI) is a Splunk saved search created within the ITSI UI that helps monitor a specific field like CPU, Memory, Number of Errors and so on. KPIs are contained within Services.

Service Health Scores

A Health score is a score from 0-100 (0 being critical and 100 being normal) that helps determine the health of a Service. It is calculated once every minute, based on the importance and status (e.g. green, orange, red) of all KPIs in the Service.
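
To make the idea of an importance-weighted score concrete, here is a minimal sketch. The deck does not give ITSI's actual formula, so the weighted average, the status-to-score mapping, and the names `STATUS_SCORE` and `health_score` are assumptions made purely for illustration.

```python
# Illustrative only: ITSI's real health score formula is not given in this deck.
# Assumed mapping from a KPI status colour to a 0-100 score.
STATUS_SCORE = {"green": 100, "orange": 50, "red": 0}


def health_score(kpis):
    """Importance-weighted average of KPI status scores.

    kpis: list of (status, importance) tuples; importance is a positive weight.
    """
    total_weight = sum(importance for _, importance in kpis)
    if total_weight == 0:
        raise ValueError("at least one KPI with non-zero importance is required")
    weighted = sum(STATUS_SCORE[status] * importance for status, importance in kpis)
    return weighted / total_weight


# Example: two healthy KPIs and one critical, heavily weighted KPI.
score = health_score([("green", 5), ("green", 5), ("red", 10)])
print(score)  # 50.0
```

In this sketch a single red KPI with high importance pulls the score down sharply, which matches the intent described on the slide even if the real calculation differs.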

ITSI Tour

Service Decomposition in ITSI

CLICK “Glass Tables”

Service Decomposition in ITSI

CLICK (open in new tab) “Buttercup Games Business Process”

Service Decomposition in ITSI

CLICK (open in new tab) “Buttercup Games Online Store”

Service Decomposition (Refresher)

1- What is a high-value business service? (“Online Store” in Buttercup Games)

Service Decomposition (Refresher)

1- What is a high-value business service? (Online Store)

2- Process flow, and underlying sub-services? (Web -> Middleware -> DB -> Middleware -> Web)

Service Decomposition (Refresher)

1- What is a high-value business service? (Online Store)

2- Process flow, and underlying sub-services? (Web -> Middleware …)

3- For each (sub)service: KPIs to show health & status? (Database: errors, SQL hits, response time, …)

Service Decomposition (Refresher)

1- What is a high-value business service? (Online Store)

2- Process flow & underlying sub-services? (Web -> Middleware …)

3- For each (sub)service: KPIs? (Database: errors, SQL hits, …)

4- For each KPI: Need a Splunk search (index=DB (warn* OR error*) | stats count)

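
The four decomposition steps can be written down as plain data before anything is configured in ITSI. The sketch below is one possible way to do that. The dependency chain (Online Store -> Web -> Middleware -> DB), the KPI name "Errors", and the names `SERVICES` and `kpi_searches` are assumptions for illustration. Only the search string comes from the slide.

```python
# Hypothetical model of the Online Store decomposition; the structure is illustrative.
SERVICES = {
    "Online Store": {"depends_on": ["Web"], "kpis": {}},
    "Web": {"depends_on": ["Middleware"], "kpis": {}},
    "Middleware": {"depends_on": ["DB"], "kpis": {}},
    "DB": {
        "depends_on": [],
        # Step 4: each KPI needs a Splunk search (this one is from the slide).
        "kpis": {"Errors": "index=DB (warn* OR error*) | stats count"},
    },
}


def kpi_searches(service, services=SERVICES, seen=None):
    """Collect (service, kpi, search) for a service and everything it depends on."""
    seen = set() if seen is None else seen
    if service in seen:  # guard against dependency cycles
        return []
    seen.add(service)
    found = [(service, name, spl) for name, spl in services[service]["kpis"].items()]
    for dependency in services[service]["depends_on"]:
        found.extend(kpi_searches(dependency, services, seen))
    return found


for row in kpi_searches("Online Store"):
    print(row)
```

Walking the dependencies from the business service downward gives the list of searches that still have to be written, which is a convenient checklist for step 4.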

Service Decomp: The Business Processes

Service Decomp: End-To-End Process Flow

New Requirements!

● Create a new KPI for the DB Service:
  ● Network Utilization
● Modify the Executive Glass Table in order to show off the services you slave over

“WE only have about 15 min TO DO WHAT???!!???”

Think about how long this would take you today?

Configuration of DB Service

Click Configure > Click Services

Let’s Talk Entities

● Select DB Service
● Entities are the relevant things which support this service (usually hosts)
● Select the right entries with filters, ANDs, ORs
● Original Entity list can come from CMDB, spreadsheet, Splunk search, others
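
Selecting entities with ANDs and ORs boils down to a boolean filter over entity attributes. The sketch below shows that logic in isolation. The entity records, the field names, and the rule shape (a list of OR-ed groups, each a dict of AND-ed tests) are hypothetical and are not ITSI's stored format.

```python
# Hypothetical entity records, e.g. imported from a CMDB export or a spreadsheet.
ENTITIES = [
    {"host": "db-01", "role": "database", "site": "london"},
    {"host": "db-02", "role": "database", "site": "paris"},
    {"host": "web-01", "role": "web", "site": "london"},
]


def matches(entity, rule):
    """rule is a list of OR-ed groups; each group is a dict of AND-ed field=value tests."""
    return any(
        all(entity.get(field) == value for field, value in group.items())
        for group in rule
    )


# (role=database AND site=london) OR (host=db-02)
rule = [{"role": "database", "site": "london"}, {"host": "db-02"}]
selected = [entity["host"] for entity in ENTITIES if matches(entity, rule)]
print(selected)  # ['db-01', 'db-02']
```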

A KPI in 5 minutes? Absolutely!

Click New – Generic KPI

Name that KPI!

● Click on title
● Call it “Network Utilization”, with your username up front

Now start

Select Data Model
● Host Operating System
● Network
● # bytes
● Next

KPI Continued….

Splunk Builds Searches for you – Oh Yeah, that’s happening :)

● Select Yes for Split by & Filter options
● Select host for Entity Lookup & Alias options
● Click Next

Almost There…

Select
● KPI Search Schedule: Every Minute
● Entity Calculation: Average
● Service/Agg Calculation: Average
● Calculation Window: Last Minute
● Next

● Unit: Bps
● Next
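
The Entity and Service/Agg calculation settings describe two levels of aggregation over the same window. A rough sketch of that idea follows. It assumes the aggregate is the average of the per-entity averages, which the deck does not confirm. The function name `kpi_values` and the sample data are made up.

```python
from collections import defaultdict
from statistics import mean


def kpi_values(samples):
    """samples: iterable of (host, value) pairs from the calculation window.

    Returns (per_entity, aggregate): the average per host, and the average
    of those per-host averages (an assumed definition of the aggregate).
    """
    by_host = defaultdict(list)
    for host, value in samples:
        by_host[host].append(value)
    per_entity = {host: mean(values) for host, values in by_host.items()}
    aggregate = mean(per_entity.values())
    return per_entity, aggregate


# Made-up network utilization samples (Bps) for the last minute.
samples = [("db-01", 100.0), ("db-01", 300.0), ("db-02", 400.0)]
per_entity, aggregate = kpi_values(samples)
print(per_entity)  # {'db-01': 200.0, 'db-02': 400.0}
print(aggregate)   # 300.0
```

Averaging per entity first keeps a chatty host from dominating the service-level value. Averaging all raw samples together would be a different, equally plausible choice.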

Final Steps…

Set your thresholds
● Aggregate (All)
● Per Entity

● Click “Add Threshold” TWICE
● Make the Neapolitan ice cream colors Yellow, Green, Yellow
● Drag the sliders around in order to get the current data graph entirely inside the Green (normal) band

● Finish
● Other options are also available, including adaptive thresholds and anomaly detection
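
Two added thresholds split the value range into three bands. The sketch below shows how a value maps to a band. The boundary values and the function name `severity` are made up for illustration. In the lab you set the real boundaries by dragging the sliders.

```python
from bisect import bisect_right


def severity(value, boundaries, labels):
    """Map a KPI value to a band label.

    boundaries: ascending threshold values; labels: one more entry than boundaries.
    A value equal to a boundary falls into the band above it.
    """
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one more label than boundaries")
    return labels[bisect_right(boundaries, value)]


# Two added thresholds give three bands: Yellow, Green, Yellow.
boundaries = [1_000, 50_000]  # made-up Bps values
labels = ["yellow", "green", "yellow"]
print(severity(500, boundaries, labels))     # yellow
print(severity(20_000, boundaries, labels))  # green
print(severity(80_000, boundaries, labels))  # yellow
```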

Adaptive Thresholds

What if your KPI data looks like this?

Adaptive Thresholds
Static thresholds will not work…

Adaptive Thresholds
Adaptive Thresholding works beautifully with cyclical data
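
One way to see why cyclical data needs time-dependent thresholds is to compute a separate "normal" band for each hour of the day. The sketch below does that with a mean plus or minus `k` standard deviations. This is an assumed stand-in, not the algorithm ITSI uses, and `hourly_bands`, `is_normal`, and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_bands(history, k=2.0):
    """history: iterable of (hour_of_day, value). Returns {hour: (low, high)}."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {
        hour: (mean(vals) - k * pstdev(vals), mean(vals) + k * pstdev(vals))
        for hour, vals in by_hour.items()
    }


def is_normal(hour, value, bands):
    """True when the value lies inside the band learned for that hour."""
    low, high = bands[hour]
    return low <= value <= high


# Made-up cyclical data: quiet at 03:00, busy at 12:00.
history = [(3, 10.0), (3, 12.0), (3, 14.0), (12, 90.0), (12, 100.0), (12, 110.0)]
bands = hourly_bands(history)
print(is_normal(3, 13.0, bands))   # True
print(is_normal(3, 95.0, bands))   # False: fine at noon, abnormal at 03:00
print(is_normal(12, 95.0, bands))  # True
```

A single static threshold could not treat 95 as normal at noon and abnormal at 03:00, which is the point the slides make.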

Anomaly Detection

● Machine Learning
● Works well for data with patterns
● Requires some “training” (trial & error) to zero in on best sensitivity
● More sophisticated capabilities coming! (multivariate, more algorithms, etc)
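
The trial-and-error tuning of sensitivity can be illustrated with a deliberately simple detector. The sketch below flags points that deviate from a trailing window by more than `threshold` standard deviations. It is a basic statistical stand-in, not the machine learning that ITSI applies, and the function name, parameters, and data are assumptions.

```python
from statistics import mean, pstdev


def anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value deviates from the trailing window's mean
    by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(values)):
        recent = values[i - window:i]
        spread = pstdev(recent)
        if spread == 0:
            # Flat history: any change at all counts as a deviation.
            if values[i] != recent[0]:
                flagged.append(i)
            continue
        if abs(values[i] - mean(recent)) / spread > threshold:
            flagged.append(i)
    return flagged


series = [10, 11, 10, 12, 11, 10, 11, 40, 11, 10]
print(anomalies(series))                 # [7]
print(anomalies(series, threshold=1.0))  # a lower threshold flags more points
```

Lowering `threshold` makes the detector more sensitive and noisier, which is the trade-off you tune by trial and error.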

Let’s Fix that Glass Table

Clone the Glass Table

Return to Saved Glass Tables page (click on Glass Tables in the upper menu bar)

CLICK Edit for “Buttercup Games Business Process”
• Select Clone
• Title: Add your username to the front
• Permissions: Shared in App
• Clone Page

• Click on your new Glass Table from the list, to view it

Edit & Have Fun!

Click on Edit in the upper right corner of your Glass Table

Use the “Services” panel on the left to select Individual KPIs, or Aggregate Service Health Scores
• Choose 2 KPIs from Online Store that would be useful in the “Order Process” section
• Drag the selected widgets onto the canvas, positioning in the gray oval

• What’s the difference between the [icon] and [icon] tools at the top left?

More Fun with the Glass Table Editor…

Use the Configurations panel on the right to edit a selected widget
• Can change the visualization type, drilldown behavior, and other settings

• You should hit Save frequently
• I wonder what Auto Layout does?
• (YIKES!) Revert All Changes might be helpful

Finishing up…

• Add a Service Health Score widget for Online Store under Buttercup
• Choose a Viz Type with a sparkline graph, then resize to make it look pretty
• Modify the Custom Drilldown action to go to the saved glass table, Buttercup Games Online Store
• Bonus Points: Make the label bigger, more readable
• Save
• View when done

A Troubleshooting Exercise

Let’s use ITSI to troubleshoot an outage
● Start at your Glass Table, “<UserName> Buttercup Business Process”
● Customer Care reports that unhappy customers are complaining of failures and long delays when trying to purchase
● The calls began coming in at around 40 minutes after the (previous) hour.
● In the upper right corner of the Glass Table, change the time picker from Now to XX:40:00.0, where XX is the previous hour. For example, if it is currently 14:05, set the time picker to 13:40:00.0, then Apply
● This is how we can “time travel” back to see conditions at a particular outage – oh yeah!
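
If you are unsure which value to type into the time picker, the rule "XX:40:00 of the previous hour" can be computed as follows. This is only a helper for the arithmetic. The function name `outage_time` and the example date are made up.

```python
from datetime import datetime, timedelta


def outage_time(now):
    """Return XX:40:00 where XX is the hour before `now`."""
    previous_hour = now - timedelta(hours=1)
    return previous_hour.replace(minute=40, second=0, microsecond=0)


# The slide's example: at 14:05 the time picker should be set to 13:40:00.
print(outage_time(datetime(2016, 5, 12, 14, 5)))  # 2016-05-12 13:40:00
```

Subtracting a full hour before replacing the minutes keeps the result correct just after midnight, when the previous hour falls on the previous day.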

A Troubleshooting Exercise, cont’d

● The Online Store seems to be degraded, just as Customer Care reported. Click on the widget under Buttercup to drill down further

A Troubleshooting Exercise, cont’d

● The Online Store Glass Table shows a much more detailed view, including the impacted customer-facing KPIs at the far left (Revenue, etc)
● Based on this view of all the relevant services, where do you think the root cause lies?
● Which service should we troubleshoot first?
● Click on Health widget for that service, to drill down to a Deep Dive

Deep Dive

● Deep Dive shows multiple KPIs and Health Scores in parallel “swim lanes”. The initial time span shown is 15 minutes.
● The Health Score for this DB Service is the top swim lane. Can you see when it begins to degrade from 100%?
● Mousing over this point in time, can you spot the KPI with the leading fault indication? I.e., what busted first?
● To improve readability, change the Primary Time Range (lower left corner) to Presets > Last 60 minutes
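
Spotting "what busted first" by eye amounts to finding the lane whose first out-of-range point is earliest. The sketch below expresses that comparison in code. The KPI names, values, and limits are invented, and `first_to_breach` is a hypothetical helper, not an ITSI feature.

```python
def first_to_breach(lanes, limits):
    """lanes: {kpi: [(minute, value), ...]} in time order; limits: {kpi: max normal value}.

    Returns (kpi, minute) for the earliest breach, or None if nothing breached.
    """
    earliest = None
    for kpi, points in lanes.items():
        for minute, value in points:
            if value > limits[kpi]:
                if earliest is None or minute < earliest[1]:
                    earliest = (kpi, minute)
                break  # only the first breach in each lane matters
    return earliest


# Made-up swim lanes for a DB service; minute is minutes past the hour.
lanes = {
    "CPU load": [(38, 40), (39, 45), (40, 95)],
    "Storage latency": [(38, 5), (39, 30), (40, 35)],
    "Errors": [(38, 0), (39, 0), (40, 12)],
}
limits = {"CPU load": 80, "Storage latency": 20, "Errors": 5}
print(first_to_breach(lanes, limits))  # ('Storage latency', 39)
```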

Multi-KPI Alerts and Notable Events

● Click on Notable Events Review
● Multiple KPIs and Health scores can be combined in sophisticated ways to create Multi-KPI alerts
● When a Multi-KPI alert fires, one of the outcomes is the creation of a Notable Event
● Notable Events allow NOC personnel and others to triage and coordinate event management efforts
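
Combining several KPI conditions into one alert can be pictured as a simple rule evaluation. In the sketch below an alert fires when at least `min_matches` KPIs are in a state that the rule counts as bad. The rule format, KPI names, severity labels, and the returned dictionary are all hypothetical and do not reflect ITSI's internal representation.

```python
def multi_kpi_alert(statuses, rules, min_matches):
    """statuses: {kpi: current severity}; rules: {kpi: set of severities that count}.

    Returns a notable-event dict when at least `min_matches` rules match, else None.
    """
    matched = sorted(kpi for kpi, bad in rules.items() if statuses.get(kpi) in bad)
    if len(matched) < min_matches:
        return None
    return {"title": "Multi-KPI alert", "kpis": matched}


rules = {
    "DB errors": {"critical"},
    "DB response time": {"high", "critical"},
    "Online Store health": {"critical"},
}
statuses = {
    "DB errors": "critical",
    "DB response time": "high",
    "Online Store health": "normal",
}
print(multi_kpi_alert(statuses, rules, min_matches=2))
# {'title': 'Multi-KPI alert', 'kpis': ['DB errors', 'DB response time']}
```

Requiring more than one KPI to agree before raising an event is one way such alerts cut down on noise from a single flapping metric.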

Service Analyzer

● Click on Service Analyzer > Default Service Analyzer
● Back where we started!
● This view shows a “no-frills” list of services (top) and hottest KPIs (bottom)
● Provides a quick jumping off point into Deep Dives and the Notable Events Review
● It is useful for NOCs and others who need a high-level situational view

Review

● High-value services can be decomposed and modeled in ITSI, using machine data from the relevant systems
● Services and KPIs can be created in minutes, with sophisticated thresholding techniques to distinguish “normal” from “not normal”
● Glass Tables allow service health and KPI metrics to be displayed in a way that makes sense to specific groups, such as Executive Leadership, Business Service Owners, the NOC, DevOps & Others
● Deep Dives allow KPIs to be compared side-by-side across any time range, accelerating root cause analysis and significantly reducing MTTR
● Multi-KPI Alerts and Notable Events reduce alert noise, producing actionable events and a means to manage them
● …and it’s fun to build!

PLAYTIME IS OVER! Everyone out of the sandbox!

NOT! You can have your very own 15-day free eval sandbox, to continue playing:
● http://splunk.com/ITSI Then select:

And a Guidebook to help you explore ITSI’s capabilities:
● https://splunk.box.com/ITSI-Sandbox-Guidebook

SEPT 26-29, 2016
WALT DISNEY WORLD, ORLANDO
SWAN AND DOLPHIN RESORTS

• 5000+ IT & Business Professionals
• 3 days of technical content
• 165+ sessions
• 80+ Customer Speakers
• 35+ Apps in Splunk Apps Showcase
• 75+ Technology Partners
• 1:1 networking: Ask The Experts and Security Experts, Birds of a Feather and Chalk Talks
• NEW hands-on labs!
• Expanded show floor, Dashboards Control Room & Clinic, and MORE!

The 7th Annual Splunk Worldwide Users’ Conference

PLUS Splunk University
• Three days: Sept 24-26, 2016
• Get Splunk Certified for FREE!
• Get CPE credits for CISSP, CAP, SSCP
• Save thousands on Splunk education!

Thank You

Juergen Magiera, Senior Architect, IT Operations Analytics