Copyright © 2015 Splunk Inc.
IT Service Intelligence: The Hands-On Version
Juergen Magiera, Senior Architect, IT Operations Analytics
Setup Before You Can Play
Download this presentation slide deck: https://splunk.box.com/v/ITSI-HandsOn
Follow the instructions on your paper hand-out to log in to your VM.
Please log in as either
• [email protected]
• [email protected]
The password is “Changeme1” or “Changeme2”.
After logging in, select IT Service Intelligence from the list of apps at the left.
What is a Service?
[Diagram: a Service box that receives Requests and returns Responses]
In ITSI, a Service is a logical group of technology components that a user deems necessary to monitor together.
It can often be generalized as a “black box” to which we send requests and from which we expect responses.
What is a Service?
[Diagram: Technical Services (DNS, Auth, Web), each receiving Requests and returning Responses]
Services can be lower level (technical)…
What is a Service?
[Diagram: Technical Services (DNS, Auth, Web) and Business Services (Customer Transactions, Support Desk), each receiving Requests and returning Responses]
Services can also be higher level (business)…
What is a Service?
[Diagram: Packet Network, Hypervisor and Hosts, RBMDBs, Storage Tier, API Services, Web Services, Customer Transactions, Mobile, API/Middleware, Partner Portal, DNS]
Services can encompass multiple tiers of the IT domain. Services may also depend upon other services.
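The idea that services depend on other services can be pictured as a small dependency map. The sketch below is only an illustration: the service names come from the slide, but the edges between them are assumptions (the original diagram is not recoverable), and `all_dependencies` is a hypothetical helper, not part of ITSI.

```python
# Hypothetical dependency map: service -> services it depends on.
# The edges are assumed for illustration; only the names come from the slide.
DEPENDS_ON = {
    "Customer Transactions": ["Web Services", "Mobile"],
    "Web Services": ["API Services", "DNS"],
    "Mobile": ["API Services"],
    "API Services": ["Storage Tier"],
}


def all_dependencies(service, depends_on):
    """Return every service reachable from `service`, directly or indirectly."""
    seen = set()
    stack = list(depends_on.get(service, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        stack.extend(depends_on.get(current, []))
    return sorted(seen)


deps = all_dependencies("Customer Transactions", DEPENDS_ON)
print(deps)
```

Walking the map this way shows why a problem in a low-level tier can surface in a business service several layers up.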
What is a KPI?
DNS (Requests, Responses)
● KPI: Number of requests
● KPI: Error rate
● KPI: Average response time
● KPI: Server CPU load
● KPI: Server network I/F errors
Customer Transactions (Requests, Responses)
● KPI: Number of transactions
● KPI: Error rate
● KPI: Average response time
● KPI: Count of Incident Tickets
● KPI: Synthetic Transaction Health
KPIs and Health scores constitute the means by which Services are monitored.
Key Performance Indicators (KPIs)
A Key Performance Indicator (KPI) is a Splunk saved search created within the ITSI UI that helps monitor a specific field such as CPU, Memory, Number of Errors, and so on. KPIs are contained within Services.
Service Health Scores
A Health score is a score from 0 to 100 (0 being critical and 100 being normal) that helps determine the health of a Service. It is calculated once every minute, based on the importance and status (e.g. green, orange, red) of all the Service's KPIs.
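To make the idea of an importance-weighted score concrete, here is a minimal sketch. The slides do not give ITSI's actual formula, so this is a plain weighted average under stated assumptions: the status-to-score mapping, the importance values, and the names `Kpi` and `health_score` are all hypothetical.

```python
from dataclasses import dataclass

# Assumed mapping from KPI status to a 0-100 score (not ITSI's real values).
STATUS_SCORE = {"green": 100, "orange": 50, "red": 0}


@dataclass
class Kpi:
    name: str
    importance: int  # hypothetical weight; higher means more influence
    status: str      # "green", "orange", or "red"


def health_score(kpis):
    """Importance-weighted average of KPI status scores, on a 0-100 scale."""
    total_weight = sum(k.importance for k in kpis)
    if total_weight == 0:
        return 100.0  # nothing to weigh: treat the service as normal
    weighted = sum(k.importance * STATUS_SCORE[k.status] for k in kpis)
    return weighted / total_weight


db_kpis = [
    Kpi("errors", 8, "red"),
    Kpi("sql_hits", 4, "green"),
    Kpi("response_time", 8, "orange"),
]
score = health_score(db_kpis)
print(score)  # 40.0
```

In this sketch a high-importance KPI in red pulls the score down much more than a low-importance KPI in green can pull it up.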
Service Decomposition (Refresher)
1 - What is a high-value business service? (“Online Store” in Buttercup Games)
2 - Process flow, and underlying sub-services? (Web -> Middleware -> DB -> Middleware -> Web)
3 - For each (sub)service: KPIs to show health & status? (Database: errors, SQL hits, response time, …)
4 - For each KPI: Need a Splunk search (index=DB (warn* OR error*) | stats count)
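For readers less familiar with Splunk searches, the KPI search in step 4 counts events that contain a term starting with "warn" or "error". The Python sketch below mimics that logic on a few made-up log lines. It is only an approximation of Splunk's term matching, and the sample events are hypothetical.

```python
import re

# Matches any word beginning with "warn" or "error", ignoring case,
# as a rough stand-in for the (warn* OR error*) part of the search.
PATTERN = re.compile(r"\b(?:warn|error)\w*", re.IGNORECASE)


def count_matching(events):
    """Count events with at least one matching term (the `stats count` part)."""
    return sum(1 for event in events if PATTERN.search(event))


# Hypothetical events standing in for the contents of index=DB.
sample_events = [
    "2015-09-22 10:00:01 INFO connection opened",
    "2015-09-22 10:00:02 WARNING slow query 2300ms",
    "2015-09-22 10:00:03 ERROR deadlock detected",
    "2015-09-22 10:00:04 INFO connection closed",
]
kpi_value = count_matching(sample_events)
print(kpi_value)  # 2
```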
New Requirements!
● Create a new KPI for the DB Service:
  ● Network Utilization
● Modify the Executive Glass Table in order to show off the services you slave over
“WE only have about 15 min TO DO WHAT???!!???”
Think about how long this would take you today?
Let’s Talk Entities
● Select DB Service
● Entities are the relevant things which support this service (usually hosts)
● Select the right entries with filters, ANDs, ORs
● Original Entity list can come from CMDB, spreadsheet, Splunk search, others
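Selecting entities with ANDs and ORs can be sketched as follows. This is an illustration only: the entity fields (`host`, `role`, `site`), the sample values, and the `matches` helper are assumptions, not ITSI's data model.

```python
# Hypothetical entity list, as it might arrive from a CMDB or spreadsheet.
entities = [
    {"host": "db-01", "role": "database", "site": "east"},
    {"host": "db-02", "role": "database", "site": "west"},
    {"host": "db-03", "role": "database", "site": "west"},
    {"host": "web-01", "role": "web", "site": "east"},
]


def matches(entity, rule_groups):
    """True if the entity satisfies any group (OR); all fields in a group must match (AND)."""
    return any(
        all(entity.get(field) == value for field, value in group.items())
        for group in rule_groups
    )


# (role = database AND site = east) OR (host = db-03)
rules = [{"role": "database", "site": "east"}, {"host": "db-03"}]
selected = [e["host"] for e in entities if matches(e, rules)]
print(selected)  # ['db-01', 'db-03']
```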
KPI Continued…
Splunk Builds Searches for you – Oh Yeah, that’s happening :-)
● Select Yes for Split by & Filter options
● Select host for Entity Lookup & Alias options
● Click Next
Almost There…
Select
● KPI Search Schedule: Every Minute
● Entity Calculation: Average
● Service/Agg Calculation: Average
● Calculation Window: Last Minute
● Next
● Unit: Bps
● Next
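The difference between the entity calculation and the service/aggregate calculation can be sketched with a few numbers. The host names and Bps samples below are invented, and treating the aggregate as an average over all samples in the window is an assumption; the slides do not say exactly how ITSI combines the values.

```python
from statistics import mean

# Hypothetical network-utilization samples (Bps) from the last minute, per host.
samples = {
    "db-01": [100.0, 200.0],
    "db-02": [300.0, 400.0],
}

# Entity calculation (Average): one value per entity.
per_entity = {host: mean(values) for host, values in samples.items()}

# Service/aggregate calculation (Average): one value for the whole service,
# here assumed to be the average over every sample in the window.
all_values = [v for values in samples.values() for v in values]
aggregate = mean(all_values)

print(per_entity)
print(aggregate)
```

Per-entity values let a threshold flag a single misbehaving host, while the aggregate value describes the service as a whole.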
Final Steps…
Set your thresholds
● Aggregate (All)
● Per Entity
● Click “Add Threshold” TWICE
● Make the Neapolitan ice cream colors Yellow, Green, Yellow
● Drag the sliders around in order to get the current data graph entirely inside the Green (normal) band
● Finish
● Other options are also available, including adaptive thresholds and anomaly detection
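Two thresholds split the value range into three bands, which is what the Yellow, Green, Yellow layout expresses. A minimal sketch of that classification, with invented threshold values and a hypothetical `classify` helper:

```python
from bisect import bisect_right


def classify(value, bounds, labels):
    """Return the label of the band that `value` falls into.

    `bounds` must be sorted; a value equal to a bound belongs to the band above it.
    """
    return labels[bisect_right(bounds, value)]


# Hypothetical thresholds in Bps: too little traffic and too much traffic
# are both suspicious, so the normal (green) band sits in the middle.
bounds = [1000.0, 5000.0]
labels = ["yellow", "green", "yellow"]

for bps in (500.0, 3000.0, 9000.0):
    print(bps, classify(bps, bounds, labels))
```

Dragging the sliders in the UI corresponds to changing the two numbers in `bounds` until current data lands in the middle band.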
Anomaly Detection
● Machine Learning
● Works well for data with patterns
● Requires some “training” (trial & error) to zero in on best sensitivity
● More sophisticated capabilities coming! (multivariate, more algorithms, etc.)
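The slides do not describe the algorithm ITSI uses, so the sketch below swaps in a plain z-score test purely to show what tuning "sensitivity" means. The series, the `anomalies` function, and the sensitivity values are all hypothetical.

```python
from statistics import mean, pstdev


def anomalies(values, sensitivity):
    """Indexes of points more than `sensitivity` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a flat series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > sensitivity]


# Hypothetical KPI series with one obvious spike at index 7.
series = [10, 11, 9, 10, 12, 10, 11, 50, 10, 9]

print(anomalies(series, 2.5))  # the spike is flagged
print(anomalies(series, 3.0))  # slightly stricter setting: nothing is flagged
```

The spike sits just under three standard deviations from the mean, so a small change in the setting decides whether it is reported. That is the trial and error the slide refers to.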
Clone the Glass Table
Return to Saved Glass Tables page (click on Glass Tables in the upper menu bar)
CLICK Edit for “Buttercup Games Business Process”
• Select Clone
• Title: Add your username to the front
• Permissions: Shared in App
• Clone Page
• Click on your new Glass Table from the list, to view it
Edit & Have Fun!
Click on Edit in the upper right corner of your Glass Table
Use the “Services” panel on the left to select Individual KPIs, or Aggregate Service Health Scores
• Choose 2 KPIs from Online Store that would be useful in the “Order Process” section
• Drag the selected widgets onto the canvas, positioning in the gray oval
• What’s the difference between the [icon] and [icon] tools at the top left?
More Fun with the Glass Table Editor…
Use the Configurations panel on the right to edit a selected widget
• Can change the visualization type, drilldown behavior, and other settings
• You should hit Save frequently
• I wonder what Auto Layout does?
• (YIKES!) Revert All Changes might be helpful
Finishing up…
• Add a Service Health Score widget for Online Store under Buttercup
• Choose a Viz Type with a sparkline graph, then resize to make it look pretty
• Modify the Custom Drilldown action to go to the saved glass table, Buttercup Games Online Store
• Bonus Points: Make the label bigger, more readable
• Save
• View when done
A Troubleshooting Exercise
Let’s use ITSI to troubleshoot an outage
● Start at your Glass Table, “<UserName> Buttercup Business Process”
● Customer Care reports that unhappy customers are complaining of failures and long delays when trying to purchase
● The calls began coming in at around 40 minutes after the (previous) hour.
● In the upper right corner of the Glass Table, change the time picker from Now to XX:40:00.0, where XX is the previous hour. For example, if it is currently 14:05, set the time picker to 13:40:00.0, then Apply
● This is how we can “time travel” back to see conditions at a particular outage – oh yeah!
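The time-picker value is just "40 minutes past the previous hour". If it helps to see the arithmetic, here is a small sketch. The function name `outage_time` and the example date are assumptions, and the code only computes the value; it does not interact with ITSI.

```python
from datetime import datetime, timedelta


def outage_time(now):
    """Return XX:40:00 where XX is the hour before the current one."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    previous_hour = top_of_hour - timedelta(hours=1)
    return previous_hour.replace(minute=40)


# Example from the slide: it is currently 14:05 (the date is arbitrary).
picker = outage_time(datetime(2015, 9, 22, 14, 5))
print(f"{picker:%H:%M:%S}.0")  # 13:40:00.0
```

Subtracting a full hour rather than editing the hour field directly also handles the case just after midnight, where the previous hour falls on the previous day.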
A Troubleshooting Exercise, cont’d
● The Online Store seems to be degraded, just as Customer Care reported. Click on the widget under Buttercup to drill down further
A Troubleshooting Exercise, cont’d
● The Online Store Glass Table shows a much more detailed view, including the impacted customer-facing KPIs at the far left (Revenue, etc.)
● Based on this view of all the relevant services, where do you think the root cause lies?
● Which service should we troubleshoot first?
● Click on the Health widget for that service, to drill down to a Deep Dive
Deep Dive
● Deep Dive shows multiple KPIs and Health Scores in parallel “swim lanes”. The initial timespan shown is 15 minutes.
● The Health Score for this DB Service is the top swim lane. Can you see when it begins to degrade from 100%?
● Mousing over this point in time, can you spot the KPI with the leading fault indication? I.e., what busted first?
● To improve readability, change the Primary Time Range (lower left corner) to Presets > Last 60 minutes
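Finding "what busted first" amounts to comparing when each KPI first crossed its threshold. The sketch below does that on invented data. The KPI names, values, thresholds, and helper functions are hypothetical, and in the lab you do this by eye in the swim lanes.

```python
def first_breach(values, threshold):
    """Index of the first sample above the threshold, or None if it never breaches."""
    for index, value in enumerate(values):
        if value > threshold:
            return index
    return None


def leading_kpi(kpis):
    """Name of the KPI that breached its threshold earliest, or None."""
    breaches = {}
    for name, (values, threshold) in kpis.items():
        index = first_breach(values, threshold)
        if index is not None:
            breaches[name] = index
    return min(breaches, key=breaches.get) if breaches else None


# Hypothetical per-minute samples and thresholds for three DB KPIs.
db_lanes = {
    "cpu_load": ([40, 45, 50, 95, 97], 90),
    "response_time_ms": ([100, 110, 120, 130, 900], 500),
    "sql_errors": ([0, 0, 12, 30, 40], 5),
}
print(leading_kpi(db_lanes))  # sql_errors
```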
Multi-KPI Alerts and Notable Events
● Click on Notable Events Review
● Multiple KPIs and Health scores can be combined in sophisticated ways to create Multi-KPI alerts
● When a Multi-KPI alert fires, one of the outcomes is the creation of a Notable Event
● Notable Events allow NOC personnel and others to triage and coordinate event management efforts
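One simple way to picture a Multi-KPI alert is a rule that fires only when several KPIs are unhealthy at the same time. The sketch below is an assumption-laden illustration: the KPI names, the rule format, and `multi_kpi_alert` are hypothetical, and ITSI's own rule options are richer than this.

```python
def multi_kpi_alert(statuses, rules, min_matches):
    """Fire when at least `min_matches` KPIs are in one of their alerting statuses.

    Returns (fired, matched_kpis).
    """
    matched = [kpi for kpi, bad in rules.items() if statuses.get(kpi) in bad]
    return len(matched) >= min_matches, matched


# Hypothetical current statuses and alerting conditions.
statuses = {
    "db_errors": "red",
    "db_response_time": "orange",
    "web_error_rate": "green",
}
rules = {
    "db_errors": {"red"},
    "db_response_time": {"orange", "red"},
    "web_error_rate": {"red"},
}

fired, matched = multi_kpi_alert(statuses, rules, min_matches=2)
print(fired, matched)
```

Requiring more than one KPI to be unhealthy is one way such an alert can cut noise: a single brief blip does not create a Notable Event on its own.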
Service Analyzer
● Click on Service Analyzer > Default Service Analyzer
● Back where we started!
● This view shows a “no-frills” list of services (top) and hottest KPIs (bottom)
● Provides a quick jumping off point into Deep Dives and the Notable Events Review
● It is useful for NOCs and others who need a high-level situational view
Review
● High-value services can be decomposed and modeled in ITSI, using machine data from the relevant systems
● Services and KPIs can be created in minutes, with sophisticated thresholding techniques to distinguish “normal” from “not normal”
● Glass Tables allow service health and KPI metrics to be displayed in a way that makes sense to specific groups, such as Executive Leadership, Business Service Owners, the NOC, DevOps & Others
● Deep Dives allow KPIs to be compared side-by-side across any time range, accelerating root cause analysis and significantly reducing MTTR
● Multi-KPI Alerts and Notable Events reduce alert noise, producing actionable events and a means to manage them
● …and it’s fun to build!
PLAYTIME IS OVER! Everyone out of the sandbox!
NOT! You can have your very own 15-day free eval sandbox, to continue playing:
● http://splunk.com/ITSI Then select: [image not reproduced]
And a Guidebook to help you explore ITSI’s capabilities:
● https://splunk.box.com/ITSI-Sandbox-Guidebook
SEPT 26-29, 2016
WALT DISNEY WORLD, ORLANDO
SWAN AND DOLPHIN RESORTS
• 5000+ IT & Business Professionals
• 3 days of technical content
• 165+ sessions
• 80+ Customer Speakers
• 35+ Apps in Splunk Apps Showcase
• 75+ Technology Partners
• 1:1 networking: Ask The Experts and Security Experts, Birds of a Feather and Chalk Talks
• NEW hands-on labs!
• Expanded show floor, Dashboards Control Room & Clinic, and MORE!
The 7th Annual Splunk Worldwide Users’ Conference
PLUS Splunk University
• Three days: Sept 24-26, 2016
• Get Splunk Certified for FREE!
• Get CPE credits for CISSP, CAP, SSCP
• Save thousands on Splunk education!