Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Copyright©2016Splunk Inc.
RandylLongmireSeniorOperationsEngineer,SurescriptsLLC
FindingStrawinaHayFieldTheArtofDevOpsLogFarming
Disclaimer
2
Duringthecourseofthispresentation,wemaymakeforwardlookingstatementsregardingfutureeventsortheexpectedperformanceofthecompany.Wecautionyouthatsuchstatementsreflectourcurrentexpectationsandestimatesbasedonfactorscurrentlyknowntousandthatactualeventsorresultscoulddiffermaterially.Forimportantfactorsthatmaycauseactualresultstodifferfromthose
containedinourforward-lookingstatements,pleasereviewourfilingswiththeSEC.Theforward-lookingstatementsmadeinthethispresentationarebeingmadeasofthetimeanddateofitslivepresentation.Ifreviewedafteritslivepresentation,thispresentationmaynotcontaincurrentoraccurateinformation.Wedonotassumeanyobligationtoupdateanyforwardlookingstatementswemaymake.Inaddition,anyinformationaboutourroadmapoutlinesourgeneralproductdirectionandissubjecttochangeatanytimewithoutnotice.Itisforinformationalpurposesonlyandshallnot,beincorporatedintoanycontractorothercommitment.Splunkundertakesnoobligationeithertodevelopthefeaturesor
functionalitydescribedortoincludeanysuchfeatureorfunctionalityinafuturerelease.
Agenda
3
IntroductionsWhereintheDevOpscyclethissessionisfocusedTurningthe‘hayfield’oflogentriesintoavaluableresourceusingSplunksoftwareQueries,transactions,alerts,andautomationSummaryWhat’snext?
Whatarewedoinghere?
WhoisSurescripts?SurescriptsisHowHealthcareGetsConnected.
Anationwidehealthinformationnetworksecurelyconnectingdoctors’offices,hospitals,pharmacists,andhealthplansthroughanintegratedandtechnologyneutralplatform.
• Wepartnerwithmorethan700EHRapplicationsusedbyover900,000healthcareprofessionalsandmorethan1,000hospitals,impactingmorethan270millioninsuredlives.
• Weprocessmorethan6billiontransactionseachyear,includingnearly700millionmedicationhistories,morethan1billione-prescriptionsandnearly10millionclinicalmessages.
WhoisRandylLongmire?
• SeniorOperationsEngineer(10yrs.)atSurescripts• BornandraisedintheNorthwestUnitedStates• 14yearsinHealthcareTechnology• 20+yearsinComputerSupport,SystemsandOperations
The‘ProblemResolver’
TheScopeofthisSession
6
ServerandsoftwaredeploymentsMonitoringofserverandapplicationhealthTroubleshootingandproblemresolution– System/OSErrors– ApplicationandDatabaseErrors– HTTP/SMTPCommunicationFailures
WhichelementsoftheDevOpscyclearewefocusingon?
7
WearealertedtoanHTTP500erroronasinglesiteUsingatexteditororlogparser,wemanuallysearchforanythingthatlookslikeanerroraroundthetimethatitwasreportedWethenmanuallycorrelatethiserrorwithlogsfromrelatedsystemsaroundthesametimestamp
Scenario1:ErrorTroubleshootingTroubleshootingtheoldway
8
Theanatomyofthehaystack
9
Modernmethodsoffindinganeedleinthehaystack
10
Typicalmethodoffindinganeedleinthehaystack
11
Fromahaystacktoahayfield
SurescriptsHostedWebApps• 1800servers(haystacks)• 68millionlogentriesdaily
FindingStrawinaHayFieldWithSplunk Software
12
Queryscopecanrangefromveryfocusedtoverygeneral
Querytofindthestring“TimeoutException”inasinglelogtypeonasingleserver:
Querytofindthestring“TimeoutException”inanylogonanyserverwithinthe‘webApps’index:
UsingTimechart toGraphtheHistory
13
Querytofindthestring“TimeoutException”andvisualizethefrequencythroughtime:
Usingthesamequery,wecannowlookintothepast
Scenario2:Alerts
14
NowthatwehavethepowerofSplunk queriesavailable,wecanusethemtocreateproactivealerts.WhenthesameTimeoutExceptionerroroccurs,wecannowbealertedtoitimmediatelyaswellastriggerotheractions.
UsingAlertsforproactivemonitoring
15
UsingAlertsforErrorDetection
Step1:DefinetheQuery
16
UsingAlertsforErrorDetection
Step2:SettheAlertTypeandTriggerConditionScheduledorReal-TimeTriggerbasedonresultcounts
17
UsingAlertsforErrorDetection
Step3:SpecifyTriggerActionsLogeventsSendanEmailRunascriptOpenaServiceNow incidentPOSTtoawebhook URLetc
Scenario3–Automation
18
TheProblem:– Partnerapplicationhasversiondependencieswithourapplication– Whenonesideupgrades,theconnectivityisbrokenuntiltheotherside
upgradesSolutionbeforeSplunk– Supportticketisopenedrequestinganupgradeonorafteracertaindate– ConnectivitywouldbebrokenforanywherefromhourstodaysSolutionwithSplunk– PowerShellscriptrunsqueryagainstSplunkAPIevery30minutes– Whenanunsupportedversionerrorisdetectedinthelogsfromanyofthe1800
servers,theupgradeforthatserverisqueuedautomatically
UsingPowerShellwiththeSplunkRESTAPI
19
#region Variables# Splunk Server Address
[string]$SplunkServer = "https://splunk.example.com:8089"# Splunk API Username
[string]$SplunkAPIUser = "splunkAPI"# Limit the number of results
[int]$resultLimit = 1000# Splunk Search String
[string]$SearchString = "search index=""webapps"" source=""*webApp.log"" ""unsupported ver*"" | stats count as ErrorCount by host | head $resultLimit"# Value for time frame from now to search
[string]$incrementValue = "-5m" # -5m,-5h,-5d, etc# Seconds to wait for the job to complete
[int]$timeLimit = 60 # Generate hashtable of body contents used to perform the search
$RestBody = @{search=$SearchStringoutput_mode="json"earliest_time="$incrementValue"}
#endregion Variables
Step0:DeclarethesearchqueryandparametersUsingtheRESTAPIwithPowerShell
20
# Get a session Key from Splunk API#region GetSessionKey
$object = @{"username" = $SplunkAPIUser"password" = $(GetSecret $SplunkAPIUser)}try {
$token = Invoke-RestMethod -Uri "$SplunkServer/services/auth/login/" -Body$object -Method Post}catch {
log "Error getting Splunk login token. $($_.Exception)"exit
}$header = @{"Authorization" = "Splunk $($token.response.sessionKey)"}
#endregion GetSessionKey
Step1:GetanAPISessionKeyUsingtheRESTAPIwithPowerShell
21
# Submit the search job#region SubmitJob
try { $JobID = (Invoke-RestMethod -Method Post -Uri
"$SplunkServer/services/search/jobs/" -Headers $header -Body$restBody -ErrorAction Stop).sid}catch {
log "Error submitting Query to Splunk API. Please check your search parameters and try again."
log "Error Detail: $_"exit
}#endRegion SubmitJob
Step2:SubmitthesearchjobUsingtheRESTAPIwithPowerShell
22
$JobStatus = Invoke-RestMethod -Method Post -Uri "$SplunkServer/services/search/jobs/$jobID"-Headers $header -ErrorAction Stop# Wait for job to completeWhile (((($JobStatus.entry.content.dict.key | where {$_.name -eq "dispatchState"})."#text") -ne "DONE") -and (!($timeOut))){
If (((New-TimeSpan $startTime (Get-Date))).totalSeconds -gt $timeLimit) {log "Timeout exceeded!"# Delete the job and exittry {
$JobDelete = Invoke-RestMethod -Method DELETE -Uri"$SplunkServer/services/search/jobs/$jobID" -Headers $header -ErrorAction Stop
exit}catch {
exit}
}sleep -Seconds 1$JobStatus = Invoke-RestMethod -Method Post -Uri
"$SplunkServer/services/search/jobs/$jobID" -Headers $header -ErrorAction Stop}
Step3:WaitforthejobtocompleteUsingtheRESTAPIwithPowerShell
23
# Get search results from the job######################
try{$JobResults = Invoke-RestMethod -Method Get -Uri
"$SplunkServer/services/search/jobs/$jobID/results?output_mode=json&count=0" -Headers $header -ErrorAction Stop
log "$($JobResults.results.Count) Search Results received"}catch {
log "Could not obtain search results from Splunk API. $_"}
# Delete the jobtry {
$JobDelete = Invoke-RestMethod -Method DELETE -Uri"$SplunkServer/services/search/jobs/$jobID" -Headers $header -ErrorAction Stop
log "Job Deleted successfully"}catch {
log "Could not delete Splunk job ($JobID). $_"}
}
Step4:GettheresultsUsingtheRESTAPIwithPowerShell
24
# Process the search results# We could also limit this list by only returning results with an ErrorCount over a specified number[array]$results = $JobResults.results.host
ForEach($hostname in $results) {# Do some automation based on the results, in our case queue the
server for an upgrade.}
Step5:ProcesstheresultsUsingtheRESTAPIwithPowerShell
BonusScenario– LogReadability
25
UsingtransactionstocreatereadableSMTPlogs
TheProblem:– CustomerreportsanSMTPmessagewassentbutneverdelivered
SolutionbeforeSplunk– SupportticketisopenedreportingSMTPdetailsformissingmessage– ManuallysearchingandparsingSMTPlogsforhours,iftheystillexist
SolutionwithSplunk– Usea‘transaction’querytodisplayallcommunicationthreadsfromsender’sIP
ReadingSMTPcommunicationbeforeSplunk
26
BeforeSplunk:
ConvertingSMTPlogsintoreadabletransactions
27
index=webApps sourcetype=iis c_ip="74.125.69.27"|transactionc_ip startswith=EHLOendswith=QUITmaxspan=4s
AfterSplunk:
Summary
28
BeforeSplunk– Logfilesweremoreofamanagementtaskthanausefultool– Themanualprocessoflogparsingwastediousandtime-consuming
WithSplunk– LogfilesareanempoweringresourceacrossallaspectsofDevOps– Queriescantargetabroadscopeorlaserfocusforerroridentificationand
troubleshooting– Alertsprovidepro-activemonitoringandautomation– Timecharts enablegraphingfordashboardsandhistoricaldata– TheAPIopensthepowerofSplunk tocountlessotherapplications
What’sNext?
29
BeCreative– Splunk’s applicationsexpandwithyourimagination
BeCollaborative– UsetheSplunk communitytools
BeAdventurous– Discovernewcommandsandmethodsandwaystheycanbeapplied
BeInspired– Adaptandtransformexistingsolutionsintonewandexcitingtools
THANKYOU