Advanced Management Technologies For Exchange 5.5
Greg Todd
Program Manager
NT Solutions Group
BMC Software, Inc.
Agenda
Current issues with problem diagnosis
  Application availability timeline
Theory of root cause analysis (RCA)
  Primer on RCA
How RCA can help you today
  Demos of RCA on Exchange 5.5
Systems management vision
  Management maturity curve
  The future of Exchange management
The Business Problem
Event automation #1 priority of IT executives
Problem diagnosis is a critical aspect that requires attention
  Wasted Time
    80% of down time spent diagnosing
    20% of time spent fixing
  Wasted Resources
    Diagnosis often a finger-pointing exercise
  Frustrated Users
    Users have no idea what to expect
Gartner, 1998
Application Availability Timeline
Phases: Monitoring, Analysis, Recovery, Evolution
PoF: Point of Failure
PoN: Point of Notification
PoD: Point of Diagnosis
PoR: Point of Recovery
PoP: Point of Postmortem
Application Availability Timeline
[Timeline figure: points PoF, PoN, PoD, PoR, and PoP along a time axis; phases Monitoring, Root Cause Analysis, Recovery, Evolution; span labeled "Application Violating Service Level"]
Application Availability Timeline
[Timeline figure: points PoF, PoN, PoD, PoR, and PoP along a time axis; phases Monitoring, Root Cause Analysis, Recovery, Evolution; span labeled "Application Violating Service Level" with the annotation "Significant Decrease"; further annotations "Diagnosis Time Reduced" and "Faster Service Restoration"]
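As a rough worked example of the timeline, the sketch below models one incident with timestamps in minutes and shows how a shorter diagnosis interval pulls the point of recovery forward. The class, its field names, the speedup factor, and every number are illustrative assumptions; only the 80/20 split between diagnosing and fixing echoes the figure cited earlier in the deck.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Incident:
    """Timestamps in minutes, measured from the point of failure."""
    pof: float  # point of failure
    pon: float  # point of notification
    pod: float  # point of diagnosis
    por: float  # point of recovery

    @property
    def diagnosis_time(self) -> float:
        return self.pod - self.pon

    @property
    def repair_time(self) -> float:
        return self.por - self.pod

    @property
    def downtime(self) -> float:
        return self.por - self.pof

    def with_faster_diagnosis(self, speedup: float) -> "Incident":
        """Return a copy whose diagnosis interval is divided by `speedup`.

        Notification delay and repair time are left unchanged.
        """
        new_pod = self.pon + self.diagnosis_time / speedup
        return replace(self, pod=new_pod, por=new_pod + self.repair_time)


# Hypothetical incident: 80 minutes diagnosing, 20 minutes fixing.
baseline = Incident(pof=0, pon=0, pod=80, por=100)
improved = baseline.with_faster_diagnosis(4)

print(baseline.downtime)  # 100
print(improved.downtime)  # 40.0
```

Under these assumed numbers, diagnosing four times faster cuts total downtime from 100 to 40 minutes even though the repair itself takes just as long.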
Benefits Of RCA
Based on well-established theories
Quicker problem resolution
  Problem isolation saves resources to address the real problem
  Symptom filtering allows administrator to ignore sympathetic events
Performs tests to find the root cause
  Far superior to rules-based approach
Key enabler to make systems self-sufficient
Provides impact analysis capability
RCA
Key concepts
Symptoms are problems to be investigated
Faults are the root causes of these symptoms
Tests are active tasks which gather information
RCA is a problem analysis methodology geared towards finding the real cause of a problem and preventing it from happening again.
Rules-Based Approach Vs. RCA
Rules-Based
  Symptom received
  Possible causes looked up in a fixed table of rules
  Set of possible causes presented to user
  Only suggested actions can be provided to user
Root Cause Analysis
  Symptom received
  Possible causes determined from a generic fault model
  Each cause is tested against suspects
  Actual root cause is presented to user after suspects are eliminated
  Specific actions can be provided to user
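The contrast between the two approaches can be sketched in a few lines. This is a minimal illustration, not the product's engine: the symptom, the suspects, the dictionary standing in for a fault model, and the test results are all hypothetical.

```python
from typing import Callable, Dict, List

# Rules-based: a fixed table maps a symptom straight to possible causes.
RULE_TABLE: Dict[str, List[str]] = {
    "mail not delivered": ["service stopped", "disk full", "link down"],
}


def rules_based(symptom: str) -> List[str]:
    """Return every cause the table lists; nothing is verified."""
    return list(RULE_TABLE.get(symptom, []))


# RCA: a fault model (reduced here to a dictionary) names the suspects,
# and each suspect is paired with a test that can confirm or eliminate it.
FAULT_MODEL: Dict[str, List[str]] = {
    "mail not delivered": ["service stopped", "disk full", "link down"],
}


def root_cause_analysis(
    symptom: str, tests: Dict[str, Callable[[], bool]]
) -> List[str]:
    """Run the test for each suspect and keep only the confirmed ones."""
    confirmed = []
    for suspect in FAULT_MODEL.get(symptom, []):
        test = tests.get(suspect)
        if test is not None and test():
            confirmed.append(suspect)
    return confirmed


# Hypothetical test outcomes for one incident.
tests = {
    "service stopped": lambda: False,
    "disk full": lambda: True,
    "link down": lambda: False,
}

print(rules_based("mail not delivered"))
# ['service stopped', 'disk full', 'link down']
print(root_cause_analysis("mail not delivered", tests))
# ['disk full']
```

The rules-based call can only hand back the whole list of possibilities, while the RCA call narrows it to the suspect whose test passed, which is what allows a specific action to be offered.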
Root Cause Analysis
For Exchange Server
IP Network
Windows NT
Exchange Server
Three components that work synergistically
High Level RCA Architecture
Enterprise Console
Mid-Level Manager
Managed Nodes (three shown)
RCA Architecture
BMC PATROL Exchange Server and OS KMs
[Architecture diagram. Labeled components:
Managed Node (two shown): KM, KM, RTEP, ARB
Managed Node (one shown): Monitor, Other Custom, ARB
Mid-Level Manager: Mid-level agent
Enterprise Console
Other boxes: Diagnostic KM, RCA Engine, Javalink Bridge, Protocol Layer, Bridge
Legend: RTEP = Realtime Event Proxy; ARB = Agent Request Broker]
Root Cause Analysis
Sample problem
[Network diagram. Labeled components: Exchange Servers A, B, C, and D; Bridgehead Servers; Inbound Server and Outbound Server; Firewalls to the Internet; a Remote Office Exchange Server reached over a T1 link. Flows: inbound messages, outbound messages. Legend: Internal Mail; Internet Mail; Internal & Internet Mail]
PATROL RCA
Sample problem
Symptom received by model
  Queue Growth Alarms from multiple Exchange Servers
    Queue Growth on Server A
    Queue Growth on Server B
    Queue Growth on Server C
    Queue Growth on Server D
Suspected root causes found in model
  CPU Usage High
  Memory Bottlenecks
  MTA down on target machine
  Network Problem
PATROL RCA
Sample problem
Suspected root causes tested
  MTA down on target machine
  Memory Bottlenecks
  CPU Usage High
  Network Problem
Root cause isolated
  CPU usage high on bridgehead
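The sample problem reports queue growth on Servers A through D and isolates the cause on the bridgehead. One way to picture why four alarms point at one machine is to group the alarmed servers by the hop they share. The routing table below is an assumption made for illustration (the diagram's actual links are not recoverable from the slide text), and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# ASSUMED routing: each server hands its outbound mail to one upstream hop.
ROUTES_VIA: Dict[str, str] = {
    "Server A": "Bridgehead",
    "Server B": "Bridgehead",
    "Server C": "Bridgehead",
    "Server D": "Bridgehead",
}


def group_by_upstream(alarmed: List[str]) -> Dict[str, List[str]]:
    """Group alarmed servers by the upstream hop they have in common.

    A server with no known upstream hop is grouped under its own name.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for server in alarmed:
        groups[ROUTES_VIA.get(server, server)].append(server)
    return dict(groups)


alarms = ["Server A", "Server B", "Server C", "Server D"]
for upstream, servers in group_by_upstream(alarms).items():
    print(f"investigate {upstream}: would account for queue growth on {', '.join(servers)}")
```

With this assumed routing, the four queue growth alarms collapse into a single place to run tests, and the individual alarms can be treated as sympathetic events.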
Demo
Simple RCA Scenario
Sample Generic Fault Model
Sample Specific Fault Model
Sample Specific Fault Model
Close-up
Demo
RCA Engine
Causal Directed Graphs
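The graphs shown in the demo are not part of this transcript, so the following is only a generic sketch of the idea behind a causal directed graph: edges run from cause to effect, and walking them backwards from a symptom yields the candidate root causes. All node names, including the intermediate "mta slow on bridgehead" node, are illustrative assumptions.

```python
from typing import Dict, List, Set

# Edges point from cause to effect. Node names are illustrative only.
CAUSES: Dict[str, List[str]] = {
    "cpu usage high on bridgehead": ["mta slow on bridgehead"],
    "mta slow on bridgehead": ["queue growth on server a", "queue growth on server b"],
    "network problem": ["queue growth on server a"],
}


def candidate_root_causes(symptom: str) -> Set[str]:
    """Walk the graph backwards from a symptom to nodes that nothing else causes."""
    # Invert the edges so each effect lists its direct causes.
    parents: Dict[str, List[str]] = {}
    for cause, effects in CAUSES.items():
        for effect in effects:
            parents.setdefault(effect, []).append(cause)

    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [symptom]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        direct = parents.get(node, [])
        if not direct and node != symptom:
            roots.add(node)  # no incoming edges: a candidate root cause
        stack.extend(direct)
    return roots


print(sorted(candidate_root_causes("queue growth on server a")))
# ['cpu usage high on bridgehead', 'network problem']
print(sorted(candidate_root_causes("queue growth on server b")))
# ['cpu usage high on bridgehead']
```

The candidates returned here are still only suspects; tests would then be run against each one to eliminate those that are not actually present.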
Demo
Root Cause Analysis
Exchange, NT, IP Network
Demo
Impact Analysis
Exchange, NT, IP Network
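Impact analysis asks the reverse question of root cause analysis: given a component that has failed or is about to change, what depends on it? A minimal sketch is a breadth-first walk over a dependency map. The map below is hypothetical; it loosely follows the three layers named on the slide (IP network, NT, Exchange) but is not taken from the demo.

```python
from collections import deque
from typing import Dict, List

# HYPOTHETICAL map from a component to the components that depend on it.
DEPENDENTS: Dict[str, List[str]] = {
    "ip network": ["windows nt host"],
    "windows nt host": ["exchange mta", "exchange information store"],
    "exchange mta": ["outbound internet mail"],
    "exchange information store": ["user mailboxes"],
}


def impacted_by(component: str) -> List[str]:
    """Breadth-first walk listing everything downstream of a component."""
    impacted: List[str] = []
    seen = {component}
    queue = deque([component])
    while queue:
        for dependent in DEPENDENTS.get(queue.popleft(), []):
            if dependent not in seen:
                seen.add(dependent)
                impacted.append(dependent)
                queue.append(dependent)
    return impacted


print(impacted_by("windows nt host"))
# ['exchange mta', 'exchange information store', 'outbound internet mail', 'user mailboxes']
```

Running the same walk before a planned change gives a list of services to warn about in advance.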
Benefits Of RCA
Based on well-researched theories
Quicker problem resolution
  Problem isolation saves resources to address the real problem
  Symptom filtering allows administrator to ignore sympathetic events
Performs tests to find the root cause
  Far superior to rules-based approach
Key enabler to make systems self-sufficient
Provides impact analysis capability
Systems Management Vision
Where’s all this stuff going?
MONITOR
MANAGE
CONTROL
STABILIZE
VIRTUALIZE
Phases Of Management Maturity
Applies directly to management of complex software systems
Based on commonly known process control theory
Maturity Phases
MONITOR
Monitoring is plumbing
  Included with Windows 2000 and Exchange 2000
Server-centric data and event collection
  Monitors component and system data
  No awareness of other systems or apps
Basic alerting, scripting, and actions
  WMI, PerfMon, HealthMon, Exchange 2000 monitoring
Maturity Phases
MANAGE
Application-specific and server-centric
  View and take action on components
  Availability and performance monitoring
  Rich reporting
Application SLA definition
  ASAP resolution when out of compliance
  Most correlation done in your head
Some tools have reached this level
  Key enabler to Control phase
Maturity Phases
CONTROL
Places system automation in control
  Provides holistic view of systems
  Enables high level of SLA compliance
Quick problem diagnosis
  Action <--> Reaction
  Proactive correction before users feel impact
Management automation maturing
Maturity Phases
STABILIZE
Provides utility-level service
  Reliable as electric, telephone, water
  Assures continuous application service
Clusters
  Built-in fault tolerance, re-routing, workload management
  Failure does not impact service
Prediction / impact analysis
  Awareness of impact on SLAs caused by planned changes
Maturity Phases
VIRTUALIZE
The system learns how to intelligently deal with various issues
Automatic everything
  Actions and responses for the IT group
  Alerts and communications
  Acquires and stores knowledge for future reference
  Uses policy engines to control actions
Systems become truly self-sufficient
  User becomes self-serviced
Virtualization Example
Problem Research Assistant
Correlates problem root cause diagnoses with:
  Previous resolutions - presents the user with previous remedies based on exact matches or best guess
  On-line technical documentation - integrates with vendor-supplied support documentation (e.g. Microsoft Knowledge Base articles)
  Technical Support Request Generator - formats required user information and diagnosed fault into a support request, according to vendor-specific templates
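Two of the behaviours listed above lend themselves to a short sketch: looking up a previous resolution by exact match or best guess, and filling a support request template. This is an assumed design using only the Python standard library, not the assistant's real implementation; the history entries, remedies, template fields, and contact address are all made up for illustration.

```python
import difflib
from string import Template
from typing import Dict, Optional, Tuple

# HYPOTHETICAL history of diagnosed faults and the remedy that resolved each.
HISTORY: Dict[str, str] = {
    "cpu usage high on bridgehead": "move connector load to a second bridgehead",
    "mta down on target machine": "restart the MTA service",
}


def previous_resolution(fault: str) -> Optional[Tuple[str, str]]:
    """Return (match kind, remedy): exact match first, else the closest fault text."""
    if fault in HISTORY:
        return "exact", HISTORY[fault]
    close = difflib.get_close_matches(fault, list(HISTORY), n=1, cutoff=0.6)
    if close:
        return "best guess", HISTORY[close[0]]
    return None


# HYPOTHETICAL template; real ones would follow each vendor's own format.
REQUEST_TEMPLATE = Template(
    "Contact: $contact\nProduct: $product\nDiagnosed fault: $fault\n"
)


def support_request(contact: str, product: str, fault: str) -> str:
    """Format user information and the diagnosed fault into a request."""
    return REQUEST_TEMPLATE.substitute(contact=contact, product=product, fault=fault)


print(previous_resolution("mta down on target machine"))
print(previous_resolution("cpu usage high on bridgehead server"))
print(support_request("admin@example.com", "Exchange 5.5", "mta down on target machine"))
```

The second lookup has no exact entry in the history, so it falls back to the closest stored fault description and labels the result a best guess.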
Virtualization Example
Problem Research Assistant
[Diagram. Labeled components: RCA Server; Bridge; Correlation Backend; Domain Model (three shown); IP Reachability Analyzer; Diagnosed Faults; Problem Research Assistant; Problem Response History Repository; Previous Resolutions; Online Technical Articles; Help; Support Requests]
RCA Takes Management To The Next Level
MONITOR
MANAGE
CONTROL
STABILIZE
VIRTUALIZE
Many Players Many Choices
Root Cause Analysis
Summary
GOAL: No interruptions in service
RCA is key to Exchange availability
  Accelerates the diagnosis process
  Can assess impact of failures beforehand
  Not unreasonable to achieve “five 9’s”
RCA paves the way to virtualization
  Managed systems that learn and adapt
  You never have to intervene
  Free to invest more time in pro-activity
RCA is in beta now!!
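To put the "five 9's" target in concrete terms, the short calculation below converts an availability percentage into a yearly downtime budget. It assumes a 365-day year and counts all minutes as service time; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes/year")
```

At 99.999% the budget works out to roughly 5.26 minutes per year, which leaves very little room for a diagnosis phase that takes up most of an outage.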
Call To Action
Demand sophistication and simplicity in Exchange management solutions
  Solutions that learn
  Solutions that are easy to use
Start thinking of Exchange availability in terms of utility-level service
Consider where to implement RCA in your current environment
Bring along those whom you service
  Take care of your users
  Communicate with them as you progress