Sara Moggi ([email protected])
IBM Tivoli Agentless Monitoring Overview
© 2010 IBM Corporation
Agenda
Overview
Installation
Configuration
Workspaces Reference
Troubleshooting
– Log Files
– Problem determination
Known Problems/APARS
Q&A
Agent and Agentless technology
Agent-based technology resides directly on a managed server
Agentless technology resides primarily on a management server and gets its data via a remote application programming interface (API)
Agent and Agentless technology
The IBM Tivoli Monitoring product uses agent and agentless technology.
– Agent Technology:
– Database agents
– Operating system agents, etc.
– Agentless Technology:
– Customer Agent or Agentless Solution with Universal Agent/Agent Builder
– ITM for Virtual Servers
– ITM for Applications (SAP)
– Operating Systems (starting from ITM version 6.2.1)
ITM Agentless
IBM Tivoli Monitoring Agentless provides a way to monitor the availability and performance of all the systems in your enterprise from one or several designated workstations. The following monitors are available (product code in parentheses):
– Agentless Monitoring for Windows Operating Systems (r2)
– Agentless Monitoring for AIX Operating Systems (r3)
– Agentless Monitoring for Linux Operating Systems (r4)
– Agentless Monitoring for HP-UX Operating Systems (r5)
– Agentless Monitoring for Solaris Operating Systems (r6)
Key Features (1/2)
An IT environment can be monitored from a small set of workstations.
There is no need to install an agent on each system to be monitored.
Agentless monitoring provides a more flexible monitoring solution.
Key Features (2/2)
Tivoli Monitoring Agentless can monitor multiple operating system nodes that do not have standard OS agents running on them.
An agentless monitor obtains data from the monitored nodes via:
– SNMP (Simple Network Management Protocol)
– CIM (Common Information Model)
– WMI (Windows Management Instrumentation)
Agentless Monitoring Installation: Supported Platforms
– AIX 5.3 (32/64 bit) or AIX 6.1 (64 bit)
– Solaris 9 or higher
– HP-UX 11i or higher
– Windows:
– Windows 2003 Server SE (32/64 bit)
– Windows Server 2003 Datacenter Edition
– Windows Vista Enterprise, Business and Ultimate (32/64 bit)
– Windows Server 2008 SE (32/64 bit)
– Windows Server 2008 EE (32/64 bit)
– Windows Server 2008 Data Center
– Windows Server 2008 Data Center (64 bit)
– Linux
– Red Hat Enterprise Linux 4 or higher
– SUSE Linux Enterprise Server 9 or higher
Agentless Monitoring Installation
When you install from the Agent DVD on a Windows system, select the Agentless features in the installer's feature selection screen.
Agentless Monitoring Installation
On a UNIX system, run install.sh and select the agentless monitors from the product menu.
Agentless Monitoring Configuration
We can perform agentless configuration by using different sources:
– Manage Tivoli Enterprise Monitoring Services (MTEMS)
– itmcmd command
– tacmd command
– TEP GUI: right-click the agentless icon, then click the Configure option
Agentless Monitoring Configuration
The Agentless monitor can be configured to collect all the data from the monitored system using SNMP (SNMP Version 1, SNMP Version 2c or SNMP Version 3).
The Agentless for Solaris provides an additional possibility: CIM
The Agentless for Windows provides an additional possibility: WMI
Agentless Monitoring Configuration
For each agentless monitor, a configuration window opens as soon as you start the configuration.
Agentless Monitoring Configuration
Agentless Monitoring for AIX
Agentless Monitoring Configuration
Agentless Monitoring for Solaris: CIM
Agentless Monitoring Configuration
Agentless Monitoring for Windows: WMI
Agentless Monitoring Configuration
Relationship between Managed System Details and the TEP GUI tree
Workspaces Reference
Agentless Linux OS
– contains agent instance level workspaces.
SNMP Linux Systems: LNX subnode
– Each node is an individual server.
Workspaces Reference
Agentless Navigator Item
– lists the collection status of the managed systems, and lists which systems are being monitored
– Agentless for Windows: two views that list the Windows systems that are monitored through the SNMP and the WMI subnode
– Agentless for Solaris: three views that list the Solaris systems that are monitored through the SNMP subnode (Sun Management Center or System Management Agent) and the CIM subnode.
Workspaces Reference
Metrics collected by the Agentless Monitoring:
– Disk utilization
– Physical and Virtual Memory
– Network Interface
– Processes running
– Processor capacity of the system
– System level
Log Files
RAS1 Logs
– Windows:
%CANDLE_HOME%\TMAITM6\logs\<hostname>_<pc>_k<pc>agent_<instance>_<timestamp>-nn.log
– Unix/Linux:
$CANDLE_HOME/logs/<hostname>_<pc>_<instance>_<timestamp>-nn.log
– where:
– pc is the product code of the specific Agentless monitoring
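The naming patterns above can be turned into a search pattern for locating the logs of one instance. The following is a minimal sketch; the function name is hypothetical, and it assumes that the timestamp can be matched with a wildcard and that "nn" is a two-character sequence number.

```python
from pathlib import PurePosixPath, PureWindowsPath


def ras1_log_pattern(candle_home, hostname, pc, instance, windows=False):
    """Return a glob pattern for the RAS1 logs of one agentless instance.

    Assumption: '*' stands in for <timestamp> and '??' for the 'nn' suffix.
    """
    if windows:
        # %CANDLE_HOME%\TMAITM6\logs\<hostname>_<pc>_k<pc>agent_<instance>_<timestamp>-nn.log
        name = f"{hostname}_{pc}_k{pc}agent_{instance}_*-??.log"
        return str(PureWindowsPath(candle_home) / "TMAITM6" / "logs" / name)
    # $CANDLE_HOME/logs/<hostname>_<pc>_<instance>_<timestamp>-nn.log
    name = f"{hostname}_{pc}_{instance}_*-??.log"
    return str(PurePosixPath(candle_home) / "logs" / name)
```

On the machine that hosts the agentless monitor, the returned string can be passed to `glob.glob` to list the matching files.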
Trace Levels
Startup/Initialization problems
– ERROR (UNIT:query ALL) Windows
– ERROR (UNIT:ct_main ALL) Unix/Linux
WMI Data Provider
– ERROR (UNIT:WMI ALL)
Perfmon Data Provider
– ERROR (UNIT:QueryClass ALL)
SNMP Data Provider
– ERROR (UNIT:SNMP ALL)
Windows Event Log Data Provider
– ERROR (UNIT:EventLog ALL) (UNIT:WinLog ALL)
CIM-XML Data Provider
– ERROR (UNIT:CIM ALL)
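When more than one data provider has to be traced at once, the trace strings listed above can be merged into a single value. The sketch below is illustrative only: the dictionary keys are hypothetical labels, and it assumes the value is assigned to the KBB_RAS1 variable in the agent's environment file.

```python
# Trace strings from the list above; the keys are illustrative labels only.
TRACE_LEVELS = {
    "startup_windows": "ERROR (UNIT:query ALL)",
    "startup_unix": "ERROR (UNIT:ct_main ALL)",
    "wmi": "ERROR (UNIT:WMI ALL)",
    "perfmon": "ERROR (UNIT:QueryClass ALL)",
    "snmp": "ERROR (UNIT:SNMP ALL)",
    "eventlog": "ERROR (UNIT:EventLog ALL) (UNIT:WinLog ALL)",
    "cim": "ERROR (UNIT:CIM ALL)",
}


def combine_traces(*labels):
    """Merge trace strings into one 'ERROR (UNIT:a ALL) (UNIT:b ALL)' value."""
    units = []
    for label in labels:
        # Drop the leading 'ERROR ' and keep the '(UNIT:...)' clauses.
        clauses = TRACE_LEVELS[label][len("ERROR "):]
        for clause in clauses.replace(") (", ")|(").split("|"):
            if clause not in units:
                units.append(clause)
    return "ERROR " + " ".join(units)


def kbb_ras1_line(*labels):
    """Assumption: the trace value is set through the KBB_RAS1 variable."""
    return "KBB_RAS1=" + combine_traces(*labels)
```

For example, `kbb_ras1_line("snmp", "cim")` produces one line that enables both the SNMP and the CIM-XML provider traces.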
Problem Scenario 1
Starting the agentless monitor with the itmcmd command returns the following error message:
KCIIN0201E Specified product is not configured.
Start the agent with the -o option, which specifies the instance name:
itmcmd agent -o SNMP start r4
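For scripted starts and stops, the command can be assembled so that the instance is never forgotten. This is a small sketch with a hypothetical helper name; it only builds the argument list and does not run anything.

```python
def itmcmd_agent_command(action, pc, instance=None):
    """Build the argument list for 'itmcmd agent'.

    action   -- 'start' or 'stop'
    pc       -- product code of the agentless monitor, e.g. 'r4'
    instance -- instance name passed with -o (needed for multi-instance agents)
    """
    argv = ["itmcmd", "agent"]
    if instance is not None:
        argv += ["-o", instance]
    argv += [action, pc]
    return argv
```

Calling `itmcmd_agent_command("start", "r4", instance="SNMP")` yields the arguments of the command shown above; the list can be handed to `subprocess.run` on the system where the monitor is installed.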
Problem Scenario 2
Agentless Monitoring configured to use SNMP
Blank workspaces
The agent log contains error messages.
Check whether the snmpd process is running on the monitored system.
Problem Scenario 3
Agentless monitoring for AIX
– Agentless is configured to collect the data using SNMP data provider
– The only blank workspaces are Disk and Process
By default on AIX 5.x and 6.x, the MIBs managed by the aixmibd daemon are excluded from the default view.
Modify the /etc/snmpdv3.conf file to comment out the line that excludes access:
# exclude aixmibd managed MIBs from the default view
#VACM_VIEW defaultView 1.3.6.1.4.1.2.6.191 - excluded -
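The edit can be scripted. The sketch below works on the file content as text and comments out only the VACM_VIEW line that excludes the aixmibd subtree; the function name is hypothetical, and reading and writing /etc/snmpdv3.conf (with a backup) is left to the caller. The SNMP daemon typically has to be restarted or refreshed before the change takes effect.

```python
AIXMIBD_OID = "1.3.6.1.4.1.2.6.191"  # subtree named in the exclusion line


def comment_out_aixmibd_exclusion(conf_text):
    """Return conf_text with the aixmibd exclusion line commented out."""
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        is_exclusion = (
            len(fields) >= 3
            and fields[0] == "VACM_VIEW"   # already-commented lines start with '#'
            and fields[2] == AIXMIBD_OID
            and "excluded" in line
        )
        out.append("#" + line if is_exclusion else line)
    return "\n".join(out) + "\n"
```

Lines that are already commented out are left alone, so running the function twice gives the same result as running it once.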
Problem Scenario 4 (1/2)
Agentless monitoring for Linux or Solaris
– Agentless is configured to collect the data using SNMP data provider
– All the workspaces are blank
– The agentless trace logs contain error messages
Problem Scenario 4 (2/2)
Check connectivity with the monitored system:
– Make sure you can ping the remote system
– Check for firewalls blocking communication on the SNMP port (UDP 161)
– Check the community string and passwords specified in the Agentless configuration
– Check that the SNMP agent is not restricting access to localhost (see the snmpd.conf file)
– Run the following command to check connectivity with the SNMP agent:
– snmpwalk -c public -v 1 <hostIP>
– Check that the MIB branches are not restricted (see the snmpd.conf file)
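The snmpwalk check from the list above can be wrapped in a small script when many hosts have to be verified. This is a sketch under stated assumptions: the helper names are hypothetical, the net-snmp `snmpwalk` tool must be on the PATH of the machine running the agentless monitor, and the walk is limited to the standard system group (1.3.6.1.2.1.1) to keep it short.

```python
import subprocess


def snmpwalk_command(host, community="public", version="1", oid=None):
    """Build the snmpwalk argument list used in the connectivity check."""
    argv = ["snmpwalk", "-c", community, "-v", version, host]
    if oid:
        argv.append(oid)
    return argv


def snmp_reachable(host, community="public", version="1", timeout=10):
    """Return True if snmpwalk of the system group exits successfully."""
    try:
        result = subprocess.run(
            snmpwalk_command(host, community, version, "1.3.6.1.2.1.1"),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # snmpwalk is not installed, or the host did not answer in time
        return False
    return result.returncode == 0
```

A False result does not say which item in the checklist failed; it only indicates that the remaining checks (firewall, community string, snmpd.conf restrictions) are worth going through.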
Problem Scenario 5 (1/2)
Agentless Monitoring for AIX
– Agentless is configured to collect the data using SNMP data provider
– Some workspaces are blank
On AIX, the SNMP service consists of four processes, each serving different workspaces:
– snmpd: System workspaces
– aixmibd: Disk and File System Capacity, Volume Group, Logical Volume, Physical Volume, Page System, Process Availability, and User Account Information workspaces
– hostmibd: Memory and Processor workspaces
– snmpmibd: Network workspace
Problem Scenario 5 (2/2)
Check that all four SNMP processes are running.
If the community string is not public, verify that the three SNMP processes aixmibd, hostmibd and snmpmibd are started with the -c <community> command line option.
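Both checks can be run against a process listing captured on the AIX host. The following sketch assumes `ps -ef` style output passed in as text; the function name and the abbreviated workspace labels are illustrative, not part of the product.

```python
# Daemon -> workspaces it feeds (abbreviated from the list in this scenario).
AIX_SNMP_DAEMONS = {
    "snmpd": "System",
    "aixmibd": "Disk, File System, Volume, Paging, Process, User Account",
    "hostmibd": "Memory, Processor",
    "snmpmibd": "Network",
}
SUBAGENTS = ("aixmibd", "hostmibd", "snmpmibd")


def check_aix_snmp(ps_output, community="public"):
    """Return a list of findings for 'ps -ef' style output from the AIX host."""
    running = {}
    for line in ps_output.splitlines():
        fields = line.split()
        for i, field in enumerate(fields):
            name = field.rsplit("/", 1)[-1]  # strip any directory prefix
            if name in AIX_SNMP_DAEMONS:
                running[name] = fields[i + 1:]  # remaining command-line arguments
                break

    findings = []
    for name, workspaces in AIX_SNMP_DAEMONS.items():
        if name not in running:
            findings.append(f"{name} is not running (affects: {workspaces})")
        elif community != "public" and name in SUBAGENTS:
            args = running[name]
            has_community = any(
                arg == "-c" and args[j + 1:j + 2] == [community]
                for j, arg in enumerate(args)
            )
            if not has_community:
                findings.append(f"{name} was not started with -c {community}")
    return findings
```

An empty list means all four processes were found and, for a non-public community, the three sub-agents carry the expected -c option. The listing should not include the `grep` process itself, since its arguments would be mistaken for a daemon.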
Problem Scenario 6
Agentless Monitoring for Linux
– Agentless is configured to collect the data using SNMP data provider
– In the TEP GUI, data is shown only for the Network and System workspaces
For Red Hat operating systems, /etc/snmp/snmpd.conf must be modified to allow the Host Resources MIB and the ucdavis MIB to be viewed by all users.
Add the following system views to the SNMP configuration:
– view systemview included .1.3.6.1.2.1.25
– view systemview included .1.3.6.1.4.1.2021
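Adding the two views can be automated without duplicating lines that are already present. The sketch below operates on the configuration text only; the function name is hypothetical, and it assumes the view is named systemview as in the lines above. Writing the file back and restarting snmpd are left to the caller.

```python
REQUIRED_VIEWS = [
    "view systemview included .1.3.6.1.2.1.25",    # Host Resources MIB
    "view systemview included .1.3.6.1.4.1.2021",  # ucdavis MIB
]


def add_missing_views(conf_text):
    """Append the required view lines that are not yet in conf_text."""
    # Normalize whitespace so 'view   systemview ...' still counts as present.
    existing = {" ".join(line.split()) for line in conf_text.splitlines()}
    missing = [view for view in REQUIRED_VIEWS if view not in existing]
    if not missing:
        return conf_text
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + "\n".join(missing) + "\n"
```

Because existing lines are detected first, the function can be applied repeatedly to the same file without growing it.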
Problem Scenario 7
Agentless Monitoring for Windows
(48E294C2.00011D28:queryclass.cpp,803,"internalCollectData") Authentication failed against host <host> as user <Domain>\<User>, return code = 1326
(48E294C3.000016EC:wmiqueryclass.cpp,757,"internalCollectData") ::collectData==>Could not connect. Error code = 0x80070005, subnode = <name>
These errors indicate an invalid password, invalid username, or username without Administrators group membership.
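The two codes in these messages are standard Windows values: 1326 is ERROR_LOGON_FAILURE and 0x80070005 is E_ACCESSDENIED. A minimal sketch for pulling the code out of a trace line and translating it follows; the function name and the regular expression are illustrative and cover only the two message shapes shown above.

```python
import re

KNOWN_CODES = {
    1326: "ERROR_LOGON_FAILURE: unknown user name or bad password",
    0x80070005: "E_ACCESSDENIED: access denied (check Administrators group membership)",
}

# Matches 'return code = 1326' and 'Error code = 0x80070005'.
CODE_RE = re.compile(r"(?:return code|Error code)\s*=\s*(0x[0-9A-Fa-f]+|\d+)")


def explain(log_line):
    """Return a short explanation for the code in log_line, or None."""
    match = CODE_RE.search(log_line)
    if not match:
        return None
    text = match.group(1)
    code = int(text, 16) if text.lower().startswith("0x") else int(text)
    return KNOWN_CODES.get(code, f"unrecognized code {text}")
```

Running `explain` over each line of the agent log separates credential problems (1326) from permission problems (0x80070005).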
Problem Scenario 8
Agentless Monitoring for Windows
(4891C694.0066-1558:queryclass.cpp,1006,"start") Error adding query for class PhysicalDisk.
(4891C694.0067-1558:queryclass.cpp,1007,"start") \\<hostname box>\PhysicalDisk(*)\% Disk Write Time - add returned C0000BB8
Check that the counter exists.
Check that the Remote Registry service is enabled.
Check whether the counter indexes are corrupted, under HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Perflib\009.
Known Problems/APARS 1 (1/2)
IZ80454: CPU metrics for Solaris SMA should be per interval
Recreation Steps:
– Install the Agentless Monitoring for Solaris (KR6) product.
– Configure it to use "SNMP (System Management Agent)" or SMA.
– In Tivoli Enterprise Portal (TEP), click on the Processor workspace.
– In the "Overall CPU Utilization over Time" view, select to have data viewed in table format.
Symptom: for each polling interval the values of the following attributes increase over time:
– User CPU, System CPU, Nice CPU, Idle CPU
– Total CPU, CPU Used Pct, CPU Idle Pct
Known Problems/APARS 1 (2/2)
The following CPU metrics are collected via SNMP and the values are cumulative since the machine was started:
– User CPU, System CPU, Nice CPU, Idle CPU
These values are used in the calculations of the following attributes:
– Total CPU, CPU Used Pct, CPU Idle Pct
The CPU metrics were changed to represent the values since the last polling interval instead of a cumulative value since the machine was last rebooted.
Fixed in 6.2.2-TIV-ITM-FP0003
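The difference between cumulative and per-interval values can be illustrated with a short calculation. This is a sketch of the general technique, not the agent's actual code: it assumes Total CPU is the sum of the four counter deltas and that the percentages are derived from the idle share.

```python
def cpu_interval_metrics(previous, current):
    """Compute per-interval CPU metrics from two cumulative samples.

    previous/current: dicts with cumulative 'user', 'system', 'nice' and
    'idle' counters taken at consecutive polling intervals.
    """
    delta = {k: current[k] - previous[k] for k in ("user", "system", "nice", "idle")}
    total = sum(delta.values())
    if total <= 0 or min(delta.values()) < 0:
        return None  # no time elapsed, or counters were reset (e.g. reboot)
    idle_pct = 100.0 * delta["idle"] / total
    return {
        "total": total,
        "used_pct": 100.0 - idle_pct,
        "idle_pct": idle_pct,
        **delta,
    }
```

With cumulative counters the percentages drift toward the average since boot; with deltas they reflect only the load during the last polling interval.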
Known Problems/APARS 2
IZ71871: Add ability to monitor services that are down
The following functions have been added to the agent:
– a new query;
– a new workspace named Windows Services under the System navigator item;
– a new attribute group with several attributes that will provide information about the services. For example: Display Name, Description, Process ID, Status, State.
Fixed in 6.2.1-TIV-ITM-FP0002 and 6.2.2-TIV-ITM-FP0002
Known Problems/APARS 3 (1/2)
IZ77565: Perfmon data stops collecting when other systems are down
When one or more remote hosts go down, the calls to perfmon for other remote hosts (which are up) fail with a timeout error:
(4BCF113E.0000-400:queryclass.cpp,1001,"internalCollectData") Error collecting query data for class Terminal Services host <host>. Error is 00000102.
(4BCF113E.0001-744:queryclass.cpp,1001,"internalCollectData") Error collecting query data for class Terminal Services host <host> Error is 00000102.
Known Problems/APARS 3 (2/2)
The Microsoft API that the agent calls is single-threaded. When a system is down, the call to collect the perfmon data times out for that system, and all other requests queued at the time (for the same system or other systems) also time out.
The agent code was updated to ensure that its calls to the Microsoft API are also single-threaded.
Fixed in 6.2.1-TIV-ITM-FP0003
Known Problems/APARS - Missing Operator
Example: R2 Agentless running on a Linux system and monitoring a remote Windows system
<instance>:<hostname>:R2
– R2:<WindowsHost>:WIN
Known Problems/APARS - Missing Operator
Define a situation that monitors whether a process is MISSING on the monitored Windows system.
When the situation becomes true, the alert is raised on the <instance>:<hostname>:R2 node.
Enhancement Requests:
– MR1126096811
– MR0109086845
– MR0204095945
DCF http://www-01.ibm.com/support/docview.wss?uid=swg21420788
Known Problems/APARS - Missing Operator
Workaround: Agent Builder solution
An agent that remotely monitors one machine
An agent built with multi-instance support
An agent that monitors these metrics on Windows:
– Computer System including Model and Serial Number, Operating System
– Windows Event Log
– Disk Usage including Logical Disk, Physical Disk
– Processor
– Memory including Physical Memory, Page File Usage
– Network Interfaces
– Windows Terminal Services
Also available for AIX, HP-UX, Solaris and Linux platforms
Known Problems/APARS - Missing Operator
In the Agent package you can find the following scripts:
– installIra.bat/sh (installs all components on a single machine)
– installIraAgent.bat/sh (installs the Agent)
– installIraAgentTEMS.bat/sh (installs the TEMS application support)
– installIraAgentTEPS.bat/sh (installs the TEPS application support)
Agentless Monitoring Scale Information
Agentless Monitors are multi-instance agents
Support for up to 10 active instances on a single system
Each instance supports communication with 100 remote nodes
– 10 instances x 100 remote nodes = 1000 monitored systems
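The same arithmetic can be used in the other direction, to estimate how many instances a given number of remote nodes requires on one system. A minimal sketch, with a hypothetical function name and the two limits taken from the figures above:

```python
import math

MAX_INSTANCES = 10        # active instances supported on a single system
NODES_PER_INSTANCE = 100  # remote nodes supported per instance


def instances_needed(remote_nodes):
    """Return how many instances are needed for remote_nodes monitored systems."""
    needed = math.ceil(remote_nodes / NODES_PER_INSTANCE)
    if needed > MAX_INSTANCES:
        raise ValueError(
            f"{remote_nodes} nodes need {needed} instances; "
            f"the limit on one system is {MAX_INSTANCES}"
        )
    return needed
```

For example, 250 remote nodes need three instances, and anything above 1000 nodes has to be spread over more than one monitoring system.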
Performance Variables
Each entry gives the variable name, its default value, and a description.
– CDP_DP_CACHE_TTL (default 60): Time in seconds before a query will trigger a new data collection
– CDP_DP_THREAD_POOL_SIZE (default 60): The number of threads created to perform background data collections. The Thread Pool is shared among all attribute groups in all remote nodes in an agent.
– CDP_DP_REFRESH_INTERVAL (default 60): The interval in seconds at which each attribute group cache is updated in the background
– CDP_DP_IMPATIENT_COLLECTOR_TIMEOUT (default 2): The number of seconds to wait for a data collection to happen before timing out and returning cached data.
– CDP_SNMP_RESPONSE_TIMEOUT (default 2): The number of seconds to wait for each request to time out. Each row in an attribute group is a separate request
– CDP_SNMP_MAX_RETRIES (default 2): The number of times to retry sending the SNMP request after a response timeout
– CDP_NT_EVENT_LOG_GET_ALL_ENTRIES_FIRST_TIME (default NO): Configures whether or not the Windows Event Log data provider should report old log entries on startup, or only new ones
– CDP_NT_EVENT_LOG_CACHE_TIMEOUT (default 3600): Cache lifetime in seconds of an event from the Windows Event Log
– CDP_PURE_EVENT_CACHE_SIZE (default 100): Number of pure events held in cache at any one time. When a query is made, reports all events in the cache at that time. When cache is full, oldest events are removed to make room for new ones
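The SNMP timeout and retry settings interact with the number of rows in an attribute group. The sketch below estimates the worst-case wait for an unresponsive node. It rests on assumptions that the source does not confirm: that each row's request is sent 1 + CDP_SNMP_MAX_RETRIES times, that every attempt waits the full CDP_SNMP_RESPONSE_TIMEOUT, and that the rows are requested one after another.

```python
def worst_case_snmp_wait(rows, response_timeout=2, max_retries=2):
    """Estimate seconds spent on an attribute group when no request is answered.

    Assumptions (not confirmed by the product documentation):
    - one request per row, issued sequentially
    - each request is attempted 1 + max_retries times
    - every attempt waits the full response_timeout
    """
    attempts = 1 + max_retries
    return rows * attempts * response_timeout
```

With the defaults, a single row can take up to 6 seconds, and a 50-row attribute group on an unreachable node could take up to 300 seconds, which is well above the 60-second CDP_DP_REFRESH_INTERVAL. Under these assumptions, lowering the retry count or the timeout shortens the time a collection thread stays blocked.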
QUESTIONS