System Monitoring Steps

  • What is System Monitoring?

    System monitoring is a daily routine activity. This document provides a systematic, step-by-step procedure for server monitoring and gives an overview of the technical aspects and concepts of proactive system monitoring. The main checks are:

    Sno Task

    ABAP Stack Checks

    1 Check process overview (SM50)
    2 Check overall system process overview (SM66)
    3 Check application server status (SM51)
    4 Check for any pending locks (SM12)
    5 Check for dumps in the system (ST22)
    6 Check the system log for any errors (SM21)
    7 Check for any hung updates or update status (SM13)
    8 Check for excessive swapping (ST02)
    9 Check critical job status such as backup, updatestats, checkdb etc. (DB13)
    10 Check the status of long-running/failed jobs (SM37)
    11 Check database alert logs and performance (ST04)
    12 Check spool job status (SP01)
    13 Check cache status (SXI_CACHE) for PI system
    14 Check SLD functionality (SLDCHECK)
    15 Check SXI_MONITOR for PI system
    16 Check for database locks (DB01)

    Java Stack Checks

    1 Check Java portal accessibility using the link
    2 Check the server0 log of the Java system for critical errors
    3 Check accessibility of the management console
    4 Check server node status
    5 Check the default trace for critical Java errors
    6 Check Java reports for memory consumption/swapping

    OS Level Checks

    1 Check file system usage (should be <80%)
    2 Check swap space using topas etc.
    3 Check work directory log files at OS level for errors

    Checking Application Servers [ SM51 ]

    This transaction is used to check all active application servers. It is also a central transaction from which you can select the instance to be monitored. Here you can see which services or work processes are configured in each instance.
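
    The file system usage check from the OS-level checklist above can also be scripted. The following is only a minimal sketch in Python: it assumes the 80% limit from the checklist, and the mount point in the demo is a placeholder that has to be replaced with the file systems of your own installation.

```python
import shutil
from typing import Tuple

USAGE_LIMIT_PERCENT = 80.0  # limit taken from the OS-level checklist


def usage_percent(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def check_filesystem(path: str, limit: float = USAGE_LIMIT_PERCENT) -> Tuple[float, bool]:
    """Return (usage in percent, True if usage is below the limit) for path."""
    usage = shutil.disk_usage(path)
    percent = usage_percent(usage.total, usage.used)
    return percent, percent < limit


if __name__ == "__main__":
    # Placeholder mount point; list the file systems of your own system here.
    for mount_point in ["/"]:
        percent, ok = check_filesystem(mount_point)
        status = "OK" if ok else "ABOVE LIMIT"
        print(f"{mount_point}: {percent:.1f}% used, {status}")
```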

    Monitoring Work Processes for Individual Instances [ SM50 ]:

    This transaction displays the work process overview of an individual instance, including on systems without additional application servers. It shows all running, waiting, stopped and PRIV processes of that instance. In this step we check all the processes; the status of each process should always be waiting or running. If any process has a status other than waiting or running, we need to check that particular process and report accordingly.

    This transaction displays a lot of information, for example:

    1. The status of each work process (whether it is occupied or not).
    2. If the work process is running, the action it is performing, shown in the Action column.
    3. The table that is being worked on.

    Some typical problems:

    Users take a long time to log on, cannot log on, or online transactions are very slow. This can happen when the DIA work processes are fully utilized. It can also be the result of long-running jobs (red indicator under the Time column). If necessary, you can cancel the session by selecting the job and choosing Process > Cancel Without Core. This cancels the job and releases the work process for other users or processes.

    Some users may have PRIV status under the Reason column. This can mean that the user's transaction is so big that it requires more memory. When this happens, the DIA work process is owned by that user and cannot be used by other users. If this happens, check with the user and, if possible, run the job as a background job.

    If there is a long print job on an SPO work process, investigate the problem. It could be related to the print server or the printer.

  • Monitoring System wide Work Processes [SM66]

    By checking the work process load using the global work process overview, we can quickly investigate the potential cause of a system performance problem.

    Monitor the work process load on all active instances across the system.

    Using the Global Work Process Overview screen, we can see at a glance:

    - The status of each application server
    - The reason why it is not running
    - Whether it has been restarted
    - The CPU and request run time
    - The user who has logged on and the client that they logged on to
    - The report that is running

    Monitor Application User [ AL08 and SM04 ]

    These transactions display all users of the active instances, with the user IDs and terminal names of the users who are currently logged on to the system.

  • Work Process Calculation

    SAPinst installs SAP systems with a minimum number of work processes, which are calculated using the following formulas:

    - Number of dialog work processes = RAM/256 (min 2, max 18)
    - Number of update work processes = RAM/768 (min 1, max 6)
    - Number of update2 work processes = RAM/1024 (min 1, max 3)
    - Number of batch work processes = RAM/1024 (min 2, max 3)
    - Number of enqueue work processes = 1
    - Number of spool work processes = 1
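
    The formulas above can be worked through with a short script. This is a sketch under two assumptions that the text does not state: RAM is given in MB, and the division result is rounded down before the minimum and maximum are applied. The function and key names are chosen for illustration only.

```python
from typing import Dict


def clamp(value: int, minimum: int, maximum: int) -> int:
    """Limit value to the range [minimum, maximum]."""
    return max(minimum, min(value, maximum))


def default_work_processes(ram_mb: int) -> Dict[str, int]:
    """Apply the RAM-based formulas (assumption: RAM in MB, rounded down)."""
    return {
        "dialog": clamp(ram_mb // 256, 2, 18),
        "update": clamp(ram_mb // 768, 1, 6),
        "update2": clamp(ram_mb // 1024, 1, 3),
        "batch": clamp(ram_mb // 1024, 2, 3),
        "enqueue": 1,
        "spool": 1,
    }


if __name__ == "__main__":
    for ram in (1024, 4096, 16384):
        print(ram, default_work_processes(ram))
```

    With 4096 MB, for example, the dialog formula gives 4096/256 = 16, which lies inside the allowed range, while the update2 formula gives 4 and is capped at the maximum of 3.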

    Monitoring Update Processes [ SM13 ]

    This transaction is used to check for errors in update records. A failed or terminated update is an update to the database that did not complete. Failed updates occur when a user entry or transaction is not written to the database. We should check whether the update is active and whether any update records have been terminated.

    Execute transaction SM13, put * in the field USER and click on the button.

    If there are no long-pending update records and no updates are in progress, this queue will be empty.

  • If the update is not active, find out the following:

    Is the update active? If not, was it deactivated by the system or by a user?

    o Click on button and get the information.

    o Click on button and get the below information:

    Is any update cancelled?

    Is there a long queue of pending updates older than 10 minutes?

    Monitoring Lock Entries [ SM12 ]

    This transaction is used to check lock entries. There may be old locks still in place from transactions that did not release them, or from when a user was cut off from the network. If any lock entries exist, check how long each lock has existed. If a lock has existed for more than one day, check whether that user is still logged on (using transaction AL08). If the user is not logged on, delete the lock; otherwise wait one or two days.

    These old locks need to be cleared, or they could prevent access or changes to the records.

    We can use lock statistics to monitor the locks that are set in the system. We record only those lock entries whose date and time stamp is from the previous day.

    Execute transaction SM12 and put * in the field User Name.
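
    The one-day rule for lock entries can be illustrated with a small filter. The sketch below assumes the lock list has already been copied out of SM12 by some means; the LockEntry record, its field names and the sample values are hypothetical and do not correspond to an SAP interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class LockEntry:
    """Hypothetical record for one lock entry taken from the SM12 list."""
    user: str
    table: str
    set_at: datetime


def stale_locks(entries: List[LockEntry], now: datetime,
                max_age: timedelta = timedelta(days=1)) -> List[LockEntry]:
    """Return the lock entries that have existed for longer than max_age."""
    return [entry for entry in entries if now - entry.set_at > max_age]


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0)
    sample = [
        LockEntry("USER_A", "TABLE_X", datetime(2024, 1, 8, 9, 0)),
        LockEntry("USER_B", "TABLE_Y", datetime(2024, 1, 10, 11, 0)),
    ]
    for entry in stale_locks(sample, now):
        print(f"Check in AL08 whether {entry.user} is still logged on ({entry.table})")
```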

  • Monitoring System Log [ SM21 ]

    This Transaction is used to check System Logs. We can use the log to pinpoint and rectify errors occurring in the system and its environment. The log is important because unexpected or unknown warnings and errors could indicate a serious problem.

    We check the log for the previous day with the following selections:

    Enter the date and time.
    Select the radio button Problems and Warnings.
    Press Reread System Log.

  • Tune Summary [ ST02 ]

    This transaction is used for buffer tuning and buffer statistics, that is, to tune the buffer parameters of R/3. The buffers are important because significant buffer swapping reduces performance. Look under Swaps for red entries, and check these entries regularly to establish trends and get a feel for the buffer behaviour. In this transaction the hit ratio should be between 90% and 100% for all buffers, allocated heap memory should be 0, and swaps should be 0.

    Step 1: Go to ST02 to check the tune summary.

    Step 2: If you see any red values under Swaps, double-click them.

  • Step 3: On the next screen, click on the tab Current Parameters.

    Step 4: Note down the values and the profile parameters.

    Step 5: Go to RZ10 to change the profile parameter values.

    Step 6: Save the changes.

    Step 7: Restart the server for the new changes to take effect.
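
    The targets for the tune summary (hit ratio of at least 90%, no swaps) can be checked mechanically once the values have been noted down. The following is a rough sketch; the BufferStat record and the sample buffer names and values are hypothetical, not output read from ST02.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BufferStat:
    """Hypothetical record for one buffer line noted down from ST02."""
    name: str
    hit_ratio: float  # in percent
    swaps: int


def buffer_findings(stats: List[BufferStat], min_hit_ratio: float = 90.0) -> List[str]:
    """Return one message per buffer value that misses its target."""
    findings = []
    for stat in stats:
        if stat.hit_ratio < min_hit_ratio:
            findings.append(f"{stat.name}: hit ratio {stat.hit_ratio:.1f}% is below {min_hit_ratio:.0f}%")
        if stat.swaps > 0:
            findings.append(f"{stat.name}: {stat.swaps} swaps")
    return findings


if __name__ == "__main__":
    sample = [BufferStat("Program", 99.2, 0), BufferStat("Generic key", 87.5, 120)]
    for line in buffer_findings(sample):
        print(line)
```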

  • CPU Utilization [ ST06 ]

    The idle CPU rate should be about 60-65%. If CPU utilization exceeds this level (the idle rate falls below this range), start by checking at least the following:

    Run the OS-level command top and check which processes are taking the most resources.
    Go to SM50 or SM66 and check for any long-running jobs or long update queries.
    Go to SM12 and check the lock entries.
    Go to SM13 and check the update active status.
    Check for errors in SM21.

    ABAP Dumps [ ST22 ]

    This transaction is used to analyze ABAP dumps (short dumps), to determine why an error occurred and to take corrective action. Look for an excessive number of dumps and for dumps of an unusual nature. An ABAP dump is generated when a report or transaction terminates as a result of a serious error. The system records the error in the system log (SM21) and writes a snapshot of the terminated program to a special table. Here we check for the previous day's dumps.

  • Spool Request Monitoring [ SP01 ]

    For spool request monitoring, execute SP01 and select as below:

    Put * in the field Created By

    Click on execute button.

    Here we record only those requests which are terminated with problems.

    Monitoring Batch Jobs [ SM37 ]

    This transaction is used for monitoring background jobs. Background jobs are batch jobs scheduled to run at specific times during the day. Here we should check for cancelled jobs.

    To monitor background jobs, execute SM37 and select as below:

    Put * in the fields User Name and Job Name.
    In Job Status, select Scheduled, Cancelled, Released and Finished.

  • Transactional RFC Administration [ SM58 ]

    Transactional RFC (tRFC, also originally known as asynchronous RFC) is an asynchronous communication method which executes the called function module in the RFC server only once.

    Select the display period for which you want to view the tRFCs, then enter * in the user name field to view all the calls which have not been executed correctly or are waiting in the queue.

    qRFC Administration (Outbound Queue) [ SMQ1 ]

    Specify the client here and check whether there are any outgoing qRFCs in waiting or error state.

  • qRFC Administration (Inbound Queue) [ SMQ2 ]

    Specify the client here and check whether there are any incoming qRFCs in waiting or error state.

  • Database Administration [ DB02 ]

    This transaction is used to check all tablespaces and indexes, and in particular the tablespace sizes. It provides a way to examine database allocation and allows us to monitor items such as:

    a) DB Space History
    b) DB Analysis

    With this transaction we can view the database history by dates and times.

    After selecting Current Sizes on the first screen, we reach a screen that shows the current status of all the tablespaces in the system.

    If any tablespace is more than 95% full and autoextend is off, we need to add a new data file so that the database does not become full.
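
    The tablespace rule above (more than 95% used and autoextend off) is simple enough to express as a check. This is a sketch only; the Tablespace record and the sample names and figures are hypothetical and are not read from DB02.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tablespace:
    """Hypothetical record for one tablespace line noted down from DB02."""
    name: str
    used_percent: float
    autoextend: bool


def needs_datafile(tablespace: Tablespace, limit: float = 95.0) -> bool:
    """True if the tablespace is above the limit and cannot grow by itself."""
    return tablespace.used_percent > limit and not tablespace.autoextend


def tablespaces_to_extend(tablespaces: List[Tablespace]) -> List[str]:
    """Return the names of all tablespaces that need a new data file."""
    return [ts.name for ts in tablespaces if needs_datafile(ts)]


if __name__ == "__main__":
    sample = [
        Tablespace("TS_DATA", 96.4, False),
        Tablespace("TS_INDEX", 97.1, True),
        Tablespace("TS_TEMP", 40.0, False),
    ]
    print(tablespaces_to_extend(sample))
```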

  • We can also determine the history of tablespaces.

    We can select Months, Weeks or Days here to see the changes which take place in a tablespace.

    We can determine the growth of tablespace by analyzing these values.

  • Database Backup logs [ DB12 ]

    From this transaction we can determine when the last successful backup of the system took place. We can review the previous day's backups and see whether everything was fine.

    We can also review the redo log files and see whether redo log backup was successful or not.

    DB Performance Monitor [ ST04 ] :

    This transaction is used for database performance analysis. It provides the ability to monitor the database with regard to the following indicators:

    1. In Shared Pool, the SQL Area Get Ratio should be 90-95%.
    2. In Log Buffer Allocation Retries / Entries

  • Time/User call should be less than 20 ms. Note that the value can be much higher due to the inclusion of special idle events, which limits the relevance of this data.

    The ratio of busy wait time to CPU time should be close to 60:40, which indicates a well-tuned system. If you see very high values (such as 80:20), system performance can be improved using 'wait event tuning'. If the CPU time is significantly higher than 40%, check the CPU utilization on the database server.

    The hit ratio (quality) of the data buffer should be more than 94%. A low hit ratio might be due to a small data buffer. Check SAP Note 619188 for a deeper analysis.

    Reads/User Calls should be less than 15. If it is too high, check for expensive SQL statements. Check SAP Note 766349.

    Go to Detail Analysis Menu -> File System Requests and check the average read time (Avg(ms) for Blk Reads) for individual data files or in total (Total under the column). If the value is very high in comparison to 10 ms, check whether the problem can be solved by improving the data distribution or whether there is a disk I/O problem at hardware level. You can analyse disk usage further from ST06/OS07 (-> Detail Analysis Menu -> Disk). If the load on the database server's disks is more than 80%, you need to redistribute the data files.

    DD-cache quality should be more than 80%, pin ratio should be more than 95%, reloads/pin should be lower than 0.04, and User/recursive calls should be more than 2. If these indicators deviate too much, you need to increase the shared pool size.
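
    The numeric limits quoted for ST04 can be collected in one table and compared against the values read from the screen. The sketch below does that; the dictionary keys are hypothetical names chosen for this example, and the values have to be entered by hand because nothing here reads from the database.

```python
import operator
from typing import Callable, Dict, List, Tuple

# (comparison that must hold, limit) per indicator; limits as quoted in the text.
# The keys are hypothetical names used only in this sketch.
LIMITS: Dict[str, Tuple[Callable[[float, float], bool], float]] = {
    "time_per_user_call_ms": (operator.lt, 20.0),
    "data_buffer_quality_pct": (operator.gt, 94.0),
    "reads_per_user_call": (operator.lt, 15.0),
    "dd_cache_quality_pct": (operator.gt, 80.0),
    "pin_ratio_pct": (operator.gt, 95.0),
    "reloads_per_pin": (operator.lt, 0.04),
    "user_per_recursive_calls": (operator.gt, 2.0),
}


def indicators_out_of_range(values: Dict[str, float]) -> List[str]:
    """Return the names of the supplied indicators that violate their limit."""
    failed = []
    for name, value in values.items():
        if name not in LIMITS:
            continue  # unknown indicators are skipped rather than guessed at
        holds, limit = LIMITS[name]
        if not holds(value, limit):
            failed.append(name)
    return sorted(failed)


if __name__ == "__main__":
    noted = {"data_buffer_quality_pct": 91.0, "reads_per_user_call": 8.0, "reloads_per_pin": 0.1}
    print(indicators_out_of_range(noted))
```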

    Work Load Monitor [ ST03 / ST03N ]:

    This transaction is used for workload analysis. Workload analysis is used to determine system performance. Check the statistics and record trends to get a feel for the system's behaviour and performance. Understanding the system when it is running well helps us to determine what changes may need to be made when it is not.

  • General performance indicating factors: factors indicating good performance.

    Wait time < 10% of response time.

    Average roll in time < 20 milliseconds.

    Average roll wait time < 200 ms.

    Average load (& generation time) < 10% of response time( Communication problem with GUI or external system

    Large load time -> Program buffer, CUA buffer or screen buffer too small

    Large database request times -> CPU/memory bottleneck on database server, network problems, expensive SQL statements, database locks, missing indexes, missing statistics, small buffers

    Large CPU times -> Expensive ABAP processing, for example, processing large tables, frequent accessing of R/3 buffers

    System Trace [ ST01 ]:

    This transaction is used for the system trace. Sometimes you may see strange behavior in DMS functions that is caused by wrong authorization customizing, or you may not know how and where authorization objects are checked by the system. This trace shows all checked authorization objects, the values that are handed over to the check, and which object leads to the missing-authorization behavior.