52
BASLE BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH RAC Diagnostics .. with OraCHK, CHM and TFA Markus Flechtner Senior Consultant

RAC Diagnostics

Embed Size (px)

Citation preview

BASLE BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH

RAC Diagnostics .. with OraCHK, CHM and TFA

Markus Flechtner Senior Consultant

Our company.

RAC Diagnostics 2 30.11.15

Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields: Trivadis Services takes over the interactive operation of your IT systems.

O P E R A T I O N

COPENHAGEN

MUNICH

LAUSANNE BERN

ZURICH BRUGG

GENEVA

HAMBURG

DÜSSELDORF

FRANKFURT

STUTTGART

FREIBURG

BASLE

VIENNA

With over 600 specialists and IT experts in your region.

RAC Diagnostics 3 30.11.15

14 Trivadis branches and more than 600 employees

200 Service Level Agreements

Over 4,000 training participants

Research and development budget: CHF 5.0 / EUR 4 million

Financially self-supporting and sustainably profitable

Experience from more than 1,900 projects per year at over 800 customers

About Markus Flechtner

  Senior Consultant, Trivadis, Duesseldorf/Germany, since April 2008

  Discipline Infrastructure Database @Trivadis

  Working with Oracle since the 1990’s –  Development (Forms, Reports, PL/SQL) –  Support –  Database Administration

  Focus –  Oracle Real Application Clusters –  Database Migration Projects

  Teacher –  O-RAC – Oracle Real Application Clusters –  O-NF12CDBA – Oracle 12c New Features for the DBA

Blog: http://markusdba.de/ @markusdba

RAC Diagnostics 4 30.11.15

Our database doctors ..

  Dr. ORAchk –  Regular screening examination

  Dr. CHM –  Electrocardiogram (ECG)

  Dr. TFA –  In case of emergency

RAC Diagnostics 5 30.11.15

Oracle Support Tools Bundle

  Collection of database and RAC support tools

  Includes: –  ORAchk –  ExaChk (*) – like OraChk, but for Engineered Systems –  OSWatcher –  ProcWatcher (*) – tool to examine and monitor Oracle database and/or clusterware

processes –  ORATOP (*) - near real-time monitoring of databases –  SQLT (*) – helps in tuning SQL statements –  DARDA (*) - Diagnostic Assistant - interface for other diagnostic tools

  Integrated in TFA collector since release 12.1.2.3.0

–  Tools can be downloaded separately, too

(*) not covered by this talk

RAC Diagnostics 6 30.11.15

Running the tools from TFA collector

oracle> /u00/app/oracle/tools/tfa/bin/tfactl toolstatus .---------------------------------------. | External Support Tools | +----------+--------------+-------------+ | Host | Tool | Status | +----------+--------------+-------------+ | dbserver | alertsummary | DEPLOYED | | dbserver | darda | DEPLOYED | | dbserver | oratop | DEPLOYED | | dbserver | rgrep | DEPLOYED | | dbserver | exachk | DEPLOYED | | dbserver | orachk | DEPLOYED | | dbserver | oswbb | RUNNING | | dbserver | sqlt | DEPLOYED | | dbserver | prw | NOT RUNNING | | dbserver | tail | DEPLOYED | | dbserver | ncdvi | DEPLOYED | '----------+--------------+-------------' oracle> /u00/app/oracle/tools/tfa/bin/tfactl run alertsummary

RAC Diagnostics 7 30.11.15

Agenda

RAC Diagnostics 8 30.11.15

1.  ORAchk

2.  Cluster Health Monitor (CHM)

3.  Trace File Analyzer (TFA) Collector

4.  Other tools

RAC Diagnostics 9 30.11.15

OraChk

ORAchk – Purpose & History

  Available since July 2011

  Oracle Configuration Audit Tool

  Formerly known as "RACCheck"

  Supported on Unix, Linux and Windows (Cygwin/Standalone version)

  Checks your installation against more than 7.000 Oracle Best Practices

  Results can be stored in a database

RAC Diagnostics 10 30.11.15

ORAchk – Not a RAC or database tool only

ORAchk includes checks for

–  Oracle Database (Single Instance + RAC)

–  MAA Validation

–  Upgrade Readiness

–  Golden Gate

–  Enterprise Manager 12c Cloud Control

–  E-Business Suite

–  Oracle Sun Server

RAC Diagnostics 11 30.11.15

ORAchk - Installation

Clusterware 11.2.0.4 and 12.1.0.2

–  Installed with the software (into $ORACLE_HOME/suptools/orachk)

–  So far not updated with the PSUs L

  For older versions

–  Install TFA Collector 12.1.2.3.0 or higher

–  Download ORAchk via MOS 1268927.2

RAC Diagnostics 12 30.11.15

ORAchk – Basic Command Line Options

Option Meaning -a Run all Checks -b Best Practice Check only -p Patch Check Only -u –o pre|post Pre or Post Upgrade Checks

-dbnames run for a subset of databases only -clusternodes run for a subset of nodes only

-h Help on all available parameters (long list)

RAC Diagnostics 13 30.11.15

ORAchk – Sample Output (1) – at runtime

ORAchk checks O/S, clusterware and databases on all nodes

  Result: ZIP-File and HTML-Report

RAC Diagnostics 14 30.11.15

ORAchk – Sample Output (2) – final HTML report

RAC Diagnostics 15 30.11.15

ORAchk – Sample Output (3) – final HTML report

RAC Diagnostics 16 30.11.15

ORAchk – Advanced Command Line Options

Option Meaning -diff Compare 2 reports

-d Manage ORAchk daemon

-profile Run for specific components or applications like: •  ASM •  Clusterware •  EBS •  MAA •  Goldengate •  Enterprise Manager 12c .. And more

RAC Diagnostics 17 30.11.15

ORAchk – Collection Manager (1)

ORAchk results can be stored in a repository database

  Collection Manager is a GUI for the repository database

  APEX application (4.2.0 or higher)

–  Import.sql is delivered with ORAchk software

  Installation

–  Create database user for ORAchk

–  create 3 tables (see Appendix F of the OraChk Users Guide)

–  Install APEX application

RAC Diagnostics 18 30.11.15

ORAchk – Collection Manager (2)

  Set environment

  Run ORAchk

–  If the environment is set, then the data will be inserted into the repository database

export RAT_UPLOAD_CONNECT_STRING="(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dbserver)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EMREP)))" export RAT_UPLOAD_TABLE=auditcheck_result export RAT_PATCH_UPLOAD_TABLE=auditcheck_patch_result export RAT_ZIP_UPLOAD_TABLE=RCA13_DOCS export RAT_UPLOAD_USER=orachk export RAT_UPLOAD_PASSWORD=orachk export RAT_UPLOAD_ORACLE_HOME=/u00/app/oracle/product/11.2.0.4

RAC Diagnostics 19 30.11.15

ORAchk – Collection Manager (3) – some screenshots

RAC Diagnostics 20 30.11.15

ORAchk – Collection Manager (4) – some screenshots

RAC Diagnostics 21 30.11.15

ORAchk – Collection Manager (5) – some screenshots

RAC Diagnostics 22 30.11.15

ORAchk – Collection Manager (6) – some screenshots

RAC Diagnostics 23 30.11.15

RAC Diagnostics 24 30.11.15

Cluster Health Monitor

Cluster Health Monitor (CHM)

  Available since 11.2.0.2

  Collects OS information of the cluster nodes –  CPU load –  Memory –  Top Processes –  File Systems –  System information

  Components –  sysmond (on every cluster node) –  loggerd

  Cluster Resource crf

RAC Diagnostics 25 30.11.15

Cluster Health Monitor (CHM) – CLI oclumon

grid@rac1node1:~/ oclumon –h For help in interactive mode : <verb> -h Currently supported verbs are : dumpnodeview, manage, version, debug, analyze, quit, exit, and help

Option Dumpnodeview Shows collected data (for specific nodes and/or a specific timewindow

Manage Manages the CHM repository and show Version Shows version information Debug Debugs CHM components Analyze Deprecated, will be ignored

RAC Diagnostics 26 30.11.15

Cluster Health Monitor (CHM) – CLI show data

grid@rac1node1:~/ [grid12102] oclumon dumpnodeview dumpnodeview: Node name not given. Querying for the local host ---------------------------------------- Node: rac1node1 Clock: '15-02-22 18.05.43 ' SerialNo:1440 ---------------------------------------- SYSTEM: #pcpus: 1 #vcpus: 2 cpuht: N chipname: Intel(R) cpu: 20.59 cpuq: 0 physmemfree: 393676 physmemtotal: 4958228 mcache: 2506540 swapfree: 3956548 swaptotal: 3964924 hugepagetotal: 0 hugepagefree: 0 hugepagesize: 2048 ior: 156 iow: 78 ios: 32 swpin: 0 swpout: 0 pgin: 155 pgout: 59 netr: 102.554 netw: 75.683 procs: 323 procsoncpu: 2 rtprocs: 13 rtprocsoncpu: N/A #fds: 20704 #sysfdlimit: 6815744 #disks: 9 #nics: 4 nicErrors: 0 TOP CONSUMERS: topcpu: 'mdb_vktm_-mgmtd(5402) 4.39' topprivmem: 'java(2046) 171088' topshm: 'ora_mman_raccdb(5479) 300808' topfd: 'oraagent.bin(4891) 251' topthread: 'console-kit-dae(3254) 64' [..]

RAC Diagnostics 27 30.11.15

Cluster Health Monitor (CHM) – -MGMTDB (1)

  In Oracle 12c CHM data is stored in the Grid Infrastructure Management Repository (GIMR), SID=-MGMTDB

–  Mandatory with 12.1.0.2

–  Single instance database

–  Multitenant database with 12.1.0.2 (PDB-name = clustername)

–  Basic installation needs about 5 GB in the diskgroup with OCR and voting files

–  Additional listener MGMTLSNR

  Required size depends on number of nodes and retention time

–  About 1,3 GB + 500 MB/node

–  Check and configure with "oclumon"

RAC Diagnostics 28 30.11.15

Cluster Health Monitor (CHM) – -MGMTDB (2) - Tools

mgmtca (for initial configuration only) Srvctl oclumon

–  Oracle recommends a retention time of 72 h ( = 259200 seconds)

grid@rac1node2:~/ oclumon manage -h Manage verb usage ================= manage -repos {checkretentiontime <time> | changerepossize <memsize>} | -get {<key1> [<key2> ...] | alllogger [-details] | mylogger [-details]} .. grid@rac1node2:~/ oclumon manage -repos checkretentiontime 259200 The Cluster Health Monitor repository is too small for the desired retention. Please first resize the repository to 5844 MB

RAC Diagnostics 29 30.11.15

Cluster Health Monitor (CHM) – EM 12c Cloud Control

  CHM data can be displayed in EM 12c Cloud Control

RAC Diagnostics 30 30.11.15

Cluster Health Monitor (CHM) – Memory Guard

  Evaluates the memory usage on the cluster nodes based on data collected by Cluster Health Monitor (CHM)

  Automatically stops database services (transactional) in case of memory pressure on a cluster node

–  .. or even kills database sessions

  .. and automatically reactivates the services when enough memory is available

  Starting with Oracle12.1.0.2 Memory Guard is automatically activated

RAC Diagnostics 31 30.11.15

RAC Diagnostics 32 30.11.15

Trace File Analyzer (TFA) Collector

Real life experience ..

  26 node cluster

–  5 databases

  Strange ASM issue

  Oracle Support requested

–  Clusterware logs

–  ASM alert.logs

–  Database alert.logs

For each of the 26 servers!!

RAC Diagnostics 33 30.11.15

Trace File Analyzer Collector

  Initial release in January 2013

  Current version 12.1.2.3.1

  Collects trace and log files and system information from all nodes into a cluster with a single command initiated on one cluster node

  Centralized output

  Real-time scanning for specific error messages possible

–  è Automatic collection of diagnostic information

  Included in Clusterware 11.2.0.4 and 12.1.0.2

  For other versions (10.2 or higher): –  Download from MOS: 1513912.1 –  RAC and DB Support Tools Bundle is included in current TFA package

RAC Diagnostics 34 30.11.15

TFA Collector – Installation

  For Clusterware 11.2.0.4 and 12.1.0.2: No additional installation required

  For older versions: [root@rac1node1 tmp]# ./installTFALite.sh Starting TFA installation Enter a location for installing TFA [/tmp]: /u00/app/oracle Checking for available space in /u00/app/oracle Enter a Java Home that contains Java 1.6 or later : /usr/java/jre1.7.0_13 Running Auto Setup for TFA as user root… Would you like to do a [L]ocal only or [C]lusterwide installation ? [L|l|C|c] [C] : C The following installation requires temporary use of SSH. If SSH is not configured already then we will remove SSH when complete. Do you wish to Continue ? [Y|y|N|n] [N] y Installing TFA at /u00/app/oracle in all hosts Discovering Nodes and Oracle resources Checking whether CRS is up and running ..

RAC Diagnostics 35 30.11.15

TFA Collector – Architecture

  JAVA-based tool

  TFA-daemon “TFAMain” running on all cluster nodes

  Data Storage

–  File-Repository for Diagnostic Information

–  Berkeley Database for metadata, file inventory, event history, etc.

  Command Line Interface

–  tfactl (perl)

–  Communication with daemon using secure sockets

oracle@rac1node1:~/ [rdbms12102] ps -ef |grep tfa |grep –v grep root 2325 1 0 10:14 ? 00:00:03 /bin/sh /etc/init.d/init.tfa run root 3631 1 0 10:16 ? 00:05:10 /u00/app/grid/product/12.1.0.2/jdk/jre/bin/java – [..] oracle.rat.tfa.TFAMain /u00/app/grid/product/12.1.0.2/tfa/rac1node1/tfa_home

RAC Diagnostics 36 30.11.15

TFA Collector – Commands (1) – Command Overview

oracle@rac1node1:/home/grid/ $ORACLE_HOME/tfa/bin/tfactl Usage : /u00/app/grid/product/12.1.0.2/bin/tfactl <command> [options] <command> = print Print requested details analyze List events summary and search strings in alert logs. diagcollect Collect logs from across nodes in cluster collection Manage TFA collections directory Add or Remove or Modify directory in TFA toolstatus Prints the status of TFA Support Tools run <tool> Run the desired support tool start <tool> Starts the desired support tool stop <tool> Stops the desired support tool restart <tool> Restarts the desired support tool For help with a command: /oracle/u00/app/oracle/tools/tfa/bin/tfactl <command> -help

RAC Diagnostics 37 30.11.15

TFA Collector – Commands (2) – commands for root

  Configuration tasks must be done by root

  The following additional commands are available: <command> = start Starts TFA stop Stops TFA enable Enable TFA Auto restart disable Disable TFA Auto restart access Add or Remove or List TFA Users and Groups purge Delete collections from TFA repository directory Add or Remove or Modify directory in TFA host Add or Remove host in TFA set Turn ON/OFF or Modify various TFA features uninstall Uninstall TFA from this node diagnosetfa Collect TFA Diagnostics ..

RAC Diagnostics 38 30.11.15

TFA Collector – Commands (3) – print config

root@rac1node1:/home/grid/ $ORACLE_HOME/tfa/bin/tfactl print config +--------------------------------------------+------------+ | Configuration Parameter | Value | +---------------------------------------------+------------+ | TFA version | 12.1.2.3.1 | | Automatic diagnostic collection | OFF | | Trimming of files during diagcollection | ON | | Repository current size (MB) | 7 | | Repository maximum size (MB) | 10240 | | Inventory Trace level | 1 | | Collection Trace level | 1 | | Scan Trace level | 1 | | Other Trace level | 1 | | Max Size of TFA Log (MB) | 50 | | Max Number of TFA Logs | 10 | | Max Size of Core File (MB) | 20 | | Max Collection Size of Core Files (MB) | 200 | | Automatic Purging | ON | | Minimum Age of Collections to Purge (Hours) | 12 | '---------------------------------------------+------------'

RAC Diagnostics 39 30.11.15

TFA Collector – Commands (4) - diagcollect

  Collects trace and log files from the cluster nodes grid@rac1node1:~/ [grid12102] $ORACLE_HOME/tfa/bin/tfactl diagcollect Collecting data for the last 4 hours for all components... Collecting data for all nodes Repository Location in rac1node1 : /u00/app/oracle/tfa/repository 2015/02/21 20:28:24 CET : Running an inventory clusterwide ... 2015/02/21 20:28:24 CET : Collection Name : tfa_Sat_Feb_21_20_28_06_CET_2015.zip 2015/02/21 20:28:24 CET : Sending diagcollect request to host : rac1node2 2015/02/21 20:28:24 CET : Sending diagcollect request to host : rac1node3 .. Logs are being collected to: /u00/app/oracle/tfa/repository/collection_Sat_Feb_21_20_28_06_CET_2015_node_all/rac1node1.tfa_Sat_Feb_21_20_28_06_CET_2015.zip /u00/app/oracle/tfa/repository/collection_Sat_Feb_21_20_28_06_CET_2015_node_all/rac1node2.tfa_Sat_Feb_21_20_28_06_CET_2015.zip /u00/app/oracle/tfa/repository/collection_Sat_Feb_21_20_28_06_CET_2015_node_all/rac1node3.tfa_Sat_Feb_21_20_28_06_CET_2015.zip

RAC Diagnostics 40 30.11.15

TFA Collector – Commands (5) - diagcollect

  Which data is collected by default? –  alert.log from all databases –  ASM log files –  listener.log files –  Clusterware logs –  CHM information –  Patch information

  Components, node list and time window can be specified

  Automatic diagnostic collection

–  Tfa scans the alert.log files and runs "diagcollect" automatically

–  Disabled by default

root@rac1node1:~/ tfactl set autodiagcollect=<ON|OFF> [-c]

RAC Diagnostics 41 30.11.15

TFA Collector – Commands (6) - analyze

  Checks system log files and Oracle log files on all nodes root@rac1node1:~/ [grid12102] $ORACLE_HOME/tfa/bin/tfactl analyze INFO: analyzing all (Alert and Unix System Logs) logs for the last 60 minutes... Please wait... INFO: analyzing host: rac1node1 Report title: Analysis of Alert,System Logs Report date range: last ~1 hour(s) Report (default) time zone: CET - Central European Time Analysis started at: 21-Feb-2015 09:02:34 PM CET [..] Message types for last ~1 hour(s) Occurrences percent server name type ----------- ------- -------------------- ----- 2 66.7% rac1node1 WARNING 1 33.3% rac1node1 generic [..]

RAC Diagnostics 42 30.11.15

RAC Diagnostics 43 30.11.15

Other Tools

Cluster Verification Utility (cluvfy) (1)

  Not only a tool for checking installation requirements J

  Can check the integrity of all cluster components (OCR, OLR, ohasd, …)

  Can run a healthcheck for cluster and databases

  Can collect and compare configuration baselines

  Output: Text or HTML

RAC Diagnostics 44 30.11.15

Cluster Verification Utility (cluvfy) (2) - Healthcheck

grid@rac1node1: $ORACLE_HOME/bin/cluvfyrac.sh comp healthcheck -bestpractice Verifying OS Best Practice [..] ****************************************************************************************** Clusterware recommendations ****************************************************************************************** Verification Check : CSS misscount parameter Verification Description : Checks if the CSS misscount is set correctly on the system Verification Result : PASSED Verification Summary : Check for CSS misscount parameter passed Additional Details : The CSS misscount parameter represents the maximum time, in seconds, that a network heartbeat can be missed before entering into a cluster reconfiguration to evict the node References (URLs/Notes) : https://support.oracle.com/CSP/main/article?cmd=show&type=N OT&id=294430.1 Node Status Expected Value Actual Value ------------------------------------------------------------------------------------------ rac1node1 PASSED 30 30 rac1node3 PASSED 30 30 rac1node2 PASSED 30 30 [..] RAC Diagnostics 45 30.11.15

OSWatcher (1)

  Collects OS statistics in the background

–  CPU

–  Memory

–  Disk I/O

  Installed and activated with TFA collector

  Can generate nice graphics

OSWatcher vs. CHM

–  CHM CPU overhead lower

–  OSWatcher runs with user priority (CHM: Realtime)

–  OSWatcher collects more information

RAC Diagnostics 46 30.11.15

OSWatcher (2)

oracle> /u00/app/oracle/tools/tfa/bin/tfactl run oswbb Starting OSW Analyzer V7.3.1 OSWatcher Analyzer Written by Oracle Center of Expertise Copyright (c) 2014 by Oracle Corporation Parsing Data. Please Wait... Scanning file headers for version and platform info... Parsing file dbserver.markusflechtner.vm_iostat_15.02.22.0800.dat ... Parsing file dbserver.markusflechtner.vm_iostat_15.02.22.0900.dat ... [..] Parsing Completed. Enter 1 to Display CPU Process Queue Graphs Enter 2 to Display CPU Utilization Graphs Enter 3 to Display CPU Other Graphs Enter 4 to Display Memory Graphs Enter 5 to Display Disk IO Graphs [..] Enter Q to Quit Program Please Select an Option:

RAC Diagnostics 47 30.11.15

OSWatcher (3)

RAC Diagnostics 48 30.11.15

RAC Diagnostics 49 30.11.15

Summary

Summary

  Oracle provides a lot of tools to keep a cluster in a healthy state

  There are multiple ways to install the same tool

–  The toolset is not complete integrated in the PSU lifecycle so far

  Overlapping functionality

–  Healthchecks: OraChk vs. cluvfy

–  System performance data: CHM vs. OSWatcher

Σ

RAC Diagnostics 50 30.11.15

RAC Diagnostics 51 30.11.15

Further Information

Some MOS-Notes: •  TFA Collector - Tool for Enhanced Diagnostic Gathering (Doc ID 1513912.1) •  ORAchk - Health Checks for the Oracle Stack (Doc ID 1268927.2) •  oratop - Utility for Near Real-time Monitoring of Databases (Doc ID 1500864.1) •  SQLT Diagnostic Tool (Doc ID 215187.1) •  Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware (Doc ID 459694.1) •  Cluster Verification Utility (CLUVFY) FAQ (Doc ID 316817.1)

Questions and Answers Markus Flechtner Senior Consultant

Phone +49 211 5866 6470 [email protected] @markusdba http://markusdba.de Download the slides from http://www.slideshare.net/markusdba Please don‘t forget the session evaluation – Thank you!

30.11.15 RAC Diagnostics 52