Upload
markus-flechtner
View
835
Download
1
Embed Size (px)
Citation preview
BASLE BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
RAC Diagnostics .. with OraCHK, CHM and TFA
Markus Flechtner Senior Consultant
Our company.
RAC Diagnostics 2 30.11.15
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields: Trivadis Services takes over the interactive operation of your IT systems.
O P E R A T I O N
COPENHAGEN
MUNICH
LAUSANNE BERN
ZURICH BRUGG
GENEVA
HAMBURG
DÜSSELDORF
FRANKFURT
STUTTGART
FREIBURG
BASLE
VIENNA
With over 600 specialists and IT experts in your region.
RAC Diagnostics 3 30.11.15
14 Trivadis branches and more than 600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget: CHF 5.0 / EUR 4 million
Financially self-supporting and sustainably profitable
Experience from more than 1,900 projects per year at over 800 customers
About Markus Flechtner
Senior Consultant, Trivadis, Duesseldorf/Germany, since April 2008
Discipline Infrastructure Database @Trivadis
Working with Oracle since the 1990’s – Development (Forms, Reports, PL/SQL) – Support – Database Administration
Focus – Oracle Real Application Clusters – Database Migration Projects
Teacher – O-RAC – Oracle Real Application Clusters – O-NF12CDBA – Oracle 12c New Features for the DBA
Blog: http://markusdba.de/ @markusdba
RAC Diagnostics 4 30.11.15
Our database doctors ..
Dr. ORAchk – Regular screening examination
Dr. CHM – Electrocardiogram (ECG)
Dr. TFA – In case of emergency
RAC Diagnostics 5 30.11.15
Oracle Support Tools Bundle
Collection of database and RAC support tools
Includes: – ORAchk – ExaChk (*) – like OraChk, but for Engineered Systems – OSWatcher – ProcWatcher (*) – tool to examine and monitor Oracle database and/or clusterware
processes – ORATOP (*) - near real-time monitoring of databases – SQLT (*) – helps in tuning SQL statements – DARDA (*) - Diagnostic Assistant - interface for other diagnostic tools
Integrated in TFA collector since release 12.1.2.3.0
– Tools can be downloaded separately, too
(*) not covered by this talk
RAC Diagnostics 6 30.11.15
Running the tools from TFA collector
oracle> /u00/app/oracle/tools/tfa/bin/tfactl toolstatus .---------------------------------------. | External Support Tools | +----------+--------------+-------------+ | Host | Tool | Status | +----------+--------------+-------------+ | dbserver | alertsummary | DEPLOYED | | dbserver | darda | DEPLOYED | | dbserver | oratop | DEPLOYED | | dbserver | rgrep | DEPLOYED | | dbserver | exachk | DEPLOYED | | dbserver | orachk | DEPLOYED | | dbserver | oswbb | RUNNING | | dbserver | sqlt | DEPLOYED | | dbserver | prw | NOT RUNNING | | dbserver | tail | DEPLOYED | | dbserver | ncdvi | DEPLOYED | '----------+--------------+-------------' oracle> /u00/app/oracle/tools/tfa/bin/tfactl run alertsummary
RAC Diagnostics 7 30.11.15
Agenda
RAC Diagnostics 8 30.11.15
1. ORAchk
2. Cluster Health Monitor (CHM)
3. Trace File Analyzer (TFA) Collector
4. Other tools
ORAchk – Purpose & History
Available since July 2011
Oracle Configuration Audit Tool
Formerly known as "RACCheck"
Supported on Unix, Linux and Windows (Cygwin/Standalone version)
Checks your installation against more than 7.000 Oracle Best Practices
Results can be stored in a database
RAC Diagnostics 10 30.11.15
ORAchk – Not a RAC or database tool only
ORAchk includes checks for
– Oracle Database (Single Instance + RAC)
– MAA Validation
– Upgrade Readiness
– Golden Gate
– Enterprise Manager 12c Cloud Control
– E-Business Suite
– Oracle Sun Server
RAC Diagnostics 11 30.11.15
ORAchk - Installation
Clusterware 11.2.0.4 and 12.1.0.2
– Installed with the software (into $ORACLE_HOME/suptools/orachk)
– So far not updated with the PSUs L
For older versions
– Install TFA Collector 12.1.2.3.0 or higher
– Download ORAchk via MOS 1268927.2
RAC Diagnostics 12 30.11.15
ORAchk – Basic Command Line Options
Option Meaning -a Run all Checks -b Best Practice Check only -p Patch Check Only -u –o pre|post Pre or Post Upgrade Checks
-dbnames run for a subset of databases only -clusternodes run for a subset of nodes only
-h Help on all available parameters (long list)
RAC Diagnostics 13 30.11.15
ORAchk – Sample Output (1) – at runtime
ORAchk checks O/S, clusterware and databases on all nodes
Result: ZIP-File and HTML-Report
RAC Diagnostics 14 30.11.15
ORAchk – Advanced Command Line Options
Option Meaning -diff Compare 2 reports
-d Manage ORAchk daemon
-profile Run for specific components or applications like: • ASM • Clusterware • EBS • MAA • Goldengate • Enterprise Manager 12c .. And more
RAC Diagnostics 17 30.11.15
ORAchk – Collection Manager (1)
ORAchk results can be stored in a repository database
Collection Manager is a GUI for the repository database
APEX application (4.2.0 or higher)
– Import.sql is delivered with ORAchk software
Installation
– Create database user for ORAchk
– create 3 tables (see Appendix F of the OraChk Users Guide)
– Install APEX application
RAC Diagnostics 18 30.11.15
ORAchk – Collection Manager (2)
Set environment
Run ORAchk
– If the environment is set, then the data will be inserted into the repository database
export RAT_UPLOAD_CONNECT_STRING="(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=dbserver)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EMREP)))" export RAT_UPLOAD_TABLE=auditcheck_result export RAT_PATCH_UPLOAD_TABLE=auditcheck_patch_result export RAT_ZIP_UPLOAD_TABLE=RCA13_DOCS export RAT_UPLOAD_USER=orachk export RAT_UPLOAD_PASSWORD=orachk export RAT_UPLOAD_ORACLE_HOME=/u00/app/oracle/product/11.2.0.4
RAC Diagnostics 19 30.11.15
Cluster Health Monitor (CHM)
Available since 11.2.0.2
Collects OS information of the cluster nodes – CPU load – Memory – Top Processes – File Systems – System information
Components – sysmond (on every cluster node) – loggerd
Cluster Resource crf
RAC Diagnostics 25 30.11.15
Cluster Health Monitor (CHM) – CLI oclumon
grid@rac1node1:~/ oclumon –h For help in interactive mode : <verb> -h Currently supported verbs are : dumpnodeview, manage, version, debug, analyze, quit, exit, and help
Option Dumpnodeview Shows collected data (for specific nodes and/or a specific timewindow
Manage Manages the CHM repository and show Version Shows version information Debug Debugs CHM components Analyze Deprecated, will be ignored
RAC Diagnostics 26 30.11.15
Cluster Health Monitor (CHM) – CLI show data
grid@rac1node1:~/ [grid12102] oclumon dumpnodeview dumpnodeview: Node name not given. Querying for the local host ---------------------------------------- Node: rac1node1 Clock: '15-02-22 18.05.43 ' SerialNo:1440 ---------------------------------------- SYSTEM: #pcpus: 1 #vcpus: 2 cpuht: N chipname: Intel(R) cpu: 20.59 cpuq: 0 physmemfree: 393676 physmemtotal: 4958228 mcache: 2506540 swapfree: 3956548 swaptotal: 3964924 hugepagetotal: 0 hugepagefree: 0 hugepagesize: 2048 ior: 156 iow: 78 ios: 32 swpin: 0 swpout: 0 pgin: 155 pgout: 59 netr: 102.554 netw: 75.683 procs: 323 procsoncpu: 2 rtprocs: 13 rtprocsoncpu: N/A #fds: 20704 #sysfdlimit: 6815744 #disks: 9 #nics: 4 nicErrors: 0 TOP CONSUMERS: topcpu: 'mdb_vktm_-mgmtd(5402) 4.39' topprivmem: 'java(2046) 171088' topshm: 'ora_mman_raccdb(5479) 300808' topfd: 'oraagent.bin(4891) 251' topthread: 'console-kit-dae(3254) 64' [..]
RAC Diagnostics 27 30.11.15
Cluster Health Monitor (CHM) – -MGMTDB (1)
In Oracle 12c CHM data is stored in the Grid Infrastructure Management Repository (GIMR), SID=-MGMTDB
– Mandatory with 12.1.0.2
– Single instance database
– Multitenant database with 12.1.0.2 (PDB-name = clustername)
– Basic installation needs about 5 GB in the diskgroup with OCR and voting files
– Additional listener MGMTLSNR
Required size depends on number of nodes and retention time
– About 1,3 GB + 500 MB/node
– Check and configure with "oclumon"
RAC Diagnostics 28 30.11.15
Cluster Health Monitor (CHM) – -MGMTDB (2) - Tools
mgmtca (for initial configuration only) Srvctl oclumon
– Oracle recommends a retention time of 72 h ( = 259200 seconds)
grid@rac1node2:~/ oclumon manage -h Manage verb usage ================= manage -repos {checkretentiontime <time> | changerepossize <memsize>} | -get {<key1> [<key2> ...] | alllogger [-details] | mylogger [-details]} .. grid@rac1node2:~/ oclumon manage -repos checkretentiontime 259200 The Cluster Health Monitor repository is too small for the desired retention. Please first resize the repository to 5844 MB
RAC Diagnostics 29 30.11.15
Cluster Health Monitor (CHM) – EM 12c Cloud Control
CHM data can be displayed in EM 12c Cloud Control
RAC Diagnostics 30 30.11.15
Cluster Health Monitor (CHM) – Memory Guard
Evaluates the memory usage on the cluster nodes based on data collected by Cluster Health Monitor (CHM)
Automatically stops database services (transactional) in case of memory pressure on a cluster node
– .. or even kills database sessions
.. and automatically reactivates the services when enough memory is available
Starting with Oracle12.1.0.2 Memory Guard is automatically activated
RAC Diagnostics 31 30.11.15
Real life experience ..
26 node cluster
– 5 databases
Strange ASM issue
Oracle Support requested
– Clusterware logs
– ASM alert.logs
– Database alert.logs
For each of the 26 servers!!
RAC Diagnostics 33 30.11.15
Trace File Analyzer Collector
Initial release in January 2013
Current version 12.1.2.3.1
Collects trace and log files and system information from all nodes into a cluster with a single command initiated on one cluster node
Centralized output
Real-time scanning for specific error messages possible
– è Automatic collection of diagnostic information
Included in Clusterware 11.2.0.4 and 12.1.0.2
For other versions (10.2 or higher): – Download from MOS: 1513912.1 – RAC and DB Support Tools Bundle is included in current TFA package
RAC Diagnostics 34 30.11.15
TFA Collector – Installation
For Clusterware 11.2.0.4 and 12.1.0.2: No additional installation required
For older versions: [root@rac1node1 tmp]# ./installTFALite.sh Starting TFA installation Enter a location for installing TFA [/tmp]: /u00/app/oracle Checking for available space in /u00/app/oracle Enter a Java Home that contains Java 1.6 or later : /usr/java/jre1.7.0_13 Running Auto Setup for TFA as user root… Would you like to do a [L]ocal only or [C]lusterwide installation ? [L|l|C|c] [C] : C The following installation requires temporary use of SSH. If SSH is not configured already then we will remove SSH when complete. Do you wish to Continue ? [Y|y|N|n] [N] y Installing TFA at /u00/app/oracle in all hosts Discovering Nodes and Oracle resources Checking whether CRS is up and running ..
RAC Diagnostics 35 30.11.15
TFA Collector – Architecture
JAVA-based tool
TFA-daemon “TFAMain” running on all cluster nodes
Data Storage
– File-Repository for Diagnostic Information
– Berkeley Database for metadata, file inventory, event history, etc.
Command Line Interface
– tfactl (perl)
– Communication with daemon using secure sockets
oracle@rac1node1:~/ [rdbms12102] ps -ef |grep tfa |grep –v grep root 2325 1 0 10:14 ? 00:00:03 /bin/sh /etc/init.d/init.tfa run root 3631 1 0 10:16 ? 00:05:10 /u00/app/grid/product/12.1.0.2/jdk/jre/bin/java – [..] oracle.rat.tfa.TFAMain /u00/app/grid/product/12.1.0.2/tfa/rac1node1/tfa_home
RAC Diagnostics 36 30.11.15
TFA Collector – Commands (1) – Command Overview
oracle@rac1node1:/home/grid/ $ORACLE_HOME/tfa/bin/tfactl Usage : /u00/app/grid/product/12.1.0.2/bin/tfactl <command> [options] <command> = print Print requested details analyze List events summary and search strings in alert logs. diagcollect Collect logs from across nodes in cluster collection Manage TFA collections directory Add or Remove or Modify directory in TFA toolstatus Prints the status of TFA Support Tools run <tool> Run the desired support tool start <tool> Starts the desired support tool stop <tool> Stops the desired support tool restart <tool> Restarts the desired support tool For help with a command: /oracle/u00/app/oracle/tools/tfa/bin/tfactl <command> -help
RAC Diagnostics 37 30.11.15
TFA Collector – Commands (2) – commands for root
Configuration tasks must be done by root
The following additional commands are available: <command> = start Starts TFA stop Stops TFA enable Enable TFA Auto restart disable Disable TFA Auto restart access Add or Remove or List TFA Users and Groups purge Delete collections from TFA repository directory Add or Remove or Modify directory in TFA host Add or Remove host in TFA set Turn ON/OFF or Modify various TFA features uninstall Uninstall TFA from this node diagnosetfa Collect TFA Diagnostics ..
RAC Diagnostics 38 30.11.15
TFA Collector – Commands (3) – print config
root@rac1node1:/home/grid/ $ORACLE_HOME/tfa/bin/tfactl print config +--------------------------------------------+------------+ | Configuration Parameter | Value | +---------------------------------------------+------------+ | TFA version | 12.1.2.3.1 | | Automatic diagnostic collection | OFF | | Trimming of files during diagcollection | ON | | Repository current size (MB) | 7 | | Repository maximum size (MB) | 10240 | | Inventory Trace level | 1 | | Collection Trace level | 1 | | Scan Trace level | 1 | | Other Trace level | 1 | | Max Size of TFA Log (MB) | 50 | | Max Number of TFA Logs | 10 | | Max Size of Core File (MB) | 20 | | Max Collection Size of Core Files (MB) | 200 | | Automatic Purging | ON | | Minimum Age of Collections to Purge (Hours) | 12 | '---------------------------------------------+------------'
RAC Diagnostics 39 30.11.15
TFA Collector – Commands (4) - diagcollect
Collects trace and log files from the cluster nodes grid@rac1node1:~/ [grid12102] $ORACLE_HOME/tfa/bin/tfactl diagcollect Collecting data for the last 4 hours for all components... Collecting data for all nodes Repository Location in rac1node1 : /u00/app/oracle/tfa/repository 2015/02/21 20:28:24 CET : Running an inventory clusterwide ... 2015/02/21 20:28:24 CET : Collection Name : tfa_Sat_Feb_21_20_28_06_CET_2015.zip 2015/02/21 20:28:24 CET : Sending diagcollect request to host : rac1node2 2015/02/21 20:28:24 CET : Sending diagcollect request to host : rac1node3 .. Logs are being collected to: /u00/app/oracle/tfa/repository/collection_Sat_Feb_21_20_28_06_CET_2015_node_all/rac1node1.tfa_Sat_Feb_21_20_28_06_CET_2015.zip /u00/app/oracle/tfa/repository/collection_Sat_Feb_21_20_28_06_CET_2015_node_all/rac1node2.tfa_Sat_Feb_21_20_28_06_CET_2015.zip /u00/app/oracle/tfa/repository/collection_Sat_Feb_21_20_28_06_CET_2015_node_all/rac1node3.tfa_Sat_Feb_21_20_28_06_CET_2015.zip
RAC Diagnostics 40 30.11.15
TFA Collector – Commands (5) - diagcollect
Which data is collected by default? – alert.log from all databases – ASM log files – listener.log files – Clusterware logs – CHM information – Patch information
Components, node list and time window can be specified
Automatic diagnostic collection
– Tfa scans the alert.log files and runs "diagcollect" automatically
– Disabled by default
root@rac1node1:~/ tfactl set autodiagcollect=<ON|OFF> [-c]
RAC Diagnostics 41 30.11.15
TFA Collector – Commands (6) - analyze
Checks system log files and Oracle log files on all nodes root@rac1node1:~/ [grid12102] $ORACLE_HOME/tfa/bin/tfactl analyze INFO: analyzing all (Alert and Unix System Logs) logs for the last 60 minutes... Please wait... INFO: analyzing host: rac1node1 Report title: Analysis of Alert,System Logs Report date range: last ~1 hour(s) Report (default) time zone: CET - Central European Time Analysis started at: 21-Feb-2015 09:02:34 PM CET [..] Message types for last ~1 hour(s) Occurrences percent server name type ----------- ------- -------------------- ----- 2 66.7% rac1node1 WARNING 1 33.3% rac1node1 generic [..]
RAC Diagnostics 42 30.11.15
Cluster Verification Utility (cluvfy) (1)
Not only a tool for checking installation requirements J
Can check the integrity of all cluster components (OCR, OLR, ohasd, …)
Can run a healthcheck for cluster and databases
Can collect and compare configuration baselines
Output: Text or HTML
RAC Diagnostics 44 30.11.15
Cluster Verification Utility (cluvfy) (2) - Healthcheck
grid@rac1node1: $ORACLE_HOME/bin/cluvfyrac.sh comp healthcheck -bestpractice Verifying OS Best Practice [..] ****************************************************************************************** Clusterware recommendations ****************************************************************************************** Verification Check : CSS misscount parameter Verification Description : Checks if the CSS misscount is set correctly on the system Verification Result : PASSED Verification Summary : Check for CSS misscount parameter passed Additional Details : The CSS misscount parameter represents the maximum time, in seconds, that a network heartbeat can be missed before entering into a cluster reconfiguration to evict the node References (URLs/Notes) : https://support.oracle.com/CSP/main/article?cmd=show&type=N OT&id=294430.1 Node Status Expected Value Actual Value ------------------------------------------------------------------------------------------ rac1node1 PASSED 30 30 rac1node3 PASSED 30 30 rac1node2 PASSED 30 30 [..] RAC Diagnostics 45 30.11.15
OSWatcher (1)
Collects OS statistics in the background
– CPU
– Memory
– Disk I/O
Installed and activated with TFA collector
Can generate nice graphics
OSWatcher vs. CHM
– CHM CPU overhead lower
– OSWatcher runs with user priority (CHM: Realtime)
– OSWatcher collects more information
RAC Diagnostics 46 30.11.15
OSWatcher (2)
oracle> /u00/app/oracle/tools/tfa/bin/tfactl run oswbb Starting OSW Analyzer V7.3.1 OSWatcher Analyzer Written by Oracle Center of Expertise Copyright (c) 2014 by Oracle Corporation Parsing Data. Please Wait... Scanning file headers for version and platform info... Parsing file dbserver.markusflechtner.vm_iostat_15.02.22.0800.dat ... Parsing file dbserver.markusflechtner.vm_iostat_15.02.22.0900.dat ... [..] Parsing Completed. Enter 1 to Display CPU Process Queue Graphs Enter 2 to Display CPU Utilization Graphs Enter 3 to Display CPU Other Graphs Enter 4 to Display Memory Graphs Enter 5 to Display Disk IO Graphs [..] Enter Q to Quit Program Please Select an Option:
RAC Diagnostics 47 30.11.15
Summary
Oracle provides a lot of tools to keep a cluster in a healthy state
There are multiple ways to install the same tool
– The toolset is not complete integrated in the PSU lifecycle so far
Overlapping functionality
– Healthchecks: OraChk vs. cluvfy
– System performance data: CHM vs. OSWatcher
Σ
RAC Diagnostics 50 30.11.15
RAC Diagnostics 51 30.11.15
Further Information
Some MOS-Notes: • TFA Collector - Tool for Enhanced Diagnostic Gathering (Doc ID 1513912.1) • ORAchk - Health Checks for the Oracle Stack (Doc ID 1268927.2) • oratop - Utility for Near Real-time Monitoring of Databases (Doc ID 1500864.1) • SQLT Diagnostic Tool (Doc ID 215187.1) • Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware (Doc ID 459694.1) • Cluster Verification Utility (CLUVFY) FAQ (Doc ID 316817.1)
Questions and Answers Markus Flechtner Senior Consultant
Phone +49 211 5866 6470 [email protected] @markusdba http://markusdba.de Download the slides from http://www.slideshare.net/markusdba Please don‘t forget the session evaluation – Thank you!
30.11.15 RAC Diagnostics 52