Joint session with JB from Oracle at OOW13/Oracle Open World 2013
Copyright © 2013, Oracle and/or its affiliates. All rights reserved. 1
ASH Deep Dive: Advanced Performance Analysis Tips
John Beresniewicz, Oracle America
Kellyn Pot’vin, Enkitec
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Program Agenda
What is ASH?
How does ASH work?
How do we use ASH data?
Enterprise Manager: ASH Analytics
ASH in Action: Kellyn Pot’vin
What is ASH?
• Time-based sampling of foreground session state
  – Highly multi-dimensional view of database activity and therefore of DB Time
• Observations of specific values of the (DB Time / time) function
  – This function is called Average Active Sessions
• An instrumentation mechanism that actualizes an important concept
Important Properties of ASH
• Samples represent “snapshots” of session activity at the “same time”
  – Not strictly true, since the latchless sampler copies sessions one at a time
• Sampling is time-independent of session activity
  – Important, since otherwise sessions could be over- or under-sampled
Active Session Sampling
Time-based captures of state information for active sessions
[Figure: Sessions 1, 2 and 3 sampled at times t1, t2 and t3]
Time  Session  State    Wait Class   SQL_ID         Object
t1    1        ON CPU   null         53qkkf6yzc2x0  null
t1    2        WAITING  User I/O     0naxkcasaz162  EMP
t1    3        WAITING  User I/O     cs4qrt8kr3uhx  EMP
t2    3        WAITING  Application  4uh6zm2wg03mx  DEPT
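Because every ASH row is one observation of one active session, slicing activity by any column reduces to counting rows per value. A minimal Python sketch using the four rows from the table above; the `AshRow` type and the `activity_by` helper are hypothetical illustrations, not an Oracle API:

```python
from collections import Counter
from typing import NamedTuple, Optional


class AshRow(NamedTuple):
    """One ASH sample row: one active session observed at one sample time."""
    time: str
    session: int
    state: str
    wait_class: Optional[str]   # None (null) when the session is ON CPU
    sql_id: str
    obj: Optional[str]


# The four rows from the sampling table above.
ROWS = [
    AshRow("t1", 1, "ON CPU", None, "53qkkf6yzc2x0", None),
    AshRow("t1", 2, "WAITING", "User I/O", "0naxkcasaz162", "EMP"),
    AshRow("t1", 3, "WAITING", "User I/O", "cs4qrt8kr3uhx", "EMP"),
    AshRow("t2", 3, "WAITING", "Application", "4uh6zm2wg03mx", "DEPT"),
]


def activity_by(rows, dimension):
    """Count sampled rows per value of one dimension (a GROUP BY in SQL terms)."""
    return Counter(getattr(r, dimension) for r in rows)


print(activity_by(ROWS, "wait_class"))
print(activity_by(ROWS, "obj"))
```

Session 3 contributes two rows (t1 and t2), while sessions 1 and 2 were not active at t2 and contribute nothing there: inactive time is simply absent from ASH.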
ASH is Highly Multi-dimensional
Most of these represent useful investigative paths in some context
desc v$active_session_history
Name                           Null     Type
------------------------------ -------- ----------------
SAMPLE_ID                               NUMBER
SAMPLE_TIME                             TIMESTAMP(3)
IS_AWR_SAMPLE                           VARCHAR2(1)
SESSION_ID                              NUMBER
SESSION_SERIAL#                         NUMBER
SESSION_TYPE                            VARCHAR2(10)
FLAGS                                   NUMBER
USER_ID                                 NUMBER
. . .
93 rows selected
SQL Dimensions
SQL_ID                       VARCHAR2(13)
IS_SQLID_CURRENT             VARCHAR2(1)
SQL_CHILD_NUMBER             NUMBER
SQL_OPCODE                   NUMBER
SQL_OPNAME                   VARCHAR2(64)
FORCE_MATCHING_SIGNATURE     NUMBER
TOP_LEVEL_SQL_ID             VARCHAR2(13)
TOP_LEVEL_SQL_OPCODE         NUMBER
SQL_PLAN_HASH_VALUE          NUMBER
SQL_PLAN_LINE_ID             NUMBER
SQL_PLAN_OPERATION           VARCHAR2(30)
SQL_PLAN_OPTIONS             VARCHAR2(30)
SQL_EXEC_ID                  NUMBER
SQL_EXEC_START               DATE
PLSQL_ENTRY_OBJECT_ID        NUMBER
PLSQL_ENTRY_SUBPROGRAM_ID    NUMBER
PLSQL_OBJECT_ID              NUMBER
PLSQL_SUBPROGRAM_ID          NUMBER
QC_INSTANCE_ID               NUMBER
QC_SESSION_ID                NUMBER
QC_SESSION_SERIAL#           NUMBER
Wait Event Dimensions
EVENT                        VARCHAR2(64)
EVENT_ID                     NUMBER
EVENT#                       NUMBER
SEQ#                         NUMBER
P1TEXT                       VARCHAR2(64)
P1                           NUMBER
P2TEXT                       VARCHAR2(64)
P2                           NUMBER
P3TEXT                       VARCHAR2(64)
P3                           NUMBER
WAIT_CLASS                   VARCHAR2(64)
WAIT_CLASS_ID                NUMBER
WAIT_TIME                    NUMBER
SESSION_STATE                VARCHAR2(7)
TIME_WAITED                  NUMBER
Application Dimensions
Instrumented applications can benefit greatly
SERVICE_HASH                 NUMBER
PROGRAM                      VARCHAR2(48)
MODULE                       VARCHAR2(48)
ACTION                       VARCHAR2(32)
CLIENT_ID                    VARCHAR2(64)
MACHINE                      VARCHAR2(64)
PORT                         NUMBER
ECID                         VARCHAR2(64)
CONSUMER_GROUP_ID            NUMBER
TOP_LEVEL_CALL#              NUMBER
TOP_LEVEL_CALL_NAME          VARCHAR2(64)
XID                          RAW(8)
REMOTE_INSTANCE#             NUMBER
TIME_MODEL                   NUMBER
How does ASH work?
ASH Key Architecture Concepts
Session activity sampled efficiently into memory and onto disk
• In-memory ASH sampling:
  – Dedicated background process: MMNL
  – Circular SGA memory buffer: one writer, many readers
  – Lean and robust mechanism: no locking or latching
  – Default 1000 ms (1 sec) sampling interval
• ASH sub-sampling to disk:
  – Flush to AWR with each snapshot, or on emergency flush
  – Default: 1 in 10 of the 1-sec samples are persisted
  – Future: continuous sub-sampling
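Reading / Writing in Opposite Directions
• MMNL writes to the ASH circular buffer in one direction
• Readers of V$ASH start at the current write pointer
• Readers proceed through the buffer in the opposite direction to MMNL
• Readers stop when the current sample_id > the last sample_id read
• SELECT from V$ASH therefore returns rows most recent first
[Figure: MMNL writing around the circular buffer while reader session SALLY starts at the write pointer and reads backwards to her finish point]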
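The one-writer, backwards-reader scheme above can be modeled in a few lines. This is a toy sketch, assuming one sample per buffer slot (in reality each sample writes one row per active session); the class and method names are hypothetical. Walking backwards from the write pointer means rows come out newest first, and the stop rule catches both a full lap of the buffer and slots the writer has refilled with newer samples during the read:

```python
class AshCircularBuffer:
    """Toy model of the ASH ring buffer: one forward writer, backward readers."""

    def __init__(self, size):
        self.slots = [None] * size   # each slot holds (sample_id, payload) or None
        self.write_ptr = 0           # next slot the writer (MMNL) will fill

    def write(self, sample_id, payload):
        # The writer only moves forward, overwriting the oldest slot when full.
        self.slots[self.write_ptr] = (sample_id, payload)
        self.write_ptr = (self.write_ptr + 1) % len(self.slots)

    def read(self):
        """Yield (sample_id, payload) newest first, stopping at an empty slot or
        at the first slot whose sample_id is greater than the last one read."""
        pos = self.write_ptr
        last_id = None
        for _ in range(len(self.slots)):
            pos = (pos - 1) % len(self.slots)   # step opposite to the writer
            slot = self.slots[pos]
            if slot is None:
                return                          # buffer not yet full
            sample_id, payload = slot
            if last_id is not None and sample_id > last_id:
                return                          # ran into newer data: stop
            last_id = sample_id
            yield sample_id, payload
```

Because `read` is a generator, a reader that is still iterating while `write` runs sees the writer's new samples as larger sample_ids and stops there, rather than returning a mix of old and new data.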
Sampling Pseudo-code (lean and mean, but there is a hole)
1) FOR ALL SESSION STATE OBJECTS
2)   IS SESSION CONNECTED? NO => NEXT SESSION; YES:
3)   IS SESSION ACTIVE? NO => NEXT SESSION; YES:
4)   MEMCPY SESSION STATE OBJ
5)   CHECK CONSISTENCY OF COPY WITH LIVE SESSION
6)   IS COPY CONSISTENT?
       YES: WRITE ASH ROW FROM COPY
       NO:  IF FIRST COPY, REPEAT STEPS 4-6
            ELSE => NEXT SESSION (NO ASH ROW WRITTEN)
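A runnable Python rendering of the pseudo-code, as a sketch only. The slide does not say how the real sampler checks consistency, so `copy_is_consistent` is a hypothetical hook (it lets a caller force the retry path and the "hole"), and `SessionState` is a hypothetical stand-in for a session state object:

```python
import copy
from dataclasses import dataclass


@dataclass
class SessionState:
    """Hypothetical stand-in for a session state object."""
    sid: int
    connected: bool
    active: bool
    event: str


def sample_once(sessions, copy_is_consistent, ash_rows):
    """One sampling pass following the pseudo-code.

    copy_is_consistent(snap, live, attempt) -> bool is a hypothetical hook for
    the real consistency check. Rows are appended to ash_rows; the return value
    is the number of active sessions for which no row could be written.
    """
    dropped = 0
    for live in sessions:                          # 1) for all session state objects
        if not live.connected:                     # 2) not connected => next session
            continue
        if not live.active:                        # 3) not active => next session
            continue
        for attempt in (1, 2):                     # first copy, then one retry
            snap = copy.copy(live)                 # 4) "memcpy" the state object
            if copy_is_consistent(snap, live, attempt):   # 5-6) copy consistent?
                ash_rows.append((snap.sid, snap.event))   # write row from the copy
                break
        else:                                      # both copies inconsistent:
            dropped += 1                           # the "hole": no ASH row written
    return dropped
```

The `for ... else` clause runs only when neither attempt reached `break`, which is exactly the "ELSE => NEXT SESSION (NO ASH ROW WRITTEN)" branch.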
Default Settings
These are carefully chosen for maximum general utility
• Sampling interval = 1000 ms = 1 sec
• Disk filter ratio = 10 = 1 in 10 samples written to AWR
• ASH buffer size:
  – Min( Max(5% shared pool, 2% SGA), 2MB per CPU )
  – Absolute max of 256MB
NOTE: the MMNL sampler session is not sampled
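The buffer-size rule is easier to read as arithmetic. A minimal sketch that applies the slide's formula literally with integer byte arithmetic; the function name is hypothetical, and any rounding a real instance applies to the result is not modeled:

```python
MB = 1024 * 1024


def default_ash_buffer_bytes(shared_pool_bytes, sga_bytes, cpu_count):
    """Default ASH buffer size per the slide's rule:
    Min(Max(5% shared pool, 2% SGA), 2MB per CPU), absolute max 256MB."""
    five_pct_shared_pool = shared_pool_bytes * 5 // 100
    two_pct_sga = sga_bytes * 2 // 100
    per_cpu_limit = 2 * MB * cpu_count
    return min(max(five_pct_shared_pool, two_pct_sga), per_cpu_limit, 256 * MB)


# Hypothetical instance: 1GB shared pool, 8GB SGA, with 4 and then 64 CPUs.
for cpus in (4, 64):
    print(cpus, "CPUs ->", default_ash_buffer_bytes(1024 * MB, 8192 * MB, cpus) // MB, "MB")
```

For this hypothetical instance 2% of the SGA (about 164MB) beats 5% of the shared pool, so the per-CPU limit decides: 8MB with 4 CPUs and 128MB with 64 CPUs. A large enough machine hits the 256MB ceiling instead.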
Control Parameters (geeks want underscores)
• _ash_size: size of ASH buffer in bytes
  – K/M notation works (e.g. 200M)
• _ash_sampling_interval: in milliseconds
  – Min = 100, Max = 10,000
• _ash_disk_filter_ratio: every Nth sample to AWR
  – MOD(sample_id, N) = 0, where N = disk filter ratio
• _sample_all: samples idle and active sessions
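The disk filter rule also changes the ASH Math for persisted data: each row that reaches AWR stands for N sampling intervals of active-session time (10 seconds at the defaults). A small sketch of the MOD(sample_id, N) = 0 filter and that consequence; the function names are hypothetical, and the range check simply mirrors the limits listed on the slide:

```python
def validate_sampling_interval(ms):
    """_ash_sampling_interval limits as listed on the slide: 100..10,000 ms."""
    if not 100 <= ms <= 10_000:
        raise ValueError(f"sampling interval {ms} ms outside 100..10000")
    return ms


def persisted_sample_ids(sample_ids, disk_filter_ratio=10):
    """Samples flushed to AWR: those with MOD(sample_id, N) = 0."""
    return [sid for sid in sample_ids if sid % disk_filter_ratio == 0]


def seconds_per_awr_row(sampling_interval_ms=1000, disk_filter_ratio=10):
    """Active-session time one persisted ASH row represents in ASH Math."""
    return validate_sampling_interval(sampling_interval_ms) / 1000 * disk_filter_ratio


print(persisted_sample_ids(range(1, 31)))   # [10, 20, 30]
print(seconds_per_awr_row())                # 10.0
```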
V$ASH_INFO: new in 11.2 (but unfortunately undocumented)
desc v$ash_info
Name                           Null     Type
------------------------------ -------- --------------
TOTAL_SIZE                              NUMBER
FIXED_SIZE                              NUMBER
SAMPLING_INTERVAL                       NUMBER
OLDEST_SAMPLE_ID                        NUMBER
OLDEST_SAMPLE_TIME                      TIMESTAMP(9)
LATEST_SAMPLE_ID                        NUMBER
LATEST_SAMPLE_TIME                      TIMESTAMP(9)
SAMPLE_COUNT                            NUMBER
SAMPLED_BYTES                           NUMBER
SAMPLER_ELAPSED_TIME                    NUMBER
DISK_FILTER_RATIO                       NUMBER
AWR_FLUSH_BYTES                         NUMBER
AWR_FLUSH_ELAPSED_TIME                  NUMBER
AWR_FLUSH_COUNT                         NUMBER
AWR_FLUSH_EMERGENCY_COUNT               NUMBER
DROPPED_SAMPLE_COUNT                    NUMBER
• Compute the buffer time window size (OLDEST_SAMPLE_TIME to LATEST_SAMPLE_TIME)
• Compute the average time per sample (SAMPLER_ELAPSED_TIME / SAMPLE_COUNT)
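The two derived measures are simple arithmetic over one V$ASH_INFO row. A sketch assuming the column values have already been fetched (the query is left out); the function names are hypothetical, and because the view is undocumented, the unit of SAMPLER_ELAPSED_TIME is an assumption the result simply inherits:

```python
from datetime import datetime


def buffer_window_seconds(oldest_sample_time: datetime, latest_sample_time: datetime) -> float:
    """Time span currently held in the in-memory ASH buffer, in seconds."""
    return (latest_sample_time - oldest_sample_time).total_seconds()


def avg_time_per_sample(sampler_elapsed_time: float, sample_count: int) -> float:
    """Average sampler cost per sample, in whatever unit SAMPLER_ELAPSED_TIME
    is reported in (assumed, not documented); 0.0 if nothing was sampled yet."""
    return sampler_elapsed_time / sample_count if sample_count else 0.0
```

A short buffer window means in-memory ASH will not reach back far enough for forensics, and a rising average time per sample is one way to see the sampler's own overhead.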
ASH is Robust when CPU-constrained
1. The ASH sampler is very efficient and does not lock
   – It should complete a sample within a single CPU slice
2. After sampling, the sampler computes the next scheduled sample time and sleeps until then
3. Upon scheduled wake-up, it waits for CPU (run queue) and samples again
   – CPU-bound sample times are shifted by one run-queue wait, but intervals stay close to 1 second
These are precisely the times when reliable data is most needed.
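Scheduling from the clock rather than from the end of the previous sample is what keeps the interval stable. A sketch comparing the slide's scheme with a naive "sleep one interval after each sample" loop under a constant, hypothetical 200 ms run-queue wait; the naive sampler is a contrast invented for illustration, not something Oracle does:

```python
def scheduled_sampler(runq_delays, interval=1.0):
    """Sample k is scheduled at k * interval and actually runs after waiting
    runq_delays[k] seconds for CPU (the scheme described on the slide)."""
    return [k * interval + delay for k, delay in enumerate(runq_delays)]


def naive_sampler(runq_delays, interval=1.0):
    """Contrast: sleep a fixed interval after each sample, so every run-queue
    wait pushes all later samples back and the delay accumulates."""
    times, t = [], 0.0
    for delay in runq_delays:
        t += delay          # wait for CPU
        times.append(t)     # take the sample
        t += interval       # then sleep a full interval
    return times


def gaps(times):
    """Intervals between consecutive sample times."""
    return [b - a for a, b in zip(times, times[1:])]


delays = [0.2] * 5  # hypothetical constant 200 ms run-queue wait
print(gaps(scheduled_sampler(delays)))  # intervals stay ~1.0 s
print(gaps(naive_sampler(delays)))      # intervals stretch to ~1.2 s
```

Every scheduled sample is shifted by one run-queue wait, yet the spacing stays at one second, so ASH Math (each row worth one interval) remains valid exactly when the CPU is saturated.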
ASH Sampler and Run-queue
Sampling interval is consistent under CPU starvation
[Figure: scheduled sample times S_t0, S_t1, S_t2 versus actual sample times A_t0, A_t1, A_t2; at each scheduled time the sampler waits on the run queue, samples, then sleeps until the next scheduled time]
The ASH “Fix-up”
A unique and very important feature
• ASH column values may be unknown at sampling time
  – TIME_WAITED: session is still waiting
  – PLAN_HASH: session is still optimizing SQL
  – GC events: event details unknown at event initiation
• ASH “fixes up” data during subsequent sampling
  – TIME_WAITED is fixed up in the first sample after the event completes
  – Long events: the last sample gets the correct TIME_WAITED (all others 0)
• Querying V$ASH may return un-fixed rows
  – Generally this should not be a problem
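The TIME_WAITED fix-up for a long event can be traced with a toy timeline. A sketch assuming a single wait event and whole-millisecond sample times (ASH itself reports TIME_WAITED in microseconds); the function name is hypothetical:

```python
def sample_wait_with_fixup(event_start_ms, event_end_ms, sample_times_ms):
    """Model TIME_WAITED for one wait event caught by several samples.

    While the event is in progress its duration is unknown, so each row gets
    TIME_WAITED = 0. The first sample taken after the event completes fixes up
    the last in-event row with the full duration. Returns {sample_time: time_waited}.
    """
    rows = {}
    last_in_event = None
    for t in sample_times_ms:
        if event_start_ms <= t < event_end_ms:
            rows[t] = 0                     # still waiting: duration unknown
            last_in_event = t
        elif t >= event_end_ms and last_in_event is not None:
            rows[last_in_event] = event_end_ms - event_start_ms   # the fix-up
            break
    return rows


# Hypothetical 4.5 s wait from 2,300 ms to 6,800 ms, sampled every 1,000 ms.
print(sample_wait_with_fixup(2300, 6800, range(0, 10_000, 1000)))
# {3000: 0, 4000: 0, 5000: 0, 6000: 4500}
print(sample_wait_with_fixup(2300, 6800, range(0, 7_000, 1000)))   # read before the fix-up
# {3000: 0, 4000: 0, 5000: 0, 6000: 0}
```

The second call shows an un-fixed read: queried before the 7,000 ms sample, every row still has TIME_WAITED = 0.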
How do we use ASH data?
• Estimate DB Time and Average Active Sessions
  – For specific time intervals
  – Decomposed and filtered by many ASH dimensions
• Investigate tuning opportunities
  – Excesses of DB Time in tunable areas
• ASH forensics
  – Figure out “what happened to SID?”
ASH Math: Estimating DB Time from ASH
This is what I call “ASH Math”
• Each ASH row counts for :INTERVAL of active session time
• Default for :INTERVAL is 1 second (1000 ms)
• Therefore COUNT(*) = DB Time in seconds
• It is an estimate, because it is computed over a sample of the true activity
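A quick simulation shows why COUNT(*) × interval tracks DB Time. This sketch builds a synthetic workload (the session count, call lengths and idle gaps are made-up parameters), samples it on a fixed one-second clock that ignores session activity, and compares the row count with the true summed call time; all helper names are hypothetical:

```python
import bisect
import random


def make_workload(n_sessions, calls_per_session, seed=42):
    """Synthetic sessions: each a sorted list of non-overlapping [start, end)
    database calls, in seconds, separated by idle gaps."""
    rng = random.Random(seed)
    sessions = []
    for _ in range(n_sessions):
        t, calls = rng.uniform(0, 5), []
        for _ in range(calls_per_session):
            duration = rng.uniform(0.05, 3.0)
            calls.append((t, t + duration))
            t += duration + rng.uniform(0.1, 5.0)   # idle gap before next call
        sessions.append(calls)
    return sessions


def true_db_time(sessions):
    return sum(end - start for calls in sessions for start, end in calls)


def is_active(calls, starts, t):
    """True if time t falls inside one of the session's calls."""
    i = bisect.bisect_right(starts, t) - 1
    return i >= 0 and t < calls[i][1]


def ash_db_time_estimate(sessions, interval=1.0):
    """ASH Math: sample every `interval` seconds, write one row per active
    session, and multiply the row count by the interval."""
    starts = [[start for start, _ in calls] for calls in sessions]
    horizon = max(calls[-1][1] for calls in sessions)
    rows, k = 0, 0
    while k * interval <= horizon:
        t = k * interval
        rows += sum(is_active(c, s, t) for c, s in zip(sessions, starts))
        k += 1
    return rows * interval


sessions = make_workload(n_sessions=20, calls_per_session=500)
print(round(true_db_time(sessions)), round(ash_db_time_estimate(sessions)))
```

Each call is counted once per clock tick it spans, which is its length in seconds give or take one tick. Those errors cancel on average because the clock is independent of the workload, which is the "time independent of session activity" property at work.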
ASH Math and DB Time
The count of sampled rows is an (unbiased) estimate of DB Time
[Figure: estimated DB Time versus COUNT(ASH sampled rows)]
Computing Average Active Sessions
The centerpiece measure for EM Activity charts and ASH Analytics
• AAS = DELTA(DB Time) / DELTA(elapsed time)
  – Over some time interval(s) of sampled workload
• SUM(:sampling_interval) / [ MAX(sample_time) - MIN(sample_time) ]
  – Normalized to common time units, e.g. seconds
• COUNT(*) / [ MAX(sample_id) - MIN(sample_id) ]
  – This works for the default sampling interval and a single time interval
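Both formulas translate directly into code. A sketch over hypothetical (sample_id, sample_time) rows, one row per active session per sample, with sample_time already normalized to seconds; the function names are hypothetical:

```python
def aas_by_time(rows, sampling_interval_s=1.0):
    """SUM(:sampling_interval) / [MAX(sample_time) - MIN(sample_time)]."""
    times = [t for _, t in rows]
    return len(rows) * sampling_interval_s / (max(times) - min(times))


def aas_by_sample_id(rows):
    """COUNT(*) / [MAX(sample_id) - MIN(sample_id)]: only valid for the default
    1-second interval and a single contiguous time interval."""
    ids = [sid for sid, _ in rows]
    return len(rows) / (max(ids) - min(ids))


# Hypothetical window: samples 100..110 one second apart, 3 active sessions each.
rows = [(sid, float(sid)) for sid in range(100, 111) for _ in range(3)]
print(aas_by_time(rows), aas_by_sample_id(rows))
```

Both return 3.3 here rather than the 3 sessions actually active, because 11 samples span only 10 seconds between the first and the last. This fencepost effect fades as the window grows (about 3.003 over 1,001 samples), so it only matters for very short intervals.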
Bad ASH Math and TIME_WAITED
These mistakes are very common and very wrong
• AVG(TIME_WAITED)
  – Does not estimate average event latency, because sampling is biased toward longer events
• SUM(TIME_WAITED)
  – Does not compute total wait time in the database, since ASH does not contain all waits
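The bias can be computed directly. If each event is no longer than the sampling interval, a time-based sampler catches it with probability proportional to its duration, so the expected average of the sampled durations is the length-biased mean sum(d²) / sum(d) rather than the plain mean. A sketch with a made-up mix of waits; the function names are hypothetical:

```python
def true_mean_latency(durations_ms):
    """What an average event latency should be: every event weighted equally."""
    return sum(durations_ms) / len(durations_ms)


def expected_sampled_mean(durations_ms):
    """Expected AVG(TIME_WAITED) over sampled events, assuming every event is
    no longer than the sampling interval, so each is caught with probability
    proportional to its duration: sum(d^2) / sum(d), a length-biased mean."""
    return sum(d * d for d in durations_ms) / sum(durations_ms)


def expected_events_sampled(durations_ms, interval_ms=1000):
    """Expected number of events that appear in ASH at all."""
    return sum(min(d / interval_ms, 1.0) for d in durations_ms)


# Hypothetical mix: 990 fast 10 ms waits and 10 slow 1,000 ms waits.
waits = [10] * 990 + [1000] * 10
print(true_mean_latency(waits))          # 19.9
print(expected_sampled_mean(waits))      # ~507
print(expected_events_sampled(waits))    # ~19.9 of the 1,000 events
```

For a workload whose typical wait is 10 ms, AVG(TIME_WAITED) would suggest around half a second. And since only about 20 of the 1,000 waits are ever sampled, SUM(TIME_WAITED) cannot recover total wait time either.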
ASH Timing for Nano-operations
Magic trick: timing what cannot be timed
• Some important operations are too frequent and short-lived to be timed
  – E.g. there is no wait event for “bind” operations
• A bit in a session-level bit vector is set before and cleared after such operations
  – Much cheaper than timer calls
• The session bit vector is sampled into ASH
• “ASH Math” is used to estimate time spent in these un-timed transient operations
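A simulation of the bit-vector trick: the session flips a bit around each short operation, the once-per-second sampler only reads the bits, and ASH Math turns "samples with the bit set" into time. The sketch assumes a single always-active session and made-up operation lengths (2 ms binds, 4 ms parses); `IN_BIND` and `IN_PARSE` are hypothetical bit positions chosen for illustration:

```python
import bisect
import random

IN_BIND = 1 << 0    # hypothetical bit positions in a session bit vector
IN_PARSE = 1 << 1


def simulate_ops(duration_s=7200.0, seed=7):
    """Timeline of short un-timed operations as (start, end, bit) tuples;
    the bit is set during [start, end) and clear otherwise."""
    rng = random.Random(seed)
    ops, t = [], 0.0
    while t < duration_s:
        t += rng.uniform(0.0, 0.036)                  # other work between operations
        bit = IN_BIND if rng.random() < 0.7 else IN_PARSE
        length = 0.002 if bit == IN_BIND else 0.004   # 2 ms binds, 4 ms parses
        ops.append((t, t + length, bit))
        t += length
    return ops


def true_time_by_bit(ops):
    totals = {IN_BIND: 0.0, IN_PARSE: 0.0}
    for start, end, bit in ops:
        totals[bit] += end - start
    return totals


def ash_time_by_bit(ops, interval=1.0):
    """Read the bit vector once per interval; each sample with a bit set
    counts for one interval of time spent in that operation."""
    starts = [start for start, _, _ in ops]
    totals = {IN_BIND: 0.0, IN_PARSE: 0.0}
    k = 0
    while k * interval <= ops[-1][1]:
        t = k * interval
        i = bisect.bisect_right(starts, t) - 1
        vector = ops[i][2] if i >= 0 and t < ops[i][1] else 0
        for bit in totals:
            if vector & bit:
                totals[bit] += interval
        k += 1
    return totals
```

No operation is ever timed, yet the sampled totals land close to the true time spent in each one, because a bit that is set for a fraction f of the time is seen in about a fraction f of the samples.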
“ON CPU” and ASH
ASH CPU and Time Model CPU don’t always agree
• ASH session state ‘ON CPU’ is derived, not observed:
  – Session is in a database call
  – Session is NOT in a wait event (idle or non-idle)
• Un-instrumented waits => ‘ON CPU’
  – These are bugs and should be rare, but have happened
• Sessions on the run queue may be ‘WAITING’ or ‘ON CPU’
  – Depends on their state prior to going onto the run queue
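Expressed as a derivation, ‘ON CPU’ is just the leftover case. A sketch of the logic described on the slide; the function and its parameters are hypothetical, and `wait_class=None` means "not in any wait event":

```python
def ash_session_state(in_db_call, wait_class=None):
    """Derive an ASH SESSION_STATE value from two facts about a session.

    Returns None when the session is not active (no ASH row is written).
    """
    if not in_db_call:
        return None          # not in a database call: not sampled
    if wait_class is None:
        return "ON CPU"      # in a call and in no wait event: derived, not observed
    if wait_class == "Idle":
        return None          # idle waits are not active
    return "WAITING"
```

An un-instrumented wait looks like `in_db_call=True, wait_class=None`, which is why it is reported as ON CPU. The derivation also never consults the operating system, so a session on the run queue keeps whatever state it had before it was descheduled.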
Enterprise Manager: ASH Analytics
Origin: EM Top Activity
• Display AAS by wait class over time
• 5-minute time selector for details
• Top SQL and Top Sessions
  – Broken down by wait class
  – Additional fact columns
• User-selectable Top dimension
[Figure: EM Top Activity chart of Average Active Sessions]
Top Activity: Design Issues
• Top lists not graphically comparable
  – “% Activity” depends on sample count
• Time series by wait class only
  – What about SQL, User, etc.?
• Lots of wasted visual real estate
EM ASH Analytics
Flexible multi-dimensional ASH-based performance analysis tool
• Logical extension of the EM Top Activity page
• Average Active Sessions (AAS) over time
  – Decomposed by a user-selectable ASH dimension (the “parent” dimension)
• “Top” lists by two other user-selectable ASH dimensions
  – With breakdown by the “parent” dimension
• ASH Analytics Loadmap
  – AAS decomposed into a treemap of up to 3 ASH dimensions
  – Investigate skew and/or balance of load over dimension combinations
  – Investigate possible cause-effect relationships
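The data behind the two main ASH Analytics views is plain ASH Math grouped two ways. This is a sketch of that aggregation, not EM's implementation. It assumes rows are dicts holding a `sample_time` in seconds plus ASH dimension columns, and the function names are hypothetical:

```python
from collections import Counter, defaultdict


def aas_over_time(rows, parent_dim, bucket_s=60, interval_s=1.0):
    """AAS per time bucket, decomposed by the 'parent' dimension.
    Returns {bucket_start: {parent_value: AAS}}."""
    counts = defaultdict(Counter)
    for r in rows:
        bucket = int(r["sample_time"] // bucket_s) * bucket_s
        counts[bucket][r[parent_dim]] += 1
    return {b: {v: n * interval_s / bucket_s for v, n in c.items()}
            for b, c in sorted(counts.items())}


def top_list(rows, top_dim, parent_dim, n=10):
    """'Top' list by one dimension, each entry broken down by the parent one.
    Returns [(value, row_count, {parent_value: row_count}), ...]."""
    totals = Counter(r[top_dim] for r in rows)
    breakdown = defaultdict(Counter)
    for r in rows:
        breakdown[r[top_dim]][r[parent_dim]] += 1
    return [(value, count, dict(breakdown[value]))
            for value, count in totals.most_common(n)]
```

Switching the parent dimension from wait class to, say, module or SQL_ID changes only the `parent_dim` argument; that is the flexibility the slide describes.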
EM ASH Analytics
• Load (AAS) over time with time selector
  – Selected time broken down by ASH dimension
• 2 “Top” lists by other dimensions
  – Also broken down by parent dimension
• 4 charts with a shared dimension
  – Extremely powerful
[Figure: EM ASH Analytics page showing Average Active Sessions]
EM ASH Analytics: Loadmap
ASH Treemap Visualization
• Space-filling, scales well
• Decompose load (AAS) by multiple ASH dimensions
• Hierarchical decomposition
  – Some hierarchies natural, others investigative
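A treemap needs a hierarchy of sizes, and for the Loadmap the size is AAS. A sketch of building that nested structure from ASH rows over one to three dimensions, assuming dict-shaped rows and a known window length in seconds; the names are hypothetical and the drawing of rectangles is left out:

```python
def loadmap(rows, dims, window_s, interval_s=1.0):
    """Hierarchical AAS decomposition: {v1: {v2: {v3: AAS}}} for up to three
    dimensions. Each leaf's share of the total is its share of treemap area."""
    if not 1 <= len(dims) <= 3:
        raise ValueError("the Loadmap decomposes by one to three dimensions")
    counts = {}
    for r in rows:
        node = counts
        for d in dims[:-1]:                    # walk or create the inner levels
            node = node.setdefault(r[d], {})
        leaf = r[dims[-1]]
        node[leaf] = node.get(leaf, 0) + 1     # one row = one sampled active session
    return _to_aas(counts, interval_s, window_s)


def _to_aas(node, interval_s, window_s):
    """Convert nested row counts into AAS (row count x interval / window)."""
    return {k: _to_aas(v, interval_s, window_s) if isinstance(v, dict)
            else v * interval_s / window_s
            for k, v in node.items()}
```

Reordering `dims` gives a different hierarchy over the same rows. This is how a "natural" hierarchy (e.g. service, then module) and an "investigative" one (e.g. wait class, then SQL_ID) can both be explored.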
“Active Session History has radically changed the performance diagnosis of Oracle Databases, by design. With ASH you have detailed performance data always and immediately available… This translates to much faster problem solution times and also more accurate diagnosis…”
Tanel Poder, Consultant, Enkitec
About Us
I am…
• Oracle ACE Director
• Sr. Technical Consultant, Enkitec
Enkitec is… an Oracle Platinum Partner specializing in:
• Oracle Exadata
• Oracle Database, including RAC
• Oracle Database Performance Tuning
• Oracle APEX
• and so much more!
The Consultant’s Challenge
• “Hybrid” workload environment: Transactional, ETL, Reporting
• Upgraded to 11g in the previous year
• Consistent degradation since the upgrade
  – ETL down from 400 “businesses” per hour to 200-300
• ETL code review and enhancement in the works
• “What can you do for us now outside of that effort?”
Goal: Load 700 businesses per hour!!
Oracle Tools of the Trade
• AWR Reports:
  – First offered by onsite DBA, always available
  – “Averaging” effect of long snapshot intervals hiding issues
• ASH Reports:
  – Help identify problem times with finer granularity
  – Target reports to problem times for a clearer picture
• Enterprise Manager 12c:
  – Used to enhance ASH findings and do further research
  – Top Activity, SQL Details, ASH Analytics
Why AWR Wasn’t the Answer
• The “problem” was not visible
  – We expect to use CPU and to do I/O
• Did not want to alter AWR snapshot timing, but needed a finer-grained view of time
• Problem not related to workload change or data volumes; ETL was just degrading over time
Why ASH Was…
• Exposed competing PL/SQL procedures
• More definitive breakdown of data
• Zero in on problem time
• Session-level information
• Interested in impacts, not frequencies
ASH Report Targets CPU Spike
• Breakdown by the minute, by interval
• CPU spikes in a four-minute period
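Why a per-minute breakdown exposes what an hourly snapshot hides: a sketch using illustrative numbers (not the customer's data), with 2 active sessions all hour and 16 during a four-minute spike. ASH rows are bucketed by minute and compared with the single hourly average; `per_minute_aas` is a hypothetical helper:

```python
def per_minute_aas(sample_times_s, minutes, interval_s=1.0):
    """Bucket ASH rows (one per active session per sample, time in seconds
    from the window start) into one-minute intervals; return AAS per minute."""
    counts = [0] * minutes
    for t in sample_times_s:
        m = int(t // 60)
        if 0 <= m < minutes:
            counts[m] += 1
    return [c * interval_s / 60 for c in counts]


# Illustrative hour: 2 active sessions throughout, 16 during minutes 30-33.
rows = []
for second in range(3600):
    active = 16 if 30 * 60 <= second < 34 * 60 else 2
    rows.extend([second] * active)

per_minute = per_minute_aas(rows, minutes=60)
hourly = len(rows) / 3600      # what a one-hour snapshot average would show
print(max(per_minute), round(hourly, 2))   # 16.0 2.93
```

An hourly average below 3 active sessions looks unremarkable, while four minutes at 16 may well saturate the CPUs; ASH's per-sample timestamps are what allow the finer buckets without changing AWR snapshot timing.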
ASH Top SQL Exposes Oddities
• STATS_ADMIN??
• SQL Analyze??
Where does this SQL originate from?
EM Exposes Problem SQL Profiles
• EM Search SQL found multiple plans for critical ETL statements, with vastly different performance
• Clicking through the bad plan exposed the existence of a SQL Profile
• Oops, profiles are supposed to fix plans!
What Caused This?
• High-profile environment, very sensitive to change
• Stats collection using a custom wrapper over a deprecated Oracle package dating back prior to 9i (DBMS_ADMIN)
• Also using 11g stats collection (DBMS_STATS)
  – DBMS_ADMIN was deprecated for a reason!
• Analysis of object stats was providing poor data to the CBO
• Other automated maintenance window tasks were expensive and competed for resources at exactly the wrong time (i.e. ETL time)
Steps to Correct
• Migrated to DBMS_STATS for all stats collection
• Disabled jobs using the custom wrapper over DBMS_ADMIN
• Removed SQL Profiles impacting bad ETL plans
Additional steps taken:
• Migrated select b-tree indexes to bitmap indexes, also much-needed disk space
• Continued to review ASH, AWR and session SQL performance for improvement
Victory within reach…
Throughput improvement after stats gathering changes
Where They Are Today…
With further physical and logical tuning: 750 businesses per hour!! GOAL ACHIEVED
JB’S ASH ANALYTICS ADVENTURE
A SHORT SEQUENCE OF USING THE TOOL ON A REAL SYSTEM