Common Shared Pool Issues

Page 2: Shared Pool

Why Bother?

• The shared pool is central to keeping Oracle running smoothly and efficiently.

• ORA-04031 errors
– When free shared pool memory is low
– When free memory is fragmented

• Database hangs
– When the data dictionary cache (row cache) runs into problems, for example a large number of purge activities

• How are the issues usually dealt with?
– Flush the shared pool
– Bounce the instance
– Repeat the above later

Page 3: Shared Pool

Shared Pool Top Consumers

First things first: find the top consumers.

Data Dictionary Cache (Row Cache)

Partition maintenance related

SQL Cursors

RAC related
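As a minimal sketch of the "find the top consumers" step, the snippet below ranks shared pool components by size. It assumes the rows were already fetched from v$sgastat (which exposes POOL, NAME and BYTES); the query text is shown for context only, and both the helper name `top_consumers` and the sample figures are hypothetical.

```python
# Query text for context only; it is not executed in this sketch.
TOP_CONSUMERS_SQL = """
SELECT name, bytes
  FROM v$sgastat
 WHERE pool = 'shared pool'
 ORDER BY bytes DESC
"""

def top_consumers(rows, n=5):
    """Return the n largest (name, bytes, percent_of_total) entries."""
    total = sum(size for _, size in rows)
    if total == 0:
        return []
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)[:n]
    return [(name, size, round(100.0 * size / total, 1)) for name, size in ranked]

MB = 2**20
# Hypothetical sample rows (name, bytes); not taken from a real instance.
sample = [
    ("free memory", 400 * MB),
    ("KQR L PO", 1200 * MB),
    ("SQLA", 1700 * MB),
    ("PRTMV", 500 * MB),
    ("KGLH0", 200 * MB),
]

for name, size, pct in top_consumers(sample, n=3):
    print(f"{name:<12} {size / MB:>8.0f} MB {pct:>5.1f}%")
```

Tracking the same ranking over time shows which component is growing, which is usually more telling than a single reading.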

Page 4: Shared Pool

Row Cache – KQR L PO

Next step: the devil is in the details.

1. KQR L PO: KQR - kernel query layer row cache, L – large (S – small, M - medium, X – extra large), PO - parent object (SO – subordinate object).

2. Size per entry: S – 256 bytes, M – 512 bytes, L – 1024 bytes, X – not sure.

3. Typical KQR L PO: dc_objects (dc_object_ids), dc_segments, dc_histogram_defs.

4. KQR L PO too big?

• dc_objects: too many table/index partitions and subpartitions

• dc_segments: too many table/index partitions and subpartitions

• dc_histogram_defs: looks like too many histograms. Further research points to column statistics at the global, partition and subpartition levels.
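To give a feel for how partition counts translate into row cache memory, here is a rough back-of-the-envelope sketch using the per-entry sizes listed above. It assumes one entry per cached object and ignores any allocation overhead; the helper name and the entry count are hypothetical.

```python
# Per-entry sizes in bytes as listed above; X is left out because its size is unknown.
KQR_ENTRY_BYTES = {"S": 256, "M": 512, "L": 1024}

def estimate_kqr_bytes(entry_count, size_class="L"):
    """Rough size estimate: number of entries times the per-entry size."""
    return entry_count * KQR_ENTRY_BYTES[size_class]

# Hypothetical: two million cached partition/subpartition entries of class L.
entries = 2_000_000
print(f"{entries} L entries ~ {estimate_kqr_bytes(entries) / 2**20:.0f} MB")
```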

Page 5: Shared Pool

The Impact of Very Large Row Cache

• When free memory is low or fragmented, the instance slows down or hangs, and queries against GV$ views hang.

• This happens when one or more operations or queries need a large amount of shared pool memory or want to bring a large number of data dictionary entries into the row cache.

• Oracle tries to purge row cache entries to free memory, and the LCK0 process becomes busy.

• The symptoms are a combination of the following waits:
– latch: row cache objects
– latch: shared pool
– row cache lock
– library cache lock
– DFS lock handle
– reliable message

Page 6: Shared Pool

Find the Source of ROW CACHE

• dc_objects, dc_object_ids and dc_segments
– Using dba_objects, count and group by owner and object_name.

• dc_histogram_defs
– Use a count on hist_head$ to get a rough idea.
– Using the aggregated views dba_part_col_statistics or dba_subpart_col_statistics, count and group by owner and table name.
– Check whether global stats for the tables concerned are missing.

• dc_histogram_data
– Using dba_part_histograms and dba_subpart_histograms, count and group by owner and table name.
– Check the stats-gathering procedures, especially for partition-level and subpartition-level stats.

Page 7: Shared Pool

Who Triggered Large Row Cache Activities?

• Using fixed views
– Look for SQL statements with high execution counts that touch obj$, seg$ and hist_head$, usually with sys as the parsing schema.
– Track the change in KQR L PO together with the execution count of those recursive queries.
– Because those are recursive calls, we need to find the user queries that trigger them. If we suspect particular queries, we should look for a chance to trace them while no other types of queries are running.

• Using AWR
– dba_hist_sgastat: look for large changes between snaps for individual entries.
– dba_hist_rowcache_summary: look for large changes in total_usage between snaps for each row cache, and also pay attention to dlm_releases (purge) and dlm_requests.
– For the snaps with large dlm_releases, use dba_hist_sqlstat to check for new queries with a large SQL area, or new queries that could access tables with a large number of partitions/subpartitions.
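The AWR counters are cumulative, so "large changes by snaps" means differencing consecutive snapshots. The sketch below shows one way to do that on rows already pulled from dba_hist_rowcache_summary. It assumes a single instance (on RAC the key would also need instance_number); the function name, threshold and sample numbers are hypothetical.

```python
def snapshot_deltas(rows, threshold):
    """Flag snapshots with a large jump in DLM requests or releases.

    rows: (snap_id, parameter, dlm_requests, dlm_releases), cumulative counters.
    Returns (snap_id, parameter, requests_diff, releases_diff) tuples where
    either diff is at least `threshold`.
    """
    last = {}
    flagged = []
    for snap_id, param, req, rel in sorted(rows):
        if param in last:
            d_req = req - last[param][0]
            d_rel = rel - last[param][1]
            # A negative diff means the counters were reset (instance restart); skip it.
            if d_req >= 0 and d_rel >= 0 and max(d_req, d_rel) >= threshold:
                flagged.append((snap_id, param, d_req, d_rel))
        last[param] = (req, rel)
    return flagged

# Hypothetical sample data, not taken from a real AWR repository.
rows = [
    (100, "dc_histogram_defs", 1000, 0),
    (101, "dc_histogram_defs", 1500, 10),
    (102, "dc_histogram_defs", 90000, 60000),
    (100, "dc_segments", 500, 0),
    (101, "dc_segments", 700, 0),
    (102, "dc_segments", 900, 5),
]

for hit in snapshot_deltas(rows, threshold=10000):
    print(hit)
```

The flagged snap_id values are the ones worth cross-checking against dba_hist_sqlstat for the same interval.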

Page 8: Shared Pool

1. Here the dc_histogram_defs DLM_REQUESTS count is very close to the execution count of SQL 96g93hntrzjtr. Its usage count is smaller because there is also a DLM_RELEASES count, which usually means row cache purging. If the SQL execution count and these row cache counts increase or decrease sharply within a very short period, we should check which types of queries are the likely troublemakers.

2. The data from dc_segments is linked to another query 9tgj4g8y4wy8.

Page 9: Shared Pool

1. Here the dc_segments DLM_REQUESTS count is very close to the execution count of SQL 2ym6hhar30r73. Its usage count is smaller because there is also a DLM_RELEASES count, which usually means row cache purging. If the SQL execution count and these row cache counts increase or decrease sharply within a very short period, we should check which types of queries are the likely troublemakers.

2. The data from dc_object_ids is linked to another query, 5ngzsfstg8tmy.

3. I am not sure why I got service SYS$BACKGROUND for SQL 6cr55dpp3n44a. It looks like it is linked to dc_objects.

Page 10: Shared Pool

New entries were added. Pay attention to the times with large DLM_REQUESTS_DIFF and/or DLM_RELEASE_DIFF.

Short of free memory, Oracle started to purge older entries. When the purge count is large, the LCK0 process becomes very busy, hence the DB slows down or hangs.

Stat job started during this snap.

Page 11: Shared Pool

Possible Workarounds

• Reduce sudden large shared pool memory requests. For example, in PIEDB the issue is large parallel query cursors that span instances. Reducing the DOP or keeping the query from crossing instances will reduce sharable memory usage.

• Use node affinity to split the tasks across different nodes, so that each task always runs on one node except on failover. This way, a single node does not receive all the requests for all the (unnecessary) row cache entries. It also makes it much less likely that one task will squeeze out the row cache entries of other tasks.

Page 12: Shared Pool

Large PRTMV

1. Bug 9578670 - ORA-4031 from frequent Partition Maintenance Operation [ID 9578670.8]

2. When Oracle adds/drops a partition, the library cache object related to the table becomes invalid.

3. Inside v$db_object_cache, we will see multiple entries for the table name with type MULTI-VERSIONED OBJECT. (In 10g, they show up with type invalid.)

4. Unfortunately, memory leaks cause these invalid objects to stay inside the shared pool forever.

Page 13: Shared Pool

SQLA - High Version Counts

1. SQLA of the instance (acpm51p1) is at 1,717MB. SQL 193t90vhkylj4 uses 1,534MB, with version count of 18,191.

2. SQL with high shared memory usage can be found using v$sqlarea.

3. The reason for a high version count can be found using v$sql_shared_cursor. This takes a long time if a SQL has a very high version count, so it is better to check individual child cursors by specifying child_number.

4. In this case, the high version count is caused by PX_MISMATCH, an issue when using gv$ views. Other important reasons include PQ_SLAVE_MISMATCH, ROLL_INVALID_MISMATCH and BIND_MISMATCH. 11g also introduced another annoying mismatch: USE_FEEDBACK_STATS from cardinality feedback.

5. One type of PX_MISMATCH and PQ_SLAVE_MISMATCH is caused by a default parallel degree set on a small table or its indexes, when the query itself is also very small.
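Once a handful of child cursors have been sampled from v$sql_shared_cursor, the dominant reason can be found by tallying the columns flagged 'Y'. This is a minimal sketch, assuming each sampled row is represented as a dictionary of column name to value; the helper name and the sample rows are hypothetical.

```python
from collections import Counter

def mismatch_reasons(child_rows):
    """Count how many child cursors have each mismatch column set to 'Y'."""
    counts = Counter()
    for row in child_rows:
        for column, flag in row.items():
            if flag == "Y":
                counts[column] += 1
    return counts

# Hypothetical sample of three child cursors, showing only a few columns.
sample_children = [
    {"CHILD_NUMBER": 0, "PX_MISMATCH": "N", "BIND_MISMATCH": "N", "ROLL_INVALID_MISMATCH": "N"},
    {"CHILD_NUMBER": 1, "PX_MISMATCH": "Y", "BIND_MISMATCH": "N", "ROLL_INVALID_MISMATCH": "N"},
    {"CHILD_NUMBER": 2, "PX_MISMATCH": "Y", "BIND_MISMATCH": "Y", "ROLL_INVALID_MISMATCH": "N"},
]

for reason, count in mismatch_reasons(sample_children).most_common():
    print(reason, count)
```

Sampling a limited range of child_number values keeps the query against v$sql_shared_cursor cheap, in line with the note above.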

Page 14: Shared Pool

1. No single dominant SQL or high version count explains the high shared memory usage.

2. The major consumers of SQL area are CTAS DDL queries. Not sure why Oracle wants to cache so many DDL queries.

3. The impact: the normal user queries are aged out very quickly.

Page 15: Shared Pool

1. A lot of queries look similar, with the same plan hash value but different SQL_IDs: a typical sign that bind variables are not being used.

2. The query patterns can be identified using v$db_object_cache.
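The "same plan hash value, different SQL_ID" pattern can be spotted by grouping cursors by plan hash value. The sketch below assumes (sql_id, plan_hash_value) pairs already fetched from a view such as v$sqlarea; the helper name, the cut-off and the sample identifiers are hypothetical.

```python
from collections import defaultdict

def literal_sql_suspects(rows, min_sql_ids=3):
    """Return (plan_hash_value, distinct_sql_id_count) pairs, largest first,
    for plans shared by at least `min_sql_ids` different SQL_IDs."""
    by_plan = defaultdict(set)
    for sql_id, plan_hash in rows:
        by_plan[plan_hash].add(sql_id)
    suspects = [(plan, len(ids)) for plan, ids in by_plan.items() if len(ids) >= min_sql_ids]
    return sorted(suspects, key=lambda item: item[1], reverse=True)

# Hypothetical sample; these are not real SQL_IDs or plan hash values.
cursor_rows = [
    ("a1", 111), ("a2", 111), ("a3", 111), ("a4", 111),
    ("b1", 222),
    ("c1", 333), ("c2", 333), ("c3", 333),
]

print(literal_sql_suspects(cursor_rows))
```

A plan hash value shared by many SQL_IDs is only a hint; the statement texts still need to be compared to confirm they differ only in literals.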

Page 16: Shared Pool

ORA-04031 When Shared Pool Has Large Free Memory

• Oracle divides the shared pool into sub pools, depending on the CPU count (one sub pool for each 4 CPUs, up to 7) and available memory. The purpose is to have more shared pool latches and so reduce latch contention.

• The related hidden parameter is _kghdsidx_count.

• Memory usage might not be balanced among the sub pools.

• The example here (PIEDB) has 4 subpools. Index 0 is common or unallocated memory. Once its free memory reaches zero, an ORA-04031 error can occur when one subpool (for example 2) has very little free memory, even though another subpool (4 here) has plenty.

• The workaround is to reduce the subpool count, even to a single one.
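The imbalance described above can be checked mechanically once free memory per subpool is known. This is a minimal sketch of that check, assuming the per-subpool free bytes have already been collected by some means; the function name, the low-water mark and the sample figures are hypothetical.

```python
def starved_subpools(free_bytes_by_subpool, low_water):
    """Return subpool indexes at risk of ORA-04031.

    Index 0 is the common/unallocated area. While it still has free memory
    nothing is flagged; once it is exhausted, any subpool whose free memory
    is below `low_water` is reported, regardless of how much the others have.
    """
    if free_bytes_by_subpool.get(0, 0) > 0:
        return []
    return sorted(
        idx for idx, free in free_bytes_by_subpool.items()
        if idx != 0 and free < low_water
    )

MB = 2**20
# Hypothetical figures: subpool 2 is nearly empty while subpool 4 has plenty.
free_by_subpool = {0: 0, 1: 300 * MB, 2: 5 * MB, 3: 150 * MB, 4: 900 * MB}

print(starved_subpools(free_by_subpool, low_water=16 * MB))
```

With a single subpool this situation cannot arise, which is why reducing the subpool count works as a workaround at the cost of fewer shared pool latches.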