An Unlucky Query: Impacts of Full Table Refresh on Queries Requiring FTS




Page 1: An Unlucky Query

An Unlucky Query

Impacts of Full Table Refresh on Queries Requiring FTS

Page 2: An Unlucky Query

Query Environment

• DB: taxext (sp2-taoextdb)

• Table: TAO.DIM_TAO_ADVERTISER, list partitioned by column IS_CURRENT; two partitions in total.

• Table stats (as of 2012/03/25, old, but very close): rows: 2,118,910, blocks: 311,297. More than 90% of the data is in partition P1 (IS_CURRENT=1).

• Parameters: db_block_size=8192, db_file_multiblock_read_count=32
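The per-partition stats quoted above can be pulled from the data dictionary; a minimal sketch, assuming access to the usual DBA views:

```sql
-- Per-partition row/block stats for the table
select partition_name, num_rows, blocks, last_analyzed
  from dba_tab_partitions
 where table_owner = 'TAO'
   and table_name  = 'DIM_TAO_ADVERTISER'
 order by partition_position;
```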

Page 3: An Unlucky Query

The Query

select buyer_line_crt_id,
       to_char(surrogate_key) surrogate_key,
       null as dummy
  from TAO.DIM_TAO_ADVERTISER
 where is_current = 1

Page 4: An Unlucky Query

The Plan

------------------------------------------------------------------------------------------------------------
| Id | Operation              | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT       |                    |       |       | 58381 (100)|          |       |       |
|  1 |  PARTITION LIST SINGLE |                    | 2062K |   45M | 58381   (1)| 00:11:41 |  KEY  |  KEY  |
|  2 |   TABLE ACCESS FULL    | DIM_TAO_ADVERTISER | 2062K |   45M | 58381   (1)| 00:11:41 |   1   |   1   |
------------------------------------------------------------------------------------------------------------
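A runtime plan in this shape can be pulled for the cursor with DBMS_XPLAN; the sql_id below is a placeholder for the unlucky query's sql_id:

```sql
-- Fetch the runtime plan from the cursor cache; substitute the real sql_id
select * from table(dbms_xplan.display_cursor('&sql_id', null, 'TYPICAL'));
```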

Page 5: An Unlucky Query

Query Stats From AWR

DATE     TIME (SEC)   DISK_READS
08/07        24,276    3,555,310
08/09        25,225    3,799,893
08/11        35,056    3,734,663
08/13        38,365    4,072,043
08/14           237      301,607
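A per-day summary like this can be assembled from AWR; a sketch joining dba_hist_sqlstat to dba_hist_snapshot, with the sql_id as a placeholder:

```sql
select to_char(s.begin_interval_time, 'MM/DD') as "DATE",
       round(sum(q.elapsed_time_delta) / 1e6)  as "TIME (SEC)",
       sum(q.disk_reads_delta)                 as disk_reads
  from dba_hist_sqlstat  q
  join dba_hist_snapshot s
    on s.snap_id         = q.snap_id
   and s.dbid            = q.dbid
   and s.instance_number = q.instance_number
 where q.sql_id = '&sql_id'    -- placeholder for the unlucky query's sql_id
 group by to_char(s.begin_interval_time, 'MM/DD')
 order by 1;
```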

Page 6: An Unlucky Query

Questions

• Why are the elapsed times and disk_reads so different from the 08/14 run, when the query plan stays the same?

• Why did the query read more than 10 times as much data as the table holds? Note the query performs only a single table scan.

Page 7: An Unlucky Query

ASH Wait Events

Real-time tracking, top ASH counts (about 2.5 hours):

db file sequential read   9578
CPU                         57
gc cr disk read             41
direct path read            25

Why are there so many “db file sequential read” waits? Shouldn’t an FTS use “direct path read” or “db file scattered read”?
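The wait-event counts above can be reproduced from ASH; a sketch for roughly the last 2.5 hours, again with a placeholder sql_id:

```sql
-- ON CPU samples have a null event, hence the NVL
select nvl(event, 'CPU') as event, count(*) as ash_samples
  from v$active_session_history
 where sql_id = '&sql_id'    -- placeholder
   and sample_time > systimestamp - interval '150' minute
 group by nvl(event, 'CPU')
 order by ash_samples desc;
```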

Page 8: An Unlucky Query

What is the Query Reading With “db file sequential read”?

v$session snapshot

row_wait_obj#: 0

p1: 3 (file#)

1. There is no object with object_id 0. Usually it means some system-related data.

2. From dba_data_files, file_id=3 belongs to tablespace UNDOTBS1.

3. ASH from AWR for this query shows “db file sequential read” with current_obj#=0 as the top event, far ahead of all others.
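The file-to-tablespace mapping in step 2 can be done in one pass against the live session; :sid is the session running the unlucky query:

```sql
-- Map the wait's p1 (file#) to a tablespace while the session is waiting
select s.event, s.p1text, s.p1, s.row_wait_obj#, f.tablespace_name
  from v$session s
  left join dba_data_files f on f.file_id = s.p1
 where s.sid = :sid;    -- sid of the session running the query
```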

Page 9: An Unlucky Query

Why did the query need so many UNDO reads?

• UNDO is used by Oracle for consistent reads: the query reads only data committed just before the query started. If a data block has a newer SCN, Oracle looks up UNDO records to reconstruct the data as of the query's start SCN (not some past image or the original block). If Oracle cannot find appropriate UNDO for this purpose, it raises an ORA-01555 error.

• Note that for this query, almost every UNDO record applied required a physical block read. Below is a snapshot from v$sesstat.

• The “physical reads direct” is the actual read from disk for the table data itself.

Statistic                                              Value       Unit
data blocks consistent reads - undo records applied    4,643,835   records
physical reads                                         3,605,637   blocks
physical reads direct                                    281,237   blocks
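The three counters above come from v$sesstat joined to v$statname; a sketch, with :sid again standing for the session running the query:

```sql
select n.name, t.value
  from v$sesstat  t
  join v$statname n on n.statistic# = t.statistic#
 where t.sid = :sid    -- sid of the session running the query
   and n.name in ('data blocks consistent reads - undo records applied',
                  'physical reads',
                  'physical reads direct');
```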

Page 10: An Unlucky Query

Data Block and UNDO Block

Data Block            UNDO Block
----------            ----------
Header and Summary    Control Section
ITL List              Record Directory
Space Summary         Free Space
Row Directory         Record Heap
Free Space
Row Heap

Page 11: An Unlucky Query

ITL (Interested Transaction List)

Column Description

Itl      The array index for the list.

Xid      The transaction id of a recent transaction that modified this block: (undo segment#).(undo slot).(undo sequence number)

Uba      Undo record address: (absolute block address).(block sequence number).(record within block)

Flag     Transaction state.

Lck      Number of rows locked by this transaction in this block.

Scn/Fsc  Committed SCN, or the number of bytes of free space that would become available if this transaction committed.
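The ITL of a concrete block can be inspected with a block dump; the file#/block# pair can come from any rowid (the numbers in the dump command below are placeholders):

```sql
-- Locate a block of interest from a rowid
select dbms_rowid.rowid_relative_fno(rowid) as file#,
       dbms_rowid.rowid_block_number(rowid) as block#
  from TAO.DIM_TAO_ADVERTISER
 where rownum = 1;

-- Dump that block to the session trace file; the trace shows the
-- Itl / Xid / Uba / Flag / Lck / Scn/Fsc columns described above
alter system dump datafile 7 block 12345;   -- substitute the real file#/block#
```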

Page 12: An Unlucky Query

How Is UNDO Used for Consistent Reads?

• Given a DBA (data block address), Oracle reads the block from disk or the buffer cache.

• If the block has an SCN newer than the query requires, Oracle checks the block's ITL list.

• Oracle clones the block in memory, uses the content of the first ITL entry to find an UNDO block (which may be in memory or on disk), and applies its content (the related ITL entry is removed from the clone). The procedure is repeated until the SCN satisfies the query's requirement.

• It is possible that Oracle has to read multiple UNDO blocks from disk to construct one consistent read (CR) block.

Page 13: An Unlucky Query

A Possible Scenario for the Unlucky Query

• Oracle reads a set of blocks (for example, 32) with direct path read.

• Inside each block there is approximately a different ITL entry per row, each referring to a different UNDO block (note the data is updated row by row, not block by block, so the rows inside one block can be updated at different times, by the same or different transactions).

• In the end, for each block, the number of UNDO blocks Oracle must read is at least close to the row count. If the same row was updated multiple times while the query was reading, Oracle has to read a whole chain of UNDO blocks just to recover the original data for that one row.

• The UNDO block reads from disk are single-block reads. They are inefficient because of the many round trips.

Page 14: An Unlucky Query

Data From AWR

Disk_reads  Buffer_gets  Direct_writes  Rows     Blocks/row  Snap Start Time
560,142     1,026,784    0              807,001  0.69        08/13 19:00:00
553,892       933,813    0              465,000  1.19        08/13 20:00:00
494,847       661,918    0              210,000  2.36        08/13 21:00:00
484,243       661,642    0              150,000  3.22        08/13 22:00:00
480,869       653,183    0              117,000  4.11        08/13 23:00:00
100,370       244,330    0               36,000  2.79        08/14 00:00:00
223,669       301,585    0               45,000  4.97        08/14 01:00:00
334,939       453,387    0               63,000  5.31        08/14 02:00:00
449,861       594,926    0               69,000  6.52        08/14 03:00:00
489,225       639,530    0               66,000  7.41        08/14 04:00:00
308,843       392,100    0               37,677  8.2         08/14 05:00:00

Here is the snap-by-snap summary from AWR dba_hist_sqlstat for one execution. Except for one snap (08/14 00:00:00), the number of blocks read from disk per table row grows almost linearly, by roughly one additional block every hour.
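The Blocks/row column can be computed directly from the dba_hist_sqlstat deltas; a sketch with a placeholder sql_id:

```sql
select s.begin_interval_time          as snap_start,
       q.disk_reads_delta             as disk_reads,
       q.buffer_gets_delta            as buffer_gets,
       q.rows_processed_delta         as rows_processed,
       round(q.disk_reads_delta
             / nullif(q.rows_processed_delta, 0), 2) as blocks_per_row
  from dba_hist_sqlstat  q
  join dba_hist_snapshot s
    on s.snap_id         = q.snap_id
   and s.dbid            = q.dbid
   and s.instance_number = q.instance_number
 where q.sql_id = '&sql_id'    -- placeholder
 order by s.begin_interval_time;
```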

Page 15: An Unlucky Query

The Root Cause

• Table TAO.DIM_TAO_ADVERTISER is refreshed each hour.

• The refresh is not a simple update with a small number of records added or updated.

• The refresh updates almost every record of the concerned partition, using MERGE.

• The update order is by row, not by block, so the rows of the same block end up in different UNDO blocks.

• The concerned query most likely started near the end of one round of table refresh.

• The longer the unlucky query runs, the more UNDO blocks it needs to read to recover one row, because more and more changes have been applied to each single row.

• The exception in the table on the last page was caused by the refresh job itself running for more than one hour.
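A minimal sketch of the kind of hourly MERGE refresh described above, with a hypothetical staging table STG_TAO_ADVERTISER (the real refresh statement is not shown in the deck); since nearly every current row matches, nearly every row is updated, one row at a time from UNDO's point of view:

```sql
merge into TAO.DIM_TAO_ADVERTISER d
using STG_TAO_ADVERTISER s            -- hypothetical staging table
   on (d.surrogate_key = s.surrogate_key and d.is_current = 1)
 when matched then
   update set d.buyer_line_crt_id = s.buyer_line_crt_id
 when not matched then
   insert (surrogate_key, buyer_line_crt_id, is_current)
   values (s.surrogate_key, s.buyer_line_crt_id, 1);
```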

Page 16: An Unlucky Query

Work Around

• Since the concerned query can complete within 4 minutes, here are two workarounds:

• Aggressive: lock the table (with wait) before running the query, and release the lock after the query is done.

• Conservative: lock the table (with wait), release it, and run the query immediately. The table refresh query takes a while for its join operation, so its first row update will come well after the 4 minutes.
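The aggressive variant can be sketched as follows. LOCK TABLE ... IN EXCLUSIVE MODE waits by default, so it naturally queues behind any in-flight refresh and then blocks the next one until the lock is released:

```sql
-- Aggressive workaround: hold off the refresh while the query runs
lock table TAO.DIM_TAO_ADVERTISER in exclusive mode;   -- waits for a running refresh

select buyer_line_crt_id, to_char(surrogate_key) surrogate_key, null as dummy
  from TAO.DIM_TAO_ADVERTISER
 where is_current = 1;

commit;   -- releases the table lock
```

For the conservative variant, the commit goes immediately after the LOCK TABLE, and the query is started right away.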