Nested loop join technique - part2

Table of Contents

Background

Test Recipes

Test Cases and Results

It’s Number Time

Conclusion

References

Nested Loop Join Technique – Part 2 (What’s the new thing in 11g?)

Background

Oracle introduced some improvements in 11g to optimize the Nested Loop Join (NLJ). In addition, a new join technique was introduced, known as table batching. The impact of this technique is as good as that of table prefetching (introduced in 9i), which we saw in part 1 of this series. To be precise, the new technique works most efficiently with a non-unique index on sorted data (with a unique index, we won’t see much difference among the 3 techniques). In most cases, we will no longer see the classic NLJ in an 11g execution plan unless we request it explicitly with a SQL hint. In this exercise, we will look at the improvements Oracle made in 11g and at the difference between the batching and prefetching techniques.

Before we move forward, let’s look at how the execution plan differs among the 3 techniques (classic, prefetching and batching) and how we instruct Oracle to use each of them (details are below). With the batching technique, Oracle creates 2 NLJs. The first NLJ (the inner one) joins the outer (driving) table to the available index of the inner (or, if I may call it so, driven) table. The second NLJ joins that result to the inner table. What confuses me is that the second NLJ shows no cost, rows, or similar information. It looks as if the second NLJ exists for clarity (to make the plan more readable than the prefetching plan), even though the internal mechanism is different; we will see the details in the attached XLS file.

Classic Technique

Prefetching Technique

Batching Technique
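To make the two-NLJ shape of the batching plan easier to picture, here is a toy Python model. It is only a sketch of the plan shape described above, not Oracle's internal mechanism. All names (`driver`, `index`, `table`, `classic_nlj`, `batched_nlj`) and the sample data are hypothetical.

```python
# Toy model of two nested-loop-join shapes. All names and data are
# hypothetical; this does not reproduce Oracle's internal mechanism.

driver = [1, 2, 3]                                    # join keys from the outer (driving) table
index = {1: ["r1"], 2: ["r2", "r5"], 3: ["r3"]}       # inner index: key -> rowids
table = {"r1": "A", "r2": "B", "r3": "C", "r5": "E"}  # inner table: rowid -> row


def classic_nlj(driver, index, table):
    """One loop: probe the index and visit the table for each driver row."""
    result = []
    for key in driver:
        for rowid in index.get(key, []):
            result.append((key, table[rowid]))
    return result


def batched_nlj(driver, index, table):
    """Two steps, mirroring the two NLJs shown in the batching plan."""
    # First join: driver rows against the index only, collecting rowids.
    rowids = [(key, rowid) for key in driver for rowid in index.get(key, [])]
    # Second join: visit the inner table using the collected rowids.
    return [(key, table[rowid]) for key, rowid in rowids]


print(classic_nlj(driver, index, table))
print(batched_nlj(driver, index, table))
```

Both functions return the same rows. Only the order in which the index and the table are visited differs, which is the part the execution plan makes visible.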

Test Recipes

As a starting point, 5 tables will be created with 10,000 rows each and exactly 10 rows per block, using MINIMIZE RECORDS_PER_BLOCK. The purpose is to get round, predictable numbers. In addition to those tables, 4 indexes will be created on the 4 inner tables (every table except DRIVER). Each index will have BLEVEL=2 (I have to use PCTFREE=99 to force it), so the index height is 3 (root, branch, leaf). These are exactly the same steps that I followed during the 10g testing.

1. DRIVER, driving (outer) table.

2. T_UNIQ_SORTED, inner table with unique index on ID column and sorted data.

3. T_UNIQ_UNSORTED, inner table with unique index on ID column and scattered (randomly ordered) data.

4. T_NON_UNIQ_SORTED, inner table with non-unique index on ID column and sorted data.

5. T_NON_UNIQ_UNSORTED, inner table with non-unique index on ID column and scattered data.

Test Cases and Results

To make a “fair-enough” comparison, I follow the steps below in this exercise. The idea is to put as many blocks as possible in the buffer cache to minimize or completely remove physical I/O.

1. Flush buffer_cache.

2. Warm up the buffer by:
   a. Selecting all data from the outer table, DRIVER (full table scan).
   b. Scanning the inner table using index access (full index scan).

3. Begin the snapper process from a separate session.

4. Execute each test case with event 10046 turned on to trace SQL wait events and event 10200 to dump consistent gets activity.

5. End the snapper process.

It’s Number Time

The table below gives us enough information to see that there is only a small difference among the 3 techniques in 11g. Oracle made an optimization at the code level that affects all 3 techniques. This is not the case in 10g, where we can see differences in a few statistics. Let’s have a quick look at the session statistics below:

1. The result for all 3 techniques in 11g is equal or very close to the result of prefetching in 10g.

2. “consistent gets” is reduced from 42,000 (10g classic method) to 34,000 (10g prefetching and all techniques in 11g). The same thing happens for “cache buffers chains”.

3. Even though the “consistent gets” related statistics are similar to each other, “sql execute elapsed time” varies in 11g. The batching technique is the fastest one while classic NLJ is the slowest (it is slower than in 10g as well).

4. In 11g, “buffer is pinned count” varies, and the prefetching technique is able to pin more buffers than the other 2 techniques.

5. The new “consistent gets from cache (fastpath)” statistic in 11g is related to the system parameter “_fastpin_enable” (default 1 in 11g). This parameter controls how Oracle handles repeated access to a particular buffer for optimization.

I ran each of these exercises only once, so I might have missed something here. You are always welcome to rerun them and share your results with me.

Apart from that optimization, I am going to highlight one more statistic that also affects the total consistent gets: “SQL*Net roundtrips to/from client”. During the 10g and 11g tests, this statistic always gave similar results. If we look further, we will see the same number of consistent gets for the ROOT index block. That number (668) is close to the result of “SQL*Net roundtrips to/from client”. Below are the details.

So, where does the 668 come from? It is related to the array size in sqlplus. During the test, I used the default array size, which is 15 in my test environment. Since I have 10,000 records in my table and the array size is 15, Oracle has to send the result set in:

ceil(10,000 / 15) = 667 times

But there is 1 extra in the result. Don’t worry, it is common in the Oracle world to find a “plus one, +1” in a calculation (for example, see the “_table_scan_cost_plus_one” parameter), so 667 + 1 = 668. When I reran the test with an array size of 100, it gave me the result below.

Simple calculation: ceil(10,000 / 100) + 1 = 100 + 1 = 101. So we also need to consider the array size or fetch size (when we have a bulk operation), since it has an impact on the number of consistent gets.
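The calculation above can be written as a small Python helper. This is only a sketch of the arithmetic observed in this test: the function name `expected_roundtrips` is hypothetical, and the “+1” is the extra round trip seen in the results, not a documented formula.

```python
import math


def expected_roundtrips(rows, array_size):
    """ceil(rows / array_size) fetches plus the one extra seen in the test."""
    return math.ceil(rows / array_size) + 1


print(expected_roundtrips(10_000, 15))   # 668
print(expected_roundtrips(10_000, 100))  # 101
```

A larger array size means fewer fetches, and therefore fewer repeated visits to the same index root block.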

Conclusion

1. Table prefetching brings a significant improvement for a non-unique index in a nested loop join.

2. Oracle made some improvements in 11g and also introduced a new technique for NLJ. This makes a huge difference compared to 10g.

3. In 11g, Oracle introduces a new statistic, “consistent gets from cache (fastpath)”, which is related to the system parameter “_fastpin_enable” (default 1 in 11g). This parameter controls how Oracle handles repeated access to a particular buffer for optimization. However, this optimization is effective mainly for sorted data. For scattered data, Oracle won’t be able to optimize much, because consecutive values might be stored in different data blocks. So an index with a low clustering_factor will benefit from this optimization.

4. Consider the array size or fetch size, since it has an impact on consistent gets.
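To illustrate why sorted data gives a low clustering_factor, the sketch below uses a simplified model: walk the keys in index order and count how often the table block changes. The table size (10,000 rows, 10 rows per block) matches the test setup. The function name, the fixed seed and the random placement are assumptions made for illustration only.

```python
import random

ROWS = 10_000
ROWS_PER_BLOCK = 10


def clustering_factor(block_of_key):
    """Count block changes while walking the keys in index order."""
    changes = 0
    previous = None
    for block in block_of_key:
        if block != previous:
            changes += 1
            previous = block
    return changes


# Sorted data: consecutive keys share a block.
sorted_blocks = [key // ROWS_PER_BLOCK for key in range(ROWS)]

# Scattered data: the same rows placed in random order (fixed seed, hypothetical).
positions = list(range(ROWS))
random.Random(42).shuffle(positions)
scattered_blocks = [pos // ROWS_PER_BLOCK for pos in positions]

print(clustering_factor(sorted_blocks))     # 1000, one change per block
print(clustering_factor(scattered_blocks))  # close to ROWS; exact value depends on the shuffle
```

With sorted data, 10 consecutive keys hit the same block, so repeated access to an already visited buffer is common. With scattered data, almost every key lands in a different block than the previous one.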

References

http://hoopercharles.wordpress.com/2011/01/24/watching-consistent-gets-10200-trace-file-parser/

http://dioncho.wordpress.com/2010/08/16/batching-nlj-optimization-and-ordering/

http://blog.tanelpoder.com/2013/02/18/manual-before-and-after-snapshot-support-in-snapper-v4/

-heri-
