Index Aware

Raising Index Awareness

Date Version Notes

20th Jan 2005 Listed full details of all ERs and bugs that have been raised as a result of this document’s creation. Removed some of the TODO (outstanding items) but left some there for later checking and to make sure readers know there is some ambiguity.

19th Jan 2005 Corrected note on bottom of page 29 to clarify when GROUP BY clause is added

25th Oct 2004 First version released

Permission to use this document is authorized, provided that A) the use of the document is for informational and non-commercial purposes, and B) the document retain the copyright notice contained in this disclaimer. Business Objects may modify this document at any time and without notice. THIS DOCUMENT IS DELIVERED “AS IS” AND WITHOUT WARRANTY OF ANY KIND INCLUDING, BUT NOT LIMITED TO, ALL IMPLIED WARRANTIES AND CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL BUSINESS OBJECTS, ITS SUPPLIERS OR OTHER THIRD PARTIES MENTIONED HEREIN BE LIABLE FOR ANY DAMAGES ARISING FROM THE USE OF THIS DOCUMENT INCLUDING, BUT NOT LIMITED TO, SPECIAL, INDIRECT, PUNITIVE OR CONSEQUENTIAL DAMAGES.

The information in this document is subject to change without notice. If you find any problems with this article, please report them using the feedback link on this site or at [email protected]. Business Objects does not warrant that this document is error free.

Copyright © Business Objects 2000. All rights reserved.

Portions © Copyright 1996, Microsoft Corporation. All rights reserved.

Trademarks:

The Business Objects logo, BusinessMiner, BusinessQuery, and WebIntelligence are registered trademarks of Business Objects S.A.

The Business Objects tagline, Broadcast Agent, BusinessObjects, Personal Trainer, Rapid Deployment Templates, and Set Analyzer are trademarks of Business Objects S.A.

Microsoft, Windows, Windows NT, Access, Microsoft VBA, the Visual Basic Logo and other names of Microsoft products referenced herein are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

Oracle is a registered trademark of Oracle Corporation. All other names of Oracle products referenced herein are trademarks or registered trademarks of Oracle Corporation.

All other product and company names mentioned herein are the trademarks of their respective owners.

This software and documentation is commercial computer software under Federal Acquisition regulations, and is provided only under the Restricted Rights of the Federal Acquisition Regulations applicable to commercial computer software provided at private expense. The use, duplication, or disclosure by the U.S. Government is subject to restrictions set forth in subdivision (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at 252.227-7013.

U.S. Patent No. 5,555,403

Steven White Raising Index Awareness Page 2 of 71

Table of Contents

Introduction
Database performance concepts
Examples in this document
Example 1 – Use of a table’s primary key
Example 2 – Avoiding joins or tables
Example 3 – Multiple Foreign Key entries
Example 4 – Unique and non-unique values
Example 5 – Using a Primary Key object in RESULTS pane
Example 6 – Using an Index Awareness WHERE clause
Example 7 – Multi-column primary keys
Example 8 – Affect of Row Based Restrictions
KEYS dialog explained
SQL editor dialog
Limitations and rules
Appendix A – Bugs and Enhancement Requests
Appendix B – Accessing Index Aware information
Appendix C – Outstanding Items


Introduction

This document explores the new feature available in Designer 6.5.1 called Index Awareness. This feature can help make SQL queries faster by redirecting WHERE clause restrictions to columns that are indexed and/or by generating a query that references fewer tables.

Who should read this?

Business Objects universe and report designers who wish to improve the performance of their reports. Database schema designers and DBAs who support Business Objects developers would also benefit from understanding the concepts in this document.

Environments used for this article

Windows 2000 SP3
Business Objects 6.5.1 GA (no Hotfixes or CSPs were installed)
Microsoft Access 2002 SP2 (for most database interactions)
SQL Server 2000 (for some database interactions)

Pre-requisites

The reader should be very familiar with Business Objects universe and report building concepts. Familiarity with RDBMS concepts and database performance tuning would also be advantageous.

Disclaimer

The functionality and features detailed herein are based on the author’s understanding of the software version stated earlier. Every attempt has been made to create an accurate document by testing workflows and confirming understanding with corporate development, but the reader assumes all risk in using the information contained herein.

Resources needed

The following documents or files will be needed to perform the task:

Filename: Combined.unv
Item description: Universe for reporting on security and universe domains of the repository

Filename: Club.mdb
Item description: Demo Islands Resort Marketing MS Access database that is delivered with Business Objects 5.x and 6.x

Other universe performance techniques

Index Awareness is only one of the many techniques available in universe design to improve the performance of queries. Please check out the following in the relevant documentation and Knowledge Base articles:

Aggregate Awareness
Shortcut joins


Database performance concepts

This document does not intend to explain database performance tuning concepts in any depth, but the terminology used in later sections of this document is explained here to avoid any confusion.

Database index
Primary Key
Foreign Key
Surrogate keys
Partitions
Referential Integrity
Influencing the database optimizer
Tuning philosophy

Database Index

A database construct which allows faster access to rows in a table. Indexes are defined by the database designer and maintained automatically by the RDBMS. Since indexes are maintained in a sorted sequence, the database can quickly locate a row or rows in the table.

Primary Key

A database designer defined column or columns which uniquely identify a row in a table. If the RDBMS is told about the Primary Key during database design, it will automatically maintain an index to help it guarantee uniqueness and provide referential integrity.

Foreign Key

A database designer defined column or columns which match the Primary Key of another table. This facilitates referential integrity and provides likely columns for joins to a table that contains the Primary Key or other tables with Foreign Key instances.

Surrogate keys

A common Data Warehouse type of column where the actual Primary Key and Foreign Key values are referenced via a smaller, faster column type i.e. a number instead of a longer character type.

This allows joins and indexes to be faster because indexes are smaller in size (same rows, less width) and therefore less memory and/or disk space is consumed during their navigation.

Partitions

A way of dividing the physical layout of a table across separate disk locations to avoid contention when accessing the table via the disk system. Usually the partitioning is based on a commonly accessed dimension such as time e.g. break a fact table into separate partitions based on the year\month of the transactions.

Referential Integrity

This is not really a performance tuning term but it is related because this RDBMS feature requires Foreign Keys and Primary Keys to exist.

Influencing the database optimizer

The optimizer, if it exists at all in your database, follows a set of internal rules which help it determine which path would result in the fastest query execution. It uses the database constructs detailed earlier and knowledge of the tables and query to make its decision.

There is no guarantee that the database optimizer will use any specific database performance improving construct just because the user has provided the opportunity for it to do so e.g. just because an index could be used does not mean it will be used. The database optimizer may decide that other rules, limits and conditions apply and it will choose a different route to the data.

Although Index Awareness will help the optimizer make a better decision, it is no guarantee that a Primary Key or Foreign Key entry (see later in document) will cause the index to be used.


As with any performance tuning exercise, monitoring of the RDBMS path to the data (sometimes called the ‘explain plan’) is vital. The DBA must also keep RDBMS statistics up to date e.g. how large are the tables? What is the cardinality of the values?
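As a rough illustration of checking the path to the data, the sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement. SQLite is an assumption made purely for convenience; it is not one of the databases used in this document, and the small City table here (columns city_id, city, region) is only an illustrative stand-in. The plan text will look different on your own RDBMS.

```python
import sqlite3

# In-memory SQLite database; an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE City (city_id INTEGER PRIMARY KEY, city TEXT, region INTEGER)"
)
conn.executemany(
    "INSERT INTO City VALUES (?, ?, ?)",
    [(10, "Houston", 20), (11, "Dallas", 20), (12, "San Francisco", 21)],
)


def plan(sql):
    """Return the optimizer's plan descriptions for a query."""
    # The fourth column of each EXPLAIN QUERY PLAN row holds the description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Restricting on the unindexed name column versus the primary key.
by_name = plan("SELECT city FROM City WHERE city = 'Dallas'")
by_key = plan("SELECT city FROM City WHERE city_id = 11")

print(by_name)  # expected to describe a full scan of City
print(by_key)   # expected to describe a search using the primary key
```

Comparing the two plans before and after a change is the point of the exercise: the same habit applies to whatever plan tool your database provides.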

Tuning philosophy

Although no one will dispute that using the Primary Key of a table is a good thing, there seem to be 2 opinions on whether tuning should avoid dimension tables or use them explicitly.

Avoid dimension tables

This method suggests that all conditions should be directed to the ‘weightiest table’ i.e. the largest, which is usually the fact table in the database. This helps reduce the number of tables, and therefore joins, in a query because joins are considered very database resource intensive.

Once the restrictions have been processed by the fact table, the database will join out to the smaller dimension tables.

If this is your intention then you should set up your Index Awareness with Foreign Key entries that will reduce the number of tables when the weightiest tables are involved.

Pro : fewer joins and tables
Con : possibly more rows in the join between fact and dimension tables

Use dimension tables

Alternatively, this method suggests that the aim of the optimizer is to reduce the number of rows that join into the fact table as a priority over the number of tables and joins.

Leaving conditions on the dimension tables causes the database to restrict the smaller dimension tables first and then join into the fact table with fewer rows.

Pro : fewer rows in the join to the larger fact table
Con : more tables and joins in the query

If this is your intention then you should set up your Index Awareness with fewer Foreign Key entries that directly point to the larger tables in the schema.

Getting more information on tuning

Ask your DBA (database administrator) for more information on these concepts.


Examples in this document

Most of the examples in this document use a variation of ‘club.mdb’ (aka Islands Resort Marketing) and ‘beach.unv’, the latter of which has been renamed ‘IndxAwre.unv’.

The performance improvements discussed in this document are theoretical. Because the ‘club.mdb’ is so small and Microsoft Access is not considered a production capable database, it is unlikely you will notice any performance improvement in running the small queries that ‘club.mdb’ is capable of.

Any LOV that are shown in this document that involve editing the LOV (DESIGNER\OBJECT PROPERTIES\PROPERTIES\EDIT) have sorts applied to the objects to allow the hierarchical view of the LOV dialog to display values without duplicates. This step is discussed in another document by the same author.

Note : MS Access may have an ‘explain plan’ feature (mentioned earlier) but this was not investigated by the author.

Schema and its indexes

The following schema (club.mdb) will be used for the examples:


club.mdb

Example 1 – Use of a table’s primary key

Create a query that returns Customer and Revenue

The SQL generated will be:

SELECT Customer.last_name, sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price)
FROM Customer, Invoice_Line, Service, Sales
WHERE ( Customer.cust_id=Sales.cust_id )
  AND ( Sales.inv_id=Invoice_Line.inv_id )
  AND ( Invoice_Line.service_id=Service.service_id )
GROUP BY Customer.last_name

Now add a condition that restricts it to return only customers from Dallas, Houston, Los Angeles, San Diego and San Francisco.



The SQL will be:

SELECT Customer.last_name, sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price)
FROM Customer, Invoice_Line, Service, City, Sales
WHERE ( City.city_id=Customer.city_id )
  AND ( Customer.cust_id=Sales.cust_id )
  AND ( Sales.inv_id=Invoice_Line.inv_id )
  AND ( Invoice_Line.service_id=Service.service_id )
  AND ( City.city IN ('Dallas', 'Houston', 'Los Angeles', 'San Diego', 'San Francisco') )
GROUP BY Customer.last_name

Everything is as it should be.

But what if we knew that the ‘city’ column is not indexed and therefore is not a good candidate for WHERE clause restrictions? (see schema description earlier)

We also know that column ‘city_id’ is the primary key of the city table but more so, it is indexed. From our discussions on good performance earlier we determined that using the primary key of a table is one of the fastest ways of retrieving the rows of that table.

So if we could get the SQL to automatically use the Primary Key of the city table, we could achieve a faster retrieval of the records.

Note : Of course we could change the object’s SELECT definition to point to that Primary Key column (city_id) but that means we expect users to know what those Primary Key values mean. Since Primary Key and Foreign Key values are typically based on surrogate keys, and surrogate keys are automatically allocated by the ETL or RDBMS itself, an end user will not normally know what Primary Key or Foreign Key value relates to a real live value that they understand


Setting up the Index Awareness

1. Open Designer
2. Open the relevant universe e.g. IndxAwre.UNV
3. Locate the City object (in the Customer class)
4. Right click on the City object and choose ‘Object Properties’
5. Select KEYS tab
6. Press INSERT
7. In SELECT column, select the ‘…’ (3 dots) to open the ‘Edit SELECT statement’ dialog
8. Select or type the SQL ‘City.city_id’ (note : more on this dialog later in the document)
9. OK


Now let’s return to the query we had before, edit data provider and then view SQL and we discover that the SQL has changed:

SELECT Customer.last_name, sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price)
FROM Customer, Invoice_Line, Service, City, Sales
WHERE ( City.city_id=Customer.city_id )
  AND ( Customer.cust_id=Sales.cust_id )
  AND ( Sales.inv_id=Invoice_Line.inv_id )
  AND ( Invoice_Line.service_id=Service.service_id )
  AND ( City.city_id IN (11, 10, 13, 14, 12) )
GROUP BY Customer.last_name

Notice that the ‘city.city’ column reference has been replaced by the ‘city.city_id’ column and that the city names have been replaced with the primary key values that represent their name.

This query will likely run much faster because the main condition is now on an indexed column.

Note : if the SQL does not show any change this could be a sporadic bug noticed by the author where the SQL would only change when the LOV dialog was visited and OK’d whether any changes were made to the LOV or not.

What happened behind the scenes?

When we told the City object about its Primary Key, we offered Business Objects an alternative piece of SQL to use in any WHERE clause should that object be involved in a condition. It would have used ‘City.city’ but with Index Awareness it used ‘City.city_id’ instead.

It determined this from the KEYS entries you made.

But how did Business Objects convert the City names you selected in the LOV into the primary key values necessary for the SQL? Remember that the City table has the following values (that are relevant to our example) :

City table

City_id (Primary Key)   City            Region
11                      Dallas          20
10                      Houston         20
13                      Los Angeles     21
14                      San Diego       21
12                      San Francisco   21

It did this by adding the Primary Key SELECT to the query it normally generates for the LOV and when you selected the City names, Business Objects matched the City names to the Primary Key values then.


The List of Values SQL

The SQL generated for a non-Index Aware LOV involving City would be:

SELECT DISTINCT City.city FROM City

But the SQL generated for the LOV on the Index Aware City object was:

SELECT DISTINCT City.city, City.city_id FROM City

Notice that the ‘City.city_id’ column has been added.


This SQL is visible by either editing the LOV for the City object and/or tracing SQL that passes thru the middleware to the database.

So when you selected the City names from the LOV dialog, you were indirectly selecting the Primary Key values as well. Think of the Primary Key values as a hidden column in the LOV dialog.
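The "hidden column" idea can be sketched in a few lines of Python. This is only a model of the behavior described above, not Business Objects code: the in-memory SQLite database and the helper name build_condition are assumptions, while the City rows and the LOV SQL are the ones quoted in this example.

```python
import sqlite3

# In-memory stand-in for the City table; SQLite is assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (city_id INTEGER PRIMARY KEY, city TEXT, region INTEGER)")
conn.executemany(
    "INSERT INTO City VALUES (?, ?, ?)",
    [(11, "Dallas", 20), (10, "Houston", 20), (13, "Los Angeles", 21),
     (14, "San Diego", 21), (12, "San Francisco", 21)],
)

# The Index Aware LOV query: visible value plus the hidden Primary Key value.
lov_rows = conn.execute("SELECT DISTINCT City.city, City.city_id FROM City").fetchall()
key_by_name = dict(lov_rows)


def build_condition(selected_names):
    """Hypothetical helper: swap the selected names for their key values."""
    keys = [key_by_name[name] for name in selected_names]
    return "City.city_id IN (" + ", ".join(str(k) for k in keys) + ")"


condition = build_condition(
    ["Dallas", "Houston", "Los Angeles", "San Diego", "San Francisco"]
)
print(condition)  # City.city_id IN (11, 10, 13, 14, 12)
```

The user only ever picks names; the key values travel with them and end up in the WHERE clause.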

Summary

Index Awareness allowed us to automatically redirect a WHERE clause condition to another column (on the same table for this example) that we know would provide better performance at query time.

We determined which column to choose as an alternative based on our knowledge of the database schema and the RDBMS optimizer.

The LOV values we select actually tell Business Objects what Primary Key values to substitute in final query SQL. The KEYS tab tells Business Objects which SQL syntax to substitute in the final query SQL.

Rule – when an Index Aware object is used in the CONDITIONS pane, the Primary Key entry will replace the object’s SELECT

Rule – the operand ‘Show list of values’ dialog returns the Primary Key values that match the visible values in the LOV dialog

Example 2 – Avoiding joins or tables


City_id can come from more than one table

Another use of Index Awareness is to reduce the number of tables or joins involved in a query, which can improve query performance. As discussed in previous sections, joins are very expensive in performance terms for the RDBMS, so any opportunity to avoid them should be sought.

The SQL from the last example improved performance by restricting on the (indexed) Primary Key of the City table:

SELECT Customer.last_name, sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price)
FROM Invoice_Line, Sales, City, Customer, Service
WHERE ( City.city_id=Customer.city_id )
  AND ( Customer.cust_id=Sales.cust_id )
  AND ( Sales.inv_id=Invoice_Line.inv_id )
  AND ( Invoice_Line.service_id=Service.service_id )
  AND ( City.city_id IN (11, 10, 13, 14, 12) )
GROUP BY Customer.last_name

This query has the Customer and City tables in it which both can provide the City_id column.

The City table is only needed to satisfy the WHERE clause and is not needed in the SELECT or GROUP BY clauses.

Is it possible to remove the City table from the query completely i.e. tell Business Objects to get City_id from the Customer table if it can?

1. Locate the City object (in the Customer class)

2. Right click on the City object and choose ‘Object Properties’


3. Select KEYS tab

4. Skip to step 9 if the Primary Key from the previous example is still in place

5. Press INSERT

6. In the SELECT column, select the ‘…’ (3 dots) to open the ‘Edit SELECT statement’ dialog

7. Select or type the SQL ‘City.city_id’

8. OK

9. Start here if the Primary Key of ‘City.City_id’ is still in place from the previous example

10. Press INSERT

11. In the Key Type column, choose ‘Foreign Key’ if not already chosen

12. In the SELECT column, click on the ‘…’ (3 dots) to open the ‘Edit SELECT statement’ dialog

13. Select or type the SQL ‘Customer.city_id’

14. OK the dialog


Let’s return to the query panel and see what changes have taken place in the SQL:

SELECT Customer.last_name, sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price)
FROM Invoice_Line, Sales, Customer, Service
WHERE ( Customer.cust_id=Sales.cust_id )
  AND ( Sales.inv_id=Invoice_Line.inv_id )
  AND ( Invoice_Line.service_id=Service.service_id )
  AND ( Customer.city_id IN (11, 10, 13, 14, 12) )
GROUP BY Customer.last_name

Note that the City table is no longer referenced in the query and that the City_id is being restricted using the Customer table instead.
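To see that dropping the City table does not change the answer, the sketch below runs both forms of the query against a tiny in-memory SQLite database. All rows are made up for illustration (including the customer names, prices and the 'Chicago' city), and SQLite is an assumption. The equivalence also relies on every Customer.city_id having a matching row in City.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE City (city_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE Customer (cust_id INTEGER PRIMARY KEY, last_name TEXT, city_id INTEGER);
CREATE TABLE Sales (inv_id INTEGER PRIMARY KEY, cust_id INTEGER);
CREATE TABLE Invoice_Line (inv_id INTEGER, service_id INTEGER, days INTEGER, nb_guests INTEGER);
CREATE TABLE Service (service_id INTEGER PRIMARY KEY, price REAL);
-- Hypothetical sample rows, not taken from club.mdb
INSERT INTO City VALUES (10, 'Houston'), (11, 'Dallas'), (15, 'Chicago');
INSERT INTO Customer VALUES (101, 'Baker', 11), (102, 'Brendt', 10), (103, 'Edwards', 15);
INSERT INTO Sales VALUES (1, 101), (2, 102), (3, 103);
INSERT INTO Invoice_Line VALUES (1, 1, 2, 2), (2, 1, 3, 1), (3, 2, 1, 4);
INSERT INTO Service VALUES (1, 100.0), (2, 50.0);
""")

# Primary Key entry only: City stays in the FROM clause.
WITH_CITY = """
SELECT Customer.last_name, sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price)
FROM Invoice_Line, Sales, City, Customer, Service
WHERE City.city_id = Customer.city_id
  AND Customer.cust_id = Sales.cust_id
  AND Sales.inv_id = Invoice_Line.inv_id
  AND Invoice_Line.service_id = Service.service_id
  AND City.city_id IN (11, 10)
GROUP BY Customer.last_name
"""

# Foreign Key entry applied: the restriction moves to Customer.city_id.
WITHOUT_CITY = """
SELECT Customer.last_name, sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price)
FROM Invoice_Line, Sales, Customer, Service
WHERE Customer.cust_id = Sales.cust_id
  AND Sales.inv_id = Invoice_Line.inv_id
  AND Invoice_Line.service_id = Service.service_id
  AND Customer.city_id IN (11, 10)
GROUP BY Customer.last_name
"""

with_city = sorted(conn.execute(WITH_CITY).fetchall())
without_city = sorted(conn.execute(WITHOUT_CITY).fetchall())
print(with_city == without_city)  # True
```

If the schema allowed Customer rows whose city_id has no match in City, the two queries could differ, which is one reason to confirm referential integrity with the schema designer before adding Foreign Key entries.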

What effect did this have on our LOV?

None

When we check the SQL of the LOV for the City object we see it remains as it was in example 1 i.e. it has the Primary Key SELECT added to it.

So we can conclude that the LOV SQL is only affected by the Primary Key entry and that the LOV selection will always return the Primary Key entries ‘behind the scenes’. Therefore all columns you provide in Foreign Key entries must have the same type of values and column type as the column referenced by the Primary Key entry.


Will the Foreign Key entry always be applied?

No

If the Primary Key table is still needed in the query’s RESULTS, SORTS or CONDITIONS which don’t involve Index Aware objects on the same table, then the table will remain in the query and the Foreign Key entry will not be applied.

But if the Primary Key table (the original object’s table) is not needed anywhere but the WHERE clause, then the Foreign Key entry will be applied:

Rule – a Foreign Key entry will be ignored if it does not result in fewer tables being used. Business Objects can only use fewer tables if the Primary Key table is referenced only inside the WHERE clause.


City object used in the RESULTS causing the Foreign Key entry not to be used i.e. City table remains in the FROM clause and is used in the WHERE clause (restricting on its Primary Key)

Example of Foreign Key being applied because Primary Key table not being needed anywhere else in the query

Example 3 – Multiple Foreign Key entries

The KEYS dialog allows more than one Foreign Key entry, which allows us to deal with values which are repeated (denormalized in DBA speak) in multiple tables.

Database designers may denormalize Primary Key values beyond what is necessary to satisfy constraints, to assist in performance and simplification of SQL generation.

It is expected to see an entity repeated twice: initially in the entity’s own table as its Primary Key, and then in a table that refers to this Primary Key through its Foreign Key.

Denormalization implies that the database designer has gone beyond this duplication and has more than 2 instances of the value in a schema.

If we search the ‘club.mdb’ (Islands Resorts Marketing) database to find such entities, we will find only one i.e. sales person.

Tables list

Sponsor.sales_id
Sales_Person.sales_id
Customer.sales_id

That gives us three places that we can get sales_id from.

So if we setup the KEYS for the ‘Sales Person’ object (in Sales class) properly we can give Business Objects 3 choices from which it can retrieve these values for the purposes of restricting data in the WHERE clause (and hopefully speeding up the query).


KEYS entries for Sales Person object

The Sponsor table is used to restrict on Sales Person (sales_id) because it is one of the Foreign Key entries and the Sales_Person table is not used anywhere else in the query so can be dropped by Business Objects.

But why does Business Objects use the Sponsor table and not the Customer table? Both tables exist in the FROM clause? Perhaps it’s the sequence of Foreign Key entries in the KEYS dialog? If we rearrange them so that they are listed as follows:

And check the query SQL we see:

The SQL has changed again, this time using the Customer table to restrict Sales Person via sales_id.

Rule – the sequence of Foreign Key entries in the KEYS dialog determines which entry is given preference i.e. the last enabled entry in the list that results in the least number of tables.

Rule - If the ‘best’ Foreign Key entry does not result in a table count reduction compared to the original query, the Primary Key entry will be the only one that applies
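The two rules above can be read as a small selection procedure. The sketch below is only a toy model of that reading, not Business Objects logic: the KeyEntry class, the function name and the simple table count (which ignores join paths) are all assumptions, and the set of other tables in the query is illustrative.

```python
from dataclasses import dataclass


@dataclass
class KeyEntry:
    """Hypothetical stand-in for one row of the KEYS dialog."""
    table: str
    column: str
    enabled: bool = True


def choose_condition_entry(primary, foreign_entries, tables_needed_elsewhere):
    """Pick the entry whose column carries the WHERE restriction.

    Toy model: a Foreign Key entry wins only if it lowers the table count
    compared with using the Primary Key table; among equally good entries
    the last enabled one in the list is preferred.
    """
    others = set(tables_needed_elsewhere)
    baseline = len(others | {primary.table})
    best, best_count = primary, baseline
    for entry in foreign_entries:
        if not entry.enabled:
            continue
        count = len(others | {entry.table})
        # "<=" lets a later entry replace an earlier one that ties with it.
        if count < baseline and count <= best_count:
            best, best_count = entry, count
    return best


primary = KeyEntry("Sales_Person", "sales_id")
customer_fk = KeyEntry("Customer", "sales_id")
sponsor_fk = KeyEntry("Sponsor", "sales_id")
others = {"Customer", "Sponsor", "Invoice_Line", "Service"}  # illustrative

first = choose_condition_entry(primary, [customer_fk, sponsor_fk], others)
swapped = choose_condition_entry(primary, [sponsor_fk, customer_fk], others)
kept = choose_condition_entry(primary, [customer_fk, sponsor_fk],
                              others | {"Sales_Person"})
print(first.table, swapped.table, kept.table)  # Sponsor Customer Sales_Person
```

The third call models the case where Sales_Person is needed elsewhere in the query, so no Foreign Key entry reduces the table count and the Primary Key entry is the one that applies.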


Customer, Sponsor and Revenue object with Sales Person condition

Example 4 – Unique and non-unique values

In the previous examples all the values that were restricted were unique e.g. Sales Person and City. But what happens when we use a value that is not unique e.g. Service?

First set up the Primary Key entry:

And then build an example query (using Customer and Revenue as RESULTS and Service IN LIST <list of values>):

Why does the LOV have duplicates? Normally LOVs have a DISTINCT on them and no duplicates are allowed, as shown in the screenshot on the right:


Let’s look at the LOV SQL and note that as expected (as we discovered in example 1) the Primary Key entry has been added to the SELECT of the LOV SQL.

When we look at the service table, we see the duplicates there as well e.g. Hotel Room is repeated for SL_ID = 21, 31 and 41. SL_ID refers to Service Line.


This tells us that Service is not completely unique in the Service table. In fact it is only unique within a Service Line (SL_ID).

So how can we help a user navigate to the correct value in a LOV when there are duplicates? Easy, we use normal customized LOV in Designer.

This results in a LOV that looks like this:


EDIT the LOV, add the Service Line object, apply a primary ascending sort to the Service Line and a secondary ascending sort to the Service object note : the sorts are only necessary if the LOV is to be viewed in Hierarchical View

As expected the Primary Key value does not show in the LOV dialog, but note that the Hierarchical View and the Tabular View show a different number of values. The Tabular View lists the values that are returned from the LOV query, excluding the Primary Key entry of course. The Hierarchical View applies a further ‘distinct’ on the display values before it allows the user to navigate the LOV hierarchy. This means that the Hierarchical View ignores the impact of the hidden Primary Key value in the LOV.

If we use this LOV as it is with the Primary Key entry, we’ll see the following SQL generated (based on original query RESULTS=Customer, Revenue CONDITIONS=Service IN LIST <values>):

Although the duplication is annoying, selecting all the Accommodation values while in Tabular View (in this example) does result in the correct SQL and therefore the correct results.

The Primary Key values from the Service table have been selected (behind the scenes) and put into the SQL. These values represent all the ‘Accommodation’ Service Line values. This is as expected.

Service Line table

Let’s see what happens when we use the Hierarchical View of the LOV dialog.

Clearly it would be unacceptable to users that depending on whether they used Tabular View or Hierarchical View they got the correct results or not.

Note : Even before Index Aware, the Hierarchical View would always apply a further DISTINCT on the display values. The problem only arises now because with Index Aware we are not actually limiting the query by the values we select in the dialog but by the Primary Key values indirectly being selected. See a separate article by the author on the LOV dialog.

So how do we make this LOV work correctly every time?

We look at the data and see that the Service name is not really unique in its own table. That table is actually a Service within Service Line table. So we look at Service Line table and discover the Service Line name is not unique without being qualified by which Resort the Service Line is available at.

We could look beyond Resort because it is shown with its country, but the Resort name is actually unique across all countries. Although we determined this by looking at the data, we really should confirm it by talking to the schema designer as well, since this may be test data which may not reflect all the true relationships in the lifetime of the database.


This time only 3 Primary Key values have been selected i.e. 212, 211 and 213, which represent Bungalow, Hotel Room and Hotel Suite where SL_ID=21! This means that using the Hierarchical View instead of the Tabular View would give different results.

So we need to determine what values should be displayed in the LOV to help guarantee that the unique Primary Key values the user indirectly selects from the hidden column represent the values that the user can see on screen.
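The difference between the two views can be modeled in plain Python. This is a sketch of the behavior described above, not the actual LOV dialog logic: the function names are hypothetical, the keys for the second and third Service Lines are invented, and which of 211, 212 and 213 belongs to which Service name is assumed.

```python
# LOV rows as (Service Line, Service, hidden Primary Key value).
lov_rows = [
    ("Accommodation", "Bungalow", 211),
    ("Accommodation", "Hotel Room", 212),
    ("Accommodation", "Hotel Suite", 213),
    ("Accommodation", "Bungalow", 311),     # hypothetical key
    ("Accommodation", "Hotel Room", 312),   # hypothetical key
    ("Accommodation", "Hotel Suite", 313),  # hypothetical key
    ("Accommodation", "Bungalow", 411),     # hypothetical key
    ("Accommodation", "Hotel Room", 412),   # hypothetical key
    ("Accommodation", "Hotel Suite", 413),  # hypothetical key
]


def tabular_keys(rows, selected):
    """Every LOV row is selectable, so every matching key is returned."""
    return [key for line, service, key in rows if (line, service) in selected]


def hierarchical_keys(rows, selected):
    """Display values are made distinct first, so only the first key survives."""
    first_key = {}
    for line, service, key in rows:
        first_key.setdefault((line, service), key)
    return [key for display, key in first_key.items() if display in selected]


selected = {
    ("Accommodation", "Bungalow"),
    ("Accommodation", "Hotel Room"),
    ("Accommodation", "Hotel Suite"),
}
tabular = tabular_keys(lov_rows, selected)
hierarchical = hierarchical_keys(lov_rows, selected)
print(len(tabular), len(hierarchical))  # 9 3
```

Qualifying the display values until each visible combination maps to exactly one key is what makes the two functions return the same list.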

We do this by further customizing the LOV for the Service object to the point that the Service name is fully qualified by all the values we think are necessary i.e. Service Line and Resort


Tabular View – selecting the Bungalow, Hotel Room and Hotel Suite values for Bahamas Beach and French Riviera generates the correct SQL much like before.


But note what happens when you return to the LOV to review your selections i.e. only the Bahamas Beach values are highlighted, and if you OK the dialog only the Primary Key values representing those values are put in the SQL.

Rule - The LOV dialog matches the values on the first match it can find and does not take into consideration that previous Primary Key values were remembered.


With Hierarchical View things are no different. Let’s see the effect of Hierarchical View on the LOV and SQL now that we’ve customized the LOV to allow the user to navigate correctly to the values.

But when we return to the LOV dialog:

The same thing happens.

Clearly the problem is that the LOV dialog matches on the first occurrence of the values selected and does not take the Primary Key value into account.

This limits our use of the Primary Key feature because:

a. depending on Tabular View or Hierarchical View we could get different results
b. we can navigate to the correct values, select them and then use them, but the next time we return to the LOV dialog, it will remove duplicates and only select the first instance of our selected value

Rule – use of Primary Key entries in LOV can only be used with accuracy when the value is unique e.g. customer social security number, product code.


Hierarchical View gives the correct SQL

Example 5 – Using a Primary Key object in RESULTS pane

So far in our examples we’ve placed Index Aware objects in the CONDITIONS pane of the query. This is the main use of Index Awareness, but what happens when those same objects are used in the RESULTS pane?

Let’s build a query with Customer and Revenue in the RESULTS pane.

First of all we’ll see what the SQL will be without Index Aware being active:

Now, let’s set the Customer object (in the Customer class) to have a Primary Key entry of ‘Customer.cust_id’ i.e.

And the SQL is now:

Notice that the SELECT of the query has changed: a MAX function has been added to Customer.last_name (the Customer object did not have a MAX function in its SELECT definition). Also, the GROUP BY clause has been changed i.e. Customer.last_name has been replaced with Customer.cust_id.


SQL without Index Aware being active

Note : If a query does not have a GROUP BY clause (no measures etc) then the MAX function is not applied i.e. the Index Awareness itself does NOT cause a GROUP BY to be generated, that is done by normal Business Objects SQL generation


What effect does this have on our results? None for this query. With or without Index Aware this query will return the same results. Why? Because Customer.last_name is unique across the whole data set i.e. cust_id has a 1-to-1 relationship to last_name.

Now, a real database (I apologize to the designer of the ‘club.mdb’ demo database!) would have last names that are not unique.

What happens when values are not unique?

Let’s make the last name of an existing record in the Customer table equal to another last name:

Then run the same query with and without Index Awareness and see what the row counts are:

20 rows are returned when Index Awareness is not applied, but 21 are returned when it is applied. The difference in the result set is that the Okumura last name was aggregated at the database level when Index Awareness was not enabled; when Index Awareness was enabled, both Okumura records were returned from the database and optionally aggregated locally within the report.

The row counts for the 2 refreshes are identical. The first refresh had Index Aware disabled and the second had it enabled.

Customer with last name ‘Oneda’ was changed to ‘Okumura’

So this means that Index Awareness can affect result sets and can cause less aggregation to be performed on the database, which is exactly where we’d want aggregation to be performed to reduce the load on Reporter (BUSOBJ.EXE).

Rule – when Index Aware objects are used in the RESULTS pane their Primary Key entries can cause additional rows to be returned from the database. Report results should remain the same if no local reporter calculations depend on the row count and level of uniqueness of the result set


The reporter feature ‘Avoid duplicate row aggregation’ is used in the screenshot below. The reporter tables below with the header ‘Duplicates Aggregated’ have this feature unchecked i.e. OFF

Example 6 – Using an Index Awareness WHERE clause

So far all our examples have ignored the WHERE clause part of the Primary Key and Foreign Key entries within the KEYS dialog.

First, let’s see what happens when you add a WHERE clause to the Primary Key entry.

The WHERE clause SQL gets added to the LOV query for that object:


And it gets added to the query that the object is used in, whether the Index Aware object is used in the RESULTS pane or the CONDITIONS pane:

So far I cannot think of a use for adding WHERE clauses to the Primary Key entries. I would assume that if the original object has a WHERE clause you would want the Primary Key WHERE clause to replace it, much as the Primary Key SELECT clause replaces the original SELECT clause, but this means the universe developer would need to look up the Primary Key values in advance.

Note : TODO CHECK THIS STATEMENT If these Primary Key values are based on surrogate keys there is no guarantee that a database reload will maintain the same surrogate key values.


What happens to the Foreign Key WHERE clause? Assuming the Foreign Key entry is applied (depending on rules discussed elsewhere) its WHERE clause is simply ANDed to the query once the Foreign Key replacement has taken place.

Based on:

We get:

Again, right now I cannot think of a use for the addition of the WHERE clause entry.

Rules:

WHERE clause entries for Primary Key and Foreign Key entries are ANDed to the query once the SELECT entry has been implemented.

For Primary Key entries the WHERE clauses must reference the same tables as the Index Aware object’s original tables list

For Foreign Key entries the WHERE clauses should try to reference the same tables as the Index Aware object’s original tables list to avoid introducing more complexity into the query
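As a rough illustration of the ANDing rule, assume a Foreign Key entry whose SELECT is ‘Customer.city_id’ and whose WHERE is ‘Customer.city_id IN (10, 11, 12)’. The entry, the table layout and the data are all hypothetical; the sketch only shows how the key’s WHERE entry combines with the user’s own condition.

```sql
-- Hypothetical data: customers in cities 10, 12 and 25.
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, last_name VARCHAR(30), city_id INTEGER);
INSERT INTO customer VALUES (1, 'Baker', 10);
INSERT INTO customer VALUES (2, 'Brendt', 12);
INSERT INTO customer VALUES (3, 'Okumura', 25);

-- The user picked two cities from the LOV; after Foreign Key replacement the
-- condition becomes city_id IN (10, 25). The key's WHERE entry is then ANDed
-- on, so the customer in city 25 is filtered out and only 'Baker' is returned.
SELECT customer.last_name
FROM customer
WHERE customer.city_id IN (10, 25)
  AND (customer.city_id IN (10, 11, 12)); -- Foreign Key WHERE entry
```

Because the entry is always ANDed, it can silently remove values that the user selected, which is worth checking before adding a WHERE clause to a key.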


Example 7 – Multi-column primary keys

Where a table does not have one column which uniquely identifies the rows but instead relies on more than one column, this is a ‘multi-column primary key’

Note : this is not to be confused with a ‘concatenated index’ or ‘composite index’, which is a database index that is made up of more than one column in a table.

There are no examples of this in the ‘club.mdb’ so we have to create our own schema or make believe using ‘club.mdb’. To save time let’s make believe…

Let’s pretend that the Resort table has a multi-column primary key based on Country_id and Resort_id.

Since the KEYS dialog only allows one Primary Key entry we have to make that one entry relate to our multi-column primary key. So we define a Primary Key entry as “Country_id & Resort.Resort_id” (the & is MS Access syntax for concatenation).


The value of 61 comes from the concatenation of ‘6’ and ‘1’. ‘6’ represents the country_id for Australia and ‘1’ represents the resort_id for ‘Australian Reef’

Since this concatenation will be stated in the final query, it will likely negate any indexes that exist (typically any manipulation of values at SQL query run time negates indexes that could have been used on the columns affected by the manipulation) unless that index has been set up as a ‘function based index’ (see Database Performance section earlier for an explanation).

So although the Index Awareness feature allows SELECT clauses that span multiple columns (or even tables) it is unlikely that the database indexes will do the same. You should avoid multi-column primary key entries and/or discuss indexes with your DBA. Surrogate keys are a good way around multi-column primary keys in database design and your database designer should consider them

Rule – Index Awareness is best used with single column Primary Keys. Multi-column primary keys should be avoided and/or your DBA should consider ‘function based indexes’ or surrogate keys
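For readers on Oracle, a function based index on the concatenated expression might look like the sketch below. The table definition and the index name resort_ck_idx are hypothetical, and the Primary Key entry in the KEYS dialog would have to use the same expression (|| rather than the MS Access & operator) for the index to be a candidate.

```sql
-- Oracle syntax; resort_ck_idx is a hypothetical index name.
CREATE TABLE resort (
  country_id NUMBER(10),
  resort_id  NUMBER(10),
  resort     VARCHAR2(30)
);

-- Function based index on the same expression the Primary Key entry would use.
CREATE INDEX resort_ck_idx
  ON resort (TO_CHAR(country_id) || TO_CHAR(resort_id));

-- The predicate repeats the indexed expression so that the optimizer
-- can consider the index instead of scanning the table.
SELECT resort
FROM resort
WHERE TO_CHAR(country_id) || TO_CHAR(resort_id) = '61';
```

Whether the optimizer actually uses the index depends on statistics and settings, so this is something to confirm with your DBA rather than assume.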


Example 8 – Effect of Row Based Restrictions

Row Based Restrictions (RBR) are settings created in Supervisor that allow additional WHERE clauses to be ANDed to the final query SQL should that table be involved in the query. See Designer documentation for more information.

Let’s see what happens:


City object with Primary Key and Foreign Key entries and no RBR applied to the City table. As expected the restriction of ‘City EQUAL TO Albertville’ has been converted to a Primary Key value (25) pointing to the Customer.City_id Foreign Key

Now with a RBR active on the City table:

Rule - If an Index Aware object’s tables list references a table that has RBR, we cannot use any Foreign Key entries. Why? Because the RBR in Supervisor will be invalid for the Foreign Key table.


The Customer.City_id is not used, instead the original Primary Key entry is used and the RBR is applied to the City table (as expected)
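The two cases can be sketched in plain SQL. The schema is a simplified stand-in, the RBR condition (city.region_id = 1) is an assumption made up for illustration, and the statements only approximate what Business Objects would generate.

```sql
-- Hypothetical schema and data; city 25 stands for 'Albertville'.
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city VARCHAR(30), region_id INTEGER);
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, last_name VARCHAR(30), city_id INTEGER);
INSERT INTO city VALUES (25, 'Albertville', 1);
INSERT INTO customer VALUES (1, 'Baker', 25);

-- No RBR on City: the Foreign Key entry applies and City drops out of the query.
SELECT customer.last_name
FROM customer
WHERE customer.city_id = 25;

-- RBR on City (assumed here to be city.region_id = 1): the City table must
-- stay in the query so the restriction has something to act on, so the
-- Primary Key entry is used and the RBR is ANDed on.
SELECT customer.last_name
FROM customer, city
WHERE customer.city_id = city.city_id
  AND city.city_id = 25
  AND (city.region_id = 1); -- Row Based Restriction
```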

KEYS dialog explained

The KEYS tab of the Object Properties dialog contains the following controls: the ‘Type of values in PK and FK’ drop down, DETECT, PARSE, Key Type, Select, Where, Enable, OK, Cancel, Apply, Help, Insert and Delete.

Note : Buttons INSERT, DELETE, Cancel and Help are not detailed below because they are standard Designer (and Windows) buttons and their purpose should be self explanatory


Drop down - Type of values in PK and FK

This drop down lists all the standard data types that Business Objects handles i.e. Character, Date, Long Text and Number. You have to inform Business Objects which data type is relevant for the series of Primary Key and Foreign Keys because the data type may be different from the object you are defining Index Awareness for e.g. column ‘City.city’ is a character column and the object ‘City’ is also a Character type but its Primary Key column ‘City.city_id’ is a numeric type.

All Primary Key and Foreign Key entries must be of the same value type because there is only one drop down for all entries. This makes sense because depending on the rules applied, any of the Primary Key and Foreign Key entries could be used in the final query.

If you select the wrong type and then try to PARSE then you will receive the error ‘The expression type is not compatible with the object type’ as shown below:


DETECT

Detects Primary Key and Foreign Keys if the middleware and database support their detection. Some databases do not support their detection e.g. MS Access.

Should the database be unable to supply key information, the error ‘Key is not supported’ is raised by Business Objects.

Should the Index Aware object reference no tables or more than one table, the error ‘No detected key for objects involving several tables’ will be raised:

Note : this error description is slightly misleading because it is raised even when the object makes reference to no tables i.e. the Index Aware object SELECT and WHERE clauses result in no tables being referenced in the TABLES list

Any Primary Key detected will over-write without warning any existing Primary Key entry

Once the Primary Key has been detected then the Foreign Keys are sought. All Foreign Key entries will be added to the list if they didn’t previously exist i.e. no duplicates will be allowed


Knowing if the database allows key detection

The PRM parameter KEY_INFO_SUPPORTED states whether the target database allows key detection. This parameter has a default value of Y (for Yes) so if it is missing from a PRM file, Business Objects will assume it can supply key info and attempt it.


PRM file parameter that states if database allows key detection

PARSE

Sends the SELECT and WHERE SQL of each entry, enabled or not, to the target database to confirm its syntax. Errors are listed in a single dialog:

Apart from the Business Objects caught error ‘The expression type is not compatible with the object type’ all other errors will be database specific, usually relating to errors in the building of the SELECT and WHERE SQL statements.

The following limitations of the PARSE detection should be noted:

a. Primary Key SELECT that does not reference any table is not detected (but the WHERE does reference a table)

b. Does not detect the potential Cartesian Product in the final query when the Primary Key or Foreign Key SELECT or WHERE clauses are not joined to at least one of the Index Aware object’s tables list

c. Does not detect the potential ‘Incompatible combination of objects’ in the final query when the tables referenced by the Primary Key or Foreign Key SELECT or WHERE clauses are not on the same context as the Index Aware object’s tables list.

Warning – In case you’re using MS Access for testing this you should note that MS Access does not always detect syntax errors e.g. WHERE clause with a single word ‘steven’ is not detected. Expressions with words separated by a space are detected as incorrect

Key Type

There are only 2 key types i.e. Primary Key and Foreign Key. Only one Primary Key entry can exist for a single Index Aware object. There can be zero or more Foreign Key entries but before even 1 Foreign Key entry can be used a Primary Key entry must exist.

Note : Should you try to create more than one Primary Key entry, an existing Primary Key entry will be converted automatically without warning to a Foreign Key. Its SQL will remain unchanged; only its type changes. This could lead to incorrect SQL in the final query if the changed Primary Key entry is not reviewed by the universe developer

Select

Enter or build the SELECT SQL which will be used to:

a. Replace the Index Aware object’s original SELECT SQL should the object be used in the CONDITIONS pane of a query (subject to other rules stated elsewhere)

b. Add to the SQL of the Index Aware object’s LOV query

c. Replace the Index Aware object’s SELECT SQL in a GROUP BY clause should the Index Aware object be used in the RESULTS pane of a query

Tip – add SQL comments to the Primary Key and Foreign Key entries so that they are easily identifiable in the final query SQL e.g. [ Customer.City_id /* Primary Key */ ] for Oracle, where /* comment */ is Oracle comment syntax


Where

Enter or build the WHERE SQL which will be ANDed to any query where the SELECT part is also used.

ANDing additional conditions can cause unexpected results. You cannot choose to have the condition ORed instead; it is always ANDed. Consider the effect of ANDing the conditions carefully.

Tip – add SQL comments to the Primary Key and Foreign Key entries so that they are easily identifiable in the final query SQL e.g. [ Customer.City_id IN (10, 11, 12) /* Foreign Key */ ]

Enable

Only Primary Key or Foreign Key entries that are enabled will be taken into consideration during SQL generation.

But enabled or not, the entries will be checked when the PARSE button is used.

Apply or OK

Applies the changes and optionally closes the dialog.

Further checking is done to detect the following errors:

“There is at least one key without a Select statement. Each key requires at least one Select statement. You need to verify your keys” this matches the rules discussed earlier

“You defined multiple foreign keys without a primary key. Each foreign key requires at least one primary key. You need to redefine your foreign keys” this wording is a little confusing because it should not say “requires at least one primary key” but should say “requires one primary key” instead

These errors should be self explanatory based on our understanding of the rules determined so far.


SQL editor dialog

The SQL editor available to the management of Index Aware is the same SQL editor dialog available to predefined conditions, objects SELECT and objects WHERE clauses.

This dialog provides normal SQL constructs as well as more complex items such as:

If your target database allows them, you can add SQL comments to the Primary Key and Foreign Key entries so that they are easily identifiable in the final query SQL e.g. [ Customer.city_id /* Index Awareness Foreign Key */ ] for Oracle

Note : for Primary Key and Foreign Key WHERE clauses, the PARSE button within this dialog always presents the error ‘Parse failed : Invalid Definition (UNV0023)’ no matter what you enter. Ignore this error for now.

TODO – check what effect @Aggregate_Aware, which also changes the tables involved in a query, has on Index Awareness


Limitations and rules

Like any product functionality there are limits to its design. The following issues have been identified.

Limited set of operands

Index Awareness will only apply with the following operands:

Equal to

Different From

In List

Not In List

Right hand side of operand must be selected

When the operands above are used, only value(s) selected from the ‘Show list of values’ dialog will trigger Index Awareness functionality.

Typing or entering a value will not trigger it, even if the values are identical to those in the LOV.

Why? Because Index Awareness uses the selection from the LOV dialog to grab the Primary Key value for each entry. If you entered your selection using ‘Type a new constant’ (for example) Index Awareness would then have to go out to the database again and locate the matching Primary Key value. (or at least interrogate the cached LOV which it currently does not do)

Also, creating a condition by using the ‘Simple Condition’ toolbar button within the query panel does not permit Index Awareness either
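The reason a typed value cannot be converted is easier to see from the shape of the LOV query. The sketch below is an approximation only: the table, the second city and the exact column order are assumptions, not the SQL that Business Objects generates.

```sql
-- Hypothetical data for the City LOV.
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city VARCHAR(30));
INSERT INTO city VALUES (25, 'Albertville');
INSERT INTO city VALUES (26, 'Amsterdam');

-- Approximate shape of the LOV query for an Index Aware object: the
-- Primary Key SELECT travels with the display value, so picking
-- 'Albertville' from the list also hands over the key 25.
SELECT DISTINCT city.city, city.city_id
FROM city;

-- A typed constant carries no key, so an extra lookup like this would be
-- needed before the condition could be rewritten on city_id.
SELECT city.city_id
FROM city
WHERE city.city = 'Albertville';
```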

Only works with Full Client reports

Disappointing but we have to start somewhere!

Does not work with prompts

Disappointing but expected because of the following reasons:

a. Complexity – prompt population takes place after all the tables in a query have been decided upon, whereas Index Awareness helps build the overall query, changing tables etc. The points in the query building workflow where these things happen are entirely different.

b. Shared LOVs – if more than one prompt exists in the report and/or universe with the same name it is reduced to a single prompt display and common LOV. But what if the prompt was related to different objects with completely different LOVs and KEY definitions? This complication has not been overcome in the current version of Index Awareness

c. Prompt definition – prompts can be defined in the universe with hard coded LOV or LOVs that come from another object entirely. This complication has not been overcome in this feature.


List of rules found in examples

The following is a repetition of the rules discovered and explained within the examples earlier in the document. They are listed here for reference:

A Foreign Key entry will be ignored if it does not result in fewer tables being used. Business Objects can only use fewer tables if the Primary Key table is referenced only inside the WHERE clause.

The sequence of Foreign Key entries in the KEYS dialog determines which entry is given preference i.e. the last enabled entry in the list that results in the least number of tables. But if the ‘best’ Foreign Key entry does not result in a table count reduction compared to the original query, the Primary Key entry will be the only one that applies

Use of Primary Key entries in LOV can only be used with accuracy when the value is unique e.g. customer social security number, product code.

When Index Aware objects are used in the RESULTS pane their Primary Key entries can cause additional rows to be returned from the database. Report results should remain the same if no local reporter calculations depend on the row count and level of uniqueness of the result set

WHERE clause entries for Primary Key and Foreign Key entries are ANDed to the query once the SELECT entry has been implemented.

For Primary Key entries the WHERE clauses must reference the same tables as the Index Aware object’s original tables list

For Foreign Key entries the WHERE clauses should try to reference the same tables as the Index Aware object’s original tables list to avoid introducing more complexity into the query

Index Awareness is best used with single column Primary Keys. Multi-column primary keys should be avoided and/or your DBA should consider ‘function based indexes’ and multi-column indexes.

If an Index Aware object’s tables list references a table that has RBR, we cannot use any Foreign Key entries. Why? Because the RBR in Supervisor will be invalid for the Foreign Key table.


Appendix A – Bugs and Enhancement Requests

Please find below a list of bugs and enhancement requests that the author has raised during the creation of this document. If you notice the same problems, please contact Business Objects Customer Support at www.techsupport.businessobjects.com and have them raise support cases which are linked to the bugs and enhancement requests listed below.

The more feedback the developers receive, the better it will guide their coding efforts.

If they don’t know it’s broke, they won’t fix it!

List of bugs noticed in product

Bug        Description

n/a        Query SQL sometimes not being updated when returning to query. NOTE : could not raise as a bug because could not replicate it consistently

1099793    SQL editor dialog - for Primary Key and Foreign Key WHERE clauses, the PARSE button within this dialog always presents the error ‘Parse failed : Invalid Definition (UNV0023)’ no matter what you enter. Ignore this error for now.

List of Enhancement Requests

ER number  Description

30602      When editing WHERE the dialog says ‘edit SELECT’

30615      Allow Primary Key and Foreign Key entries in KEYS tab to be reordered

30613      Error on Apply/OK wording “You defined multiple foreign keys without a primary key. Each foreign key requires at least one primary key. You need to redefine your foreign keys” this wording is a little confusing. Basically you cannot have one Foreign Key without first having a single Primary Key entry.

n/a        Work with prompts. Note : did not raise it as an ER because I realize the difficulty involved

30612      Work with BOTH and EXCEPT

30614      Applying a simple filter to an object in results pane does not trigger Index Aware (only LOV from ‘list of values’ operand does)

30610      Within KEYS property of an object, the DETECT button raises an error "No detected key for objects involving several tables", but the error is also raised when the object does not refer to any table. Suggest the error text is "Keys can only be detected for objects that refer to one and only one table"

30616      Should you try to create more than one Primary Key entry, an existing Primary Key entry will be converted automatically without warning to a Foreign Key. Its SQL will remain unchanged; only its type changes. This could lead to incorrect SQL in the final query if the changed Primary Key entry is not reviewed by the universe developer. Suggest a dialog is raised warning the user to review the changed PK entry (which is now a FK)


Appendix B – Accessing Index Aware information

There are 2 ways to automate access to universe information:

1. Designer SDK

2. Business Objects repository tables

Designer SDK

There seems to be no way using the SDK to access Index Aware information i.e. the properties and methods are not exposed.

Business Objects repository tables

The new repository universe domain table that contains Index Aware information is UNV_OBJECT_KEY but existing UNV_OBJECT_DATA and UNIVERSE_OBJECT tables are also affected.

A universe called ‘combined.unv’, written by the author of this document, will include a first attempt at reporting on the KEY information. The additions to the universe are not perfect; their limitations are:

1. outer joins are not handled – so trying to list objects whether they have KEY records or not will not work i.e. only objects with KEY records will be returned

2. returning the SELECT and WHERE clauses in a single Business Objects query will generate 2 SELECT statements because these are on separate contexts. This may not be correct but it works for now


Table UNV_OBJECT_KEY

The following is a piece of SQL that shows the SELECT clauses of any Index Aware objects:

It is suggested you experiment with the 3 tables highlighted and the combined.unv universe.


SELECT
  base_class.CLS_NAME,
  UNV_OBJECT.OBJ_NAME,
  object_key_links.KEY_STATES,
  object_key_links.KEY_POSITION,
  object_keys.OBJ_NAME,
  key_select_clause.OBJ_DATAVALUE
FROM
  UNV_OBJECT_KEY object_key_links,
  UNV_CLASS base_class,
  UNV_OBJECT,
  UNV_OBJECT object_keys,
  UNV_OBJECT_DATA key_select_clause
WHERE
  ( UNV_OBJECT.OBJECT_ID=object_key_links.OBJECT_ID and UNV_OBJECT.UNIVERSE_ID=object_key_links.UNIVERSE_ID )
  AND ( key_select_clause.OBJ_DATATYPE = 'S' )
  AND ( key_select_clause.OBJ_SLICE=1 )
  AND ( object_key_links.UNIVERSE_ID=object_keys.UNIVERSE_ID and object_key_links.KEY_ID=object_keys.OBJECT_ID )
  AND ( object_keys.OBJECT_ID=key_select_clause.OBJECT_ID and object_keys.UNIVERSE_ID=key_select_clause.UNIVERSE_ID )
  AND base_class.CLASS_ID=UNV_OBJECT.CLASS_ID
  AND base_class.UNIVERSE_ID=UNV_OBJECT.UNIVERSE_ID

Appendix C – Outstanding Items

The items listed below need to be clarified or completed in this document. They will not teach the reader anything about Index Awareness; they are simply a reminder to the author to complete them for the next release of the document.

1. The use of the Primary Key to solve the old problem of hierarchical prompts only returning the value selected and not the path

2. Consider prefixing object value with Primary Key value

3. What happens when you ignore ‘the expression type is not compatible with the object type’

The Secrets of Oracle Bitmap Indexes


Overview

Oracle's two major index types are Bitmap indexes and B-Tree indexes. B-Tree indexes are the regular type that OLTP systems make much use of, and bitmap indexes are a highly compressed index type that tends to be used primarily for data warehouses.

Characteristic of Bitmap Indexes

For columns with very few unique values (low cardinality)

Columns that have low cardinality are good candidates (if the cardinality of a column is <= 0.1% then the column is an ideal candidate; also consider 0.2% – 1%)

Tables that have no or little insert/update are good candidates (static data in warehouse)

Stream of bits: each bit relates to a column value in a single row of table

create bitmap index person_region on person (region);

Row  Region  North  East  West  South
1    North   1      0     0     0
2    East    0      1     0     0
3    West    0      0     1     0
4    West    0      0     1     0
5    South   0      0     0     1
6    North   1      0     0     0
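To make the bit streams concrete, the six rows in the table above can be loaded and queried as in the sketch below (Oracle syntax). The id column is an assumption standing in for the row number; only the region column and the index definition come from the text.

```sql
-- Oracle syntax; the id column is a hypothetical stand-in for the row number.
create table person (id number(10), region varchar2(10));

insert into person values (1, 'North');
insert into person values (2, 'East');
insert into person values (3, 'West');
insert into person values (4, 'West');
insert into person values (5, 'South');
insert into person values (6, 'North');
commit;

create bitmap index person_region on person (region);

-- Counting the 'West' rows only needs the West bit stream (0 0 1 1 0 0):
-- two bits are set, so the count is 2.
select count(*) from person where region = 'West';
```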

Advantage of Bitmap Indexes

The advantages of them are that they have a highly compressed structure, making them fast to read, and their structure makes it possible for the system to combine multiple indexes together for fast access to the underlying table.
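Combining indexes can be sketched as follows (Oracle syntax). The person_demo table, its gender column and both index names are hypothetical additions for illustration; whether the optimizer really combines the two bitmaps depends on statistics and should be checked in the execution plan.

```sql
-- Oracle syntax; table, gender column and index names are hypothetical.
create table person_demo (id number(10), region varchar2(10), gender varchar2(1));

create bitmap index person_demo_region on person_demo (region);
create bitmap index person_demo_gender on person_demo (gender);

-- With both predicates covered by bitmap indexes, the optimizer can AND the
-- two bit streams (shown as a BITMAP AND step in the execution plan) and
-- only then visit the rows whose bit is set in both.
select *
from person_demo
where region = 'West'
  and gender = 'F';
```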

Compressed indexes, like bitmap indexes, represent a trade-off between CPU usage and disk space usage. A compressed structure is faster to read from disk but takes additional CPU cycles to decompress for access - an uncompressed structure imposes a lower CPU load but requires more bandwidth to read in a short time.

One belief concerning bitmap indexes is that they are only suitable for indexing low-cardinality data. This is not necessarily true, and bitmap indexes can be used very successfully for indexing columns with many thousands of different values.

Disadvantage of Bitmap Indexes

The reason for confining bitmap indexes to data warehouses is that the overhead on maintaining them is enormous. A modification to a bitmap index requires a great deal more work on behalf of the system than a modification to a b-tree index. In addition, the concurrency for modifications on bitmap indexes is dreadful.

Bitmap Indexes and Deadlocks

Bitmap indexes are not appropriate for tables that have lots of single row DML operations (inserts) and especially concurrent single row DML operations. Deadlock situations are the result of concurrent inserts as the following example shows: Open two windows, one for Session 1 and one for Session 2

Session 1: create table bitmap_index_demo (value varchar2(20));

Session 1: insert into bitmap_index_demo
           select decode(mod(rownum,2),0,'M','F') from all_objects;

Session 1: create bitmap index bitmap_index_demo_idx on bitmap_index_demo(value);

Session 1: insert into bitmap_index_demo values ('M');
           1 row created.

Session 2: insert into bitmap_index_demo values ('F');
           1 row created.

Session 1: insert into bitmap_index_demo values ('F');
           ...... waiting ......

Session 2: insert into bitmap_index_demo values ('M');
           ...... waiting ......

Session 1: ERROR at line 1:
           ORA-00060: deadlock detected while waiting for resource

Bitmap Index vs. B-tree Index: Which and When?

by Vivek Sharma

Understanding the proper application of each index can have a big impact on performance.

Conventional wisdom holds that bitmap indexes are most appropriate for columns having low distinct values—such as GENDER, MARITAL_STATUS, and RELATION. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems. In fact, as I'll demonstrate here, a bitmap index on a column with 100-percent unique values (a column candidate for primary key) is as efficient as a B-tree index.

In this article I'll provide some examples, along with optimizer decisions, that are common for both types of indexes on a low-cardinality column as well as a high-cardinality one. These examples will help DBAs understand that the usage of bitmap indexes is not in fact cardinality dependent but rather application dependent.

Comparing the Indexes

There are several disadvantages to using a bitmap index on a unique column, one being the need for sufficient space (and Oracle does not recommend it). However, the size of the bitmap index depends on the cardinality of the column on which it is created as well as the data distribution. Consequently, a bitmap index on the GENDER column will be smaller than a B-tree index on the same column. In contrast, a bitmap index on EMPNO (a candidate for primary key) will be much larger than a B-tree index on this column. But because fewer users access decision-support systems (DSS) than would access transaction-processing (OLTP) ones, resources are not a problem for these applications.

To illustrate this point, I created two tables, TEST_NORMAL and TEST_RANDOM. I inserted one million rows into the TEST_NORMAL table using a PL/SQL block, and then inserted these rows into the TEST_RANDOM table in random order:

Create table test_normal (empno number(10), ename varchar2(30), sal number(10));

Begin
  For i in 1..1000000
  Loop
    Insert into test_normal
    values(i, dbms_random.string('U',30), dbms_random.value(1000,7000));
    If mod(i, 10000) = 0 then
      Commit;
    End if;
  End loop;
End;
/

Create table test_random
as
select /*+ append */ * from test_normal order by dbms_random.random;

SQL> select count(*) "Total Rows" from test_normal;

Total Rows
----------
   1000000

Elapsed: 00:00:01.09

SQL> select count(distinct empno) "Distinct Values" from test_normal;

Distinct Values
---------------
        1000000

Elapsed: 00:00:06.09

SQL> select count(*) "Total Rows" from test_random;

Total Rows
----------
   1000000

Elapsed: 00:00:03.05

SQL> select count(distinct empno) "Distinct Values" from test_random;

Distinct Values
---------------
        1000000

Elapsed: 00:00:12.07

Note that the TEST_NORMAL table is organized and that the TEST_RANDOM table is randomly created and hence has disorganized data. In the above table, column EMPNO has 100-percent distinct values and is a good candidate to become a primary key. If you define this column as a primary key, you will create a B-tree index and not a bitmap index because Oracle does not support bitmap primary key indexes.
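The point about primary keys can be checked with a short sketch (Oracle syntax). The constraint name test_normal_pk is hypothetical, and the example assumes no index exists on EMPNO yet, so that Oracle creates a new index to back the constraint.

```sql
-- Oracle syntax; the constraint name test_normal_pk is hypothetical.
alter table test_normal
  add constraint test_normal_pk primary key (empno);

-- The index backing the constraint is listed with INDEX_TYPE = 'NORMAL'
-- (a B-tree) and UNIQUENESS = 'UNIQUE'; a bitmap index would show 'BITMAP'.
select index_name, index_type, uniqueness
from user_indexes
where table_name = 'TEST_NORMAL';
```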

To analyze the behavior of these indexes, we will perform the following steps:

1. On TEST_NORMAL:


A. Create a bitmap index on the EMPNO column and execute some queries with equality predicates.

B. Create a B-tree index on the EMPNO column, execute some queries with equality predicates, and compare the logical and physical I/Os done by the queries to fetch the results for different sets of values.

2. On TEST_RANDOM:

A. Same as Step 1A.

B. Same as Step 1B.

3. On TEST_NORMAL:

A. Same as Step 1A, except that the queries are executed within a range of predicates.

B. Same as Step 1B, except that the queries are executed within a range of predicates. Now compare the statistics.

4. On TEST_RANDOM:

A. Same as Step 3A.

B. Same as Step 3B.

5. On TEST_NORMAL:

A. Create a bitmap index on the SAL column, and then execute some queries with equality predicates and some with range predicates.

B. Create a B-tree index on the SAL column, and then execute some queries with equality predicates and some with range predicates (same set of values as in Step 5A). Compare the I/Os done by the queries to fetch the results.

6. Add a GENDER column to both of the tables, and update the column with three possible values: M for male, F for female, and null for N/A. This column is updated with these values based on some condition.

7. Create a bitmap index on this column, and then execute some queries with equality predicates.

8. Create a B-tree index on the GENDER column, and then execute some queries with equality predicates. Compare to results from Step 7.

Steps 1 to 4 involve a high-cardinality (100-percent distinct) column, Step 5 a normal-cardinality column, and Steps 7 and 8 a low-cardinality column.
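Steps 6 to 8 can be sketched as below for TEST_NORMAL (Oracle syntax). The text only says the column is updated "based on some condition", so the MOD-based rule here is an assumption, as are the two index names.

```sql
-- Oracle syntax. The MOD-based rule and the index names are hypothetical.

-- Step 6: add the column and fill it with 'M', 'F' or null.
alter table test_normal add (gender varchar2(1));

update test_normal
set gender = decode(mod(empno, 3), 0, 'M', 1, 'F', null);
commit;

-- Step 7: bitmap index, then run the equality queries.
create bitmap index normal_gender_bmx on test_normal(gender);

-- Step 8: swap it for a B-tree index and repeat the same queries.
drop index normal_gender_bmx;
create index normal_gender_idx on test_normal(gender);
```

The same statements would be repeated against TEST_RANDOM, since Step 6 adds the column to both tables.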

Step 1A (on TEST_NORMAL)

In this step, we will create a bitmap index on the TEST_NORMAL table and then check for the size of this index, its clustering factor, and the size of the table. Then we will run some queries with equality predicates and note the I/Os of these queries using this bitmap index.

SQL> create bitmap index normal_empno_bmx on test_normal(empno);

Index created.

Elapsed: 00:00:29.06

SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;

Table analyzed.

Elapsed: 00:00:19.01

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3* where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_BMX');

SEGMENT_NAME                         Size in MB
------------------------------------ ---------------
TEST_NORMAL                          50
NORMAL_EMPNO_BMX                     28

Elapsed: 00:00:02.00

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ ---------------------------------
NORMAL_EMPNO_BMX               1000000

Elapsed: 00:00:00.00

You can see in the preceding table that the size of the index is 28MB and that the clustering factor is equal to the number of rows in the table. Now let's execute the queries with equality predicates for different sets of values:

SQL> set autotrace traceonly
SQL> select * from test_normal where empno=&empno;
Enter value for empno: 1000
old 1: select * from test_normal where empno=&empno
new 1: select * from test_normal where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Card=1 Bytes=34)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_EMPNO_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Step 1B (on TEST_NORMAL)

Now we will drop this bitmap index and create a B-tree index on the EMPNO column. As before, we will check for the size of the index and its clustering factor and execute the same queries for the same set of values, to compare the I/Os. SQL> drop index NORMAL_EMPNO_BMX;


Index dropped.

SQL> create index normal_empno_idx on test_normal(empno);

Index created.


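SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;

Table analyzed.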

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_IDX');

SEGMENT_NAME                       Size in MB
---------------------------------- ---------------
TEST_NORMAL                                     50
NORMAL_EMPNO_IDX                                18

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME                         CLUSTERING_FACTOR
---------------------------------- ----------------------------------
NORMAL_EMPNO_IDX                                                 6210

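The output shows that the B-tree index (18MB) is smaller than the bitmap index (28MB) on the EMPNO column. The clustering factor of the B-tree index (6,210) is much nearer to the number of blocks in the table than to the number of rows; for that reason, the B-tree index is efficient for range predicate queries: rows with adjacent key values sit in the same table blocks, so a range scan revisits few blocks.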
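The clustering factor can be illustrated with a small simulation. The sketch below is a simplified model, not Oracle's implementation: it walks the keys in index order and counts how often the table block changes from one row to the next. The table size, rows per block, and random seed are hypothetical values chosen only for illustration.

```python
import random

def clustering_factor(block_of_row_in_key_order):
    """Count how often the table block changes while walking rows in key order."""
    changes = 0
    previous_block = None
    for block in block_of_row_in_key_order:
        if block != previous_block:
            changes += 1
            previous_block = block
    return changes

ROWS = 10_000          # hypothetical table size
ROWS_PER_BLOCK = 100   # hypothetical block capacity

# Rows stored in key order: key k lives in block k // ROWS_PER_BLOCK.
ordered_blocks = [key // ROWS_PER_BLOCK for key in range(ROWS)]

# Rows stored in random order: shuffle the physical slot of each key.
slots = list(range(ROWS))
random.Random(42).shuffle(slots)
random_blocks = [slot // ROWS_PER_BLOCK for slot in slots]

cf_ordered = clustering_factor(ordered_blocks)
cf_random = clustering_factor(random_blocks)
print("ordered table:", cf_ordered)   # equals the number of blocks
print("random table: ", cf_random)    # close to the number of rows
```

In this model the ordered table yields a clustering factor equal to its block count, while the shuffled table yields a value close to its row count, which mirrors the contrast between TEST_NORMAL and TEST_RANDOM.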

Now we'll run the same queries for the same set of values, using our B-tree index.

SQL> set autot trace
SQL> select * from test_normal where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_normal where empno=&empno
new   1: select * from test_normal where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Card=1 Bytes=34)
   2    1     INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (Cost=3 Card=1)

Statistics
----------------------------------------------------------
         29  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

As you can see, when the queries are executed for different sets of values, the numbers of consistent gets and physical reads are identical for bitmap and B-tree indexes on a 100-percent unique column.


BITMAP                                      B-TREE
Consistent Reads  Physical Reads  EMPNO     Consistent Reads  Physical Reads
5                 0               1000      5                 0
5                 2               2398      5                 2
5                 2               8545      5                 2
5                 2               98008     5                 2
5                 2               85342     5                 2
5                 2               128444    5                 2
5                 2               858       5                 2

Step 2A (on TEST_RANDOM)

Now we'll perform the same experiment on TEST_RANDOM:

SQL> create bitmap index random_empno_bmx on test_random(empno);

Index created.

SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns;

Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3* where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_BMX');

SEGMENT_NAME                         Size in MB
------------------------------------ ---------------
TEST_RANDOM                                       50
RANDOM_EMPNO_BMX                                  28

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ ---------------------------------
RANDOM_EMPNO_BMX                                         1000000

Again, the statistics (size and clustering factor) are identical to those of the index on the TEST_NORMAL table:

SQL> select * from test_random where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_random where empno=&empno
new   1: select * from test_random where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'RANDOM_EMPNO_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Step 2B (on TEST_RANDOM)

Now, as in Step 1B, we will drop the bitmap index and create a B-tree index on the EMPNO column.

SQL> drop index RANDOM_EMPNO_BMX;

Index dropped.

SQL> create index random_empno_idx on test_random(empno);

Index created.

SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns;

Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_IDX');

SEGMENT_NAME                       Size in MB
---------------------------------- ---------------
TEST_RANDOM                                     50
RANDOM_EMPNO_IDX                                18

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME                         CLUSTERING_FACTOR
---------------------------------- ----------------------------------
RANDOM_EMPNO_IDX                                               999830

The output shows that this index is the same size as the corresponding index on the TEST_NORMAL table, but its clustering factor is much nearer to the number of rows, which makes this index inefficient for range predicate queries (as we'll see in Step 4). The clustering factor does not affect the equality predicate queries here, because the column values are 100-percent distinct and the number of rows per key is 1.

Now let's run the queries with equality predicates and the same set of values.

SQL> select * from test_random where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_random where empno=&empno
new   1: select * from test_random where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
   2    1     INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_IDX' (NON-UNIQUE) (Cost=3 Card=1)


Again, the results are almost identical to those in Steps 1A and 1B. The data distribution did not affect the number of consistent gets and physical reads for a unique column.


Step 3A (on TEST_NORMAL)

In this step, we will create the bitmap index (as in Step 1A). We already know the size of the index and its clustering factor, which equals the number of rows in the table. Now let's run some queries with range predicates.

SQL> select * from test_normal where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_normal where empno between &range1 and &range2
new   1: select * from test_normal where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=451 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=451 Card=2299 Bytes=78166)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_BMX'


Step 3B (on TEST_NORMAL)

In this step, we'll execute the queries against the TEST_NORMAL table with a B-tree index on it.

SQL> select * from test_normal where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_normal where empno between &range1 and &range2
new   1: select * from test_normal where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=23 Card=2299 Bytes=78166)
   2    1     INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (Cost=8 Card=2299)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        329  consistent gets
         15  physical reads
          0  redo size
     111416  bytes sent via SQL*Net to client
       2182  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed

When these queries are executed for different ranges, the results are as follows:

BITMAP                                            B-TREE
Consistent Reads  Physical Reads  EMPNO (Range)   Consistent Reads  Physical Reads
331               0               1-2300          329               0
285               0               8-1980          283               0
346               19              1850-4250       344               16
427               31              28888-31850     424               28
371               27              82900-85478     367               23
2157              149             984888-1000000  2139              35

As you can see, the numbers of consistent gets and physical reads with both indexes are again nearly identical. The last range (984888-1000000) returned roughly 15,000 rows, the largest result set of all the ranges given above. For comparison, when we forced a full table scan (by giving the hint /*+ full(test_normal) */), the consistent read and physical read counts were 7,239 and 5,663, respectively.

Step 4A (on TEST_RANDOM)

In this step, we will run the queries with range predicates on the TEST_RANDOM table with the bitmap index and check the consistent gets and physical reads. Here you'll see the impact of the clustering factor.

SQL> select * from test_random where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_random where empno between &range1 and &range2
new   1: select * from test_random where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:08.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=453 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=453 Card=2299 Bytes=78166)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_BMX'


Step 4B (on TEST_RANDOM)

In this step, we will execute the range predicate queries on TEST_RANDOM with a B-tree index on it. Recall that the clustering factor of this index was very close to the number of rows in the table (and thus inefficient). Here's what the optimizer has to say about that:

SQL> select * from test_random where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_random where empno between &range1 and &range2
new   1: select * from test_random where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:03.04

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=613 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (FULL) OF 'TEST_RANDOM' (Cost=613 Card=2299 Bytes=78166)


The optimizer opted for a full table scan rather than using the index because of the clustering factor:


BITMAP                                            B-TREE
Consistent Reads  Physical Reads  EMPNO (Range)   Consistent Reads  Physical Reads
2463              1200            1-2300          6415              4910
2114              31              8-1980          6389              4910
2572              1135            1850-4250       6418              4909
3173              1620            28888-31850     6456              4909
2762              1358            82900-85478     6431              4909
7254              3329            984888-1000000  7254              4909

With the bitmap index in place, the optimizer opted for a full table scan only for the last range (984888-1000000), whereas with the B-tree index in place, it opted for a full table scan for every range. This disparity is due to the clustering factor: the optimizer does not consider the value of the clustering factor when generating execution plans that use a bitmap index, whereas for a B-tree index, it does. In this scenario, the bitmap index performs more efficiently than the B-tree index.
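To see why data placement matters so much for range predicates, the sketch below counts how many distinct table blocks hold the rows for one key range, first when rows are stored in key order and then when they are scattered at random. It is an illustrative model only: the table size, rows per block, and seed are hypothetical, and it does not reproduce the optimizer's cost formula.

```python
import random

ROWS = 100_000         # hypothetical table size
ROWS_PER_BLOCK = 100   # hypothetical rows per block
TOTAL_BLOCKS = ROWS // ROWS_PER_BLOCK

def blocks_touched(block_of_key, low, high):
    """Number of distinct table blocks that hold keys in [low, high]."""
    return len({block_of_key[key] for key in range(low, high + 1)})

# Ordered placement: consecutive keys share blocks.
ordered = [key // ROWS_PER_BLOCK for key in range(ROWS)]

# Random placement: each key lands in an arbitrary slot of the table.
slots = list(range(ROWS))
random.Random(7).shuffle(slots)
scattered = [slot // ROWS_PER_BLOCK for slot in slots]

low, high = 1, 2300
touched_ordered = blocks_touched(ordered, low, high)
touched_scattered = blocks_touched(scattered, low, high)

print("blocks in table:          ", TOTAL_BLOCKS)
print("ordered table, 1-2300:    ", touched_ordered)
print("scattered table, 1-2300:  ", touched_scattered)
```

With ordered placement the range fits in a few dozen blocks; with random placement the same 2,300 rows are spread over most of the table, which is why index access loses much of its advantage over a full scan on TEST_RANDOM.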

The following steps reveal more interesting facts about these indexes.


Step 5A (on TEST_NORMAL)

Create a bitmap index on the SAL column of the TEST_NORMAL table. This column has normal cardinality.

SQL> create bitmap index normal_sal_bmx on test_normal(sal);

Index created.


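SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;

Table analyzed.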

Now let's get the size of the index and the clustering factor.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3* where segment_name in ('TEST_NORMAL','NORMAL_SAL_BMX');

SEGMENT_NAME                   Size in MB
------------------------------ --------------
TEST_NORMAL                                50
NORMAL_SAL_BMX                              4

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ ----------------------------------
NORMAL_SAL_BMX                                               6001

Now for the queries. First run them with equality predicates:

SQL> set autot trace
SQL> select * from test_normal where sal=&sal;
Enter value for sal: 1869
old   1: select * from test_normal where sal=&sal
new   1: select * from test_normal where sal=1869

164 rows selected.

Elapsed: 00:00:00.08

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=39 Card=168 Bytes=4032)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=39 Card=168 Bytes=4032)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'


And then with range predicates:

SQL> select * from test_normal where sal between &sal1 and &sal2;
Enter value for sal1: 1500
Enter value for sal2: 2000
old   1: select * from test_normal where sal between &sal1 and &sal2
new   1: select * from test_normal where sal between 1500 and 2000

83743 rows selected.

Elapsed: 00:00:05.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes=2001024)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376 Bytes=2001024)


Step 5B (on TEST_NORMAL)

Now drop the bitmap index and create a B-tree index on TEST_NORMAL.

SQL> create index normal_sal_idx on test_normal(sal);

Index created.


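SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;

Table analyzed.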


Take a look at the size of the index and the clustering factor.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_SAL_IDX');

SEGMENT_NAME                   Size in MB
------------------------------ ---------------
TEST_NORMAL                                 50
NORMAL_SAL_IDX                              17

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ ----------------------------------
NORMAL_SAL_IDX                                             986778

In the above table, you can see that this index is larger than the bitmap index on the same column. The clustering factor is also near the number of rows in this table.

Now for the tests; equality predicates first:

SQL> set autot trace
SQL> select * from test_normal where sal=&sal;
Enter value for sal: 1869
old   1: select * from test_normal where sal=&sal
new   1: select * from test_normal where sal=1869

164 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=169 Card=168 Bytes=4032)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=169 Card=168 Bytes=4032)
   2    1     INDEX (RANGE SCAN) OF 'NORMAL_SAL_IDX' (NON-UNIQUE) (Cost=3 Card=168)


...and then, range predicates:

SQL> select * from test_normal where sal between &sal1 and &sal2;
Enter value for sal1: 1500
Enter value for sal2: 2000
old   1: select * from test_normal where sal between &sal1 and &sal2
new   1: select * from test_normal where sal between 1500 and 2000

Elapsed: 00:00:04.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes=2001024)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376 Bytes=2001024)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11778  consistent gets
       3891  physical reads
          0  redo size
    4123553  bytes sent via SQL*Net to client
      61901  bytes received via SQL*Net from client
       5584  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      83743  rows processed

When the queries were executed for different sets of values, the resulting output, as shown in the tables below, reveals that the numbers of consistent gets and physical reads are identical.

BITMAP                                            B-TREE
Consistent Reads  Physical Reads  SAL (Equality)  Consistent Reads  Physical Reads  Rows Fetched
165               0               1869            177                               164
169               163             3548            181                               167
174               166             6500            187                               172
75                69              7000            81                                73
177               163             2500            190                               175

BITMAP                                            B-TREE
Consistent Reads  Physical Reads  SAL (Range)     Consistent Reads  Physical Reads  Rows Fetched
11778             5850            1500-2000       11778             3891            83743
11765             5468            2000-2500       11765             3879            83328
11753             5471            2500-3000       11753             3884            83318
17309             5472            3000-4000       17309             3892            166999
39398             5454            4000-7000       39398             3973            500520

For range predicates, the optimizer opted for a full table scan for every set of values (it didn't use the indexes at all), whereas for equality predicates, the optimizer used the indexes. Again, the consistent gets and physical reads are identical.

Consequently, you can conclude that for a normal-cardinality column, the optimizer decisions for the two types of indexes were the same and there were no significant differences between the I/Os.

Step 6 (add a GENDER column)

Before performing the test on a low-cardinality column, let's add a GENDER column to this table and update it with M, F, and null values.

SQL> alter table test_normal add GENDER varchar2(1);

Table altered.

SQL> select GENDER, count(*) from test_normal group by GENDER;

G   COUNT(*)
- ----------
F     333769
M     499921
      166310

3 rows selected.

The size of the bitmap index on this column is around 570KB, as the output below indicates:

SQL> create bitmap index normal_GENDER_bmx on test_normal(GENDER);

Index created.

Elapsed: 00:00:02.08

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_GENDER_BMX');

SEGMENT_NAME                   Size in MB
------------------------------ ---------------
TEST_NORMAL                                 50
NORMAL_GENDER_BMX                        .5625

2 rows selected.

In contrast, the B-tree index on this column is 13MB in size, which is much bigger than the bitmap index on this column.

SQL> create index normal_GENDER_idx on test_normal(GENDER);

Index created.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_GENDER_IDX');

SEGMENT_NAME                   Size in MB
------------------------------ ---------------
TEST_NORMAL                                 50
NORMAL_GENDER_IDX                           13

2 rows selected.

Now, if we execute a query with equality predicates, the optimizer will not make use of this index, be it a bitmap or a B-tree. Rather, it will prefer a full table scan.

SQL> select * from test_normal where GENDER is null;

Elapsed: 00:00:06.08

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=166310 Bytes=4157750)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=166310 Bytes=4157750)

SQL> select * from test_normal where GENDER='M';



Elapsed: 00:00:16.07

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=499921 Bytes=12498025)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=499921 Bytes=12498025)

SQL> select * from test_normal where GENDER='F'
  2  /

Elapsed: 00:00:12.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=333769 Bytes=8344225)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=333769 Bytes=8344225)

Conclusions

Now that we understand how the optimizer reacts to these techniques, let's examine a scenario that clearly demonstrates the best respective applications of bitmap indexes and B-tree indexes.

With a bitmap index on the GENDER column in place, create another bitmap index on the SAL column and then execute some queries. The queries will be re-executed with B-tree indexes on these columns.

From the TEST_NORMAL table, you need the employee number of all the male employees whose monthly salaries equal any of the following values: 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000.

Thus:

SQL> select * from test_normal where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';

This is a typical data warehouse query, which, of course, you should never execute on an OLTP system. Here are the results with the bitmap index in place on both columns:

SQL> select * from test_normal where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';

1453 rows selected.

Elapsed: 00:00:02.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=198 Card=754 Bytes=18850)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=198 Card=754 Bytes=18850)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP AND
   4    3         BITMAP OR
   5    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
   6    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
   7    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
   8    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
   9    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  10    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  11    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  12    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  13    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  14    3         BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_GENDER_BMX'


And with the B-tree index in place:

SQL> select * from test_normal where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';

1453 rows selected.

Elapsed: 00:00:03.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=754 Bytes=18850)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=754 Bytes=18850)


As you can see here, with the B-tree index, the optimizer opted for a full table scan, whereas in the case of the bitmap index, it used the index to answer the query. You can deduce performance by the number of I/Os required to fetch the result.
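The BITMAP OR, BITMAP AND, and BITMAP CONVERSION (TO ROWIDS) steps of the bitmap plan can be mimicked with plain integers used as bit strings. The sketch below is a toy model under stated assumptions: the eight sample rows are invented, `build_bitmap_index` is a hypothetical helper, and real bitmap indexes are stored compressed rather than as one integer per key.

```python
# Toy table of (empno, sal, gender); all values are hypothetical.
rows = [
    (1, 1000, "M"),
    (2, 1500, "F"),
    (3, 2000, "M"),
    (4, 1200, "M"),
    (5, 3000, None),
    (6, 4500, "M"),
    (7, 5000, "F"),
    (8, 2500, "M"),
]

def build_bitmap_index(table, column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for position, row in enumerate(table):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << position)
    return index

sal_index = build_bitmap_index(rows, 1)
gender_index = build_bitmap_index(rows, 2)

wanted_salaries = (1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000)

# BITMAP OR: merge the bitmaps of every salary in the IN list.
sal_bitmap = 0
for sal in wanted_salaries:
    sal_bitmap |= sal_index.get(sal, 0)

# BITMAP AND: keep only the rows that are also flagged as male.
result_bitmap = sal_bitmap & gender_index.get("M", 0)

# BITMAP CONVERSION (TO ROWIDS): turn the set bits back into row positions.
matching_rows = [rows[i] for i in range(len(rows)) if (result_bitmap >> i) & 1]
print(matching_rows)
```

No table row is read until the final step, which is the property that lets the bitmap plan combine several predicates cheaply before visiting the table.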

In summary, bitmap indexes are best suited for DSS regardless of cardinality for these reasons:


With bitmap indexes, the optimizer can efficiently answer queries that include AND, OR, or XOR. (Oracle supports dynamic B-tree-to-bitmap conversion, but it can be inefficient.)

With bitmaps, the optimizer can answer queries when searching or counting for nulls. Null values are also indexed in bitmap indexes (unlike B-tree indexes).

Most important, bitmap indexes in DSS systems support ad hoc queries, whereas B-tree indexes do not. More specifically, if you have a table with 50 columns and users frequently query on 10 of them—either the combination of all 10 columns or sometimes a single column—creating a B-tree index will be very difficult. If you create 10 bitmap indexes on all these columns, all the queries can be answered by these indexes, whether they are queries on all 10 columns, on 4 or 6 columns out of the 10, or on a single column. The AND_EQUAL hint provides this functionality for B-tree indexes, but no more than five indexes can be used by a query. This limit is not imposed with bitmap indexes.
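The points above about null handling and ad hoc column combinations can be sketched with one bitmap per column value. This is a simplified illustration, not Oracle's storage format: the rows, the column names, and the `count_where` helper are all hypothetical, and Python's `None` stands in for NULL (so `gender=None` models `GENDER IS NULL`).

```python
from functools import reduce

# Hypothetical rows of (region, gender, grade); None stands for NULL.
rows = [
    ("EU", "M", "A"),
    ("US", None, "B"),
    ("EU", "F", "A"),
    ("US", "M", None),
    ("EU", None, "B"),
    ("US", "F", "A"),
]
columns = ("region", "gender", "grade")

# One single-column bitmap index per column; None gets its own bitmap,
# just like any other key value.
indexes = {name: {} for name in columns}
for position, row in enumerate(rows):
    for name, value in zip(columns, row):
        indexes[name][value] = indexes[name].get(value, 0) | (1 << position)

def count_where(**predicates):
    """Count rows matching every column=value predicate using only the bitmaps."""
    all_rows = (1 << len(rows)) - 1
    bitmaps = (indexes[col].get(val, 0) for col, val in predicates.items())
    combined = reduce(lambda a, b: a & b, bitmaps, all_rows)
    return bin(combined).count("1")

null_genders = count_where(gender=None)                        # one column, NULL key
eu_grade_a = count_where(region="EU", grade="A")               # any two columns
us_f_grade_a = count_where(region="US", gender="F", grade="A") # all three columns
print(null_genders, eu_grade_a, us_f_grade_a)
```

The same three single-column indexes answer every combination of predicates, which is the ad hoc flexibility the text attributes to bitmap indexes.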

In contrast, B-tree indexes are well suited for OLTP applications in which users' queries are relatively routine (and well tuned before deployment in production), as opposed to ad hoc queries, which are much less frequent and executed during nonpeak business hours. Because data is frequently updated in and deleted from OLTP applications, bitmap indexes can cause a serious locking problem in these situations.

The data here is fairly clear. Both indexes have a similar purpose: to return results as fast as possible. But your choice of which one to use should depend purely on the type of application, not on the level of cardinality.

